Our group has done a *lot* of work with non-standard calling conventions in `R`

.

Our tools work hard to *eliminate* non-standard calling (as is the purpose of `wrapr::let()`

), or at least make it cleaner and more controllable (as is done in the wrapr dot pipe). And even so, we *still* get surprised by some of the side-effects and mal-consequences of the over-use of non-standard calling conventions in `R`

.

Please read on for a recent example.

Consider the following calls to `stats::lm()`

. And notice the third example fails (throws an error).

# works lm("y ~ x", data = data.frame( x=1:5, y = c(1, 1, 2, 2, 2)), weights = numeric(5)+1) #> #> Call: #> lm(formula = "y ~ x", data = data.frame(x = 1:5, y = c(1, 1, #> 2, 2, 2)), weights = numeric(5) + 1) #> #> Coefficients: #> (Intercept) x #> 0.7 0.3 # works f1 <- function(w = NULL) { lm(as.formula("y ~ x"), data = data.frame( x=1:5, y = c(1, 1, 2, 2, 2)), weights = w) } f1(numeric(5)+1) #> #> Call: #> lm(formula = as.formula("y ~ x"), data = data.frame(x = 1:5, #> y = c(1, 1, 2, 2, 2)), weights = w) #> #> Coefficients: #> (Intercept) x #> 0.7 0.3 # fails f2 <- function(w = NULL) { lm("y ~ x", data = data.frame( x=1:5, y = c(1, 1, 2, 2, 2)), weights = w) } f2(numeric(5)+1) #> Error in eval(extras, data, env): object 'w' not found

According the `stats::lm()`

documentation (`help(lm)`

) the first argument must be:

an object of class “formula” (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under ‘Details’.

A string appears to be coerce-able into a formula, so all three examples should work. However, typing “`print(lm)`

” reveals the issue: `stats::lm()`

doesn’t take the “`weights`

” argument in a standard way (as the value of a function parameter). It instead grabs it through a sequence of `match.call()`

and `eval()`

steps. It is a complicated way to get the value, which works until it does not work. Somehow passing in the formula as a string interferes with how the value of `weights`

is found. I think we can now see the benefits of isolation and independence of concerns in code.

This over-use of direct environment copying and manipulation is what leads to a great many data-leaks in `stats::lm()`

and `stats::glm()`

. This is in addition to their weird habit of keeping a copy of all of the training data (which loses quite a few of the merits of these methods). Our group dealt with these issues a long time ago, so we are somewhat familiar with `stats::lm()`

and `stats::glm()`

.

Of course, one could (as the `stats::lm()`

documentation mentions) call `stats::lm.fit()`

. However, `stats::lm.fit()`

does not seem to accept weights and its own documentation (`help(lm.fit)`

) starts ominously:

These are the basic computing engines called by lm used to fit linear models. These should usually not be used directly unless by experienced users.

Having just finished teaching a four day intensive course covering data science in Python, I can’t help but remark that users of `sklearn.linear_model.LinearRegression()`

don’t need to worry about issues such as the above. Some of the notational flair of `R`

comes at the cost of significant opportunities for user confusion.