Posts Tagged ‘Tutorials’

Use the LENGTH statement to pre-set the lengths of character variables in SAS – with a comparison to R

I often create character variables (i.e. variables with strings of text as their values) in SAS, and they sometimes don’t render as expected.  Here is an example involving the built-in data set SASHELP.CLASS. Here is the code: data c1;      set sashelp.class;      * define a new character variable to classify someone as tall or […]

Tutorial: Using seplyr to Program Over dplyr

July 22, 2017

seplyr is an R package that makes it easy to program over dplyr 0.7.*. To illustrate this we will work an example. Suppose you had worked out a dplyr pipeline that performed an analysis you were interested in. For an example we could take something similar to one of the examples from the dplyr 0.7.0 …

seplyr update

July 19, 2017

The development version of my new R package seplyr is performing in practical applications with dplyr 0.7.* much better than even I (the seplyr package author) expected. I think I have hit a very good set of trade-offs, and I have now spent significant time creating documentation and examples. I wish there had been such …

dplyr 0.7 Made Simpler

July 15, 2017

I have been writing a lot (too much) on the R topics dplyr/rlang/tidyeval lately. The reason is: major changes were recently announced. If you are going to use dplyr well and correctly going forward you may need to understand some of the new issues (if you don’t use dplyr you can safely skip all of …

Better Grouped Summaries in dplyr

July 12, 2017

For R dplyr users one of the promises of the new rlang/tidyeval system is an improved ability to program over dplyr itself. In particular to add new verbs that encapsulate previously compound steps into better self-documenting atomic steps. Let’s take a look at this capability. First let’s start dplyr. suppressPackageStartupMessages(library("dplyr")) packageVersion("dplyr") ## [1] '0.7.1.9000' A …

In praise of syntactic sugar

July 7, 2017

There has been some talk of adding native pipe notation to R (for example here, here, and here). And even a tidyeval/rlang pipe here. I think a critical aspect of such an extension would be to treat such a notation as syntactic sugar and not insist such a pipe match magrittr semantics, or worse yet …

Join Dependency Sorting

July 1, 2017

In our latest installment of “R and big data” let’s again discuss the task of left joining many tables from a data warehouse using R and a system called “a join controller” (last discussed here). One of the great advantages to specifying complicated sequences of operations in data (rather than in code) is: it is …

wrapr Implementation Update

June 19, 2017

Introduction: The CRAN version (previously the development version) of our R helper function wrapr::let() has switched from string-based substitution to abstract syntax tree based substitution (AST based substitution, or language based substitution). I am looking for some feedback from wrapr::let() users already doing substantial work with wrapr::let(). If you are already using wrapr::let() please test if the …

Non-Standard Evaluation and Function Composition in R

June 16, 2017

In this article we will discuss composing standard-evaluation interfaces (SE) and composing non-standard-evaluation interfaces (NSE) in R. In R the package tidyeval/rlang is a tool for building domain specific languages intended to allow easier composition of NSE interfaces. To use it you must know some of its structure and notation. Here are some details paraphrased …

An easy way to accidentally inflate reported R-squared in linear regression models

June 15, 2017

Here is an absolutely horrible way to confuse yourself and get an inflated reported R-squared on a simple linear regression model in R. We have written about this before, but we found a new twist on the problem (interactions with categorical variable encoding) which we would like to call out here. First let’s set up …
