R Tip: use inline operators for legibility. A Python feature I miss when working in R is the convenience of Python‘s inline + operator. In Python, + does the right thing for some built in data types: It concatenates lists: [1,2] + [3] is [1, 2, 3]. It concatenates strings: ‘a’ + ‘b’ is ‘ab’. … Continue reading R Tip: Use Inline Operators For Legibility

# Tag: R

## Practical Data Science with R, 2nd Edition discount!

Please help share our news and this discount. The second edition of our best-selling book Practical Data Science with R2, Zumel, Mount is featured as deal of the day at Manning. The second edition isn’t finished yet, but chapters 1 through 4 are available in the Manning Early Access Program (MEAP), and we have finished … Continue reading Practical Data Science with R, 2nd Edition discount!

## R Tip: Use seqi() For Indexes

R Tip: use seqi() for indexing. R‘s “1:0 trap” is a mal-feature that confuses newcomers and is a reliable source of bugs. This note will show how to use seqi() to write more reliable code and document intent. The issue is, contrary to expectations (formed in working with other programming languages) the sequence 1:0 is … Continue reading R Tip: Use seqi() For Indexes

## A Beautiful 2 by 2 Matrix Identity

While working on a variation of the RcppDynProg algorithm we derived the following beautiful identity of 2 by 2 real matrices: The superscript “top” denoting the transpose operation, the ||.||^2_2 denoting sum of squares norm, and the single |.| denoting determinant. This is derived from one of the check equations for the Moore–Penrose inverse and … Continue reading A Beautiful 2 by 2 Matrix Identity

## Timing the Same Algorithm in R, Python, and C++

While developing the RcppDynProg R package I took a little extra time to port the core algorithm from C++ to both R and Python. This means I can time the exact same algorithm implemented nearly identically in each of these three languages. So I can extract some comparative “apples to apples” timings. Please read on … Continue reading Timing the Same Algorithm in R, Python, and C++

## What does it mean to write “vectorized” code in R?

One often hears that R can not be fast (false), or more correctly that for fast code in R you may have to consider “vectorizing.” A lot of knowledgable R users are not comfortable with the term “vectorize”, and not really familiar with the method. “Vectorize” is just a slightly high-handed way of saying: R … Continue reading What does it mean to write “vectorized” code in R?

## Introducing RcppDynProg

RcppDynProg is a new Rcpp based R package that implements simple, but powerful, table-based dynamic programming. This package can be used to optimally solve the minimum cost partition into intervals problem (described below) and is useful in building piecewise estimates of functions (shown in this note). The abstract problem The primary problem RcppDynProg::solve_dynamic_program() is designed … Continue reading Introducing RcppDynProg

## Le Monde puzzle [#1076]

A cheezy Le Monde mathematical puzzle : (which took me much longer to find [in the sense of locating] than to solve, as Warwick U does not get a daily delivery of the newspaper [and this is pre-Brexit!]): Take a round pizza (or a wheel of Gruyère) cut into seven identical slices and turn one […]

## vtreat Variable Importance

vtreat‘s purpose is to produce pure numeric R data.frames that are ready for supervised predictive modeling (predicting a value from other values). By ready we mean: a purely numeric data frame with no missing values and a reasonable number of columns (missing-values re-encoded with indicators, and high-degree categorical re-encode by effects codes or impact codes). … Continue reading vtreat Variable Importance

## Quoting Concatenate

In our last note we used wrapr::qe() to help quote expressions. In this note we will discuss quoting and code-capturing interfaces (interfaces that capture user source code) a bit more. My position on code-capturing interfaces (or non-standard-evaluation/NSE) is: if poorly handled, they can be a large interface price/risk to pay for the minor convenience of … Continue reading Quoting Concatenate

## Reusable Pipelines in R

Pipelines in R are popular, the most popular one being magrittr as used by dplyr. This note will discuss the advanced re-usable piping systems: rquery/rqdatatable operator trees and wrapr function object pipelines. In each case we have a set of objects designed to extract extra power from the wrapr dot-arrow pipe %.>%. Piping Piping is … Continue reading Reusable Pipelines in R

## Sharing Modeling Pipelines in R

Reusable modeling pipelines are a practical idea that gets re-developed many times in many contexts. wrapr supplies a particularly powerful pipeline notation, and a pipe-stage re-use system (notes here). We will demonstrate this with the vtreat data preparation system. Our example task is to fit a model on some arbitrary data. Our model will try … Continue reading Sharing Modeling Pipelines in R

## Timing Grouped Mean Calculation in R

This note is a comment on some of the timings shared in the dplyr-0.8.0 pre-release announcement. The original published timings were as follows: With performance metrics: measurements are marketing. So let’s dig in the above a bit. These timings are of the kind of small task large number of repetition breed that Matt Dowle writes … Continue reading Timing Grouped Mean Calculation in R

## Very Non-Standard Calling in R

Our group has done a lot of work with non-standard calling conventions in R. Our tools work hard to eliminate non-standard calling (as is the purpose of wrapr::let()), or at least make it cleaner and more controllable (as is done in the wrapr dot pipe). And even so, we still get surprised by some of … Continue reading Very Non-Standard Calling in R

## Quoting in R

Many R users appear to be big fans of “code capturing” or “non standard evaluation” (NSE) interfaces. In this note we will discuss quoting and non-quoting interfaces in R. The above terms are simply talking about interfaces where a name to be used is captured from the source code the user typed, and thus does … Continue reading Quoting in R

## More on Bias Corrected Standard Deviation Estimates

This note is just a quick follow-up to our last note on correcting the bias in estimated standard deviations for binomial experiments. For normal deviates there is, of course, a well know scaling correction that returns an unbiased estimate for observed standard deviations. It (from the same source): … provides an example where imposing the … Continue reading More on Bias Corrected Standard Deviation Estimates

## How to de-Bias Standard Deviation Estimates

This note is about attempting to remove the bias brought in by using sample standard deviation estimates to estimate an unknown true standard deviation of a population. We establish there is a bias, concentrate on why it is not important to remove it for reasonable sized samples, and (despite that) give a very complete bias … Continue reading How to de-Bias Standard Deviation Estimates

## StanCon Helsinki streaming live now (and tomorrow)

We’re streaming live right now!

Thursday 08:45-17:30: YouTube Link

Friday 09:00-17:00: YouTube Link

Timezone is Eastern European Summer Time (EEST) +0300 UTC

Here’s a link to the full program.

There have already been some great talks an…

## Thanks, NVIDIA

Andrew and I both received a note like this from NVIDIA: We have reviewed your NVIDIA GPU Grant Request and are happy support your work with the donation of (1) Titan Xp to support your research. Thanks! In case other people are interested, NVIDA’s GPU grant program provides ways for faculty or research scientists to […]

The post Thanks, NVIDIA appeared first on Statistical Modeling, Causal Inference, and Social Science.

## Thanks, NVIDIA

Andrew and I both received a note like this from NVIDIA: We have reviewed your NVIDIA GPU Grant Request and are happy support your work with the donation of (1) Titan Xp to support your research. Thanks! In case other people are interested, NVIDA’s GPU grant program provides ways for faculty or research scientists to […]

The post Thanks, NVIDIA appeared first on Statistical Modeling, Causal Inference, and Social Science.