# Tag: R

## Le Monde puzzle [#1111]

Another low-key arithmetic problem as the current Le Monde mathematical puzzle:

Notice that there are 10 numbers less than, and coprime with, 11; 100 less than, and coprime with, 101; and 1000 less than, and coprime with, 1111. What is the smallest integer N such that…
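The three counts quoted in the puzzle can be checked directly. The following is a minimal base R sketch, not code from the original post; the helper names `gcd` and `count_coprime_below` are illustrative assumptions.

```r
# Greatest common divisor by the Euclidean algorithm.
gcd <- function(a, b) {
  while (b != 0) {
    r <- a %% b
    a <- b
    b <- r
  }
  a
}

# Number of integers k in 1..(n - 1) with gcd(k, n) == 1.
count_coprime_below <- function(n) {
  sum(vapply(seq_len(n - 1), function(k) gcd(k, n) == 1, logical(1)))
}

counts <- vapply(c(11, 101, 1111), count_coprime_below, numeric(1))
print(counts)
```

The same function could be scanned over increasing N to look for a target count, although the excerpt is cut off before it states the target.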

## Practical Data Science with R update

Just got the following note from a new reader: “Thank you for writing Practical Data Science with R. It’s challenging for me, but I am learning a lot by following your steps and entering the commands.” Wow, this is exactly what Nina Zumel and I hoped for. We wish we could make everything easy, but …

## Le Monde puzzle [#1110]

A low-key sorting problem as the current Le Monde mathematical puzzle: If the numbers from 1 to 67 are randomly permuted and if the sorting algorithm consists in picking a number i with a position higher than its rank i and moving it to the correct i-th position, what is the maximal number of steps to […]
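The sorting rule described in the puzzle can be simulated on a small permutation. This is a hedged base R sketch rather than the original post's code: the function name `sort_steps` is hypothetical, and choosing the next number to move uniformly at random is an assumption, since the excerpt does not say how it is picked.

```r
# Repeatedly pick a value i sitting at a position greater than i,
# remove it, and re-insert it at position i, counting the moves.
sort_steps <- function(p) {
  steps <- 0
  repeat {
    idx <- seq_along(p)
    pos <- match(idx, p)        # pos[i] is the current position of value i
    movable <- which(pos > idx) # values sitting to the right of their rank
    if (length(movable) == 0) break
    # Guard against sample()'s special behaviour on a single number.
    i <- if (length(movable) == 1) movable else sample(movable, 1)
    p <- append(p[-pos[i]], i, after = i - 1)
    steps <- steps + 1
  }
  list(sorted = p, steps = steps)
}

set.seed(1)
res <- sort_steps(sample(8))
print(res$sorted)
print(res$steps)
```

When no value is left with a position above its rank, every value sits at or before its rank, which forces the identity permutation, so the loop ends on a sorted vector. A random run like this only gives one step count, not the maximum the puzzle asks for.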

## String interpolation in Python and R

One of the things I liked about Perl was string interpolation. If you use a variable name in a string, the variable will expand to its value. For example, if you have a variable \$x which equals 42, then the string “The answer is \$x.” will expand to “The answer is 42.” Perl requires variables to […]
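For comparison, here is a minimal sketch of how the same sentence can be built in base R. The excerpt is cut off before it shows the post's own R approach, so the use of `sprintf()` and `paste0()` here is an assumption about what is relevant, not a summary of the post.

```r
x <- 42

# Format-string substitution: %s is replaced by the value of x.
s1 <- sprintf("The answer is %s.", x)

# Plain concatenation of the pieces.
s2 <- paste0("The answer is ", x, ".")

print(s1)
print(s2)
```

Neither call is interpolation in the Perl sense, since the variable is passed as an argument and does not appear by name inside the string.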

## WVPlots 1.1.2 on CRAN

I have put a new release of the WVPlots package up on CRAN. This release adds palette and/or color controls to most of the plotting functions in the package. WVPlots was originally a catch-all package of ggplot2 visualizations that we at Win-Vector tended to use repeatedly, and wanted to turn into “one-liners.” A consequence of …

## R puzzle

Can you guess the meaning of the following R code?

    "?"=`u\164f8ToI\x6Et`;'!'=prod;!{
    z<-y[1]}&z>T##&[]>~48bEfILpu

If not (!), the explanation is provided in Robin’s answer to a codegolf …

## Advanced Data Reshaping in Python and R

This note is a simple data wrangling example worked using both the Python data_algebra package and the R cdata package. Both of these packages make data wrangling easy through the use of coordinatized data concepts (relying heavily on Codd’s “rule of access”). The advantages of data_algebra and cdata are: The user specifies their desired transform …

## Le Monde puzzle [#1109]

A digital problem as the current Le Monde mathematical puzzle: Noble numbers are such that they only involve distinct digits and are multiples of all their digits. What is the largest noble number? Hmmmm…. Brute force? Since the maximal number of digits is 10, one may as well try: k=soz=9 for (t in 1:1e3){ sol=1 while […]
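The definition of a noble number translates into a short check. The sketch below is an illustration in base R, not the truncated code from the post; the names `digits_of`, `is_noble` and `largest_noble_at_most` are hypothetical. The starting point 9876543 is also an assumption, based on a divisibility argument that is not in the excerpt: a digit 0 is impossible, a digit 5 rules out all even digits, and the eight digits 1, 2, 3, 4, 6, 7, 8, 9 sum to 40, which is not a multiple of 3, so at most seven digits are possible.

```r
# Decimal digits of a positive whole number, most significant first.
digits_of <- function(n) {
  d <- numeric(0)
  while (n > 0) {
    d <- c(n %% 10, d)
    n <- n %/% 10
  }
  d
}

# Noble: no zero digit, no repeated digit, divisible by every digit.
is_noble <- function(n) {
  d <- digits_of(n)
  !any(d == 0) && anyDuplicated(d) == 0 && all(n %% d == 0)
}

# Walk downwards from `start` until a noble number is found.
largest_noble_at_most <- function(start) {
  n <- start
  while (n >= 1) {
    if (is_noble(n)) return(n)
    n <- n - 1
  }
  NA
}

print(largest_noble_at_most(999))      # 936
print(largest_noble_at_most(9876543))  # 9867312
```

A plain descending scan from the ten-digit bound mentioned in the excerpt would be far slower in R, which is why the sketch leans on the seven-digit bound.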

## Seeking postdoc (or contractor) for next generation Stan language research and development

The Stan group at Columbia is looking to hire a postdoc* to work on the next generation compiler for the Stan open-source probabilistic programming language. Ideally, a candidate will bring language development experience and also have research interests in a related field such as programming languages, applied statistics, numerical analysis, or statistical computation. The language […]

## Why R?

I was working with our copy editor on Appendix A of Practical Data Science with R, 2nd Edition; Zumel, Mount; Manning 2019, and ran into this little point (unfortunately) buried in the back of the book. In our opinion the R ecosystem is the fastest path to substantial data science, statistical, and machine learning accomplishment. …

## It is Time for CRAN to Ban Package Ads

NPM (a popular JavaScript package repository) just banned package advertisements. I feel the CRAN repository should do the same. Not all R-users are fully aware of package advertisements. But they clutter up work, interfere with reproducibility, and frankly are just wrong. For example, here is the advertisement code from ggplot2: .onAttach <- function(…) { withr::with_preserve_seed({ …
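As background on the mechanism, messages shown when a package is attached are normally emitted with `packageStartupMessage()`, and a user can silence them with `suppressPackageStartupMessages()`. The sketch below illustrates this in base R; `demo_on_attach` and its message text are hypothetical stand-ins, not the ggplot2 code quoted in the post.

```r
# Hypothetical stand-in for a package's .onAttach hook.
demo_on_attach <- function(...) {
  packageStartupMessage("Hypothetical notice shown on attach.")
}

# Called directly, the message is printed to the console (stderr).
demo_on_attach()

# Wrapped, the message is muffled and nothing is printed.
suppressPackageStartupMessages(demo_on_attach())
```

In real use the wrapper goes around the `library()` call itself.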

## Introducing data_algebra

This article introduces the data_algebra project: a data processing tool family available in R and Python. These tools are designed to transform data either in-memory or on remote databases. In particular we will discuss the Python implementation (also called data_algebra) and its relation to the mature R implementations (rquery and rqdatatable). Introduction Parts of the …

## What is vtreat?

vtreat is a DataFrame processor/conditioner that prepares real-world data for supervised machine learning or predictive modeling in a statistically sound manner. vtreat takes an input DataFrame that has a specified column called “the outcome variable” (or “y”) that is the quantity to be predicted (and must not have missing values). Other input columns are possible …

## Speaking at BARUG

We will be speaking at the Tuesday, September 3, 2019 BARUG. If you are in the Bay Area, please come see us.

Nina Zumel & John Mount: Practical Data Science with R

Practical Data Science with R (Zumel and Mount) was one of the first, and most widely-read books on the practice of doing Data …

## vtreat up on PyPi

I am excited to announce vtreat is now available for Python on PyPi, in addition to R on CRAN. vtreat is: A data.frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. vtreat prepares variables so that data has fewer exceptional cases, making it easier to safely use models in production. …

## Returning to Tides

Fred Viole shared a great “data only” R solution to the forecasting tides problem. The methodology comes from a finance perspective, and has some great associated notes and articles. This gives me a chance to comment on the odd relation between prediction and profit in finance. If there really was a trade-able item with low …

## Lord Kelvin, Data Scientist

In 1876 A. Légé & Co., 20 Cross Street, Hatton Gardens, London completed the first “tide calculating machine” for William Thomson (later Lord Kelvin) (ref).

[Figure: Thomson’s (Lord Kelvin) First Tide Predicting Machine, 1876]

The results were plotted on the paper cylinders, and one literally “turned the crank” to perform the calculations. The tide calculating machine …

## R wins COPSS Award!

Hadley Wickham from RStudio has won the 2019 COPSS Award, which marks a rather radical switch from the traditional recipients of this award, in that it recognises his many contributions to the R language and in particular to RStudio. The full quote for the nomination is his “influential work in statistical computing, visualisation, graphics, and […]

## Some Notes on GNU Licenses in R Packages

I was recently asked if Win-Vector LLC would move the R wrapr package from a GPL-3 license to an LGPL license. In the end I decided to move wrapr distribution to a “GPL-2 | GPL-3” license. This means the package is now available under both GPL-2 and GPL-3 licensing, allowing the user to pick which …

## A Comment on Data Science Integrated Development Environments

A point that differs from our experience struck us in the recent note: “A development environment specifically tailored to the data science sector on the level of RStudio, for example, does not (yet) exist.” (“What’s the Best Statistical Software? A Comparison of R, Python, SAS, SPSS and STATA”, Amit Ghosh) Actually, Python has a large …