We have just released two new free video lectures on vectors from a programmer’s point of view. I am experimenting with which ideas about vectors programmers find interesting, which concepts they consider safe starting points, and how to condense and present the material. Please check the lectures out. Vectors for Programmers and Data … Continue reading Free Video Lecture: Vectors for Programmers and Data Scientists

# Tag: R

## biased sample!

A chance occurrence led me to this thread on R-devel about R’s sample function generating a bias by taking the integer part of the continuous uniform generator… And then to the note by Kellie Ottoboni and Philip Stark analysing the reason, namely the fact that R’s uniform [0,1) pseudo-random generator is not perfectly continuously uniform […]
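As a rough illustration of how taking the integer part of a uniform draw can bias a sample, here is a toy model. It is an assumption made for illustration, not R's actual generator: the generator emits only the 2^w equally likely values k/2^w, and each is mapped to the bucket floor(m * u). The helper name `bucket_counts` is hypothetical.

```r
# Toy model (assumption): a w-bit generator emits u = k / 2^w for
# k = 0, ..., 2^w - 1, each equally likely.
# Count how many of these values land in each bucket floor(m * u).
bucket_counts <- function(m, w) {
  k <- 0:(2^w - 1)
  bucket <- floor(m * k / 2^w)
  tabulate(bucket + 1, nbins = m)
}

counts_uneven <- bucket_counts(m = 3, w = 4)  # 16 values into 3 buckets
counts_even   <- bucket_counts(m = 4, w = 4)  # 16 values into 4 buckets
print(counts_uneven)
print(counts_even)
```

In this toy model the buckets can only be equally likely when m divides 2^w; otherwise some outcomes receive one more generator value than others.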

## Timing Working With a Row or a Column from a data.frame

In this note we share a quick study timing how long it takes to perform some simple data manipulation tasks with R data.frames. We are interested in the time needed to select a column, alter a column, or select a row. Knowing what is fast and what is slow is critical in planning code, so … Continue reading Timing Working With a Row or a Column from a data.frame
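A minimal sketch of this kind of timing, using only base R's `system.time`. The example data, the repetition count, and the helper `time_it` are assumptions made for illustration and are not taken from the original study.

```r
# Hypothetical example data; sizes chosen only so the sketch runs quickly.
set.seed(2019)
n <- 10000
d <- data.frame(x = runif(n), y = runif(n))

# Time `reps` repetitions of a zero-argument function; returns elapsed seconds.
time_it <- function(f, reps = 1000) {
  unname(system.time(for (i in seq_len(reps)) f())["elapsed"])
}

timings <- c(
  select_column = time_it(function() d[["x"]]),
  alter_column  = time_it(function() { d2 <- d; d2$x <- d2$x + 1; d2 }),
  select_row    = time_it(function() d[5, , drop = FALSE])
)
print(timings)
```

The absolute numbers depend on the machine and R version, so only the relative ordering of the three tasks is worth reading off.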

## an attempt at code golf

Having discovered codegolf on Stack Exchange a few weeks ago, I spotted a few interesting puzzles since then but only got the opportunity to try one over a quiet and rainy weekend (and Robin being on vacation)! The challenge was to write R code for deciding whether or not a given integer n is […]

## chance call for book reviewers

Since I have been unable to find local reviewers for my CHANCE review column of the above recent CRC Press books, namely: Bayesian Cost-Effectiveness Analysis of Medical Treatments, Moreno et al.; Combinatorial Inference in Geometric Data Analysis, Le Roux et al.; Time Series Clustering and Classification, Maharaj et al.; Textual Data Science with R, Bécue-Bertaut […]

## What is “Tidy Data”

I would like to write a bit on the meaning and history of the phrase “tidy data.” Hadley Wickham has been promoting the term “tidy data.” For example, in an eponymous paper, he wrote:

In tidy data:

- Each variable forms a column.
- Each observation forms a row.
- Each type of observational unit forms a table.

… Continue reading What is “Tidy Data”

## easy Riddler

The riddle of the week is rather standard probability calculus: If N points are generated at random places on the perimeter of a circle, what is the probability that you can pick a diameter such that all of those points are on only one side of the newly halved circle? Since it is equivalent to […]
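A quick simulation sketch of the riddle, assuming the points are independent and uniform on the circle. The points all lie on one side of some diameter exactly when the largest empty arc between neighbouring points covers at least half the circle. The function name `all_in_half_circle` is hypothetical, and the comparison value N/2^(N-1) is the standard closed form for this problem rather than something quoted from the post.

```r
# TRUE when n uniform points on a circle all fall within some half circle.
all_in_half_circle <- function(n) {
  theta <- sort(runif(n, 0, 2 * pi))
  gaps <- diff(c(theta, theta[1] + 2 * pi))  # n gaps, including the wrap-around
  max(gaps) >= pi                            # an empty arc of at least half the circle
}

set.seed(1)
n_points <- 4
estimate <- mean(replicate(20000, all_in_half_circle(n_points)))
closed_form <- n_points / 2^(n_points - 1)
print(c(estimate = estimate, closed_form = closed_form))
```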

## a perfectly normally distributed sample

When I saw this title on R-bloggers, I was wondering how “more perfect” a Normal sample could be when compared with the outcome of rnorm(n). Hence went checking the original blog on bayestestR in search of more information. Which was stating nothing more than how to generate a sample that is perfectly normal by using the […]

## Could not Resist

Also, Practical Data Science with R, 2nd Edition; Zumel, Mount; Manning 2019 is now content complete! It is deep into editing and soon into production!

## visualising bias and unbiasedness

A question on X validated led me to wonder at the point made by Christopher Bishop in his Pattern Recognition and Machine Learning book about the MLE of the Normal variance being biased. As it is illustrated by the above graph that opposes the true and green distribution of the data (made of two points) […]
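The bias of the variance MLE can be checked with a small simulation sketch. The sample size of two mirrors the two-point data mentioned above; the standard Normal model, the seed, and the number of replications are assumptions made for illustration.

```r
# Simulate many samples of size n from N(0, sigma2) and compare the average
# of the MLE of the variance (divide by n) with the usual estimate (n - 1).
set.seed(42)
n <- 2
sigma2 <- 1
reps <- 1e5
x <- matrix(rnorm(n * reps, sd = sqrt(sigma2)), nrow = reps)  # one sample per row
centered <- x - rowMeans(x)
mle <- rowSums(centered^2) / n
unbiased <- rowSums(centered^2) / (n - 1)
print(c(mean_mle = mean(mle),
        mean_unbiased = mean(unbiased),
        theory_mle = sigma2 * (n - 1) / n))
```

With n = 2 the MLE underestimates the variance by a factor of (n - 1)/n = 1/2 on average, which is about as visible as the bias gets.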

## Le Monde puzzle [#1099]

A simple 2×2 Le Monde mathematical puzzle: Arielle and Brandwein play a game out of two distinct even integers between 1500 and 2500, x and y. Providing one another with either the pair (x/2,y+x/2) or the pair (x+y/2,y/2) until they run out of even possibilities or exceed 6 rounds. When x=2304, what is the value of […]

## Data Layout Exercises

John Mount, Nina Zumel; Win-Vector LLC, 2019-04-27. In this note we will use five real life examples to demonstrate data layout transforms using the cdata R package. The examples for this note are all demo-examples from tidyr/demo/, and are mostly based on questions posted to StackOverflow. They represent a good cross-section of data layout problems, … Continue reading Data Layout Exercises

## Comparing Point-and-Click Front Ends for R

Now that I’ve completed seven detailed reviews of Graphical User Interfaces (GUIs) for R, let’s try to compare them. It’s easy enough to count their features and plot them, so let’s start there.

## Le Monde puzzle [#1094]

A rather blah number Le Monde mathematical puzzle: Find all integer multiples of 11111 with exactly one occurrence of each decimal digit. Which I solved by brute force, by looking at the possible range of multiples (and borrowing stringr::str_count from Robin!)

    > combien=0
    > for (i in 90001:900008){
    +   j=i*11111
    +   combien=combien+(min(stringr::str_count(j,paste(0:9)))==1)}
    > combien
    [1] 3456

And […]
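For readers without stringr, the same brute-force count can be sketched in base R. This is an alternative formulation rather than the author's code: because every multiple in the range has exactly ten digits, containing each of the digits 0 to 9 is the same as containing each exactly once.

```r
# Count 10-digit multiples of 11111 in which each decimal digit occurs
# exactly once, using base R only.
multiples <- (90001:900008) * 11111   # same range of multipliers as the loop above
s <- sprintf("%.0f", multiples)       # exact: values are far below 2^53
stopifnot(all(nchar(s) == 10))        # every multiple has ten digits

has_all_digits <- rep(TRUE, length(s))
for (d in 0:9) {
  has_all_digits <- has_all_digits & grepl(as.character(d), s, fixed = TRUE)
}
combien <- sum(has_all_digits)
print(combien)
```

The vectorised `grepl` calls avoid the per-multiple loop, so the whole range is scanned in ten passes.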

## Practical Data Science with R Book Update (April 2019)

I thought I would give a personal update on our book: Practical Data Science with R 2nd edition; Zumel, Mount; Manning 2019. The second edition should be fully available this fall! Nina and I have finished up through chapter 10 (of 12), and Manning has released previews of up through chapter 7 (with more to … Continue reading Practical Data Science with R Book Update (April 2019)

## survivalists [a Riddler’s riddle]

A neat question from The Riddler on a multi-probability survival rate: Nine processes are running in a loop with fixed survival rates .99,….,.91. What is the probability that the first process is the last one to die? Same question with probabilities .91,…,.99 and the probability that the last process is the last one to die. […]

## A Comparative Review of the JASP Statistical Software

JASP is a free and open source statistics package that targets beginners looking to point-and-click their way through analyses. This article is one of a series of reviews which aim to help non-programmers choose the Graphical User Interface (GUI) for R that best meets their needs.

## Le Monde puzzle [#1092]

A Latin square Le Monde mathematical puzzle that I found rather dreary: A hidden 3×3 board contains all numbers from 1 to 9. Anselm wants to guess the board and makes two proposals. Berenicke tells him how many entries are in the right rows and columns for each proposal, along with the information that no […]

## Controlling Data Layout With cdata

Here is an example of how easy it is to use cdata to re-layout your data. Tim Morris recently tweeted the following problem (corrected). Please will you take pity on me #rstats folks? I only want to reshape two variables x & y from wide to long! Starting with: d xa xb ya yb 1 1 … Continue reading Controlling Data Layout With cdata