Posts Tagged ‘Tutorials’

Odds and Probability: Commonly Misused Terms in Statistics – An Illustrative Example in Baseball

Yesterday, all 15 home teams in Major League Baseball won on the same day – the first such occurrence in history.  CTV News published an article written by Mike Fitzpatrick from The Associated Press that reported on this event.  The article states, “Viewing every game as a 50-50 proposition independent of all others, STATS figured the […]
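Taking the quoted assumption at face value (each of the 15 games an independent 50-50 proposition), here is a worked example of how the probability and the odds of such a day differ. It illustrates the arithmetic only and is not a reconstruction of the STATS figure.

```latex
\begin{aligned}
p &= P(\text{all 15 home teams win}) = \left(\tfrac{1}{2}\right)^{15} = \frac{1}{32\,768} \\
\text{odds in favor} &= \frac{p}{1-p} = \frac{1/32\,768}{32\,767/32\,768} = \frac{1}{32\,767} \quad (\text{1 to } 32\,767) \\
\text{odds against} &= \frac{1-p}{p} = 32\,767 \text{ to } 1
\end{aligned}
```

A probability of 1 in 32,768 and odds of 32,767 to 1 against describe the same event; quoting one as if it were the other is the sort of confusion between the two terms that the title points to.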

Compiling RMarkdown from a Helper R Script

August 6, 2015

The problem: I was looking for a way to compile an RMarkdown document and have the filename of the resulting PDF or HTML document contain the name of the input data that it processed. That is, if I compiled the analysis.Rmd file, where in that file it di...
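A minimal sketch of one way to do this from a helper script, assuming the .Rmd reads its input from a variable named `data_file` (a hypothetical convention, not necessarily the post's). The helper names `output_name_for` and `render_for_data` are also hypothetical; `rmarkdown::render()` with its `output_file` and `envir` arguments is the real rendering call.

```r
# Hypothetical helper: build an output name such as "analysis_sample1.html"
# from the .Rmd file name and the data file it processes.
output_name_for <- function(rmd_file, data_file, ext = "html") {
  rmd_stem  <- tools::file_path_sans_ext(basename(rmd_file))
  data_stem <- tools::file_path_sans_ext(basename(data_file))
  paste0(rmd_stem, "_", data_stem, ".", ext)
}

# Render the report for one data file. Assumes (hypothetically) that the .Rmd
# refers to a variable called `data_file`; we place it in a fresh environment
# that render() evaluates the document's code chunks in.
render_for_data <- function(rmd_file, data_file, ext = "html") {
  env <- new.env()
  env$data_file <- data_file
  rmarkdown::render(
    input       = rmd_file,
    output_file = output_name_for(rmd_file, data_file, ext),
    envir       = env
  )
}
```

For example, `render_for_data("analysis.Rmd", "data/sample1.csv")` would write `analysis_sample1.html`. The extension only names the file; the output format itself still comes from the document's YAML header unless you pass `output_format`.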

Using and Producing a Control Chart in R for Statistical Process Control – An Application in Analytical Chemistry

Update on Wednesday, August 5, 2015: I am considering retracting this blog post based on my disagreement with Daniel Harris’ approach to building the warning and action lines in a control chart.  Please see my replies to Jake Yeager and Lee Kennedy in the comments section for my thoughts.  Your patience is appreciated; please stay […]
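For orientation only, here is a sketch of the common Shewhart-style convention: warning lines at the mean ± 2 standard deviations and action lines at ± 3 standard deviations, estimated from an in-control baseline. The update above questions exactly how these lines should be built, so treat this as the textbook convention, not as the method the post discusses. The function names are hypothetical.

```r
# Conventional Shewhart-style limits (an assumption, not the method the post
# discusses): warning lines at mean +/- 2 sd, action lines at mean +/- 3 sd,
# estimated from an in-control baseline series.
control_limits <- function(baseline) {
  centre <- mean(baseline)
  s <- sd(baseline)
  c(lower_action  = centre - 3 * s,
    lower_warning = centre - 2 * s,
    centre        = centre,
    upper_warning = centre + 2 * s,
    upper_action  = centre + 3 * s)
}

# TRUE for each measurement that falls outside the action lines.
out_of_control <- function(x, limits) {
  x < limits[["lower_action"]] | x > limits[["upper_action"]]
}

# Draw the chart: centre line solid, warning lines dashed, action lines dotted;
# points beyond the action lines are highlighted in red.
plot_control_chart <- function(x, limits) {
  plot(x, type = "b", pch = 19, xlab = "Run", ylab = "Measurement",
       ylim = range(x, limits))
  abline(h = limits, lty = c(3, 2, 1, 2, 3))
  flagged <- out_of_control(x, limits)
  points(which(flagged), x[flagged], col = "red", pch = 19)
  invisible(limits)
}
```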

I loved this %>% crosstable

July 28, 2015

This is a public thank you for @heatherturner's contribution. Now SciencesPo's crosstable can work in a chained (%>%) fashion, which is useful alongside other packages that have integrated the magrittr operator: > candidatos %>% + filter(desc_cargo == 'DEPUTADO ESTADUAL'| desc_cargo =='DEPUTADO DISTRITAL' | desc_cargo =='DEPUTADO FEDERAL' | desc_cargo =='VEREADOR' | desc_cargo =='SENADOR') %>% […]
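What makes a function work in such a chain is that it takes the data as its first argument. A minimal base-R sketch of that idea, using the native pipe `|>` (R 4.1 or later; magrittr's `%>%` behaves the same here). The data, the `crosstab` helper and the `sexo` column are hypothetical stand-ins, not SciencesPo's crosstable or the real `candidatos` data. It also shows that `%in%` can replace the chain of `==` comparisons joined by `|`.

```r
# Hypothetical stand-in for the `candidatos` data.
candidatos <- data.frame(
  desc_cargo = c("VEREADOR", "SENADOR", "PREFEITO", "DEPUTADO FEDERAL", "VEREADOR"),
  sexo       = c("F", "M", "M", "F", "M")
)

# Hypothetical data-first helper: because `data` comes first, the result of
# the previous step in the pipe is passed straight in.
crosstab <- function(data, row, col) {
  table(data[[row]], data[[col]], dnn = c(row, col))
}

# `%in%` expresses "any of these offices" in one comparison.
legislativos <- c("DEPUTADO ESTADUAL", "DEPUTADO DISTRITAL",
                  "DEPUTADO FEDERAL", "VEREADOR", "SENADOR")

tab <- candidatos |>
  subset(desc_cargo %in% legislativos) |>
  crosstab("desc_cargo", "sexo")
```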

Efficient accumulation in R

July 27, 2015

R has a number of very good packages for manipulating and aggregating data (plyr, sqldf, ScaleR, data.table, and more), but when it comes to accumulating results the beginning R user is often at sea. The R execution model is a bit exotic, so many R users are very uncertain about which methods of accumulating results are …
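A small sketch of three common accumulation patterns, for readers who want something concrete to time themselves. It is not necessarily the comparison or the recommendation the post makes, and the function names are hypothetical.

```r
# Pattern 1: grow a data frame one row at a time. Each rbind() builds a new,
# larger data frame, so the total work grows roughly quadratically with n.
accumulate_by_rbind <- function(n) {
  res <- data.frame(i = integer(0), sq = numeric(0))
  for (i in seq_len(n)) {
    res <- rbind(res, data.frame(i = i, sq = i^2))
  }
  res
}

# Pattern 2: collect pieces in a pre-sized list and bind once at the end.
accumulate_by_list <- function(n) {
  rows <- vector("list", n)
  for (i in seq_len(n)) {
    rows[[i]] <- data.frame(i = i, sq = i^2)
  }
  do.call(rbind, rows)
}

# Pattern 3: when the shape is known in advance, preallocate the columns
# and fill them in place.
accumulate_preallocated <- function(n) {
  i <- seq_len(n)
  sq <- numeric(n)
  for (k in i) sq[k] <- k^2
  data.frame(i = i, sq = sq)
}
```

All three return the same columns; wrapping each in `system.time()` with a few thousand rows is an easy way to see how differently they scale on your own machine.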

Working with Sessionized Data 1: Evaluating Hazard Models

July 8, 2015

When we teach data science, we emphasize the data scientist’s responsibility to transform available data from multiple systems of record into a wide or denormalized form. In such a “ready to analyze” form, each individual example gets a row of data and every fact about the example is a column. Usually transforming data into this …
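A minimal base-R sketch of the "one row per example, one column per fact" reshaping described above, using `reshape()` on hypothetical long-format data. The column names and values are invented for illustration and are not the post's data.

```r
# Hypothetical long ("one row per fact") data gathered from several systems.
facts_long <- data.frame(
  user_id = c(1, 1, 1, 2, 2, 2),
  fact    = c("sessions", "days_active", "churned",
              "sessions", "days_active", "churned"),
  value   = c(14, 9, 0, 3, 2, 1)
)

# Pivot to wide form: one row per user, one column per fact.
facts_wide <- reshape(
  facts_long,
  idvar     = "user_id",
  timevar   = "fact",
  v.names   = "value",
  direction = "wide"
)

# reshape() names the new columns "value.sessions" etc.; drop the prefix.
names(facts_wide) <- sub("^value\\.", "", names(facts_wide))
```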

Why does designing a simple A/B test seem so complicated?

June 22, 2015

Why does planning something as simple as an A/B test always end up feeling so complicated? An A/B test is a very simple controlled experiment where one group is subjected to a new treatment (often group “B”) and the other group (often group “A”) serves as the control group. The classic example is attempting to …
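One part of the planning is deciding how many subjects each group needs before the test starts. Base R's `power.prop.test()` gives the per-group sample size for comparing two proportions; the conversion rates, significance level and power below are illustrative assumptions, and this is not necessarily the approach the post takes.

```r
# Illustrative assumptions: control (A) converts at 10%, we hope B reaches 12%;
# two-sided test at the 5% level with 80% power.
plan <- power.prop.test(p1 = 0.10, p2 = 0.12, sig.level = 0.05, power = 0.80)

# Per-group sample size, rounded up (you cannot enrol a fraction of a user).
n_per_group <- ceiling(plan$n)

# Halving the hoped-for lift (12% -> 11%) roughly quadruples the sample needed.
plan_small <- power.prop.test(p1 = 0.10, p2 = 0.11, sig.level = 0.05, power = 0.80)
lift_ratio <- ceiling(plan_small$n) / n_per_group
```

The sensitivity to the assumed effect size is one reason the design step feels harder than the test itself.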

Wanted: A Perfect Scatterplot (with Marginals)

June 12, 2015

We saw this scatterplot with marginal densities the other day, in a blog post by Thomas Wiecki. The graph was produced in Python, using the seaborn package. Seaborn calls it a “jointplot”; it’s called a “scatterhist” in Ma...
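For comparison, a bare-bones base-R sketch that uses `layout()` to place marginal density curves above and beside a scatterplot. It is a rough approximation of the idea, not seaborn's jointplot or MATLAB's scatterhist, and the function name is hypothetical.

```r
# Hypothetical helper: scatterplot with marginal density curves.
scatter_with_marginals <- function(x, y, xlab = "x", ylab = "y") {
  dx <- density(x)
  dy <- density(y)
  xlim <- range(x)
  ylim <- range(y)

  # Restore the caller's graphics settings when we are done.
  old_par <- par(no.readonly = TRUE)
  on.exit(par(old_par))

  # Region 1: scatter (bottom-left); 2: x marginal (top); 3: y marginal (right).
  layout(matrix(c(2, 0,
                  1, 3), nrow = 2, byrow = TRUE),
         widths = c(3, 1), heights = c(1, 3))

  par(mar = c(4, 4, 0.5, 0.5))
  plot(x, y, xlim = xlim, ylim = ylim, xlab = xlab, ylab = ylab, pch = 19,
       col = adjustcolor("steelblue", alpha.f = 0.5))

  # Top marginal shares the scatter's x range and left/right margins.
  par(mar = c(0, 4, 0.5, 0.5))
  plot(dx$x, dx$y, type = "l", xlim = xlim, axes = FALSE, xlab = "", ylab = "")

  # Right marginal is drawn sideways: density on x, data values on y.
  par(mar = c(4, 0, 0.5, 0.5))
  plot(dy$y, dy$x, type = "l", ylim = ylim, axes = FALSE, xlab = "", ylab = "")

  invisible(list(x_density = dx, y_density = dy))
}
```

Matching the margins between panels is what keeps the marginal curves aligned with the scatterplot's axes.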

My favorite R bug

May 23, 2015

In this note I am going to recount “my favorite R bug.” It isn’t a bug in R. It is a bug in some code I wrote in R. I call it my favorite bug, as it is easy to commit and (thanks to R’s overly helpful nature) takes longer than it should to find. ...

R: single plot with two different y-axes

April 21, 2015

I forgot where I originally found the code to do this, but I recently had to dig it out again to remind myself how to draw two different y-axes on the same plot to show the values of two different features of the data. This is somewhat distinct from th...
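A common base-graphics approach (not necessarily the code the post recovers) is to draw the first series normally, overlay a second plot with `par(new = TRUE)`, and add its axis on the right with `axis(side = 4)`. The function name is hypothetical.

```r
# Hypothetical helper: two series sharing an x-axis but with separate y-axes.
two_axis_plot <- function(x, y1, y2, ylab1 = "y1", ylab2 = "y2", xlab = "x") {
  old_par <- par(no.readonly = TRUE)
  on.exit(par(old_par))

  # Widen the right margin to make room for the second axis label.
  par(mar = c(5, 4, 4, 5) + 0.1)

  # First series, with the usual left-hand y-axis.
  plot(x, y1, type = "l", col = "blue", xlab = xlab, ylab = ylab1)

  # Overlay a second plot in the same region without redrawing its axes.
  par(new = TRUE)
  plot(x, y2, type = "l", col = "red", axes = FALSE, xlab = "", ylab = "")

  # Second y-axis on the right (side = 4) with its own label.
  axis(side = 4)
  mtext(ylab2, side = 4, line = 3)

  legend("topleft", legend = c(ylab1, ylab2), col = c("blue", "red"), lty = 1)

  # Return the second series' user coordinates (x1, x2, y1, y2).
  invisible(par("usr"))
}
```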
