Tutorial: Using seplyr to Program Over dplyr

July 22, 2017

seplyr is an R package that makes it easy to program over dplyr 0.7.*. To illustrate this we will work an example. Suppose you had worked out a dplyr pipeline that performed an analysis you were interested in. For an example we could take something similar to one of the examples from the dplyr 0.7.0 …

A stunned Dyson

July 22, 2017

Terry Martin writes: I ran into this quote and thought you might enjoy it. It’s from p. 273 of Segre’s new biography of Fermi, The Pope of Physics: When Dyson met with him in 1953, Fermi welcomed him politely, but he quickly put aside the graphs he was being shown indicating agreement between theory and […]

Stan Weekly Roundup, 21 July 2017

July 21, 2017

It was another productive week in Stan land. The big news is that Jonathan Auerbach, Tim Jones, Susanna Makela, Swupnil Sahai, and Robin Winstanley won first place in a New York City competition for predicting elementary school enrollment. Jonathan told me, “I heard 192 entered, and there were 5 finalists….Of course, we used Stan (RStan […]

“Bayes factor”: where the term came from, and some references to why I generally hate it

July 21, 2017

Someone asked: Do you know when this term was coined or by whom? Kass and Raftery’s use of the term as the title of their 1995 paper suggests that it was still novel then, but I have not noticed in the paper any information about where it started. I replied: According to Etz and Wagenmakers […]

Surprising result when exploring Rcpp gallery

July 21, 2017

I’m starting to incorporate more Rcpp in my R work, and so decided to spend some time exploring the Rcpp Gallery. One example by John Merrill caught my eye. He provides a C++ solution to transforming a list of lists into a data frame, and shows impressive speed savings compared to as.data.frame. This got me thinking about how […]

Quirks about running Rcpp on Windows through RStudio

July 20, 2017

This is a quick note about some tribulations I had running Rcpp (v. 0.12.12) code through RStudio (v. 1.0.143) on a Windows 7 box running R (v. 3.3.2). I also have RTools v. 3.4 installed. I fully admit that this may very well be specific to my […]

How does a Nobel-prize-winning economist become a victim of bog-standard selection bias?

July 20, 2017

Someone who wishes to remain anonymous writes in with a story: Linking to a new paper by Jorge Luis García, James J. Heckman, and Anna L. Ziff, economist Sue Dynarski makes this “joke” on facebook—or maybe it’s not a joke: How does one adjust standard errors to account for the fact that N of […]

Dragon Trainer rich mathematical task

July 20, 2017

I love rich mathematical tasks. Here is one for all levels of schooling. What do you think? Background to rich tasks A rich task is an open-ended task that students can engage with at multiple levels. I use the following …

Make Your Plans for Stans (-s + Con)

July 19, 2017

This post is by Mike. A friendly reminder that registration is open for StanCon 2018, which will take place over three days, from Wednesday January 10, 2018 to Friday January 12, 2018, at the beautiful Asilomar Conference Grounds in Pacific Grove, California. Detailed information about registration and accommodation at Asilomar, including fees and instructions, can be found on […]

Short course on Bayesian data analysis and Stan 23-25 Aug in NYC!

July 19, 2017

Jonah “ShinyStan” Gabry, Mike “Riemannian NUTS” Betancourt, and I will be giving a three-day short course next month in New York, following the model of our successful courses in 2015 and 2016. Before class everyone should install R, RStudio and RStan on their computers. (If you already have these, please update to the latest version […]

His concern is that the authors don’t control for the position of games within a season.

July 19, 2017

Chris Glynn wrote last year: I read your blog post about middle brow literature and PPNAS the other day. Today, a friend forwarded me this article in The Atlantic that (in my opinion) is another example of what you’ve recently been talking about. The research in question is focused on Major League Baseball and the […]

The imprecision of data, subway edition

July 19, 2017

Kaiser Fung, founder of Principal Analytics Prep, draws practical lessons from a CUNY advertisement in the subway train

A quantile definition for skewness

July 19, 2017

Skewness is a measure of the asymmetry of a univariate distribution. I have previously shown how to compute the skewness for data distributions in SAS. The previous article computes Pearson's definition of skewness, which is based on the standardized third central moment of the data. Moment-based statistics are sensitive to [...]

seplyr update

July 19, 2017

The development version of my new R package seplyr is performing in practical applications with dplyr 0.7.* much better than even I (the seplyr package author) expected. I think I have hit a very good set of trade-offs, and I have now spent significant time creating documentation and examples. I wish there had been such …

My unfunded HHMI teaching professors proposal

July 19, 2017

A little over a year ago I saw a request from the Howard Hughes Medical Institute for proposals focused on undergraduate teaching. I decided to apply for this grant since it combines all of the things I’m interested in: teaching, education resear...

Animating a spinner using ggplot2 and ImageMagick

July 18, 2017

It’s Sunday, and I [Bob] am just sitting on the couch peacefully ggplotting to illustrate basic sample spaces using spinners (a trick I’m borrowing from Jim Albert’s book Curve Ball). There’s an underlying continuous outcome (i.e., where the spinner lands) and a quantization into a number of regions to produce a discrete outcome (e.g., “success” […]

“The ‘Will & Grace’ Conjecture That Won’t Die” and other stories from the blogroll

July 18, 2017

From sociologist Jay Livingston: The “Will & Grace” Conjecture That Won’t Die. From sociologist David Weakliem: Why does Trump try to implement the unpopular ideas he’s proposed, and not the popular ideas? History professor who wrote award-winning book about 1970-era crime, is misinformed about the history of 1970s-era crime. “West Virginia, which was a lock […]

This one takes time to make, takes even more time to read

July 18, 2017

Kaiser Fung, creator of Junk Charts and Principal Analytics Prep, explains why this Wired chart about Netflix viewing behavior is so hard to read, and offers an alternative focusing on a particular insight about the data

How to run a course (if you’re me)

July 17, 2017

Last summer, I and my trusty henchpeople from the Department of Politics ran an intensive six-week summer course for incoming freshmen on data science (‘POL245’, for locals). This post sketches out how I think course infrastructure should work, and provides some practical details of how we arranged things. Most of our structures worked pretty …

How to design future studies of systemic exercise intolerance disease (chronic fatigue syndrome)?

July 17, 2017

Someone named Ramsey writes on behalf of a self-managed support community of 100+ systemic exercise intolerance disease (SEID) patients. He read my recent article on the topic and had a question regarding the following excerpt: For conditions like S.E.I.D., then, the better approach may be to gather data from people suffering “in the wild,” combining […]

Should we continue not to trust the Turk? Another reminder of the importance of measurement

July 17, 2017

From 2013: Don’t trust the Turk. From 2017 (link from Kevin Lewis), from Jesse Chandler and Gabriele Paolacci: The Internet has enabled recruitment of large samples with specific characteristics. However, when researchers rely on participant self-report to determine eligibility, data quality depends on participant honesty. Across four studies on Amazon Mechanical Turk, we show that […]

3 ways to visualize prediction regions for classification problems

July 17, 2017

An important problem in machine learning is the "classification problem." In this supervised learning problem, you build a statistical model that predicts a set of categorical outcomes (responses) based on a set of input features (explanatory variables). You do this by training the model on data for which the outcomes [...]

They want help designing a crowdsourcing data analysis project

July 16, 2017

Michael Feldman writes: My collaborators and myself are doing research where we try to understand the reasons for the variability in data analysis (“the garden of forking paths”). Our goal is to understand the reasons why scientists make different decisions regarding their analyses and in doing so reach different results. In a project called “Crowdsourcing […]

