A Thanksgiving dplyr Rubik’s cube puzzle for you

November 25, 2015
By

Nick Carchedi is back visiting from DataCamp and for fun we came up with a dplyr Rubik's cube puzzle. Here is how it works. To solve the puzzle you have to make a 4 x 3 data frame that spells Thanksgiving like this: View the code on Gist. To solve the puzzle you need to pipe this

Read more »

Gary McClelland agrees with me that dichotomizing continuous variables is a bad idea. He also thinks my suggestion of dividing a variable into 3 parts is also a mistake.

November 25, 2015
By

In response to some of the discussion that inspired yesterday’s post, Gary McClelland writes: I remain convinced that discretizing a continuous variable, especially for multiple regression, is the road to perdition. Here I explain my concerns. First, I don’t buy the motivation that discretized analyses are easier to explain to lay citizens and the press. […] The post Gary McClelland agrees with me that dichotomizing continuous variables is a bad…

Read more »

3 YEARS AGO (NOVEMBER 2012): MEMORY LANE

November 25, 2015
By

MONTHLY MEMORY LANE: 3 years ago: November 2012. I mark in red three posts that seem most apt for general background on key issues in this blog.[1] Please check out others that didn’t make the “bright red cut”. If you’re interested in the Likelihood Principle, check “Blogging Birnbaum” and “Likelihood Links”. If you think P-values are hard to explain, see how […]

Read more »

Even the tiniest error messages can indicate an invalid statistical analysis

November 25, 2015
By

The other day, I was reading in a data set in R, and the function indicated that there was a warning about a parsing error on one line. I went ahead with the analysis anyway, but that small parsing error kept bothering me. I thought it was just one lin...

Read more »

Extracting elements from a matrix: rows, columns, submatrices, and indices

November 25, 2015
By

A matrix is a convenient way to store an array of numbers. However, often you need to extract certain elements from a matrix. The SAS/IML language supports two ways to extract elements: by using subscripts or by using indices. Use subscripts when you are extracting a rectangular portion of a […] The post Extracting elements from a matrix: rows, columns, submatrices, and indices appeared first on The DO Loop.

Read more »

a programming bug with weird consequences

November 24, 2015
By

One student of mine coded by mistake an independent Metropolis-Hastings algorithm with too small a variance in the proposal when compared with the target variance. Here is the R code of this implementation: It produces outputs of the following shape which is quite amazing because of the small variance. The reason for the lengthy freezes […]

Read more »
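The excerpt above does not include the student's R code, so the following is only a minimal Python sketch of the general phenomenon, not a reproduction of that implementation. It assumes a standard normal target and a zero-mean normal proposal with standard deviation `sigma`; the function names and parameter values are hypothetical. When `sigma` is well below the target's standard deviation, the importance weight target/proposal grows in the tails, so a chain that lands on a rare, far-out proposal tends to reject everything that follows for a long stretch.

```python
import math
import random


def log_weight(x, sigma):
    """log(target(x) / proposal(x)) up to an additive constant.

    Assumed setup: target is N(0, 1), proposal is N(0, sigma^2).
    """
    return -0.5 * x * x + 0.5 * (x / sigma) ** 2


def independent_mh(n_iter, sigma, seed=0):
    """Independent Metropolis-Hastings: proposals do not depend on the current state."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, sigma)
    chain = []
    accepted = 0
    for _ in range(n_iter):
        y = rng.gauss(0.0, sigma)
        # Acceptance ratio for an independent sampler is w(y) / w(x).
        log_ratio = log_weight(y, sigma) - log_weight(x, sigma)
        # 1 - random() lies in (0, 1], which keeps log() well defined.
        if math.log(1.0 - rng.random()) < log_ratio:
            x = y
            accepted += 1
        chain.append(x)
    return chain, accepted / n_iter


def longest_freeze(chain):
    """Length of the longest run of identical consecutive states."""
    best = run = 1
    for prev, cur in zip(chain, chain[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


if __name__ == "__main__":
    for sigma in (1.0, 0.3):
        chain, rate = independent_mh(5000, sigma)
        print(f"sigma={sigma}: acceptance={rate:.3f}, longest freeze={longest_freeze(chain)}")
```

With `sigma = 1.0` the proposal equals the assumed target and essentially every proposal is accepted; shrinking `sigma` lowers the acceptance rate and produces the long flat stretches.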

Internet use and religion, part four

November 24, 2015
By

[If you are jumping into the middle of this series, you might want to start with this article, which explains the methodological approach I am taking.] In the previous article, I presented preliminary results from a study of relationships between I...

Read more »

Statistical Models That Support Design Thinking: Driver Analysis vs. Partial Correlation Networks

November 24, 2015
By

We have been talking about design thinking in marketing since Tim Brown's Harvard Business Review article in 2008. It might be easy for the data scientist to dismiss the approach as merely a type of brainstorming for new products or services. Yet, desi...

Read more »

Fitting linear mixed models for QTL mapping

November 24, 2015
By

Linear mixed models (LMMs) have become widely used for dealing with population structure in human GWAS, and they’re becoming increasingly important for QTL mapping in model organisms, particularly for the analysis of advanced intercross lines (AIL), which often exhibit variation in the relationships among individuals. In my efforts on R/qtl2, a reimplementation of R/qtl to better […]

Read more »

20 years of Data Science: from Music to Genomics

November 24, 2015
By

I finally got around to reading David Donoho's 50 Years of Data Science paper.  I highly recommend it. The following quote seems to summarize the sentiment that motivated the paper, as well as why it has resonated among academic statisticians: The statistics profession is caught at a confusing moment: the activities which preoccupied it over centuries are now

Read more »

Beyond the median split: Splitting a predictor into 3 parts

November 24, 2015
By

Carol Nickerson pointed me to a series of papers in the journal Consumer Psychology, first one by Dawn Iacobucci et al. arguing in favor of the “median split” (replacing a continuous variable by a 0/1 variable split at the median) “to facilitate analytic ease and communication clarity,” then a response by Gary McClelland et al. […] The post Beyond the median split: Splitting a predictor into 3 parts appeared first…

Read more »

Estimating the exponent of discrete power law data

November 24, 2015
By

Suppose you have data from a discrete power law with exponent α. That is, the probability of an outcome n is proportional to n^(-α). How can you recover α? A naive approach would be to gloss over the fact that you have discrete data and use the MLE (maximum likelihood estimator) for continuous data. That […]

Read more »
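The excerpt is truncated before it reaches a method, so the following is only a hedged Python sketch of one standard way to compare the two estimators, not necessarily the approach taken in the post. It assumes the support starts at n = 1, so that P(n) = n^(-α) / ζ(α), and maximizes the discrete likelihood numerically. The function names, the search interval, and the truncation cap used for sampling are all illustrative assumptions.

```python
import math
import random


def zeta(alpha, terms=10_000):
    """Approximate the Riemann zeta function for alpha > 1.

    Sums k = 1 .. terms-1 directly and adds an Euler-Maclaurin
    estimate (integral plus half the first omitted term) for the tail.
    """
    head = sum(k ** -alpha for k in range(1, terms))
    tail = terms ** (1 - alpha) / (alpha - 1) + 0.5 * terms ** -alpha
    return head + tail


def continuous_mle(data):
    """Naive estimator: the continuous power-law MLE with x_min = 1.

    Undefined (division by zero) if every observation equals 1.
    """
    return 1 + len(data) / sum(math.log(x) for x in data)


def discrete_mle(data, lo=1.01, hi=6.0, tol=1e-6):
    """Minimize the per-observation negative log-likelihood
    log(zeta(a)) + a * mean(log x) by golden-section search on [lo, hi]."""
    mean_log = sum(math.log(x) for x in data) / len(data)

    def nll(a):
        return math.log(zeta(a)) + a * mean_log

    phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    fc, fd = nll(c), nll(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - phi * (b - a)
            fc = nll(c)
        else:
            a, c, fc = c, d, fd
            d = a + phi * (b - a)
            fd = nll(d)
    return (a + b) / 2


def sample_zeta(alpha, n, cap=100_000, seed=0):
    """Draw n values with P(k) proportional to k^(-alpha), truncated at `cap`."""
    rng = random.Random(seed)
    support = range(1, cap + 1)
    weights = [k ** -alpha for k in support]
    return rng.choices(support, weights=weights, k=n)


if __name__ == "__main__":
    data = sample_zeta(2.5, 20_000)
    print("continuous (naive) estimate:", continuous_mle(data))
    print("discrete MLE:", discrete_mle(data))
```

Golden-section search is used here only because the negative log-likelihood is convex in α and the search needs nothing beyond the standard library; any one-dimensional optimizer would do.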

Statbusters: please back up an extreme claim with numbers

November 23, 2015
By

In this week's Statbusters, my column with Andrew Gelman in the Daily Beast, we take note of Slate's recent rant about "wasteful" anti-smoking advertising, and demonstrate how to think about cost-benefit analysis. The key point is: if you are going to make an extreme claim, you better have some numbers to back it up. These numbers can be approximate, and based on (potentially dubious) Googled data. Not every analysis needs…

Read more »

I already know who will be president in 2016 but I’m not telling

November 23, 2015
By

Nadia Hassan writes: One debate in political science right now concerns how the economy influences voters. Larry Bartels argues that Q14 and Q15 impact election outcomes the most. Doug Hibbs argues that all 4 years matter, with later growth being more important. Chris Wlezien claims that the first two years don’t influence elections but the […] The post I already know who will be president in 2016 but I’m not…

Read more »

Efficiency in space usage leads to efficiency in comprehension

November 23, 2015
By

Consider the following two charts that illustrate the same data. (I deliberately took out the header text to make a point. The original chart came from the Wall Street Journal.) To me, the line chart gets to the point more...

Read more »

On Bayesian DSGE Modeling with Hard and Soft Restrictions

November 23, 2015
By

A theory is essentially a restriction on a reduced form. It can be imposed directly (hard restrictions) or used as a prior mean in a more flexible Bayesian analysis (soft restrictions). The soft restriction approach -- "theory as a shrinkage directi...

Read more »

Determine whether a SAS product is licensed

November 23, 2015
By

Sometimes you are writing a program that needs to find out whether a particular SAS product (like SAS/ETS, SAS/QC, or SAS/OR) is licensed. I was reminded of this fact when I wrote last week's blog post about how to create a map with PROC SGPLOT. Although the SGPLOT procedure is […] The post Determine whether a SAS product is licensed appeared first on The DO Loop.

Read more »

Paper: The Connected Scatterplot for Presenting Paired Time Series

November 23, 2015
By

I’m very happy to finally be able to announce our paper on the connected scatterplot technique. It describes the technique, provides some historical perspective, and most of all looks into how easy to understand and engaging the technique actually is. The connected scatterplot isn’t really known in visualization, but has gotten some interest in journalism. … Continue reading Paper: The Connected Scatterplot for Presenting Paired Time Series

Read more »

Top 9 questions to ask a statistician

November 23, 2015
By

Someone writes in: I am a student at . . . We have been given an assignment that requires us to interview a professional in the criminal justice field who performs, or has performed, statistical analyses on social science related data. . . . We are supposed to collect information pertaining to job description, job […] The post Top 9 questions to ask a statistician appeared first on Statistical Modeling,…

Read more »

If a study is worth a mention, it’s worth a link

November 22, 2015
By

Gur Huberman points to this op-ed entitled “Are Good Doctors Bad for Your Health?” and writes: Can’t the NYT provide a link or an explicit reference to the JAMA Internal Medicine article underlying this OpEd? A reader could then access the original piece and judge its credibility for himself. I replied: Yes, very tacky of […] The post If a study is worth a mention, it’s worth a link appeared…

Read more »

Flatten your abs with this new statistical approach to quadrature

November 22, 2015
By

Philipp Hennig, Michael Osborne, and Mark Girolami write: We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations. . . . We describe how several seminal classic numerical methods can be interpreted naturally as probabilistic inference. […] The post Flatten your abs with this new statistical approach to quadrature appeared first…

Read more »

Sunday morning puzzle

November 21, 2015
By

A question from X validated that took me quite a while to fathom and then the solution suddenly became quite obvious: If a sample taken from an arbitrary distribution on {0,1}⁶ is censored from its (0,0,0,0,0,0) elements, and if the marginal probabilities are known for all six components of the random vector, what is an […]

Read more »

Free gradient boosting lecture

November 21, 2015
By

We have always regretted that we didn’t get to cover gradient boosting in Practical Data Science with R (Manning 2014). To try to make up for that we are sharing (for free) our GBM lecture from our (paid) video course Introduction to Data Science. (link, all support material here). Please help us get the word out … Continue reading Free gradient boosting lecture

Read more »

