Via Twitter, Bart S (@BartSchuijt) sent me to this TechCrunch article, which contains several uninspiring charts. The most disturbing one is this: There is a classic Tufte case here: only five numbers, and yet the chart is so confusing. And...

Glenn Chisholm writes: As a frequent visitor of your blog (a bit of a long-time-listener, first-time-caller comment, I know), I saw this particular controversy: Summary: https://drive.google.com/file/d/0B6mLpCEIGEYGYl9RZWFRcmpsZk0/view?pref=2&pli=1 Very superficial analysis: https://docs.google.com/document/d/1SdmBLFW9gISaqOyyz_fATgaFupI2-n6vWx80XRGUVBo/edit?pref=2&pli=1 and was interested if I could get you to blog on its actual statistical foundations, this particular paper has at least […] The post You’ll never guess what I’ll say about this paper claiming election fraud!…

A woman who’s arguably the top person ever in a male-dominated field. Steve Sailer introduced the category and entered Pauline Kael (top film critic) as its inaugural member. I followed up with Alice Waters (top chef/restaurateur), Mata Hari (top spy), Agatha Christie (top mystery writer), and Helen Keller (top person who overcame a disability; sorry, […] The post Objects of the class “Pauline Kael” appeared first on Statistical Modeling, Causal…

I have previously shown how to overlay basic plots on box plots when all plots share a common discrete X axis. It is interesting to note that box plots can also be overlaid on a continuous (interval) axis. You often need to bin the data before you create the plot. […] The post Overlay plots on a box plot in SAS: Continuous X axis appeared first on The DO Loop.
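The binning step is independent of SAS. Below is a minimal Python sketch, not the SAS code from the post. It assumes equal-width bins, with each bin labelled by its midpoint, and the function names are hypothetical. Each x value is mapped to the midpoint of its bin, and the y values are collected per bin. One box can then be drawn per bin at that position on the continuous axis.

```python
import math
from collections import defaultdict

def bin_center(x, x_min, width):
    """Midpoint of the equal-width bin [x_min + k*width, x_min + (k+1)*width) containing x."""
    k = math.floor((x - x_min) / width)
    return x_min + (k + 0.5) * width

def group_by_bin(xs, ys, x_min, width):
    """Collect the y-values that fall in each x bin, keyed by bin midpoint (sorted by x)."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[bin_center(x, x_min, width)].append(y)
    return dict(sorted(groups.items()))

# Hypothetical data: y measured at irregular x positions.
xs = [0.5, 1.5, 9.9, 10.0, 19.0]
ys = [1, 2, 3, 4, 5]
for center, values in group_by_bin(xs, ys, x_min=0.0, width=10.0).items():
    print(center, values)   # prints "5.0 [1, 2, 3]" then "15.0 [4, 5]"
```

The bin width is the main design choice. Wide bins give each box enough points to be stable, and narrow bins track changes along x more closely.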

In an earlier post, "Fixed Effects Without Panel Data", I argued that you could allow for (and indeed estimate) fixed effects in pure cross sections (i.e., no need for panel data) by using regularization estimators like LASSO. The idea is to fit ...

After noticing this from a recent Pew Research report: Ben Hanowell wrote: This made me [Hanowell] think of your critique of Case and Deaton’s finding about non-Hispanic mortality. I wonder how much these results are driven by the fact that the population of adults aged 65 and older has gotten older with increasing lifespans, etc […] The post “Smaller Share of Women Ages 65 and Older Are Living Alone,” before…
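Hanowell's worry describes a composition effect. If the 65+ population shifts toward older ages, the overall share living alone can move even when the age-specific shares barely change. A standard check is direct age standardization, which weights each year's age-specific rates by one fixed standard population. Below is a minimal Python sketch with made-up numbers, not Pew's data. The age bands, the choice of standard population, and the function name are all assumptions.

```python
def weighted_rate(rates, pop):
    """Average of age-specific rates, weighted by the count in each age group."""
    total = sum(pop.values())
    return sum(rates[age] * n for age, n in pop.items()) / total

# Hypothetical numbers: share living alone by age group, and age mix, in two years.
rates_then = {"65-74": 0.20, "75-84": 0.30, "85+": 0.40}
rates_now = {"65-74": 0.19, "75-84": 0.29, "85+": 0.39}   # lower in every group
pop_then = {"65-74": 60, "75-84": 30, "85+": 10}
pop_now = {"65-74": 50, "75-84": 30, "85+": 20}           # older age mix

# Crude rates: each year weighted by its own age mix.
crude = (weighted_rate(rates_then, pop_then), weighted_rate(rates_now, pop_now))
# Direct standardization: both years weighted by one fixed standard population.
standard = pop_then
adjusted = (weighted_rate(rates_then, standard), weighted_rate(rates_now, standard))
print(crude, adjusted)   # crude rises (about 0.25 -> 0.26), adjusted falls (about 0.25 -> 0.24)
```

In this toy example the crude and age-adjusted comparisons point in opposite directions. That is the kind of discrepancy an age adjustment is meant to expose.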

A game-related Le Monde mathematical puzzle: Starting with a pile of 10⁴ tokens, Bob plays the following game: at each round, he picks one of the existing piles with at least 3 tokens, takes away one of the tokens in this pile, and separates the remaining ones into two non-empty piles of arbitrary size. Bob […]
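The excerpt cuts off before Bob's actual question, so the sketch below only simulates the stated rules. It is a hypothetical Python simulation with uniformly random choices, which are an assumption. It checks one invariant that follows directly from the rules. Each round removes one token and adds one pile, so tokens + piles stays at 10⁴ + 1 = 10001.

```python
import random

def play(total_tokens=10_000, seed=0):
    """Simulate Bob's game with random choices; return (rounds, final pile sizes).

    Piles with >= 3 tokens are kept in `big` (the only legal choices);
    piles with 1 or 2 tokens can never be picked again and go to `small`.
    """
    rng = random.Random(seed)
    big = [total_tokens] if total_tokens >= 3 else []
    small = [] if total_tokens >= 3 else [total_tokens]
    rounds = 0
    while big:
        # Pick a legal pile uniformly at random and remove it in O(1).
        i = rng.randrange(len(big))
        big[i], big[-1] = big[-1], big[i]
        pile = big.pop()
        remaining = pile - 1                     # take away one token
        left = rng.randint(1, remaining - 1)     # split the rest into two non-empty piles
        for part in (left, remaining - left):
            (big if part >= 3 else small).append(part)
        rounds += 1
    return rounds, small

rounds, final = play()
# Each round removes one token and adds one pile, so tokens + piles stays 10_001.
print(rounds, sum(final) + len(final))
```

The game ends when every pile holds 1 or 2 tokens. At that point the invariant constrains the final number of piles p: p ≤ 10001 − p ≤ 2p. Since rounds = p − 1, every run takes between 3333 and 4999 rounds.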

Garnett McMillan writes: You have argued about the pervasive role of the Garden of Forking Paths in published research. Given this influence, do you think that it is sensible to use published research to inform priors in new studies? My reply: Yes, I think you can use published research but in doing so you should […] The post The answer is the Edlin factor appeared first on Statistical Modeling, Causal…

Mike Spagat, famous for blowing the whistle on that Iraq survey (the so-called Lancet study) ten years ago, writes: I’ve just put up the story about how a survey research company threatened to sue me to keep me quiet. I’ve also put up a lot of data that readers can analyse if they want to […] The post They threatened to sue Mike Spagat but that’s not shutting him up…

Mon: They threatened to sue Mike Spagat but that’s not shutting him up
Tues: “Smaller Share of Women Ages 65 and Older Are Living Alone,” before and after age adjustment
Wed: Objects of the class “Pauline Kael”
Thurs: research-lies-allegations-windpipe-surgery
Fri: Hey—here’s a tip from the biology literature: If your correlation is .02, try binning your […] The post On deck this week appeared first on Statistical Modeling, Causal Inference, and…

Box plots summarize the distribution of a continuous variable. You can display multiple box plots in a single graph by specifying a categorical variable. The resulting graph shows the distribution of subpopulations, such as different experimental groups. In the SGPLOT procedure, you can use the CATEGORY= option on the VBOX […] The post Overlay plots on a box plot in SAS: Discrete X axis appeared first on The DO Loop.
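Each box in such a graph is built from summary statistics of one subpopulation. Below is a minimal Python sketch, not SGPLOT itself, that groups (category, value) pairs and computes the minimum, quartiles, and maximum for each category. The function names and data are hypothetical. Quartiles come from `statistics.quantiles` with the "inclusive" method, and SAS's default percentile definition may give slightly different values. A drawn box plot also usually ends its whiskers at the most extreme points within 1.5 IQR of the box rather than at the minimum and maximum.

```python
from collections import defaultdict
from statistics import quantiles  # Python 3.8+

def box_stats(values):
    """(min, Q1, median, Q3, max) for one group, using 'inclusive' quartiles."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

def box_stats_by_category(records):
    """Group (category, value) pairs and summarize each group: one 'box' per category."""
    groups = defaultdict(list)
    for category, value in records:
        groups[category].append(value)
    return {cat: box_stats(vals) for cat, vals in sorted(groups.items())}

# Hypothetical data: two experimental groups.
records = [("A", 1), ("A", 2), ("A", 3), ("A", 4), ("A", 5), ("B", 10), ("B", 20)]
print(box_stats_by_category(records))
# {'A': (1, 2.0, 3.0, 4.0, 5), 'B': (10, 12.5, 15.0, 17.5, 20)}
```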

The MultiThreaded blog over at Stitch Fix (hat tip to Hilary Parker) has posted a really nice list of data science books (disclosure: one of my books is on the list). We’ve queried our data science team for some of their favorite data science boo...

Barry Quinn writes: I would like some quick advice on survey design literature, specifically any good references you would have when designing a good online survey to allow for some decent hierarchical modeling? My quick response is that during the opening you should already be thinking about the endgame. In this case, the endgame is […] The post How to design a survey so that Mister P will work well?…

This is a public service announcement in the interest of more robust numerical calculations. Like matrix inverse, exponentiation is bad news. It’s prone to overflow or underflow. Just try this in R:
> exp(-800)
> exp(800)
That’s not rounding error you see. The first one evaluates to zero (underflows) and the second to infinity (overflows). […] The post Log Sum of Exponentials for Robust Sums on the Log Scale appeared…
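The usual remedy is to factor out the largest term before exponentiating, using the identity log Σ exp(xᵢ) = m + log Σ exp(xᵢ − m) with m = max xᵢ. Below is a minimal Python sketch of that trick. The function name is my own, and the post's own code is not shown in the excerpt.

```python
import math

def log_sum_exp(xs):
    """Return log(sum(exp(x) for x in xs)) computed stably.

    Uses log(sum exp(x_i)) = m + log(sum exp(x_i - m)) with m = max(xs):
    every shifted term is <= 0, so exp() cannot overflow, and the largest
    term is exp(0) = 1, so the sum cannot underflow to zero.
    """
    m = max(xs)
    if math.isinf(m):
        # All terms are -inf (sum is 0, log is -inf), or one term is +inf.
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))

print(log_sum_exp([800.0, 800.0]))    # about 800.693 (800 + log 2)
print(log_sum_exp([-800.0, -800.0]))  # about -799.307 (-800 + log 2)
```

In Python the naive version fails loudly instead of returning infinity. `math.log(math.exp(800) + math.exp(800))` raises `OverflowError`, and `math.log(math.exp(-800) + math.exp(-800))` raises `ValueError` because the sum underflows to 0.0. SciPy also provides `scipy.special.logsumexp`.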

Alex Gamma sends along a recently published article by Carola Salvi, Irene Cristofori, Jordan Grafman, and Mark Beeman, along with the note: This might be of interest to you, since it’s political science and smells bad. From The Quarterly Journal of Experimental Psychology: Two groups of 22 college students each identified as conservatives or liberals […] The post No, I’m not convinced by this one either. appeared first on Statistical…

Leonardo Egidi writes: Inspired by your world cup model I fitted in Stan a model for the Euro Cup which starts today, with two Poisson distributions for the goals scored at every match by the two teams (perfect prediction for the first match!). Data and code are here. Here’s the model, and here are the […] The post Stan makes Euro predictions! (now with data and code so you can…
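The excerpt describes the goals scored by each team as Poisson distributed. The sketch below shows how a pair of scoring rates turns into match-outcome probabilities. It is a minimal Python sketch and not Egidi's Stan model, whose parameterization the excerpt does not show. The two goal counts are assumed independent, the scores are truncated at `max_goals`, and the rates in the example are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def outcome_probs(lam_a, lam_b, max_goals=15):
    """(P(A wins), P(draw), P(B wins)) assuming independent Poisson goal counts,
    summing over all scores up to max_goals for each team."""
    win = draw = loss = 0.0
    for a in range(max_goals + 1):
        pa = poisson_pmf(a, lam_a)
        for b in range(max_goals + 1):
            p = pa * poisson_pmf(b, lam_b)
            if a > b:
                win += p
            elif a == b:
                draw += p
            else:
                loss += p
    return win, draw, loss

# Hypothetical expected goals: team A 1.5, team B 1.0.
print(outcome_probs(1.5, 1.0))
```

In a fitted model the two rates would come from the posterior, for example from attack and defence parameters, rather than being fixed numbers.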

Even better than binging on Netflix, catch up on Michael Betancourt’s updated video lectures, just days after their live theatrical debut in Tokyo. Scalable Bayesian Inference with Hamiltonian Monte Carlo (YouTube, 1 hour) Some Bayesian Modeling Techniques in Stan (YouTube, 1 hour 40 minutes) His previous videos have received very good reviews and they’re only […] The post Betancourt Binge (Video Lectures on HMC and Stan) appeared first on Statistical…

The other day I posted on a controversy in sociology where Aliya Saperstein and Andrew Penner analyzed data from the National Longitudinal Survey of Youth, coming to the conclusion “that race is not a fixed characteristic of individuals but is flexible and continually negotiated in everyday interactions,” but then Lance Hannon and Robert DeFina […] The post Racial classification sociology controversy update appeared first on Statistical Modeling, Causal Inference,…

After Tuesday and Wednesday, EuroVis continued for the rest of the week. There were papers about visualization, interaction, networks, and other stuff, a dinner in a former church, and finally the capstone. First a little update: you can now watch Anders Ynnerman’s epic keynote.

Coordinated Views and Interaction Design

Who says coordinated views have to be next … Continue reading EuroVis 2016, Thursday and Friday