Here's some suggested reading for the coming month: Backhouse, R. and B. Cherrier, 2016. 'It's computerization, stupid!' The spread of computers and the changing roles of theoretical and applied economics. Castle, J. L., M. P. Clements, and D. F. H...

Attention conservation notice: I have no taste. Amitav Ghosh, Sea of Poppies, River of Smoke and Flood of Fire: Collectively, "the Ibis trilogy", three historical novels centered around the First Opium War. They're beautifully written and the viewpo...

I went over to the Freakonomics website and found this story about Leicester City’s unexpected championship. Here’s Stephen Dubner: At the start of this season, British betting houses put Leicester’s chances of winning the league at 5,000-to-1, which seemed, if anything, perhaps too generous. My [Dubner’s] son Solomon again: SOLOMON DUBNER: What would you say […] The post Freak Punts on Leicester Bet appeared first on Statistical Modeling, Causal Inference,…

What is a spaghetti plot? Spaghetti plots are line plots that involve many overlapping lines. Like spaghetti on your plate, they can be hard to unravel, yet for many analysts they are a delicious staple of data visualization. This article presents the good, the bad, and the messy about spaghetti […] The post Create spaghetti plots in SAS appeared first on The DO Loop.

This article is a demonstration of the use of the R vtreat variable preparation package followed by caret controlled training. In previous writings we have gone to great lengths to document, explain and motivate vtreat. That necessarily gets long and unnecessarily feels complicated. In this example we are going to show what building a predictive model … Continue reading A demonstration of vtreat data preparation

That’s the title of a new paper by Paul Smaldino and Richard McElreath which presents a sort of agent-based model that reproduces the growth in the publication of junk science that we’ve seen in recent decades. Even before looking at this paper I was positively disposed toward it for two reasons. First because I do […] The post “The Natural Selection of Bad Science” appeared first on Statistical Modeling, Causal…

A not so enticing Le Monde mathematical puzzle: Find the minimal value of a five-digit number divided by the sum of its digits. This can be formalised as finding the minimum of N/(a+b+c+d+e) when N writes abcde. And solved by brute force. Using a rough approach to finding the digits of a five-digit number, the […]
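The brute-force search described in the puzzle above can be sketched in a few lines. This is a minimal Python illustration, not the code from the original post; the function names `digit_sum` and `min_ratio_five_digit` are hypothetical.

```python
def digit_sum(n: int) -> int:
    """Return the sum of the decimal digits of a non-negative integer."""
    return sum(int(ch) for ch in str(n))


def min_ratio_five_digit() -> tuple[int, float]:
    """Scan every five-digit number and return the one minimising N / digit_sum(N).

    Returns the minimising number together with the ratio it achieves.
    """
    best_n = min(range(10000, 100000), key=lambda n: n / digit_sum(n))
    return best_n, best_n / digit_sum(best_n)


if __name__ == "__main__":
    n, ratio = min_ratio_five_digit()
    print(f"N = {n}, N / digit sum = {ratio:.4f}")
```

The search finds N = 10999, whose digits sum to 28. Raising any trailing digit adds far less to N than it removes from the ratio, while the leading digits are kept as small as possible.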

For people not attending EuroVis: I will be tweeting from there next week and writing postings here, as I have in previous years. For people who will be attending: let’s meet up and run! EuroVis Coverage: I will be tweeting during the sessions – assuming there’s Wi-Fi – Tuesday through Friday (June 7–10). EuroVis being in the … Continue reading EuroVis Coverage and Running

This is a fun story. Jeff pointed me to a post on the sister blog by Christopher Hare and Robert Lupton, entitled “No, Sanders voters aren’t more conservative than Clinton voters. Here’s the data.” My reaction: “Who would ever think that Sanders supporters are more conservative than Clinton supporters? That’s counterintuitivism gone amok.” It turned […] The post The way we social science now appeared first on Statistical Modeling, Causal…

In the business analytics universe, the discipline of "business intelligence" is often frowned upon. Business intelligence is primarily generating reports on business metrics, tracking them over time, and producing ad-hoc analyses explaining these trends. People often complain that such work is not challenging and not sexy. There is a stigma that BI work is data dumping. In reality, good BI work is rare and extremely valuable. Horrible BI work is…

So says James Coyne, going full Meehl. I agree. Replication is great, but if you replicate noise studies, you’ll just get noise, hence the beneficial effects on science are (a) to reduce confidence in silly studies that we mostly shouldn’t have taken seriously in the first place, and (b) to provide a disincentive for future […] The post “Replication initiatives will not salvage the trustworthiness of psychology” appeared first on…

Elizabeth Heyman points us to this display by Adam Pearce and Dorothy Gambrell who write, “We scanned data from the U.S. Census Bureau’s 2014 American Community Survey—which covers 3.5 million households—to find out how people are pairing up.” They continue: For any selected occupation, the chart highlights the five most common occupation/relationship matchups. (For example, […] The post Who marries whom? appeared first on Statistical Modeling, Causal Inference, and Social…

A grid is a set of evenly spaced points. You can use SAS to create a grid of points on an interval, in a rectangular region in the plane, or even in higher-dimensional regions like the parallelepiped shown at the left, which is generated by three vectors. You can use […] The post Grids and linear subspaces appeared first on The DO Loop.
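As a rough illustration of the idea (in Python rather than SAS, and not taken from the original post), a grid on the parallelepiped generated by a set of vectors can be built by taking evenly spaced coefficients in [0, 1] for each vector and forming every linear combination. The helper names and the three example vectors below are assumptions chosen for the sketch.

```python
from itertools import product


def linspace(a: float, b: float, n: int) -> list[float]:
    """Return n evenly spaced values from a to b inclusive."""
    if n == 1:
        return [a]
    step = (b - a) / (n - 1)
    return [a + i * step for i in range(n)]


def parallelepiped_grid(vectors, n):
    """Grid of points c1*v1 + c2*v2 + ... with each coefficient on an n-point grid in [0, 1]."""
    coeffs = linspace(0.0, 1.0, n)
    dim = len(vectors[0])
    points = []
    for combo in product(coeffs, repeat=len(vectors)):
        # Each point is a linear combination of the generating vectors.
        point = tuple(
            sum(c * v[k] for c, v in zip(combo, vectors)) for k in range(dim)
        )
        points.append(point)
    return points


if __name__ == "__main__":
    # Hypothetical generating vectors, not those used in the original article.
    vecs = [(1, 0, 0), (1, 1, 0), (0, 1, 1)]
    grid = parallelepiped_grid(vecs, 3)
    print(len(grid), grid[0], grid[-1])
```

With three vectors and three coefficient values per vector, the grid has 3³ = 27 points, running from the origin to the sum of the three vectors.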

There’s definitely a need to innovate and develop new treatments in the area of asthma, but it’s easy to underestimate the barriers to just doing what we already know, such as making sure that people are following existing, well-established guideli...

A new movie on Ramanujan is coming out; mathematician Peter Woit gives it a very positive review, while film critic Anthony Lane is not so impressed. Both these reactions make sense, I guess (or so I say without having actually seen the movie myself). I’ll take this as an occasion to plug my article on […] The post Ramanujan notes appeared first on Statistical Modeling, Causal Inference, and Social Science.

In our previous note we demonstrated Y-Aware PCA and other y-aware approaches to dimensionality reduction in a predictive modeling context, specifically Principal Components Regression (PCR). For our examples, we selected the appropriate number of principal components by eye. In this note, we will look at ways to select the appropriate number of principal components in … Continue reading Principal Components Regression, Pt. 3: Picking the Number of Components

It is often said that “R is its packages.” One package of interest is ranger, a fast parallel C++ implementation of random forest machine learning. Ranger is a great package and at first glance appears to remove the “only 63 levels allowed for string/categorical variables” limit found in the Fortran randomForest package. Actually this appearance is … Continue reading On ranger respect.unordered.factors

How do we read pie charts? Do they differ from the even more reviled donut charts? What about common pie chart designs like exploded pies? In two papers to be presented at EuroVis next week, Drew Skau and I show that the common wisdom about how we read these charts (by angle) is almost certainly wrong, and that … Continue reading A Pair of Pie Chart Papers

Kaiser writes: More on that work on age adjustment. I keep asking myself where in the Stats curriculum do we teach students this stuff? A class session focused on that analysis teaches students so much more about statistical thinking than anything we have in the textbooks. I’m not sure. This sort of analysis […] The post All that really important statistics stuff that isn’t in the statistics textbooks…

Mon: All that really important statistics stuff that isn’t in the statistics textbooks Tues: Who marries whom? Wed: Gray graphs look pretty Thurs: Freak Punts on Leicester Bet Fri: Who falls for the education reform hype? Sat: Taking responsibility for your statistical conclusions: You must decide what variation to compare to. Sun: Researchers demonstrate new […] The post On deck this week appeared first on Statistical Modeling, Causal Inference, and…