"Unifying the Counterfactual and Graphical Approaches to Causality" (Tomorrow at the Statistics Seminar)

March 5, 2014

Attention conservation notice: Late notice of an academic talk in Pittsburgh. Only of interest if you care about the places where the kind of statistical theory that leans on concepts like "the graphical Markov property" merges with the kind of analy...

Read more »

Applied Statistics Lesson of the Day – The Full Factorial Design

An experimenter may seek to determine the causal relationships between several factors and the response. On first instinct, you may be tempted to conduct separate experiments, each using the completely randomized design with 1 factor. Often, however, it is possible to conduct 1 experiment with all of the factors at the same time. This is better than […]
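
As a rough illustration of the idea in this teaser, a full factorial design runs every combination of factor levels in a single experiment. The sketch below only enumerates those combinations; the factor names and levels are hypothetical and do not come from the original post.

```python
from itertools import product

# Hypothetical factors and levels, chosen only for illustration.
factors = {
    "temperature": ["low", "high"],
    "catalyst": ["A", "B"],
    "stir_rate": ["slow", "fast"],
}

def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

runs = full_factorial(factors)
print(len(runs))  # 2 * 2 * 2 = 8 treatment combinations
```

With three two-level factors this gives eight treatment combinations, which would then be assigned to experimental units at random.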

Read more »

Power, power everywhere–(it) may not be what you think! [illustration]

March 5, 2014

Statistical power is one of the neatest [i], yet most misunderstood statistical notions [ii]. So here’s a visual illustration (written initially for our 6334 seminar), but worth a look by anyone who wants an easy way to attain the will to understand power. (Please see notes below slides.) [i] I was tempted to say power is one of […]

Read more »

Remembering Seymour Geisser

March 5, 2014

This is the text, minus the nice formatting, of an email from Dennis Cook (my thesis advisor and current director of the U of MN School of Statistics) and Wes Johnson (a U of MN alum, a good friend, a great colleague and a student of Seymour Geisser's)...

Read more »

Forecasting weekly data

March 4, 2014

This is another situation where Fourier terms are useful for handling the seasonality. Not only is the seasonal period rather long, it is non-integer (averaging 365.25/7 = 52.18). So ARIMA and ETS models do not tend to give good results, even with a period of 52 as an approximation.

Regression with ARIMA errors

The simplest approach is a regression with ARIMA errors. Here is an example using weekly data on…
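
The Fourier terms mentioned in this teaser can be sketched as follows. This is a minimal illustration and not the original post's code: the function name, the number of harmonics, and the series length are all assumptions. It builds sine and cosine regressors for the non-integer period 365.25/7, which could then serve as regressors in a model with ARIMA errors.

```python
import math

# Average number of weeks per year, as stated in the excerpt.
PERIOD = 365.25 / 7

def fourier_terms(n_obs, n_harmonics, period=PERIOD):
    """Return one row per week, holding a (sin, cos) pair for each harmonic."""
    rows = []
    for t in range(1, n_obs + 1):
        row = []
        for k in range(1, n_harmonics + 1):
            angle = 2.0 * math.pi * k * t / period
            row.extend([math.sin(angle), math.cos(angle)])
        rows.append(row)
    return rows

# Two years of weekly observations with three harmonics (both hypothetical).
terms = fourier_terms(n_obs=104, n_harmonics=3)
print(len(terms), len(terms[0]))
```

Because the period enters only through the angle, a non-integer value such as 52.18 needs no special handling, which is the advantage the excerpt points to.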

Read more »

Some statistics about the book

March 4, 2014

The release date for Zumel and Mount's “Practical Data Science with R” is getting close. I thought I would share a few statistics about what goes into this kind of book. “Practical Data Science with R” started formal work in October of 2012. We had always felt the Win-Vector blog represented practice and research for such […]

Read more »

The Essential Identity between Classical Statistics and Statistical Mechanics

March 4, 2014

Many doubt that Statistical Mechanics and Classical Statistics have anything to do with each other. So I’ll lay this out step by step so you can see just how identical they really are. Step 1: The State Space Classical Statistics: If we roll a dice n...

Read more »

Nomenclatural abomination

March 4, 2014

David Hogg calls conventional statistical notation a “nomenclatural abomination”: The terminology used throughout this document enormously overloads the symbol p(). That is, we are using, in each line of this discussion, the function p() to mean something different; its meaning is set by the letters used in its arguments. That is a nomenclatural abomination. I […]

Read more »

Literal vs. rhetorical

March 4, 2014

Thomas Basbøll pointed me to a discussion on the orgtheory blog in which Jerry Davis, the editor of a journal of business management, argued that it is difficult for academic researchers to communicate with the public because “the public prefers Cheetos to a healthy salad” and when serious papers are discussed on the internet, “everyone […] The post Literal vs. rhetorical appeared first on Statistical Modeling, Causal Inference, and Social Science.

Read more »

Nothing to see here

March 4, 2014

Some graphics are made to inform, some to amuse, some to delight. But the following scatter plot makes one wonder why why why... What does the designer want to say? I saw this chart inside an infographic titled "Where...

Read more »

The Star Puzzle

March 4, 2014

The Star Puzzle is a puzzle presented on The Math Forum. I became aware of this problem by noticing the article and solution posted in the Quantitative Decisions article section. It asks the question, "How many triangles, quadrilaterals, and irregula...

Read more »

Advances in scalable Bayesian computation [day #1]

March 4, 2014

This was the first day of our workshop Advances in Scalable Bayesian Computation and it sounded like the “main” theme was probabilistic programming, in tune with my book review posted this morning. Indeed, both Vikash Mansinghka and Frank Wood gave talks about this concept, Vikash detailing the specifics of a new programming language called Venture […]

Read more »

Review: Kölner R Meeting 26 February 2014

March 4, 2014

Last week's Cologne R user group meeting was all about R and databases. We had three talks, from a generic overview on how to connect R to databases, to a specific example with kdb+ and perhaps the future with ArangoDB, a NoSQL database. Connecting R wit...

Read more »

Fitting models to short time series

March 4, 2014

Following my post on fitting models to long time series, I thought I’d tackle the opposite problem, which is more common in business environments. I often get asked how few data points can be used to fit a time series model. As with almost all sample size questions, there is no easy answer. It depends on the number of model parameters to be estimated and the amount of randomness in the data.…

Read more »

Issue with thinning in R2OpenBUGS vs R2jags

March 3, 2014

While preparing the practicals for our course at the University of Alberta, I've discovered something kind of interesting. I'm sure this is nothing new and actually people who normally use both OpenBUGS and JAGS have already figured this out...

Read more »

capitalizing on chance (ii)

March 3, 2014

I may have been exaggerating one year ago when I started this post with “Hardly a day goes by”, but now it is literally the case*. (This also pertains to reading for Phil6334 for Thurs. March 6.) Hardly a day goes by where I do not come across an article on the problems for statistical […]

Read more »

Google Maps Gallery Highlights Specialized Maps based on Public Data

March 3, 2014

Google recently launched a dedicated Maps Gallery [google.com] to showcase a collection of hand-picked maps from several preferred organizations, such as the National Geographic, the U.S. Geological Survey or the City of Edmonton. It is the goal that ...

Read more »

On The Role Data Visualization Plays in the Scientific Process

March 3, 2014

In a new exhibition titled Beautiful Science: Picturing Data, Inspiring Insight [bl.uk], the British Library pays homage to the important role data visualization plays in the scientific process. The exhibition can be visited from 20 February until 2...

Read more »

Running into a Stan Reference by Accident

March 3, 2014

We were talking about parallelizing MCMC and I came up with what I thought was a neat idea for doing so (sample with fractional prior, average samples on a per-draw basis). But then I realized this approach could get the right posterior mean or right posterior variance, but not both, depending on how the prior […] The post Running into a Stan Reference by Accident appeared first on Statistical Modeling, Causal…

Read more »

What is the appropriate time scale for blogging—the day or the week?

March 3, 2014

I post (approximately) once a day and don’t plan to change that. I have enough material to post more often—for example, I could intersperse existing blog posts with summaries of my published papers or of other work that I like; and, beyond this, we currently have a one-to-two-month backlog of posts—but I’m afraid that if […] The post What is the appropriate time scale for blogging—the day or the week? appeared…

Read more »

On deck this week

March 3, 2014

Mon: What is the appropriate time scale for blogging—the day or the week? Tues: Literal vs. rhetorical Wed: Plagiarism, Arizona style Thurs: How much time (if any) should we spend criticizing research that’s fraudulent, crappy, or just ...

Read more »

Netflix’s stoked-up algorithms

March 3, 2014

At the start of the year, The Atlantic published a very nice, long article about Netflix's movie recommendation algorithm. You may remember this algorithm (internally known as Cinematch) received a $1 million makeover several years ago (the Netflix Prize), only that the prize-winning entry was deemed too complex--and does not generate sufficient incremental value--to be put into production. The reporter, Alexis Madrigal, noticed that Netflix has shifted attention from the…

Read more »

