## Machine Learning Lesson of the Day – Memory-Based Learning

Memory-based learning (also called instance-based learning) is a type of non-parametric algorithm that compares new test data with training data in order to solve the given machine learning problem.  Such algorithms search for the training data that are most similar to the test data and make predictions based on these similarities.  (From what I have learned, memory-based learning […]
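The k-nearest-neighbors classifier is one common example of memory-based learning. It stores the training data as-is and, for each new point, votes among the closest stored points. The sketch below is illustrative only: the function name `knn_predict`, the Euclidean distance, the choice k = 3 and the toy data are all assumptions, not taken from the original lesson.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k closest stored training points."""
    # train is a list of (feature_vector, label) pairs kept in memory unchanged;
    # all of the "learning" happens at prediction time by comparing against them
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters labelled "A" and "B"
training_data = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
print(knn_predict(training_data, (1.1, 1.0)))  # A
print(knn_predict(training_data, (4.9, 5.0)))  # B
```

Because nothing is fitted in advance, the cost is paid at prediction time: every query is compared with every stored point.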

## Highlighting the web

March 6, 2014
By

Users of my new online forecasting book have asked about having a facility for personal highlighting of selected sections, as students often do with print books. We have plans to make this a built-in part of the platform, but for now it is possible to do it using a simple browser extension. This approach allows any website to be highlighted, so is even more useful than if we only had…

## "Unifying the Counterfactual and Graphical Approaches to Causality" (Tomorrow at the Statistics Seminar)

March 5, 2014
By

Attention conservation notice: Late notice of an academic talk in Pittsburgh. Only of interest if you care about the places where the kind of statistical theory that leans on concepts like "the graphical Markov property" merges with the kind of analy...

## Exploring Ball Locations and Player Behaviors in Basketball

March 5, 2014
By

"Game on!" by Fathom Information Design is an exploratory visualization prototype that allows users to parse through a basketball game's data, to investigate the behaviors and patterns in terms of the statistics and locations of players. Based on a ...

## Fun with Mike Steele Quotes and Rants

March 5, 2014
By

Check out the web page of my Penn Statistics buddy, Mike Steele, probabilist, statistician and mathematician extraordinaire. (And that's just his day job. At night he battles the really hard stuff -- financial markets.) Among other things, you'll ...

## PLoS One, I have an idea for what to do with all your profits: buy hard drives

March 5, 2014
By

I've been closely following the fallout from PLoS One's new policy for data sharing. The policy says, basically, that if you publish a paper, all data and code to go with that paper should be made publicly available at the …

## Plagiarism, Arizona style

March 5, 2014
By

Last month a history professor sent me a note regarding plagiarism at Arizona State University: Matthew Whitaker, who had received an expedited promotion to full professor and was made Director of a new Center for the Study of Race and Democracy by Provost Elizabeth Capaldi and President Michael Crow, was charged by most of the […] The post Plagiarism, Arizona style appeared first on Statistical Modeling, Causal Inference, and Social Science.

## Advances in scalable Bayesian computation [day #2]

March 5, 2014
By

And here is the second day of our workshop Advances in Scalable Bayesian Computation gone! This time, it sounded like the “main” theme was about brains… In fact, Simon Barthelmé’s research originated from neurosciences, while Dawn Woodard dissected a brain (via MRI) during her talk! (Note that the BIRS website currently posts Simon’s video as […]

## Optimizing a function that evaluates an integral

March 5, 2014
By

SAS programmers use the SAS/IML language for many different tasks. One important task is computing an integral. Another is optimizing functions, such as maximizing a likelihood function to find parameters that best fit a set of data. Last week I saw an interesting problem that combines these two important tasks. [...]
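The excerpt does not show the SAS/IML problem itself, so here is a hedged Python sketch of the general pattern: an objective whose every evaluation requires a numerical integral, maximized by a one-dimensional search. The objective F(a) = ∫₀ᵃ (1 − t) dt = a − a²/2 is hypothetical (chosen because its maximum, at a = 1 with value 0.5, is known), and Simpson's rule plus golden-section search are swapped in for whatever routines the original post uses.

```python
import math

def simpson(f, lo, hi, n=200):
    """Composite Simpson's rule for the integral of f on [lo, hi] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        weight = 4 if i % 2 else 2  # interior weights alternate 4, 2, 4, ...
        total += weight * f(lo + i * h)
    return total * h / 3

def objective(a):
    # Hypothetical objective: F(a) = integral from 0 to a of (1 - t) dt = a - a^2 / 2
    return simpson(lambda t: 1.0 - t, 0.0, a)

def golden_section_max(g, lo, hi, tol=1e-8):
    """Maximize a unimodal function g on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if g(c) > g(d):
            hi = d  # the maximum lies in [lo, d]
        else:
            lo = c  # the maximum lies in [c, hi]
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2

a_best = golden_section_max(objective, 0.0, 3.0)
print(a_best, objective(a_best))
```

The printed values should be very close to 1 and 0.5. Each step of the optimizer calls the integrator, so the accuracy and cost of the inner integral directly affect the outer search.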

## Applied Statistics Lesson of the Day – The Full Factorial Design

An experimenter may seek to determine the causal relationships between multiple factors and the response. On first instinct, you may be tempted to conduct a separate experiment for each factor, each using the completely randomized design with 1 factor. Often, however, it is possible to conduct 1 experiment with all of the factors at the same time. This is better than […]
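A full factorial design runs every combination of the levels of all factors. As a minimal sketch, assuming three hypothetical factors (the names and levels below are invented for illustration), the treatment combinations can be enumerated with a Cartesian product:

```python
from itertools import product

# Hypothetical factors and their levels, for illustration only
factors = {
    "temperature": ["low", "high"],
    "pressure": ["low", "high"],
    "catalyst": ["X", "Y", "Z"],
}

# A full factorial design includes every combination of factor levels
runs = [dict(zip(factors, levels)) for levels in product(*factors.values())]
print(len(runs))  # 2 * 2 * 3 = 12
for run in runs[:3]:
    print(run)
```

The number of runs is the product of the numbers of levels, which is why full factorial designs grow quickly as factors are added.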

## Power, power everywhere–(it) may not be what you think! [illustration]

March 5, 2014
By

Statistical power is one of the neatest [i], yet most misunderstood statistical notions [ii]. So here’s a visual illustration (written initially for our 6334 seminar), but worth a look by anyone who wants an easy way to attain the will to understand power. (Please see notes below slides.) [i] I was tempted to say power is one of […]
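The seminar slides are not reproduced here. As a hedged, textbook-style sketch (not the post's own illustration), the power of a one-sided z test of H0: μ = μ0 against μ > μ0, with known σ, is the probability of rejecting when the true mean is μ1. The function name `z_test_power` and the example numbers are assumptions.

```python
from statistics import NormalDist

def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a one-sided z test of H0: mu = mu0 vs H1: mu > mu0, evaluated at mu = mu1."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)     # rejection cutoff on the z scale
    shift = (mu1 - mu0) * n ** 0.5 / sigma     # how far the true mean moves the test statistic
    return 1 - std_normal.cdf(z_crit - shift)  # P(Z + shift > z_crit)

# Hypothetical example: effect of half a standard deviation, 25 observations
print(round(z_test_power(0, 0.5, 1, 25), 3))
```

This prints roughly 0.80. Note that power is a function of the alternative μ1, not a single number for the test: at μ1 = μ0 it equals α, and it rises as μ1 or n grows.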

## Remembering Seymour Geisser

March 5, 2014
By

This is the text, minus the nice formatting, of an email from Dennis Cook (my thesis advisor and current director of the U of MN School of Statistics) and Wes Johnson (a U of MN alum, a good friend, a great colleague and a student of Seymour Geisser's)...

## Forecasting weekly data

March 4, 2014
By

This is another situation where Fourier terms are useful for handling the seasonality. Not only is the seasonal period rather long, it is non-integer (averaging 365.25/7 = 52.18). So ARIMA and ETS models do not tend to give good results, even with a period of 52 as an approximation. Regression with ARIMA errors: The simplest approach is a regression with ARIMA errors. Here is an example using weekly data on…
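Fourier terms are sine/cosine pairs at the seasonal frequency and its harmonics, so they work even when the period is not an integer. A minimal sketch, assuming 104 weeks of data and K = 3 harmonic pairs (both hypothetical choices), builds the regressor matrix that would then enter a regression with ARIMA errors (the model fit itself is not shown):

```python
import math

def fourier_terms(n_obs, period, k):
    """Rows of [sin(2*pi*j*t/period), cos(2*pi*j*t/period)] for j = 1..k, t = 1..n_obs."""
    rows = []
    for t in range(1, n_obs + 1):
        row = []
        for j in range(1, k + 1):
            angle = 2 * math.pi * j * t / period
            row.extend([math.sin(angle), math.cos(angle)])
        rows.append(row)
    return rows

period = 365.25 / 7  # about 52.18 weeks per year, not an integer
X = fourier_terms(n_obs=104, period=period, k=3)
print(len(X), len(X[0]))  # 104 rows, 2 * k = 6 columns
```

The number of pairs K controls how flexible the seasonal shape is; in practice it would be chosen by something like an information criterion rather than fixed in advance.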

## Some statistics about the book

March 4, 2014
By

The release date for Zumel, Mount “Practical Data Science with R” is getting close. I thought I would share a few statistics about what goes into this kind of book. “Practical Data Science with R” started formal work in October of 2012. We had always felt the Win-Vector blog represented practice and research for such […]

## The Essential Identity between Classical Statistics and Statistical Mechanics

March 4, 2014
By

Many doubt that Statistical Mechanics and Classical Statistics have anything to do with each other. So I’ll lay this out step by step so you can see just how identical they really are. Step 1: The State Space Classical Statistics: If we roll a die n...

## Nomenclatural abomination

March 4, 2014
By

David Hogg calls conventional statistical notation a “nomenclatural abomination”: The terminology used throughout this document enormously overloads the symbol p(). That is, we are using, in each line of this discussion, the function p() to mean something different; its meaning is set by the letters used in its arguments. That is a nomenclatural abomination. I […]

## Literal vs. rhetorical

March 4, 2014
By

Thomas Basbøll pointed me to a discussion on the orgtheory blog in which Jerry Davis, the editor of a journal of business management, argued that it is difficult for academic researchers to communicate with the public because “the public prefers Cheetos to a healthy salad” and when serious papers are discussed on the internet, “everyone […] The post Literal vs. rhetorical appeared first on Statistical Modeling, Causal Inference, and Social Science.

## Nothing to see here

March 4, 2014
By

Some graphics are made to inform, some to amuse, some to delight. But the following scatter plot makes one wonder why why why... What does the designer want to say? *** I saw this chart inside an infographics titled "Where...

## The Star Puzzle

March 4, 2014
By

The Star Puzzle is a puzzle presented on The Math Forum.  I became aware of this problem by noticing the article and solution posted in the Quantitative Decisions article section. It asks the question, "How many triangles, quadrilaterals, and irregula...

## Advances in scalable Bayesian computation [day #1]

March 4, 2014
By

This was the first day of our workshop Advances in Scalable Bayesian Computation and it sounded like the “main” theme was probabilistic programming, in tune with my book review posted this morning. Indeed, both Vikash Mansinghka and Frank Wood gave talks about this concept, Vikash detailing the specifics of a new programming language called Venture […]

## Review: Kölner R Meeting 26 February 2014

March 4, 2014
By

Last week's Cologne R user group meeting was all about R and databases. We had three talks from a generic overview on how to connect R to databases, to a specific example with kdb+ and perhaps the future with ArangoDB, a NoSQL database.Connecting R wit...