Machine Learning Lesson of the Day – Memory-Based Learning

Memory-based learning (also called instance-based learning) is a type of non-parametric algorithm that compares new test data with training data in order to solve the given machine learning problem.  Such algorithms search for the training data that are most similar to the test data and make predictions based on these similarities.  (From what I have learned, memory-based learning […]

Read more »
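
The excerpt above describes the general idea of predicting from the most similar training data without naming a specific algorithm. As an illustration only, here is a minimal sketch of k-nearest neighbours, one well-known memory-based method. The choice of k-NN, the Euclidean distance, the function name `knn_predict`, and the toy data are all assumptions made for this example and do not come from the post.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points.

    Hypothetical helper for illustration; not taken from the original post.
    """
    # Compare the query with every stored training instance (Euclidean distance).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k most similar training instances.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Predict the most common label among those neighbours.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy training data (made up): two well-separated clusters.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["a", "a", "a", "b", "b", "b"]

prediction = knn_predict(train_points, train_labels, (1.1, 1.0), k=3)
```

Note that there is no separate fitting step: the "model" is the stored training data itself, which is why such methods are called memory-based.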

Highlighting the web

March 6, 2014

Users of my new online forecasting book have asked about having a facility for personal highlighting of selected sections, as students often do with print books. We have plans to make this a built-in part of the platform, but for now it is possible to do it using a simple browser extension. This approach allows any website to be highlighted, so is even more useful than if we only had…

Read more »

"Unifying the Counterfactual and Graphical Approaches to Causality" (Tomorrow at the Statistics Seminar)

March 5, 2014

Attention conservation notice: Late notice of an academic talk in Pittsburgh. Only of interest if you care about the places where the kind of statistical theory that leans on concepts like "the graphical Markov property" merges with the kind of analy...

Read more »

Exploring Ball Locations and Player Behaviors in Basketball

March 5, 2014

"Game on!" by Fathom Information Design is an exploratory visualization prototype that allows users to parse through a basketball game's data, to investigate the behaviors and patterns in terms of the statistics and locations of players. Based on a ...

Read more »

Fun with Mike Steele Quotes and Rants

March 5, 2014

Check out the web page of my Penn Statistics buddy, Mike Steele, probabilist, statistician and mathematician extraordinaire. (And that's just his day job. At night he battles the really hard stuff -- financial markets.) Among other things, you'll ...

Read more »

PLoS One, I have an idea for what to do with all your profits: buy hard drives

March 5, 2014

I've been closely following the fallout from PLoS One's new policy for data sharing. The policy says, basically, that if you publish a paper, all data and code to go with that paper should be made publicly available at the …

Read more »

Plagiarism, Arizona style

March 5, 2014

Last month a history professor sent me a note regarding plagiarism at Arizona State University: Matthew Whitaker, who had received an expedited promotion to full professor and was made Director of a new Center for the Study of Race and Democracy by Provost Elizabeth Capaldi and President Michael Crow, was charged by most of the […] The post Plagiarism, Arizona style appeared first on Statistical Modeling, Causal Inference, and Social Science.

Read more »

Advances in scalable Bayesian computation [day #2]

March 5, 2014

And here is the second day of our workshop Advances in Scalable Bayesian Computation gone! This time, it sounded like the “main” theme was about brains… In fact, Simon Barthelmé’s research originated from neurosciences, while Dawn Woodard dissected a brain (via MRI) during her talk! (Note that the BIRS website currently posts Simon’s video as […]

Read more »

Optimizing a function that evaluates an integral

March 5, 2014

SAS programmers use the SAS/IML language for many different tasks. One important task is computing an integral. Another is optimizing functions, such as maximizing a likelihood function to find parameters that best fit a set of data. Last week I saw an interesting problem that combines these two important tasks. [...]

Read more »

Applied Statistics Lesson of the Day – The Full Factorial Design

An experimenter may seek to determine the causal relationships between multiple factors and the response. On first instinct, you may be tempted to conduct separate experiments, each using the completely randomized design with 1 factor. Often, however, it is possible to conduct 1 experiment with all of the factors at the same time. This is better than […]

Read more »
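
A full factorial design runs every combination of the levels of all factors in a single experiment. The sketch below enumerates those combinations. The factor names, their levels, and the helper `full_factorial` are hypothetical, chosen only for illustration, and are not taken from the post.

```python
from itertools import product

# Hypothetical factors and levels (assumptions for this example).
factors = {
    "temperature": ["low", "high"],
    "pressure": ["low", "high"],
    "catalyst": ["A", "B", "C"],
}


def full_factorial(factors):
    """Return one dict per experimental run, covering every combination of levels."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


runs = full_factorial(factors)
# The number of runs is the product of the numbers of levels: 2 * 2 * 3.
```

In practice the run order would usually be randomized before the experiment is carried out; that step is omitted here.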

Power, power everywhere–(it) may not be what you think! [illustration]

March 5, 2014

Statistical power is one of the neatest [i], yet most misunderstood statistical notions [ii]. So here’s a visual illustration (written initially for our 6334 seminar), but worth a look by anyone who wants an easy way to attain the will to understand power. (Please see notes below slides.) [i] I was tempted to say power is one of […]

Read more »

Remembering Seymour Geisser

March 5, 2014

This is the text, minus the nice formatting, of an email from Dennis Cook (my thesis advisor and current director of the U of MN School of Statistics) and Wes Johnson (a U of MN alum, a good friend, a great colleague and a student of Seymour Geisser's)...

Read more »

Forecasting weekly data

March 4, 2014

This is another situation where Fourier terms are useful for handling the seasonality. Not only is the seasonal period rather long, it is non-integer (averaging 365.25/7 = 52.18). So ARIMA and ETS models do not tend to give good results, even with a period of 52 as an approximation. Regression with ARIMA errors: The simplest approach is a regression with ARIMA errors. Here is an example using weekly data on…

Read more »
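
To make the idea of Fourier terms with a non-integer seasonal period concrete, here is a minimal sketch that builds the sine and cosine regressors for weekly data. It is not the example from the post, whose data and code are not shown in the excerpt. The function name `fourier_terms`, the number of harmonics `K = 3`, and the series length of 104 weeks are assumptions. The resulting columns would serve as regressors in a regression with ARIMA errors; the model fitting itself is not shown.

```python
import math


def fourier_terms(n_obs, period, K):
    """Build a design matrix of K sine/cosine pairs for time points t = 1..n_obs.

    Row t holds [sin(2*pi*1*t/period), cos(2*pi*1*t/period), ...,
                 sin(2*pi*K*t/period), cos(2*pi*K*t/period)].
    Hypothetical helper for illustration.
    """
    rows = []
    for t in range(1, n_obs + 1):
        row = []
        for k in range(1, K + 1):
            angle = 2.0 * math.pi * k * t / period
            row.extend([math.sin(angle), math.cos(angle)])
        rows.append(row)
    return rows


# Average number of weeks per year; the period does not need to be an integer.
period = 365.25 / 7

# Two years of weekly observations with three harmonics (both values assumed).
X = fourier_terms(n_obs=104, period=period, K=3)
```

Because the period enters only through the angle, a non-integer value poses no difficulty here, unlike seasonal ARIMA or ETS models, which require an integer period.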

Some statistics about the book

March 4, 2014

The release date for Zumel, Mount “Practical Data Science with R” is getting close. I thought I would share a few statistics about what goes into this kind of book. “Practical Data Science with R” started formal work in October of 2012. We had always felt the Win-Vector blog represented practice and research for such […]

Read more »

The Essential Identity between Classical Statistics and Statistical Mechanics

March 4, 2014

Many doubt that Statistical Mechanics and Classical Statistics have anything to do with each other. So I’ll lay this out step by step so you can see just how identical they really are. Step 1: The State Space Classical Statistics: If we roll a dice n...

Read more »

Nomenclatural abomination

March 4, 2014

David Hogg calls conventional statistical notation a “nomenclatural abomination”: The terminology used throughout this document enormously overloads the symbol p(). That is, we are using, in each line of this discussion, the function p() to mean something different; its meaning is set by the letters used in its arguments. That is a nomenclatural abomination. I […]

Read more »

Literal vs. rhetorical

March 4, 2014

Thomas Basbøll pointed me to a discussion on the orgtheory blog in which Jerry Davis, the editor of a journal of business management, argued that it is difficult for academic researchers to communicate with the public because “the public prefers Cheetos to a healthy salad” and when serious papers are discussed on the internet, “everyone […] The post Literal vs. rhetorical appeared first on Statistical Modeling, Causal Inference, and Social Science.

Read more »

Nothing to see here

March 4, 2014

Some graphics are made to inform, some to amuse, some to delight. But the following scatter plot makes one wonder why why why... What does the designer want to say? *** I saw this chart inside an infographic titled "Where...

Read more »

The Star Puzzle

March 4, 2014

The Star Puzzle is a puzzle presented on The Math Forum.  I became aware of this problem by noticing the article and solution posted on Quantitative Decisions article section. It asks the question, "How many triangles, quadrilaterals, and irregula...

Read more »

Advances in scalable Bayesian computation [day #1]

March 4, 2014

This was the first day of our workshop Advances in Scalable Bayesian Computation and it sounded like the “main” theme was probabilistic programming, in tune with my book review posted this morning. Indeed, both Vikash Mansinghka and Frank Wood gave talks about this concept, Vikash detailing the specifics of a new programming language called Venture […]

Read more »

Review: Kölner R Meeting 26 February 2014

March 4, 2014

Last week's Cologne R user group meeting was all about R and databases. We had three talks from a generic overview on how to connect R to databases, to a specific example with kdb+ and perhaps the future with ArangoDB, a NoSQL database. Connecting R wit...

Read more »

