How do you know if your model is going to work?

September 22, 2015

Authors: John Mount and Nina Zumel. Our four-part article series collected into one piece. Part 1: The problem; Part 2: In-training set measures; Part 3: Out of sample procedures; Part 4: Cross-validation techniques. “Essentially, all models are wrong, but some are useful.” (George Box) Here’s a caricature of a data …

Read more »

Parsing a large amount of characters into a POSIXct object

September 22, 2015

When trying to parse a large number of datetime character strings into POSIXct objects, it struck me that strftime and as.POSIXct were actually quite slow. The parsing functions from lubridate were a lot faster. The following benchmark shows…

Read more »

“I do not agree with the view that being convinced an effect is real relieves a researcher from statistically testing it.”

September 22, 2015

Florian Wickelmaier writes: I’m writing to tell you about my experiences with another instance of “the difference between significant and not significant.” In a lab course, I came across a paper by Costa et al. [Cognition 130 (2) (2014) 236-254 (http://dx.doi.org/10.1016/j.cognition.2013.11.010)]. In several experiments, they compare the effects in two two-by-two tables by comparing the […]

Read more »

Round-up of up-coming events

September 22, 2015

I finally got around to updating the event listings. In the coming months, I will be giving a number of talks on data visualization. Next week, I will be speaking to the Data Visualization New York meetup, ably organized by...

Read more »

Propublica is on a roll

September 22, 2015

Really enjoying Propublica pieces lately. There are several articles about topics of great interest to me, and those who read my books will be familiar with these themes. My favorite is an article that speaks a truth about data projects -- much as we sweat about data collection, data integrity and statistical models, the true challenge is in persuading the rest of the world to adopt our end products. The title…

Read more »

What’s the Difference Between Data Science and Statistics?

September 22, 2015

From: https://www.udemy.com/data-science/#article Not long ago, the term "data science" meant nothing to most people -- even those who worked in data. A likely response to the term was: "Isn't that just statistics?" These days, data science is ...

Read more »

Upcoming talks in California

September 22, 2015

I’m back in California for the next couple of weeks, and will give the following talk at Stanford and UC-Davis: “Optimal forecast reconciliation for big time series data.” Time series can often be naturally disaggregated in a hierarchical or grouped structure. For example, a manufacturing company can disaggregate total demand for their products by country of […]

Read more »

Notes from the Kölner R meeting, 18 September 2015

September 22, 2015

Last Friday the Cologne R user group came together for the 15th time. Since its inception over three years ago, the group has evolved from a small gathering in a pub into an active data science community, covering wider topics than just R. Still, R is the l...

Read more »

How do you know if your model is going to work? Part 4: Cross-validation techniques

September 21, 2015

Authors: John Mount and Nina Zumel. In this article we conclude our four-part series on basic model testing. When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it’s better than the models that …

Read more »

Try This Problem

September 21, 2015

Here's a little exercise for you to work on: We know from the Gauss-Markov Theorem that within the class of linear and unbiased estimators, the OLS estimator is most efficient. Because it is unbiased, it therefore has the smallest possible Mean Squared...

Read more »

Have weak data. But need to make decision. What to do?

September 21, 2015

Vlad Malik writes: I just re-read your article “Of Beauty, Sex and Power”. In my line of work (online analytics), low power is a recurring, existential problem. Do we act on this data or not? If not, why are we even in this business? That’s our daily struggle. Low power seems to create a sort […]

Read more »

Statbusters: What the experiments on rigging elections via Google tell us

September 21, 2015

For this week's Statbusters, we opine on that astounding report from a few weeks ago about how Google could manipulate the next elections by biasing search results. We walk you through our vetting process, starting with face validity ("the magnitude of the reported effect is too large to be believed!"). The crux of the article is about the experimental design. You start with a group of people who have…

Read more »

Making stuff up to get published in NYTimes

September 21, 2015

Reader/friend Tom B. knows about my interest in grade "deflation" policies, and proceeds to ruin my breakfast by sending me a link to this ludicrous "letter to the editor" by a high-school counsellor. It starts with a made-up assertion: As the new academic term starts, I’m rooting for this to be the year when students start getting the grades they rightfully earn without high schools and colleges manipulating numbers…

Read more »

On deck this week

September 21, 2015

Mon: Have weak data. But need to make decision. What to do?; Tues: “I do not agree with the view that being convinced an effect is real relieves a researcher from statistically testing it.”; Wed: Optimistic or pessimistic priors; Thurs: Draw your own graph!; Fri: Low-power pose; Sat: Annals of Spam; Sun: The Final Bug, […] The post On deck this week appeared first on Statistical Modeling, Causal Inference, and…

Read more »

Nice title but dubious message

September 21, 2015

I like to use declarative titles for charts. This chart below, found in an investment magazine published by Charles Schwab, wants to tell us that emerging markets "perform differently." That is a nice concise message. Now, what does the chart...

Read more »

International Symposium on Forecasting: Spain 2016

September 21, 2015

June 19-22, 2016, Santander, Spain – Palace of La Magdalena. The International Symposium on Forecasting (ISF) is the premier forecasting conference, attracting the world’s leading forecasting researchers, practitioners, and students. Through a combination of keynote speaker presentations, academic sessions, workshops, and social programs, the ISF provides many excellent opportunities for networking, learning, and fun. Speakers: […]

Read more »

Excluding variables: Read all but one variable into a matrix

September 21, 2015

Dear Rick, I have a data set with 1,001 numerical variables. One variable is the response, the others are explanatory variables. How can I read the 1,000 explanatory variables into an IML matrix without typing every name? That's a good question. You need to be able to perform two sub-tasks: […] The post Excluding variables: Read all but one variable into a matrix appeared first on The DO Loop.

Read more »

Erdos bio for kids

September 20, 2015

Chris Gittins recommends the book, “The Boy Who Loved Math: The Improbable Life of Paul Erdos,” by Deborah Heiligman. Gittins reports: We read it with our soon-to-be-first-grader this evening. She liked it and so did we. I knew a little about Erdos but the book probably quadrupled my knowledge. Thought it might be of interest […] The post Erdos bio for kids appeared first on Statistical Modeling, Causal Inference, and…

Read more »

Recipe for Computing and Sampling Multivariate Kernel Density Estimates (and Plotting Contours for 2D KDEs).

September 19, 2015

The code snippet below creates the above graphic: ## radially symmetric kernel (Gaussian kernel) RadSym

Read more »

“The frequentist case against the significance test”

September 19, 2015

Richard Morey writes: I suspect that like me, many people didn’t get a whole lot of detail about Neyman’s objections to the significance test in their statistical education besides “Neyman thought power is important”. Given the recent debate about significance testing, I have gone back to Neyman’s papers and tried to summarize, for the modern […] The post “The frequentist case against the significance test” appeared first on Statistical Modeling,…

Read more »

Predicting Titanic deaths on Kaggle VI: Stan

September 19, 2015

It is a bit of a contradiction. Kaggle provides competitions on data science, while Stan is clearly part of (Bayesian) statistics. Yet after using random forests, boosting and bagging, I also think this problem has a suitable size for Stan, which I un...

Read more »

Stan users meetup in Cambridge, MA on 9/22

September 19, 2015

There’s a new Stan users meetup group in Boston / Camberville. The first meeting will be on Tuesday, 9/22, at 6 pm in Cambridge. If you’re a seasoned Stan user, just starting out with Stan, or hearing about Stan for the first time, feel free to join in. At least a couple of the Stan […] The post Stan users meetup in Cambridge, MA on 9/22 appeared first on Statistical…

Read more »

The Leek group guide to writing your first paper

September 18, 2015

“The @jtleek guide to writing your first academic paper https://t.co/APLrEXAS46” (Stephen Turner, @genetics_blog, September 17, 2015). I have written guides on reviewing papers, sharing data, and writing R packages. One thing I haven't touched on until now has been writing papers. Certainly for me, and I think for a lot of students, the hardest

Read more »

