What I said about data science at Princeton Reunions

June 3, 2015

Here is how I spent last weekend: at college reunions in beautiful Princeton on a glorious sunny day. I also spoke about data science at a Faculty-Alumni panel titled "Science Under Attack!". Here is what I said: In the past five to 10 years, there has been an explosion of interest in using data in business decision-making. What happens when business executives learn that the data do not support their…

Read more »

The mystery of the density curve that was too short

June 3, 2015

I was reading a statistics book when I encountered a histogram that caught my eye. The histogram looked similar to the one at the left. It contained a normal density estimate overlaid on a histogram, but the height of the density curve seemed too short when compared to the heights […] The post The mystery of the density curve that was too short appeared first on The DO Loop.

Read more »

R vs Autobox vs ForecastPro vs …

June 3, 2015

Every now and then a commercial software vendor makes claims on social media about how their software is so much better than the forecast package for R, but no details are provided. There are lots of reasons why you might select a particular software solution, and R isn’t for everyone. But anyone claiming superiority should […]

Read more »

June Reading List

June 2, 2015

Andrews, I. and T. B. Armstrong. 2015. Unbiased instrumental variables estimation under known first-stage sign. Cowles Foundation Discussion Paper No. 1984R, Yale University.
Bajari, P., D. Nekipelov, S. P. Ryan, and M. Yang. 2015. Demand estimation wit...

Read more »

Air Pollution (PM10 and PM2.5) in Different Cities using Interactive Charts

June 2, 2015

Gardiner Harris, who is a South Asia correspondent of the New York Times, shared a personal story of his son’s breathing troubles in New Delhi, India, in a recent dispatch titled Holding Your Breath in India. In this post, I use data from the World H...

Read more »

Statistical Models with a Point of View: First vs. Third Person

June 2, 2015

Marketing data can be collected in the first or third person, and we require different statistical models for each point of view. Netflix encourages you to adopt a third-person perspective when it surveys your taste preferences by asking how often you w...

Read more »

Cross-validation != magic

June 2, 2015

In a post entitled “A subtle way to over-fit,” John Cook writes: If you train a model on a set of data, it should fit that data well. The hope, however, is that it will fit a new set of data well. So in machine learning and statistics, people split their data into two parts. […] The post Cross-validation != magic appeared first on Statistical Modeling, Causal Inference, and Social…

Read more »

Back from R/Finance in Chicago

June 2, 2015

I had a great time at the R/Finance conference in Chicago last Friday/Saturday. Some brief takeaways for me were: From Emanuel Derman's talk: It is important to distinguish between theories and models. Theories live in an abstract world and for a giv...

Read more »

Report: EuroVis 2015

June 2, 2015

I attended EuroVis 2015 last week in Cagliari, Sardinia. This is the second-most important conference in the academic visualization world, and there were plenty of good sessions to choose from (full and short papers, state-of-the-art reports, and industry sessions). As usual, this is a highly subjective and incomplete report. I did not see anywhere near all …

Read more »

My final post on this Tony Blair thing

June 1, 2015

Gur Huberman writes on the recent fraud in experiments in polisci: This comment is a reaction to the little of the discussion which I [Gur] followed, mostly in the NYTimes. What I didn’t see anybody say is that the system actually worked. First, there’s a peer-reviewed report in Science. Then other people deem the results […] The post My final post on this Tony Blair thing appeared first on Statistical…

Read more »

How to tell if your graphic is underpowered?

June 1, 2015

Some time ago, this chart showed up in a NYT Magazine (it's about sex): In this composition, the visual element (the circles) has no utility. A self-sufficiency test makes this point clear. All the data (four numbers) are printed on...

Read more »

All the things that don’t make it into the news

June 1, 2015

I got buzzed last week by a couple of NY journalists about this recent political science fraud case. My responses were pretty undramatic so I don’t think they made their way into the news stories. Which is fine. As a reader of the news, I like to see excitement so it’s fair enough that reporters […] The post All the things that don’t make it into the news appeared first…

Read more »

On deck this week

June 1, 2015

Mon: All the things that don’t make it into the news Tues: Cross-validation != magic Wed: Of buggy whips and moral hazards; or, Sympathy for the Aapor Thurs: Low-power pose Fri: Should you get the blood transfusion? Sat: “History is the prediction of the present” Sun: What to do to train to apply statistical models […] The post On deck this week appeared first on Statistical Modeling, Causal Inference, and…

Read more »

Interview with Chris Wiggins, chief data scientist at the New York Times

June 1, 2015

Editor's note: We are trying something a little new here and doing an interview with Google Hangouts on Air. The interview will be live at 11:30am EST. I have some questions lined up for Chris, but if you have others you'd like to ask, you can tweet them @simplystats and I'll see if I can

Read more »

My talk @ the London Machine Learning Meetup

June 1, 2015

This Wednesday I've been invited to give a talk at the London Machine Learning Meetup. I don't have a lot of experience of these meetings, but I'm told that the audience is typically industry practitioners and some academics, ranging from novice...

Read more »

SAS/IML functions that operate on columns of a matrix

June 1, 2015

A SAS programmer asked for a list of SAS/IML functions that operate on the columns of an n x p matrix and return a 1 x p row vector of results. The functions that behave this way tend to compute univariate descriptive statistics such as the mean, median, standard deviation, and quantiles. The following […] The post SAS/IML functions that operate on columns of a matrix appeared first on The DO Loop.

Read more »

Connecting Python to Postgres . . . An Arduous Task

May 31, 2015

I've decided that I want to learn a bit more about Python. So I downloaded Python onto my Mac running OS X 10.8.x and decided to connect it to my local Postgres database. Several days later, I have succeeded! Along the way, I may have learne...

Read more »

The greatest impediment to research progress is not impediments to research progress, it is scientists reading about impediments to research progress

May 31, 2015

My short answer is that I think twitter is destructive of clear communication. Now I’ll give the question, and I’ll give my long answer. Here’s the question provided by a reader: Just wondering what you thought of Brian Nosek’s recent comment on twitter, “The biggest impediment to research progress is not fraud, it is all […] The post The greatest impediment to research progress is not impediments to research progress,…

Read more »

500

May 31, 2015

Blogstats celebrates a jubilee. This blog has been posting since May 2006. In those nine years, we have published 500 posts and received 206,000 views. A Blogstats post every week!

Read more »

Paper Helicopter Experiment, part III

May 31, 2015

As the final part of my paper helicopter experiment analysis (part I, part II), I reanalyze one more data set. In 2002 Erik Erhardt and Hantao Mai did an extensive experiment; see The Search for the Optimal Paper Helicopter. They did a number of s...

Read more »

A new R package for detecting unusual time series

May 31, 2015

The anomalous package provides some tools to detect unusual time series in a large collection of time series. This is joint work with Earo Wang (an honours student at Monash) and Nikolay Laptev (from Yahoo Labs). Yahoo is interested in detecting unusual patterns in server metrics. The package is based on this paper with Earo […]

Read more »

Lessons learned in high-performance R

On this blog, I've had a long-running investigation/demonstration of how to make an "embarrassingly parallel" but computationally intractable (on commodity hardware, at least) R problem more performant by using parallel computation and Rcpp. The example problem is to find the…

Read more »

Cracking Safe Cracker with R

May 30, 2015

My wife got me a Safe Cracker 40 puzzle a while back, and I believe I have since misplaced the solution. The company, Creative Crafthouse, stands behind its products. They had amazing customer service and promptly supplied me with a …

Read more »

