Another Benefit of Publicly Version-Controlled Research

June 4, 2014

I've been thinking quite a bit lately about why and how political scientists should publicly version control their research projects. By research projects, I mean data, manuscript, and code. And by publicly version control, I mean use Git to version-control and post a public GitHub repository, from the beginning of the project, so that other […]

Data!

June 4, 2014

The animated gif below (>>link) counts data transferred every second over the internet (sources?). Another (static) infographic by Cisco estimates that …

Determining chemical concentration with standard addition: An application of linear regression in JMP – A Guest Blog Post for the JMP Blog

I am very excited to announce that I have been invited by JMP to be a guest blogger for its official blog!  My thanks to Arati Mejdal, Global Social Media Manager for the JMP Division of SAS, for welcoming me into the JMP blogging community with so much support and encouragement, and I am pleased to […]

All the Assumptions That Are My Life

June 4, 2014

Statisticians take tours in other people’s data. All methods of statistical inference rest on statistical models. Experiments typically have problems with compliance, measurement error, generalizability to the real world, and representativeness of the sample. Surveys typically have problems of undercoverage, nonresponse, and measurement error. Real surveys are done to learn about the general population. But […] The post All the Assumptions That Are My Life appeared first on Statistical Modeling,…

Yet another power-law tail, explained

June 4, 2014

At the next Boston Python user group meeting, participants will present their solutions to a series of puzzles, posted here.  One of the puzzles lends itself to a solution that uses Python iterators, which is something I was planning to get more f...

Simulate lognormal data with specified mean and variance

June 4, 2014

In my book Simulating Data with SAS, I specify how to generate lognormal data with a shape and scale parameter. The method is simple: you use the RAND function to generate X ~ N(μ, σ), then compute Y = exp(X). The random variable Y is lognormally distributed with parameters μ […]
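The excerpt stops before showing how to hit a specified mean and variance, so the following is only a sketch of one standard way to do it, in Python rather than SAS. It assumes the usual lognormal moment formulas: if Y = exp(X) with X ~ N(μ, σ), then E[Y] = exp(μ + σ²/2) and Var[Y] = (exp(σ²) − 1)·exp(2μ + σ²). Solving for μ and σ gives the two lines in `lognormal_params`. The function names are hypothetical and are not taken from the book.

```python
import math
import random


def lognormal_params(mean, variance):
    """Return (mu, sigma) of the underlying normal X so that
    Y = exp(X) has the requested mean and variance."""
    sigma2 = math.log(1.0 + variance / mean ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


def simulate_lognormal(mean, variance, n, seed=None):
    """Draw n lognormal values: generate X ~ N(mu, sigma), then Y = exp(X)."""
    rng = random.Random(seed)
    mu, sigma = lognormal_params(mean, variance)
    return [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]


if __name__ == "__main__":
    # Illustrative target values only: mean 80, variance 225.
    sample = simulate_lognormal(80.0, 225.0, n=100_000, seed=1)
    print(sum(sample) / len(sample))
```

The sample mean and variance will only approximate the targets, with the gap shrinking as `n` grows.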

Machine Learning and Applied Statistics Lesson of the Day – How to Construct Receiver Operating Characteristic Curves

A receiver operating characteristic (ROC) curve is a 2-dimensional plot of the true positive rate versus the false positive rate (1 minus the true negative rate) of a binary classifier while varying its discrimination threshold. In statistics and machine learning, a basic and popular tool for binary classification is logistic regression, and an ROC curve is a useful way to assess the predictive accuracy […]
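As a rough illustration of "varying the discrimination threshold", the sketch below computes one ROC point per threshold from a list of classifier scores and true labels. It is a minimal Python example under stated assumptions: the function name `roc_points`, the toy scores, and the convention that a case is predicted positive when its score is at or above the threshold are all choices made here, not details from the post.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs,
    one per threshold, from the strictest threshold to the loosest.

    scores: predicted scores, higher means "more likely positive"
    labels: true classes, 1 for positive and 0 for negative
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Start above every score so the curve begins at (0, 0).
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


if __name__ == "__main__":
    # Toy data, invented for illustration.
    for fpr, tpr in roc_points([0.9, 0.8, 0.35, 0.1], [1, 0, 1, 0]):
        print(fpr, tpr)
```

Plotting the second coordinate against the first traces the curve; the scores could come from a logistic regression or any other model that ranks cases.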

Did you buy laundry detergent on their most recent trip to the store? Also comments on scientific publication and yet another suggestion to do a study that allows within-person comparisons

June 3, 2014

Please answer the above question before reading on . . . I’m curious after reading Leif Nelson’s report that, based on research with Minah Jung, approximately 42% of the people they surveyed said they bought laundry detergent on their most recent trip to the store. I’m stunned that the number is so high. 42%??? That’s […] The post Did you buy laundry detergent on their most recent trip to the…

The pleasure of walking

June 3, 2014

The proverb goes: walk before you run. My latest contribution to Harvard Business Review (link) makes the point that many websites can improve their user experience by focusing on simple personalization measures, like showing me my shirt size. Recommendation engines based on machine-learning algorithms still have ways to go. I ran across a number of obstacles in my recent travel, which again highlights the value of getting the basics down.…

Post-Piketty Lessons

June 3, 2014

The latest crisis in data analysis comes to us (once again) from the field of Economics. Thomas Piketty, a French economist, recently published a book titled Capital in the 21st Century that has been a best-seller. I have not read …

Video Tutorial – Useful Relationships Between Any Pair of h(t), f(t) and S(t)

I first started my video tutorial series on survival analysis by defining the hazard function. I then explained how this definition leads to the elegant relationship of h(t) = f(t) / S(t). In my new video, I derive 6 useful mathematical relationships that exist between any 2 of the 3 quantities in the above equation. Each relationship allows one quantity […]
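The three quantities in the title are tied together by the standard identities h(t) = f(t) / S(t) and f(t) = −dS(t)/dt. The sketch below checks both numerically in Python. The Weibull distribution and the parameter values are assumptions chosen here for illustration, and the function names are hypothetical; none of it comes from the video.

```python
import math


def survival(t, shape, scale):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def density(t, shape, scale):
    """Weibull density f(t) = -dS/dt."""
    return (shape / scale) * (t / scale) ** (shape - 1) * survival(t, shape, scale)


def hazard(t, shape, scale):
    """Hazard function obtained from the other two: h(t) = f(t) / S(t)."""
    return density(t, shape, scale) / survival(t, shape, scale)


if __name__ == "__main__":
    t, shape, scale, step = 2.0, 1.5, 3.0, 1e-5
    # Central-difference approximation of -dS/dt, to compare with f(t).
    numeric_f = -(survival(t + step, shape, scale)
                  - survival(t - step, shape, scale)) / (2 * step)
    print(density(t, shape, scale), numeric_f)
    # For the Weibull, h(t) has the closed form (shape / scale) * (t / scale) ** (shape - 1).
    print(hazard(t, shape, scale), (shape / scale) * (t / scale) ** (shape - 1))
```

Each printed pair should agree to several decimal places, which is a quick sanity check that one quantity can be recovered from the others.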

Skimming statistics papers for the ideas (instead of the complete procedures)

June 2, 2014

Been reading a lot of Gelman, Carlin, Stern, Dunson, Vehtari, Rubin “Bayesian Data Analysis” 3rd edition lately. Overall in the Bayesian framework some ideas (such as regularization, and imputation) are way easier to justify (though calculating some seemingly basic quantities becomes tedious). A big advantage (and weakness) of this formulation is statistics has a much […] Related posts: Checking claims in published statistics papers Data Science, Machine Learning, and Statistics:…

How does Practical Data Science with R stand out?

June 2, 2014

There are a lot of good books on statistics, machine learning, analytics, and R. So it is valid to ask: how does Practical Data Science with R stand out? Why should a data scientist or an aspiring data scientist buy it? We admit, it isn’t the only book we own. Some relevant books from the […] Related posts: A bit of the agenda of Practical Data Science with R Data…

Swallowing the Bitter Pill: England, the Premier League and the World Cup

June 2, 2014

Discussions abound about England’s chances at the 2014 edition of the World Cup. For a country which has produced elite football players such as Gary Neville, John Terry and Paul Scholes (and yes, David Beckham), there isn’t a lot of optimism ...

Collaborative lesson development with GitHub

June 2, 2014

If you're doing any kind of scientific computing and not using version control, you're doing it wrong. The git version control system and GitHub, a web-based service for hosting and collaborating on git-controlled projects, have both become wildly popu...

Why we hate stepwise regression

June 2, 2014

Haynes Goddard writes: I have been slowly working my way through the grad program in stats here, and the latest course was a biostats course on categorical and survival analysis. I noticed in the semi-parametric and parametric material (Wang and Lee is the text) that they use stepwise regression a lot. I learned in econometrics […] The post Why we hate stepwise regression appeared first on Statistical Modeling, Causal Inference,…

On deck this week

June 2, 2014

Mon: Why we hate stepwise regression Tues: Did you buy laundry detergent on their most recent trip to the store? Also comments on scientific publication and yet another suggestion to do a study that allows within-person comparisons Wed: All the Assumptions That Are My Life Thurs: Identifying pathways for managing multiple disturbances to limit plant […] The post On deck this week appeared first on Statistical Modeling, Causal Inference, and…

Missing data, mysterious order, reverse causation wipes out a simple theory

June 2, 2014

New York Times columnist Floyd Norris published a set of charts purportedly to show that the housing market in the U.S. is on the mend. Not so quick, Floyd. His theory - originating from an economist at Hanley Wood, a...

Specify formats when you write vectors to a data set

June 2, 2014

Sometimes you have data in SAS/IML vectors that you need to write to a SAS data set. By default, no formats are associated with the variables that you create from SAS/IML vectors. However, some variables (notably dates, times, and datetimes) should have formats associated with the data values. You can […]

Aerial Views

June 2, 2014

Depicting reality with photographs has a long tradition: On Wikimedia Commons the Swiss National Library published a series of old and …

Autocorrelation in project Tycho’s measles data

June 1, 2014

Project Tycho includes data from all weekly notifiable disease reports for the United States dating back to 1888. These data are freely available to anybody interested. I have looked at Project Tycho's measles data before, general look, incidence,...

Jessica Tracy and Alec Beall (authors of the fertile-women-wear-pink study) comment on our Garden of Forking Paths paper, and I comment on their comments

May 31, 2014

Jessica Tracy and Alec Beall, authors of that paper that claimed that women at peak fertility were more likely to wear red or pink shirts (see further discussion here and here), and then a later paper that claimed that this happens in some weather but not others, just informed me that they have posted a […] The post Jessica Tracy and Alec Beall (authors of the fertile-women-wear-pink study) comment on…

Loading IP Test Data Into Postgres

May 31, 2014

Recently, I was trolling around the internet looking for some IP address data to play with. Fortunately, I stumbled across MaxMind's Geolite Database, which is available for free. All I have to do is include this notice: This product ...

