## Big Data and Marketing

August 1, 2013

In Chicago, I spoke about the impact of Big Data on marketing. I was going to summarize the key points here, but then I noticed that Chris Rollyson had already done the work, and done a much better job than I could. Here are his copious notes. One question from the audience that I didn't address fully was the use of data in education. I got as far as the…

## Practical Data Science with R, deal of the day Aug 1 2013

August 1, 2013

Deal of the Day August 1: Half off my book Practical Data Science with R. Use code dotd0801au at www.manning.com/zumel/. Related posts: Data Science, Machine Learning, and Statistics: what is in a name?; Data science project planning; Setting expectation...

## JSM2013

July 31, 2013

This post is for JSM2013. I will put useful links here and I will update this post during the meeting. Links: Big Data Sessions at JSM; Nate Silver addresses assembled statisticians at this year’s JSM; Data scientist is just a sexed up word for statistician. What I have learned from this meeting (Key words of this […]

## Woodbury Matrix Inverse Identity

Application in Conditional Distribution of Multivariate Normal: The Sherman-Woodbury-Morrison matrix inverse identity can be regarded as a transform between Schur complements. That is, given one can obtain by using the Woodbury matrix identity and vice versa. Recall the Woodbury Identity: and I recently stumbled across a neat application of this whilst deriving full conditionals for […] The post Woodbury Matrix Inverse Identity appeared first on Lindons Log.
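For reference, the standard statement of the Woodbury identity is sketched below, since the excerpt's own equations did not survive. The symbols $A$, $U$, $C$, $V$ (and $B$, $D$ in the second line) are notation assumed here and may differ from the original post's. The second line shows how the identity relates the inverse of one Schur complement of a block matrix to the inverse of the other, which is presumably the "transform between Schur complements" the excerpt mentions.

```latex
% Woodbury identity, assuming A, C and C^{-1} + V A^{-1} U are invertible
(A + U C V)^{-1} = A^{-1} - A^{-1} U \left( C^{-1} + V A^{-1} U \right)^{-1} V A^{-1}

% Applied with U = B, C \to -D^{-1}, V = C to the block matrix [A B; C D]:
% the inverse of the Schur complement A - B D^{-1} C is expressed through
% the inverse of the other Schur complement D - C A^{-1} B
(A - B D^{-1} C)^{-1} = A^{-1} + A^{-1} B \left( D - C A^{-1} B \right)^{-1} C A^{-1}
```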

## On the automated scoring of essays and the lessons learned along the way

July 31, 2013

We’ve all written essays, primarily while we were in school. The sometimes enjoyable process of researching the topic and composing the paper can take hours and hours of careful work. Given this, people react badly to the notion that their essays may be scored not by a human teacher, but by machine. A piece of software coldly judging the quality of our carefully constructed phrases and metaphors based on unknown…

## The researcher degrees of freedom – recipe tradeoff in data analysis

July 31, 2013

An important concept that is only recently gaining the attention it deserves is researcher degrees of freedom. From Simmons et al.: "The culprit is a construct we refer to as researcher degrees of freedom. In the course of collecting and …"

## Measuring Bias in Published Work

July 31, 2013

In a series of previous posts, I’ve spent some time looking at the idea that the review and publication process in political science—and specifically, the requirement that a result must be statistically significant in order to be scientifically notable or publishable—produces a very misleading scientific literature. In short, published studies of some relationship will tend […]

## Response by Jessica Tracy and Alec Beall to my critique of the methods in their paper, “Women Are More Likely to Wear Red or Pink at Peak Fertility”

July 31, 2013

Last week I published in Slate a critique of a paper that appeared in the journal Psychological Science. That paper, by Alec Beall and Jessica Tracy, found that women who were at peak fertility were three times more likely to wear red or pink shirts, compared to women at other points in their menstrual cycles. […] The post Response by Jessica Tracy and Alec Beall to my critique of the methods…

## Read hundreds of data sets into matrices

July 31, 2013

Do you have dozens (or even hundreds) of SAS data sets that you want to read into SAS/IML matrices? In a previous blog post, I showed how to iterate over a series of data sets and analyze each one. Inside the loop, I read each data set into a matrix [...]

## R in Insurance: Presentations are online

July 31, 2013

The programme and the presentation files of the first R in Insurance conference have been published on GitHub. (Image: front slides of the conference presentations.) In addition to the slides, many presenters have made their R code available as well: Alexander McNe...

## The Roy causal model?

July 30, 2013

A link from Simon Jackman’s blog led me to an article by James Heckman, Hedibert Lopes, and Remi Piatek from 2011, “Treatment effects: A Bayesian perspective.” I was pleasantly surprised to see this, partly because I didn’t know that Heckman was working on Bayesian methods, and partly because the paper explicitly refers to the “potential […] The post The Roy causal model? appeared first on Statistical Modeling, Causal Inference, and Social…

## How divided is the Senate?

July 30, 2013

I very seldom pay attention to politics directly, because politics have always seemed a bit circular and cyclical to me. Most of the political news that I take in ends up worming its way into the news sources that I do consume, like the excellent longform.org. Even given my limited intake of political news, one trend that I have noticed lately is the increasing number of references to the Senate…

## More on the Strange American Estimator: GMM, Simulation, and Misspecification

July 30, 2013

What's so interesting, then, about GMM? For me there are two key things: its implementation by simulation, and its properties under misspecification. First consider the implementation of GMM by simulation (so-called simulated method of moments, SMM). GMM...

## Stop Loss

July 30, 2013

Today I want to present an example of the flexible Stop Loss functionality that I added to the Systematic Investor Toolbox. Let’s examine a simple Moving Average Crossover strategy: Buy is triggered once the fast moving average crosses above the slow moving average; Sell is triggered once the fast moving average crosses below the slow […]
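The buy and sell rules described in this excerpt can be sketched in a few lines of Python. This is only an illustration and not the Systematic Investor Toolbox API (the toolbox itself is written in R): the function names `sma` and `crossover_with_stop`, the window lengths, and the fixed-percentage stop rule are all assumptions made for the example.

```python
def sma(prices, window):
    """Simple moving average; None until `window` prices are available."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window:i + 1]) / window)
    return out


def crossover_with_stop(prices, fast=2, slow=3, stop_pct=0.05):
    """Return (index, action) events for a long-only crossover strategy.

    Hypothetical rules, assumed for illustration:
    - "buy"  when the fast average crosses above the slow average
    - "stop" when the price falls stop_pct below the entry price
    - "sell" when the fast average crosses below the slow average
    """
    fast_ma, slow_ma = sma(prices, fast), sma(prices, slow)
    events = []
    entry = None  # entry price while a position is open, else None
    for i in range(1, len(prices)):
        if slow_ma[i - 1] is None:
            continue  # not enough history to compare two consecutive bars
        prev_diff = fast_ma[i - 1] - slow_ma[i - 1]
        diff = fast_ma[i] - slow_ma[i]
        if entry is None:
            if prev_diff <= 0 < diff:
                entry = prices[i]
                events.append((i, "buy"))
        elif prices[i] <= entry * (1 - stop_pct):
            entry = None
            events.append((i, "stop"))
        elif prev_diff >= 0 > diff:
            entry = None
            events.append((i, "sell"))
    return events
```

Calling `crossover_with_stop([10, 10, 10, 11, 12, 10, 9, 8])` with the default 5% stop exits on the stop one bar before the moving averages cross back; raising `stop_pct` to 0.5 effectively disables the stop, so the exit comes from the crossover instead.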

## Programming instrumental music from scratch

July 29, 2013

I recently posted about automatically making music. The algorithm that I made pulled out interesting sequences of music from existing songs and remixed them. While this worked reasonably well, it also didn’t have full control over the basics of the music; it wasn’t actually specifying which instruments to use, or what notes to play. Maybe I’m being a control freak, but it would be nice to have complete control over…

## Exploratory Data Analysis: Combining Histograms and Density Plots to Examine the Distribution of the Ozone Pollution Data from New York in R

Introduction: This is a follow-up post to my recent introduction of histograms. Previously, I presented the conceptual foundations of histograms and used a histogram to approximate the distribution of the “Ozone” data from the built-in data set “airquality” in R. Today, I will examine this distribution in more detail by overlaying the histogram with parametric […]

## Postdocs in probabilistic modeling! With David Blei! And Stan!

July 29, 2013

David Blei writes: I have two postdoc openings for basic research in probabilistic modeling. The thrusts are (a) scalable inference and (b) model checking. We will be developing new methods and implementing them in probabilistic programming systems. I am open to applicants interested in many kinds of applications and from any field. “Scalable inference” means […] The post Postdocs in probabilistic modeling! With David Blei! And Stan! appeared first on Statistical…

## Upcoming talk in Chicago

July 29, 2013

I'm busy preparing for my talk tomorrow in Chicago. The topic is Big Data and Marketing, a topic that is central to Numbersense. The event is free, and you can register here. It's hosted by BIGfrontier's Steve Lundin.

## Read data sets that are specified by an array of names

July 29, 2013

One of my favorite features of SAS/IML 12.1 (released with 9.3m2) is that the USE and CLOSE statements support reading data set names that are specified in a SAS/IML matrix. The IMLPlus language in SAS/IML Studio has supported this syntax since the early 2000s, so I am pleased that this [...]