Whale charts – Visualising customer profitability

January 7, 2014

The Christmas and New Year's break is over, yet there is still time to return unwanted presents. “Return to Santa” was the title of an article in The Economist that highlighted the impact on online retailers, as return rates can be alarmingly high. ...

Machine Learning Lesson of the Day: Clustering, Density Estimation and Dimensionality Reduction

I struggle to categorize unsupervised learning.  It is not an easily defined field, and it is also hard to find generalizations of techniques that are exhaustive and mutually exclusive. Nonetheless, here are some categories of unsupervised learning that cover many of its commonly used techniques.  I learned this categorization from Mathematical Monk, who posted a […]

Applied Statistics Lesson of the Day: Sample Size and Replication in Experimental Design

The goal of an experiment is to determine (1) whether or not there is a cause-and-effect relationship between the factor and the response and (2) the strength of the causal relationship, should such a relationship exist. To answer these questions, the response variable is measured in both the control group and the experimental group.  If there is a […]

Reinforcement Learning in R: Markov Decision Process (MDP) and Value Iteration

January 7, 2014

How can we find the best long-term plan? In the last post, we looked at the idea of dynamic programming,...

An Introduction to Statistical Learning with Applications in R

January 7, 2014

Statistical learning theory offers an opportunity for those of us trained as social science methodologists to look at everything we have learned from a different perspective. For example, missing value imputation can be seen as matrix completion and re...

You Are What You Write

January 7, 2014

To my wonderful students: These paragraphs are a revision of advice recently given to a student writer.  Writing is a craft we all must master. And we all will. You are young: enthusiasm and energy come through in your writing: keep that and add to it...

Spam names

January 6, 2014

There was this thing going around a while ago, the “porn star name,” which you create by taking the name of your childhood pet, followed by the name of the street where you grew up (for example, Blitz Clifton). But recently I’ve been thinking about spam names. Just in the last two days, I’ve received emails […] The post Spam names appeared first on Statistical Modeling, Causal Inference, and Social Science.

Visualizing movements of people

January 6, 2014

Long-time reader Daniel L. sends in this chart illustrating a large data set of intra-state migration flows in the U.S. The original chart is at Vizynary by way of Daily Kos. There is no denying that this chart is...

S&P that might have been

January 6, 2014

The S&P 500 returned 29.6% in 2013.  How might that have varied? S&P weights: There are many features that could vary; here we will keep the same constituents (almost) and weights with similar sizes but that are randomly assigned rather than based on market capitalization. That is, we want the large weights of our …

MCMSki IV, Jan. 6-8, 2014, Chamonix (news #18)

January 6, 2014

MCMSki IV is about to start! While further participants may still register (registration is still open!), we are currently 223 registered participants, without accompanying people. I do hope most of these managed to reach the town of Chamonix-Mont-Blanc despite the foul weather on the East Coast. Unfortunately, three speakers (so far) cannot make it: Yugo […]

WTFViz, ThumbsUpViz, and HelpMeViz

January 6, 2014

I have complained, repeatedly, about the lack of good online resources for visualization; in particular, when it comes to discussion and critical reflection. Also, where can you go to get help with a visualization project? A few recent websites are tackling these issues in different ways. First, Drew Skau started WTFViz, which quickly became hugely […]

R as a second language

January 6, 2014

Imagine that you are studying English as a second language; you learn the basic rules, some vocabulary and start writing sentences. After a little while, it is very likely that you’ll write grammatically correct sentences that no native speaker would use. You’d be following the formalisms but ignoring culture, idioms, slang and patterns of effective […]

Applied Statistics Lesson of the Day – Basic Terminology in Experimental Design #2: Controlling for Confounders

A well designed experiment must have good control, which is the reduction of effects from confounding variables.  There are several ways to do so: Include a control group.  This group will receive a neutral treatment or a standard treatment.  (This treatment may simply be nothing.)  The experimental group will receive the new treatment or treatment of […]

Machine Learning Lesson of the Day – Classification and Regression

Supervised learning has 2 categories: In classification, the target variable is categorical. In regression, the target variable is continuous. Thus, regression in statistics is different from regression in supervised learning. In statistics, regression is used to model relationships between predictors and targets, and the targets could be continuous or categorical. A regression model usually includes 2 components to […]

Statistics – Singular and Plural, Lies and Truth

January 5, 2014

Language is an issue in teaching and learning statistics. There are many words whose meanings in statistics differ from their everyday meaning, and some even have multiple meanings within the study of statistics. Examples of troublesome words are: error, correlation, …

Sunday data/statistics link roundup (1/5/14)

January 5, 2014

If you haven't seen lolmythesis, it is pretty incredible: 1-2 line descriptions of thesis projects. I think every student should be required to make one of these up before they defend. The best I could come up with for mine …

Your 2014 wishing well….

January 4, 2014

A reader asks how I would complete the following sentence: I wish that new articles* written in 2014 would refrain from_______.   Here are my quick answers, in no special order: (a) rehearsing the howlers of significance tests and other frequentist statistical methods; (b) misinterpreting p-values, ignoring discrepancy assessments (and thus committing fallacies of rejection […]

Machine Learning Lesson of the Day – Supervised and Unsupervised Learning

The 2 most commonly used and studied categories of machine learning are supervised learning and unsupervised learning. In supervised learning, there is a target variable and a set of predictor variables.  The goal is to use the predictors to predict the target.  Supervised learning is synonymous with predictive modelling, but the latter term does not connote […]

Repost: Prediction: the Lasso vs. just using the top 10 predictors

January 4, 2014

Editor's note: This is a previously published post of mine from a couple of years ago (!). I always thought about turning it into a paper. The interesting idea (I think) is how the causal model matters for whether the …

Applied Statistics Lesson of the Day – Basic Terminology in Experimental Design #1

Experiment: A procedure to determine the causal relationship between 2 variables – an explanatory variable and a response variable.  The value of the explanatory variable is changed, and the value of the response variable is observed for each value of the explanatory variable. An experiment can have 2 or more explanatory variables and 2 or […]

“Dogs are sensitive to small variations of the Earth’s magnetic field”

January 4, 2014

Two different people pointed me to this article by Vlastimil Hart et al. in the journal Frontiers in Zoology: It is for the first time that (a) magnetic sensitivity was proved in dogs, (b) a measurable, predictable behavioral reaction upon natural MF fluctuations could be unambiguously proven in a mammal, and (c) high sensitivity to […]

Multivariate Archimax copulas

January 4, 2014

Our paper, written jointly with Anne-Laure Fougères, Christian Genest and Johanna Nešlehová, entitled Multivariate Archimax Copulas, should appear some day in the Journal of Multivariate Analysis. “A multivariate extension of the bivariate class of Archimax copulas was recently proposed by Mesiar & Jagr (2013), who asked under which conditions it holds. This paper answers their question and provides a stochastic representation of multivariate Archimax copulas. A few basic properties of these copulas are…

