## Significant news

January 7, 2014
Good news on the second day back to work after the Christmas break: I've been invited to join the Editorial Board of Significance magazine, and of course I have happily accepted the invitation! I have always been a big fan of the magazine (in fact I...

## 13 popular articles from 2013

January 7, 2014
In 2013 I published 110 blog posts. Some of these articles were more popular than others, often because they were linked to from a SAS newsletter such as the SAS Statistics and Operations Research News. In no particular order, here are some of my most popular posts from 2013, organized [...]

## Preparing for tenure track job interviews

January 7, 2014
Editor's note: This is a slightly modified version of a previous post. If you are on the job market you will soon receive (or have already received) an invitation for an interview. So how should you prepare? You have two goals. The … Continue reading →

## My recent debugging experience

January 7, 2014
OK, so this sort of thing happens sometimes. I was working on a new idea (still working on it; if it ultimately works out, or if it doesn’t, I’ll let you know) and as part of it I was fitting little models in Stan, in a loop. I thought it would make sense to start with linear […] The post My recent debugging experience appeared first on Statistical Modeling, Causal Inference, and Social…

## Text Mining: The Next Data Frontier – Scientific Computing

January 7, 2014
From: http://www.scientificcomputing.com/articles/2014/01/text-mining-next-data-frontier#.UswIHNLuLTo (Mon, 01/06/2014 - 2:04pm, Mark Anawis). By some estimates, 80 percent of available information occurs as free-form text. Text Mining: The Next Data Front...

## MCMSki IV [day 1.5]

January 7, 2014
The afternoon sessions I attended were “Computational and Methodological Challenges in evidence synthesis and multi-step” organised by Nicky Best and Sylvia Richardson and “Approximate inference” put together by Dan Simpson. Since both Nicky and Sylvia were alas unable to attend MCMSki, I chaired their session, which I found most interesting as connected to a recurrent […]

## From spreadsheet thinking to R thinking

January 7, 2014
Towards the basic R mindset. Previously: the post “A first step towards R from spreadsheets” provides an introduction to switching from spreadsheets to R. It also includes a list of additional posts (like this one) on the transition. Add two columns: Figure 1 shows some numbers in two columns and the start of adding those […] The post From spreadsheet thinking to R thinking appeared first on Burns Statistics.

## Whale charts – Visualising customer profitability

January 7, 2014
The Christmas and New Year's break is over, yet there is still time to return unwanted presents. “Return to Santa” was the title of an article in The Economist that highlighted the impact on online retailers, as return rates can be alarmingly high. ...

## Machine Learning Lesson of the Day: Clustering, Density Estimation and Dimensionality Reduction

I struggle to categorize unsupervised learning.  It is not an easily defined field, and it is also hard to find generalizations of techniques that are exhaustive and mutually exclusive. Nonetheless, here are some categories of unsupervised learning that cover many of its commonly used techniques.  I learned this categorization from Mathematical Monk, who posted a […]

## Applied Statistics Lesson of the Day: Sample Size and Replication in Experimental Design

The goal of an experiment is to determine whether or not there is a cause-and-effect relationship between the factor and the response, and the strength of the causal relationship, should such a relationship exist. To answer these questions, the response variable is measured in both the control group and the experimental group. If there is a […]

## Reinforcement Learning in R: Markov Decision Process (MDP) and Value Iteration

January 7, 2014
How can we find the best long-term plan? In the last post, we looked at the idea of dynamic programming,...
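
As a rough illustration of the value iteration named in the title, here is a minimal sketch in Python (not the post's own R code). The three-state MDP, its action names, transition probabilities, rewards, and the discount factor are all made up for illustration.

```python
# Hypothetical 3-state MDP; every number below is invented for illustration.
# TRANSITIONS[state][action] is a list of (probability, next_state, reward).
TRANSITIONS = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(0.8, 2, 1.0), (0.2, 1, 0.0)]},
    2: {"stay": [(1.0, 2, 0.0)], "go": [(1.0, 2, 0.0)]},  # absorbing state
}


def q_value(values, state, action, gamma):
    """Expected discounted return of taking `action` in `state`."""
    return sum(
        prob * (reward + gamma * values[next_state])
        for prob, next_state, reward in TRANSITIONS[state][action]
    )


def value_iteration(gamma=0.9, tol=1e-8, max_iter=10_000):
    """Apply the Bellman optimality update until the values stop changing."""
    values = {state: 0.0 for state in TRANSITIONS}
    for _ in range(max_iter):
        new_values = {
            state: max(q_value(values, state, a, gamma) for a in TRANSITIONS[state])
            for state in TRANSITIONS
        }
        delta = max(abs(new_values[s] - values[s]) for s in TRANSITIONS)
        values = new_values
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {
        state: max(TRANSITIONS[state], key=lambda a: q_value(values, state, a, gamma))
        for state in TRANSITIONS
    }
    return values, policy


values, policy = value_iteration()
```

In this toy problem the only reward comes from moving from state 1 to state 2, so the greedy policy chooses "go" in states 0 and 1; the discount factor controls how much that delayed reward is worth in state 0.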

## An Introduction to Statistical Learning with Applications in R

January 7, 2014
Statistical learning theory offers an opportunity for those of us trained as social science methodologists to look at everything we have learned from a different perspective. For example, missing value imputation can be seen as matrix completion and re...

## You Are What You Write

January 7, 2014
To my wonderful students: These paragraphs are a revision of advice recently given to a student writer.  Writing is a craft we all must master. And we all will. You are young: enthusiasm and energy come through in your writing: keep that and add to it...

## Spam names

January 6, 2014
There was this thing going around a while ago, the “porn star name,” which you create by taking the name of your childhood pet, followed by the name of the street where you grew up (for example, Blitz Clifton). But recently I’ve been thinking about spam names. Just in the last two days, I’ve received emails […] The post Spam names appeared first on Statistical Modeling, Causal Inference, and Social Science.

## Visualizing movements of people

January 6, 2014
Long-time reader Daniel L. sends in this chart illustrating a large data set of intra-state migration flows in the U.S. The original chart is at Vizynary by way of Daily Kos. There is no denying that this chart is...

## S&P that might have been

January 6, 2014
The S&P 500 returned 29.6% in 2013. How might that have varied? S&P weights: there are many features that could vary; here we will keep the same constituents (almost) and weights with similar sizes but that are randomly assigned rather than based on market capitalization. That is, we want the large weights of our … Continue reading →
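
One simple reading of "randomly assigned weights" is to keep the set of weights fixed and shuffle which constituent gets which weight. The sketch below does that in Python; it is not the post's method or data, and the four weights and returns are toy numbers invented for illustration.

```python
import random

# Toy inputs, invented for illustration (not S&P 500 data).
weights = [0.4, 0.3, 0.2, 0.1]        # portfolio weights, summing to 1
returns = [0.10, 0.25, 0.40, -0.05]   # one-year return of each constituent


def portfolio_return(weights, returns):
    """Weighted sum of constituent returns."""
    return sum(w * r for w, r in zip(weights, returns))


def shuffled_returns(weights, returns, n_sims=1000, seed=0):
    """Portfolio returns when the same weights are randomly reassigned."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    results = []
    for _ in range(n_sims):
        shuffled = list(weights)
        rng.shuffle(shuffled)  # same weight sizes, random assignment
        results.append(portfolio_return(shuffled, returns))
    return results


actual = portfolio_return(weights, returns)
sims = shuffled_returns(weights, returns)
```

Comparing `actual` with the spread of `sims` gives a sense of how much of the realised return depended on which constituents happened to carry the large weights.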

## MCMSki IV, Jan. 6-8, 2014, Chamonix (news #18)

January 6, 2014
MCMSki IV is about to start! While further participants may still register (registration is still open!), there are currently 223 registered participants, not counting accompanying people. I do hope most of these managed to reach the town of Chamonix-Mont-Blanc despite the foul weather on the East Coast. Unfortunately, three speakers (so far) cannot make it: Yugo […]

## WTFViz, ThumbsUpViz, and HelpMeViz

January 6, 2014
I have complained, repeatedly, about the lack of good online resources for visualization; in particular, when it comes to discussion and critical reflection. Also, where can you go to get help with a visualization project? A few recent websites are tackling these issues in different ways. First, Drew Skau started WTFViz, which quickly became hugely […]

## R as a second language

January 6, 2014
Imagine that you are studying English as a second language; you learn the basic rules, some vocabulary and start writing sentences. After a little while, it is very likely that you’ll write grammatically correct sentences that no native speaker would use. You’d be following the formalisms but ignoring culture, idioms, slang and patterns of effective […]

## Applied Statistics Lesson of the Day – Basic Terminology in Experimental Design #2: Controlling for Confounders

A well-designed experiment must have good control, which is the reduction of effects from confounding variables. There are several ways to do so: Include a control group. This group will receive a neutral treatment or a standard treatment. (This treatment may simply be nothing.) The experimental group will receive the new treatment or treatment of […]

## Machine Learning Lesson of the Day – Classification and Regression

Supervised learning has 2 categories: In classification, the target variable is categorical. In regression, the target variable is continuous. Thus, regression in statistics is different from regression in supervised learning. In statistics, regression is used to model relationships between predictors and targets, and the targets could be continuous or categorical. A regression model usually includes 2 components to […]
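
To make the distinction concrete, here is a minimal sketch in Python that uses the same nearest-neighbour idea for both tasks: a majority vote when the target is categorical, and an average when it is continuous. The k-nearest-neighbour method and the toy data are choices made for this illustration, not something taken from the post.

```python
from collections import Counter

# Toy training data, invented for illustration.
xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]                     # one predictor
labels = ["low", "low", "low", "high", "high", "high"]  # categorical target
ys = [1.1, 1.9, 3.2, 6.1, 6.8, 8.3]                     # continuous target


def k_nearest(train_x, x, k):
    """Indices of the k training points closest to x."""
    return sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]


def knn_classify(train_x, train_labels, x, k=3):
    """Classification: predict the most common label among the neighbours."""
    neighbours = k_nearest(train_x, x, k)
    return Counter(train_labels[i] for i in neighbours).most_common(1)[0][0]


def knn_regress(train_x, train_y, x, k=3):
    """Regression: predict the mean target value among the neighbours."""
    neighbours = k_nearest(train_x, x, k)
    return sum(train_y[i] for i in neighbours) / k


predicted_label = knn_classify(xs, labels, 2.5)  # a category
predicted_value = knn_regress(xs, ys, 7.0)       # a number
```

The two functions share the neighbour search and differ only in how they combine the neighbours' targets, which is exactly where the type of the target variable matters.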

## Statistics – Singular and Plural, Lies and Truth

January 5, 2014
Language is an issue in teaching and learning statistics. There are many words whose meanings in statistics differ from their everyday meanings, and some even have multiple meanings within the study of statistics. Examples of troublesome words are: error, correlation, … Continue reading →

