## The top 10 predictor takes on the debiased Lasso – still the champ!

January 8, 2014

After reposting on the comparison between the lasso and the always top 10 predictor (leekasso) I got some feedback that the problem could be I wasn't debiasing the Lasso (thanks Tim T. on Twitter!). The idea behind debiasing (as I …

## Elements of Statistical Learning: A Stunningly Good Job of LaTeX to pdf to Web

January 8, 2014

A very Happy New Year to all! Here's a little thing to start us off. I happened to be thinking about principal-component regression vs. ridge regression yesterday, so as usual I consulted the Hastie-Tibshirani-Friedman (HTF) classic, El...

## How to display multinomial logit results graphically?

January 8, 2014

Adriana Lins de Albuquerque writes: Do you have any suggestions for the best way to represent multinomial logit results graphically? I am using Stata. My reply: I don’t know from Stata, but here are my suggestions: 1. If the categories are unordered, break them up into a series of binary choices in a tree structure […] The post How to display multinomial logit results graphically? appeared first on Statistical Modeling, Causal…

## Losing the big picture

January 8, 2014

One of the dangers of "Big Data" is the temptation to get lost in the details. You become so absorbed in the peeling of the onion that you don't realize your tear glands have dried up. Hans Rosling linked to...

## Applied Statistics Lesson of the Day – Choosing the Number of Levels for Factors in Experimental Design

The experimenter needs to decide the number of levels for each factor in an experiment. For a qualitative (categorical) factor, the number of levels may simply be the number of categories for that factor.  However, because of cost constraints, an experimenter may choose to drop a certain category.  Based on the experimenter’s prior knowledge or […]

## Machine Learning Lesson of the Day – Using Validation to Assess Predictive Accuracy in Supervised Learning

Supervised learning puts a lot of emphasis on building a model that has high predictive accuracy.  Validation is a good method for assessing a model’s predictive accuracy. Validation is the use of one part of your data set to build your model and another part of your data set to assess the model’s predictive accuracy. […]
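
The hold-out idea described above can be sketched in a few lines of Python. This is a minimal illustration and not code from the original post: the toy data, the 25% validation fraction, and the helper names (`train_test_split`, `fit_line`, `mean_squared_error`) are all assumptions made for the example, and a straight-line least-squares fit stands in for whatever model you actually use.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and split them into a training part and a validation part."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(rows):
    """Ordinary least-squares fit of y = a + b * x on (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def mean_squared_error(model, rows):
    """Average squared prediction error of the fitted line on the given rows."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in rows) / len(rows)


# Hypothetical toy data: a noisy straight line (assumption for illustration only).
rng = random.Random(42)
data = [(x, 2.0 + 3.0 * x + rng.gauss(0, 1)) for x in range(40)]

# Build the model on one part of the data, assess it on the other part.
train, validation = train_test_split(data)
model = fit_line(train)
print("training MSE:  ", mean_squared_error(model, train))
print("validation MSE:", mean_squared_error(model, validation))
```

The validation error is the number to report as an estimate of predictive accuracy, because the model never saw those rows while it was being fitted.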

## Connecting TOAD For MySQL, MySQL Workbench, and R to Amazon AWS EC2 Using SSH Tunneling

January 8, 2014

I often use Amazon EC2 to store and retrieve data when I need either additional storage or higher computing capacity.  In this tutorial I’ll share how to connect to a MySQL database so that one can retrieve the data and do the analysis.  I tend to use either TOAD for MySQL or MySQL Workbench to run […]

## “Philosophy of Statistical Inference and Modeling” New Course: Spring 2014: Mayo and Spanos: (Virginia Tech)

January 8, 2014

New course for Spring 2014: Thursday 3:30-6:15. Phil 6334: Philosophy of Statistical Inference and Modeling. D. Mayo and A. Spanos. Contact: error@vt.edu. This new course, to be jointly taught by Professors D. Mayo (Philosophy) and A. Spanos (Economics), will provide an in-depth introduction to graduate-level research in philosophy of inductive-statistical inference and probabilistic […]

## Significant news

January 7, 2014

Good news on the second day back to work after the Christmas break: I've been invited to join the Editorial Board of Significance magazine, and of course I have happily accepted the invitation! I have always been a big fan of the magazine (in fact I...

## 13 popular articles from 2013

January 7, 2014

In 2013 I published 110 blog posts. Some of these articles were more popular than others, often because they were linked to from a SAS newsletter such as the SAS Statistics and Operations Research News. In no particular order, here are some of my most popular posts from 2013, organized [...]

## Preparing for tenure track job interviews

January 7, 2014

Editor's note: This is a slightly modified version of a previous post. If you are on the job market you will soon be receiving (or have already received) an invitation for an interview. So how should you prepare?  You have two goals. The …

## My recent debugging experience

January 7, 2014

OK, so this sort of thing happens sometimes. I was working on a new idea (still working on it; if it ultimately works out, or if it doesn’t, I’ll let you know) and as part of it I was fitting little models in Stan, in a loop. I thought it would make sense to start with linear […] The post My recent debugging experience appeared first on Statistical Modeling, Causal Inference, and Social…

## Text Mining: The Next Data Frontier – Scientific Computing

January 7, 2014

From: http://www.scientificcomputing.com/articles/2014/01/text-mining-next-data-frontier#.UswIHNLuLTo (Mon, 01/06/2014 - 2:04pm, Mark Anawis). By some estimates, 80 percent of available information occurs as free-form text. Text Mining: The Next Data Front...

## MCMSki IV [day 1.5]

January 7, 2014

The afternoon sessions I attended were “Computational and Methodological Challenges in evidence synthesis and multi-step” organised by Nicky Best and Sylvia Richardson and “Approximate inference” put together by Dan Simpson. Since both Nicky and Sylvia were alas unable to attend MCMSki, I chaired their session, which I found most interesting as connected to a recurrent […]

## From spreadsheet thinking to R thinking

January 7, 2014

Towards the basic R mindset. Previously: the post “A first step towards R from spreadsheets” provides an introduction to switching from spreadsheets to R.  It also includes a list of additional posts (like this one) on the transition. Add two columns: Figure 1 shows some numbers in two columns and the start of adding those […] The post From spreadsheet thinking to R thinking appeared first on Burns Statistics.

## Whale charts – Visualising customer profitability

January 7, 2014

The Christmas and New Year's break is over, yet there is still time to return unwanted presents. Return to Santa was the title of an article in the Economist that highlighted the impact on online retailers, as return rates can be alarmingly high. ...

## Machine Learning Lesson of the Day: Clustering, Density Estimation and Dimensionality Reduction

I struggle to categorize unsupervised learning.  It is not an easily defined field, and it is also hard to find generalizations of techniques that are exhaustive and mutually exclusive. Nonetheless, here are some categories of unsupervised learning that cover many of its commonly used techniques.  I learned this categorization from Mathematical Monk, who posted a […]

## Applied Statistics Lesson of the Day: Sample Size and Replication in Experimental Design

The goal of an experiment is to determine whether or not there is a cause-and-effect relationship between the factor and the response, and the strength of the causal relationship, should such a relationship exist. To answer these questions, the response variable is measured in both the control group and the experimental group.  If there is a […]

## Reinforcement Learning in R: Markov Decision Process (MDP) and Value Iteration

January 7, 2014

How can we find the best long-term plan? In the last post, we looked at the idea of dynamic programming,...

## An Introduction to Statistical Learning with Applications in R

January 7, 2014

Statistical learning theory offers an opportunity for those of us trained as social science methodologists to look at everything we have learned from a different perspective. For example, missing value imputation can be seen as matrix completion and re...

## You Are What You Write

January 7, 2014

To my wonderful students: These paragraphs are a revision of advice recently given to a student writer.  Writing is a craft we all must master. And we all will. You are young: enthusiasm and energy come through in your writing: keep that and add to it...

## Spam names

January 6, 2014

There was this thing going around a while ago, the “porn star name,” which you create by taking the name of your childhood pet, followed by the name of the street where you grew up (for example, Blitz Clifton). But recently I’ve been thinking about spam names. Just in the last two days, I’ve received emails […] The post Spam names appeared first on Statistical Modeling, Causal Inference, and Social Science.