## How to color clusters in a dendrogram

June 26, 2013

The CLUSTER procedure in SAS/STAT software creates a dendrogram automatically. The black-and-white dendrogram is nice, but plain. A SAS customer wanted to know whether it is possible to add color to the dendrogram to emphasize certain clusters. For example, the plot at the left emphasizes a four-cluster scenario for clustering [...]

## Future ISFs

June 26, 2013

The next few locations for the International Symposium on Forecasting have been announced: 2014: Rotterdam, The Netherlands; 2015: Riverside, California, USA; 2016: Santander, Spain; 2017: Cairns, Australia. The ISF is easily the best forecasting confere...

## My talk at Boston Python

June 26, 2013

I just gave a talk at Boston Python about natural language processing in general, and edX ease and discern in particular. You can find the presentation source here, and the web version of it here. There is a video of it here. Nelle Varoquaux and Micha...

## Natural Language Processing Tutorial

June 26, 2013

Introduction: This will serve as an introduction to natural language processing. I adapted it from slides for a recent talk at Boston Python. We will go from tokenization to feature extraction to creating a model using a machine learning algorithm. The goal is to provide a reasonable baseline on top of which more complex natural language processing can be done, and provide a good introduction to the material. The examples…

## My Talk at Boston Python

June 25, 2013

I just gave a talk at Boston Python about natural language processing in general, and edX ease and discern in particular. You can find the presentation source here, and the web version of it here. There is a video of it here. Nelle Varoquaux and Michael Selik also had interesting talks in the same video above; I recommend checking them out.

## Hot Shot Charts: Data-Based Insights of Past NBA Basketball Games

June 25, 2013

Hot Shot Charts [hotshotcharts.com], developed by a small team of data scientists, analysts and visualization researchers at the consulting firm Accenture, provides a wide range of data-based insights of the NBA basketball competition matches that were pl...

## Stadtbilder: Mapping the Digital Hotspots of a City

June 25, 2013

Stadtbilder [stadt-bilder.com], designed by Moritz Stefaner, provides an artistic overview of the typical digital "hotspots" in a city, such as its local restaurants, hotels or clubs. Based on data retrieved from different social media providers suc...

## Exploratory Data Analysis: 2 Ways of Plotting Empirical Cumulative Distribution Functions in R

Introduction: Continuing my recent series on exploratory data analysis (EDA), and following up on the last post on the conceptual foundations of empirical cumulative distribution functions (CDFs), this post shows how to plot them in R. (Previous posts in this series on EDA include descriptive statistics, box plots, kernel density estimation, and violin plots.) I […]

## Three Ways to Run Bayesian Models in R

June 25, 2013

There are different ways of specifying and running Bayesian models from within R. Here I will compare three different methods: two that rely on an external program and one that relies only on R. I won’t go into much detail about the differences i...

## Is there too much coauthorship in economics (and science more generally)? Or too little?

June 25, 2013

Economist Stan Liebowitz has a longstanding interest in the difficulties of flagging published research errors. Recently he wrote on the related topic of dishonest authorship: While not about direct research fraud, I thought you might be interested in this paper. It discusses the manner in which credit is given for economics articles, and I suspect [...]

## Doing Statistical Research

June 25, 2013

There's a wonderful article over at the STATtr@k web site by Terry Speed on How to Do Statistical Research. There is a lot of good advice there, but the column is most notable because it's pretty much the exact opposite …

## A short statistics course

June 25, 2013

For years, I have wanted to see a statistics course that is not a math class. So I made one myself. The title of the course is "How to do statistics without really doing statistics?". It's on a new online learning platform called Three Nights and Done. There are three hours' worth of materials, divided into three or four chunks for each hour. Here is the link. I'd love to hear…

## Predicting spatial locations using point processes

June 25, 2013

I’ve uploaded a draft tutorial on some aspects of prediction using point processes. I wrote it using R-Markdown, so there are bits of R code for readers to play with. It’s hosted on Rpubs, which turns out to be a great deal more conveni...

## Exploratory Data Analysis: Conceptual Foundations of Empirical Cumulative Distribution Functions

Introduction: Continuing my recent series on exploratory data analysis (EDA), this post focuses on the conceptual foundations of empirical cumulative distribution functions (CDFs); in a separate post, I will show how to plot them in R. (Previous posts in this series include descriptive statistics, box plots, kernel density estimation, and violin plots.) To give you […]
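
As a rough illustration of the concept named in this excerpt: the empirical CDF of a sample evaluated at x is the fraction of observations less than or equal to x. The sketch below is in Python rather than the R used in the series, and the helper name `make_ecdf` and the sample values are made up for illustration; it is not taken from the linked post.

```python
import bisect


def make_ecdf(sample):
    """Return a function F_n where F_n(x) is the fraction of observations <= x."""
    if len(sample) == 0:
        raise ValueError("the sample must contain at least one observation")
    ordered = sorted(sample)  # sort once so each evaluation is a binary search
    n = len(ordered)

    def ecdf(x):
        # bisect_right returns how many of the sorted observations are <= x.
        return bisect.bisect_right(ordered, x) / n

    return ecdf


values = [3.0, 1.0, 2.0, 2.0]  # made-up data, for illustration only
F = make_ecdf(values)
print(F(0.5), F(2.0), F(3.0))  # 0.0 0.75 1.0
```

The resulting function is a step function that jumps by 1/n at each observation (or by k/n where k observations are tied).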

## Talking data: Building interactive relationships with data and colleagues

June 25, 2013

Last week I had the honour to give the opening keynote talk at the Talking Data South West conference, organised by the Exeter Initiative for Statistics and its Applications. The event was chaired by Steve Brooks and brought together over 100 people to...

## Opel Corsa Diesel Usage

June 24, 2013

I wanted to extend my car weight distribution calculation of June 16 from the year 2000 alone to the years 2000 to 2013. Unfortunately, come Sunday afternoon the code still seemed too slow, and I did not have even the beginning of a post. So, I went on to another calculation I w...

## Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

June 24, 2013

I’m reposting this classic from 2011 . . . Peter Bergman pointed me to this discussion from Cyrus of a presentation by Guido Imbens on design of randomized experiments. Cyrus writes: The standard analysis that Imbens proposes includes (1) a Fisher-type permutation test of the sharp null hypothesis–what Imbens referred to as “testing”–along with a [...]

## Does fraud depend on my philosophy?

June 24, 2013

Ever since my last post on replication and fraud I've been doing some more thinking about why people consider some things "scientific fraud". (First of all, let me just say that I was a bit surprised by the discussion in …

## Bayesian quality control?

June 24, 2013

Gabriel Murray writes: I saw this post and response from about 5 years ago, regarding a fellow analyzing levels of white blood cells. He was asking about Bayesian approaches to quality control and couldn’t find a canonical resource on that topic. Five years on and I similarly don’t see many good resources on the topic, [...] The post Bayesian quality control? appeared first on Statistical Modeling, Causal Inference, and Social Science.

June 24, 2013

Predictive Analytics by Eric Siegel (link) was published earlier this year. Siegel is a consultant and organizer of a series of popular industry conferences, which I attend with some regularity. I recommend this book for readers who want to understand the current state of “data science” at a deeper level than the New York Times’s but still nonmathematical. If you want to measure against my own writing, then Siegel spends…

## Count the number of unique rows in a matrix

June 24, 2013

How do you count the number of unique rows in a matrix? The simplest algorithm is to sort the data and then iterate down the rows, comparing each row with the previous row. However, this algorithm has two shortcomings: it physically sorts the data (which means that the original locations [...]
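
The simple sort-and-compare algorithm described in this excerpt can be sketched as follows. This is a minimal illustration in Python, not the code from the linked post (the excerpt does not show any); the function name `count_unique_rows` and the example matrix are hypothetical. Note that it sorts a copy of the rows, so it still pays the cost of a sort and does not report where each unique row originally sat.

```python
def count_unique_rows(matrix):
    """Count distinct rows by sorting a copy and comparing each row to the previous one."""
    if not matrix:
        return 0
    # Convert rows to tuples and sort a copy, leaving the caller's matrix untouched.
    ordered = sorted(tuple(row) for row in matrix)
    unique = 1  # the first sorted row is always new
    for previous, current in zip(ordered, ordered[1:]):
        # After sorting, identical rows are adjacent, so a change marks a new row.
        if current != previous:
            unique += 1
    return unique


example = [[1, 2], [3, 4], [1, 2], [5, 6], [3, 4]]  # made-up data
print(count_unique_rows(example))  # 3
```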

## Sunday data/statistics link roundup (6/23/13)

June 24, 2013

An interesting study describing how significance testing may be beneficial, and a scenario where the file drawer effect may even be beneficial. Granted this is all simulation so you have to take it with a …