## Oh no, the Leekasso….

March 12, 2014

An astute reader (Niels Hansen, who is visiting our department today) caught a bug in my code on GitHub for the Leekasso. I had:

```r
lm1 = lm(y ~ leekX)
predict.lm(lm1,as.data.frame(leekX2))
```

Unfortunately, this meant that I was getting predictions for the …

## Reality check on the long tail

March 12, 2014

Some time ago, there was a lot of hype about how new tech would demolish the superstar effect in entertainment sales because all the little titles in the long tail would be exposed to consumers. I recall Amazon being labeled the shining example of a company that made profits off the long tail (as opposed to the boring top of the distribution). I still remember this graphic from Wired:…

## Unit root tests and ARIMA models

March 12, 2014

An email I received today:

> I have a small problem. I have a time series called x:
>
> - If I use the default values of `auto.arima(x)`, the best model is an ARIMA(1,0,0).
> - However, I tried the functions `ndiffs(x, test="adf")` and `ndiffs(x, test="kpss")`, as the KPSS test seems to be the default, and the number of differences is 0 for the KPSS test (consistent with the results of…
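For context, in standard textbook notation (the symbols below are generic and not taken from the email): an ARIMA(1,0,0) model is an AR(1) process, and the ADF and KPSS tests that `ndiffs` can use place the unit root on opposite sides of the hypothesis test.

$$
x_t = c + \phi\,x_{t-1} + \varepsilon_t,
\qquad
|\phi| < 1:\ \text{stationary, no differencing needed},
\qquad
\phi = 1:\ \text{unit root}.
$$

$$
\text{ADF: } H_0:\ \text{unit root } (\phi = 1)
\qquad\qquad
\text{KPSS: } H_0:\ \text{the series is (level) stationary}.
$$

So with `test="adf"` a difference is suggested when the unit-root null is *not* rejected, whereas with `test="kpss"` a difference is suggested only when the stationarity null *is* rejected. Because the nulls are reversed, the two tests can disagree on the same series.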

## Optimizing a function of an integral

March 12, 2014

Last week I showed how to find parameters that maximize the integral of a certain probability density function (PDF). Because the function was a PDF, I could evaluate the integral by calling the CDF function in SAS. (Recall that the cumulative distribution function (CDF) is the integral of a PDF.) [...]
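The shortcut rests on the fundamental theorem of calculus. As a minimal sketch, assume a density $f(x;\theta)$ with CDF $F(x;\theta)$ and an interval $[a, b]$ (the symbols $\theta$, $a$ and $b$ are placeholders, not taken from the post); the integral to be maximized then needs no numerical quadrature:

$$
\int_a^b f(x;\theta)\,dx = F(b;\theta) - F(a;\theta),
\qquad
\hat{\theta} = \arg\max_{\theta}\,\bigl[F(b;\theta) - F(a;\theta)\bigr].
$$

When the integrand is not a PDF with a known CDF, the integral has to be evaluated numerically inside the objective function instead.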

## Heuristics in Analytics

March 12, 2014

Last week, a book -- a real, hard-cover, paper-paged book -- arrived in the mail with the title *Heuristics in Analytics: A Practical Perspective of What Influences Our Analytic World*. The book wasn't a total surprise, because I had re...

## where did the normalising constants go?! [part 2]

March 11, 2014

Coming (swiftly and smoothly) back home after this wonderful and intense week in Banff, I hugged my loved ones, quickly unpacked, ran a washing machine, and then sat down to check where and how my reasoning was wrong. To start with, I experimented with a toy example in R: and (of course!) it produced the […]

## The myth of the myth of the myth of the hot hand

March 11, 2014

Phil pointed me to this paper, so I thought I'd better repeat what I wrote a couple of years ago:

1. The effects are certainly not zero. We are not machines, and anything that can affect our expectations (for example, our success in previous tries) should affect our performance.
2. The effects I’ve seen are […]

## Less wordy R

March 11, 2014

The Swarm Lab presents a nice comparison of R and Python code for a simple (read ‘one could do it in Excel’) problem. The example works, but I was surprised by how wordy the R code was and decided to check if one could easily produce a shorter version. The beginning is pretty much the […]

## HereHere: Mapping the Concerns of NY Citizens as an Iconographic Map

March 11, 2014

Here Here [herehere.co], developed by Future Social Experiences (FuSE) Labs at Microsoft Research, expresses neighborhood-specific public data by mapping it as text labels and cartoon-like iconography. The data is based on New York City's 311 non-e...

## Sorting: Understanding How Famous Sorting Algorithms Work

March 11, 2014

There are quite a few visualizations of sorting algorithms out there, such as at sorting-algorithms.com and sortvis.org. "Sorting" [sorting.at], developed by Nokia data visualization designer Carlo Zapponi, brings some innovation to this field by tack...

## SAS, SPSS, Stata Users: Learn R from Home April 21

March 11, 2014

Has learning R been driving you a bit crazy? If so, it may be that you’re “lost in translation.” On April 21 and 23, I’ll be teaching a webinar, R for SAS, SPSS and Stata Users. With each R concept, …

## What if I were to stop publishing in journals?

March 11, 2014

In our recent discussion of modes of publication, Joseph Wilson wrote, “The single best reform science can make right now is to decouple publication from career advancement, thereby reducing the number of publications by an order of magnitude and then move to an entirely disjointed, informal, online free-for-all communication system for research results.” My first […]

## googleVis code development moved to GitHub

March 11, 2014

After nearly four years of developing googleVis on Google Code with SVN, we decided to move to GitHub. The main reason was that Google discontinued the facility for hosting pre-CRAN builds of the package for user testing. The devtools package, on the other hand, m...

## Machine Learning Lesson of the Day – Introduction to Linear Basis Function Models


Given a supervised learning problem of using inputs ($x_1, x_2, \ldots, x_p$) to predict a continuous target $Y$, the simplest model to use would be linear regression. However, what if we know that the relationship between the inputs and the target is non-linear, but we are unsure of exactly what form this relationship has? One way to overcome […]
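As a minimal sketch in standard textbook notation (the symbols $\beta_j$, $\phi_j$ and $M$ are introduced here, not taken from the post), write the inputs as $x_1, \ldots, x_p$ and the target as $Y$. Ordinary linear regression and a linear basis function model then differ only in what multiplies the coefficients:

$$
Y = \beta_0 + \sum_{j=1}^{p} \beta_j x_j + \varepsilon
\qquad\text{vs.}\qquad
Y = \beta_0 + \sum_{j=1}^{M} \beta_j \phi_j(x_1, \ldots, x_p) + \varepsilon .
$$

Each basis function $\phi_j$ is fixed in advance (for a single input, the polynomial choice $\phi_j(x_1) = x_1^{j}$ is one example), so the model stays linear in the coefficients. With $n$ observations, input vectors $\mathbf{x}_i$ and observed targets $\mathbf{y}$, build the $n \times (M+1)$ design matrix $\Phi$ with $\Phi_{ij} = \phi_j(\mathbf{x}_i)$ and $\phi_0 \equiv 1$; whenever $\Phi^\top \Phi$ is invertible, the least-squares estimate has the usual closed form:

$$
\hat{\boldsymbol{\beta}} = (\Phi^\top \Phi)^{-1} \Phi^\top \mathbf{y}.
$$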

March 11, 2014

I want to follow up on the Intraday data post with an example of how you can capture Intraday data without too much effort by recording 1-minute snapshots of the market. I will take market snapshots from Yahoo Finance using the following function, which downloads delayed market quotes with date and time stamps: Next we can […]

## where did the normalising constants go?! [part 1]

March 10, 2014

When listening this week to several talks in Banff handling large datasets or complex likelihoods by parallelisation, splitting the posterior as a product of terms $m_i$ and handling each term of this product on a separate processor or thread as proportional to a probability density, then producing simulations from the $m_i$'s and attempting to derive simulations from the original product, […]

## VB News – Statwing picks up funding from data science luminary Hammerbacher

March 10, 2014

From: http://venturebeat.com/2014/01/30/statwing-picks-up-funding-from-data-science-luminary-hammerbacher/

[Image: A correlation as shown in Statwing's software. Credit: Statwing]

January 30, 2014 3:01 PM, Jordan Novet

Big data projects are tr...

## Stan Model of the Week: PK Calculation of IV and Oral Dosing

March 10, 2014

[Update: Revised given comments from Wingfeet, Andrew and germo. Thanks! I'd mistakenly translated the dlnorm priors in the first version --- amazing what a difference the priors make. I also escaped the less-than and greater-than signs in the constraints in the model so they're visible. I also updated to match the thin=2 output of JAGS.] […]

## Preregistration: what’s in it for you?

March 10, 2014

Chris Chambers pointed me to a blog by someone called Neuroskeptic who suggested that I preregister my political science studies: So when Andrew Gelman (let’s say) is going to start using a new approach, he goes on Twitter, or on his blog, and posts a bare-bones summary of what he’s going to do. Then he […]

## On deck this week: Things people sent me

March 10, 2014

- Mon: Preregistration: what’s in it for you?
- Tues: What if I were to stop publishing in journals?
- Wed: Empirical implications of Empirical Implications of Theoretical Models
- Thurs: An Economist’s Guide to Visualizing Data
- Fri: The maxim...

## Spatial perception: on the chart and in real life

March 10, 2014

A Twitter follower, @mdjoner, felt that something was amiss with the squares in this chart comparing real estate prices in major cities around the world. I'm not sure where the chart originally came from, but there is a CNBC icon....

## Man at work(-ish)

March 10, 2014

Perhaps one could argue that the obvious, manly activity to do at the weekend when you're home alone is to put away and organise stuff in the garage. Well, I was home alone last weekend and my very own version of this was to arxiv the first p...

## How to get started with SAS: Free videos for beginners

March 10, 2014

On most Mondays I blog about a function, programming technique, or resource that is useful for programmers who are getting started with SAS software. Recently I learned that my colleagues in the SAS education division have been hard at work developing a series of short videos that explain basic tasks [...]