## Automatic time series forecasting in Granada

January 31, 2014
By

In two weeks I am presenting a workshop at the University of Granada (Spain) on Automatic Time Series Forecasting. Unlike most of my talks, this is not intended to be primarily about my own research. Rather it is to provide a state-of-the-art overview of the topic (at a level suitable for Master's students in Computer Science). I thought I’d provide some historical perspective on the development of automatic time series forecasting,…

## Python and R: Is Python really faster than R?

January 31, 2014
By

A friend of mine asked me to code the following in R: generate samples of size 10 from a Normal distribution with $\mu = 3$ and $\sigma^2 = 5$; compute $\bar{x}$ and $\bar{x}\mp z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ using the 95% confidence...
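A minimal R sketch of the two steps stated in the excerpt (the rest of the task is cut off, so nothing beyond a single 95% interval is attempted). It assumes $\alpha = 0.05$ and a known $\sigma$, which is what a $z_{\alpha/2}$ interval implies. Note that `rnorm()` takes the standard deviation, so $\sigma^2 = 5$ becomes `sd = sqrt(5)`. The seed is an arbitrary choice.

```r
# Parameters from the task: n = 10, mu = 3, sigma^2 = 5
n <- 10
mu <- 3
sigma2 <- 5
alpha <- 0.05                      # 95% confidence (assumed alpha)

set.seed(123)                      # arbitrary seed, for reproducibility only
x <- rnorm(n, mean = mu, sd = sqrt(sigma2))  # rnorm() wants the sd, not the variance

xbar <- mean(x)
z <- qnorm(1 - alpha / 2)          # z_{alpha/2}, about 1.96
half_width <- z * sqrt(sigma2) / sqrt(n)
ci <- c(lower = xbar - half_width, upper = xbar + half_width)

print(xbar)
print(ci)
```

Because $\sigma$ is treated as known, every interval has the same width (about 2.77); only its centre $\bar{x}$ moves from sample to sample.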

## LaTeX can be arsey, but boy is it good?!

January 30, 2014
By

I have been using LaTeX since I wrote my BSc thesis (that was way back in the last century, although I'm saying this just for dramatic effect: I'm not THAT old!) and have loved it ever since. Of course, I do use WYSIWYG typesetting software now and t...

## Input data interactively into R

January 30, 2014
By

To input data interactively into R, use the function readline:
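The code that followed is missing from the excerpt. A minimal sketch of the idea, assuming the user types numbers separated by spaces or commas: `readline()` always returns a single character string, so the reply has to be converted, here with `as.numeric()`. The helper names `parse_numbers` and `read_numbers` are hypothetical. `readline()` only waits for input in an interactive session; in non-interactive use (e.g. under `Rscript`) it returns `""` immediately, which is why the prompt is guarded by `interactive()`.

```r
# Hypothetical helper: turn "1, 2 3.5" into c(1, 2, 3.5)
parse_numbers <- function(txt) {
  txt <- trimws(txt)
  if (!nzchar(txt)) return(numeric(0))        # empty reply -> no numbers
  as.numeric(strsplit(txt, "[,[:space:]]+")[[1]])
}

# Hypothetical wrapper: prompt the user and parse the reply
read_numbers <- function(prompt = "Enter numbers: ") {
  parse_numbers(readline(prompt))  # readline() returns one character string
}

if (interactive()) {
  x <- read_numbers("Enter some numbers separated by spaces or commas: ")
  print(mean(x))
}
```

Keeping the parsing separate from the prompt makes the conversion easy to check without typing anything.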

## More SOTU Scaling

January 30, 2014
By

A couple of days ago the Monkey Cage featured Ben Lauderdale’s one-dimensional scaling model of US State of the Union addresses. In this post, I replicate the analysis with a closely related model, ask what the scaled dimension actually means, and consider what things would look like if we added another one. The technical details […]

## Inference for ARMA(p,q) Time Series

January 30, 2014
By
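As we mentioned in our previous post, as soon as we have a moving average part, inference becomes more complicated. Again, to illustrate, we do not need an overly general model. Consider, here, some $ARMA(1,1)$ process, $Z_t=\varphi Z_{t-1}+\varepsilon_t+\theta\varepsilon_{t-1}$, where $(\varepsilon_t)$ is some white noise, and assume further that $\varepsilon_t\sim\mathcal{N}(0,\sigma^2)$:

```r
theta=.7                 # moving average coefficient
phi=.5                   # autoregressive coefficient
n=1000
Z=rep(0,n)
set.seed(1)
e=rnorm(n)               # Gaussian white noise
for(t in 2:n) Z[t]=phi*Z[t-1]+e[t]+theta*e[t-1]
Z=Z[800:1000]            # keep the last 201 values; the rest is burn-in
plot(Z,type="l")
```

A two step procedure…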
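The excerpt stops before the two-step procedure is described, so the following is not that procedure. As a point of comparison, a hedged sketch using base R's `arima()`, which fits the same $ARMA(1,1)$ model with its default `method = "CSS-ML"` (conditional sum of squares, then Gaussian maximum likelihood). `include.mean = FALSE` is used because the simulated series has mean zero.

```r
# Regenerate the simulated ARMA(1,1) series from the excerpt
theta <- 0.7
phi <- 0.5
n <- 1000
Z <- rep(0, n)
set.seed(1)
e <- rnorm(n)
for (t in 2:n) Z[t] <- phi * Z[t - 1] + e[t] + theta * e[t - 1]
Z <- Z[800:1000]

# Fit an ARMA(1,1) with zero mean
fit <- arima(Z, order = c(1, 0, 1), include.mean = FALSE)
print(coef(fit))    # ar1 estimates phi, ma1 estimates theta
print(fit$sigma2)   # estimated innovation variance
```

With only 201 observations the estimates should land near, not exactly on, 0.5 and 0.7.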

## GNU Screen

January 30, 2014
By

This is one of those things I picked up years ago in graduate school that I just assumed everyone else already knew about. GNU Screen is a great utility, built into most Linux installations, for remote session management. Typing 'screen' at t...

## Visualizing uneven distributions

January 30, 2014
By

Jeff, a reader of the blog, asks for comment on this blog post of his (link). The highlight of the post is this chart, which shows an uneven distribution. The message of the chart is that a large amount of...

## History is too important to be left to the history professors, Part 2

January 30, 2014
By

Completely non-gay historian Niall Ferguson, a man who we can be sure would never be caught at a ballet or a poetry reading, informs us that the British decision to enter the first world war on the side of France and Belgium was “the biggest error in modern history.” Ummm, here are a few bigger […]

## Machine Learning Lesson of the Day – Overfitting

Any model in statistics or machine learning aims to capture the underlying trend or systematic component in a data set. That underlying trend cannot be captured precisely because of the random variation in the data around it. A model must have enough complexity to capture that trend, but not so much complexity that it captures […]
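A hedged R sketch of the idea. The quadratic trend, noise level, sample sizes and polynomial degrees are all hypothetical choices. It fits polynomials of increasing degree to 30 noisy points and compares the error on the training data with the error on fresh data from the same process. Training error can only fall as the degree rises, because the models are nested; the error on new data is what exposes a fit that has started to follow the noise.

```r
# Hypothetical data: quadratic trend plus Gaussian noise
set.seed(42)
trend <- function(x) 1 + 2 * x - 1.5 * x^2
train <- data.frame(x = runif(30, 0, 2))
train$y <- trend(train$x) + rnorm(30, sd = 0.5)
# Fresh data from the same process, kept inside the training range of x
test <- data.frame(x = runif(1000, min(train$x), max(train$x)))
test$y <- trend(test$x) + rnorm(1000, sd = 0.5)

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

degrees <- c(1, 2, 10)             # too simple, about right, too flexible
train_rmse <- test_rmse <- numeric(length(degrees))
for (i in seq_along(degrees)) {
  fit <- lm(y ~ poly(x, degrees[i]), data = train)
  train_rmse[i] <- rmse(train$y, fitted(fit))
  test_rmse[i] <- rmse(test$y, predict(fit, newdata = test))
}
print(data.frame(degree = degrees, train_rmse, test_rmse))
```

The exact numbers depend on the random draw, but the degree-10 fit typically does no better on the fresh data than the quadratic, despite its lower training error.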

## Free books on statistical learning

January 30, 2014
By

Hastie, Tibshirani and Friedman’s Elements of Statistical Learning first appeared in 2001 and is already a classic. It is my go-to book when I need a quick refresher on a machine learning algorithm. I like it because it is written using the language and perspective of statistics, and provides a very useful entry point into the literature of machine learning which has its own terminology for statistical concepts. A free…

## Inference for MA(q) Time Series

January 30, 2014
By
Yesterday, we saw how inference for $AR(p)$ time series was possible. I started with that one because it is actually the simple case. For instance, we can use ordinary least squares. There might be some bias (see e.g. White (1961)), but asymptotically, estimators are fine (consistent, with asymptotic normality). But when the noise is (auto)correlated, things get more complex. So, consider here some $MA(q)$ time series for some white noise…
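A hedged illustration of why the moving-average case is harder: in an $MA(1)$ model $X_t=\varepsilon_t+\theta\varepsilon_{t-1}$ the regressor $\varepsilon_{t-1}$ is never observed, so we cannot simply regress $X_t$ on it with OLS. Base R's `arima()` instead maximises the likelihood. The values $\theta=0.7$, $n=1000$ and the seed are arbitrary choices for illustration; for an $MA(q)$ one would use `order = c(0, 0, q)`.

```r
theta <- 0.7
n <- 1000
set.seed(1)
# Simulate X_t = e_t + theta * e_{t-1} with standard Gaussian noise
X <- arima.sim(model = list(ma = theta), n = n)

# The lagged innovation e_{t-1} is unobserved, so OLS on it is not possible;
# fit the MA(1) by likelihood instead (mean known to be zero here)
fit <- arima(X, order = c(0, 0, 1), include.mean = FALSE)
print(coef(fit))
print(sqrt(diag(fit$var.coef)))    # approximate standard error of theta-hat
```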

## Hastie-Tibshirani Statistical Learning Course Now Open

January 29, 2014
By

Machine learning is hot, hot, hot. I can't imagine better instructors (or scholars) in the area than H&T (great videos), and the course is also a fine way to learn R. It's happening now (started just last week) and runs through late March. Just go ...

## Stupid R Tricks: Random Scope

January 29, 2014
By

Andrew and I have been discussing how functions will be defined in Stan for specifying systems of differential equations; see our evolving ODE design doc (comments welcome, of course). About scope: I mentioned to Andrew that I would prefer pure lexical (static) scoping, as found in languages like C++ and Java. If you’re not familiar […] The post Stupid R Tricks: Random Scope appeared first on Statistical Modeling, Causal Inference, and…
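The rest of the post is cut off. As a hedged illustration of the kind of R behaviour the title alludes to (this example is ours, not necessarily the post's): R is lexically scoped, but a function's local variables are created at run time by assignment, so whether a name refers to a local or a global binding can depend on which branch executed, or even on a random draw. The function names `pick_x` and `random_scope` are hypothetical.

```r
x <- "global"

# Returns the local x only if the assignment branch ran; otherwise
# the lookup falls through to the enclosing (here, global) environment
pick_x <- function(make_local) {
  if (make_local) x <- "local"
  x
}

print(pick_x(TRUE))    # "local"
print(pick_x(FALSE))   # "global"

# The same trick with a coin flip gives a "random" scope
random_scope <- function() {
  if (runif(1) < 0.5) x <- "local"
  x
}
set.seed(3)
print(replicate(5, random_scope()))
```

Under pure static scoping, by contrast, which declaration a name refers to is fixed by the program text rather than by what happened at run time.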

## Not teaching computing and statistics in our public schools will make upward mobility even harder

January 29, 2014
By

In his book Average Is Over, Tyler Cowen predicts that as automation becomes more common, modern economies will eventually be composed of two groups: 1) a highly educated minority involved in the production of automated services and 2) a vast majority …

## “Questioning The Lancet, PLOS, And Other Surveys On Iraqi Deaths, An Interview With Univ. of London Professor Michael Spagat”

January 29, 2014
By

Mike Spagat points to this interview, which, he writes, covers themes that are discussed on the blog such as wrong ideas that don’t die, peer review and the statistics of conflict deaths. I agree. It’s good stuff. Here are some of the things that Spagat says (he’s being interviewed by Joel Wing): In fact, the […]

## Sample with replacement in SAS

January 29, 2014
By

Randomly choosing a subset of elements is a fundamental operation in statistics and probability. Simple random sampling with replacement is used in bootstrap methods (where the technique is called resampling), permutation tests and simulation. Last week I showed how to use the SAMPLE function in SAS/IML software to sample with [...]
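The post uses the SAS/IML `SAMPLE` function. For readers working in R, base R's `sample()` with `replace = TRUE` performs the same simple random sampling with replacement. Below is a minimal bootstrap sketch; the data vector, seed and number of resamples are hypothetical choices.

```r
x <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8)   # hypothetical data

set.seed(2014)
# One resample: same size as x, drawn with replacement
one_resample <- sample(x, size = length(x), replace = TRUE)

# Bootstrap distribution of the sample mean from 1000 resamples
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))
print(sd(boot_means))   # bootstrap estimate of the standard error of the mean
```

One R-specific caveat: if `x` could be a single number, `sample(x)` samples from `1:x` instead, so `x[sample.int(length(x), replace = TRUE)]` is the safer form.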

## Data mining with R course in the Netherlands taught by Luis Torgo

January 29, 2014
By

In the course of this year, Dr. Luis Torgo will teach a Data Mining with R course together with the DIKW Academy in Nieuwegein, The Netherlands. Dr. Torgo is an Associate Professor at the department of Computer Science at the…

## What do I do? How do I apply statistics in my job? How did I get started?

January 29, 2014
By

I've been invited to a panel discussion by the UCLA undergraduate statistics club. Some of the questions I was told to expect are down below. By answering the questions here, there's a chance of a more literate answer and other students will be able to...

## Applied Statistics Lesson of the Day – Blocking and the Randomized Complete Blocked Design (RCBD)

A completely randomized design works well for a homogeneous population: one that does not have major differences between any sub-populations. However, what if a population is heterogeneous? Consider an example that commonly occurs in medical studies. An experiment seeks to determine the effectiveness of a drug in curing a disease, and 100 patients are recruited […]
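The excerpt stops before naming the blocking variable, so the following uses a hypothetical one. A minimal R sketch of the randomization step of an RCBD: the 100 patients are split into 5 hypothetical blocks of 20 similar patients (say, by age group), and within each block drug and placebo are randomly assigned to exactly half of the patients, so every treatment appears in every block.

```r
set.seed(2014)
# Hypothetical blocking: 5 blocks (B1 to B5) of 20 similar patients each
patients <- data.frame(id = 1:100, block = rep(paste0("B", 1:5), each = 20))
patients$treatment <- NA_character_

# Randomize separately within each block, keeping the block balanced
for (b in unique(patients$block)) {
  idx <- which(patients$block == b)
  patients$treatment[idx] <- sample(rep(c("drug", "placebo"), length.out = length(idx)))
}

print(table(patients$block, patients$treatment))   # 10 per cell
```

An analysis would then include the block as a factor, for example `aov(outcome ~ treatment + block, data = patients)` once a (hypothetical) `outcome` column has been recorded.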

## The Mirrored Line Chart Is A Bad Idea

January 29, 2014
By

The mirrored line chart is a pet peeve of mine. It’s very common close to elections when there are two parties or candidates: one’s gains are at the other’s expense. But it becomes even more egregious when there are two categories that have to sum up to 100% by their very definition. In her coverage […]

## BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE: Revisiting the Foundations of Statistics

January 29, 2014
By

Boston Colloquium for Philosophy of Science, 2013–2014, 54th Annual Program. Revisiting the Foundations of Statistics in the Era of Big Data: Scaling Up to Meet the Challenge. Cosponsored by the Department of Mathematics & Statistics at Boston University. Friday, February 21, 2014, 10 a.m. – 5:30 p.m., Photonics Center, 9th Floor Colloquium […]

## Inference for AR(p) Time Series

January 29, 2014
By
Consider a (stationary) autoregressive process, say of order 2,

$$Y_t =\varphi_1 Y_{t-1}+\varphi_2 Y_{t-2}+\varepsilon_t$$

for some white noise $(\varepsilon_t)$ with variance $\sigma^2$. Here is some code to generate such a process:

```r
phi1=.25                 # first autoregressive coefficient
phi2=.7                  # second autoregressive coefficient
n=1000
set.seed(1)
e=rnorm(n)               # Gaussian white noise with variance 1
Z=rep(0,n)
for(t in 3:n) Z[t]=phi1*Z[t-1]+phi2*Z[t-2]+e[t]
Z=Z[800:1000]            # keep the last 201 values; the rest is burn-in
n=length(Z)
plot(Z,type="l")
```

Here, we have to estimate two sets of parameters: the autoregressive coefficients, and the variance $\sigma^2$ of the innovation process. Several techniques…
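The simplest of these, mentioned in the $MA(q)$ post on this page, is ordinary least squares. A hedged sketch, not necessarily the estimator the post goes on to use: regress $Z_t$ on $Z_{t-1}$ and $Z_{t-2}$ without an intercept (the simulated process has mean zero) and estimate $\sigma^2$ from the residuals.

```r
# Regenerate the simulated AR(2) series from the excerpt
phi1 <- 0.25
phi2 <- 0.7
n <- 1000
set.seed(1)
e <- rnorm(n)
Z <- rep(0, n)
for (t in 3:n) Z[t] <- phi1 * Z[t - 1] + phi2 * Z[t - 2] + e[t]
Z <- Z[800:1000]
n <- length(Z)

# Response and its two lags, aligned on t = 3, ..., n
y <- Z[3:n]
lag1 <- Z[2:(n - 1)]
lag2 <- Z[1:(n - 2)]

# OLS regression without intercept (the simulated process has mean zero)
fit <- lm(y ~ 0 + lag1 + lag2)
phi_hat <- coef(fit)
sigma2_hat <- sum(residuals(fit)^2) / (length(y) - 2)  # residual variance
print(phi_hat)
print(sigma2_hat)
```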