Automatic time series forecasting in Granada

January 31, 2014

In two weeks I am presenting a workshop at the University of Granada (Spain) on Automatic Time Series Forecasting. Unlike most of my talks, this is not intended to be primarily about my own research. Rather it is to provide a state-of-the-art overview of the topic (at a level suitable for Masters students in Computer Science). I thought I’d provide some historical perspective on the development of automatic time series forecasting,…


Python and R: Is Python really faster than R?

January 31, 2014

A friend of mine asked me to code the following in R: Generate samples of size 10 from a Normal distribution with $\mu$ = 3 and $\sigma^2$ = 5; Compute the $\bar{x}$ and $\bar{x}\mp z_{\alpha/2}\displaystyle\frac{\sigma}{\sqrt{n}}$ using the 95% confidence...
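A minimal sketch of the task as the excerpt states it, assuming a known $\sigma$ and 10,000 replications (the excerpt is cut off before it says how many samples to draw). The names `n_rep` and `covered` are illustrative and do not come from the original post.

```r
# Sketch: coverage of the z-interval for the mean of N(mu = 3, sigma^2 = 5).
# n_rep is an assumed number of replications; the excerpt does not give one.
set.seed(123)
mu <- 3
sigma <- sqrt(5)      # the excerpt gives the variance, sigma^2 = 5
n <- 10
n_rep <- 10000
z <- qnorm(0.975)     # z_{alpha/2} for a 95% interval

# Each column of the matrix is one sample of size n.
samples <- matrix(rnorm(n * n_rep, mean = mu, sd = sigma), nrow = n)
xbar <- colMeans(samples)
half_width <- z * sigma / sqrt(n)
lower <- xbar - half_width
upper <- xbar + half_width

# Share of intervals that contain the true mean; it should be close to 0.95.
covered <- mean(lower <= mu & mu <= upper)
print(covered)
```

Drawing all samples into one matrix and using `colMeans` avoids an explicit loop, which is usually the faster route in R.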


LaTeX can be arsey, but boy is it good?!

January 30, 2014

I have been using LaTeX since I wrote my BSc thesis (that was way back in the last century, although I'm saying this just for dramatic effect; I'm not THAT old!) and have loved it since. Of course, I do use WYSIWYG typesetting software now and t...


Input data interactively into R

January 30, 2014

To input data interactively into R, use the function readline.
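The excerpt stops before its own example, so here is a small sketch of how `readline` is typically used. The helper `parse_numbers` and the prompt text are hypothetical names chosen for illustration.

```r
# Sketch: prompt for numbers with readline() and parse the reply.
# parse_numbers() is a hypothetical helper, not part of base R.
parse_numbers <- function(reply) {
  tokens <- strsplit(trimws(reply), "[[:space:],]+")[[1]]
  as.numeric(tokens[nzchar(tokens)])
}

# readline() only waits for input in an interactive session; elsewhere it
# returns "" immediately, so a fixed string stands in for the user's reply.
reply <- if (interactive()) {
  readline(prompt = "Enter numbers separated by spaces: ")
} else {
  "4 8 15"
}
x <- parse_numbers(reply)
print(mean(x))
```

`readline` always returns a single character string, which is why the reply has to be split and converted before any arithmetic.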


More SOTU Scaling

January 30, 2014

A couple of days ago the Monkey Cage featured Ben Lauderdale’s one-dimensional scaling model of US State of the Union addresses. In this post, I replicate the analysis with a closely related model, ask what the scaled dimension actually means, and consider what things would look like if we added another one. The technical details […]


Inference for ARMA(p,q) Time Series

January 30, 2014

As we mentioned in our previous post, as soon as we have a moving average part, inference becomes more complicated. Again, to illustrate, we do not need a too general model. Consider, here, some ARMA(1,1) process, $Z_t=\phi Z_{t-1}+\varepsilon_t+\theta\varepsilon_{t-1}$, where $(\varepsilon_t)$ is some white noise, and assume further that […].

> theta=.7
> phi=.5
> n=1000
> Z=rep(0,n)
> set.seed(1)
> e=rnorm(n)
> for(t in 2:n) Z[t]=phi*Z[t-1]+e[t]+theta*e[t-1]
> Z=Z[800:1000]
> plot(Z,type="l")

A two step procedure…
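For comparison, a sketch that simulates the same series and fits it by maximum likelihood with `stats::arima`. This is a stand-in chosen for illustration, not the two-step procedure the post goes on to describe.

```r
# Sketch: simulate the ARMA(1,1) series from the excerpt and fit it by
# maximum likelihood with stats::arima(). This is a stand-in, not the
# two-step procedure of the original post.
theta <- 0.7
phi <- 0.5
n <- 1000
set.seed(1)
e <- rnorm(n)
Z <- rep(0, n)
for (t in 2:n) Z[t] <- phi * Z[t - 1] + e[t] + theta * e[t - 1]
Z <- Z[800:1000]   # keep the last 201 values, discarding the burn-in

# order = c(p, d, q); include.mean = FALSE because the process has mean zero.
fit <- arima(Z, order = c(1, 0, 1), include.mean = FALSE)
print(coef(fit))   # named vector with entries "ar1" and "ma1"
```

With only 201 observations the estimates can sit some distance from the true values of 0.5 and 0.7; `sqrt(diag(vcov(fit)))` gives approximate standard errors.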


GNU Screen

January 30, 2014

This is one of those things I picked up years ago while in graduate school that I just assumed everyone else already knew about. GNU screen is a great utility built into most Linux installations for remote session management. Typing 'screen' at t...


Visualizing uneven distributions

January 30, 2014

Jeff, a reader of the blog, asks for comment on this blog post of his (link). The highlight of the post is this chart, which shows an uneven distribution. The message of the chart is that a large amount of...


History is too important to be left to the history professors, Part 2

January 30, 2014

Completely non-gay historian Niall Ferguson, a man who we can be sure would never be caught at a ballet or a poetry reading, informs us that the British decision to enter the first world war on the side of France and Belgium was “the biggest error in modern history.” Ummm, here are a few bigger […]


Machine Learning Lesson of the Day – Overfitting


Any model in statistics or machine learning aims to capture the underlying trend or systematic component in a data set.  That underlying trend cannot be precisely captured because of the random variation in the data around that trend.  A model must have enough complexity to capture that trend, but not too much complexity to capture […]
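A small sketch of the idea, assuming a sine-shaped trend, Gaussian noise, and polynomial fits of degree 1, 3 and 10. All of these choices are illustrative and do not come from the original lesson.

```r
# Sketch: fit polynomials of increasing degree to noisy data from a known
# trend and compare the error on the training set with that on new data.
set.seed(42)
trend <- function(x) sin(2 * pi * x)   # assumed underlying trend

x_train <- runif(30)
y_train <- trend(x_train) + rnorm(30, sd = 0.3)
x_test <- runif(200)
y_test <- trend(x_test) + rnorm(200, sd = 0.3)

degrees <- c(1, 3, 10)
mse <- sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d), data = data.frame(x = x_train, y = y_train))
  pred_test <- predict(fit, newdata = data.frame(x = x_test))
  c(train = mean(residuals(fit)^2), test = mean((y_test - pred_test)^2))
})
colnames(mse) <- paste0("degree_", degrees)
print(round(mse, 3))
```

The training error can only go down as the degree grows, because each model contains the previous one. The error on new data is the number to watch, since it tends to rise once the fit starts chasing the noise.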


Free books on statistical learning

January 30, 2014

Hastie, Tibshirani and Friedman’s Elements of Statistical Learning first appeared in 2001 and is already a classic. It is my go-to book when I need a quick refresher on a machine learning algorithm. I like it because it is written using the language and perspective of statistics, and provides a very useful entry point into the literature of machine learning which has its own terminology for statistical concepts. A free…


Inference for MA(q) Time Series

January 30, 2014

Yesterday, we saw how inference for AR(p) time series was possible. I started with that one because it is actually the simple case. For instance, we can use ordinary least squares. There might be some possible bias (see e.g. White (1961)), but asymptotically, estimators are fine (consistent, with asymptotic normality). But when the noise is (auto)correlated, then it is more complex. So, consider here some time series for some white noise…


Hastie-Tibshirani Statistical Learning Course Now Open

January 29, 2014

Machine learning is hot, hot, hot. I can't imagine better instructors (or scholars) in the area than H&T (great videos), and the course is also a fine way to learn R. It's happening now (started just last week) and runs through late March. Just go ...


Stupid R Tricks: Random Scope

January 29, 2014

Andrew and I have been discussing how we’re going to define functions in Stan for defining systems of differential equations; see our evolving ODE design doc; comments welcome, of course.

About Scope

I mentioned to Andrew I would prefer pure lexical, static scoping, as found in languages like C++ and Java. If you’re not familiar […] The post Stupid R Tricks: Random Scope appeared first on Statistical Modeling, Causal Inference, and…


Not teaching computing and statistics in our public schools will make upward mobility even harder

January 29, 2014

In his book Average Is Over, Tyler Cowen predicts that as automatization becomes more common, modern economies will eventually be composed of two groups: 1) a highly educated minority involved in the production of automated services and 2) a vast majority …


“Questioning The Lancet, PLOS, And Other Surveys On Iraqi Deaths, An Interview With Univ. of London Professor Michael Spagat”

January 29, 2014

Mike Spagat points to this interview, which, he writes, covers themes that are discussed on the blog such as wrong ideas that don’t die, peer review and the statistics of conflict deaths. I agree. It’s good stuff. Here are some of the things that Spagat says (he’s being interviewed by Joel Wing): In fact, the […]


Sample with replacement in SAS

January 29, 2014

Randomly choosing a subset of elements is a fundamental operation in statistics and probability. Simple random sampling with replacement is used in bootstrap methods (where the technique is called resampling), permutation tests and simulation. Last week I showed how to use the SAMPLE function in SAS/IML software to sample with [...]
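The post itself works in SAS/IML. As a rough analogue in base R, the sketch below draws one resample and then a bootstrap distribution of the mean; the data vector and the 2,000 resamples are made-up values for illustration.

```r
# Sketch: simple random sampling with replacement, as used in the bootstrap.
# This is base R, not the SAS/IML SAMPLE function the post discusses.
set.seed(2014)
x <- c(12, 15, 9, 20, 17, 11, 14)   # illustrative data

# One resample: same size as x, and elements may repeat.
resample <- sample(x, size = length(x), replace = TRUE)

# Bootstrap distribution of the mean from 2000 resamples (assumed count).
boot_means <- replicate(2000, mean(sample(x, replace = TRUE)))
print(resample)
print(quantile(boot_means, c(0.025, 0.975)))
```

The two quantiles give a simple percentile interval for the mean, one common way of turning the resamples into an interval estimate.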


Data mining with R course in the Netherlands taught by Luis Torgo

January 29, 2014

In the course of this year, Dr. Luis Torgo will teach a Data Mining with R course together with the DIKW Academy in Nieuwegein, The Netherlands. Dr. Torgo is an Associate Professor at the department of Computer Science at the…


What do I do? How do I apply statistics in my job? How did I get started?

January 29, 2014

I've been invited to a panel discussion by the UCLA undergraduate statistics club. Some of the questions I was told to expect are down below. By answering the questions here, there's a chance of a more literate answer and other students will be able to...


Applied Statistics Lesson of the Day – Blocking and the Randomized Complete Blocked Design (RCBD)


A completely randomized design works well for a homogeneous population - one that does not have major differences between any sub-populations.  However, what if a population is heterogeneous? Consider an example that commonly occurs in medical studies.  An experiment seeks to determine the effectiveness of a drug on curing a disease, and 100 patients are recruited […]
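A sketch of randomizing within blocks for a study of this size. The excerpt ends before it names a blocking factor, so the factor used here (sex, with 50 patients per block) and the drug/placebo arms are assumptions made purely for illustration.

```r
# Sketch: randomize treatment within blocks. The blocking factor (sex) and
# the 50/50 split are assumptions; the excerpt only mentions 100 patients.
set.seed(7)
patients <- data.frame(
  id = 1:100,
  block = rep(c("female", "male"), each = 50),
  stringsAsFactors = FALSE
)

# Within each block, give half the patients the drug and half the placebo,
# in random order.
patients$treatment <- NA_character_
for (b in unique(patients$block)) {
  rows <- which(patients$block == b)
  arms <- rep(c("drug", "placebo"), length.out = length(rows))
  patients$treatment[rows] <- sample(arms)
}

tab <- table(patients$block, patients$treatment)
print(tab)
```

Because the shuffle happens separately inside each block, both arms end up balanced on the blocking factor by construction.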


The Mirrored Line Chart Is A Bad Idea

January 29, 2014

The mirrored line chart is a pet peeve of mine. It’s very common close to elections when there are two parties or candidates: one’s gains are at the other’s expense. But it becomes even more egregious when there are two categories that have to sum up to 100% by their very definition. In her coverage […]


BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE: Revisiting the Foundations of Statistics

January 29, 2014

BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE
2013–2014, 54th Annual Program
REVISITING THE FOUNDATIONS OF STATISTICS IN THE ERA OF BIG DATA: SCALING UP TO MEET THE CHALLENGE
Cosponsored by the Department of Mathematics & Statistics at Boston University.
Friday, February 21, 2014, 10 a.m. – 5:30 p.m.
Photonics Center, 9th Floor Colloquium […]


Inference for AR(p) Time Series

January 29, 2014

Consider a (stationary) autoregressive process, say of order 2, $Z_t=\phi_1 Z_{t-1}+\phi_2 Z_{t-2}+\varepsilon_t$, for some white noise $(\varepsilon_t)$ with variance $\sigma^2$. Here is a code to generate such a process,

> phi1=.25
> phi2=.7
> n=1000
> set.seed(1)
> e=rnorm(n)
> Z=rep(0,n)
> for(t in 3:n) Z[t]=phi1*Z[t-1]+phi2*Z[t-2]+e[t]
> Z=Z[800:1000]
> n=length(Z)
> plot(Z,type="l")

Here, we have to estimate two sets of parameters: the autoregressive coefficients, and the variance of the innovation process $(\varepsilon_t)$. Several techniques…
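One of the simplest options is ordinary least squares on the lagged values. The sketch below is an illustration under that assumption and is not necessarily one of the techniques the post covers; `lag1`, `lag2`, `phi_hat` and `sigma2_hat` are names chosen here.

```r
# Sketch: simulate the AR(2) series from the excerpt, then estimate the
# coefficients by ordinary least squares on the lagged values.
phi1 <- 0.25
phi2 <- 0.7
n <- 1000
set.seed(1)
e <- rnorm(n)
Z <- rep(0, n)
for (t in 3:n) Z[t] <- phi1 * Z[t - 1] + phi2 * Z[t - 2] + e[t]
Z <- Z[800:1000]
n <- length(Z)

# Regress Z[t] on Z[t-1] and Z[t-2], without intercept (zero-mean process).
y <- Z[3:n]
lag1 <- Z[2:(n - 1)]
lag2 <- Z[1:(n - 2)]
fit <- lm(y ~ 0 + lag1 + lag2)
phi_hat <- coef(fit)

# Innovation variance from the residuals, with two estimated coefficients.
sigma2_hat <- sum(residuals(fit)^2) / (length(y) - 2)
print(phi_hat)
print(sigma2_hat)
```

The simulated innovations have variance 1, so `sigma2_hat` can be checked against that value, and `phi_hat` against 0.25 and 0.7.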


