Lots of open education resources for your gadgetry

December 27, 2011

While not exclusively about statistics, this resource does relate to open education. This page at OpenCulture.com gives a large list of books, courses, and media you can use to fill your new (or old) gadget. They have a list of free o...

Read more »

Programming traps when using "sample"

December 23, 2011

The standard sample function behaves differently when it is given a single-element integer vector than when it is given a longer vector. This can lead to unexpected bugs in R code. Several times I had a problem with code similar to the one given here: for (i in 1:4) {...
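The trap described in the excerpt above can be reproduced with a minimal sketch like the following. It is not the code from the original post; `safe_sample` is a hypothetical helper introduced here only for illustration, built on base R's `sample.int`.

```r
# When x is a single number >= 1, sample(x, ...) draws from 1:x,
# not from the one-element set {x}.
set.seed(1)
x <- c(10)
draws <- sample(x, 5, replace = TRUE)   # values come from 1:10

# Hypothetical helper (an assumption, not part of base R):
# always sample from the elements of x by sampling positions instead.
safe_sample <- function(x, size = length(x), replace = FALSE) {
  x[sample.int(length(x), size = size, replace = replace)]
}

safe_draws <- safe_sample(x, 5, replace = TRUE)   # every draw is 10
```

Sampling indices rather than values keeps the behaviour the same whether `x` has one element or many.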

Read more »

Are we getting better at forecasting?

December 23, 2011

I was interviewed recently for the Boston Globe. The interview was by email and I thought it might be useful to post here. Here are the questions from the journalist. Are we better at predicting future events than we used to be? Or are there obstacles ...

Read more »

A statistician’s view of Stanford’s open Introduction to Databases class

December 22, 2011

In addition to the Introduction to Machine Learning class (which I have reviewed), I took a class on introduction to databases, taught by Prof. Jennifer Widom. This class consisted of video lectures, review questions, and exercises. Topics covered incl...

Read more »

ASU Tuition by Academic Year

December 22, 2011

Description: Arizona State University tuition fees from 1987-2011. Data: https://azregents.asu.edu/ABOR%20Reports/TUITION%20HISTORY.pdf Analysis: It has been suggested that the 'cost' of tuition has remained the same throughout the years, ...

Read more »

PCA using NIPALS

December 22, 2011

Non-Linear Iterative Partial Least Squares (NIPALS) is an algorithm for calculating principal components, and unlike the SVD method it allows only the required number of components to be calculated, which is useful for large data sets as typically only...
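The idea in the excerpt above, computing only as many components as needed, can be sketched roughly as follows. This is a minimal illustration under stated assumptions rather than the code from the original post: the function name `nipals_pca`, its arguments `ncomp`, `tol`, and `max_iter`, and the choice of starting vector are all illustrative.

```r
# Minimal NIPALS sketch: extract the first `ncomp` principal components
# one at a time, deflating the data matrix after each component.
# All names and defaults here are illustrative assumptions.
nipals_pca <- function(X, ncomp = 2, tol = 1e-9, max_iter = 500) {
  X <- scale(X, center = TRUE, scale = FALSE)    # column-centre the data
  scores <- matrix(0, nrow(X), ncomp)
  loadings <- matrix(0, ncol(X), ncomp)
  for (k in seq_len(ncomp)) {
    t_k <- X[, which.max(apply(X, 2, var))]      # start from the highest-variance column
    for (iter in seq_len(max_iter)) {
      p_k <- crossprod(X, t_k) / sum(t_k^2)      # project the columns of X onto the score
      p_k <- p_k / sqrt(sum(p_k^2))              # normalise the loading to unit length
      t_new <- X %*% p_k                         # update the score vector
      converged <- sum((t_new - t_k)^2) < tol * sum(t_new^2)
      t_k <- t_new
      if (converged) break
    }
    scores[, k] <- t_k
    loadings[, k] <- p_k
    X <- X - t_k %*% t(p_k)                      # deflate: remove the fitted component
  }
  list(scores = scores, loadings = loadings)
}

# Usage on a built-in data set: only two components are computed.
fit <- nipals_pca(as.matrix(iris[, 1:4]), ncomp = 2)
```

The loadings should agree with those from an SVD-based routine such as `prcomp` up to sign, since the sign of each component is arbitrary.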

Read more »

MIT launches online learning initiative

December 20, 2011

This may not relate directly to statistics, but it does relate to my experiences in an online introductory machine learning class. MIT has decided to launch online public classes of its own. It looks like they are making the platform open-source as well and...

Read more »

A statistician’s view on Stanford’s public machine learning course

December 19, 2011

This past fall, I took Stanford’s class on machine learning. Overall, it was a terrific experience, and I’d like to share a few thoughts on it: A lot of participants were concerned that it was a watered down version of Stanford’s CS229. And, in ...

Read more »

Organizing travel

December 19, 2011

Whether travelling to a seminar or conference, or just having a holiday, using a travel organizer can make the process simpler and easier. A good travel organizer keeps all your travel details (flights, hotels, car rentals, meetings, weather forecasts...

Read more »

Christmas Gift to the R Community: The R Journal!

December 19, 2011

The R Journal Volume 3/2 is available! Get it from here.

Read more »

Optimal regularization for smoothing splines

December 16, 2011

In the smooth.spline procedure one can use the df or spar parameter to control the smoothing level. Usually they are not set manually, but recently I was asked which one of them is a better measure of regularizatio...

Read more »

Psycho dice and Monte Carlo

December 16, 2011

Following Pierre’s post on psycho dice, I want to see here by what average margin repeated plays might be said to be influenced by the mind. The rules are the following (excerpt from the novel Midnight in the Garden of Good and Evil, by John Berendt): You take four dice and call out four numbers between one […]

Read more »

It’s not every day a new statistical method is published in Science

December 16, 2011

I’ll have to check this out – Maximal Information-based Nonparametric Exploration (MINE - har har). The link to the paper in Science. I haven’t looked at this very much yet. It appears to be a way of weeding out potential variable relationships f...

Read more »

Forecasting time series using R

December 16, 2011

I gave this talk on Forecasting time series using R for the Melbourne Users of R Network (MelbURN) on Thursday 27 October 2011. Slides and examples accompany the talk. Abstract: I look at the various facilities for time series forecasting available in R, concentrating on the f...

Read more »

Query a MySQL Database from R using RMySQL

December 15, 2011

I use this all the time, and the setup is dead simple. Follow the code below to load the RMySQL package, connect to a database (here the UCSC genome browser's public MySQL instance), set up a function to make querying easier, and query the database to ...

Read more »

Galaxy Project Group on CiteULike and Mendeley

December 15, 2011

The Galaxy Project started using CiteULike to organize papers that are about, use, or reference Galaxy. The Galaxy CiteULike group is open to any CUL user, and once you join, you can add papers to the group, assign tags, and rate papers. While no...

Read more »

Cyclic and seasonal time series

December 14, 2011

These terms get confused all the time (e.g., this question on CrossValidated.com), and so I thought it might be helpful to try to summarize the distinction and some of the associated models. Definitions: A seasonal pattern exists when a series is influen...

Read more »

Create maps with maptools R package

December 13, 2011

Baptiste Coulmont explains on his blog how to use the R package maptools. It is based on shapefiles, for example the ones offered by the French geography agency IGN (at the département and commune level). Additional material such as roads and railways is provided by the OpenStreetMap project, here. For the above map, you need […]

Read more »

A prize of US$3,000,000 for a data mining competition to improve healthcare

December 13, 2011

There is a data mining competition with a prize of $3,000,000. The goal is to improve healthcare in the US by identifying patients who will be admitted to a hospital within the next year, using historical claims data. The algorithm to …

Read more »

Principal Components

December 12, 2011

Principal Component Analysis (PCA) is widely used in many data analysis methods as it can reduce the complexity of large interrelated data sets. The easiest way to calculate PCA is by an eigenvalue decomposition, namely singular value decomposition (SVD)...
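As a rough illustration of the SVD route mentioned in the excerpt above, the following minimal sketch computes principal components with base R's `svd`. It is not taken from the original post, and the use of the built-in `iris` data is an assumption made only to keep the example self-contained.

```r
# PCA via singular value decomposition of the column-centred data matrix.
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
s <- svd(X)

loadings  <- s$v                      # principal axes, one per column
scores    <- X %*% s$v                # data projected onto the principal axes
variances <- s$d^2 / (nrow(X) - 1)    # variance explained by each component
```

The variances obtained this way should match the squared `sdev` values reported by `prcomp` on the same data.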

Read more »

How to Become an Efficient and Collaborative R Programmer

December 12, 2011

I may want to add a subtitle "Why R-Forge Must Die" (thinking of Barry Rowlingson's talk earlier this year). I have been a GitHub user for two years, and I was mainly influenced by Hadley. Now I even feel a little bit addicted to GitHub (its slogan is ...

Read more »

Why you can not use statistics to dispute magic

December 10, 2011

It is a subtle point that statistical modeling is different from model-based science. However, empirical scientists seem to go out of their way to conflate the two before the public (as statistical modeling is easier to perform and model-based science is more highly rewarded). It is often claimed that model-based science is [...]

Read more »

Stability of classification trees

December 9, 2011

Classification trees are known to be unstable with respect to training data. Recently I have read an article on stability of classification trees by Briand et al. (2009). They propose a quantitative similarity measure between two trees. The method is i...

Read more »
