## Example 9.15: Bar chart with error bars ("Dynamite plot")

November 22, 2011

The "dynamite plot", a bar chart plotting a mean with an error bar, is one of the most reviled types of image among statisticians. Reasons to dislike them are numerous and are nicely summarized here. (Edward Tufte also suggests they be avoided.) ...

## Correlation and R-Squared

November 22, 2011

What is R²? In the context of predictive models (usually linear regression), where y is the true outcome and f is the model’s prediction, the definition that I see most often is:

$$R^2 = 1 - \frac{\sum_i (y_i - f_i)^2}{\sum_i (y_i - \bar{y})^2}$$

In words, R² is a measure of how much of the variance in y is explained by the model, f. Under “general conditions”, [...]
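This is a minimal Python sketch that assumes the common definition R² = 1 − Σ(y − f)² / Σ(y − ȳ)². The excerpt cuts off before the post's own derivation. The sketch fits an ordinary least-squares line with an intercept to simulated data and checks that, in that case, R² equals the squared correlation between y and f. It then shows that shifting the predictions by a constant leaves the correlation unchanged but lowers R², so the two quantities are not interchangeable in general.

```python
import numpy as np

def r_squared(y, f):
    """R^2 = 1 - RSS/TSS for observed outcomes y and predictions f."""
    y = np.asarray(y, dtype=float)
    f = np.asarray(f, dtype=float)
    rss = np.sum((y - f) ** 2)         # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)  # total sum of squares about the mean
    return 1.0 - rss / tss

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)

# Ordinary least squares with an intercept; polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
f = intercept + slope * x

r2 = r_squared(y, f)
rho = np.corrcoef(y, f)[0, 1]
print(r2, rho ** 2)  # agree up to rounding for OLS with an intercept

# Shifting the predictions leaves the correlation unchanged but lowers R^2
f_shifted = f + 1.0
print(r_squared(y, f_shifted), np.corrcoef(y, f_shifted)[0, 1] ** 2)
```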

## Comment on "Racism and Meritocracy"

November 21, 2011

A few weeks ago, J. D. Long gave some interesting advice in a Google+ discussion. He starts out: “Lunch today with an analyst 13 years my junior made me think about things I wish I had known about the technical analytical profession when I was 25. Here’s some things that popped into my head: The [...]”

## randu dataset, part 2

November 19, 2011

In my last post I plotted the randu dataset to show that all its points lie on 15 parallel planes. But I was not fully satisfied with that solution and decided to show it numerically. It can be done in four steps: identifying four points lying...
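The excerpt cuts off the post's four-step numerical procedure. Here instead is a different, purely algebraic check in Python. It is a sketch, not the author's method. It regenerates RANDU directly, x_{k+1} = 65539·x_k mod 2^31, from an assumed seed of 1 rather than loading R's `randu` data. It then uses the known identity x_{k+2} ≡ 6x_{k+1} − 9x_k (mod 2^31). For scaled values u = x/2^31, this means 9u_k − 6u_{k+1} + u_{k+2} is always an integer. Because each u lies in (0, 1), that integer can only fall between −5 and 9. Each possible integer value is one plane, which gives 15 planes.

```python
M = 2 ** 31
A = 65539  # RANDU multiplier

def randu(seed, n):
    """Return n successive RANDU integers x_{k+1} = A * x_k mod 2^31 (odd seed)."""
    xs, x = [], seed
    for _ in range(n):
        x = (A * x) % M
        xs.append(x)
    return xs

xs = randu(seed=1, n=100_002)

# For each successive triple, 9*x_k - 6*x_{k+1} + x_{k+2} is an exact
# multiple of 2^31, i.e. 9*u_k - 6*u_{k+1} + u_{k+2} is an integer.
planes = set()
for a, b, c in zip(xs, xs[1:], xs[2:]):
    num = 9 * a - 6 * b + c
    assert num % M == 0          # exact integer arithmetic, no rounding
    planes.add(num // M)         # which plane this triple lies on

print(sorted(planes))
print(len(planes), "distinct planes")
```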

## Plotting randu dataset

November 19, 2011

Recently I stumbled on the help description of the randu data in the datasets package. It contains pseudorandom numbers that are flawed. The help says that "In three dimensional displays it is evident that the triples fall on 15 paralle...
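A few lines of Python can check the cause of the flaw. The seed of 1 is an arbitrary assumption. The sketch first verifies that the square of the RANDU multiplier 65539 = 2^16 + 3 reduces to 6·65539 − 9 modulo 2^31. That means every value is a fixed linear combination of the two values before it. It then confirms the relationship on scaled triples, up to floating-point rounding.

```python
M = 2 ** 31
A = 65539  # RANDU multiplier, 2**16 + 3

# A**2 = 2**32 + 6 * 2**16 + 9, and 2**32 is 0 mod 2**31, so
# A**2 is congruent to 6*A - 9: each value is fixed by the two before it.
assert (A * A) % M == (6 * A - 9) % M

def randu_unit(seed, n):
    """Return n RANDU values scaled to the open interval (0, 1)."""
    x, out = seed, []
    for _ in range(n):
        x = (A * x) % M
        out.append(x / M)
    return out

u = randu_unit(seed=1, n=1000)
# u_{k+2} - 6*u_{k+1} + 9*u_k should be an integer for every triple
residuals = [c - 6 * b + 9 * a for a, b, c in zip(u, u[1:], u[2:])]
max_dev = max(abs(r - round(r)) for r in residuals)
print(max_dev)  # only floating-point rounding remains
```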

## Why balloons are better than balls (in urn schemes)

November 18, 2011

The following is taken from a work in progress: The Polya urn is a heuristic associated with Dirichlet process mixtures. We present the scheme in a modified format, using balloons instead of balls, where the probability of drawing a balloon from the urn is proportional to its volume. Balloons are preferred because their volume may [...]
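For reference, here is a Python sketch of the standard Pólya urn (Blackwell-MacQueen) scheme behind Dirichlet process mixtures. Draws are made with probability proportional to a weight, which the code calls "volume". The urn starts with a new-colour volume equal to the concentration α. The excerpt cuts off before explaining how the post's balloons change volume, so this sketch simply gives every added balloon volume 1. That assumption makes it the ordinary ball version. The function name and the Gaussian base distribution are hypothetical choices for illustration.

```python
import random

def polya_urn(n, alpha, base_draw, rng):
    """Draw n values from a Polya urn with concentration alpha.

    The urn holds a 'new colour' balloon of volume alpha plus one balloon
    per previous draw; every added balloon gets volume 1 (an assumption).
    """
    colours, volumes = [], []
    for _ in range(n):
        total = alpha + sum(volumes)
        if rng.random() * total < alpha:
            colour = base_draw(rng)  # brand-new colour from the base distribution
        else:
            # existing balloon, chosen with probability proportional to volume
            colour = rng.choices(colours, weights=volumes, k=1)[0]
        colours.append(colour)
        volumes.append(1.0)
    return colours

rng = random.Random(42)
draws = polya_urn(500, alpha=2.0, base_draw=lambda r: r.gauss(0.0, 1.0), rng=rng)
print(len(draws), len(set(draws)))  # many repeats: draws cluster on few values
```

Letting `volumes` vary per balloon is where a balloon-based variant would differ from this sketch.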

## BioMart Gene ID Converter

November 18, 2011

BioMart recently got a facelift. I'm not sure if this was always available in the old BioMart, but there's now a link to a gene ID converter that worked pretty well for me when converting S. cerevisiae gene IDs to standard gene names. It looks like the ...

## GEO2R: Web App to Analyze Gene Expression in GEO Datasets Using R

November 17, 2011

Gene Expression Omnibus (GEO) is NCBI's repository for publicly available gene expression data, with thousands of datasets comprising over 600,000 samples of array or sequencing data. You can download data from GEO using FTP, or download and load the data direc...

## Bayesian vs. Frequentist Intervals: Which are more natural to scientists?

November 17, 2011

I don't know, of course, because the evidence at hand is based only on my own experience. But I'll leave the reader to consider whether these observations generalize. Proponents of Bayesian statistical inference argue that Bayesian credible intervals are more intuitive than frequentist confidence intervals, because Bayesian inference yields a probability statement about a parameter. [...]
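To make the contrast concrete, here is a Python sketch for a binomial proportion using hypothetical data (7 successes in 20 trials). The frequentist Wald interval describes the long-run coverage of the procedure. The Bayesian credible interval is a direct probability statement about the parameter given the data. It is computed under an assumed uniform Beta(1, 1) prior and approximated with Monte Carlo draws from the Beta posterior. Neither the data nor the prior come from the post.

```python
import random
from statistics import NormalDist

successes, trials = 7, 20  # hypothetical data
p_hat = successes / trials

# Frequentist: Wald 95% confidence interval (a statement about the procedure)
z = NormalDist().inv_cdf(0.975)
se = (p_hat * (1 - p_hat) / trials) ** 0.5
wald = (p_hat - z * se, p_hat + z * se)

# Bayesian: equal-tailed 95% credible interval from the Beta(1 + s, 1 + n - s)
# posterior under a uniform prior, approximated by sorting Monte Carlo draws
rng = random.Random(1)
draws = sorted(rng.betavariate(1 + successes, 1 + trials - successes)
               for _ in range(100_000))
credible = (draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))])

print("Wald confidence interval:", wald)
print("Bayesian credible interval:", credible)
```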

## Review of “Parallel R” by McCallum and Weston

November 16, 2011

This is the first book review I’ve done on this blog, and I don’t intend to make it a regular feature, but I ordered a copy of “Parallel R” a few days ago. It arrived today, and I’m quite disappointed with it, so I wanted to write a quick review to provide some additional [...]

## Power-laws: choose your x and y variables carefully

November 16, 2011

This is a follow-up to the post “Power of running world records”. As suggested by Andrew, plotting running world records could benefit from a change of variables. More precisely, using different variables sheds light on a [now] well-known [to me] sports result reported in a 2000 Nature paper by Sandra Savaglio and Vincenzo […]
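Here is a Python sketch of why the choice of variables matters. A power law y = a·x^b becomes the straight line log y = log a + b·log x, so ordinary least squares on the logged variables estimates the exponent directly. The data are synthetic and follow an exact power law with hypothetical constants a = 0.1 and b = 1.1. They are not the world-record data or the Savaglio result from the post.

```python
import math

def loglog_fit(x, y):
    """Least-squares fit of log(y) = log(a) + b*log(x); returns (a, b)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                 # slope in log-log space = exponent
    a = math.exp(my - b * mx)     # intercept back-transformed to the prefactor
    return a, b

# Synthetic (hypothetical) data following y = 0.1 * x**1.1 exactly
distances = [100, 200, 400, 800, 1500, 5000, 10000, 42195]
times = [0.1 * d ** 1.1 for d in distances]
a, b = loglog_fit(distances, times)
print(a, b)
```

With real, noisy records the same fit applies, and residuals in log-log space show where a single power law breaks down.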

## Using PROC CANCORR to solve large scale PLS problem

November 16, 2011

Partial Least Squares (PLS) is a powerful tool for discriminant analysis with a large number of predictors [1]. PLS extracts latent factors that maximize the covariance between independent variables and dependent variables. This process is equivalent to Ge...
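For intuition about what "maximize the covariance" means, here is a minimal NumPy sketch of the first PLS component on simulated data. It is not the PROC CANCORR approach from the post. For centered X and Y, the first weight vector w is the leading left singular vector of XᵀY. That vector maximizes the covariance between Xw and the response over all unit-length w. With a single response, the Cauchy-Schwarz inequality guarantees that no other unit direction does better, and the comparison against random directions illustrates this.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 50                      # many predictors (simulated data)
X = rng.normal(size=(n, p))
Y = X[:, :3] @ np.array([[1.0], [-2.0], [0.5]]) + rng.normal(size=(n, 1))

# Centre both blocks, as PLS assumes
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

# First PLS weight vector: leading left singular vector of X'Y
U, s, Vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
w = U[:, 0]
t = Xc @ w                          # first latent factor (scores)

def cov_with_y(direction):
    """|sample covariance| between X projected on a unit direction and y."""
    d = direction / np.linalg.norm(direction)
    return abs(float((Xc @ d) @ Yc[:, 0])) / (n - 1)

best = cov_with_y(w)
random_dirs = [cov_with_y(rng.normal(size=p)) for _ in range(1000)]
print(best, max(random_dirs))       # the PLS direction is never beaten
```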

## Weather forecast and good development practices

November 16, 2011

Inspired by this tutorial, I thought it would be nice to have access to weather forecasts directly from the R command line, for example for a personalized start-up message such as the one below: Weather summary for Trieste, F...

## Landscape figures in Sweave

November 15, 2011

This post is a quick follow-up to my initial article on Sweave, adding a note on how to get a plot in landscape orientation to fill the whole page, plus a little example of using BibTeX. Just to clarify my …

## Example 9.14: confidence intervals for logistic regression models

November 15, 2011

Recently a student asked about the difference between the confint() and confint.default() functions, both of which calculate confidence intervals from logistic regression models (for glm fits, confint() relies on profiling code from the MASS library). The following example demonstrates that they yield d...
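The difference between the two methods can be sketched outside R. In R, confint.default() returns Wald intervals (estimate ± 1.96 standard errors), while confint() on a glm profiles the likelihood. The NumPy sketch below fits a one-predictor logistic regression to simulated data by Newton-Raphson and computes both kinds of 95% interval for the slope. The profile interval keeps every slope value whose profile deviance lies within the 0.95 quantile of χ² with 1 degree of freedom (about 3.84) of the minimum. The function names are hypothetical, and the code mirrors the two methods, not R's actual implementation.

```python
import numpy as np

def fit_logistic(X, y, offset, iters=50):
    """Newton-Raphson logistic regression with a fixed offset.

    Returns coefficients and their covariance (inverse Fisher information).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-(offset + X @ beta)))
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])
        beta = beta + np.linalg.solve(info, X.T @ (y - mu))
    mu = 1.0 / (1.0 + np.exp(-(offset + X @ beta)))
    info = X.T @ (X * (mu * (1.0 - mu))[:, None])
    return beta, np.linalg.inv(info)

def deviance(X, y, beta, offset):
    """-2 * log-likelihood, using logaddexp(0, eta) = log(1 + exp(eta))."""
    eta = offset + X @ beta
    return 2.0 * np.sum(np.logaddexp(0.0, eta) - y * eta)

# Simulated data (hypothetical): true intercept -0.5, true slope 1.2
rng = np.random.default_rng(3)
x = rng.normal(size=60)
y = (rng.random(60) < 1.0 / (1.0 + np.exp(0.5 - 1.2 * x))).astype(float)
X = np.column_stack([np.ones_like(x), x])
zero = np.zeros_like(x)

beta_hat, cov = fit_logistic(X, y, zero)
b_hat, se = beta_hat[1], np.sqrt(cov[1, 1])

# Wald interval: what confint.default() computes
wald = (b_hat - 1.959964 * se, b_hat + 1.959964 * se)

# Profile interval: slopes whose profile deviance (intercept re-estimated
# with the slope held fixed) is within CRIT of the minimum deviance
d_min = deviance(X, y, beta_hat, zero)
CRIT = 3.841459  # 0.95 quantile of chi-square with 1 degree of freedom
ones = X[:, :1]

def profile_excess(b):
    a_hat, _ = fit_logistic(ones, y, b * x)
    return deviance(ones, y, a_hat, b * x) - d_min

def profile_bound(direction):
    inside, outside = b_hat, b_hat + direction * 10 * se  # bracket the crossing
    for _ in range(60):                                   # bisection
        mid = 0.5 * (inside + outside)
        if profile_excess(mid) < CRIT:
            inside = mid
        else:
            outside = mid
    return 0.5 * (inside + outside)

profile = (profile_bound(-1), profile_bound(+1))
print("Wald (confint.default-style):", wald)
print("Profile likelihood (confint-style):", profile)
```

The profile interval need not be symmetric around the estimate, which is one visible way the two results differ.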

## Seminar on Monte Carlo methods next Tuesday in Paris

November 13, 2011

Hey there, a quick post on a one-day seminar on Monte Carlo methods for inverse problems in image and signal processing, which will take place at Telecom ParisTech on Tuesday, November 15th. Details and abstracts are on the seminar’s webpage: http://perso.telecom-paristech.fr/~gfort/GdT/GDRisis.html (for English readers, here is a Google-translated version). The seminar is organised by […]

## Particle filtering and pMCMC using R

November 12, 2011

In the previous post I gave a quick introduction to the CRAN R package smfsb, and how it can be used for simulation of Markov processes determined by stochastic kinetic networks. In this post I’ll show how to use data and particle MCMC techniques in order to carry out Bayesian inference for the parameters of [...]
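The core of particle MCMC is a particle-filter estimate of the likelihood. That estimate replaces the exact likelihood inside a Metropolis-Hastings acceptance ratio. The NumPy sketch below implements a bootstrap particle filter for a toy linear Gaussian AR(1) model. It does not use a stochastic kinetic network or the smfsb API, and all function names are hypothetical. Because this toy model is linear and Gaussian, a Kalman filter also gives the exact log-likelihood, which serves as a check. The particle estimate of the likelihood is unbiased, although its logarithm is slightly biased downward.

```python
import numpy as np

# Toy state-space model (an assumption for illustration):
#   x_t = phi * x_{t-1} + N(0, sx^2),   y_t = x_t + N(0, sy^2)

def simulate(T, phi, sx, sy, rng):
    x = np.empty(T)
    x[0] = rng.normal(0.0, sx / np.sqrt(1.0 - phi ** 2))  # stationary start
    for t in range(1, T):
        x[t] = phi * x[t - 1] + rng.normal(0.0, sx)
    return x + rng.normal(0.0, sy, size=T)

def bootstrap_pf_loglik(y, phi, sx, sy, n_particles, rng):
    """Bootstrap particle filter estimate of log p(y | phi, sx, sy)."""
    parts = rng.normal(0.0, sx / np.sqrt(1.0 - phi ** 2), size=n_particles)
    loglik = 0.0
    for t, obs in enumerate(y):
        if t > 0:  # propagate particles through the state equation
            parts = phi * parts + rng.normal(0.0, sx, size=n_particles)
        logw = -0.5 * ((obs - parts) / sy) ** 2 - np.log(sy * np.sqrt(2.0 * np.pi))
        m = logw.max()
        w = np.exp(logw - m)               # rescaled to avoid underflow
        loglik += m + np.log(w.mean())     # log of the average weight
        idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())
        parts = parts[idx]                 # multinomial resampling
    return loglik

def kalman_loglik(y, phi, sx, sy):
    """Exact log-likelihood of the same model via the Kalman filter."""
    m, P, ll = 0.0, sx ** 2 / (1.0 - phi ** 2), 0.0
    for t, obs in enumerate(y):
        if t > 0:  # predict
            m, P = phi * m, phi ** 2 * P + sx ** 2
        S = P + sy ** 2                    # predictive variance of y_t
        ll += -0.5 * (np.log(2.0 * np.pi * S) + (obs - m) ** 2 / S)
        K = P / S                          # update
        m, P = m + K * (obs - m), (1.0 - K) * P
    return ll

rng = np.random.default_rng(7)
y = simulate(50, phi=0.8, sx=1.0, sy=0.5, rng=rng)
pf_ll = bootstrap_pf_loglik(y, 0.8, 1.0, 0.5, n_particles=5000, rng=rng)
exact_ll = kalman_loglik(y, 0.8, 1.0, 0.5)
print(pf_ll, exact_ll)  # close; the gap shrinks as n_particles grows
```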

## Applying multiple functions to data frame

November 11, 2011

A very typical task in data analysis is calculating summary statistics for each variable in a data frame. The standard lapply and sapply functions work very nicely for this, but they apply only a single function. The problem is that I o...
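In R, one common idiom is `sapply(df, function(x) c(mean = mean(x), sd = sd(x)))`, which returns one column per variable and one row per statistic. Below is the same idea sketched in Python, not the post's solution (which the excerpt cuts off). A dict of columns stands in for a data frame, the data and the function list are hypothetical, and the work is a nested comprehension over columns and functions.

```python
import statistics

def summarise(columns, functions):
    """Apply every function to every column: {column: {function: value}}."""
    return {
        col: {name: fn(values) for name, fn in functions.items()}
        for col, values in columns.items()
    }

# A small hypothetical 'data frame' stored as a dict of equal-length columns
data = {
    "height": [150.0, 162.5, 171.0, 180.2],
    "weight": [52.1, 60.3, 70.8, 82.4],
}
stats = {
    "mean": statistics.mean,
    "sd": statistics.stdev,
    "min": min,
    "max": max,
}
result = summarise(data, stats)
print(result["height"]["min"], result["weight"]["max"])  # 150.0 82.4
```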

## Girl Named Florida solutions

November 10, 2011

In The Drunkard's Walk, Leonard Mlodinow presents "The Girl Named Florida Problem": "In a family with two children, what are the chances, if one of the children is a girl named Florida, that both children are girls?" I like this problem, and I use ...
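The excerpt does not give the answer, so here is a sketch under explicit assumptions. Each child is independently a boy or a girl with probability 1/2, and each girl is independently named Florida with probability p. Exact enumeration with Python's `fractions` module gives (2 − p)/(4 − p). That is close to 1/2 when the name is rare, and it reduces to the classic 1/3 when p = 1, where the condition is just "at least one girl". The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def prob_both_girls(p):
    """P(both girls | at least one girl named Florida), by enumeration.

    Assumes independent children, each a boy or girl with probability 1/2,
    and each girl independently named Florida with probability p.
    """
    kinds = {
        "boy": Fraction(1, 2),
        "girl_florida": Fraction(1, 2) * p,
        "girl_other": Fraction(1, 2) * (1 - p),
    }
    num = den = Fraction(0)
    for a, b in product(kinds, repeat=2):
        weight = kinds[a] * kinds[b]
        if "girl_florida" in (a, b):           # condition: a girl named Florida
            den += weight
            if a != "boy" and b != "boy":      # event: both children are girls
                num += weight
    return num / den

p = Fraction(1, 1_000_000)
print(prob_both_girls(p), float(prob_both_girls(p)))
print(prob_both_girls(Fraction(1)))  # 1/3
```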

## Stochastic Modelling for Systems Biology, second edition

November 9, 2011

The second edition of my textbook, Stochastic Modelling for Systems Biology was published on 7th November, 2011. One of the new features introduced into the new edition is an R package called smfsb which contains all of the code examples discussed in the text, which allow modelling, simulation and inference for stochastic kinetic models. The [...]
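smfsb itself is an R package. As a language-neutral illustration, here is a Python sketch of Gillespie's direct method, the exact simulation algorithm for stochastic kinetic models, applied to a Lotka-Volterra predator-prey network. The rate constants and initial counts are illustrative assumptions, and the function is hypothetical, not part of smfsb.

```python
import random

def gillespie_lv(x0, y0, c, t_max, rng, max_events=100_000):
    """Gillespie's direct method for a Lotka-Volterra reaction network.

    Reactions, with rate constants c = (c1, c2, c3):
      prey birth:     X -> 2X       hazard c1 * X
      predation:      X + Y -> 2Y   hazard c2 * X * Y
      predator death: Y -> 0        hazard c3 * Y
    """
    t, x, y = 0.0, x0, y0
    path = [(t, x, y)]
    for _ in range(max_events):
        h = (c[0] * x, c[1] * x * y, c[2] * y)
        h0 = sum(h)
        if h0 == 0:                  # no reaction possible: absorbing state
            break
        t += rng.expovariate(h0)     # exponential waiting time to next event
        if t > t_max:
            break
        u = rng.random() * h0        # pick a reaction proportional to hazard
        if u < h[0]:
            x += 1
        elif u < h[0] + h[1]:
            x, y = x - 1, y + 1
        else:
            y -= 1
        path.append((t, x, y))
    return path

path = gillespie_lv(50, 100, (1.0, 0.005, 0.6), t_max=10.0, rng=random.Random(1))
print(len(path), path[-1])
```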

## Example 9.13: Negative binomial regression with proc mcmc

November 8, 2011

In practice, data that derive from counts rarely seem to be fit well by a Poisson model; one more flexible alternative is a negative binomial model. In this SAS-only entry, we discuss how proc mcmc can be used for estimation. An overview of support f...
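The motivation for the negative binomial can be shown numerically without any SAS. With mean μ and size r, its variance is μ + μ²/r, whereas a Poisson variance equals its mean. This Python sketch uses that common mean/size parameterization with hypothetical values μ = 4 and r = 2, and computes both distributions' moments by summing their probability mass functions. It does not reproduce the proc mcmc estimation in the post.

```python
import math

def nb_pmf(k, mu, r):
    """Negative binomial pmf with mean mu and size r (variance mu + mu**2 / r)."""
    p = r / (r + mu)
    return math.exp(math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                    + r * math.log(p) + k * math.log(1.0 - p))

def poisson_pmf(k, mu):
    """Poisson pmf with mean mu (variance equal to mu)."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def moments(pmf, kmax=400):
    """Total probability, mean and variance from a pmf truncated at kmax."""
    probs = [pmf(k) for k in range(kmax)]
    mean = sum(k * q for k, q in enumerate(probs))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
    return sum(probs), mean, var

mu, r = 4.0, 2.0  # hypothetical mean and dispersion
pois = moments(lambda k: poisson_pmf(k, mu))
negbin = moments(lambda k: nb_pmf(k, mu, r))
print("Poisson total/mean/var:", pois)    # variance is about mu
print("NegBin  total/mean/var:", negbin)  # variance is about mu + mu**2 / r
```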

## Sports statistics

November 8, 2011

There was an article in the New York Times on Sunday about teaching statistics through sports examples. I personally would avoid sports entirely, as I view the subject to be insufficiently serious. Maybe that’s an indication of my being a terrible instructor of introductory statistics: I don’t care that much what the students are interested [...]