## Why balloons are better than balls (in urn schemes)

November 18, 2011
The following is taken from a work in progress. The Pólya urn is a heuristic associated with Dirichlet process mixtures. We present the scheme in a modified form, using balloons instead of balls, where the probability of drawing a balloon from the urn is proportional to its volume. Balloons are preferred because their volume may [...]
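
Here is a minimal Python sketch of the volume-weighted draw. The reinforcement rule (return the drawn balloon plus a copy with the same color and volume) is the classic Pólya urn step. It is an assumption here and not necessarily the scheme in the work in progress, and the function names are hypothetical. When all volumes are equal, the scheme reduces to the ordinary ball urn.

```python
import random

def draw_balloon(volumes, rng):
    """Return the index of a balloon drawn with probability proportional to its volume."""
    return rng.choices(range(len(volumes)), weights=volumes, k=1)[0]

def polya_step(colors, volumes, rng):
    """One urn step: draw a balloon, then put it back together with a copy of
    the same color and volume (assumed reinforcement rule).
    Mutates colors and volumes in place and returns the drawn color."""
    i = draw_balloon(volumes, rng)
    colors.append(colors[i])
    volumes.append(volumes[i])
    return colors[i]
```

For example, starting from `colors, volumes = ["red", "blue"], [1.0, 3.0]`, the first draw is blue with probability 3/4 rather than 1/2. The larger balloon dominates even though each color starts with a single balloon.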

## BioMart Gene ID Converter

November 18, 2011
BioMart recently got a facelift. I'm not sure whether this was available in the old BioMart, but there's now a link to a gene ID converter that worked well for me when converting S. cerevisiae gene IDs to standard gene names. It looks like the ...

## GEO2R: Web App to Analyze Gene Expression in GEO Datasets Using R

November 17, 2011
Gene Expression Omnibus (GEO) is NCBI's repository for publicly available gene expression data, with thousands of datasets comprising over 600,000 array or sequencing samples. You can download data from GEO using FTP, or download and load the data direc...

## Bayesian vs. Frequentist Intervals: Which are more natural to scientists?

November 17, 2011
I don't know, of course, because the only evidence at hand is my own experience. But I'll leave it to the reader to consider whether these observations generalize. Proponents of Bayesian statistical inference argue that Bayesian credible intervals are more intuitive than frequentist confidence intervals, because Bayesian inference yields a probability statement about a parameter. [...]
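
The normal mean with known σ makes the contrast concrete. The sketch below computes a z confidence interval and a central posterior credible interval. The function names and the conjugate normal prior are illustrative assumptions. With a very diffuse prior, the two intervals nearly coincide numerically, even though only the second one is a probability statement about the parameter.

```python
from statistics import NormalDist

def confidence_interval(xbar, sigma, n, level=0.95):
    """Frequentist z-interval for a normal mean with known sigma."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = sigma / n ** 0.5
    return xbar - z * se, xbar + z * se

def credible_interval(xbar, sigma, n, prior_mean, prior_sd, level=0.95):
    """Central posterior interval for a normal mean with known sigma,
    under an assumed conjugate N(prior_mean, prior_sd^2) prior."""
    prior_prec = 1 / prior_sd ** 2
    data_prec = n / sigma ** 2
    post_var = 1 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * xbar)
    post = NormalDist(post_mean, post_var ** 0.5)
    return post.inv_cdf((1 - level) / 2), post.inv_cdf((1 + level) / 2)
```

An informative prior centered near the data gives a narrower credible interval than the confidence interval, because prior precision adds to data precision.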

## Review of “Parallel R” by McCallum and Weston

November 16, 2011
This is the first book review I’ve done on this blog, and I don’t intend to make it a regular feature, but I ordered a copy of “Parallel R” a few days ago. It arrived today, and I’m quite disappointed with it, so I wanted to write a quick review to provide some additional [...]

## Power-laws: choose your x and y variables carefully

November 16, 2011
This is a follow-up to the post “Power of running world records”. As suggested by Andrew, plotting running world records could benefit from a change of variables. More precisely, using different variables sheds light on a [now] well-known [to me] sports result reported in a 2000 Nature paper by Sandra Savaglio and Vincenzo […]
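
A power law y = a·x^b becomes a straight line on log-log axes, which is one reason the choice of variables matters. Its exponent can then be estimated by least squares on (log x, log y). This is a minimal sketch with a hypothetical helper, not the post's analysis. With noisy data, regressing log x on log y gives a different slope from regressing log y on log x, which is another reason to choose x and y carefully.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on (log x, log y).
    All xs and ys must be positive. Returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                  # slope on the log-log scale = exponent
    a = math.exp(my - b * mx)      # intercept back-transformed to the prefactor
    return a, b
```

On exact power-law data such as `ys = [3 * x ** 1.1 for x in xs]`, the fit recovers a = 3 and b = 1.1 up to rounding.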

## Using PROC CANCORR to solve large scale PLS problem

November 16, 2011
Partial Least Squares (PLS) is a powerful tool for discriminant analysis with a large number of predictors [1]. PLS extracts latent factors that maximize the covariance between the independent and dependent variables. This process is equivalent to Ge...
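
For a single response, the covariance-maximizing property has a closed form. After centering, the first PLS weight vector is proportional to Xᵀy, because by the Cauchy-Schwarz inequality that unit vector maximizes the covariance between the score Xw and y. The pure-Python sketch below computes it; the function name and list-of-rows input are assumptions. This is not the PROC CANCORR approach from the post. With several responses, the weight vector would instead be the leading left singular vector of XᵀY.

```python
import math

def first_pls_direction(X, y):
    """First PLS weight vector and scores for a single response (PLS1).

    X is a list of rows (n samples x p predictors); y is a list of n responses.
    After centering, w is X^T y scaled to unit length."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # w is proportional to X^T y (centered).
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Latent factor scores t = X w.
    scores = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    return w, scores
```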

## Weather forecast and good development practices

November 16, 2011
Inspired by this tutorial, I thought it would be nice to be able to access weather forecasts directly from the R command line, for example in a personalized start-up message such as the one below: Weather summary for Trieste, F...

## Landscape figures in Sweave

November 15, 2011
This post is a quick follow-up to my initial article on Sweave, adding a note on how to get a landscape-oriented plot to fill the whole page, plus a small example of using BibTeX. Just to clarify my …

## Example 9.14: confidence intervals for logistic regression models

November 15, 2011
Recently a student asked about the difference between the confint() and confint.default() functions, both available once the MASS library is loaded, for calculating confidence intervals from logistic regression models. The following example demonstrates that they yield d...
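
For reference, the Wald interval reported by confint.default() is estimate ± z·SE. By contrast, confint() on a glm fit (the MASS method) profiles the likelihood, so the two can differ when the log-likelihood is far from quadratic. Below is a minimal Python sketch of the Wald computation. The function names are hypothetical, and the coefficient and standard error in the usage note are made-up numbers, not results from the post.

```python
import math
from statistics import NormalDist

def wald_interval(estimate, std_error, level=0.95):
    """Wald interval estimate +/- z * SE, on the log-odds scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error

def odds_ratio_interval(estimate, std_error, level=0.95):
    """Exponentiate the Wald interval to get an interval for the odds ratio."""
    lo, hi = wald_interval(estimate, std_error, level)
    return math.exp(lo), math.exp(hi)
```

With a hypothetical coefficient of 0.5 and standard error of 0.2, `odds_ratio_interval(0.5, 0.2)` gives about (1.11, 2.44). The interval is symmetric on the log-odds scale but not on the odds-ratio scale.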

## Seminar on Monte Carlo methods next Tuesday in Paris

November 13, 2011
Hey there, a quick post about a one-day seminar on Monte Carlo methods for inverse problems in image and signal processing, which will take place at Telecom ParisTech on Tuesday, November 15th. Details and abstracts are on the seminar’s webpage: http://perso.telecom-paristech.fr/~gfort/GdT/GDRisis.html (for English readers, here is a Google-translated version). The seminar is organised by […]

## Particle filtering and pMCMC using R

November 12, 2011
In the previous post I gave a quick introduction to the CRAN R package smfsb, and how it can be used for simulation of Markov processes determined by stochastic kinetic networks. In this post I’ll show how to use data and particle MCMC techniques in order to carry out Bayesian inference for the parameters of [...]
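
The post uses R and smfsb. As a language-neutral illustration, here is a minimal Python sketch of the same two ingredients on a toy model: a Gaussian random walk observed with Gaussian noise. The model, the flat prior on log q and all function names are assumptions. A bootstrap particle filter gives an unbiased estimate of the marginal likelihood (the function returns its log). Particle marginal Metropolis-Hastings then plugs that estimate into an ordinary Metropolis-Hastings acceptance ratio.

```python
import math
import random

def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def bootstrap_filter_loglik(ys, q, r, n_particles, rng):
    """Bootstrap particle filter estimate of log p(y_1, ..., y_T) for the toy model
    x_0 ~ N(0, 1), x_t = x_{t-1} + N(0, q^2), y_t = x_t + N(0, r^2)."""
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    loglik = 0.0
    for y in ys:
        # Propagate every particle through the state transition.
        particles = [x + rng.gauss(0.0, q) for x in particles]
        # Weight by the observation density, shifting by the max for stability.
        logw = [normal_logpdf(y, x, r) for x in particles]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        loglik += m + math.log(sum(w) / n_particles)
        # Multinomial resampling.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return loglik

def pmmh(ys, r, n_iters=200, q0=1.0, step=0.2, n_particles=200, seed=42):
    """Particle marginal Metropolis-Hastings for q, with an assumed flat prior
    on log q and a Gaussian random-walk proposal on log q."""
    rng = random.Random(seed)
    log_q = math.log(q0)
    ll = bootstrap_filter_loglik(ys, q0, r, n_particles, rng)
    chain = []
    for _ in range(n_iters):
        prop = log_q + rng.gauss(0.0, step)
        ll_prop = bootstrap_filter_loglik(ys, math.exp(prop), r, n_particles, rng)
        # Accept on estimated likelihoods; the current estimate is kept, not recomputed.
        if rng.random() < math.exp(min(0.0, ll_prop - ll)):
            log_q, ll = prop, ll_prop
        chain.append(math.exp(log_q))
    return chain
```

Keeping the current state's likelihood estimate fixed between iterations, rather than re-estimating it, is what makes this a valid pseudo-marginal scheme.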

## Applying multiple functions to data frame

November 11, 2011
A very typical task in data analysis is calculating summary statistics for each variable in a data frame. The standard lapply and sapply functions work very well for this, but they apply only a single function. The problem is that I o...
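
The same idea carries over to other languages. Here is a Python analogue, not the post's R solution: apply every function in a named collection to every column. The dict-of-columns layout and the `summarize` name are assumptions for illustration.

```python
from statistics import mean, median, stdev

def summarize(columns, funcs):
    """Apply every function in `funcs` to every column.

    columns: dict mapping column name -> list of values
    funcs:   dict mapping statistic name -> function of a list
    Returns a nested dict {column: {statistic: value}}."""
    return {col: {name: f(values) for name, f in funcs.items()}
            for col, values in columns.items()}

example = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8]}
stats = {"mean": mean, "median": median, "sd": stdev, "max": max}
print(summarize(example, stats))
```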

## Girl Named Florida solutions

November 10, 2011
In The Drunkard's Walk, Leonard Mlodinow presents "The Girl Named Florida Problem": "In a family with two children, what are the chances, if one of the children is a girl named Florida, that both children are girls?" I like this problem, and I use ...
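
Under one common set of assumptions, the answer can be checked by brute-force enumeration. Assume each child is a girl with probability 1/2, and each girl is independently named Florida with some small probability p. The sketch below is illustrative and may not match the assumptions used in the post's solutions. It returns exactly (2 - p)/(4 - p), which approaches 1/2 as p shrinks. It equals 1/3 when p = 1, that is, when the condition reduces to "at least one girl".

```python
from fractions import Fraction
from itertools import product

def prob_both_girls_given_florida(p):
    """P(both girls | at least one girl named Florida) in a two-child family.

    Assumptions: each child is a girl with probability 1/2, independently;
    each girl is named Florida with probability p, independently; boys never are.
    Pass p as a Fraction for an exact result."""
    half = Fraction(1, 2)
    child_outcomes = {"boy": half, "girl": half * (1 - p), "florida": half * p}
    numerator = denominator = Fraction(0)
    for a, b in product(child_outcomes, repeat=2):
        prob = child_outcomes[a] * child_outcomes[b]
        if "florida" in (a, b):
            denominator += prob
            if a != "boy" and b != "boy":
                numerator += prob
    return numerator / denominator
```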

## Stochastic Modelling for Systems Biology, second edition

November 9, 2011

The second edition of my textbook, Stochastic Modelling for Systems Biology, was published on 7th November 2011. One new feature of this edition is an R package called smfsb, which contains all of the code examples discussed in the text and allows modelling, simulation and inference for stochastic kinetic models. The [...]
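
Stochastic kinetic models of this kind are typically simulated exactly with the Gillespie algorithm. Below is a minimal Python sketch of the direct method for a Lotka-Volterra predator-prey system. It is not code from smfsb, and the function name and rate constants are illustrative assumptions.

```python
import random

def gillespie_lv(x0, y0, c, t_max, seed=0):
    """Gillespie direct-method simulation of a stochastic Lotka-Volterra model.

    Reactions and hazards (c is a tuple of three rate constants):
      prey birth      X -> 2X      hazard c[0] * X
      predation       X + Y -> 2Y  hazard c[1] * X * Y
      predator death  Y -> 0       hazard c[2] * Y
    Returns a list of (time, X, Y) recorded after each event, starting at t = 0."""
    rng = random.Random(seed)
    t, x, y = 0.0, x0, y0
    path = [(t, x, y)]
    while True:
        h = [c[0] * x, c[1] * x * y, c[2] * y]
        h0 = sum(h)
        if h0 == 0:
            break  # no reaction can fire any more
        t += rng.expovariate(h0)  # exponential waiting time to the next event
        if t > t_max:
            break
        u = rng.random() * h0  # pick a reaction with probability h_i / h0
        if u < h[0]:
            x += 1
        elif u < h[0] + h[1]:
            x -= 1
            y += 1
        else:
            y -= 1
        path.append((t, x, y))
    return path
```

For example, `gillespie_lv(50, 100, (1.0, 0.005, 0.6), t_max=5.0)` produces one random trajectory; different seeds give different realizations.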

## Example 9.13: Negative binomial regression with proc mcmc

November 8, 2011
In practice, data that derive from counts rarely seem to be fit well by a Poisson model; one more flexible alternative is a negative binomial model. In this SAS-only entry, we discuss how proc mcmc can be used for estimation. An overview of support f...
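
Whatever the estimation engine, the model rests on the negative binomial likelihood. Here is a minimal Python sketch in the mean/dispersion parametrization, where Var(Y) = μ + μ²/k. This parametrization is an assumption, and SAS procedures may parametrize the dispersion differently. As k grows, the pmf approaches the Poisson pmf with the same mean, which shows why the negative binomial generalizes the Poisson model.

```python
import math

def nb_logpmf(y, mu, k):
    """Log pmf of a negative binomial with mean mu and dispersion (size) k,
    so that Var(Y) = mu + mu**2 / k."""
    return (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
            + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))

def poisson_logpmf(y, mu):
    """Log pmf of a Poisson distribution with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)
```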

## Sports statistics

November 8, 2011
There was an article in the New York Times on Sunday about teaching statistics through sports examples. I personally would avoid sports entirely, as I consider the subject insufficiently serious. Maybe that’s an indication of my being a terrible instructor of introductory statistics: I don’t care that much what the students are interested [...]

## The red-haired girl named Florida

November 7, 2011
In The Drunkard's Walk, Leonard Mlodinow presents "The Girl Named Florida Problem": "In a family with two children, what are the chances, if one of the children is a girl named Florida, that both children are girls?" I like this problem, and I use it on...

## Low rank approximation

November 6, 2011
A little experiment to see what low-rank approximation looks like. These are the best rank-k approximations (in the Frobenius norm) to a natural image for increasing values of k; the original image has rank 512. Python code can be found here. GIF...
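
By the Eckart-Young theorem, the best rank-k approximation in the Frobenius norm comes from the truncated SVD: keep the k largest singular values and their singular vectors. Below is a minimal NumPy sketch, assuming a grayscale image stored as a 2-D array. It is not the post's linked code. The approximation error equals the square root of the sum of the squared discarded singular values.

```python
import numpy as np

def best_rank_k(A, k):
    """Best rank-k approximation of A in the Frobenius norm (Eckart-Young):
    keep the k largest singular values and their singular vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Scale the first k left singular vectors by their singular values.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]
```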

## Somebody bet on the Bayes

November 3, 2011
In last week's post I wrote up solutions to some of my favorite Bayes's Theorem problems and posed this new problem: If you meet a man with (naturally) red hair, what is the probability that neither of his parents has red hair? Hints: About 2% of the worl...
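
Here is one way to set the problem up, under a deliberately simplified model that may not be the one the post intends. Assume red hair is caused by a single recessive allele, with Hardy-Weinberg proportions and random mating; all of these are assumptions. Enumerating parental genotypes then gives P(neither parent red | child red) = (1 - q)², where q is the square root of the prevalence. The function name is hypothetical.

```python
import math
from itertools import product

def prob_neither_parent_red(prevalence):
    """P(neither parent has red hair | child has red hair), assuming red hair
    comes from a single recessive allele r, Hardy-Weinberg proportions and
    random mating. Computed by enumerating parental genotypes."""
    q = math.sqrt(prevalence)  # frequency of allele r, since prevalence = q^2
    p = 1 - q
    genotypes = {"RR": p * p, "Rr": 2 * p * q, "rr": q * q}
    transmit_r = {"RR": 0.0, "Rr": 0.5, "rr": 1.0}
    joint_red_child = 0.0
    joint_neither_red = 0.0
    for g1, g2 in product(genotypes, repeat=2):
        # P(parents are g1, g2 and the child receives r from both)
        pr = genotypes[g1] * genotypes[g2] * transmit_r[g1] * transmit_r[g2]
        joint_red_child += pr
        if g1 != "rr" and g2 != "rr":
            joint_neither_red += pr
    return joint_neither_red / joint_red_child
```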

## Guide to RNA-seq Analysis in Galaxy

November 1, 2011
James Taylor came to UVA last week and gave an excellent talk on how Galaxy enables transparent and reproducible research in genomics. I'm gearing up to take on several projects that involve next-generation sequencing, and I'm considering installing my...

## Halloween 2011 count

November 1, 2011
We don’t get many kids seeking candy at our house. I’m not sure if there just aren’t many kids in the neighborhood, or if it’s our location (next to the pond, with a big gap before the next house). I decided to keep track. As usual, we bought a huge bag of candy, and we […]

## Example 9.12: simpler ways to carry out permutation tests

October 31, 2011
In a previous entry, as well as in section 2.4.3 of the book, we describe how to carry out a two-group permutation test in SAS as well as with the coin package in R. We demonstrate by comparing the ages of the female and male subjects in the HELP study. I...
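
For readers outside SAS and R, here is a minimal Python sketch of the same idea: a two-sided Monte Carlo permutation test for a difference in means. It is not the book's code, and it uses toy numbers rather than the HELP data. The function name and defaults are assumptions.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.
    Returns the observed difference and an estimated p-value."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled observations
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Count the observed labelling too, so the p-value is never exactly zero.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value
```

For example, `permutation_test([10, 11, 12, 13, 14], [20, 21, 22, 23, 24])` returns an observed difference of -10 with a small p-value.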