Example 9.15: Bar chart with error bars ("Dynamite plot")

November 22, 2011

The "dynamite plot", a bar chart plotting a mean with an error bar, is one of the most reviled types of image among statisticians. Reasons to dislike them are numerous, and are nicely summarized here. (Edward Tufte also suggests they be avoided.) ...

Correlation and R-Squared

November 22, 2011

What is R²? In the context of predictive models (usually linear regression), where y is the true outcome and f is the model’s prediction, the definition that I see most often is: [...] In words, R² is a measure of how much of the variance in y is explained by the model, f. Under “general conditions”, [...]
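
The teaser elides the formula itself. A minimal sketch, assuming the post means the common textbook definition R² = 1 − SS_res / SS_tot (residual sum of squares over total sum of squares around the mean of y), written in Python rather than R. The function name `r_squared` is hypothetical.

```python
def r_squared(y, f):
    """Return 1 - SS_res / SS_tot for true outcomes y and predictions f.

    Assumes the common textbook definition; the original post's exact
    formula is not shown in the teaser.
    """
    if len(y) != len(f) or len(y) == 0:
        raise ValueError("y and f must be non-empty and of equal length")
    mean_y = sum(y) / len(y)
    # Variation the model fails to explain.
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
    # Total variation of y around its mean.
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y is constant")
    return 1.0 - ss_res / ss_tot


perfect = r_squared([1, 2, 3], [1, 2, 3])    # predictions match y exactly
baseline = r_squared([1, 2, 3], [2, 2, 2])   # always predicting the mean of y
```

Under this definition a perfect model scores 1 and a model that always predicts the mean of y scores 0, which matches the "variance explained" reading in the excerpt.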

Comment on "Racism and Meritocracy"

November 21, 2011

WARNING: This article is on a topic that elicits emotional reactions. I welcome comments, but please make them thoughtful and keep them civil. Eric Ries wrote an article for TechCrunch last week, talking about racism and meritocracy a...

Career advice regarding tools

November 21, 2011

A few weeks ago, J. D. Long gave some interesting advice in a Google+ discussion. He starts out: Lunch today with an analyst 13 years my junior made me think about things I wish I had known about the technical analytical profession when I was 25. Here’s some things that popped into my head: The [...]

randu dataset, part 2

November 19, 2011

In my last post I plotted the randu dataset to show that all its points lie on 15 parallel planes. But I was not fully satisfied with the solution and decided to show this numerically. It can be done in four steps: identifying four points lying...
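
The post's own four-step procedure is cut off in the teaser, so the following is a different check, not the author's method. It is a minimal Python sketch that assumes the classic RANDU recurrence x[k+1] = 65539 · x[k] mod 2^31 and generates its own stream instead of loading R's `randu` data. Because 65539² ≡ 6 · 65539 − 9 (mod 2^31), every consecutive triple (x, y, z) satisfies 9x − 6y + z ≡ 0 (mod 2^31), so each triple lies on one of a small number of parallel planes. The function names and the seed are hypothetical.

```python
M = 2 ** 31  # RANDU modulus


def randu_stream(seed, n):
    """Generate n raw RANDU integers from an odd seed (assumed recurrence)."""
    x = seed
    out = []
    for _ in range(n):
        x = (65539 * x) % M
        out.append(x)
    return out


def plane_indices(values):
    """Return the set of integers c with 9x - 6y + z = c * 2^31 over all
    consecutive triples (x, y, z) of the stream."""
    indices = set()
    for k in range(len(values) - 2):
        x, y, z = values[k], values[k + 1], values[k + 2]
        combo = 9 * x - 6 * y + z
        if combo % M != 0:
            raise AssertionError("triple does not satisfy the RANDU relation")
        indices.add(combo // M)
    return indices


planes = plane_indices(randu_stream(seed=1, n=3000))
print(sorted(planes))
```

Since x, y and z each lie strictly between 0 and 2^31, the plane index can only take integer values from −5 to 9, which gives at most 15 planes, in line with the figure quoted in the excerpt.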

Plotting randu dataset

November 19, 2011

Recently I stumbled on the help page for the randu data from the datasets package. It contains pseudorandom numbers that are flawed. The help says that "In three dimensional displays it is evident that the triples fall on 15 paralle...

Why balloons are better than balls (in urn schemes)

November 18, 2011

The below is taken from a work in progress: The Polya urn is a heuristic associated with Dirichlet process mixtures. We present the scheme in a modified format, using balloons instead of balls, where the probability of drawing a balloon from the urn is proportional to its volume. Balloons are preferred because their volume may [...]

BioMart Gene ID Converter

November 18, 2011

BioMart recently got a facelift. I'm not sure if this was always available in the old BioMart, but there's now a link to a gene ID converter that worked pretty well for me for converting S. cerevisiae gene IDs to standard gene names. It looks like the ...

GEO2R: Web App to Analyze Gene Expression in GEO Datasets Using R

November 17, 2011

Gene Expression Omnibus is NCBI's repository for publicly available gene expression data with thousands of datasets having over 600,000 samples with array or sequencing data. You can download data from GEO using FTP, or download and load the data direc...

Bayesian vs. Frequentist Intervals: Which are more natural to scientists?

November 17, 2011

I don't know, of course, because the evidence at hand is based on my experience. But, I'll leave the reader to consider whether these observations generalize. Proponents of Bayesian statistical inference argue that Bayesian credible intervals are more intuitive than the frequentist confidence intervals, because the Bayesian inference is a probability statement about a parameter. [...]

Review of “Parallel R” by McCallum and Weston

November 16, 2011

Introduction: This is the first book review I’ve done on this blog, and I don’t intend to make it a regular feature, but I ordered a copy of “Parallel R” a few days ago. It arrived today, and I’m quite disappointed with it, so I wanted to write a quick review to provide some additional [...]

Power-laws: choose your x and y variables carefully

November 16, 2011

This is a follow-up to the post "Power of running world records". As suggested by Andrew, plotting running world records could benefit from a change of variables. More exactly, the use of different variables sheds light on a [now] well-known [to me] sports result provided in a 2000 Nature paper by Sandra Savaglio and Vincenzo […]

Using PROC CANCORR to solve large scale PLS problem

November 16, 2011

Partial Least Squares (PLS) is a powerful tool for discriminant analysis with a large number of predictors [1]. PLS extracts latent factors that maximize the covariance between independent variables and dependent variables. This process is equivalent to Ge...

Weather forecast and good development practices

November 16, 2011

Inspired by this tutorial, I thought that it would be nice to have access to the weather forecast directly from the R command line, for example for a personalized start-up message such as the one below: Weather summary for Trieste, F...

Landscape figures in Sweave

November 15, 2011

This post is a quick follow-up to my initial article on Sweave, adding a note on how to get a plot in landscape orientation to fill the whole page, plus a little example of using BibTeX. Just to clarify my …

Example 9.14: confidence intervals for logistic regression models

November 15, 2011

Recently a student asked about the difference between the confint() and confint.default() functions, both available in the MASS library, for calculating confidence intervals from logistic regression models. The following example demonstrates that they yield d...

Seminar on Monte Carlo methods next Tuesday in Paris

November 13, 2011

Hey there, a quick post on a one-day seminar on Monte Carlo methods for inverse problems in image and signal processing, which will take place at Telecom ParisTech on Tuesday, November 15th. Details and abstracts are on the seminar’s webpage: http://perso.telecom-paristech.fr/~gfort/GdT/GDRisis.html (for English-reading people, here is a Google-translated version). The seminar is organised by […]

Particle filtering and pMCMC using R

November 12, 2011

In the previous post I gave a quick introduction to the CRAN R package smfsb, and how it can be used for simulation of Markov processes determined by stochastic kinetic networks. In this post I’ll show how to use data and particle MCMC techniques in order to carry out Bayesian inference for the parameters of [...]

Applying multiple functions to data frame

November 11, 2011

A very typical task in data analysis is the calculation of summary statistics for each variable in a data frame. The standard lapply or sapply functions work very well for this but apply only a single function. The problem is that I o...
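
The post's R solution is cut off in the teaser. As a rough Python analogue of the idea (not the author's code), the sketch below applies several named functions to every column of a small table stored as a dict of lists. The `summarize` helper and the sample data are hypothetical.

```python
from statistics import mean, median


def summarize(columns, funcs):
    """Apply every function in funcs to every column.

    columns: dict mapping column name -> list of numbers
    funcs:   dict mapping statistic label -> function of a list
    Returns a nested dict: column name -> {label: value}.
    """
    return {
        name: {label: fn(values) for label, fn in funcs.items()}
        for name, values in columns.items()
    }


# Hypothetical sample data standing in for a data frame.
table = {"height": [150, 160, 170], "weight": [50, 60, 80]}
stats = summarize(table, {"mean": mean, "median": median, "max": max})
print(stats["height"])
```

Keeping the functions in a labeled dict means the output names each statistic explicitly, which is the part a single-function apply does not give you.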

Girl Named Florida solutions

November 10, 2011

In The Drunkard's Walk, Leonard Mlodinow presents "The Girl Named Florida Problem": "In a family with two children, what are the chances, if one of the children is a girl named Florida, that both children are girls?" I like this problem, and I use ...
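
The post's own solutions are not shown in the teaser. One way to work the problem is exact enumeration, sketched here in Python under stated assumptions: the two children are independent, each is a boy or a girl with probability 1/2, and each girl is named Florida with a hypothetical probability p. The condition is read as "at least one child is a girl named Florida". The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def prob_both_girls(p_florida):
    """P(both girls | at least one girl named Florida), by enumeration.

    p_florida is the assumed probability that a girl is named Florida;
    it must be greater than 0, otherwise the condition has probability 0.
    """
    p = Fraction(p_florida)
    # Probability of each kind of child under the stated assumptions.
    child = {
        "boy": Fraction(1, 2),
        "girl_florida": p / 2,
        "girl_other": (1 - p) / 2,
    }
    numerator = Fraction(0)    # both girls AND at least one Florida
    denominator = Fraction(0)  # at least one Florida
    for (a, pa), (b, pb) in product(child.items(), repeat=2):
        if "girl_florida" in (a, b):
            denominator += pa * pb
            if a != "boy" and b != "boy":
                numerator += pa * pb
    return numerator / denominator


rare_name = prob_both_girls(Fraction(1, 100))
every_girl = prob_both_girls(1)
print(rare_name, every_girl)
```

Under these assumptions the answer simplifies to (2 − p) / (4 − p): it approaches 1/2 as the name becomes rare, and equals 1/3 when p = 1, which is the plain "at least one girl" version of the puzzle.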

Stochastic Modelling for Systems Biology, second edition

November 9, 2011

The second edition of my textbook, Stochastic Modelling for Systems Biology, was published on 7th November, 2011. One of the new features introduced into the new edition is an R package called smfsb, which contains all of the code examples discussed in the text and allows modelling, simulation and inference for stochastic kinetic models. The [...]

Example 9.13: Negative binomial regression with proc mcmc

November 8, 2011

In practice, data that derive from counts rarely seem to be fit well by a Poisson model; one more flexible alternative is a negative binomial model. In this SAS-only entry, we discuss how proc mcmc can be used for estimation. An overview of support f...

Sports statistics

November 8, 2011

There was an article in the New York Times on Sunday about teaching statistics through sports examples. I personally would avoid sports entirely, as I view the subject to be insufficiently serious. Maybe that’s an indication of my being a terrible instructor of introductory statistics: I don’t care that much what the students are interested [...]
