Ghastly R code

September 27, 2011

My R package, R/qtl, contains about 33k lines of R code (and 21k lines of C code). Some of it is quite good; some of it is terrible. Here’s another example of the terrible. I’ve long needed to revise the function scantwo, for performing a two-dimensional genome scan for pairs of loci. I was looking [...]

Read more »

Gamified

September 26, 2011

Barry Rowlingson gave an interesting talk at UseR 2011, “Why R-help must die!” He suggested the Q-and-A type sites Stack Overflow (on programming) and Cross Validated (on statistics), both part of Stack Exchange. An interesting feature of these sites is that, in addition to voting up and down on the questions and answers, one accrues [...]

Read more »

ZedGraph Box Plot

September 24, 2011

It is possible to get a rudimentary box plot in ZedGraph by combining a HiLowBarItem and an ErrorBarItem. The code below assumes that you have a form with a ZedGraph control on it. Here is the Boxplot …

Read more »

The equivalence of logistic regression and maximum entropy models

September 23, 2011

Nina Zumel recently gave a very clear explanation of logistic regression (The Simpler Derivation of Logistic Regression). In particular she called out the central role of log-odds ratios and demonstrated how the “deviance” (that mysterious quantity reported by fitting packages) is both a term in “the pseudo-R^2” (so directly measures goodness of fit) [...]

Read more »

Big data and humility

September 22, 2011

One of the challenges with big data is to properly estimate your uncertainty. Often “big data” means a huge amount of data that isn’t exactly what you want. As an example, suppose you have data on how a drug acts in monkeys and you want to infer how the drug acts in humans. There are two [...]

Read more »

Infographic

September 22, 2011

I saw this great infographic at Andrew Gelman’s blog. I’m inclined to buy an art print. Note the relationship between the central panel and the cover of the latest Amstat News (always a good source for embarrassing figures):

Read more »

A Note on Antoniak’s Approximation for Dirichlet Processes

September 21, 2011

Antoniak's 1974 article, Mixtures of Dirichlet Processes with Applications to Bayesian Nonparametric Problems (Annals of Statistics 2(6):1152-1174), is a fundamental work for most modern developments in this area. The article gives two expressions for the expected number of distinct values in a sample of size n drawn from a Dirichlet process-distributed probability distribution with [...]

Read more »

Density exploration and Wang-Landau algorithms [with R package]

September 21, 2011

Hey, since a new paper that I’ve co-written has appeared on arXiv, here is a quick post summarizing it. The paper is titled An Adaptive Interacting Wang-Landau Algorithm for Automatic Density Exploration, and it describes improvements over the Wang-Landau algorithm described by Atchadé and Liu, which is itself a generalization of the work of Wang and [...]

Read more »

Week in Review: Pie Charts and Maps, Reproducibility, and Social versus Physical Science

September 19, 2011

Graphs from the week: Interesting graphs of the cost of the war on terror. Jon Peltier looks at pie charts showing who is blamed for the mess in Washington. The Monkey Cage presents a pie chart on grading schools. Is …

Read more »

Reproducibility in Observational Studies

September 16, 2011

Earlier, I wrote that the editorial policies of journals encourage findings that cannot be reproduced. This was in part motivated by Andrew Gelman's recent post making me think that journal editors work as statistical significance filters, thus creatin...

Read more »

A problem of significance

September 15, 2011

Several people have drawn my attention to a recent article on a common error in published statistical analyses in neuroscience. Sander Nieuwenhuis, Birte Forstmann and Eric-Jan Wagenmakers published (in Nature Neuroscience) a critique of statistical an...

Read more »

A Structure to Encourage Reproducibility

September 15, 2011

Scientists' ability to create reproducible research came up several times over the past few days, mainly on Andrew Gelman's blog. A recent article made its rounds on Twitter and suggests that 50% of academic studies are completely wrong. Andrew Gelman ...

Read more »

The Simpler Derivation of Logistic Regression

September 14, 2011

Logistic regression is one of the most popular ways to fit models for categorical data, especially for binary response data. It is the most important (and probably most used) member of a class of models called generalized linear models. Unlike linear regression, logistic regression can directly predict probabilities (values that are restricted to the (0,1) [...]

Read more »

Help! We need statistical leadership now! Part I: know your study

September 14, 2011

It’s time for statisticians to stand up and speak. This is a time where most scientific papers are “probably wrong,” and many of the reasons listed are statistical in nature. A recent paper in Nature Neuroscience noted a major statistical error i...

Read more »

The statistical significance of interactions

September 13, 2011

Nature Neuroscience recently pointed out a statistical error that has occurred over and over in science journals. Ben Goldacre explains the error in a little detail, and gives his cynical interpretation. Of course, I’ll apply Hanlon’s razor to the ...

Read more »

R to Word, revisited

September 12, 2011

In a previous post (a long time ago) I discussed a way to get an R data frame into a Word table. The code in that entry was essentially a brute-force way of wrapping R data in RTF code, but that RTF code was the bare minimum. There was no optimization o...

Read more »

More on Centering in Interactive Models

September 12, 2011

I just had another look at Brambor, Clark, and Golder's 2006 PA piece (ungated) on interactive models. On p.71, it claims: "Given that the centered and uncentered models are algebraically equivalent, we can unequivocally state that centering does not ...

Read more »

The Worst Mistake Made on a Dissertation Is…

September 6, 2011

I have a saying that I like to tell consulting clients. It is easier said than done, but I think these are words for doctoral candidates to live by: "The only bad dissertation draft is one that isn't turned in." The most common factor that unnecessarily slows progress on a dissertation proposal or defense is a propensity to strive for the perfect draft. As graduate students, we all fantasized about turning in…

Read more »

Bayes isn’t magic

September 6, 2011

If a study is completely infeasible using traditional statistical methods, Bayesian methods are probably not going to rescue it. Bayesian methods can’t squeeze blood out of a turnip. The Bayesian approach to statistics has real advantages, but so...

Read more »

Reworked example gallery for scikit-learn

September 4, 2011

I've lately been working on improving the scikit-learn example gallery so that it also shows a small thumbnail of the plotted result. Here is what the gallery looks like now: And the real thing should already be displayed in the development documentation. The ...

Read more »

An enhanced Kaplan-Meier plot, updated

September 1, 2011

I’ve updated the R code for the enhanced K-M plot to include additions and improvements by Gil Thomas and Mark Cowley. Thanks, fellows, for the feedback and updates. http://statbandit.wordpress.com/2011/03/08/an-enhanced-kaplan-meier-plot/

Read more »

BISP7 in Madrid

September 1, 2011

Hey there, I am currently attending Bayesian Inference in Stochastic Processes 7, hosted by Universidad Carlos III de Madrid. I am going to talk (for a very short 15 minutes) about SMC^2 (arXiv link, google code link) on Saturday. Looking at the conference’s program, I am definitely hoping to interact on closely-related topics with the [...]

Read more »

