## Classifier progress exaggerated?

April 9, 2012

Yesterday Simply Statistics linked to a paper with the provocative title Classifier Technology and the Illusion of Progress. I’ve only skimmed the article so far, but here are a few sentences that stood out. In particular, simple methods typically yield performance almost as good as more sophisticated methods, to the extent that the difference in performance [...]

## Example 9.26: More circular plotting

April 9, 2012

SAS's Rick Wicklin showed a simple loess smoother for the temperature data we showed here. Then he came back with a better approach that does away with edge effects. Rick's smoothing was calculated and plotted on a Cartesian plane. In this entry we'...

## LaTeX templates

April 9, 2012

Some of the most popular pages on this site are my LaTeX templates: for a curriculum vitae, a beamer poster, a beamer talk, a Monash University working paper and a Monash University thesis. Almost all new LaTeX users begin with templates, so it is s...

## Lumia Review Cluster

April 8, 2012

Briefly, track // microsoft (a buzz tracking site for Microsoft that I'm experimenting with) is currently sporting a large story cluster focused on the reviews for the new (today!) Nokia Lumia 900. [note - I updated this image due to...

## Taking You Back

April 8, 2012

Where statistics meet individuals: US Census Bureau publishes the 1940 Census records. http://www.census.gov/1940census/index.html http://www.archives.gov/research/census/1940/start-research.html And lots of interesting infographics http://www.censu...

## What are the distributions on the positive k-dimensional quadrant with parametrizable covariance matrix? (solved)

April 7, 2012

Paulo (from the Instituto de Matemática e Estatística, Universidade de São Paulo, Brazil) has posted an answer to my earlier question both as a comment on the ‘Og and as a solution on StackOverflow (with a much more readable LaTeX output). His solution is based on the observation that the multidimensional log-normal distribution still allows [...]

## Negative Binomial Reparameterization

April 7, 2012

In a previous post, I showed that direct estimation of the p and r parameters in a negative binomial distribution could involve bad autocorrelation in the MCMC chains, and I suggested that there must be some standard reparameterization to solve the pro...

## RNA-Seq Methods & March Twitter Roundup

April 6, 2012

There were lots of interesting developments this month that didn't work their way into a full blog post. Here is an incomplete list of what I've been tweeting about over the last few weeks. But first I want to draw your attention to the latest manuscri...

## Soon: Stats in Your Glasses

April 6, 2012

Google’s virtual reality glasses: a wearable computing project that will become reality by the end of the year? The NY Times reported on it. This could throw us fully into the information flow and widen our horizon, especially if linked to the emerging offer of open data sources … i.e. stats … And it could also lead to a new and strange …

## The inevitable perversion of measurement

April 6, 2012

Supposedly one of the tactics in the fight against obesity is to change how we measure obesity (from BMI to DXA): that's the key message in an LA Times article (link). This is a great read if only because it covers many common problems of measurement systems. In thinking about invented metrics, such as SAT scores, employee performance ratings and teacher ratings, bear in mind they only have names because…

## Real-time Stats: Wind Map

April 6, 2012

Amazing! Click on this picture. Surface wind data comes from the National Digital Forecast Database. These are near-term forecasts, revised once per hour. And the visualization comes from hint.fm

## When to leave insignificant effects in a model

April 5, 2012

You may have noticed conflicting advice about whether to leave insignificant effects in a model or take them out in order to simplify the model. One effect of leaving in insignificant predictors is on p-values: they use up precious df in small samples. But if your sample isn’t small, the effect is negligible. The bigger effect [...]

## April is Visualization Challenge Time!

April 5, 2012

While there has been some criticism of a particular type of visualization challenge recently, there are many other challenges that are organized well and provide good opportunities for people to work on their skills. Two challenges in particular have caught my attention, and are presented here with the official EagerEyes Quality Seal and Stamp of Approval.

California Healthcare Foundation Challenge

The California Healthcare Foundation (CHCF) has published a dataset on…

## Improved Excel box plots

April 4, 2012

After putting up my first Excel 2007 Add-In last week, I thought: what use is it to do box plots if the groups of data have to be the same length and can’t be negative? For these reasons I’ve come …

## Enjoy Low Income Tax Rates

April 4, 2012

Tax rates were higher in the past... Joe derisively snorted at the pay stub in his hand. Crumpling it into a ball, he wound up like a baseball pitcher and fast-balled the wad of paper across the room. It bounced unsatisfyingly off the wall ...

## Data Mining Webinar with Peter Bruce, President, Statistics.com

April 4, 2012

Data Mining methods lie at the center of the constellation of techniques under the umbrella of “business analytics.” These techniques deal with analysis of large existing datasets (as opposed to controlled experiments, or sample surveys). This webinar will give an overview of data mining techniques, which include: In predictive modeling, we build a model to [...]

## Resampling Hierarchically Structured Data Recursively

April 4, 2012

That's a mouthful! I presented this topic to a group of Vandy statisticians a few days ago. My notes (essentially reproduced in this post) are recorded at the Dept. of Biostatistics wiki: HowToBootstrapCorrelatedData. The presentation covers some bootstrap strategies for hierarchically structured (correlated) data, but focuses on the multi-stage bootstrap, an extension of that described [...]
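As a rough illustration of the idea named in this excerpt (not the post's own code, which is not reproduced here), a recursive multi-stage bootstrap can be sketched in Python. The nested-list data layout, the function name `resample`, and the example values are all assumptions made for this sketch: clusters are drawn with replacement at each level, and the same procedure is then applied inside every drawn cluster.

```python
import random


def resample(node, rng):
    """Recursively bootstrap a nested list (hypothetical data layout).

    A list whose items are lists is treated as a set of clusters;
    a list of plain values is the lowest level (the observations).
    """
    # Draw as many children as the node has, with replacement.
    drawn = [rng.choice(node) for _ in node]
    # Repeat the same resampling inside every child that is itself a cluster.
    return [resample(child, rng) if isinstance(child, list) else child
            for child in drawn]


# Made-up example: three clusters (say, subjects) of repeated measurements.
data = [[1.0, 1.2, 0.9], [2.1, 2.4], [3.3, 3.0, 3.1, 2.9]]
rng = random.Random(0)  # seeded so the replicate is reproducible
replicate = resample(data, rng)
```

Each call returns one bootstrap replicate with the same number of top-level clusters as the input; a statistic of interest would be recomputed on many such replicates.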

## Obama administration unveiled a Big Data Research and Development Initiative with \$200 million

April 4, 2012

Yanchang Zhao, RDataMining.com

The Obama administration unveiled a \$200 million Big Data Research and Development Initiative on March 29, 2012, to improve the ability to extract knowledge and insights from large and complex collections of digital data. Six Federal departments …

## Review: Kölner R Meeting 30 March 2012

April 4, 2012

The first Kölner R user meeting was great fun. About 20 useRs turned up to exchange their ideas, questions and experience with R. Three talks about R & Excel, ggplot2 & XeLaTeX and Dynamical systems with R & simecol kicked off the evening, wit...

## What are the distributions on the positive k-dimensional quadrant with parametrizable covariance matrix? (bis)

April 3, 2012

Wondering about the question I posted on Friday (on StackExchange, no satisfactory answer so far!), I looked further at the special case of the gamma distribution I suggested at the end. Starting from the moment conditions, and the [corrected, thanks to David Epstein!] solution is (hopefully) given by the system The resolution of this system [...]

## Some videos about the history of Bayes’ rule

April 3, 2012

Some videos about the history of Bayes' rule. Update: Be sure to read the comment from Sharon McGrayne (click the comments link at the end of the post). One by Sharon Bertsch McGrayne (32 min.): One by Bill Bryson (5 min.): Update: Be sure to expand the c...

## Transaction Cost and Execution Price functionality in the Backtesting library in the Systematic Investor Toolbox

April 3, 2012

I want to introduce the Transaction Cost and Execution Price functionality in the Backtesting library in the Systematic Investor Toolbox. The Transaction Cost is implemented by a commission parameter in the bt.run() function. You may specify the commissions in \$ per share for “share” type backtest and as a percentage of total trade for “weight” [...]
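The two commission conventions mentioned in this excerpt can be illustrated with a small Python sketch. This is not the toolbox's `bt.run()` implementation; the function name `commission_cost`, its arguments, and the numbers are hypothetical and only show the arithmetic of a per-share charge versus a charge proportional to traded value.

```python
def commission_cost(shares, price, commission, backtest_type):
    """Cost of a single trade under two commission conventions (illustrative only).

    "share":  commission is a dollar amount charged per share traded.
    "weight": commission is a fraction of the total value traded.
    """
    if backtest_type == "share":
        return abs(shares) * commission
    if backtest_type == "weight":
        return abs(shares) * price * commission
    raise ValueError("backtest_type must be 'share' or 'weight'")


# Buying 100 shares at $20 with a 1 cent per-share commission.
per_share = commission_cost(100, 20.0, 0.01, "share")
# Selling 100 shares at $20 with a commission of 0.1% of traded value.
proportional = commission_cost(-100, 20.0, 0.001, "weight")
```

The absolute value makes buys and sells cost the same, which is one simple design choice; a real backtester may treat them differently.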

## Workshop in Chicago, May 4

April 3, 2012

I'll be doing a workshop at the meeting of the Midwestern Psychological Association in Chicago, Friday May 4. Details can be found here. A list of future and past workshops can be found here.