## Serious stats book officially published

July 5, 2012
My serious stats book is officially published (in the UK at least). The US release date is next month (August 7th). I'm not sure why the release is later (possibly extra shipping time for the books). The earlier European release date is I suppose compe...

## Weave – Web-based Analysis and Visualization Environment

July 5, 2012
Weave (BETA 1.0) is a new web-based visualization platform designed to enable visualization of any available data by anyone for any purpose. Weave is an application development platform supporting multiple levels of user proficiency – novice to advan...

## Compute the multivariate normal density in SAS

July 5, 2012
I've been working on a new book about Simulating Data with SAS. In researching the chapter on simulation of multivariate data, I've noticed that the probability density function (PDF) of multivariate distributions is often specified in a matrix form. Consequently, the multivariate density can usually be computed by using the [...]

## Comment on Falsification

July 4, 2012
The comment box was too small for my reply to Sober on falsification, so I will post it here: I want to understand better Sober’s position on falsification. A pervasive idea to which many still subscribe, myself included, is that the heart of what makes inquiry scientific is the critical attitude: that if a claim [...]

## What is in the Data? The Higgs Boson Explained as Live Infographics

July 4, 2012
Jorge Cham, the cartoonist behind the Piled Higher and Deeper (PhD Comics) has illustrated a very educational interview with Daniel Whiteson, Assistant Professor in Physics and Astronomy at the University of California, Irvine at CERN. The resulting...

## Hamburger Timetable: Train Waiting Times as McDonalds Items to Eat

July 4, 2012
Social media advertising firm DBB, in collaboration with PKP (Polish State Railways), developed a public, electronic timetable that communicates the departure time, destination, platform and waiting time to commuters at Warsaw Central Station. Howev...

## Alternative to Monte Carlo Testing

July 4, 2012
When we backtest a strategy on a portfolio, it is a simple analysis of a single period in time. There are ways to “stress test” a strategy such as Monte Carlo, random portfolios, or shuffling the returns in a random order. I could never really wrap my head around Monte Carlo and shuffling the returns …

## Statistics Without Probability (Individual Sequences)

July 4, 2012
Happy Independence Day and Happy Higgs Day. Frequentist statistics treats observations as random and parameters as fixed. Bayesian statistics treats everything as probabilistic. But there is an approach to statistics that removes probability completely. This is the theory of individual sequences, which is a subset of online (sequential) learning. You can think of this theory [...]

## “Titanic Thompson: The Man Who Would Bet on Everything”

July 4, 2012
I just finished reading this book by Kevin Cook. Nothing surprising, but it’s got almost all the stories, including many that I’d never previously read. Excellent if you like that sort of thing. It’s just too bad Thompson wasn’...

## The power of power

July 4, 2012
Those of you living in the mid-Atlantic region are probably not reading this right now because you don’t have power. I’ve been out of power in my house since last Friday and projections are it won’t come back until the end of the week...

## Three Questions about a Matrix of Coefficient Plots

July 4, 2012
It's Independence Day in the U.S., so I am taking the day off, but I received the following request for advice and thought I'd pass it along to my readers. I wonder if you could help – I am trying to create 9 different coefficient plots, which repr...

## A tutorial on outlier detection techniques

July 4, 2012
by Yanchang Zhao, RDataMining.com There is an excellent tutorial on outlier detection techniques, presented by Hans-Peter Kriegel et al. at ACM SIGKDD 2010. It presents many popular outlier detection algorithms, most of which were published between the mid-1990s and 2010, …

July 4, 2012
In the prior post, Factor Attribution 2, I showed how Factor Attribution can be applied to decompose a fund’s returns into Market, Capitalization, and Value factors, the “three-factor model” of Fama and French. Today, I want to show you a different application of Factor Attribution. First, let’s run Factor Attribution on each of the stocks [...]

## Twitter Activity during the 2012 European Football Tournament

July 3, 2012
Nicolas Belmonte, "Data Visualization Scientist" at Twitter, has represented the impact of the #Euro2012 hashtag [twitter.com] during Euro 2012, which is short for the European football tournament (which just finished, btw). The visualization shows da...

## Problems Worth Solving

July 3, 2012
Larry Wasserman is blogging (again), and anyone who finds my writings interesting would do better to read his. Larry's latest post is a call for the biggest unsolved problems in statistics and machine learning. As he says, the current Wikipedia page...

## Books to Read While the Algae Grow in Your Fur, June 2012

July 3, 2012
Attention conservation notice: I have no taste. Warren Fahy, Fragment Mind candy. Predator porn (to use Barbara Ehrenreich's phrase), plus biologists making enthusiastic as-you-know-Bob speeches about invasive species and the Cambrian explosion. (...

## Elliott Sober Responds on Foundations of Simplicity

July 3, 2012
Here are a few comments on your recent blog about my ideas on parsimony. Thanks for inviting me to contribute! You write that in model selection, “‘parsimony fights likelihood,’ while, in adequate evolutionary theory, the two are thought to go hand in hand.” The second part of this statement isn’t correct. There are sufficient conditions [...]

## Counting gays

July 3, 2012
Gary Gates writes: In a recent study, the author of this article estimated that the self-identified lesbian, gay, bisexual, and transgender (LGBT) community makes up 3.8 percent of the American population. The author’s estimate was far lower than many scholars and activists had contended, and it included a relatively high proportion of persons self-identifying [...]

## International Open Government Data Conference: July 6-12 (Virtual and in Washington DC)

July 3, 2012
July 10-12, 2012, 12:30 GMT/8:30 am ET. Data.gov, the World Bank Open Data Initiative and the Open Development Technology Alliance are joining forces to host the second International Open Government Data Confer...

## An Improvement to Coefficient Plots

July 3, 2012
I recently posted about coefficient plots, discussing my approach and providing some example R code to create the graphs. I had the good fortune of hearing Amanda Driscoll give a talk recently, and she made a small, but really nice improvement to her c...

## Replication and validation in -omics studies – just as important as reproducibility

July 3, 2012
The psychology/social psychology community has made replication a huge focus over the last year. One reason is the recent, public blow-up over a famous study that did not replicate. There are also concerns about the experimental and conceptual design o...

## Combining ggplot Images

July 3, 2012
The ggplot2 package provides an excellent platform for data visualization. One (minor) drawback of this package is that combining ggplot images into one plot, like the par() function does for regular plots, is not a straightforward procedure. Fortunately, R user Stephen Turner has kindly provided a function called “arrange” that does exactly this. The function, [...]