If you have analyzed enough high-throughput data, you have seen it before: a male sample that is really a female, a liver that is a kidney, etc. As the datasets I analyze get bigger, I see more and more sample mix-ups. When I find a couple of sam...

Big data is easy; big models are hard. If you just wanted to use simple models with tons of data, that would be easy. You could resample the data, throwing some of it away until you had a quantity of…

Today I want to show how to use volatility position sizing to improve a strategy’s risk-adjusted performance. I will use the Average True Range (ATR) as a measure of volatility, increasing allocation during low-volatility periods and decreasing it during high-volatility periods. Following are two good references that explain these strategy [...]
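To make the idea concrete, here is a minimal Python sketch of ATR-based sizing. It is not the code from the post: the function names, the 14-bar default, the simple (unsmoothed) average of true range, and the `risk_fraction` parameter are all assumptions chosen for illustration.

```python
def true_range(high, low, prev_close):
    """Largest of the bar's range and its gaps from the previous close."""
    return max(high - low, abs(high - prev_close), abs(low - prev_close))


def average_true_range(highs, lows, closes, period=14):
    """Simple moving average of true range (an assumption; other smoothing
    schemes exist). Entries without a full window are None."""
    # trs[j] is the true range of bar j + 1 (bar 0 has no previous close).
    trs = [
        true_range(highs[i], lows[i], closes[i - 1])
        for i in range(1, len(closes))
    ]
    atr = [None] * len(closes)
    for i in range(period, len(closes)):
        atr[i] = sum(trs[i - period:i]) / period
    return atr


def position_size(equity, atr, risk_fraction=0.02):
    """Hypothetical sizing rule: shares are inversely proportional to ATR,
    so quiet markets get a larger allocation and volatile ones a smaller one."""
    if atr is None or atr <= 0:
        return 0
    return int(equity * risk_fraction / atr)


highs = [10.5, 11.5, 12.5, 12.0]
lows = [9.5, 10.5, 11.5, 10.0]
closes = [10.0, 11.0, 12.0, 11.0]
atr = average_true_range(highs, lows, closes, period=2)
sizes = [position_size(100_000, a, risk_fraction=0.01) for a in atr]
```

In this toy series the ATR rises on the last bar, so the suggested share count falls. The point of the rule is that the amount risked per position stays roughly constant as volatility changes.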

In the Follow-Up Part 1, I explored some of the functions in the quantstrat package that allowed us to drill down trade by trade to explain the difference in performance of the two strategies. By doing this, I found that my choice of a volatility measure may not have been the best. Although the …

One week ago I made an early announcement about the markdown support in the knitr package and RStudio, and now version 0.5 of knitr is on CRAN, so I’m back to show you how I made the HTML5 slides. For those who are not familiar with markdown, you may read the traditional documentation, but RStudio has a quicker reference (see below). The problem with markdown is that the original…

Last week I linked to an ad for a Data Editor position at Nature Magazine. I was super excited that Nature was recognizing data as an important growth area. But the ad doesn’t mention anything about statistical analysis skills; it focuses exclus...

To a statistician, the LAG function (which was introduced in SAS/IML 9.22) is useful for time series analysis. To a numerical analyst and a statistical programmer, the function provides a convenient way to compute quantities that involve adjacent values in any vector. The LAG function is essentially a "shift operator." [...]
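The "shift operator" idea can be sketched in a few lines of Python. This is an illustrative analogue, not the SAS/IML implementation: the `lag` helper, its `missing` placeholder, and its handling of negative shifts are assumptions made for this example.

```python
def lag(values, k=1, missing=None):
    """Shift `values` by k positions, padding with `missing`.

    k > 0 shifts toward the end (a lag); k < 0 shifts toward the start
    (a lead). The result always has the same length as the input.
    """
    n = len(values)
    if k >= 0:
        pad = min(k, n)
        return [missing] * pad + list(values[: n - pad])
    pad = min(-k, n)
    return list(values[pad:]) + [missing] * pad


def first_differences(values):
    """A quantity that involves adjacent values: x[i] - x[i-1]."""
    lagged = lag(values, 1)
    return [
        None if prev is None else cur - prev
        for cur, prev in zip(values, lagged)
    ]


x = [1, 4, 9, 16]
lagged = lag(x, 1)            # [None, 1, 4, 9]
led = lag(x, -1)              # [4, 9, 16, None]
diffs = first_differences(x)  # [None, 3, 5, 7]
```

Once the shifted copy is aligned with the original, any computation on neighbouring elements becomes an elementwise operation on two vectors of equal length.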

How do you engage people with data? How do you make them care and pay attention and remember anything about a particular piece of data? One way is dressing the data up as an information graphic. Another might be to get people to play a little game with the data. Nick Diakopoulos and colleagues have built a fascinating research prototype of what this might look like. The idea of gamification…

Nature Genetics has an editorial on the Mayo and Myriad cases. I agree with this bit: “In our opinion, it is not new judgments or legislation that are needed but more innovation. In the era of whole-genome sequencing of highly variable genomes, i...

In the motivating toy example to our ABC model choice paper, we compare summary statistics, mean, median, variance, and… median absolute deviation (mad). The last is the only one able to discriminate between our normal and Laplace models (as now discussed on Cross Validated!). When rerunning simulations to produce nicer graphical outcomes (for the revision), [...]
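One way to see why the mad can separate the two families is to compare population values when the mean and variance are matched. The sketch below is not the simulation from the paper: it assumes both models are centered at the same location with unit variance, and it uses the closed-form population mad of each distribution. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def normal_mad(sigma=1.0):
    """Population mad of a normal: sigma times the 0.75 standard normal quantile."""
    return sigma * NormalDist().inv_cdf(0.75)


def laplace_mad(variance=1.0):
    """Population mad of a Laplace with the given variance.

    A Laplace with scale b has variance 2 * b**2, and its absolute deviation
    from the median is exponential with mean b, whose median is b * ln(2).
    """
    b = math.sqrt(variance / 2.0)
    return b * math.log(2.0)


mad_normal = normal_mad(1.0)    # about 0.674
mad_laplace = laplace_mad(1.0)  # about 0.490
```

Under these assumptions the two models share the same mean, median, and variance, yet their mads differ by a fixed amount that does not depend on the location parameter.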