Sample mix-ups in datasets from large studies are more common than you think

May 1, 2012

If you have analyzed enough high-throughput data, you have seen it before: a male sample that is really a female, a liver that is a kidney, etc… As the datasets I analyze get bigger I see more and more sample mix-ups. When I find a couple of sam...

Big data is easy

May 1, 2012

Big data is easy; big models are hard. If you just wanted to use simple models with tons of data, that would be easy. You could resample the data, throwing some of it away until you had a quantity of…

How to Make HTML5 Slides with knitr

May 1, 2012

One week ago I made an early announcement about the markdown support in the knitr package and RStudio, and now version 0.5 of knitr is on CRAN, so I'm back to show you how I made the HTML5 slides. For those who are not familiar with markdown, you m...

Useful for referring–4-30-2012

May 1, 2012

LDA explained; Counting the total number of…; Significance Test for Kendall’s Tau-b; dimension reduction in ABC [a review's review]; 9 essential LaTeX packages everyone should use; Linguistic Notation Inside of R Plots!; about knitr; knitr Elegant, flexible and fast dynamic report generation with R; knitr Performance Report-Attempt 1; knitr Performance Report-Attempt 2; Question: Why you need perl/python if you [...]

Volatility Position Sizing to improve Risk Adjusted Performance

May 1, 2012

Today I want to show how to use Volatility Position Sizing to improve a strategy’s Risk Adjusted Performance. I will use the Average True Range (ATR) as a measure of volatility, increasing allocation during low-volatility periods and decreasing it during high-volatility periods. Following are two good references that explain these strategy [...]
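
As a rough illustration of the idea in this excerpt, the sketch below computes an ATR and scales allocation inversely to it. It is not the original post's code: the function names (`true_ranges`, `atr`, `position_weights`), the `target_risk` and `max_weight` parameters, and the use of a plain n-bar average for the ATR are all assumptions made for this example.

```python
from typing import List, Optional, Sequence


def true_ranges(high: Sequence[float], low: Sequence[float],
                close: Sequence[float]) -> List[float]:
    """True range per bar; the first bar has no previous close, so it uses high - low."""
    ranges = []
    for i in range(len(close)):
        if i == 0:
            ranges.append(high[i] - low[i])
        else:
            prev_close = close[i - 1]
            ranges.append(max(high[i] - low[i],
                              abs(high[i] - prev_close),
                              abs(low[i] - prev_close)))
    return ranges


def atr(high: Sequence[float], low: Sequence[float],
        close: Sequence[float], n: int) -> List[Optional[float]]:
    """Simple n-bar average of the true range (None until n bars are available)."""
    ranges = true_ranges(high, low, close)
    return [None if i + 1 < n else sum(ranges[i + 1 - n:i + 1]) / n
            for i in range(len(ranges))]


def position_weights(close: Sequence[float],
                     atr_values: Sequence[Optional[float]],
                     target_risk: float = 0.01,
                     max_weight: float = 1.0) -> List[Optional[float]]:
    """Allocate target_risk / (ATR / close), capped at max_weight.

    Low relative volatility gives a larger weight; high volatility a smaller one.
    """
    weights = []
    for price, a in zip(close, atr_values):
        if a is None or a == 0:
            weights.append(None)  # not enough history, or no measurable range
        else:
            weights.append(min(max_weight, target_risk / (a / price)))
    return weights
```

Calling `position_weights(close, atr(high, low, close, n))` yields one weight per bar. The hypothetical `max_weight` cap keeps the allocation from exceeding full investment when volatility is very low.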

The Need for paste2 (part III)

May 1, 2012

Final installment: Part III of a multi-part blog on the paste2 function… In my first post on the paste2 function I promised proof of a few practical uses. In Part II of this series we looked at using paste2 …

Simple Moving Average Strategy with a Volatility Filter: Follow-Up Part 2

May 1, 2012

In the Follow-Up Part 1, I explored some of the functions in the quantstrat package that allowed us to drill down trade by trade to explain the difference in performance of the two strategies. By doing this, I found that my choice of a volatility measure may not have been the best choice. Although the …

How to Make HTML5 Slides with knitr

One week ago I made an early announcement about the markdown support in the knitr package and RStudio, and now version 0.5 of knitr is on CRAN, so I’m back to show you how I made the HTML5 slides. For those who are not familiar with markdown, you may read the traditional documentation, but RStudio has a quicker reference (see below). The problem with markdown is that the original…

High Resolution Movie Reveals the Infrastructure of Humans on Earth

April 30, 2012

The following movie amazes for its beautiful, high-definition rendition of human presence on the scale of a whole planet. The movie "Welcome to the Anthropocene" [vimeo.com] developed by global education organization Globaia revea...

A disappointing response from @NatureMagazine about folks with statistical skills

April 30, 2012

Last week I linked to an ad for a Data Editor position at Nature Magazine. I was super excited that Nature was recognizing data as an important growth area. But the ad doesn’t mention anything about statistical analysis skills; it focuses exclus...

Graphing Twitter Attention

April 30, 2012

track // microsoft (and games and movies) now includes a simple graph indicating the attention being given to each cluster of posts. This graph shows the total of tweets per hour for all posts in the cluster. Below is an...

Example 9.29: the perils of for loops

April 30, 2012

A recent exchange on the R-sig-teaching list featured a discussion of how best to teach R to new students. The initial post included an exercise to write a function that, given n, will draw n rows of a triangle made up of "*", noting that for a beginne...

Statistical leadership part III–shameless plug for PharmaSUG talk

April 30, 2012

PharmaSUG is a yearly gathering of SAS programmers who program for the pharmaceutical industry. This year, Dr. Katherine Troyer of REGISTRAT-MAPI will be giving a talk entitled “Giving Data a Voice: Partnering with Medical Writing for Best Reporting ...

Teaching code, production code, benchmarks and new languages

April 30, 2012

I’m a bit obsessive with words. Maybe I should have used learning in the title, rather than teaching code. Or perhaps remembering code. You know? Code where one actually has a very clear idea of what is going on; for …

The LAG function: Useful for more than time series analysis

April 30, 2012

To a statistician, the LAG function (which was introduced in SAS/IML 9.22) is useful for time series analysis. To a numerical analyst and a statistical programmer, the function provides a convenient way to compute quantities that involve adjacent values in any vector. The LAG function is essentially a "shift operator." [...]
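
To make the "shift operator" idea concrete, here is a small Python analogue. It is not the SAS/IML implementation: the `lag` and `differences` helpers, the use of `None` for missing values, and the convention that a negative `k` shifts the other way are assumptions made for this sketch.

```python
from typing import List, Optional, Sequence


def lag(values: Sequence[float], k: int = 1) -> List[Optional[float]]:
    """Shift values by k positions, padding the vacated slots with None.

    k > 0 moves each value k positions later; k < 0 moves it earlier.
    """
    n = len(values)
    if k >= 0:
        pad = min(k, n)
        return [None] * pad + list(values[:n - pad])
    pad = min(-k, n)
    return list(values[pad:]) + [None] * pad


def differences(values: Sequence[float]) -> List[Optional[float]]:
    """Adjacent differences x[i] - x[i-1], built from the one-step lag."""
    return [None if prev is None else cur - prev
            for cur, prev in zip(values, lag(values, 1))]


print(lag([1, 2, 3, 4], 1))        # [None, 1, 2, 3]
print(differences([1, 4, 9, 16]))  # [None, 3, 5, 7]
```

The `differences` helper shows the kind of adjacent-value computation the excerpt mentions: once the shifted copy exists, the calculation is an element-by-element subtraction.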

Playable Data

April 30, 2012

How do you engage people with data? How do you make them care and pay attention and remember anything about a particular piece of data? One way is dressing the data up as an information graphic. Another might be to get people to play a little game with the data. Nick Diakopoulos and colleagues have built a fascinating research prototype of what this might look like. The idea of gamification…

Embrace Change

April 30, 2012

I love graduations. At the University of Canterbury the academic staff act as marshals, helping the graduands to be in the right place at the right time in the right order wearing the right clothes and doing the right things. …

A matter of compactness

April 30, 2012

Andrew Gelman may have nominated himself the graphics advisor for the World Happiness Report (link). That would be a very good thing. To kick this off, I re-made the Figures 2.1-2.2.8 in the report, which summarized the findings of the...

The Need for paste2 (part II)

April 30, 2012

This is Part II of a multi-part blog on the paste2 function… In my first post on the paste2 function I promised a proof of a few practical uses. The first example I have comes from psychometrics and comes out of …

Sunday data/statistics link roundup (4/29)

April 29, 2012

Nature Genetics has an editorial on the Mayo and Myriad cases. I agree with this bit: “In our opinion, it is not new judgments or legislation that are needed but more innovation. In the era of whole-genome sequencing of highly variable genomes, i...

mad statistic

April 29, 2012

In the motivating toy example to our ABC model choice paper, we compare summary statistics: mean, median, variance, and… median absolute deviation (mad). The last is the only one able to discriminate between our normal and Laplace models (as now discussed on Cross Validated!). When rerunning simulations to produce nicer graphical outcomes (for the revision), [...]
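
For readers unfamiliar with the statistic, the median absolute deviation can be sketched in a few lines of Python. The `mad` helper below is an illustration only, not the code from the paper; it returns the raw statistic, whereas R's `mad` multiplies by a consistency constant (1.4826) by default.

```python
from statistics import median
from typing import Sequence


def mad(xs: Sequence[float]) -> float:
    """Median of the absolute deviations from the sample median (unscaled)."""
    centre = median(xs)
    return median([abs(x - centre) for x in xs])


# A single large outlier barely moves the result, unlike the variance.
print(mad([1, 2, 3, 4, 100]))  # 1
```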

The Need for paste2 (part I)

April 29, 2012

This is Part I of a multi-part blog on the paste2 function… I recently generated a new paste function that takes an unspecified list of equal length variables (a column) or multiple columns of a data frame and pastes …

Animating Schelling’s segregation model

April 29, 2012

A recent blog post on Animations in R inspired me to write code that generates animations of a simulation model. For this task I chose Schelling's segregation model. Having written the code, I found that one year ago similar code had been...

