Standards for statistical data dissemination: a wish list

March 16, 2012

A presentation by Xavier Badosa. The digitization of information exchange processes has led many industries to define standards for use on the B2B side of the value chain for the c...

p curves revisited

March 15, 2012

I finally found some time to take a closer look at p curves. I haven't had a chance to follow up on my simulations (and probably won't for a few weeks, if not months), but I have had time to think through the ideas the p curve approach raises based on some...

Seductive Causation

March 15, 2012

Causation is a seductive notion. We want to make meaning out of our world. I love playing “the beeping nose” with little children. I press their nose and it beeps. I press my nose and it whirrs. It fascinates them. …

Call for chapters: Data Mining Applications with R

March 15, 2012

Data Mining Applications with R: a book to be published by Elsevier (http://www.RDataMining.com/books/book2). Proposal submission deadline: April 30, 2012. Introduction: R is one of the most widely used data mining tools in scientific and business applications, among dozens of commercial …

Ideas on A Really Fast Statistics Journal

March 15, 2012

I was writing comments on the blog post "A proposal for a really fast statistics journal", and I realized the comment box was too small to write down my ideas. I like the proposal a lot, and I feel really bad about the current model of submitting and rev...

Bayesian statistics made simple

March 15, 2012

At PyCon last week I taught a tutorial on Bayesian statistics. It is based on Chapters 5 and 8 of Think Stats. Here is the web page I created for the tutorial. And here, courtesy of PyCon and pyvideo.org, is the video. It's three ho...

R code for p curves

March 14, 2012

I have finally got around to posting the R code for my p curve simulation. Those familiar with R will realize how crude it is (I've been caught up with other urgent stuff and had no time to explore further). You are welcome to play with (and improve!) t...

Portfolio Optimization: Specify constraints with GNU MathProg language

March 14, 2012

I have previously described a few examples of portfolio construction: Introduction to Asset Allocation; Maximum Loss and Mean-Absolute Deviation risk measures; 130/30 Portfolio Construction; Minimum Investment and Number of Assets Portfolio Cardinality Constraints; Multiple Factor Model – Building 130/30 Index (Update). I created a number of helper functions to simplify the process of making the constraints ( [...]

Colour Bar

March 14, 2012

I needed to indicate the change in a signal over time on a single chart and so wanted a colour gradient, ideally something like the Jet colour scheme in MATLAB. In the end I basically converted the MATLAB code into …

Video Tip: Convert Gene IDs with Biomart

March 14, 2012

I get asked frequently how to convert from one gene identifier to another. This can be tricky, especially when relying on gene symbols, as Will pointed out in a previous post a few years ago. There are several tools that can do this, including DAVID an...

March Madness! Wanna Win?

March 14, 2012

Description: Winning percentage of all NCAA Men's Basketball Tournament Champions. Analysis: Down by one, the ball spins in his hand as he dribbles up the floor. With tennis shoes squeaking, he feints left, then right. Glancing up at the clock, he sees on...

Redesign by Subtraction

March 13, 2012

GGD has a new look. I was inspired by Gina Trapani (Smarterware, Lifehacker) to remove any extra lines, links, and other "ink" that doesn't serve any purpose, and I hope the site appears cleaner and easier to read. I also wanted the extra horizont...

Shapley-Shubik Power Index in R

March 13, 2012

This spring we have Rector Elections at the Warsaw School of Economics. One of my colleagues, Tomasz Szapiro, agreed to stand in the elections. This prompted me to write a Shapley-Shubik Power Index calculation snippet in R. Rector elections in Warsaw School...

Example 9.23: Demonstrating proportional hazards

March 13, 2012

A colleague recently asked after a slide suitable for explaining proportional hazards. In particular, she was concerned that her audience not focus on the time to event or probability of the event. An initial thought was to display the cumulative haz...

Changes in life expectancy animated with geo charts

March 12, 2012

The data of the World Bank is absolutely amazing. I have said this before, but their updated iPhone App gives me a reason to return to this topic. Version 3 of the DataFinder App allows you to visualise the data on your phone, including motion maps, s...

IS vs. self-normalised IS

March 11, 2012

I was grading my Master projects this morning and came upon a graph that compares the variability of an importance-sampling estimator with that of its self-normalised alternative… This is an interesting case in that self-normalisation does considerably degrade the quality of the approximation in that setting. In other cases, self-normalisation may bring a clear improvement. (This reminded [...]

Measuring Site Engagement: Pages or Sessions

March 11, 2012

One of our clients is a large media website that faced a simple question: What is the best way to find the most engaged users on the website? The goal was to focus a marketing effort on these users. A media website is challenging, because there is n...

Discussion forum added to blog

March 11, 2012

A discussion forum has been added to the blog. The forum is linked in the right sidebar of the blog. My thanks to Anne Standish for searching out information about how to set up the forum, and directing me to it.

Bayesian estimation supersedes the t test

March 10, 2012

[Updated here.] Bayesian estimation for two groups provides complete distributions of credible values for the effect size, group means and their difference, standard deviations and their difference, and the normality of the data. The method handles o...

German train monitor provides access to train delay data

March 10, 2012

The German newspaper Süddeutsche Zeitung (SZ) worked together with OpenDataCity to create an online train monitor of the German network: Zugmonitor. This is another great example of the new form of data journalism. The project provides access to dat...

Monkeying with Bayes’ theorem

March 9, 2012

In Peter Norvig’s talk The Unreasonable Effectiveness of Data, starting at 37:42, he describes a translation algorithm based on Bayes’ theorem. Pick the English word that has the highest posterior probability as the translation. No surprise here. Then at 38:16 he says something curious. So this is all nice and theoretical and pure, but as well [...]

find | xargs … Like a Boss

March 9, 2012

*Edit March 12* Be sure to look at the comments, especially the commentary on Hacker News - you can supercharge the find|xargs idea by using find|parallel instead. --- Do you ever discover a trick to do something better, faster, or easier, and wish yo...

Experience on using R to build prediction models in business applications

March 8, 2012

By Yanchang Zhao, RDataMining.com. Building prediction/classification models is one of the most common data mining tasks in business applications. To share experience on building prediction models with R, I have started a discussion in the RDataMining group on LinkedIn with the …

