## A bit more on impact coding

August 2, 2012

Dr. Nina Zumel recently published an excellent tutorial on a modeling technique she called impact coding. It is a pragmatic machine learning technique that has helped with more than one client project. Impact coding is a bridge from Naive Bayes (where each variable’s impact is added without regard to the known effects of any other [...]

## JSM 2012 in the rearview: reflections on the world’s largest gathering of statisticians

August 2, 2012

The Joint Statistical Meetings is an annual gathering of several large professional organizations of statisticians, and each year we descend on some city to share ideas. I'm a perennial attendee, and always find the conference valuable in several ways. ...

## Stephen Senn: Fooling the Patient: an Unethical Use of Placebo? (Phil/Stat/Med)

August 2, 2012

Stephen Senn, Competence Centre for Methodology and Statistics, CRP Santé, Strassen, Luxembourg. I think the placebo gets a bad press with ethicists. Many do not seem to understand that the only purpose of a placebo as a control in a randomised clinical trial is to permit the trial to be run as double-blind. A common [...]

## Nature: Why great Olympic feats raise suspicions

August 2, 2012

[8/9/2012 update]: Got an automated email from Nature: The following post you wrote on the Nature News website has been hidden by the moderator in accordance with our terms and conditions. To editor Brian Owens: I wonder the so called ‘performance profiling’, as mentioned in the title of this article, could be used to judge [...]

## “Based on my experiences, I think you could make general progress by constructing a solution to your specific problem.”

August 2, 2012

David Radwin writes: I am seeking a statistic measuring an estimate’s reliability or stability as an alternative to the coefficient of variation (CV), also known as the relative standard error. The CV is the standard error of an estimate (proportion, mean, regression coefficient, etc.) divided by the estimate itself, usually expressed as a percentage. For [...]
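
The CV definition in the excerpt above (standard error divided by the estimate, expressed as a percentage) can be illustrated with a minimal Python sketch. The function name and the example numbers (a proportion of 0.20 from a simple random sample of 400) are assumptions made for illustration; they do not come from the original post.

```python
import math

def coefficient_of_variation(estimate, std_error):
    """Return the CV (relative standard error) as a percentage."""
    if estimate == 0:
        raise ValueError("CV is undefined when the estimate is zero")
    return 100.0 * std_error / estimate

# Hypothetical example: a proportion estimated from a simple random sample.
p, n = 0.20, 400
se = math.sqrt(p * (1 - p) / n)       # standard error of a proportion
cv = coefficient_of_variation(p, se)  # se / p, as a percentage
print(f"SE = {se:.3f}, CV = {cv:.1f}%")
```

Because the estimate is in the denominator, the CV grows without bound as the estimate approaches zero, which is one reason an alternative measure of reliability might be sought.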

## “A Christmas Carol” as applied to plagiarism

August 2, 2012

John Mashey sends me this delightful video (not in English but it has subtitles) from the University of Bergen (link comes from this page from Elsevier but I don’t see any direct connection between the controversial academic publisher and the Bergen group). Part of me believes, deep down, that if someone were to send this [...]

## CFP: AusDM 2012, deadline extended to 31 August 2012

August 2, 2012

The Tenth Australasian Data Mining Conference (AusDM 2012), Sydney, Australia, 5-7 December 2012. http://ausdm12.togaware.com/ Deadline extended to 31 August 2012. The Australasian Data Mining Conference has established itself as the premier Australasian meeting for both practitioners and researchers in data …

## Communicating, coding and intuition for data scientists

August 2, 2012

There is a stimulating conversation going on between Cathy O'Neil (mathbabe) and CMU Prof. Cosma Shalizi about whether "data science" is different from "statistics". Cathy started by posting some comments about "how to hire data scientists". Cosma responded with "white is the new black": a "modern" statistics undergraduate training would prepare one well for such jobs. Cathy disagreed on several fronts, favoring PhD training (to be able to…

August 2, 2012

I have argued previously that research papers should be posted online at the same time as they are submitted to a journal. Sometimes people claim that journals don’t allow it, which is nonsense. Almost every journal allows it, and many also allow the...

## Racing Against History

August 2, 2012

In a lovely little 3D movie [nytimes.com] created by the New York Times, we see how every Olympic medalist in the Men's 100-meter freestyle event would stack up to each other. France's Alain Bernard would win (2008), with a wide distribution of Olympi...

## Livehoods – Behavioural Neighborhood Mapping

August 1, 2012

I'm late to this, but it is certainly worth posting. A team of researchers at CMU have been working on mining foursquare checkin data to determine behaviourally defined neighborhoods ('livehoods'). They have put together a site - livehoods.org - which...

## Genetic algorithms: a simple R example

August 1, 2012

A genetic algorithm (GA) is a search heuristic. GAs can generate a vast number of possible model solutions and use these to evolve towards an approximation of the best solution of the model, thereby mimicking evolution in nature. A GA generates a population; the individuals in this population (often called chromosomes) have [...]
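
The population-and-evolution loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the code from the original post (which uses R): the objective (counting 1-bits, often called OneMax), the truncation selection, the one-point crossover, and every parameter value are choices made for this example.

```python
import random

def fitness(chromosome):
    # Toy objective (an assumption): the number of 1-bits in the chromosome.
    return sum(chromosome)

def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Start from a random population of bit-string chromosomes.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]   # truncation selection
        children = [population[0][:]]           # elitism: keep the best as-is
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)      # one-point crossover
            child = mother[:cut] + father[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = evolve()
print("best fitness per generation:", history)
```

Because the best chromosome is copied unchanged into each new generation, the best fitness recorded in `history` never decreases; without that elitism step, a good solution could be lost to crossover or mutation.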

## Overview of Nonparametric Techniques with Elaine Eisenbeisz

August 1, 2012

A distribution of data which is not normal does not mean it is abnormal. There are many data analysis techniques which do not require the assumption of normality. This webinar will explain when it is best to use nonparametric alternatives and suggest tests to use in lieu of: Independent samples [...]

## Olympic body match and 1:1 BMI

August 1, 2012

In my morning attempt to read the whole internet before beginning work, I came across a program on the BBC website which allows you to see which Olympic athletes are your body doubles. Or rather, which athletes share your height and weight, and therefore your body mass index. Being a Canadian, I exist in an

## A book with a bunch of simple graphs

August 1, 2012

Howard Friedman sent me a new book, The Measure of a Nation, subtitled How to Regain America’s Competitive Edge and Boost Our Global Standing. Without commenting on the substance of Friedman’s recommendations, I’d like to endorse his strategy of presentation, which is to display graph after graph after graph showing the same message over and [...]

## Cscan: Finding Gene Expression Regulators with ENCODE ChIP-Seq Data

August 1, 2012

Recently published in Nucleic Acids Research: F. Zambelli, G. M. Prazzoli, G. Pesole, G. Pavesi, Cscan: finding common regulators of a set of genes by using a collection of genome-wide ChIP-seq datasets, Nucleic Acids Research 40, W510–5 (2012). ...

## Examples of profiling R code

August 1, 2012

by Yanchang Zhao, RDataMining.com. Below are simple examples of profiling R code, which help to find out which steps or functions are most time-consuming. Profiling is very useful for improving the efficiency of R code. # profiling of running time …

## How to get data values out of ODS graphics

August 1, 2012

Many SAS procedures can produce ODS statistical graphics as naturally as they produce tables. Did you know that it is possible to obtain the numbers underlying an ODS statistical graph? This post shows how. Suppose that a SAS procedure creates a graph that displays a curve and that you want [...]

## Rook rocks! Example with googleVis

August 1, 2012

What is Rook? Rook is a web server interface for R, written by Jeffrey Horner, the author of rApache and brew. But unlike other web frameworks for R, such as brew, R.rsp (which I have used in the past), Rserve, gWidgetsWWW or sumo (which I haven't ...

## Bare bones beamer

August 1, 2012

Beamer is far and away the most popular software for presentations amongst researchers in mathematics and statistics. Most conference and seminar talks I attend these days use beamer. Unfortunately, they all look much the same. I think people find beamer themes too hard to modify easily, so a small number of templates get shared around. Even the otherwise wonderful LaTeX Templates site has no beamer examples. The beamer user guide…

## Differential Privacy

August 1, 2012

Privacy and confidentiality are of great concern in our era of Big Data. In this post, I want to discuss one formal approach to privacy, called differential privacy. The idea was invented by Dwork, McSherry, Nissim and Smith (2006). A nice review by Cynthia Dwork can be found here. 1. What Is It? [...]

## What’s in a Name? (Gelman’s blog)

August 1, 2012

I just noticed Andrew Gelman’s blog today. It is too good to let pass without a quick comment. He asks: What is a Bayesian? Deborah Mayo recommended that I consider coming up with a new name for the statistical methods that I used, given that the term “Bayesian” has all sorts of associations that I dislike (as discussed, [...]

## Paying survey respondents

July 31, 2012

I agree with Casey Mulligan that participants in government surveys should be paid, and I think it should be part of the code of ethics for commercial pollsters to compensate their respondents also. As Mulligan points out, if a survey is worth doing, it should be worth compensating the participants for their time and effort. [...]