A bit more on impact coding

August 2, 2012

Dr. Nina Zumel recently published an excellent tutorial on a modeling technique she calls impact coding. It is a pragmatic machine learning technique that has helped on more than one client project. Impact coding is a bridge from Naive Bayes (where each variable’s impact is added without regard to the known effects of any other [...]
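
Impact coding replaces each level of a high-cardinality categorical variable with one number that summarizes that level's effect on the outcome. The Python sketch below assumes the simplest variant for a numeric outcome: a level's code is the mean outcome within that level minus the overall mean. The function names are hypothetical, and the tutorial's own formulation (for example, for categorical outcomes, or with smoothing) may differ.

```python
from collections import defaultdict


def fit_impact_codes(levels, y):
    """Return {level: mean(y | level) - mean(y)} for a numeric outcome y.

    Unsmoothed sketch (an assumption): rare levels get noisy codes, so
    practical versions often shrink codes toward zero.
    """
    if len(levels) != len(y) or not y:
        raise ValueError("levels and y must be non-empty and the same length")
    grand_mean = sum(y) / len(y)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for level, target in zip(levels, y):
        sums[level] += target
        counts[level] += 1
    return {level: sums[level] / counts[level] - grand_mean for level in sums}


def apply_impact_codes(codes, levels):
    """Replace each level by its code; levels unseen during fitting map to 0.0."""
    return [codes.get(level, 0.0) for level in levels]


codes = fit_impact_codes(["a", "a", "b", "c"], [1.0, 3.0, 2.0, 6.0])
print(codes)  # {'a': -1.0, 'b': -1.0, 'c': 3.0}
print(apply_impact_codes(codes, ["c", "z"]))  # [3.0, 0.0]
```

Mapping unseen levels to 0.0 means "no information beyond the overall mean". It is generally advisable to fit the codes on data held out from the downstream model's training rows, since codes fit on the same rows tend to overstate each level's effect.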

JSM 2012 in the rearview: reflections on the world’s largest gathering of statisticians

August 2, 2012

The Joint Statistical Meetings (JSM) is an annual gathering of several large professional organizations of statisticians; each year we descend on a city to share ideas. I'm a perennial attendee, and I always find the conference valuable in several ways. ...

Stephen Senn: Fooling the Patient: an Unethical Use of Placebo? (Phil/Stat/Med)

August 2, 2012

Stephen Senn, Competence Centre for Methodology and Statistics, CRP Santé, Strassen, Luxembourg

I think the placebo gets a bad press with ethicists. Many do not seem to understand that the only purpose of a placebo as a control in a randomised clinical trial is to permit the trial to be run as double-blind. A common [...]

Nature: Why great Olympic feats raise suspicions

August 2, 2012

[8/9/2012 update]: I got an automated email from Nature: “The following post you wrote on the Nature News website has been hidden by the moderator in accordance with our terms and conditions.” To editor Brian Owens: I wonder whether the so-called ‘performance profiling’, as mentioned in the title of this article, could be used to judge [...]

“Based on my experiences, I think you could make general progress by constructing a solution to your specific problem.”

August 2, 2012

David Radwin writes: I am seeking a statistic measuring an estimate’s reliability or stability as an alternative to the coefficient of variation (CV), also known as the relative standard error. The CV is the standard error of an estimate (proportion, mean, regression coefficient, etc.) divided by the estimate itself, usually expressed as a percentage. For [...]
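
The definition quoted above is a single ratio, so a small Python sketch may help make it concrete. It assumes the estimate is a sample mean with standard error s/√n; the helper names and the data values are made up for illustration. It also shows one well-known weakness of the CV: as the estimate approaches zero, the ratio blows up regardless of how precise the estimate is.

```python
import math
import statistics


def coefficient_of_variation(estimate, standard_error):
    """CV (relative standard error) as a percentage: 100 * SE / estimate."""
    if estimate == 0:
        raise ValueError("the CV is undefined when the estimate is zero")
    return 100.0 * standard_error / estimate


def cv_of_sample_mean(sample):
    """CV of a sample mean, using SE = s / sqrt(n) with the sample standard deviation s."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return coefficient_of_variation(statistics.mean(sample), se)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample; mean 5, SE = sqrt(4/7)
print(round(cv_of_sample_mean(data), 2))  # 15.12

# Same standard error, estimate near zero: the CV becomes huge.
print(coefficient_of_variation(0.01, 0.5) > 1000)  # True
```

Note that dividing by the estimate itself, as in the definition, gives a negative CV for a negative estimate; some analysts divide by its absolute value instead.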

“A Christmas Carol” as applied to plagiarism

August 2, 2012

John Mashey sends me this delightful video (not in English but it has subtitles) from the University of Bergen (link comes from this page from Elsevier but I don’t see any direct connection between the controversial academic publisher and the Bergen group). Part of me believes, deep down, that if someone were to send this [...]

CFP: AusDM 2012, deadline extended to 31 August 2012

August 2, 2012

The Tenth Australasian Data Mining Conference (AusDM 2012)
Sydney, Australia, 5-7 December 2012
http://ausdm12.togaware.com/
Deadline extended to 31 August 2012

The Australasian Data Mining Conference has established itself as the premier Australasian meeting for both practitioners and researchers in data …

Communicating, coding and intuition for data scientists

August 2, 2012

There is a stimulating conversation going on between Cathy O'Neil (mathbabe) and CMU Prof. Cosma Shalizi about whether "data science" is different from "statistics". Cathy started by posting some comments about "how to hire data scientists". Cosma responded with "white is the new black": a "modern" statistics undergraduate training would prepare one well for such jobs. Cathy disagreed on several fronts, favoring PhD training (to be able to…

Put your pre-prints online

August 2, 2012

I have argued previously that research papers should be posted online at the same time as they are submitted to a journal. Sometimes people claim that journals don’t allow it, which is nonsense. Almost every journal allows it, and many also allow the...

Racing Against History

August 2, 2012

In a lovely little 3D movie created by the New York Times, we see how every Olympic medalist in the men's 100-meter freestyle would stack up against the others. France's Alain Bernard (2008) would win, with a wide distribution of Olympi...

Livehoods – Behavioural Neighborhood Mapping

August 1, 2012

I'm late to this, but it is certainly worth posting. A team of researchers at CMU has been mining Foursquare check-in data to determine behaviourally defined neighborhoods ('livehoods'). They have put together a site, livehoods.org, which...

Genetic algorithms: a simple R example

August 1, 2012

A genetic algorithm (GA) is a search heuristic. GAs can generate a vast number of candidate solutions to a model and use them to evolve toward an approximation of the best solution, mimicking evolution in nature. A GA generates a population, and the individuals in this population (often called chromosomes) have [...]

The post Genetic algorithms: a simple R example appeared first on FishyOperations.
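
The loop described above (generate a population of chromosomes, then evolve it toward better solutions) can be sketched in a few lines. This is not the post's R code: the Python language, the bit-string encoding, the toy "count the ones" fitness function, tournament selection, single-point crossover, and all parameter values are illustrative assumptions.

```python
import random

N_BITS = 20  # chromosome length (assumed for the toy problem)


def fitness(chromosome):
    # Toy objective ("OneMax"): the number of 1-bits; the optimum is all ones.
    return sum(chromosome)


def tournament(population, rng, k=3):
    # Selection: sample k individuals at random and keep the fittest.
    return max(rng.sample(population, k), key=fitness)


def crossover(parent_a, parent_b, rng):
    # Single-point crossover: a prefix of one parent joined to the suffix of the other.
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def mutate(chromosome, rng, rate):
    # Mutation: flip each bit independently with probability `rate`.
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


def run_ga(pop_size=30, generations=60, mutation_rate=0.02, seed=1):
    rng = random.Random(seed)
    # Initial population of random bit-string chromosomes.
    population = [[rng.randint(0, 1) for _ in range(N_BITS)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        # Each child comes from two tournament-selected parents, then mutation.
        children = []
        for _ in range(pop_size):
            child = crossover(tournament(population, rng), tournament(population, rng), rng)
            children.append(mutate(child, rng, mutation_rate))
        population = children
        # Track the best chromosome seen so far (it is not reinserted into the population).
        best = max(population + [best], key=fitness)
        history.append(fitness(best))
    return best, history


best, history = run_ga()
print(fitness(best), history[0], history[-1])
```

Because `best` is the best chromosome seen so far, `history` can never decrease; how quickly it climbs depends on the population size, selection pressure `k`, and mutation rate.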

Overview of Nonparametric Techniques with Elaine Eisenbeisz

August 1, 2012

Data that are not normally distributed are not abnormal. Many data analysis techniques do not require the assumption of normality. This webinar will explain when it is best to use nonparametric alternatives and suggest tests to use in lieu of: Independent samples [...]

Olympic body match and 1:1 BMI

August 1, 2012

In my morning attempt to read the whole internet before beginning work, I came across a program on the BBC website that lets you see which Olympic athletes are your body doubles, or rather, which athletes share your height and weight, and therefore your body mass index. Being a Canadian, I exist in an [...]
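
Matching on height and weight pins down BMI, because BMI depends on those two numbers alone: weight in kilograms divided by the square of height in metres. The converse does not hold, as this small Python sketch shows. The two example bodies are made up for illustration.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


# Two very different (hypothetical) bodies with the same BMI.
print(round(bmi(81, 1.80), 1), round(bmi(64, 1.60), 1))  # 25.0 25.0
```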

A book with a bunch of simple graphs

August 1, 2012

Howard Friedman sent me a new book, The Measure of a Nation, subtitled How to Regain America’s Competitive Edge and Boost Our Global Standing. Without commenting on the substance of Friedman’s recommendations, I’d like to endorse his strategy of presentation, which is to display graph after graph after graph showing the same message over and [...]

Cscan: Finding Gene Expression Regulators with ENCODE ChIP-Seq Data

August 1, 2012

Recently published in Nucleic Acids Research: F. Zambelli, G. M. Prazzoli, G. Pesole, G. Pavesi, "Cscan: finding common regulators of a set of genes by using a collection of genome-wide ChIP-seq datasets," Nucleic Acids Research 40, W510–W515 (2012). ...

Examples of profiling R code

August 1, 2012

by Yanchang Zhao, RDataMining.com

Below are simple examples of profiling R code, which help identify the steps or functions that are most time-consuming. Profiling is very useful for improving the efficiency of R code.

# profiling of running time …

How to get data values out of ODS graphics

August 1, 2012

Many SAS procedures can produce ODS statistical graphics as naturally as they produce tables. Did you know that it is possible to obtain the numbers underlying an ODS statistical graph? This post shows how. Suppose that a SAS procedure creates a graph that displays a curve and that you want [...]

Rook rocks! Example with googleVis

August 1, 2012

What is Rook?

Rook is a web server interface for R, written by Jeffrey Horner, the author of rApache and brew. But unlike other web frameworks for R, such as brew, R.rsp (which I have used in the past), Rserve, gWidgetsWWW or sumo (which I haven't ...

Bare bones beamer

August 1, 2012

Beamer is far and away the most popular software for presentations amongst researchers in mathematics and statistics. Most conference and seminar talks I attend these days use beamer. Unfortunately, they all look much the same. I think people find beamer themes too hard to modify easily, so a small number of templates get shared around. Even the otherwise wonderful LaTeX Templates site has no beamer examples. The beamer user guide…

Differential Privacy

August 1, 2012

Privacy and confidentiality are of great concern in our era of Big Data. In this post, I want to discuss one formal approach to privacy, called differential privacy. The idea was introduced by Dwork, McSherry, Nissim and Smith (2006). Cynthia Dwork has also written a nice review of it.

1. What Is It? [...]

What’s in a Name? (Gelman’s blog)

August 1, 2012

I just noticed Andrew Gelman’s blog post today; it is too good to let pass without a quick comment. He asks: What is a Bayesian? Deborah Mayo recommended that I consider coming up with a new name for the statistical methods that I used, given that the term “Bayesian” has all sorts of associations that I dislike (as discussed, [...]

Paying survey respondents

July 31, 2012

I agree with Casey Mulligan that participants in government surveys should be paid, and I think it should be part of the code of ethics for commercial pollsters to compensate their respondents also. As Mulligan points out, if a survey is worth doing, it should be worth compensating the participants for their time and effort. [...]
