Dirichlet Process, Infinite Mixture Models, and Clustering

April 7, 2013

The Dirichlet process provides a very interesting approach to understanding group assignments and models for clustering effects. We often encounter the k-means approach; however, it requires the number of clusters to be fixed in advance. Often we encounter situations where we don't know how many clusters we need. Suppose we're trying to identify [...]
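
The Dirichlet process avoids fixing the number of clusters because that number is itself random. One standard way to sample cluster assignments from it is the Chinese restaurant process. Below is a minimal sketch of that sampler, not the post's own code; the function name and the concentration parameter value `alpha` are illustrative assumptions.

```python
import random
from collections import Counter


def chinese_restaurant_process(n_customers, alpha, seed=None):
    """Sample table (cluster) labels from a Chinese restaurant process.

    With i customers already seated, the next one joins existing table k
    with probability n_k / (i + alpha) and opens a new table with
    probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    assignments = []
    counts = []  # counts[k] = number of customers at table k
    for _ in range(n_customers):
        # weights for each existing table, then the "new table" option
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new table
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments


labels = chinese_restaurant_process(500, alpha=2.0, seed=1)
print("clusters used:", len(set(labels)))
print("largest clusters:", Counter(labels).most_common(3))
```

A larger `alpha` opens new clusters more readily. The expected number of clusters grows only logarithmically in the number of points, roughly `alpha * log(1 + n / alpha)`, so the data rather than a fixed k determine how many clusters appear.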

A quick guide to non-transitive Grime Dice

April 7, 2013

A very special package that I am rather excited about arrived in the mail recently. The package contained a set of 6-sided dice. These dice, however, don’t have the standard numbers one to six on their faces. Instead, they have assorted numbers between zero and nine. Here’s the exact configuration: Aside from maybe making for […]
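
You can check non-transitivity by enumerating all 36 face pairs for each matchup. This is a minimal sketch. The face values in `DICE` are assumed to be the commonly published Grime dice set, so check them against the post's own configuration.

```python
from fractions import Fraction
from itertools import product

# Assumed face values (commonly published Grime dice set).
DICE = {
    "red": [4, 4, 4, 4, 4, 9],
    "blue": [2, 2, 2, 7, 7, 7],
    "olive": [0, 5, 5, 5, 5, 5],
    "yellow": [3, 3, 3, 3, 8, 8],
    "magenta": [1, 1, 6, 6, 6, 6],
}


def win_probability(a, b):
    """Exact probability that die a rolls strictly higher than die b."""
    wins = sum(1 for x, y in product(a, b) if x > y)
    return Fraction(wins, len(a) * len(b))


# Each die in this cycle beats the next one more than half the time,
# yet the cycle returns to where it started.
cycle = ["red", "blue", "olive", "yellow", "magenta", "red"]
for first, second in zip(cycle, cycle[1:]):
    p = win_probability(DICE[first], DICE[second])
    print(f"P({first} beats {second}) = {p}")
```

Exact fractions avoid any rounding doubt about whether a matchup is really better than 1/2.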

Scatterplot charades!

April 7, 2013

What are the x and y-axes here? P.S. Popeye nails it (see comments).

X on JLP

April 7, 2013

Christian Robert writes on the Jeffreys-Lindley paradox. I have nothing to add to this beyond my recent comments: To me, the Lindley paradox falls apart because of its noninformative prior distribution on the parameter of interest. If you really think there’s a high probability the parameter is nearly exactly zero, I don’t see the point [...]
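
The dependence on the prior is easy to see numerically. This minimal sketch uses the textbook normal setup, which is not taken from either post. The sample mean is N(θ, σ²/n), the null is the point hypothesis θ = 0, and under the alternative θ ~ N(0, τ²). The z-statistic, and hence the p-value, is held fixed while the prior scale τ varies. The function names and parameter values are illustrative assumptions.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def bayes_factor_01(z, n, sigma=1.0, tau=1.0):
    """Bayes factor for H0: theta = 0 against H1: theta ~ N(0, tau^2),
    given a sample mean xbar ~ N(theta, sigma^2 / n) lying z standard errors from 0."""
    se2 = sigma ** 2 / n
    xbar = z * math.sqrt(se2)
    # marginal of xbar is N(0, se2) under H0 and N(0, se2 + tau^2) under H1
    return normal_pdf(xbar, 0.0, se2) / normal_pdf(xbar, 0.0, se2 + tau ** 2)


# z = 2.5 (two-sided p of about 0.012) for every prior scale below
for tau in [0.1, 1.0, 10.0, 1000.0]:
    print(f"tau = {tau:>7}: BF01 = {bayes_factor_01(2.5, n=100, tau=tau):.3g}")
```

With the same "significant" data, a flatter prior under the alternative pushes the Bayes factor further toward the null. This is the sensitivity to a noninformative prior that the comment points to.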

Sync

April 7, 2013

I am listening to the audiobook Sync: How Order Emerges from Chaos in the Universe, Nature, and Daily Life by Steven Strogatz, which I got from Audible. Obviously a mathematical book is not ideal to listen to, but lacking illustrations I can ma...

Travis CI for R?

April 7, 2013

I'm always worried about CRAN: a system maintained by FTP and emails from real humans (basically one of Uwe, Kurt or Prof Ripley). I'm worried for two reasons: the number of R packages is growing exponentially; time and time again I see frustrations ...

Retirement : simulating wealth with random returns, inflation and withdrawals – Shiny web application

April 6, 2013

Today, I want to share the Retirement : simulating wealth with random returns, inflation and withdrawals – Shiny web application (code at GitHub). This application was developed and contributed by Pierre Chretien; I only made minor updates. This application is a great example of how easy it is to convert your R script into […]
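
The core of such an application is a Monte Carlo loop over wealth paths. This is a minimal Python sketch of the idea, not the app's R code. It assumes normally distributed annual returns and inflation, growth applied before each withdrawal, and withdrawals that rise with inflation. The app's exact model may differ, and all parameter values below are hypothetical.

```python
import random
import statistics


def simulate_wealth(start_wealth, annual_withdrawal, years, mean_return,
                    return_sd, mean_inflation, inflation_sd, n_sims=10_000, seed=None):
    """Monte Carlo final wealth under random returns, inflation, and withdrawals.

    Each year the portfolio earns a random return, the withdrawal is taken
    out, and the withdrawal then grows with random inflation. A path that
    reaches zero is treated as ruined and stays at zero.
    """
    rng = random.Random(seed)
    final_wealth = []
    for _ in range(n_sims):
        wealth, withdrawal = start_wealth, annual_withdrawal
        for _ in range(years):
            wealth *= 1 + rng.gauss(mean_return, return_sd)
            wealth -= withdrawal
            withdrawal *= 1 + rng.gauss(mean_inflation, inflation_sd)
            if wealth <= 0:
                wealth = 0.0
                break
        final_wealth.append(wealth)
    return final_wealth


# Hypothetical inputs: 1M start, 50k/year withdrawals, 30 years.
results = simulate_wealth(1_000_000, 50_000, years=30, mean_return=0.05,
                          return_sd=0.12, mean_inflation=0.025,
                          inflation_sd=0.015, seed=42)
ruin = sum(w == 0 for w in results) / len(results)
print(f"probability of running out of money: {ruin:.1%}")
print(f"median final wealth: {statistics.median(results):,.0f}")
```

The median and the probability of ruin summarize the simulated distribution. They are more informative than a single deterministic projection at the average return.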

Who is allowed to cheat? I.J. Good and that after dinner comedy hour….

April 6, 2013

It was from my Virginia Tech colleague I.J. Good (in statistics), who died four years ago (April 5, 2009), at 93, that I learned most of what I call “howlers” on this blog. His favorites were based on the “paradoxes” of stopping rules. “In conversation I have emphasized to other statisticians, starting in 1950, that, […]
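
Arguments about stopping rules often turn on the "sample until significant" scenario. This minimal simulation is not from the post. It assumes N(0, 1) data with known variance, a true null hypothesis, and a z-test rerun after every observation. It shows how far the actual error rate drifts from the nominal 5%. The function name and settings are illustrative.

```python
import math
import random


def rejects_with_optional_stopping(max_n, rng, z_crit=1.96):
    """Draw N(0, 1) data one point at a time and stop as soon as the running
    z-test of H0: mean = 0 rejects at the nominal 5% level (or at max_n)."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)
        z = total / math.sqrt(n)  # sigma is known to be 1
        if abs(z) > z_crit:
            return True
    return False


rng = random.Random(0)
trials = 2000
rate = sum(rejects_with_optional_stopping(500, rng) for _ in range(trials)) / trials
print(f"rejection rate under a true null: {rate:.1%}")
```

With a single look (`max_n=1`) the rejection rate is about 5%. Allowing up to 500 looks inflates it several-fold, even though the null is true in every trial.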

Calling Jenny Davidson . . .

April 6, 2013

Now that you have some free time again, you’ll have to check out these books and tell us if they’re worth reading. Claire Kirch reports: Lizzie Skurnick Books launches in September with the release of Debutante Hill by Lois Duncan. The novel, which was originally published by Dodd, Mead, in 1958, has been out of [...]

Bootstrap and regression

April 6, 2013

In the last class, we discussed using the bootstrap to obtain confidence intervals on predictions. I am posting the code typed in class (very briefly commented; I can point to old posts from the ACT6420 course for more details). We will work with my favorite dataset to discuss linear regression (before talking about loss reserving triangles, let's spend five minutes on simple things). >…
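
One common way to bootstrap intervals on regression predictions is to resample residuals around the fitted line. The sketch below uses that residual bootstrap, which may not be the variant used in class. The course code is not reproduced here, so the data are synthetic and the function names are hypothetical.

```python
import random
import statistics


def ols(xs, ys):
    """Least-squares intercept and slope for y = a + b * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def bootstrap_prediction_interval(xs, ys, x0, n_boot=2000, level=0.90, seed=None):
    """Residual-bootstrap percentile interval for a new observation at x0."""
    rng = random.Random(seed)
    a, b = ols(xs, ys)
    fitted = [a + b * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    draws = []
    for _ in range(n_boot):
        # rebuild a sample by resampling residuals around the fitted line
        y_star = [f + rng.choice(residuals) for f in fitted]
        a_star, b_star = ols(xs, y_star)
        # add one fresh residual for the noise of a single new observation
        draws.append(a_star + b_star * x0 + rng.choice(residuals))
    draws.sort()
    lo_idx = round((1 - level) / 2 * (n_boot - 1))
    hi_idx = round((1 + level) / 2 * (n_boot - 1))
    return draws[lo_idx], draws[hi_idx]


rng = random.Random(1)
xs = [float(x) for x in range(1, 51)]
ys = [3.0 + 2.0 * x + rng.gauss(0.0, 4.0) for x in xs]  # synthetic data
print(bootstrap_prediction_interval(xs, ys, x0=25.0, seed=2))
```

Dropping the fresh residual added in the loop gives an interval for the mean response at `x0` instead of for a new observation.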

Worry about correctness and repeatability, not p-values

April 5, 2013

In data science work you often run into cryptic sentences like the following: Age adjusted death rates per 10,000 person years across incremental thirds of muscular strength were 38.9, 25.9, and 26.6 for all causes; 12.1, 7.6, and 6.6 for cardiovascular disease; and 6.1, 4.9, and 4.2 for cancer (all P < 0.01 for linear […]

Academic Impostor Syndrome

April 5, 2013

This is a little outside my usual blogging oeuvre, but I saw an article in the Chronicle that I really think is worth a read: http://chronicle.com/article/An-Academic-With-Impostor/138231/ It’s something that strongly spoke to my experience as an academic. Methodologists are often required to demonstrate the utility of our method by using it to critique existing research. […]

David Brooks writes that technical knowledge (“the statistical knowledge you need to understand what market researchers do, the biological knowledge you need to grasp the basics of what nurses do”) can be “memorized by rote”

April 5, 2013

The popular New York Times columnist writes: The best part of the rise of online education is that it forces us to ask: What is a university for? . . . My own stab at an answer would be that universities are places where young people acquire two sorts of knowledge, what the philosopher Michael [...]

Super-efficiency: “The Nasty, Ugly Little Fact”

April 5, 2013

I just read Steve Stigler’s wonderful article entitled “The Epic Story of Maximum Likelihood.” I don’t know why I didn’t read this paper earlier. Like all of Steve’s papers, it is at once entertaining and scholarly. I highly recommend it to everyone. As the title suggests, the paper discusses […]
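
The textbook example of super-efficiency is Hodges' estimator, which replaces the sample mean by 0 whenever the mean is small. This minimal Monte Carlo sketch is not taken from the post. It assumes N(θ, 1) data, for which the sample mean has n·MSE = 1 at every θ, and it draws the sample mean directly from N(θ, 1/n). The function names and settings are illustrative.

```python
import math
import random


def hodges(xbar, n):
    """Hodges' estimator: report 0 when |xbar| < n ** -0.25, else xbar."""
    return xbar if abs(xbar) >= n ** -0.25 else 0.0


def scaled_mse(theta, n, reps=20_000, seed=0):
    """Monte Carlo estimate of n * MSE of Hodges' estimator for N(theta, 1) data,
    drawing the sample mean directly as N(theta, 1/n)."""
    rng = random.Random(seed)
    se = 1.0 / math.sqrt(n)
    err2 = 0.0
    for _ in range(reps):
        xbar = rng.gauss(theta, se)
        err2 += (hodges(xbar, n) - theta) ** 2
    return n * err2 / reps


n = 10_000
print("theta = 0:         ", scaled_mse(0.0, n))
print("theta = 2/sqrt(n): ", scaled_mse(2 / math.sqrt(n), n))
```

At θ = 0 the estimator beats the sample mean's n·MSE of 1. Just next to 0, at θ = 2/√n, it is about four times worse. The gain at a single point is paid for in a shrinking neighborhood around it.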

US Census Bureau Named a 2013 Computerworld Honors Laureate for Open Data API

April 5, 2013

From: http://blog.programmableweb.com/2013/04/04/us-census-bureau-named-a-2013-computerworld-honors-laureate-for-open-data-api/ (Janet Wagner, April 4th, 2013). Computerworld has named the US Census Bureau a 2013 Honors Laureate for the development of the...

Elites have alcohol problems too!

April 5, 2013

Speaking of Tyler Cowen, I’m puzzled by this paragraph of his: Guns, like alcohol, have many legitimate uses, and they are enjoyed by many people in a responsible manner. In both cases, there is an elite which has absolutely no problems handling the institution in question, but still there is the question of whether the [...]

Data science is statistics

April 5, 2013

When physicists do mathematics, they don’t say they’re doing “number science”. They’re doing math. If you’re analyzing data, you’re doing statistics. You can call it data science or informatics or analytics or whatever, but it’s still statistics. If you say that one kind of data analysis is statistics and another kind is not, you’re not […]

Wanna be the next Tyler Cowen? It’s not as easy as you might think!

April 5, 2013

Someone told me he ran into someone who said his goal was to be Tyler Cowen. OK, fine, it’s a worthy goal, but I don’t think it’s so easy.

Announcing eeptools 0.2

April 5, 2013

My R package eeptools has reached version 0.2. As with the last release, this is still a preliminary release which means that functionality is not full, function names and code behavior may still change from version to version, and I am still looking f...

List of Bioinformatics Workshops and Training Resources

April 4, 2013

I frequently get asked to recommend workshops or online learning resources for bioinformatics, genomics, statistics, and programming. I compiled a list of both online learning resources and in-person workshops (preferentially highlighting those where w...

When is there “hidden structure in data” to be discovered?

April 4, 2013

Michael Collins sent along the following announcement for a talk: Fast learning algorithms for discovering the hidden structure in data Daniel Hsu, Microsoft Research 11am, Wednesday April 10th, Interschool lab, 7th floor CEPSR, Columbia University A major challenge in machine learning is to reliably and automatically discover hidden structure in data with minimal human intervention. [...]

The Revolution Will Be Visualized

April 4, 2013

In the 1970s, it was the protest songs. In the 1980s, it was the anti-war movies. Today, the protest is no longer happening in songs or movies. Today, it’s online, based on data, and using visualization. Gun Deaths It’s a very abstract and yet very clear image: something moves along a trajectory, is suddenly stopped, and drops to the ground. A gun has been fired, somebody has been killed. Periscopic’s…
