## Performing Logistic Regression in R and SAS

Introduction: My statistics education focused a lot on normal linear least-squares regression, and I was even told by a professor in an introductory statistics class that 95% of statistical consulting can be done with knowledge learned up to and including a course in linear regression. Unfortunately, that advice has turned out to vastly underestimate the […]

## The hype cycle starts again

November 24, 2014
Completely uncritical press coverage of a speculative analysis. But, hey, it was published in the prestigious Proceedings of the National Academy of Sciences (PPNAS)! What could possibly go wrong? Here's what Erik Larsen writes: In a paper published in the Proceedings of the National Academy of Sciences, People search for meaning when they approach a […]

## On deck this week

November 24, 2014
Mon: The hype cycle starts again Tues: I (almost and inadvertently) followed Dan Kahan's principles in my class today, and that was a good thing (would've even been more of a good thing had I realized what I was doing and done it better, but I think I will do better in the future, which […]

## More on Big Data

November 24, 2014
An earlier post, "Big Data the Big Hassle," waxed negative. So let me now give credit where credit is due. What's true in time-series econometrics is that it's very hard to list the third-most-important, or even second-most-important, contribution of Bi...

## R and Data Mining Workshop at AusDM 2014, Brisbane, 27 November

November 24, 2014
R and Data Mining Workshop at AusDM 2014 http://ausdm14.ausdm.org/workshop There will be a half-day workshop on R and Data Mining at the AusDM 2014 conference in Brisbane, Thursday afternoon, 27 November. The workshop will be composed of several sessions on …

## Overview of new features in SAS/IML 13.1

November 24, 2014
SAS software contains a lot of features, and each release adds more. To make sure that you do not miss new features that appear in the SAS/IML language, the word cloud on the right sidebar of my blog contains numbers that relate to SAS or SAS/IML releases. For example, you can […]

## GTrendsR package to Explore Google trending for Field Dependent Terms

November 24, 2014
My friend, Steve Simpson, introduced me to Philippe Massicotte and Dirk Eddelbuettel's GTrendsR GitHub package this week. It's a pretty nifty wrapper to the Google Trends API that enables one to search phrase trends over time. The trend indices that …

## What do Rick Santorum and Andrew Cuomo have in common?

November 24, 2014
Besides family values, that is? Both these politicians seem to have a problem with the National Weather Service: The Senator: Santorum also accused the weather service's National Hurricane Center of flubbing its forecasts for Hurricane Katrina's initial landfall in Florida, despite the days of all-too-prescient warnings the agency had given that the storm would subsequently […]

## an ABC experiment

November 23, 2014
In a cross-validated forum exchange, I used the code below to illustrate the working of an ABC algorithm: Hence I used the median and the mad as my summary statistics. And the outcome is rather surprising, for two reasons: the first one is that the posterior on the mean μ is much wider than […]

## Princeton Abandons Grade Deflation Plan . . .

November 23, 2014
. . . and Kaiser Fung is unhappy. In a post entitled, "Princeton's loss of nerve," Kaiser writes: This development is highly regrettable, and a failure of leadership. (The new policy leaves it to individual departments to do whatever they want.) The recent Alumni publication has two articles about this topic, one penned by President […]

## Slides of keynote speeches, tutorials and panelist presentations at IEEE Big Data 2014

November 23, 2014
Slides of keynote speeches, tutorials and panelist presentations at the 2014 IEEE International Conference on Big Data can be found at the conference website at links below. (1) Keynote speech http://cci.drexel.edu/bigdata/bigdata2014/keynotespeech.htm – Never-Ending Language Learning, Tom Mitchell – E. Fredkin …

## When should I change to snow tires in Netherlands

November 23, 2014
The Royal Netherlands Meteorological Institute has weather information by day for a number of Dutch stations. In this post I want to use those data for a practical problem: when should I switch to winter tires? (or is that snow tires? In any case nails...

## Msc Kvetch: “You are a Medical Statistic”, or “How Medical Care Is Being Corrupted”

November 22, 2014
A NYT op-ed the other day, “How Medical Care Is Being Corrupted” (by Pamela Hartzband and Jerome Groopman, physicians on the faculty of Harvard Medical School), gives a good sum-up of what I fear is becoming the new normal, even under so-called “personalized medicine”. “It is obsolete for the doctor to approach each patient strictly as an individual; medical decisions should […]

## Statistical computing languages at the RSS

November 22, 2014
On Friday the Royal Statistical Society hosted a meeting on Statistical computing languages, organised by my colleague Colin Gillespie. Four languages were presented at the meeting: Python, Scala, Matlab and Julia. I presented the talk on Scala. The slides I presented are available, in addition to the code examples and instructions on how to run …

## Statistics for Big Data

November 22, 2014
Doctoral programme in cloud computing for big data I've spent much of this year working to establish our new EPSRC Centre for Doctoral Training in Cloud Computing for Big Data, which partly explains the lack of posts on this blog in recent months. The CDT is now established, with 11 students in the first cohort, …

November 22, 2014
Tweeting has its virtues, I'm sure. But over and over I'm seeing these blog vs. twitter battles where the blogger wins. It goes like this: blogger gives tons and tons of evidence, tweeter responds with a content-free dismissal. The most recent example (as of this posting; remember we're on an approx 2-month delay here; yes, […]

## Factor Analysis vs Principal Component Analysis

November 22, 2014
Recently, several papers discussed in our journal club have focused on integrative clustering of multiple omics data sets. I found that they all originate from factor analysis and exploit its advantages over principal component analysis. Let’s recall the model for factor analysis: where () and , with mean and […]

## 50 shades of gray goes pie-chart

November 22, 2014
Rogier Kievit sends in this under the heading, "Worst graph of the year . . . horribly unclear . . . Even the report doesn't have a legend!": My reply: It's horrible but I still think the black-and-white Stroop test remains the worst visual display of all time: What's particularly amusing about the Stroop image […]

## Flowers/Fractals

November 22, 2014
Last week, I attended a "Flower Fest" where I had the opportunity to admire several of the most beautiful and awarded flowers, orchids, and decorative plants. Surprisingly, though, I had never thought of flowers as fractals the way I did this time. Fractals attract lots of interest, especially from mathematicians who actually spend some time […]

## Ordinal probit regression: Transforming polr() parameter values to make them more intuitive

November 21, 2014
In R, the polr function in the MASS package does ordinal probit regression (and ordinal logistic regression, but I focus here on probit). The polr function yields parameter estimates that are difficult to interpret intuitively because they assume a bas...

## “If you’re not using a proper, informative prior, you’re leaving money on the table.”

November 21, 2014
Well put, Rob Weiss. This is not to say that one must always use an informative prior; oftentimes it can make sense to throw away some information for reasons of convenience. But it’s good to remember that, if you do use a noninformative prior, ...

## Three good charts

November 21, 2014
Alberto Cairo, Stephen McDaniel and I were asked about our "favorite" data visualization at the Qlik Conference this week. Stephen wrote up our answers here.

## Free Stanford online course on Statistical Learning (with R) starting on 19 Jan 2015

November 21, 2014
This is an introductory-level course in supervised learning, with a focus on regression and classification methods. The syllabus includes: linear and polynomial regression, logistic regression and linear discriminant analysis; cross-validation and the bootstrap, model selection and regularization methods (ridge and …