Statistics

Statistics Blogs

Regularization for Long Memory

June 26, 2016

Two earlier regularization posts focused on panel data and generic time series contexts. Now consider a specific time-series context: long memory. For exposition, consider the simplest case of a pure long memory DGP, \( (1-L)^d y_t = \varep...

Which countries have Regrexit?

June 26, 2016

This doesn't have a lot to do with the bio part of biostatistics, but it is an interesting data analysis that I just started. In the wake of the Brexit vote, there is a petition for a redo. The data for the petition are here, in JSON format. Fortunately, in R, ...

Choosing Between the Logit and Probit Models

June 25, 2016

I've had quite a bit to say about Logit and Probit models, and the Linear Probability Model (LPM), in various posts in recent years. (For instance, see here.) I'm not going to bore you by going over old ground again. However, an important question came up ...

Observed Info vs. Estimated Expected Info

June 23, 2016

All told, after decades of research, it seems that Efron-Hinkley holds up -- observed information dominates estimated expected information for MLE standard errors. It's both easier to calculate and more accurate. Let me know if you disagree. [Efron, B. and ...

Teaching sampling with dragon data cards

June 23, 2016

Data cards for teaching statistics: data cards are a wonderful way for students to get a feel for data. As a university lecturer in the 1990s, I found that students often didn’t understand the multivariate nature of data. This …

y-aware scaling in context

June 22, 2016

Nina Zumel introduced y-aware scaling in her recent article Principal Components Regression, Pt. 2: Y-Aware Methods. I really encourage you to read the article and add the technique to your repertoire. The method combines well with other methods and can drive better predictive modeling results. From the feedback, I am not sure everybody noticed that in …

Mixed-Frequency High-Dimensional Time Series

June 22, 2016

Notice that high dimensions and mixed frequencies go together in time series. (If you're looking at a huge number of series, it's highly unlikely that all will be measured at the same frequency, unless you arbitrarily exclude all frequencies but one.) ...

Conditional Dependence and Partial Correlation

June 21, 2016

In the multivariate normal case, conditional independence is the same as zero partial correlation. (See below.) That makes a lot of things a lot simpler. In particular, determining ordering in a DAG is just a matter of assessing partial cor...

Your emails are being read (though I also think this is a hoax)

June 20, 2016

CNBC reports that Goldman Sachs flags employee emails based on a long list of "offending" phrases. If an employee types a profanity, apparently a window pops up to confirm that the person really wants to say that word. The other stated objective is to detect fraudulent behavior. The list they published apparently dates from 2008, so it is quite old, but I think it is a hoax. Many of the terms…

What the ROC curve (and the AUC) doesn't tell you

June 18, 2016

While preparing a talk for next Tuesday, I was going through the results submitted for an exercise, and I got a rather strange result with a classification model. I had given the same dataset this past autumn at ENSAE, so I had close to thirty other models to compare against (or rather, on the same test set, I have about thirty sets of predictions). The black observations are those obtained this autumn (the line corresponds to…

