# Posts Tagged ‘statistics’

## Be careful evaluating model predictions

December 3, 2016
One thing I teach is: when evaluating the performance of regression models you should not use correlation as your score. This is because correlation tells you if a re-scaling of your result is useful, but you want to know if the result in your hand is in fact useful. For example: the Mars Climate Orbiter … Continue reading Be careful evaluating model predictions
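To make the point about correlation concrete, here is a minimal sketch. The data and the helper names `pearson` and `rmse` are made up for illustration and do not come from the original post. A prediction that is off by a change of units and an offset still correlates perfectly with the truth, while its root-mean-square error shows it is useless as it stands.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rmse(pred, actual):
    """Root-mean-square error of predictions against actual values."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

# Hypothetical outcomes and two hypothetical sets of predictions.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
good = [1.1, 1.9, 3.2, 3.8, 5.1]            # close to the truth
rescaled = [10 * a + 100 for a in actual]   # wrong units and offset

if __name__ == "__main__":
    for name, pred in [("good", good), ("rescaled", rescaled)]:
        print(name, "correlation:", pearson(pred, actual), "rmse:", rmse(pred, actual))
```

Correlation cannot tell the two sets of predictions apart in any useful way, whereas the RMSE separates them immediately.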

## ratio-of-uniforms [#4]

December 1, 2016
Possibly the last post on random number generation by Kinderman and Monahan’s (1977) ratio-of-uniforms method. After fiddling with the Gamma(a,1) distribution when a<1 for a while, I indeed figured out a way to produce a bounded set with this method: considering an arbitrary cdf Φ with corresponding pdf φ, the uniform distribution on the set […]
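For readers unfamiliar with the basic method, here is a minimal sketch of the plain ratio-of-uniforms sampler applied to the standard normal. This is an assumption-laden illustration, not the Gamma(a,1) construction the post goes on to describe: it draws (u, v) uniformly on a bounding rectangle, keeps the pair when u² ≤ exp(-x²/2) with x = v/u, and returns x. The function name `rou_normal` is hypothetical.

```python
import math
import random

def rou_normal(rng):
    """Draw one standard normal variate by the basic ratio-of-uniforms method.

    The target (unnormalised) density is h(x) = exp(-x^2 / 2), so the
    acceptance region {(u, v): 0 < u <= sqrt(h(v/u))} sits inside the
    rectangle (0, 1] x [-sqrt(2/e), sqrt(2/e)].
    """
    b = math.sqrt(2.0 / math.e)
    while True:
        u = 1.0 - rng.random()      # uniform on (0, 1], avoids u == 0
        v = rng.uniform(-b, b)
        x = v / u
        # u^2 <= exp(-x^2 / 2)  is equivalent to  x^2 <= -4 log(u)
        if x * x <= -4.0 * math.log(u):
            return x

if __name__ == "__main__":
    rng = random.Random(2016)
    draws = [rou_normal(rng) for _ in range(10000)]
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / len(draws)
    print("sample mean:", mean, "sample variance:", var)
```

The rectangle is bounded here because the normal has light tails. The difficulty the post alludes to is that for some targets, such as Gamma(a,1) with a<1, the acceptance region of the basic method is not bounded.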

## asymptotically exact inference in likelihood-free models [a reply from the authors]

November 30, 2016
[Following my post of last Tuesday, Matt Graham commented on the paper in great detail. Here are those comments. A nicer HTML version of the Markdown reply below is also available on GitHub.] Thanks for the comments on the paper! A few additional replies to augment what Amos wrote: This however sounds somewhat intense in that […]

## Reading a Picture

November 30, 2016
Visual storytelling: visualising data helps in understanding facts. Sometimes it’s very easy to understand a graph; sometimes it’s necessary to read it and to study it to discover unknown territory. Such graphs are little masterpieces. Here’s one of these, and I am sure the authors went through more than one iteration and discussion while creating it. The … Continue reading Reading a Picture

## vtreat data cleaning and preparation article now available on arXiv

November 30, 2016
Nina Zumel and I are happy to announce that a formal article discussing data preparation and cleaning using the vtreat methodology is now available from arXiv.org as arXiv:1611.09477 [stat.AP]. vtreat is an R data.frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. It prepares variables so that data has fewer … Continue reading vtreat data cleaning and preparation article now available on arXiv

## Popular Votes and Electoral Votes

November 28, 2016
Last week, I put a post online in which I questioned whether it makes sense to start the y-axis at 0 when plotting the number of votes obtained in US elections. My point was that 0 does not really mean anything when looking at Republicans and Democrats (in 2016 at least), because these two parties will always receive several tens of millions of votes, so…

## sampling by exhaustion

November 24, 2016
Last week’s riddle from The Riddler can be summed up as follows: Within a population of size N, each individual in the population independently selects another individual. All individuals selected at least once are removed and the process iterates until one or zero individuals are left. What is the probability that there is zero […]
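The process in the riddle is easy to simulate. The sketch below is a Monte Carlo estimate, not the analytical solution, and it assumes that each individual picks uniformly at random among the other remaining individuals. The function names are made up for illustration.

```python
import random

def ends_with_zero(n, rng):
    """Run the removal process once; return True if nobody is left at the end."""
    while n > 1:
        chosen = set()
        for i in range(n):
            # Pick uniformly among the n - 1 individuals other than i.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            chosen.add(j)
        # Everyone selected at least once is removed.
        n -= len(chosen)
    return n == 0

def estimate_zero_probability(n, trials, rng):
    """Monte Carlo estimate of the probability that zero individuals remain."""
    return sum(ends_with_zero(n, rng) for _ in range(trials)) / trials

if __name__ == "__main__":
    rng = random.Random(2016)
    for n in (2, 3, 10, 100):
        print(n, estimate_zero_probability(n, 20000, rng))
```

Two small cases serve as sanity checks: with N=2 both individuals select each other and are removed, so the probability is 1. With N=3 nobody is left only when the three choices are all distinct, which happens for 2 of the 8 equally likely configurations, so the probability is 1/4.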

## Well-mannered people start their bars at zero

November 24, 2016
A few days ago, I mentioned on Twitter a chart posted on imgur.com about the last three US presidential elections. The first point is that the chart was made while the counts were not yet finished. 72 hours after the polls closed, several million ballots had still not been counted, as François Gourio and Tom Roud pointed out to me. As…

## Monty Python generator

November 22, 2016
By some piece of luck I came across a paper by the late George Marsaglia, genial contributor to the field of simulation, and Wai Wan Tang, entitled The Monty Python method for generating random variables. As shown by the illustration below, the concept is to flip the piece H outside the rectangle back inside the […]

## Stochastic dominance, first or second order?

November 18, 2016
$H_0:F=G$

Last Wednesday, we started the class with a quick presentation of the Wilcoxon-Mann-Whitney test (sometimes called the Wilcoxon rank-sum test). This test was proposed by Frank Wilcoxon in 1945, and (almost at the same time) by Henry Mann and Donald Whitney (in 1947), in a paper entitled On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. The title is…
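As a rough illustration of the statistic behind this test, the sketch below computes the Mann-Whitney U statistic by direct pair counting, with ties counted as one half. It is a minimal sketch on made-up data, not the code from the course. It stops at the statistic and does not compute a p-value, and the function name `mann_whitney_u` is hypothetical.

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for sample x against sample y.

    Counts the pairs (a, b), a from x and b from y, with a > b;
    tied pairs contribute one half.
    """
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

if __name__ == "__main__":
    # Hypothetical samples, for illustration only.
    x = [1.8, 2.4, 3.1, 4.0]
    y = [0.9, 1.5, 2.0, 2.2, 3.5]
    u = mann_whitney_u(x, y)
    # U / (m * n) estimates P(X > Y) + P(X = Y) / 2.
    print("U =", u, "estimated P(X > Y):", u / (len(x) * len(y)))
```

Under the null hypothesis $H_0:F=G$ with samples of sizes m and n, the expected value of U is mn/2, so values far from mn/2 suggest that one variable tends to be larger than the other.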