Graphs – beauty and truth (with apologies to Keats)

I really like graphs. I like the way graphs turn numbers into pictures. A good graph is elegant. It uses a few well-placed lines to communicate …

For my final Jerzy Neyman item, here’s the post I wrote for his birthday last year: A local acting group is putting on a short theater production based on a screenplay I wrote: “Les Miserables Citations” (“Those Miserable Quotes”) [1]. The “miserable” citations are those everyone loves to cite, from Neyman and Pearson’s early joint 1933 paper: We are inclined to think that as […]

I’ll continue to post Neyman-related items this week in honor of his birthday. This isn’t the only paper in which Neyman makes it clear he denies a distinction between a test of statistical hypotheses and significance tests. He and E. Pearson also discredit the myth that the former is only allowed to report pre-data, fixed error probabilities, and are […]

There are now more than 10,000 R packages available from CRAN, and many more if you include those available only on GitHub. So, to be honest, it has become difficult to know all of them. But sometimes you discover a nice function in one of them, and that is really awesome. Consider for instance some (standard) censored lifetime data:

    n=10000
    idx=sample(1:4,size=n,replace=TRUE)   # product index, uniform on 1..4
    pd=LETTERS[idx]                       # product label: "A", "B", "C" or "D"
    lambda=1+(idx-1)/3                    # exponential rate per product: 1, 4/3, 5/3, 2
    t=rexp(n,lambda)                      # lifetimes
    x=rexp(n)                             # censoring times (rate 1)
    c=t>x                                 # TRUE when the lifetime exceeds the censoring time
    y=pmin(t,x)                           # observed time
    df=data.frame(time=y,status=c,product=pd)

(yes, I will generate…
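Because both the lifetimes t and the censoring times x are exponential, the share of observations with c equal to TRUE (lifetime longer than the censoring time) can be worked out in advance for each product, which gives a quick sanity check on the simulated data. Writing T for a lifetime with rate λ (lambda in the code) and X for an independent rate-1 censoring time:

```latex
P(T > X \mid \lambda)
  = \int_0^\infty P(T > x)\, e^{-x}\,dx
  = \int_0^\infty e^{-\lambda x}\, e^{-x}\,dx
  = \frac{1}{1+\lambda},
\qquad
\lambda \in \left\{1, \tfrac{4}{3}, \tfrac{5}{3}, 2\right\}
\;\Rightarrow\;
P(T > X) \in \left\{\tfrac{1}{2}, \tfrac{3}{7}, \tfrac{3}{8}, \tfrac{1}{3}\right\}
```

With n=10000, each product gets roughly 2,500 observations, so the proportion of TRUE values in status should be close to 1/2 for product A and close to 1/3 for product D.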

Encoding categorical variables: one-hot and beyond
(or: how to correctly use xgboost from R)

R has "one-hot" encoding hidden in most of its modeling paths. Asking an R user where one-hot encoding is used is like asking a fish where there is water; they can’t point to it as it is everywhere. For example, we can see evidence of one-hot encoding …
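To make the idea concrete, here is a minimal, language-neutral sketch of one-hot encoding, written in Python for illustration. The function name one_hot and its choice to map levels unseen at training time to an all-zero row are assumptions of this sketch, not R's or xgboost's API. Each categorical value becomes a row of 0/1 indicators, one column per level, and the levels learned once are reused so that new data gets the same columns.

```python
def one_hot(values, levels=None):
    """Hypothetical helper: encode categorical values as 0/1 indicator rows.

    If levels is None, the levels are learned from the data (sorted);
    otherwise the given levels fix the column order.
    Returns (levels, rows).
    """
    if levels is None:
        levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    rows = []
    for v in values:
        row = [0] * len(levels)
        if v in index:  # assumption: unseen levels become an all-zero row
            row[index[v]] = 1
        rows.append(row)
    return levels, rows


# Learn the levels on "training" data.
levels, rows = one_hot(["a", "b", "a", "c"])
print(levels)  # ['a', 'b', 'c']
print(rows)    # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]

# Reuse the training levels on new data so the columns stay aligned.
_, new_rows = one_hot(["b", "d"], levels)
print(new_rows)  # [[0, 1, 0], [0, 0, 0]]
```

Reusing the training levels keeps the column layout identical between training and prediction, which matters whenever a model is fit on a plain numeric matrix. Note that R's modeling functions with an intercept and the default treatment contrasts keep one column fewer than the number of levels, rather than one indicator per level as above.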

I was just reading a paper by Martin and Liu (2014) in which they allude to the “questionable logic of proving H0 false by using a calculation that assumes it is true” (p. 1704). They say they seek to define a notion of “plausibility” that “fits the way practitioners use and interpret p-values: a small p-value means […]

This week’s riddle is about optimally placing the four digits of a product of two two-digit numbers, and it is open to a coding resolution: Four digits are drawn without replacement from {0,1,…,9}, one at a time. What is the optimal strategy to position those four digits, two digits […]

Several weeks ago, Ashley Kirk and Patrick Scott published “French presidential election: How the polls are shaping up in the race to become president”. One has to admit they were right: polls are taking a central place in this election period. Before going further, a customary word of caution is in order, I think. Polling firms keep reminding us of it, and they are right: their job is not really to make…