# Posts Tagged ‘statistics’

## EasyER Version 1.2

June 10, 2018

Version 1.2 of EasyER has now been released. The main change is the addition of a chart builder interface for creating charts interactively. The chart builder plugs into the popular ggplot2 package, making ggplot2 charts available in Excel. Some bugs have also ...

## maximal spacing around order statistics [#2]

June 7, 2018

The proposed solution of the riddle from the Riddler discussed here a few weeks ago is rather approximative, in that the distribution of when the n-sample is made of iid Normal variates is (a) replaced with the distribution of one arbitrary minimum and (b) the distribution of the minimum is based on an assumption of […]

## rqdatatable: rquery Powered by data.table

June 3, 2018

rquery is an R package for specifying data transforms using piped Codd-style operators. It has already shown great performance on PostgreSQL and Apache Spark. rqdatatable is a new package that supplies a screaming fast implementation of the rquery system in-memory using the data.table package. rquery is already one of the fastest and most teachable (due … Continue reading rqdatatable: rquery Powered by data.table

## Computing extreme normal tail probabilities

June 1, 2018

Let me say up front that relying on the normal distribution as an accurate model of extreme events is foolish under most circumstances. The main reason to calculate the probability of, say, a 40 sigma event is to show how absurd it is to talk about 40 sigma events. See my previous post on six-sigma […]
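As a rough illustration of why such a calculation needs care, here is a minimal Python sketch (not taken from the post itself). It assumes the classical bounds φ(x)·x/(x² + 1) < P(Z > x) < φ(x)/x for a standard normal Z and x > 0, and works entirely on the log scale because the probability is far too small for double precision. The function name `log10_normal_tail_bounds` is made up for this example.

```python
import math

def log10_normal_tail_bounds(x):
    """Return (lower, upper) bounds on log10 P(Z > x) for standard normal Z, x > 0."""
    if x <= 0:
        raise ValueError("the bounds used here hold only for x > 0")
    # Log of the standard normal density phi(x), computed without exponentiating.
    log_phi = -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)
    # Classical bounds: phi(x) * x / (x^2 + 1) < P(Z > x) < phi(x) / x
    log_lower = log_phi + math.log(x) - math.log(x * x + 1.0)
    log_upper = log_phi - math.log(x)
    ln10 = math.log(10.0)
    return log_lower / ln10, log_upper / ln10

# Direct evaluation through erfc underflows to zero at 40 sigma.
naive = 0.5 * math.erfc(40.0 / math.sqrt(2.0))

lower, upper = log10_normal_tail_bounds(40.0)
print(f"naive double-precision value: {naive}")
print(f"log10 P(Z > 40) lies between {lower:.4f} and {upper:.4f}")
```

The two bounds agree to several decimal places at 40 sigma, so the log-scale approximation is enough to show how small the probability is.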

## Talking about clinical significance

June 1, 2018

In statistical work in the age of big data we often get hung up on differences that are statistically significant (reliable enough to show up again and again in repeated measurements), but clinically insignificant (visible in aggregation, but too small to make any real difference to individuals). An example would be: a diet that changes … Continue reading Talking about clinical significance
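To make the distinction concrete, here is a small Python sketch with entirely hypothetical numbers (they do not come from the post): a mean difference of 0.1 units between two groups with a standard deviation of 5 units. It assumes a simple two-sample z-test with equal group sizes and known standard deviation; `two_sample_z` is a name invented for this example.

```python
import math

def two_sample_z(diff, sd, n_per_group):
    """Two-sided z-test for a difference in means, equal group sizes, known sd."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = diff / standard_error
    # Two-sided p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical effect: 0.1 units against a standard deviation of 5 units.
diff, sd = 0.1, 5.0
z_small, p_small = two_sample_z(diff, sd, n_per_group=100)
z_large, p_large = two_sample_z(diff, sd, n_per_group=1_000_000)

print(f"n = 100 per group:       z = {z_small:.2f}, p = {p_small:.3g}")
print(f"n = 1,000,000 per group: z = {z_large:.2f}, p = {p_large:.3g}")
print(f"standardized effect size in both cases: {diff / sd:.2f}")
```

The effect size is identical in both runs; only the sample size changes, and with it whether the difference registers as statistically significant.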

## Six sigma events

May 31, 2018

I saw on Twitter this afternoon a paraphrase of a quote from Nassim Taleb to the effect that if you see a six-sigma event, that’s evidence that it wasn’t really a six-sigma event. What does that mean? Six sigma means six standard deviations away from the mean of a probability distribution, sigma (σ) being the […]
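For a sense of scale, the following Python sketch computes how unlikely a six-sigma deviation would be if the data really were normal. It uses the standard identity P(Z > k) = erfc(k/√2)/2 for a standard normal Z; the helper name `normal_tail_probability` is an assumption of this example, not something from the post.

```python
import math

def normal_tail_probability(k, two_sided=True):
    """P(|Z| > k), or P(Z > k) if two_sided is False, for a standard normal Z."""
    one_sided = 0.5 * math.erfc(k / math.sqrt(2.0))
    return 2.0 * one_sided if two_sided else one_sided

p_six = normal_tail_probability(6.0)
print(f"P(|Z| > 6) = {p_six:.3e}")
print(f"roughly 1 in {1.0 / p_six:,.0f} under a normal model")
```

A probability this small is what makes an observed six-sigma event better evidence against the normal model than for a rare draw from it.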

## Classification from scratch, logistic with splines 2/8

May 30, 2018

Today, the second post of our series on classification from scratch, following the brief introduction to logistic regression. Piecewise linear splines: to illustrate what’s going on, let us start with a “simple” regression (with only one explanatory variable). The underlying idea is natura non facit saltus, Latin for “nature does not make jumps”, i.e. the equations governing natural processes are continuous. That seems to be a rather strong assumption, because…
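A piecewise linear spline can be sketched in a few lines of Python. The version below assumes the usual truncated-power basis 1, x, (x − k)₊ for each knot k, which may differ from the construction used in the post; the knots and coefficients are hypothetical values chosen only to show that the fitted curve changes slope at the knots without jumping.

```python
def linear_spline_basis(x, knots):
    """Return [1, x, (x - k1)_+, (x - k2)_+, ...] for a scalar x."""
    return [1.0, x] + [max(x - k, 0.0) for k in knots]

def spline_value(x, coefficients, knots):
    """Evaluate the piecewise linear function defined by the basis coefficients."""
    basis = linear_spline_basis(x, knots)
    return sum(c * b for c, b in zip(coefficients, basis))

# Hypothetical knots and coefficients: slope 1 below 2, slope -1 on [2, 5], slope 2 above 5.
knots = [2.0, 5.0]
coefficients = [0.0, 1.0, -2.0, 3.0]

for x in [0.0, 2.0, 3.5, 5.0, 6.0]:
    print(f"f({x}) = {spline_value(x, coefficients, knots)}")
```

Because every basis function is continuous, any linear combination of them is continuous too, which is the “no jumps” property the excerpt refers to.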

## Classification from scratch, logistic regression 1/8

May 30, 2018

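Let us start today our series on classification from scratch… Logistic regression is based on the assumption that, given covariates $x$, $Y$ has a Bernoulli distribution, $Y \mid X = x \sim \mathcal{B}(p_x)$ with $p_x = \frac{\exp[x^\top \beta]}{1 + \exp[x^\top \beta]}$. The goal is to estimate the parameter $\beta$. Recall that the heuristic for using that function for the probability is that $\log \frac{P[Y = 1 \mid X = x]}{P[Y = 0 \mid X = x]} = x^\top \beta$. Maximum of the (log-)likelihood function: the log-likelihood is here $\log \mathcal{L}(\beta) = \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]$, where $p_i = (1 + \exp[-x_i^\top \beta])^{-1}$. Numerical techniques are based on (numerical) gradient descent to compute the maximum…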
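The idea of maximizing the log-likelihood numerically can be sketched as follows. This is a minimal pure-Python illustration, not the post's own code: it assumes the standard logistic model with coefficient vector `beta`, uses plain gradient ascent with a fixed step size, and runs on a tiny made-up dataset. All names (`log_likelihood`, `fit_logistic`, the step size and iteration count) are choices made for this example.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_likelihood(beta, X, y):
    """Sum over i of y_i * z_i - log(1 + exp(z_i)), with z_i = x_i . beta."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(b * v for b, v in zip(beta, xi))
        softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        total += yi * z - softplus
    return total

def gradient(beta, X, y):
    """Gradient of the log-likelihood: sum over i of (y_i - p_i) * x_i."""
    grad = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
        for j, v in enumerate(xi):
            grad[j] += (yi - p) * v
    return grad

def fit_logistic(X, y, step=0.1, iterations=5000):
    """Gradient ascent on the average log-likelihood, starting from beta = 0."""
    beta = [0.0] * len(X[0])
    n = len(X)
    for _ in range(iterations):
        grad = gradient(beta, X, y)
        beta = [b + step * g / n for b, g in zip(beta, grad)]
    return beta

# Made-up data: an intercept column plus one covariate, not linearly separable.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
X = [[1.0, x] for x in xs]
y = [0, 0, 1, 0, 1, 0, 1, 1]

beta_hat = fit_logistic(X, y)
print("estimated coefficients:", beta_hat)
print("log-likelihood at zero:", log_likelihood([0.0, 0.0], X, y))
print("log-likelihood at fit: ", log_likelihood(beta_hat, X, y))
```

Since the log-likelihood is concave, ascending the gradient is the same as descending on its negative; in practice a Newton-type method (iteratively reweighted least squares) converges in far fewer steps.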

## Classification from scratch, overview 0/8

May 29, 2018

Before my course on « big data and economics » at the University of Barcelona in July, I wanted to upload a series of posts on classification techniques, to gain some insight into machine learning tools. According to a common view, machine learning algorithms are black boxes. I wanted to revisit that claim. First of all, isn’t that also the case for regression models, like generalized additive models (with splines)…

## “Intentions (in your head)” is the code word for “error probabilities (of a procedure)”: Allan Birnbaum’s Birthday

May 27, 2018

Today is Allan Birnbaum’s Birthday. Birnbaum’s (1962) classic “On the Foundations of Statistical Inference,” in Breakthroughs in Statistics (volume I, 1993), concerns a principle that remains at the heart of today’s controversies in statistics–even if it isn’t obvious at first: the Likelihood Principle (LP) (also called the strong likelihood principle, SLP, to distinguish it from the […]