## Classification from scratch, logistic with splines 2/8

May 30, 2018
Today, the second post of our series on classification from scratch, following the brief introduction to logistic regression. Piecewise linear splines: to illustrate what’s going on, let us start with a “simple” regression (with only one explanatory variable). The underlying idea is natura non facit saltus, for “nature does not make jumps”, i.e. the equations governing natural processes are continuous. That seems to be a rather strong assumption, because…

## Classification from scratch, logistic regression 1/8

May 30, 2018
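Let us start today our series on classification from scratch… Logistic regression is based on the assumption that, given covariates $\mathbf{x}$, $Y$ has a Bernoulli distribution, $Y\vert\mathbf{X}=\mathbf{x}\sim\mathcal{B}(p_{\mathbf{x}})$ with $p_{\mathbf{x}}=\exp[\mathbf{x}^\top\boldsymbol{\beta}]/(1+\exp[\mathbf{x}^\top\boldsymbol{\beta}])$. The goal is to estimate the parameter $\boldsymbol{\beta}$. Recall that the heuristic for using that function for the probability is that the log-odds are linear in the covariates, $\log[\mathbb{P}(Y=1)/\mathbb{P}(Y=0)]=\mathbf{x}^\top\boldsymbol{\beta}$. Maximum of the (log-)likelihood function: the log-likelihood is here $\log\mathcal{L}(\boldsymbol{\beta})=\sum_{i=1}^n y_i\log p_i+(1-y_i)\log(1-p_i)$, where $p_i=(1+\exp[-\mathbf{x}_i^\top\boldsymbol{\beta}])^{-1}$. Numerical techniques are based on (numerical) gradient descent to compute the maximum…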
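
As a rough sketch of what a gradient-based search for the maximum looks like, assume the standard logistic model with $p_i=(1+\exp[-\mathbf{x}_i^\top\boldsymbol{\beta}])^{-1}$ and a step size $\gamma>0$ (a symbol introduced here only for illustration, not taken from the post). Since $\partial p_i/\partial\boldsymbol{\beta}=p_i(1-p_i)\mathbf{x}_i$, the gradient of the log-likelihood and one ascent step would read:

```latex
\begin{aligned}
\nabla_{\boldsymbol{\beta}}\log\mathcal{L}(\boldsymbol{\beta})
  &= \sum_{i=1}^n \left[\frac{y_i}{p_i}-\frac{1-y_i}{1-p_i}\right] p_i(1-p_i)\,\mathbf{x}_i
   = \sum_{i=1}^n (y_i-p_i)\,\mathbf{x}_i,\\
\boldsymbol{\beta}^{(k+1)}
  &= \boldsymbol{\beta}^{(k)}+\gamma\sum_{i=1}^n \big(y_i-p_i^{(k)}\big)\,\mathbf{x}_i .
\end{aligned}
```

Maximizing the log-likelihood by ascent is the same as running gradient descent on its negative, which is presumably what "gradient descent" refers to in the excerpt.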

## Classification from scratch, overview 0/8

May 29, 2018
Before my course on “big data and economics” at the University of Barcelona in July, I wanted to upload a series of posts on classification techniques, to get some insight into machine learning tools. According to a common idea, machine learning algorithms are black boxes. I wanted to revisit that claim. First of all, isn’t that also the case for regression models, like generalized additive models (with splines)…

## On the interpretation of a regression model

May 18, 2018
Yesterday, NaytaData (aka @NaytaData) posted a nice graph on reddit, showing daily bicycle traffic against mean air temperature in Helsinki, Finland. I found that graph interesting, so I asked for the data (NaytaData kindly sent them to me tonight). `df = read.csv("cyclistsTempHKI.csv"); library(ggplot2); ggplot(df, aes(meanTemp, cyclists)) + geom_point() + geom_smooth(span = 0.3)` But as mentioned by someone on twitter, the interpretation is somewhat trivial: people get out on their…

## Can predictive models be fair?

January 24, 2018
In Nosedive (released in France under the title Chute Libre), the first episode of season 3 of the television series Black Mirror, we discover the dystopia of a society governed by a “personal rating”, a score ranging from 0 to 5. In this world, each person rates the others, and the best rated get access to better services (priority service, better rates, better prices, etc.).…

## Visualizing effects of a categorical explanatory variable in a regression

January 20, 2018
Recently, I’ve been working on two problems that might be related to semiotic issues in predictive modeling (i.e. instead of a standard regression table, how can we plot coefficient values in a regression model). To be more specific, I have a variable of interest that is observed for several individuals, with explanatory variables, in a given year, in a specific region. Suppose that we have a simple (standard) linear model…

## Holt-Winters with a Quantile Loss Function

January 8, 2018
Exponential smoothing is an old technique, but it can perform extremely well on real time series, as discussed in Hyndman, Koehler, Ord & Snyder (2008): “when Gardner (2005) appeared, many believed that exponential smoothing should be disregarded because it was either a special case of ARIMA modeling or an ad hoc procedure with no statistical rationale. As McKenzie (1985) observed, this opinion was expressed in numerous references to my paper. Since…

## “Actuarial” justice, algorithms… and data

December 19, 2017
A little over a year ago, Virginie Gautron sent me a stack of documents on “actuarial justice”, a concept I was discovering at the time. To get a sense of what it is about, I can point to pénalité et gestion des risques : vers une justice « actuarielle » en Europe ?, which gives a state of the art, in French. I admit I then set it aside somewhat (for lack of time), and…

## The myth of interpretability of econometric models

December 9, 2017
There are important discussions nowadays about data modeling, and about choosing between the “two cultures” (as mentioned in Breiman (2001)), i.e. either econometric models or machine/statistical learning models. We discussed this issue recently in Econométrie et Machine Learning (so far only in French) with Emmanuel Flachaire and Antoine Ly. One argument often used by econometricians is the interpretability of econometric models. Or at least the attempt to get an interpretable…

## In search of namesakes…

October 6, 2017
A few months ago, Baptiste Coulmont contacted me with a fascinating question (as is the case every time he contacts me). Baptiste wanted to work on the proportion of people who have a namesake in a population of a given size, or on the probability of having no namesake at a polling station, for example. This last problem is reminiscent of the “birthday paradox”. In a group of 23 people,…
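
For reference, the classical birthday computation can be sketched as follows, under the usual simplifying assumption (not stated in the excerpt) of 365 equally likely and independent birthdays. The probability that at least two people in a group of $n$ share a birthday is:

```latex
\mathbb{P}(\text{at least one shared birthday})
  = 1-\prod_{k=0}^{n-1}\left(1-\frac{k}{365}\right),
\qquad
n=23:\quad 1-\prod_{k=0}^{22}\left(1-\frac{k}{365}\right)\approx 0.507 .
```

The namesake problem has the same structure, with names playing the role of birthdays, except that names are far from uniformly distributed, so the product form above would only be a crude first approximation.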