# Posts Tagged ‘statistics’

## Principal Components Regression, Pt. 3: Picking the Number of Components

May 30, 2016

In our previous note we demonstrated Y-Aware PCA and other y-aware approaches to dimensionality reduction in a predictive modeling context, specifically Principal Components Regression (PCR). For our examples, we selected the appropriate number of principal components by eye. In this note, we will look at ways to select the appropriate number of principal components in …

## On ranger respect.unordered.factors

May 30, 2016

It is often said that “R is its packages.” One package of interest is ranger, a fast parallel C++ implementation of random forest machine learning. Ranger is a great package and at first glance appears to remove the “only 63 levels allowed for string/categorical variables” limit found in the Fortran randomForest package. Actually this appearance is …

## the random variable that was always less than its mean…

May 29, 2016

Although this is far from a paradox when realising why the phenomenon occurs, it took me a few lines to understand why the empirical average of a log-normal sample is apparently a biased estimator of its mean. And why conversely the biased plug-in estimator does not appear to present a bias. To illustrate this “paradox” […]
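One way to see the effect described in the title is a small simulation. The sketch below is not taken from the post: the parameters (`mu=0`, `sigma=3`, samples of size 100) and the function name are assumptions chosen for illustration. It counts how often the empirical average of a log-normal sample falls below the true mean exp(μ + σ²/2). The empirical average is unbiased, but because the distribution is strongly right-skewed, most individual samples produce an average below the mean.

```python
import math
import random


def fraction_below_mean(mu=0.0, sigma=3.0, n=100, reps=2000, seed=1):
    """Share of simulated sample means that fall below the true log-normal mean.

    All default values are illustrative assumptions, not values from the post.
    """
    rng = random.Random(seed)
    # E[X] for X ~ LogNormal(mu, sigma) is exp(mu + sigma^2 / 2)
    true_mean = math.exp(mu + sigma ** 2 / 2)
    below = 0
    for _ in range(reps):
        sample_mean = sum(rng.lognormvariate(mu, sigma) for _ in range(n)) / n
        if sample_mean < true_mean:
            below += 1
    return below / reps


if __name__ == "__main__":
    print(f"fraction of sample means below the true mean: {fraction_below_mean():.3f}")
```

With a smaller `sigma` the skew is milder and the fraction moves closer to one half.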

## Principal Components Regression, Pt. 2: Y-Aware Methods

May 23, 2016

In our previous note, we discussed some problems that can arise when using standard principal components analysis (specifically, principal components regression) to model the relationship between independent (x) and dependent (y) variables. In this note, we present some dimensionality reduction techniques that alleviate some of those problems, in particular what we call Y-Aware Principal Components …

## occupancy rules

May 22, 2016

While the last riddle on The Riddler was rather anticlimactic, namely to find the mean of the number Y of empty bins in a uniform multinomial with n bins and m draws, with solution E[Y] = n(1 − 1/n)^m [which still has a link with e in that the fraction of empty bins converges to e⁻¹ when n=m], this led […]
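The mean number of empty bins can be checked numerically. The sketch below assumes the standard closed form E[Y] = n(1 − 1/n)^m, which follows from each bin being empty with probability (1 − 1/n)^m, and compares it with a Monte Carlo estimate. The function names and the choice n = m = 20 are illustrative assumptions.

```python
import math
import random


def expected_empty_bins(n, m):
    """Closed-form mean number of empty bins after m uniform draws into n bins."""
    return n * (1 - 1 / n) ** m


def simulated_empty_bins(n, m, reps=5000, seed=1):
    """Monte Carlo estimate of the mean number of empty bins."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        # The set keeps each occupied bin once, however many draws land in it
        occupied = {rng.randrange(n) for _ in range(m)}
        total += n - len(occupied)
    return total / reps


if __name__ == "__main__":
    n = m = 20
    print("closed form:", expected_empty_bins(n, m))
    print("simulation: ", simulated_empty_bins(n, m))
    # For n = m the fraction of empty bins approaches exp(-1) as n grows
    print("fraction at n = m = 1000:", expected_empty_bins(1000, 1000) / 1000)
    print("exp(-1):                 ", math.exp(-1))
```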

## ABC random forests for Bayesian parameter inference

May 19, 2016

Before leaving Helsinki, we arXived [from the Air France lounge!] the paper Jean-Michel presented on Monday at ABCruise in Helsinki. This paper summarises the experiments Louis conducted over the past months to assess the strong performance of a random forest regression approach to ABC parameter inference, thus validating in this experimental sense the use of […]

## Using MCMC output to efficiently estimate Bayes factors

May 18, 2016

As I was checking for software to answer a query on X validated about generic Bayes factor derivation, I came across an R package called BayesFactor, which only applies in regression settings and relies on the Savage-Dickey representation of the Bayes factor when the null hypothesis is written as θ=θ⁰ (and possibly additional nuisance parameters with […]
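For a point null θ=θ⁰, the Savage-Dickey representation expresses the Bayes factor in favour of the null as the posterior density at θ⁰ divided by the prior density at θ⁰. The sketch below illustrates this on a toy conjugate model that is an assumption of this example, not the regression setting of the BayesFactor package: a normal mean with known variance `sigma2`, a N(0, `tau2`) prior, and the sample mean `xbar` as the data summary. It checks the density ratio against the direct ratio of marginal likelihoods. All function names are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def savage_dickey_bf01(xbar, n, sigma2, tau2, theta0=0.0):
    """BF01 as posterior density over prior density at theta0, prior N(0, tau2)."""
    post_var = 1.0 / (1.0 / tau2 + n / sigma2)  # conjugate normal update
    post_mean = post_var * n * xbar / sigma2
    return normal_pdf(theta0, post_mean, post_var) / normal_pdf(theta0, 0.0, tau2)


def marginal_bf01(xbar, n, sigma2, tau2, theta0=0.0):
    """BF01 as the ratio of marginal likelihoods of the sample mean."""
    m0 = normal_pdf(xbar, theta0, sigma2 / n)       # under the point null
    m1 = normal_pdf(xbar, 0.0, tau2 + sigma2 / n)   # prior integrated out
    return m0 / m1


if __name__ == "__main__":
    args = dict(xbar=0.3, n=25, sigma2=1.0, tau2=4.0)
    print("Savage-Dickey ratio:", savage_dickey_bf01(**args))
    print("marginal ratio:     ", marginal_bf01(**args))
```

The two functions agree up to floating-point error. Outside conjugate models the posterior density at θ⁰ has no closed form and would have to be estimated, for instance from MCMC output.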

## Cepstrum, quefrency, and pitch

May 18, 2016

John Tukey coined many terms that have passed into common use, such as bit (a shortening of binary digit) and software. Other terms he coined are well known within their niche: boxplot, ANOVA, rootogram, etc. Some of his terms, such as jackknife and vacuum cleaner, were not new words per se but common words he […]

## Principal Components Regression, Pt.1: The Standard Method

May 17, 2016

In this note, we discuss principal components regression and some of the issues with it: the need for scaling, the need for pruning, and the lack of “y-awareness” of the standard dimensionality reduction step. The purpose of this article is to set the stage for presenting dimensionality reduction techniques appropriate for predictive modeling, such as y-aware …

## Sharp-R May Update

May 16, 2016

Another update has been made to Sharp-R, bringing it to version 1.2. The main changes are multiple function files, built-in standard functions, changes to the XML function files, and bug fixes. We have increased the number of function files that can be loaded...