Posts Tagged ‘statistics’

Principal Components Regression, Pt. 3: Picking the Number of Components

May 30, 2016

In our previous note, we demonstrated Y-Aware PCA and other y-aware approaches to dimensionality reduction in a predictive modeling context, specifically Principal Components Regression (PCR). For our examples, we selected the appropriate number of principal components by eye. In this note, we will look at ways to select the appropriate number of principal components in …


On ranger respect.unordered.factors

May 30, 2016

It is often said that “R is its packages.” One package of interest is ranger, a fast parallel C++ implementation of random forest machine learning. Ranger is a great package and at first glance appears to remove the “only 63 levels allowed for string/categorical variables” limit found in the Fortran randomForest package. Actually this appearance is …


the random variable that was always less than its mean…

May 29, 2016

Although this is far from a paradox when realising why the phenomenon occurs, it took me a few lines to understand why the empirical average of a log-normal sample is apparently a biased estimator of its mean. And why conversely the biased plug-in estimator does not appear to present a bias. To illustrate this “paradox” […]
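The phenomenon behind this “paradox” is easy to check numerically. A minimal pure-Python sketch (not code from the post): with a large log-scale standard deviation, the log-normal distribution is so right-skewed that the empirical average of a sample falls below the true mean exp(μ + σ²/2) far more often than half the time.

```python
import math
import random

def lognormal_mean_experiment(mu=0.0, sigma=2.0, n=10, reps=2000, seed=1):
    """Draw many log-normal samples of size n and count how often the
    empirical average falls below the true mean exp(mu + sigma^2 / 2).
    With a heavy right tail (large sigma), this happens far more often
    than half the time, even though the sample mean is unbiased."""
    rng = random.Random(seed)
    true_mean = math.exp(mu + sigma ** 2 / 2.0)
    below = 0
    for _ in range(reps):
        xs = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        if sum(xs) / n < true_mean:
            below += 1
    return below / reps

print(lognormal_mean_experiment())  # well above 0.5
```

The sample mean is unbiased in expectation, but the rare enormous draws that pull its expectation up occur in only a small fraction of samples, so the typical sample mean sits below the true mean.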


Principal Components Regression, Pt. 2: Y-Aware Methods

May 23, 2016

In our previous note, we discussed some problems that can arise when using standard principal components analysis (specifically, principal components regression) to model the relationship between independent (x) and dependent (y) variables. In this note, we present some dimensionality reduction techniques that alleviate some of those problems, in particular what we call Y-Aware Principal Components …
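The core of the y-aware scaling step can be illustrated in a few lines. This is a pure-Python sketch of the idea only (the post itself works in R, using the vtreat package): each x variable is centered and rescaled by the slope of the univariate regression of y on x, so that a unit change in any scaled variable corresponds to the same expected change in y.

```python
# Illustrative sketch of y-aware scaling: rescale x by the slope of
# the univariate regression of y on x, i.e. cov(x, y) / var(x).

def mean(v):
    return sum(v) / len(v)

def yaware_scale(x, y):
    """Return x centered and multiplied by the slope cov(x, y)/var(x)."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    slope = cov / var if var > 0 else 0.0
    return [slope * (xi - mx) for xi in x]

# Toy example: y = 3x exactly, so the scaled x reproduces y's
# deviations from its mean.
print(yaware_scale([1.0, 2.0, 3.0, 4.0], [3.0, 6.0, 9.0, 12.0]))
# → [-4.5, -1.5, 1.5, 4.5]
```

After this scaling, a variable with no linear relation to y collapses toward zero and contributes little variance, so the subsequent PCA step is steered toward directions that actually predict y.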


occupancy rules

May 22, 2016

While the last riddle on The Riddler was rather anticlimactic, namely to find the mean of the number Y of empty bins in a uniform multinomial with n bins and m draws, with solution [which still has a link with e in that the fraction of empty bins converges to e⁻¹ when n=m], this led […]
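The expectation in question has a closed form: each bin is empty with probability (1 − 1/n)^m, so E[Y] = n(1 − 1/n)^m, and for n = m the fraction of empty bins converges to e⁻¹. A short stdlib-only check (a sketch, not the riddle's published solution code):

```python
import math
import random

def expected_empty_fraction(n, m):
    """Exact expected fraction of empty bins when m balls are dropped
    uniformly into n bins: E[Y]/n = (1 - 1/n)**m."""
    return (1.0 - 1.0 / n) ** m

def simulated_empty_fraction(n, m, reps=2000, seed=42):
    """Monte Carlo check of the same quantity."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        occupied = {rng.randrange(n) for _ in range(m)}
        total += n - len(occupied)
    return total / (reps * n)

n = m = 200
print(expected_empty_fraction(n, m))   # close to 1/e ≈ 0.3679
print(simulated_empty_fraction(n, m))
```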


ABC random forests for Bayesian parameter inference

May 19, 2016

Before leaving Helsinki, we arXived [from the Air France lounge!] the paper Jean-Michel presented on Monday at ABCruise in Helsinki. This paper summarises the experiments Louis conducted over the past months to assess the strong performance of a random forest regression approach to ABC parameter inference. Thus validating in this experimental sense the use of […]


Using MCMC output to efficiently estimate Bayes factors

May 18, 2016

As I was checking for software to answer a query on X validated about generic Bayes factor derivation, I came across an R package called BayesFactor, which only applies in regression settings and relies on the Savage-Dickey representation of the Bayes factor when the null hypothesis is written as θ=θ⁰ (and possibly additional nuisance parameters with […]
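The Savage-Dickey representation says that, for a point null θ = θ⁰ nested in the alternative (with matching priors on any nuisance parameters), the Bayes factor B₀₁ equals the ratio of posterior to prior density at θ⁰. A toy conjugate-normal check in pure Python (an illustration of the identity, not code from the BayesFactor package): for y ~ N(θ, σ²) with prior θ ~ N(0, τ²) and H₀: θ = 0, the density ratio matches the direct marginal-likelihood ratio.

```python
import math

def norm_pdf(x, m, v):
    """Normal density with mean m and variance v."""
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

def savage_dickey_bf01(y, sigma2, tau2):
    """Savage-Dickey Bayes factor B01 for H0: theta = 0 in the model
    y ~ N(theta, sigma2), theta ~ N(0, tau2): the ratio of posterior
    to prior density evaluated at theta = 0."""
    post_var = sigma2 * tau2 / (sigma2 + tau2)
    post_mean = tau2 * y / (sigma2 + tau2)
    return norm_pdf(0.0, post_mean, post_var) / norm_pdf(0.0, 0.0, tau2)

def direct_bf01(y, sigma2, tau2):
    """Direct marginal-likelihood ratio p(y|H0) / p(y|H1), where
    p(y|H1) = N(y; 0, sigma2 + tau2) by normal-normal conjugacy."""
    return norm_pdf(y, 0.0, sigma2) / norm_pdf(y, 0.0, sigma2 + tau2)

y, sigma2, tau2 = 1.3, 1.0, 4.0
print(savage_dickey_bf01(y, sigma2, tau2))
print(direct_bf01(y, sigma2, tau2))  # the two values agree
```

The practical appeal is that the posterior density at θ⁰ can be estimated from ordinary MCMC output, sidestepping a separate marginal-likelihood computation.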


Cepstrum, quefrency, and pitch

May 18, 2016

John Tukey coined many terms that have passed into common use, such as bit (a shortening of binary digit) and software. Other terms he coined are well known within their niche: boxplot, ANOVA, rootogram, etc. Some of his terms, such as jackknife and vacuum cleaner, were not new words per se but common words he […]


Principal Components Regression, Pt.1: The Standard Method

May 17, 2016

In this note, we discuss principal components regression and some of the issues with it: the need for scaling, the need for pruning, and the lack of “y-awareness” of the standard dimensionality reduction step. The purpose of this article is to set the stage for presenting dimensionality reduction techniques appropriate for predictive modeling, such as y-aware …


Sharp-R May Update

May 16, 2016

Another update has been made to Sharp-R, bringing it to version 1.2. The main changes are: multiple function files, built-in standard functions, changes to the XML function files, and bug fixes. We have increased the number of function files that can be loaded...
