Posts Tagged ‘statistics’

Why do linear prediction confidence regions flare out?

June 26, 2017

Suppose you’re tracking some object based on its initial position x0 and initial velocity v0. The initial position and initial velocity are estimated from normal distributions with standard deviations σx and σv. (To keep things simple, let’s assume our object is moving in only one dimension and that the distributions around initial position and velocity […]
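The flaring can be illustrated with a minimal sketch. It assumes the position at time t is x0 + v0·t and that the errors in x0 and v0 are independent; independence is an assumption made here, since the excerpt is cut off before it states one. Under that assumption the variances add, so the standard deviation of the predicted position is sqrt(σx² + t²σv²). The function name `position_sd` and the example values are hypothetical.

```python
import math


def position_sd(t, sigma_x, sigma_v):
    """Standard deviation of x0 + v0 * t.

    Assumes x0 and v0 are independent (an assumption of this sketch),
    so Var(x0 + v0 * t) = sigma_x**2 + t**2 * sigma_v**2.
    """
    return math.sqrt(sigma_x ** 2 + (t * sigma_v) ** 2)


if __name__ == "__main__":
    # Illustrative values only: the band is about sigma_x wide at t = 0
    # and grows roughly like t * sigma_v once t is large.
    for t in [0, 1, 2, 5, 10]:
        print(t, position_sd(t, sigma_x=1.0, sigma_v=0.5))
```

For small t the uncertainty in the initial position dominates; for large t the velocity term dominates and the band widens almost linearly, which is the flare.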

scala-glm: Regression modelling in Scala

June 21, 2017

Introduction: As discussed in the previous post, I’ve recently constructed and delivered a short course on statistical computing with Scala. Much of the course is concerned with writing statistical algorithms in Scala, typically making use of the scientific and numerical computing library, Breeze. Breeze has all of the essential tools necessary for building statistical algorithms, …

Extreme beta distributions

June 20, 2017

A beta probability distribution has two parameters, a and b. You can think of these as the number of successes and failures out of a+b trials. The PDF of a beta distribution is approximately normal if a and b are approximately equal and a + b is large. If a and b are close, they don’t have to be very large for the beta […]
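The normal approximation can be checked numerically with a rough sketch like the one below. It compares the beta PDF with a normal PDF that has the same mean a/(a+b) and variance ab/((a+b)²(a+b+1)), and reports the largest gap on a grid, scaled by the normal peak height. The helper names (`beta_pdf`, `normal_pdf`, `max_relative_gap`), the grid size, and the parameter choices are hypothetical and are not taken from the post.

```python
import math


def beta_pdf(x, a, b):
    """Beta(a, b) density at 0 < x < 1, computed via log-gamma for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def normal_pdf(x, mean, sd):
    """Normal density with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def max_relative_gap(a, b, n=999):
    """Largest |beta - normal| on an interior grid, relative to the normal peak."""
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    peak = normal_pdf(mean, mean, sd)
    grid = [(i + 1) / (n + 1) for i in range(n)]
    gap = max(abs(beta_pdf(x, a, b) - normal_pdf(x, mean, sd)) for x in grid)
    return gap / peak


if __name__ == "__main__":
    # Equal parameters of increasing size, then a lopsided pair for contrast.
    for a, b in [(5, 5), (50, 50), (2, 20)]:
        print(a, b, max_relative_gap(a, b))
```

The gap shrinks as equal parameters grow, in line with the statement above; the lopsided pair is included only to show what an unbalanced case looks like under the same measure.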

wrapr Implementation Update

June 19, 2017

Introduction: The development version of our R helper function wrapr::let() has switched from string-based substitution to abstract syntax tree based substitution (AST-based substitution, or language-based substitution). I am looking for feedback from wrapr::let() users who are already doing substantial work with it. If you are already using wrapr::let(), please test if the current development …

Non-Standard Evaluation and Function Composition in R

June 16, 2017

In this article we will discuss composing standard-evaluation interfaces (SE) and composing non-standard-evaluation interfaces (NSE) in R. In R, the package tidyeval/rlang is a tool for building domain-specific languages, intended to allow easier composition of NSE interfaces. To use it you must know some of its structure and notation. Here are some details paraphrased …

An easy way to accidentally inflate reported R-squared in linear regression models

June 15, 2017

Here is an absolutely horrible way to confuse yourself and get an inflated reported R-squared on a simple linear regression model in R. We have written about this before, but we found a new twist on the problem (interactions with categorical variable encoding) which we would like to call out here. First let’s set up …

Use a Join Controller to Document Your Work

June 13, 2017

This note describes a useful replyr tool we call a "join controller" (it is part of our "R and Big Data" series; please see here for the introduction, and here for one of our big data courses). When working on real world predictive modeling tasks in production, the ability to join data and document how you …

thinning a Markov chain, statistically

June 12, 2017

Art Owen has arXived a new version of his thinning MCMC paper, where he studies how thinning or subsampling can improve computing time in MCMC chains. I remember quite well the message sent by Mark Berliner and Steve MacEachern in an early 1990s paper that subsampling always increases the variance of the resulting estimators. […]

Maths trauma can be healed

June 12, 2017

Maths trauma and earthquakes: Trauma is a deeply distressing or disturbing experience. Many people in my home town of Christchurch still suffer from post-traumatic stress disorder (PTSD) as a result of our earthquakes five or so years ago. I …

Likelihood calculation for the g-and-k distribution

June 10, 2017

Hello. An example often used in the ABC literature is the g-and-k distribution (e.g. reference [1] below), which is defined through the inverse of its cumulative distribution function (cdf). It is easy to simulate from such distributions by drawing uniform variables and applying the inverse cdf to them. However, since there is no closed-form […]
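The simulation step described above (draw uniforms, apply the inverse cdf) might look like the following sketch. It assumes the common parameterisation of the g-and-k quantile function, Q(u) = A + B·(1 + c·tanh(g·z/2))·(1 + z²)^k·z with z = Φ⁻¹(u) and c = 0.8; the excerpt does not spell out the formula, so this parameterisation, the parameter names, and the function names `gk_quantile` and `gk_sample` are assumptions. It needs Python 3.8 or later for `statistics.NormalDist`.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def gk_quantile(u, A, B, g, k, c=0.8):
    """Inverse cdf of the g-and-k distribution at 0 < u < 1.

    Uses the conventional form (an assumption of this sketch):
    A + B * (1 + c * tanh(g * z / 2)) * (1 + z**2)**k * z, with z = Phi^{-1}(u).
    """
    z = _STD_NORMAL.inv_cdf(u)
    return A + B * (1 + c * math.tanh(g * z / 2)) * (1 + z * z) ** k * z


def gk_sample(n, A, B, g, k, seed=None):
    """Draw n values by inverse-cdf sampling: uniform draws pushed through gk_quantile."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u == 0.0:
            # inv_cdf is undefined at 0; redraw in this (very unlikely) case.
            continue
        draws.append(gk_quantile(u, A, B, g, k))
    return draws


if __name__ == "__main__":
    # Illustrative parameter values only.
    print(gk_sample(5, A=3.0, B=1.0, g=2.0, k=0.5, seed=1))
```

With g = 0 and k = 0 the quantile function reduces to A + B·z, a normal distribution with mean A and standard deviation B, which gives a quick sanity check.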
