Blog Archives

Replace data with measurements

March 26, 2015

To tell whether a statement about data is over-hyped, see whether it retains its meaning if you replace data with measurements. So a request like “Please send me the data from your experiment” becomes “Please send me the measurements from your experiment.” Same thing. But rousing statements about the power of data become banal or even […]

Fitting a triangular distribution

March 24, 2015

Sometimes you only need a rough fit to some data and a triangular distribution will do. As the name implies, this is a distribution whose density function graph is a triangle. The triangle is determined by its base, running between points a and b, and a point c somewhere in between where the altitude intersects the base. […]
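
As a rough illustration of the description above, the density can be sketched in Python. The function name `triangular_pdf` is hypothetical, and the sketch assumes a < c < b. The peak height 2/(b - a) follows from requiring the triangle's area to equal 1.

```python
def triangular_pdf(x, a, b, c):
    """Density of a triangular distribution on [a, b] with peak at c.

    Assumes a < c < b. The peak height 2 / (b - a) makes the area equal 1.
    """
    if not (a < c < b):
        raise ValueError("expected a < c < b")
    if x < a or x > b:
        return 0.0
    height = 2.0 / (b - a)
    if x <= c:
        # Rising edge: linear from 0 at a up to the peak at c.
        return height * (x - a) / (c - a)
    # Falling edge: linear from the peak at c down to 0 at b.
    return height * (b - x) / (b - c)
```

For example, with a = 0, b = 1, and c = 0.5, the density at the peak is 2 and the density outside [0, 1] is 0.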

A subtle way to over-fit

March 17, 2015

If you train a model on a set of data, it should fit that data well. The hope, however, is that it will fit a new set of data well. So in machine learning and statistics, people split their data into two parts. They train the model on one half, and see how well it […]
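
The split described above might look like the following minimal sketch. The helper name `split_in_half` is hypothetical, and shuffling before splitting is an assumption; the excerpt only says the data are divided into two parts.

```python
import random

def split_in_half(data, seed=None):
    """Shuffle a copy of the data and return (training_half, test_half)."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]
```

A model would then be fit on the first half only, with the second half held back to check how well the fit carries over to data it has not seen.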

Finding the best dose

February 24, 2015

In a dose-finding clinical trial, you have a small number of doses to test, and you hope to find the one with the best response. Here “best” may mean most effective, least toxic, closest to a target toxicity, some combination of criteria, etc. Since your goal is to find the best dose, it seems natural to compare dose-finding […]

Miscellaneous math resources

February 4, 2015

Every Wednesday I’ve been pointing out various resources on my web site. So far they’ve all been web pages, but the following are all PDF files. Probability and statistics: How to test a random number generator; Predictive probabilities for normal outcomes; One-arm binary predictive probability; Relating two definitions of expectation; Illustrating the error in the […]

Probability approximations

January 28, 2015

This week’s resource post lists notes on probability approximations. Do we even need probability approximations anymore? They’re not as necessary for numerical computation as they once were, but they remain vital for understanding the behavior of probability distributions and for theoretical calculations. Textbooks often leave out details such as quantifying the error when discussing approximations. The […]

More data, less accuracy

January 27, 2015

Statistical methods should do better with more data. That’s essentially what the technical term “consistency” means. But with improper numerical techniques, the numerical error can increase with more data, overshadowing the decreasing statistical error. There are three ways Bayesian posterior probability calculations can degrade with more data: polynomial approximation, missing the spike, and underflow. Elementary numerical integration algorithms, […]
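
Of the three problems listed, underflow is the easiest to demonstrate. The sketch below is only an illustration, not the post's own example: the function names are hypothetical, and the data (2000 observations, each with probability 0.5) are made up. Multiplying the probabilities directly underflows to zero in double precision, while summing their logarithms stays finite.

```python
import math

def likelihood_naive(probs):
    """Multiply probabilities directly; underflows to 0.0 for long inputs."""
    result = 1.0
    for p in probs:
        result *= p
    return result

def log_likelihood(probs):
    """Sum log-probabilities instead, which avoids underflow."""
    return sum(math.log(p) for p in probs)

probs = [0.5] * 2000                 # made-up data for illustration
naive = likelihood_naive(probs)      # 0.5**2000 is below the smallest double
stable = log_likelihood(probs)       # about 2000 * log(0.5)
```

Here more data makes the naive calculation worse: past roughly a thousand such observations the product is exactly zero, whereas the log-scale sum keeps full information.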

R resources

December 3, 2014

This is the third in my weekly series of posts pointing out resources on this site. This week’s topic is R. R language for programmers; Default arguments and lazy evaluation in R; Distributions in R; Moving data between R and Excel via the clipboard; Sweave: First steps toward reproducible analyses; Troubleshooting Sweave; Regular expressions in […]

Random probability tweets

December 3, 2014

For the next few weeks, I’ve scheduled @ProbFact tweets to come out at random times. They will follow a Poisson distribution with an average of two per day. (Times are truncated to multiples of 5 minutes because my scheduling software requires that.)
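
One way such a schedule could be generated is sketched below. This is an assumption about the approach, not the author's actual script: it treats the tweets as a Poisson process, draws exponentially distributed gaps averaging half a day, and rounds each time down to a multiple of 5 minutes. The function name `tweet_times` and its parameters are hypothetical.

```python
import random

def tweet_times(days, rate_per_day=2.0, step_minutes=5, seed=None):
    """Return tweet times in minutes from the start of the schedule.

    Gaps between tweets are exponential, giving a Poisson process with
    the requested average rate. Each time is truncated (rounded down)
    to a multiple of step_minutes.
    """
    rng = random.Random(seed)
    rate_per_minute = rate_per_day / (24 * 60)
    horizon = days * 24 * 60
    times = []
    t = rng.expovariate(rate_per_minute)
    while t < horizon:
        times.append(int(t // step_minutes) * step_minutes)
        t += rng.expovariate(rate_per_minute)
    return times
```

Calling `tweet_times(21)` would give roughly 42 times on average over three weeks, though any single run can differ noticeably from that.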

First two impressions of statistics

November 25, 2014

When I was a postdoc I asked a statistician a few questions and he gave me an overview of his subject. (My area was PDEs; I knew nothing about statistics.) I remember two things that he said. A big part of being a statistician is knowing what to do when your assumptions aren’t met, because […]
