In my last blog post, “Big Data, R and SAP HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps”, I analyzed the historical airline performance data set using R and SAP HANA and put the aggregated analysis on Google Maps. Undoub...

One of the key things every statistician needs to learn is how to create informative figures and graphs. Sometimes it is easy to use off-the-shelf plots like bar plots, histograms, or, if one is truly desperate, a pie chart. But sometimes the informat...

At this Monday’s Montreal R User Group meeting, Arthur Charpentier gave an interesting talk on the subject of quantile regression. One of the main messages I took away from the workshop was that quantile regression can be used to determine if extreme events are becoming more extreme. The example given was hurricane intensity since 1978.
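
To make the idea concrete: quantile regression fits a line for a chosen quantile τ by minimizing the check (pinball) loss instead of squared error, so comparing the slope of a high-quantile line with that of a lower-quantile line indicates whether the upper tail is rising faster than the rest of the distribution. The Python below is only a minimal sketch under stated assumptions: the fan-shaped data are made up (not the hurricane series from the talk), and `pinball_loss` and `quantile_line` are hypothetical helpers that find the exact fit by brute force, using the fact that some optimal line passes through two of the data points. For real data, a dedicated solver such as `rq()` in R's quantreg package is the usual choice.

```python
from itertools import combinations


def pinball_loss(residuals, tau):
    """Check (pinball) loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return sum(tau * r if r >= 0 else (tau - 1) * r for r in residuals)


def quantile_line(xs, ys, tau):
    """Exact linear quantile regression y = a + b*x by brute force (hypothetical helper).

    Some optimal line passes through at least two data points, so we try
    every pair with distinct x values and keep the lowest-loss line.
    This is O(n^3) and only meant for small illustrative data sets.
    """
    points = list(zip(xs, ys))
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # two points at the same x do not define y = a + b*x
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        loss = pinball_loss([y - (a + b * x) for x, y in points], tau)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


# Made-up "fan-shaped" data: at each x there are 10 points whose spread
# grows with x, so the upper tail rises faster than the lower tail.
xs, ys = [], []
for x in range(10):
    for j in range(10):
        xs.append(x)
        ys.append(j * (1 + x))

for tau in (0.25, 0.85):
    a, b = quantile_line(xs, ys, tau)
    print(f"tau={tau}: intercept={a:.2f}, slope={b:.2f}")
```

On this toy data the 0.25-quantile line has slope 2 while the 0.85-quantile line has slope 8: the upper quantile grows faster, which is the "extremes are getting more extreme" pattern the talk described.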

Hi, for the next GTB meeting at Crest, 3rd May, I will present Peter Orbanz’s work on Projective limit random probabilities on Polish spaces. It follows on from my previous presentation on Bayesian nonparametrics and the Dirichlet process. The article provides a means of constructing an arbitrary prior distribution on the set of probability measures by […]

The meaning of the term “biological replicate” is unfortunately often not adequately addressed in publications. “Biological replicate” can have multiple meanings, depending on the context of the study. One general definition is that biological replicates are samples of the same type of organism grown or treated under the same conditions. For example, if one [...]

Jeff Leek, Reeves Anderson, and I recently wrote a correspondence to Nature (subscription required) regarding the Supreme Court decision in Mayo v. Prometheus and the recent Institute of Medicine report related to the Duke Clinical Trials Saga. The bas...

I never got round to writing a post last week because I’ve been sidetracked by the plethora of free courses on offer. The Stanford professors who ran the AI course I took last year are now offering courses through udacity.com.

My newest project is a Python library for monitoring the memory consumption of arbitrary processes, and one of its most useful features is line-by-line analysis of memory usage in Python code. I wrote a basic prototype six months ago after being surpris...
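
The author's library itself is not shown in this excerpt, so the sketch below swaps in a different technique: Python's standard-library `tracemalloc` module, which attributes Python heap allocations (rather than total process memory) to the source lines that made them. The `build_data` workload and the `top_lines` helper are hypothetical, chosen only to illustrate what a per-line memory report looks like.

```python
import tracemalloc


def build_data():
    # Hypothetical workload: allocate memory on two different lines.
    small = [0] * 1_000
    big = [i for i in range(200_000)]
    return small, big


def top_lines(func, limit=3):
    """Run func and report the source lines holding the most traced memory.

    Uses tracemalloc, which tracks blocks allocated through Python's
    allocators, not the total memory of the process.
    """
    tracemalloc.start()
    result = func()  # keep the result alive so its memory is still traced
    snapshot = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # statistics() is sorted from the biggest to the smallest size.
    stats = snapshot.statistics("lineno")[:limit]
    for stat in stats:
        frame = stat.traceback[0]
        print(f"{frame.filename}:{frame.lineno}: {stat.size / 1024:.1f} KiB")
    return result, stats


top_lines(build_data)
```

Unlike a process-level monitor, this only sees memory allocated through Python's allocators, and it reports memory still held when the snapshot is taken rather than peak usage.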

Information geometry is the application of differential geometry to families of probability distributions, and hence to statistical models. Information, however, plays two roles in it: Kullback-Leibler information, or relative entropy, serves as a measure of divergence (not quite a metric, because it’s asymmetric), and Fisher information takes the role of curvature. One very nice thing about [...]
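
Both roles can be checked numerically. The sketch below, assuming discrete distributions and using hypothetical helpers `kl` and `bernoulli`, shows that D(p‖q) and D(q‖p) differ, and that for a Bernoulli(θ) model the divergence to a nearby parameter θ + δ is approximately ½ I(θ) δ², where I(θ) = 1/(θ(1 − θ)) is the Fisher information. In that sense Fisher information is the local curvature of the KL divergence.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions, in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def bernoulli(theta):
    """Probability vector [P(1), P(0)] of a Bernoulli(theta) variable."""
    return [theta, 1 - theta]


# Asymmetry: swapping the arguments changes the value.
p, q = [0.9, 0.1], [0.5, 0.5]
print(kl(p, q), kl(q, p))

# Curvature: for a small step delta,
# D(theta || theta + delta) is close to 0.5 * I(theta) * delta**2.
theta, delta = 0.3, 1e-3
fisher = 1 / (theta * (1 - theta))  # Fisher information of Bernoulli(theta)
approx = 0.5 * fisher * delta**2
exact = kl(bernoulli(theta), bernoulli(theta + delta))
print(exact, approx)
```

The gap between `exact` and `approx` comes from third-order terms in δ, so it shrinks relative to the quadratic term as δ gets smaller.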

Analyzing transactions in quantstrat: This post is part 1 of a follow-up to the original post, Simple Moving Average Strategy with a Volatility Filter. In this follow-up, I take a closer look at the individual trades of each strategy, which may help explain the difference in performance of the SMA …

Please tell me which stats terms you find the most confusing! Rank the terms you choose by how confusing they are (#1 being the most confusing). The results will determine which topics are covered in future blog posts! Blog entries for Confusing Stats Terms #10, #9, and #8 are already posted, so I'm only asking for terms #7 through #1. Thanks for your input! http://www.statsmakemecry.com/confusing-stats-terms/

As newspaper graphics go, scatterplots are a fairly advanced technique. They tend to show a reasonably large amount of data as individual points, and they require the reader to have an idea of what to look for. For that reason, most newspapers never bother with scatterplots, which is really too bad. With some explanation, a scatterplot can be a very effective means of displaying data, and in particular to allow the…