Unsupervised correction of optical character misrecognition

October 22, 2013

For a good overview of what OCR is, check out this overview. I found myself cutting the spines off books, again. This time it was because I couldn’t find an e-book copy of ‘Animal Liberation’ anywhere on the net, and I’ve amassed quite a few physical copies--mostly from garage sales--that I could afford to experiment […]

Read more »

Ivy Jew update

October 22, 2013

Nurit Baytch posted a document, A Critique of Ron Unz’s Article “The Myth of American Meritocracy”, that is relevant to an ongoing discussion we had on this blog. Baytch’s article begins: In “The Myth of American Meritocracy,” Ron Unz, the publisher of The American Conservative, claimed that Harvard discriminates against non-Jewish white and Asian students […] The post Ivy Jew update appeared first on Statistical Modeling, Causal Inference, and Social Science.

Read more »

Unsampling: how a meaningless word invaded Big Data

October 22, 2013

If you know any statistics, you know "sampling". It's the idea of measuring some subset of the population. Using the Law of Large Numbers, you are able to learn from the sample and generalize to the population. Your Stats professor never told you what "unsampling" is. You're not going to find this word in a statistics textbook either. What does it mean? The "un" implies that you can recover the…

Read more »

Quant finance blogs

October 22, 2013

What I’ve learned from updating the blogroll. New entries: The easy option is to go to The Whole Street, which aggregates lots of quant finance blogs. Somehow Bookstaber missed out on being on the blogroll before, which was definitely an oversight. Timely Portfolio was another that I was surprised wasn’t already there. The R Trader talks about … Continue reading →

Read more »

A gentle introduction to learning R

October 22, 2013

There are many good resources online for learning R. However, I recently discovered Try R from Code School, which is interactive, goes at a very gentle pace and also looks very pretty: http://tryr.codeschool.com/ Filed under: serious stats Tagged...

Read more »

Review: Kölner R Meeting 18 October 2013

October 22, 2013

The Cologne R user group met last Friday for two talks, on split apply combine in R and on XLConnect, by Bernd Weiß and Günter Faes respectively, before the usual Schnitzel and Kölsch at the Lux. Split apply combine in R: The apply family of functions in R ...

Read more »

"Significance Tests for Adaptive Modelling" (Today at the Statistics Seminar)

October 22, 2013

Attention conservation notice: Late notice of a very technical presentation about theoretical statistics in a city you don't live in. Today's speaker needs no introduction for those interested in modern, high-dimensional statistics (but will get an ...

Read more »

Simulation I: Generating Random Variables (Introduction to Statistical Computing)

October 22, 2013

Lecture 14: Why simulate? Generating random variables as first step. The built-in R commands: rnorm, runif, etc.; sample. Some uses of sampling: permutation tests; bootstrap standard errors and confidence intervals. Transforming uniformly-distribu...
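To make the bootstrap idea in this teaser concrete, here is a minimal sketch in Python rather than the R the lecture uses. It is not taken from the lecture: `bootstrap_se` is a hypothetical helper, `random.gauss` stands in for R's `rnorm`, and the sample size and number of resamples are arbitrary choices.

```python
import random
import statistics


def bootstrap_se(data, stat=statistics.mean, n_boot=1000, seed=0):
    """Estimate the standard error of `stat` by resampling `data` with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Draw n observations with replacement from the original sample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(stat(resample))
    # The spread of the resampled statistics approximates the standard error.
    return statistics.stdev(replicates)


# Illustrative data: 200 draws from a standard normal distribution.
data_rng = random.Random(42)
sample = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
se = bootstrap_se(sample)
print(f"bootstrap standard error of the mean: {se:.3f}")
```

For a standard normal sample of size 200, the estimate should land near the textbook value of 1/sqrt(200), which is a useful sanity check on the resampling loop.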

Read more »

Simulation II: Markov Chains (Introduction to Statistical Computing)

October 22, 2013

Lecture 15: Combining multiple dependent random variables in a simulation; ordering the simulation to do the easy parts first. Markov chains as a particular example of doing the easy parts first. The Markov property. How to write a Markov chain simul...
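As a rough illustration of what a Markov chain simulator can look like, here is a short Python sketch. It is not the lecture's code (which is in R): the function name `simulate_markov_chain` and the two-state weather chain are hypothetical, chosen only to show that each step depends on the current state alone.

```python
import random


def simulate_markov_chain(transition, start, n_steps, seed=0):
    """Simulate a path from a finite-state Markov chain.

    `transition` maps each state to a dict of {next_state: probability}.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    state = start
    path = [state]
    for _ in range(n_steps):
        # Markov property: the next state is drawn using only the current state.
        next_states = list(transition[state].keys())
        weights = list(transition[state].values())
        state = rng.choices(next_states, weights=weights, k=1)[0]
        path.append(state)
    return path


# Hypothetical two-state chain, for illustration only.
weather = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}
path = simulate_markov_chain(weather, "sunny", 1000)
print(path[:10])
```

Storing the transition probabilities per state keeps the "easy part" (drawing one step given the present) separate from the bookkeeping of the whole path.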

Read more »

machine learning [book review, part 2]

October 21, 2013

The chapter (Chap. 3) on Bayesian updating or learning (a most appropriate term) for discrete data is well-done in Machine Learning, a probabilistic perspective if a bit stretched (which is easy with 1000 pages left!). I like the remark (Section 3.5.3) about the log-sum-exp trick. While lengthy, the chapter (Chap. 4) on Gaussian models has […]
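For readers who have not met the log-sum-exp trick mentioned here, a minimal Python sketch follows. It is a generic illustration of the standard trick, not code from the book or the review, and `log_sum_exp` is just an illustrative name: subtracting the maximum before exponentiating keeps the sum from underflowing to zero.

```python
import math


def log_sum_exp(log_values):
    """Return log(sum(exp(v) for v in log_values)) without underflow or overflow."""
    m = max(log_values)
    if m == float("-inf"):
        # Every term is exp(-inf) = 0, so the log of the sum is -inf.
        return m
    # Factor out exp(m): the largest remaining term is exp(0) = 1.
    return m + math.log(sum(math.exp(v - m) for v in log_values))


# Log-probabilities this small underflow to 0.0 when exponentiated directly.
log_probs = [-1000.0, -1001.0, -1002.0]
print(log_sum_exp(log_probs))
```

Computing `math.log(sum(math.exp(v) for v in log_probs))` directly would fail on this input, because each `math.exp` call returns 0.0 and the logarithm of zero is undefined.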

Read more »

The Big Bayes theorem theory

October 21, 2013

While we were eating a forkful of what was supposed to be a frittata, but turned out to be very fluffy mushroom scrambled eggs earlier, we were half watching an episode of The Big Bang Theory. Long story short, my eye was caught by Sheldon ex...

Read more »

Useful Unix/Linux One-Liners for Bioinformatics

October 21, 2013

Much of the work that bioinformaticians do is munging and wrangling around massive amounts of text. While there are some "standardized" file formats (FASTQ, SAM, VCF, etc.) and some tools for manipulating them (fastx toolkit, samtools, vcftools, etc.),...

Read more »

Most Popular Girl Names by State over Time

October 21, 2013

The following should be catnip for Andrew. It combines (a) statistics on baby names, (b) time series, and (c) statistics broken down by state. All in one really fun animated visualization by Reuben Fischer-Baum: Sixty Years of the Most Popular Names for Girls. As Mark Liberman commented in his re-post on Language Log, this data […] The post Most Popular Girl Names by State over Time appeared first on Statistical Modeling,…

Read more »

Lawrence R. Klein, 1920-2013

October 21, 2013

I am sad to report that Lawrence R. Klein has passed away. He was in many respects the father of modern econometrics and empirical macroeconomics; indeed his 1980 Nobel Prize citation was "for the creation of econometric models and their application to...

Read more »

Why are the best relievers not used when they are most needed?

October 21, 2013

During Saturday's ALCS game 6 the Red Sox's manager John Farrell took out his starter in the 6th inning. They were leading by 1, but had runners on first and second with no outs. This is a hard situation to … Continue reading →

Read more »

The future (and past) of statistical sciences

October 21, 2013

In connection with this workshop, I was asked to write a few paragraphs describing my perspective on “the current and near-term future state of the statistical sciences you are most familiar with.” Here’s what I wrote: I think that, at any given time, the field of statistics has a core, but that core changes over […] The post The future (and past) of statistical sciences appeared first on Statistical Modeling, Causal…

Read more »

Bad teacher(s)

October 21, 2013

This morning there has been some frenzy in the UK media (e.g. here or here) after the publication of a pamphlet by David Willetts, a junior minister for Universities and Science under the infamous coalition government. The minister's point is...

Read more »

There’s nothing wrong with Eli Manning on this chart

October 21, 2013

The Giants QB Eli Manning is in the news for the wrong reason this season. His hometown paper, the New York Times, looked the other way, focusing on one metric that he still excels at, which is longevity. This is...

Read more »

Deriving distributions vs fitting distributions

October 21, 2013

Sometimes you can derive a probability distribution from a list of properties it must have. For example, there are several properties that lead inevitably to the normal distribution or the Poisson distribution. Although such derivations are attractive, they don’t apply that often, and they’re suspect when they do apply. There’s often some effect that keeps […]

Read more »

Assign the diagonal elements of a matrix

October 21, 2013

SAS/IML programmers know that the VECDIAG function can be used to extract the diagonal elements of a matrix. For example, the following statements extract the diagonal of a 3 x 3 matrix: proc iml; m = {1 2 3, 4 5 6, 7 8 9}; v = vecdiag(m); /* v = {1,5,9} [...]
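The excerpt is cut off before it shows how the diagonal is assigned, so the Python sketch below is only an analogue of the idea, not the post's SAS/IML approach. `get_diagonal` and `set_diagonal` are hypothetical helpers operating on a plain list-of-lists matrix.

```python
def get_diagonal(matrix):
    """Return the main-diagonal elements of a matrix stored as a list of rows."""
    n = min(len(matrix), len(matrix[0]))
    return [matrix[i][i] for i in range(n)]


def set_diagonal(matrix, values):
    """Return a copy of `matrix` whose main diagonal is replaced by `values`."""
    n = min(len(matrix), len(matrix[0]))
    if len(values) != n:
        raise ValueError("values must have one entry per diagonal element")
    result = [row[:] for row in matrix]  # copy rows so the input is not modified
    for i, value in enumerate(values):
        result[i][i] = value
    return result


# Same 3 x 3 matrix as in the excerpt.
m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
v = get_diagonal(m)
zeroed = set_diagonal(m, [0, 0, 0])
print(v)
print(zeroed)
```

Returning a modified copy rather than editing in place is a design choice made here for clarity; an in-place version would simply skip the row copy.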

Read more »

Tracking the 2013 Hurricane Season

October 21, 2013

With the end of hurricane season nearing, it’s only appropriate to do a brief summary of the activity this year. It’s been a surprisingly low-key season as far as hurricanes are concerned. There have been only a few hurricanes, and the barometric pressure of any hurricane this season has not even come close […]

Read more »

Review: Isabel Meirelles, Design for Information

October 21, 2013

When I’m asked for a good book about visualization, I usually try to change the subject. There is no book I really love; they all have their issues. But thanks to Isabel Meirelles, I can now give a straight answer: Design for Information. In the interest of full disclosure: I was sent a free copy […]

Read more »
