## essential cover!

October 23, 2013
Our book is nearly out..! The Springer webpage is ready, we have sent the proofs back, Amazon has now included the above picture, and things are moving towards the publication date, supposed to be November 30. Just in time for Christmas! And not too early given that we packed off in early February…

## PubMed Commons: A system for commenting on articles in PubMed

October 23, 2013
Rob “Lasso” Tibshirani writes: We all read a lot of papers and often have useful things to say about them, but there is no systematic way to do this; lots of journals have commenting systems, but they’re clunky, and, most importantly, they’re scattered across thousands of sites. Journals don’t encourage critical comments from readers, […]

## Output percentiles of multiple variables in a tabular format

October 23, 2013
A challenge for statistical programmers is getting data into the right form for analysis. For graphing or analyzing data, sometimes the "wide format" (each subject is represented by one row and many variables) is required, but other times the "long format" (observations for each subject span multiple rows) is more [...]

## GLM, non-linearity and heteroscedasticity

October 23, 2013
Last week in the non-life insurance course, we’ve seen the theory of the Generalized Linear Models, emphasizing the two important components:

- the link function (which is actually the key component in predictive modeling)
- the distribution, or the variance function

Just to illustrate, consider my favorite dataset: `lin.mod = lm(dist~speed,data=cars)`. A linear model means here

$Y_i=\beta_0+\beta_1 X_i +\varepsilon_i$

where the residuals are assumed to be centered, independent, and with identical variance. If we visualize that linear…
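As a point of reference, the assumptions in the excerpt can be written out next to the way a GLM relaxes them. This is only a sketch in standard textbook notation: the link $g$, the variance function $V$, the dispersion $\phi$, and the shorthand $\mu_i=\mathbb{E}[Y_i\mid X_i]$ are symbols assumed here, not taken from the post.

$$
\begin{aligned}
\text{linear model:}\quad & \mathbb{E}[Y_i\mid X_i]=\beta_0+\beta_1 X_i, & \operatorname{Var}(Y_i\mid X_i) &= \sigma^2 \\
\text{GLM:}\quad & g\big(\mathbb{E}[Y_i\mid X_i]\big)=\beta_0+\beta_1 X_i, & \operatorname{Var}(Y_i\mid X_i) &= \phi\, V(\mu_i)
\end{aligned}
$$

With the identity link and a constant variance function, the second line reduces to the first, which is why the two components named above are what distinguish a GLM from the plain linear model.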

## Farewell to d8taplex.com

October 22, 2013
Last night I shut down d8taplex.com. This was a site that I used to demonstrate a number of experimental systems that I'd been playing with. These included: a search engine for tabular data (indexing over 1 million time series); a specialized...

## PubMed Commons: One post-publication peer review forum to rule them all?

October 22, 2013
Several post-publication peer review forums already exist, such as Faculty of 1000 or PubPeer, that facilitate discussion of papers after they have already been published. F1000 only allows a small number of "faculty" to comment on articles, and access...

## Blog posts that impact real science – software review and GTEX

October 22, 2013
There was a flurry of activity on social media yesterday surrounding a blog post by Lior Pachter. He was speaking about the GTEX project - a large NIH-funded project that has the goal of understanding expression variation within and …

## Knoxville R User’s Group Meeting November 1

October 22, 2013
The next meeting of the Knoxville R User’s Group will consist of four 20-minute talks followed by an open planning session. It will take place on Friday, November 1, from 2:00 p.m. to 4:00 p.m. at The University of Tennessee, …

## Frightfully Boring? Not at all!

October 22, 2013
Statistical information is frightfully boring, it doesn’t regard me as a person! Yes and no. Yes, official statistics is not …

## Unsupervised correction of optical character misrecognition

October 22, 2013
For a good overview of what OCR is, check out this overview. I found myself cutting the spines off books, again. This time it was because I couldn’t find an e-book copy of ‘Animal Liberation’ anywhere on the net, and I’ve amassed quite a few physical copies--mostly from garage sales--that I could afford to experiment …

## Ivy Jew update

October 22, 2013
Nurit Baytch posted a document, A Critique of Ron Unz’s Article “The Myth of American Meritocracy”, that is relevant to an ongoing discussion we had on this blog. Baytch’s article begins: In “The Myth of American Meritocracy,” Ron Unz, the publisher of The American Conservative, claimed that Harvard discriminates against non-Jewish white and Asian students […] The post Ivy Jew update appeared first on Statistical Modeling, Causal Inference, and Social Science.

## Unsampling: how a meaningless word invaded Big Data

October 22, 2013
If you know any statistics, you know "sampling". It's the idea of measuring some subset of the population. Using the Law of Large Numbers, you are able to learn from the sample and generalize to the population. Your Stats professor never told you what "unsampling" is. You're not going to find this word in a statistics textbook either. What does it mean? The "un" implies that you can recover the…
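The sampling idea described above can be illustrated with a short simulation. This is a minimal sketch in Python, not anything from the post: the population is hypothetical (100,000 draws from an exponential distribution), and `sample_mean` is a helper name made up for the illustration. It shows the sample mean approaching the population mean as the sample grows.

```python
import random
import statistics

def sample_mean(population, n, rng):
    """Mean of a simple random sample of size n, drawn without replacement."""
    return statistics.fmean(rng.sample(population, n))

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 values from a skewed distribution.
population = [rng.expovariate(1.0) for _ in range(100_000)]
true_mean = statistics.fmean(population)

# Larger samples tend to land closer to the population mean.
for n in (10, 100, 1_000, 10_000):
    estimate = sample_mean(population, n, rng)
    print(f"n={n:>6}  sample mean={estimate:.3f}  error={abs(estimate - true_mean):.3f}")
```

The point of the sketch is the direction of inference: a subset is measured and the result is generalized to the whole population, with the error typically shrinking as the sample size increases.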

## Quant finance blogs

October 22, 2013
What I’ve learned from updating the blogroll. New entries: The easy option is to go to The Whole Street, which aggregates lots of quant finance blogs. Somehow Bookstaber missed out on being on the blogroll before, which was definitely an oversight. Timely Portfolio was another that I was surprised wasn’t already there. The R Trader talks about …

## A gentle introduction to learning R

October 22, 2013
There are many good resources online for learning R. However, I recently discovered Try R from Code School, which is interactive, goes at a very gentle pace and also looks very pretty: http://tryr.codeschool.com/

## Review: Kölner R Meeting 18 October 2013

October 22, 2013
The Cologne R user group met last Friday for two talks on split apply combine in R and XLConnect by Bernd Weiß and Günter Faes respectively, before the usual Schnitzel and Kölsch at the Lux. Split apply combine in R: The apply family of functions in R ...

## "Significance Tests for Adaptive Modelling" (Today at the Statistics Seminar)

October 22, 2013
Attention conservation notice: Late notice of a very technical presentation about theoretical statistics in a city you don't live in. Today's speaker needs no introduction for those interested in modern, high-dimensional statistics (but will get an ...

## Simulation I: Generating Random Variables (Introduction to Statistical Computing)

October 22, 2013
Lecture 14: Why simulate? Generating random variables as first step. The built-in R commands: rnorm, runif, etc.; sample. Some uses of sampling: permutation tests; bootstrap standard errors and confidence intervals. Transforming uniformly-distribu...
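One of the uses of sampling listed above, the bootstrap standard error, can be sketched in a few lines. The lecture works in R; the version below is an illustrative Python sketch using only the standard library, and the function name `bootstrap_se`, its parameters, and the simulated data are all assumptions made for the example.

```python
import random
import statistics

def bootstrap_se(data, stat, n_boot=2000, seed=0):
    """Bootstrap standard error of `stat`.

    Resample the data with replacement, recompute the statistic on each
    resample, and report the standard deviation of those replicates.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = [stat(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.stdev(replicates)

# Hypothetical data: 200 draws from a normal distribution (mean 10, sd 2).
data_rng = random.Random(1)
data = [data_rng.gauss(10.0, 2.0) for _ in range(200)]

se_boot = bootstrap_se(data, statistics.fmean)
se_formula = statistics.stdev(data) / len(data) ** 0.5  # textbook SE of the mean

print(f"bootstrap SE of the mean: {se_boot:.4f}")
print(f"formula SE of the mean:   {se_formula:.4f}")
```

For the mean the bootstrap answer should sit close to the textbook formula, which makes it a convenient sanity check; the same function can be pointed at statistics such as `statistics.median`, where no simple formula is available.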

## Simulation II: Markov Chains (Introduction to Statistical Computing)

October 22, 2013
Lecture 15: Combining multiple dependent random variables in a simulation; ordering the simulation to do the easy parts first. Markov chains as a particular example of doing the easy parts first. The Markov property. How to write a Markov chain simul...
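To make the Markov property concrete, here is a minimal simulation sketch. It is written in Python rather than the R used in the lecture, and the function name `simulate_chain`, the dictionary layout of the transition probabilities, and the two-state weather example are hypothetical choices for illustration. Each step draws the next state using only the current state, never the earlier history.

```python
import random

def simulate_chain(transition, start, n_steps, seed=0):
    """Simulate a discrete-state Markov chain.

    `transition` maps each state to a dict {next_state: probability}.
    Returns the visited states, including the starting state.
    """
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        probs = transition[path[-1]]  # depends on the current state only
        next_state = rng.choices(list(probs), weights=list(probs.values()))[0]
        path.append(next_state)
    return path

# Hypothetical two-state chain with made-up transition probabilities.
weather = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

path = simulate_chain(weather, "sunny", 20)
print(" ".join(state[0] for state in path))  # first letter of each state
```

Generating the chain one step at a time is the "easy parts first" ordering: the starting state is drawn or fixed first, and every later state needs only its immediate predecessor.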

## machine learning [book review, part 2]

October 21, 2013
The chapter (Chap. 3) on Bayesian updating or learning (a most appropriate term) for discrete data is well done in Machine Learning: A Probabilistic Perspective, if a bit stretched (which is easy with 1000 pages left!). I like the remark (Section 3.5.3) about the log-sum-exp trick. While lengthy, the chapter (Chap. 4) on Gaussian models has […]
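For readers who have not met the log-sum-exp trick mentioned above, here is a minimal Python sketch of the usual formulation. It is not code from the book; the function name `log_sum_exp` is chosen for illustration and the inputs are assumed to be finite or negative infinity. The idea is to subtract the largest log value before exponentiating, so that sums of very small probabilities do not underflow to zero.

```python
import math

def log_sum_exp(log_values):
    """Compute log(sum(exp(v) for v in log_values)) without underflow.

    Uses the identity log(sum(exp(v))) = m + log(sum(exp(v - m))) with
    m = max(log_values), so the largest term becomes exp(0) = 1.
    """
    m = max(log_values)
    if m == -math.inf:  # every term is exp(-inf) = 0
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in log_values))

# Two log-probabilities far too small to exponentiate directly:
# math.exp(-1000.0) underflows to 0.0, so the naive log(sum(exp(...)))
# would fail on log(0).
log_probs = [-1000.0, -1000.0]
print(log_sum_exp(log_probs))  # close to -1000 + log(2)
```

This comes up when normalizing posterior probabilities that are stored as logs, which is the setting of the chapter under review.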

## The Big Bayes theorem theory

October 21, 2013
While we were eating a forkful of what was supposed to be a frittata, but turned out to be very fluffy mushroom scrambled eggs earlier, we were half watching an episode of The Big Bang Theory. Long story short, my eye was caught by Sheldon ex...

## Useful Unix/Linux One-Liners for Bioinformatics

October 21, 2013
Much of the work that bioinformaticians do is munging and wrangling around massive amounts of text. While there are some "standardized" file formats (FASTQ, SAM, VCF, etc.) and some tools for manipulating them (fastx toolkit, samtools, vcftools, etc.),...