## God/leaf/tree

February 28, 2014
Govind Manian writes: I wanted to pass along a fragment from Lichtenberg’s Waste Books, which I am finding to be great stone soup, that reminded me of God is in Every Leaf: To the wise man nothing is great and nothing small… I believe he could write treatises on keyholes that sounded as weighty […] The post God/leaf/tree appeared first on Statistical Modeling, Causal Inference, and Social Science.

## r4stats.com 2013 in review

February 28, 2014
The WordPress.com stats helper monkeys prepared a 2013 annual report for this blog. Here’s an excerpt: The Louvre Museum has 8.5 million visitors per year. This blog was viewed about 150,000 times in 2013. If it were an exhibit at …

## Useful Functions in R for Manipulating Text Data

Introduction: In my current job, I study HIV at the genetic and biochemical levels. Thus, I often work with data involving the sequences of nucleotides or amino acids of various patient samples of HIV, and this type of work involves a lot of manipulating text. (Strictly speaking, I analyze sequences of nucleotides from DNA that are reverse-transcribed from […]

## Foundations of Statistical Algorithms [book review]

February 27, 2014
There is computational statistics and there is statistical computing. And then there is statistical algorithmic. Not the same thing, by far. This 2014 book by Weihs, Mersmann and Ligges, from TU Dortmund, the latter being also a member of the R Core team, stands at one end of this wide spectrum of techniques required by […]

## Old School Reproducibility

February 27, 2014
The Replication crisis in science has brought out the Philosopher of Science in everyone. Great pronouncements are being made as to the way science should be done. So it’s worth recalling how Science achieved reproducible results in the past. Take fo...

## Visualising Mill Road: Informing Communities by Infographics in the Street

February 27, 2014
Visualising Mill Road [visualisingmillroad.com] by Lisa Koeman, Vaiva Kalnikaite and Yvonne Rogers from ICRI Cities was a community project that combined citizen participation and public data visualization to inform a community on what other members o...

## Example 2014.3: Allow different variances by group

February 27, 2014
One common violation of the assumptions needed for linear regression is heteroscedasticity by group membership. Both SAS and R can easily accommodate this setting. Our data today comes from a real example of vitamin D supplementation of milk. Four sup...

## “What Can we Learn from the Many Labs Replication Project?”

February 27, 2014
Aki points us to this discussion from Rolf Zwaan: The first massive replication project in psychology has just reached completion (several others are to follow). . . . What can we learn from the ManyLabs project? The results here show the effect sizes for the replication efforts (in green and grey) as well as the […] The post “What Can we Learn from the Many Labs Replication Project?” appeared first on…

## On Assessing Convergence to a Steady State

February 27, 2014
I recently listened to a stimulating statistics talk, "Discerning a Steady State Sequentially," by Moshe Pollak (with Tom Hope), presently visiting Penn. Of course it's impossible to know with certainty whether we're in steady state based on a finite s...

## Easily generate correlated variables from any distribution

February 27, 2014
In this post I will demonstrate in R how to draw correlated random variables from any distribution. The idea is simple. 1. Draw any number of variables from a joint normal distribution. 2. Apply the univariate normal CDF of variables to derive pro...
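The two visible steps can be sketched as follows. This is a minimal illustration in Python rather than the post's R code, using only the standard library. The excerpt is cut off after step 2, so the final step here (pushing the resulting probabilities through each target distribution's inverse CDF) is an assumption, as are the helper names `correlated_pair`, `exponential_inv_cdf`, and `uniform_inv_cdf` and the choice of target distributions.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def correlated_pair(rho, inv_cdf_x, inv_cdf_y, rng):
    """Draw one (x, y) pair whose dependence comes from a bivariate normal."""
    # Step 1: draw two standard normals with correlation rho.
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    # Step 2: apply the univariate normal CDF to obtain probabilities in [0, 1].
    u1 = STD_NORMAL.cdf(z1)
    u2 = STD_NORMAL.cdf(z2)
    # Assumed step 3: map each probability through a target inverse CDF.
    return inv_cdf_x(u1), inv_cdf_y(u2)


def exponential_inv_cdf(u, rate=1.0):
    """Inverse CDF of an exponential distribution (hypothetical target)."""
    u = min(u, 1.0 - 1e-12)  # guard against log(0) when u rounds to 1
    return -math.log(1.0 - u) / rate


def uniform_inv_cdf(u, low=0.0, high=10.0):
    """Inverse CDF of a uniform distribution on [low, high] (hypothetical target)."""
    return low + (high - low) * u


rng = random.Random(42)  # fixed seed so the sketch is repeatable
pairs = [
    correlated_pair(0.8, exponential_inv_cdf, uniform_inv_cdf, rng)
    for _ in range(5000)
]
```

The correlation of the transformed pairs is generally not equal to the `rho` used for the normals, because the inverse CDFs are nonlinear; rank-based measures such as Spearman's correlation are preserved instead.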

## More time series data online

February 27, 2014
Earlier this week I had coffee with Ben Fulcher who told me about his online collection comprising about 30,000 time series, mostly medical series such as ECG measurements, meteorological series, birdsong, etc. There are some finance series, but not ma...

## Phil6334: Feb 24, 2014: Induction, Popper and pseudoscience (Day #4)

February 27, 2014
Phil 6334* Day #4: Mayo slides follow the comments below. (Make-up for Feb 13 snow day.) Popper reading is from Conjectures and Refutations. As is typical in rereading any deep philosopher, I discover (or rediscover) different morsels of clues to understanding, whether fully intended by the philosopher or a byproduct of their other insights, and a more contemporary reading. […]

## Drowning in insignificance

February 26, 2014
Some researchers (in both science and marketing) abuse a slavish view of p-values to try and falsely claim credibility. The incantation is: “we achieved p = x (with x ≤ 0.05) so you should trust our work.” This might be true if the published result had been performed as a single project (and not as […]
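One common way a lone p ≤ 0.05 can mislead is when it is the best of several attempts. The excerpt is cut off, so treating this as the post's point is an assumption. Under the further assumption of independent tests with no real effect, the chance that at least one of k tests reaches the threshold is 1 − (1 − α)^k, which the hypothetical helper below computes in Python.

```python
def prob_at_least_one_false_positive(num_tests, alpha=0.05):
    """Chance that at least one of num_tests independent null tests has p <= alpha."""
    # Each null test stays above alpha with probability (1 - alpha);
    # independence lets those probabilities multiply.
    return 1.0 - (1.0 - alpha) ** num_tests


# Tabulate the rate for a few hypothetical numbers of attempts.
for attempts in (1, 5, 14, 20):
    rate = prob_at_least_one_false_positive(attempts)
    print(f"{attempts:>2} attempts: {rate:.3f}")
```

With a single test the rate equals α, and under these assumptions it passes one half at 14 attempts.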

## Taking a Random Sample on Amazon Redshift

February 26, 2014
Recently, I was approached by Vicky, whom I'm working with at a client, to help with a particular problem. She wanted to calculate page view summaries for a random sample of visitors from a table containing about a billion page views. This i...

## A good comment on one of my papers

February 26, 2014
An anonymous reviewer wrote: I appreciate informal writing styles as a means of increasing accessibility. However, the informality here seems to decrease accessibility – partly because of the assumed knowledge of the reader for concepts and terms, and also for its wandering style. Many concepts are introduced without explanation and are not clearly and decisively […] The post A good comment on one of my papers appeared first on Statistical Modeling,…

## Econometrics, political science, epidemiology, etc.: Don’t model the probability of a discrete outcome, model the underlying continuous variable

February 26, 2014
This is an echo of yesterday’s post, Basketball Stats: Don’t model the probability of win, model the expected score differential. As with basketball, so with baseball: as the great Bill James wrote, if you want to predict a pitcher’s win-loss record, it’s better to use last year’s ERA than last year’s W-L. As with basketball […] The post Econometrics, political science, epidemiology, etc.: Don’t model the probability of a discrete outcome,…

## Data Science is Hard, But So is Talking

February 26, 2014
Jeff, Brian, and I had to record nine separate introductory videos for our Data Science Specialization and, well, some of us were better at it than others. It takes a bit of practice to read effectively from a teleprompter, something …

## Good guys in sports need a dose of reality

February 26, 2014
I will be speaking at the Agilone Data Driven Marketing Summit (link) in San Francisco on Thursday. I will be talking about hiring for numbersense. Drop by if you are in the area. Future events are listed in the right column of the blog. I feel bad piling on the "good guys" in the sports doping spectacle but sometimes, you need someone to point you to the mirror.…

## How to automatically select a smooth curve for a scatter plot in SAS

February 26, 2014
My last blog post described three ways to add a smoothing spline to a scatter plot in SAS. I ended the post with a cautionary note: From a statistical point of view, the smoothing spline is less than ideal because the smoothing parameter must be chosen manually by the user. [...]

## Winner of the February 2014 palindrome contest (rejected post)

February 26, 2014
Winner of February 2014 Palindrome Contest: Samuel Dickson. Palindrome: Rot, Cadet A, I’ve droned! Elba, revile deviant, naïve, deliverable den or deviated actor. The requirement was: A palindrome with Elba plus deviate with an optional second word: deviant. A palindrome that uses both deviate and deviant tops an acceptable palindrome that only uses deviate. Bio: Sam Dickson is […]

## Improved evolution of correlations

February 26, 2014
Update June 2013: A systematic analysis of the topic has been published: Schönbrodt, F. D., & Perugini, M. (2013). At what sample size do correlations stabilize? Journal of Research in Personality, 47, 609-612. doi:10.1016/j.jrp.2013.05.009 Check ...

## The first CREDAM Award for creative data management goes to … the German government!

February 26, 2014
“If you torture the data long enough, it will confess.” This aphorism, attributed to Ronald Coase, has sometimes been used in a disrespectful manner, as if it was wrong to do creative data analysis. This view obviously is misleading. In contra...

## Further thoughts on post-publication peer review (PPPR)

February 26, 2014
Sanjay Srivastava blogged some interesting thoughts about the process of post-publication peer review (PPPR), reflecting on his own comment on a PLOS ONE publication. I agree that open peer commentaries after publication are one important part of th...

