## God/leaf/tree

February 28, 2014

Govind Manian writes: I wanted to pass along a fragment from Lichtenberg’s Waste Books, which I am finding to be great stone soup, that reminded me of God is in Every Leaf: To the wise man nothing is great and nothing small… I believe he could write treatises on keyholes that sounded as weighty […] The post God/leaf/tree appeared first on Statistical Modeling, Causal Inference, and Social Science.

## r4stats.com 2013 in review

February 28, 2014

The WordPress.com stats helper monkeys prepared a 2013 annual report for this blog. Here’s an excerpt: The Louvre Museum has 8.5 million visitors per year. This blog was viewed about 150,000 times in 2013. If it were an exhibit at … Continue reading →

## Useful Functions in R for Manipulating Text Data

Introduction: In my current job, I study HIV at the genetic and biochemical levels. Thus, I often work with data involving the sequences of nucleotides or amino acids of various patient samples of HIV, and this type of work involves a lot of manipulating text. (Strictly speaking, I analyze sequences of nucleotides from DNA that are reverse-transcribed from […]

## Foundations of Statistical Algorithms [book review]

February 27, 2014

There is computational statistics and there is statistical computing. And then there is statistical algorithmic. Not the same thing, by far. This 2014 book by Weihs, Mersmann and Ligges, from TU Dortmund, the latter being also a member of the R Core team, stands at one end of this wide spectrum of techniques required by […]

## Old School Reproducibility

February 27, 2014

The Replication crisis in science has brought out the Philosopher of Science in everyone. Great pronouncements are being made as to the way science should be done. So it’s worth recalling how Science achieved reproducible results in the past. Take fo...

## Visualising Mill Road: Informing Communities by Infographics in the Street

February 27, 2014

Visualising Mill Road [visualisingmillroad.com] by Lisa Koeman, Vaiva Kalnikaite and Yvonne Rogers from ICRI Cities was a community project that combined citizen participation and public data visualization to inform a community on what other members o...

## Example 2014.3: Allow different variances by group

February 27, 2014

One common violation of the assumptions needed for linear regression is heteroscedasticity by group membership. Both SAS and R can easily accommodate this setting. Our data today comes from a real example of vitamin D supplementation of milk. Four sup...

## “What Can we Learn from the Many Labs Replication Project?”

February 27, 2014

Aki points us to this discussion from Rolf Zwaan: The first massive replication project in psychology has just reached completion (several others are to follow). . . . What can we learn from the ManyLabs project? The results here show the effect sizes for the replication efforts (in green and grey) as well as the […] The post “What Can we Learn from the Many Labs Replication Project?” appeared first on…

## On Assessing Convergence to a Steady State

February 27, 2014

I recently listened to a stimulating statistics talk, "Discerning a Steady State Sequentially," by Moshe Pollak (with Tom Hope), presently visiting Penn. Of course it's impossible to know with certainty whether we're in steady state based on a finite s...

## Easily generate correlated variables from any distribution

February 27, 2014

In this post I will demonstrate in R how to draw correlated random variables from any distribution. The idea is simple. 1. Draw any number of variables from a joint normal distribution. 2. Apply the univariate normal CDF of variables to derive pro...
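
The two steps in the excerpt above match the start of what is usually called a Gaussian copula. A minimal sketch follows, written in Python with only the standard library rather than in the post's R. The excerpt is cut off after step 2, so the final step here (pushing the probabilities through each target distribution's inverse CDF) is an assumption based on the standard technique, not the post's confirmed method. The names `correlated_draws`, `exponential_inv_cdf`, `uniform_inv_cdf`, and `pearson` are hypothetical helpers introduced for illustration.

```python
import math
import random
from statistics import NormalDist


def exponential_inv_cdf(u, rate=1.0):
    """Inverse CDF of an exponential distribution (illustrative target)."""
    u = min(u, 1.0 - 1e-12)  # guard against log(0) if u rounds to 1.0
    return -math.log(1.0 - u) / rate


def uniform_inv_cdf(u, low=0.0, high=1.0):
    """Inverse CDF of a uniform distribution on [low, high]."""
    return low + (high - low) * u


def correlated_draws(n, rho, inv_cdf_x, inv_cdf_y, seed=0):
    """Draw n pairs whose dependence comes from a bivariate normal.

    rho is the correlation of the underlying normals; the correlation of
    the transformed variables will generally differ from rho.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    xs, ys = [], []
    for _ in range(n):
        # Step 1: draw a pair from a joint (bivariate) standard normal.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Step 2: apply the univariate normal CDF to get probabilities.
        u1 = std_normal.cdf(z1)
        u2 = std_normal.cdf(z2)
        # Assumed step 3: map probabilities through the target inverse CDFs.
        xs.append(inv_cdf_x(u1))
        ys.append(inv_cdf_y(u2))
    return xs, ys


def pearson(xs, ys):
    """Sample Pearson correlation, written out to avoid version-specific APIs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    xs, ys = correlated_draws(5000, 0.8, exponential_inv_cdf, uniform_inv_cdf, seed=1)
    print("sample correlation:", round(pearson(xs, ys), 3))
```

Any distribution with a computable inverse CDF could be swapped in for the two targets; the exponential and uniform choices here are only examples.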

## More time series data online

February 27, 2014

Earlier this week I had coffee with Ben Fulcher who told me about his online collection comprising about 30,000 time series, mostly medical series such as ECG measurements, meteorological series, birdsong, etc. There are some finance series, but not ma...

## Phil6334: Feb 24, 2014: Induction, Popper and pseudoscience (Day #4)

February 27, 2014

Phil 6334* Day #4: Mayo slides follow the comments below. (Make-up for Feb 13 snow day.) Popper reading is from Conjectures and Refutations. As is typical in rereading any deep philosopher, I discover (or rediscover) different morsels of clues to understanding, whether fully intended by the philosopher or a byproduct of their other insights, and a more contemporary reading. […]

## Drowning in insignificance

February 26, 2014

Some researchers (in both science and marketing) abuse a slavish view of p-values to try and falsely claim credibility. The incantation is: “we achieved p = x (with x ≤ 0.05) so you should trust our work.” This might be true if the published result had been performed as a single project (and not as […]

## Taking a Random Sample on Amazon Redshift

February 26, 2014

Recently, I was approached by Vicky, whom I'm working with at a client, to help with a particular problem. She wanted to calculate page view summaries for a random sample of visitors from a table containing about a billion page views. This i...

## A good comment on one of my papers

February 26, 2014

An anonymous reviewer wrote: I appreciate informal writing styles as a means of increasing accessibility. However, the informality here seems to decrease accessibility, partly because of the assumed knowledge of the reader for concepts and terms, and also for its wandering style. Many concepts are introduced without explanation and are not clearly and decisively […] The post A good comment on one of my papers appeared first on Statistical Modeling,…

## Econometrics, political science, epidemiology, etc.: Don’t model the probability of a discrete outcome, model the underlying continuous variable

February 26, 2014

This is an echo of yesterday’s post, Basketball Stats: Don’t model the probability of win, model the expected score differential. As with basketball, so with baseball: as the great Bill James wrote, if you want to predict a pitcher’s win-loss record, it’s better to use last year’s ERA than last year’s W-L. As with basketball […] The post Econometrics, political science, epidemiology, etc.: Don’t model the probability of a discrete outcome,…

## Data Science is Hard, But So is Talking

February 26, 2014

Jeff, Brian, and I had to record nine separate introductory videos for our Data Science Specialization and, well, some of us were better at it than others. It takes a bit of practice to read effectively from a teleprompter, something … Continue reading →

## Good guys in sports need a dose of reality

February 26, 2014

I will be speaking at the Agilone Data Driven Marketing Summit in San Francisco on Thursday. I will be talking about hiring for numbersense. Drop by if you are in the area.

I feel bad piling on the "good guys" in the sports doping spectacle but sometimes, you need someone to point you to the mirror.…

## How to automatically select a smooth curve for a scatter plot in SAS

February 26, 2014

My last blog post described three ways to add a smoothing spline to a scatter plot in SAS. I ended the post with a cautionary note: From a statistical point of view, the smoothing spline is less than ideal because the smoothing parameter must be chosen manually by the user. [...]

## Winner of the February 2014 palindrome contest (rejected post)

February 26, 2014

Winner of February 2014 Palindrome Contest: Samuel Dickson. Palindrome: Rot, Cadet A, I’ve droned! Elba, revile deviant, naïve, deliverable den or deviated actor. The requirement was: A palindrome with Elba plus deviate with an optional second word: deviant. A palindrome that uses both deviate and deviant tops an acceptable palindrome that only uses deviate. Bio: Sam Dickson is […]

## Improved evolution of correlations

February 26, 2014

Update June 2013: A systematic analysis of the topic has been published: Schönbrodt, F. D., & Perugini, M. (2013). At what sample size do correlations stabilize? Journal of Research in Personality, 47, 609-612. doi:10.1016/j.jrp.2013.05.009 Check ...

## The first CREDAM Award for creative data management goes to … the German government!

February 26, 2014

“If you torture the data long enough, it will confess.” This aphorism, attributed to Ronald Coase, has sometimes been used in a disrespectful manner, as if it was wrong to do creative data analysis. This view obviously is misleading. In contra...

## Further thoughts on post-publication peer review (PPPR)

February 26, 2014

Sanjay Srivastava blogged some interesting thoughts about the process of post-publication peer review (PPPR), reflecting on his own comment on a PLOS ONE publication. I agree that open peer commentaries after publication are one important part of th...
