Some statistical dirty laundry

June 1, 2013

I finally had a chance to fully read the 2012 Tilburg Report* on “Flawed Science” last night. Here are some stray thoughts… 1. Slipping into pseudoscience. The authors of the Report say they never anticipated giving a laundry list of “undesirable conduct” by which researchers can flout pretty obvious requirements for the responsible practice of science. It […]

Read more »

Loading Historical Stock Data

June 1, 2013

Historical stock data is critical for testing your investment strategies. I illustrated all my back-test examples with the getSymbols function from the quantmod package. For example, the following is a back-test comparison for a few portfolio allocation methods: The getSymbols function, from the quantmod package, downloads historical stock prices from Yahoo Finance. I often get questions about alternative ways […]

Read more »

Benford’s law and addresses

June 1, 2013

One example we give to illustrate Benford’s law is the first digits of addresses. Javier Marquez Pena had a survey and, just for laffs, he looked at the distribution of first digits: Cool: it really works! P.S. The y-axis shouldn’t go below zero, and I’d much prefer an L-type graphics box (par(bty="l")) rather than the square, but [...] The post Benford’s law and addresses appeared first on Statistical Modeling, Causal Inference, and Social…
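As a rough illustration of the check described in this teaser, the sketch below compares observed first-digit shares with the proportions Benford's law predicts, P(d) = log10(1 + 1/d). The survey data from the post is not available here, so the `addresses` list is invented purely for illustration, and the helper names `benford_expected` and `first_digit_shares` are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit proportions under Benford's law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_shares(values):
    """Observed share of each leading digit (1-9) among positive integers."""
    digits = [int(str(v)[0]) for v in values if v > 0]
    counts = Counter(digits)
    n = len(digits)
    return {d: counts.get(d, 0) / n for d in range(1, 10)}


# Hypothetical street numbers, invented for illustration only.
addresses = [12, 1450, 233, 18, 101, 7, 3021, 64, 119, 2, 905, 1700, 38, 15, 410]

expected = benford_expected()
observed = first_digit_shares(addresses)

# Side-by-side comparison, one row per leading digit.
for d in range(1, 10):
    print(f"{d}: expected {expected[d]:.3f}, observed {observed[d]:.3f}")
```

With only 15 made-up values the observed shares are noisy; a real comparison would need a sample closer to the size of a survey.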

Read more »

Tweetanalytics – Interactively analyzing tweets from accounts of 5 universities

June 1, 2013

This is an attempt at learning text mining with Twitter data and interactively displaying a few results. Interactivity is implemented using RStudio's Shiny server. Their documentation of demo scripts came in very handy. As a non-user of Twitter, I...

Read more »

Flotsam 12: early June linkathon

June 1, 2013

A list of interesting R/Stats quickies to keep the mind distracted: A long draft of Advanced Data Analysis from an Elementary Point of View by Cosma Shalizi, in which he uses R to drive home the message. Not your average elementary point of view. Good notes by Frank Davenport on starting to use R with data from […]

Read more »

Regression regularization example

May 31, 2013

Recently I needed a simple example showing when applying regularization in regression is worthwhile. Here is the code I came up with (along with a basic use of parallel code execution). Assume you have 60 observations and 50 expla...

Read more »

How to fix the tabloids? Toward replicable social science research

May 31, 2013

This seems to be the topic of the week. Yesterday I posted on the sister blog some further thoughts on those “Psychological Science” papers on menstrual cycles, biceps size, and political attitudes, tied to a horrible press release from the journal Psychological Science hyping the biceps and politics study. Then I was pointed to these [...] The post How to fix the tabloids? Toward replicable social science research appeared first on…

Read more »

accurate ABC: comments by Oliver Ratman [guest post]

May 31, 2013

Here are comments by Olli following my post: I think we found a general means to obtain accurate ABC in the sense of matching the posterior mean or MAP exactly, and then minimising the KL distance between the true posterior and its ABC approximation subject to this condition. The construction works on an auxiliary probability […]

Read more »

Belly Button Biodiversity: The End Game

May 30, 2013

In the previous installment of this saga, I admitted that my predictions had completely failed, and I outlined the debugging process I began. Then the semester happened, so I didn't get to work on it again until last week. It turns out that there ...

Read more »

PLATO, an Alternative to PLINK

May 30, 2013

Almost since the beginning of genome-wide association studies, the PLINK software package (developed by Shaun Purcell’s group at the Broad Institute and MGH) has been the standard for manipulating the large-scale data produced by these studies. O...

Read more »

There are no outliers

May 30, 2013

Matt Briggs’s comment on outliers in his post Tyranny of the mean: Coontz used the word “outliers”. There are no such things. There can be mismeasured data, i.e. incorrect data, say when you tried to measure air temperature but your…

Read more »

Infill asymptotics and sprawl asymptotics

May 30, 2013

Anirban Bhattacharya, Debdeep Pati, Natesh Pillai, and David Dunson write: Penalized regression methods, such as L1 regularization, are routinely used in high-dimensional applications, and there is a rich literature on optimality properties under sparsity assumptions. In the Bayesian paradigm, sparsity is routinely induced through two-component mixture priors having a probability mass at zero, but such [...] The post Infill asymptotics and sprawl asymptotics appeared first on Statistical Modeling, Causal Inference, and…

Read more »

Chance to ask me a question this Friday

May 30, 2013

I will be at Book Expo this Friday signing books at the McGraw-Hill booth. If you're in NYC, drop by and say hi between 11 and 12. Yes, it's a new book! The title is Numbersense: How to Use Big Data to Your Advantage (link). If you read my blogs, you already know where I'm going with this. How can we be smart consumers of data analyses in a world…

Read more »

Using simulation to estimate the power of a statistical test

May 30, 2013

The power of a statistical test measures the test's ability to detect a specific alternative hypothesis. For example, educational researchers might want to compare the mean scores of boys and girls on a standardized test. They plan to use the well-known two-sample t test. The null hypothesis is that the [...]
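The idea in this teaser can be sketched in a few lines: draw many pairs of samples under a chosen alternative, run the test on each pair, and report the fraction of rejections as the estimated power. The version below is a minimal sketch and not the linked post's code. The group size (50), standard deviation (1), and mean differences (0 and 0.5) are all assumptions chosen for illustration, and it uses a normal critical value in place of the exact t quantile, which is only an approximation at this sample size.

```python
import random
import statistics


def mean_and_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


def two_sample_stat(x, y):
    """Difference in means divided by its estimated standard error."""
    mx, vx = mean_and_var(x)
    my, vy = mean_and_var(y)
    return (mx - my) / (vx / len(x) + vy / len(y)) ** 0.5


def simulate_power(delta, n=50, sd=1.0, alpha=0.05, n_sims=2000, seed=1):
    """Estimate power as the share of simulated data sets in which H0 is rejected.

    delta is the assumed true difference in group means (hypothetical value).
    The critical value is a normal approximation to the t quantile.
    """
    rng = random.Random(seed)
    crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        x = [rng.gauss(0.0, sd) for _ in range(n)]
        y = [rng.gauss(delta, sd) for _ in range(n)]
        if abs(two_sample_stat(x, y)) > crit:
            rejections += 1
    return rejections / n_sims


print("rejection rate under H0 (delta = 0):", simulate_power(0.0))
print("estimated power at delta = 0.5:", simulate_power(0.5))
```

Running with `delta = 0` is a useful sanity check: the rejection rate should sit near the nominal `alpha`. Repeating the call over a grid of `delta` or `n` values gives a simulated power curve.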

Read more »

K. Staley: review of Error & Inference

May 30, 2013

K. W. Staley, Associate Professor, Department of Philosophy, Saint Louis University. Book review: “(Almost) All about error,” Metascience (2012) 21:709–713, DOI 10.1007/s11016-011-9618-1. Deborah G. Mayo and Aris Spanos (eds): Error and inference: Recent exchanges on experimental reasoning, reliability, objectivity, and rationality. New York: Cambridge University Press, 2010, xvii+419 pp. The ERROR’06 (experimental reasoning, reliability, objectivity, […]

Read more »

What statistics should do about big data: problem forward not solution backward

May 29, 2013

There has been a lot of discussion among statisticians about big data and what statistics should do to get involved. Recently Steve M. and Larry W. took up the same issue on their blog. I have been thinking about this …

Read more »

Another one of those “Psychological Science” papers (this time on biceps size and political attitudes among college students)

May 29, 2013

Paul Alper writes: Unless I missed it, you haven’t commented on the recent article of Michael Bang Petersen [with Daniel Sznycer, Aaron Sell, Leda Cosmides, and John Tooby]. It seems to have been reviewed extensively in the lay press. A typical example is here. This review begins with “If you are physically strong, social science [...] The post Another one of those “Psychological Science” papers (this time on biceps size and…

Read more »

SAS Dominates Analytics Job Market; R up 42%

May 29, 2013

I’m continuing to gather and analyze data to update The Popularity of Data Analysis Software. In this installment I cover the latest employment figures. Employment is important to us all, so what software skills are employers seeking? A thorough answer …

Read more »

The 3D Trajectories of the Tennis Ball during the Final ATP Matches

May 29, 2013

Corona Perspectives [coronaperspectives.com] developed by advertising agency JWT Spain and web development studio Espada y Santa Cruz provides an interactive and 3D perspective of all the tennis ball trajectories during 3 past ATP tennis matches. The...

Read more »

Will Mu Go Out With Median

May 29, 2013

True story (no really, this did actually happen). While I was in grad school, one of the other teaching assistants was approached by a student who asked, “will mu go out with median?” The teaching assistant thought the play on words was pretty funny, laughed, and then cluelessly walked away. All of us other grad students […]

Read more »

Why doesn’t R have a MaxDiff package?

May 28, 2013

Almost once every year someone asks if R has a package for running the MaxDiff procedure sold by Sawtooth.  One such inquiry recently received a reply with a link showing in some detail the R code needed to generate a balanced incomplete...

Read more »

Escalatingly uncomfortable

May 28, 2013

Aggressive, fizzing nonconformity. The post Escalatingly uncomfortable appeared first on Statistical Modeling, Causal Inference, and Social Science.

Read more »

