In this post I will demonstrate in R how to draw correlated random variables from any distribution. The idea is simple. 1. Draw any number of variables from a joint normal distribution. 2. Apply the univariate normal CDF of variables to derive pro...
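The two visible steps can be sketched in a few lines. The sketch below is in Python (standard library only) rather than the post's R, and it is not the author's code. Because the excerpt is cut off in the middle of step 2, the final step here (pushing each uniform value through the inverse CDF of the target distribution) is an assumption based on how a Gaussian copula is usually completed. The helper names `correlated_pair`, `exponential_inv_cdf`, `uniform_inv_cdf`, and `pearson`, and the parameter values, are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for its CDF


def correlated_pair(rho, inv_cdf_x, inv_cdf_y, rng):
    """Draw one (x, y) pair whose dependence comes from a bivariate normal."""
    # Step 1: two standard normals with correlation rho (2x2 Cholesky factor).
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    # Step 2: the normal CDF maps each draw to a Uniform(0, 1) value.
    u1, u2 = STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)
    # Step 3 (assumed, not in the excerpt): apply the target inverse CDFs.
    return inv_cdf_x(u1), inv_cdf_y(u2)


def _clamp(u):
    # Keep u strictly inside (0, 1) so log(1 - u) is always defined.
    return min(max(u, 1e-12), 1.0 - 1e-12)


def exponential_inv_cdf(rate):
    """Inverse CDF of an exponential distribution with the given rate."""
    return lambda u: -math.log(1.0 - _clamp(u)) / rate


def uniform_inv_cdf(low, high):
    """Inverse CDF of a uniform distribution on [low, high]."""
    return lambda u: low + (high - low) * _clamp(u)


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    rng = random.Random(42)
    draws = [
        correlated_pair(0.7, exponential_inv_cdf(2.0), uniform_inv_cdf(0.0, 10.0), rng)
        for _ in range(10000)
    ]
    xs, ys = zip(*draws)
    print("sample correlation:", pearson(xs, ys))
```

The correlation of the transformed variables is generally not equal to the `rho` used for the normals, so in practice `rho` may need adjusting to hit a target value.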

Phil 6334* Day #4: Mayo slides follow the comments below. (Make-up for Feb 13 snow day.) Popper reading is from Conjectures and Refutations. As is typical in rereading any deep philosopher, I discover (or rediscover) different morsels of clues to understanding, whether fully intended by the philosopher or a byproduct of their other insights, and a more contemporary reading. […]

Some researchers (in both science and marketing) abuse a slavish view of p-values to try and falsely claim credibility. The incantation is: “we achieved p = x (with x ≤ 0.05) so you should trust our work.” This might be true if the published result had been performed as a single project (and not as […]

Recently, I was approached by Vicky, whom I'm working with at a client, to help with a particular problem. She wanted to calculate page view summaries for a random sample of visitors from a table containing about a billion page views. This i...

An anonymous reviewer wrote: I appreciate informal writing styles as a means of increasing accessibility. However, the informality here seems to decrease accessibility – partly because of the assumed knowledge of the reader for concepts and terms, and also for its wandering style. Many concepts are introduced without explanation and are not clearly and decisively […] The post A good comment on one of my papers appeared first on Statistical Modeling,…

This is an echo of yesterday’s post, Basketball Stats: Don’t model the probability of win, model the expected score differential. As with basketball, so with baseball: as the great Bill James wrote, if you want to predict a pitcher’s win-loss record, it’s better to use last year’s ERA than last year’s W-L. As with basketball […] The post Econometrics, political science, epidemiology, etc.: Don’t model the probability of a discrete outcome,…

Jeff, Brian, and I had to record nine separate introductory videos for our Data Science Specialization and, well, some of us were better at it than others. It takes a bit of practice to read effectively from a teleprompter, something …

I will be speaking at the Agilone Data Driven Marketing Summit in San Francisco on Thursday. I will be talking about hiring for numbersense. Drop by if you are in the area. Future events are listed in the right column of the blog. I feel bad piling on the "good guys" in the sports doping spectacle but sometimes, you need someone to point you to the mirror.…

Winner of the February 2014 Palindrome Contest: Samuel Dickson. Palindrome: Rot, Cadet A, I’ve droned! Elba, revile deviant, naïve, deliverable den or deviated actor. The requirement was: a palindrome with Elba plus deviate, with an optional second word: deviant. A palindrome that uses both deviate and deviant tops an acceptable palindrome that only uses deviate. Bio: Sam Dickson is […]

Update June 2013: A systematic analysis of the topic has been published: Schönbrodt, F. D., & Perugini, M. (2013). At what sample size do correlations stabilize? Journal of Research in Personality, 47, 609-612. doi:10.1016/j.jrp.2013.05.009 Check ...

“If you torture the data long enough, it will confess.” This aphorism, attributed to Ronald Coase, has sometimes been used in a disrespectful manner, as if it were wrong to do creative data analysis. This view obviously is misleading. In contra...

Sanjay Srivastava blogged some interesting thoughts about the process of post-publication peer review (PPPR), reflecting on his own comment on a PLOS ONE publication. I agree that open peer commentaries after publication are one important part of th...

Update Feb 17, 2014: WRS moved to GitHub – this installation procedure has been updated and is still valid. Some users had trouble installing the WRS package from R-Forge. Here’s a method that should work automatically and fail-safe: [cc lan...

Maybe you have encountered this situation: you run a large-scale study over the internet, and out of curiosity, you frequently check the correlation between two variables. My experience with this practice is usually frustrating, as in small sample sizes (a...
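The frustration is easy to reproduce in a simulation. The sketch below is not from the post: it uses Python's standard library only, and the true correlation of 0.3, the checkpoint sample sizes, and the helper names `pearson` and `running_correlations` are illustrative assumptions. It recomputes the sample correlation as observations accumulate, the way one might peek at an ongoing study.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def running_correlations(rho, checkpoints, rng):
    """Sample correlation of the first n observations, for each n in checkpoints.

    Observations are bivariate normal with true correlation rho.
    Every checkpoint should be at least 3 so the estimate is meaningful.
    """
    wanted = set(checkpoints)
    xs, ys, results = [], [], {}
    for n in range(1, max(wanted) + 1):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
        if n in wanted:
            results[n] = pearson(xs, ys)
    return results


if __name__ == "__main__":
    rng = random.Random(2014)
    estimates = running_correlations(0.3, [20, 50, 100, 250, 1000, 5000], rng)
    for n in sorted(estimates):
        print(f"n = {n:5d}  r = {estimates[n]: .3f}")
```

Rerunning with different seeds shows how far the early estimates can stray from the true value, while the estimates at the largest sample sizes stay close to it.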

[Update June 12: The data.table functions have been improved (thanks to a comment by Matthew Dowle); for a similar approach see also Tal Galili's post] The guys from RStudio now provide CRAN download logs (see also this blog post). Great work! I always as...

One critique frequently heard about Bayesian statistics is the subjectivity of the assumed prior distribution. If one is cherry-picking a prior, of course the posterior can be tweaked, especially when only a few data points are at hand. For example, see ...

Today a new version (0.23.1) of the WRS package (Wilcox’ Robust Statistics) has been released. This package is the companion to his rather exhaustive book on robust statistics, “Introduction to Robust Estimation and Hypothesis Testing”...

Probably the most frequent criticism of Bayesian statistics sounds something like “It’s all subjective – with the ‘right’ prior, you can get any result you want.” In order to approach this criticism it has been sugg...

My last lesson introduced the matched pairs experimental design, which is a special type of the randomized block design. Let’s now talk about how to analyze the data from such a design. Since the experimental units are organized in pairs, the units between pairs (blocks) are not independently assigned. (The units within each pair are […]

My friends Randal Douc and Éric Moulines just published this new time series book with David Stoffer. (David also wrote Time Series Analysis and its Applications with Robert Shumway a year ago.) The book reflects well on the research of Randal and Éric over the past decade, namely convergence results on Markov chains for validating […]

Interview with Nick Chamandy, statistician at Google; You and Your Research + video; Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained; A Survival Guide to Starting and Finishing a PhD; Six Rules For Wearing Suits For Beginners; Why I Created C++; More advice to scientists on blogging; Software engineering practices for graduate students; Statistics Matter […]