Don’t say “improper prior.” Say “non-generative model.”

June 18, 2017

[cat picture] In Bayesian Data Analysis, we write, “In general, we call a prior density p(θ) proper if it does not depend on data and integrates to 1.” This was a step forward from the usual understanding, which is that a prior density is improper if it has an infinite integral. But I’m not so thrilled with […] The post Don’t say “improper prior.” Say “non-generative model.” appeared first on Statistical Modeling,…


Where’d the $2500 come from?

June 17, 2017

Brad Buchsbaum writes: Sometimes I read the New York Times “Well” articles on science and health. It’s a mixed bag, sometimes it’s quite good and sometimes not. I came across this yesterday: What’s the Value of Exercise? $2,500 For people still struggling to make time for exercise, a new study offers a strong incentive: You’ll […] The post Where’d the $2500 come from? appeared first on Statistical Modeling, Causal Inference,…


Non-Standard Evaluation and Function Composition in R

June 16, 2017

In this article we will discuss composing standard-evaluation interfaces (SE) and composing non-standard-evaluation interfaces (NSE) in R. In R the package tidyeval/rlang is a tool for building domain specific languages intended to allow easier composition of NSE interfaces. To use it you must know some of its structure and notation. Here are some details paraphrased … Continue reading Non-Standard Evaluation and Function Composition in R


Some like it packed, some like it piled, and some like it wrapped

June 16, 2017

Kaiser Fung, founder of Principal Analytics Prep and Junk Charts, discusses chart innovations by Stephen Few and the Human-Computer Interaction Lab at the University of Maryland.


Stan Weekly Roundup, 16 June 2017

June 16, 2017

We’re going to be providing weekly updates for what’s going on behind the scenes with Stan. Of course, it’s not really behind the scenes, because the relevant discussions are at the following places. stan-dev GitHub organization: this is the home of all of our source repos; design discussions are on the Stan Wiki. Stan Discourse Groups: this is […] The post Stan Weekly Roundup, 16 June 2017 appeared first on Statistical Modeling, Causal…


SPEED: Parallelizing Stan using the Message Passing Interface (MPI)

June 16, 2017

Sebastian Weber writes: Bayesian inference has to overcome tough computational challenges and thanks to Stan we now have a scalable MCMC sampler available. For a Stan model running NUTS, the computational cost is dominated by gradient calculations of the model log-density as a function of the parameters. While NUTS is scalable to huge parameter spaces, […] The post SPEED: Parallelizing Stan using the Message Passing Interface (MPI) appeared first on…


An easy way to accidentally inflate reported R-squared in linear regression models

June 15, 2017

Here is an absolutely horrible way to confuse yourself and get an inflated reported R-squared on a simple linear regression model in R. We have written about this before, but we found a new twist on the problem (interactions with categorical variable encoding) which we would like to call out here. First let’s set up … Continue reading An easy way to accidentally inflate reported R-squared in linear regression models


Download a Chapter of Data Mining Techniques (3rd Edition) for Free

June 15, 2017

As seen on KDNuggets, you may now download Chapter 19, Derived Variables: Making the Data Mean More, for free, thanks to our friends at JMP. This chapter is one of my personal favorites because it is about the part of data mining I find most e...


Pizzagate gets even more ridiculous: “Either they did not read their own previous pizza buffet study, or they do not consider it to be part of the literature . . . in the later study they again found the exact opposite, but did not comment on the discrepancy.”

June 15, 2017

Background: Several months ago, Jordan Anaya, Tim van der Zee, and Nick Brown reported that they’d uncovered 150 errors in 4 papers published by Brian Wansink, a Cornell University business school professor who describes himself as a “world-renowned eating behavior expert for over 25 years.” 150 errors is pretty bad! I make mistakes myself […] The post Pizzagate gets even more ridiculous: “Either they did not read their own…


Ride a Crooked Mile

June 14, 2017

Joachim Krueger writes: As many of us rely (in part) on p values when trying to make sense of the data, I am sending a link to a paper Patrick Heck and I published in Frontiers in Psychology. The goal of this work is not to fan the flames of the already overheated debate, but […] The post Ride a Crooked Mile appeared first on Statistical Modeling, Causal Inference, and…


Unintentional deception of area expansion #bigdata #piechart

June 14, 2017

Kaiser Fung, founder of Principal Analytics Prep, deconstructs a visualization of Big Data adoption in industry


Two ways to compute maximum likelihood estimates in SAS

June 14, 2017

In a previous article, I showed two ways to define a log-likelihood function in SAS. This article shows two ways to compute maximum likelihood estimates (MLEs) in SAS: the nonlinear optimization subroutines in SAS/IML and the NLMIXED procedure in SAS/STAT. To illustrate these methods, I will use the same data [...] The post Two ways to compute maximum likelihood estimates in SAS appeared first on The DO Loop.
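For readers without SAS, the general idea of computing an MLE by numerical optimization can be sketched in Python. This is only an illustration under stated assumptions: the exponential model, the `data` values, and the helper names `exp_log_likelihood` and `golden_section_maximize` are hypothetical and are not taken from the article, which uses SAS/IML and PROC NLMIXED on its own data.

```python
import math


def exp_log_likelihood(rate, data):
    """Log-likelihood of i.i.d. exponential data: n*log(rate) - rate*sum(x)."""
    return len(data) * math.log(rate) - rate * sum(data)


def golden_section_maximize(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)  # left interior point
        d = a + inv_phi * (b - a)  # right interior point
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2.0


# Hypothetical observations, chosen only for illustration.
data = [0.8, 1.7, 0.3, 2.4, 1.1, 0.6]

# Numerical MLE: search for the rate that maximizes the log-likelihood.
rate_numeric = golden_section_maximize(
    lambda r: exp_log_likelihood(r, data), 1e-6, 50.0
)

# Closed-form MLE for the exponential rate: n / sum(x).
rate_closed_form = len(data) / sum(data)

print(rate_numeric, rate_closed_form)
```

The exponential model is used here because its MLE has a closed form, so the numerical answer can be checked against it. A real analysis would normally rely on a tested optimizer rather than a hand-written search.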


Use a Join Controller to Document Your Work

June 13, 2017

This note describes a useful replyr tool we call a "join controller" (and is part of our "R and Big Data" series, please see here for the introduction, and here for one of our big data courses). When working on real world predictive modeling tasks in production, the ability to join data and document how you … Continue reading Use a Join Controller to Document Your Work


Kaiser Fung’s data analysis bootcamp

June 13, 2017

Kaiser Fung announces a new educational venture he’s created, a bootcamp (12-week full-time in-person program with a curriculum) of short courses with a goal of getting people their first job in an analytics role for a business unit (not engineering or software development, so he is not competing directly with MS Data Science or data […] The post Kaiser Fung’s data analysis bootcamp appeared first on Statistical Modeling, Causal Inference,…


STOP PRESS Introductory Bayesian data analysis workshops for social scientists (June 2017 Nottingham UK)

June 13, 2017

The third and (possibly) final round of our introductory workshops was overbooked in April, but we have managed to arrange some additional dates in June. There are still places left on these. More details at: http://www.p...


Statistical Challenges of Survey Sampling and Big Data (my remote talk in Bologna this Thurs, 15 June, 4:15pm)

June 13, 2017

Statistical Challenges of Survey Sampling and Big Data Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University, New York Big Data need Big Model. Big Data are typically convenience samples, not random samples; observational comparisons, not controlled experiments; available data, not measurements designed for a particular study. As a result, it is […] The post Statistical Challenges of Survey Sampling and Big Data (my remote talk in…


CNTK integrated into Keras

June 13, 2017

Keras is a very convenient tool for building a deep learning model from scratch. It is so easy to use that it has almost become the de facto deep learning modeling framework in Kaggle competitions. Keras used to support only TensorFlow and Theano; now, CNTK,...


The future of education is plain text

June 13, 2017

I was recently at a National Academy meeting on Envisioning the Data Science Curriculum. It was a fun meeting and one of the questions that came up was what kind of infrastructure do we need to enable shared curricula, compatibility across schools, and...


thinning a Markov chain, statistically

June 12, 2017

Art Owen has arXived a new version of his thinning MCMC paper, where he studies how thinning or subsampling can improve computing time in MCMC chains. I remember quite well the message sent by Mark Berliner and Steve MacEachern in an early 1990s paper that subsampling always increases the variance of the resulting estimators. […]
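The variance claim can be illustrated with a small calculation. The sketch below assumes a stationary AR(1) chain with unit marginal variance as a stand-in for MCMC output. That model, the function name `variance_of_mean_ar1`, and the chosen values of `n`, `rho`, and `step` are illustrative assumptions and do not come from either paper. It also ignores the per-draw computing cost that Owen's analysis weighs against this variance increase.

```python
def variance_of_mean_ar1(n, rho, step=1):
    """Variance of the sample mean of every `step`-th draw from a stationary
    AR(1) chain of length n with unit marginal variance and lag-1
    autocorrelation rho. Thinning keeps m = n // step draws whose lag-k
    autocorrelation is rho ** (step * k)."""
    m = n // step
    r = rho ** step
    # Standard formula: Var(mean) = (1/m) * [1 + 2 * sum_{k=1}^{m-1} (1 - k/m) * r^k]
    weighted_autocorr = sum((1.0 - k / m) * r ** k for k in range(1, m))
    return (1.0 + 2.0 * weighted_autocorr) / m


# Illustrative settings (assumptions): a chain of 1000 draws, rho = 0.9.
full = variance_of_mean_ar1(1000, 0.9)              # use every draw
thinned = variance_of_mean_ar1(1000, 0.9, step=10)  # keep every 10th draw

print(full, thinned)
```

Under these assumptions the thinned estimator has the larger variance, because it discards draws from a chain that has already been computed. Thinning can only pay off when using each retained draw is costly enough that the savings fund a longer chain.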


Criminology corner: Type M error might explain Weisburd’s Paradox

June 12, 2017

[silly cartoon found by googling *cat burglar*] Torbjørn Skardhamar, Mikko Aaltonen, and I wrote this article to appear in the Journal of Quantitative Criminology: Simple calculations seem to show that larger studies should have higher statistical power, but empirical meta-analyses of published work in criminology have found zero or weak correlations between sample size and […] The post Criminology corner: Type M error might explain Weisburd’s Paradox appeared first on…


Two simple ways to construct a log-likelihood function in SAS

June 12, 2017

Maximum likelihood estimation (MLE) is a powerful statistical technique that uses optimization techniques to fit parametric models. The technique finds the parameters that are "most likely" to have produced the observed data. SAS provides many tools for nonlinear optimization, so often the hardest part of maximum likelihood is writing down [...] The post Two simple ways to construct a log-likelihood function in SAS appeared first on The DO Loop.
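As a rough, non-SAS illustration of "writing down" a log-likelihood, here is a minimal Python sketch for a normal model. The model choice, the `data` values, and the function name `normal_log_likelihood` are assumptions made for this example and are not drawn from the article.

```python
import math


def normal_log_likelihood(mu, sigma, data):
    """Log-likelihood of i.i.d. Normal(mu, sigma) observations."""
    if sigma <= 0:
        return -math.inf  # sigma must be positive
    n = len(data)
    sum_sq = sum((x - mu) ** 2 for x in data)
    return (
        -0.5 * n * math.log(2.0 * math.pi)
        - n * math.log(sigma)
        - sum_sq / (2.0 * sigma ** 2)
    )


# Hypothetical observations, chosen only for illustration.
data = [4.9, 5.6, 4.2, 6.1, 5.3, 4.7]

# Closed-form MLEs for the normal model: the sample mean and the
# (biased, divide-by-n) standard deviation.
mu_hat = sum(data) / len(data)
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / len(data))
best = normal_log_likelihood(mu_hat, sigma_hat, data)

# Evaluate the log-likelihood at nearby parameter values; each should be
# lower than the value at the MLE.
nearby = [
    normal_log_likelihood(mu_hat + dm, sigma_hat + ds, data)
    for dm in (-0.1, 0.0, 0.1)
    for ds in (-0.1, 0.0, 0.1)
    if (dm, ds) != (0.0, 0.0)
]

print(best, max(nearby))
```

Once the function is written, any general-purpose optimizer can be pointed at it. The closed-form estimates are used here only as a sanity check on the formula.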


Maths trauma can be healed

June 12, 2017

Maths trauma and earthquakes Trauma is a deeply distressing or disturbing experience. Many people in my home town of Christchurch still suffer from post traumatic stress disorder (PTSD) as a result of our earthquakes five or so years ago. I … Continue reading →


Julia: Installation and Editors

June 12, 2017

If you have been following this blog, you may have noticed that I haven't posted any updates for more than a year now. The reason is that I've been busy with my research and my work, and I promised not to share anything here until I finished my degree (Master ...


