Greeks have been quite volatile in their opinion on whether to accept a proposal by the country’s creditors for more austerity to keep aid flowing. The polls conducted this week look crazy, though that “belly” was likely pr...

What’s the problem? Which data are needed to solve it? Who benefits from it? These few questions are a valuable key to implementing an open data culture: open data not as ‘l’art pour l’art’ but as a pragmatic approach, demonstrating that the ‘proof of the pudding is in the eating’. It seems to work … Continue reading The Cui-bono Approach to Open Data

NOTE: This post was written as a chapter for the not-yet-released Handbook on Statistics Education. Data are eating the world, but our collective ability to analyze data is going on a starvation diet. Everywhere you turn, data are being generated somehow. By the time you read this piece, you’ll probably have collected some data. (For

Deborah Mayo points us to a post by Stephen Senn discussing various aspects of induction and statistics, including the famous example of estimating the probability the sun will rise tomorrow. Senn correctly slams a journalistic account of the math problem: The canonical example is to imagine that a precocious newborn observes his first sunset, and […] The post “Why should anyone believe that? Why does it make sense to model…

Professor Larry Laudan, Lecturer in Law and Philosophy, University of Texas at Austin. “When the ‘Not-Guilty’ Falsely Pass for Innocent” by Larry Laudan. While it is a belief deeply ingrained in the legal community (and among the public) that false negatives are much more common than false positives (a 10:1 ratio being the preferred guess), […]

In my previous post, An Attempt to Understand Boosting Algorithm(s), I was puzzled by the boosting convergence when I was using some spline functions (more specifically, piecewise linear and continuous regression functions). I was using > library(splines) > fit=lm(y~bs(x,degree=1,df=3),data=df) The problem with that spline function is that the knots seem to be fixed. The iterative boosting algorithm is: start with some regression model, compute the residuals, including some shrinkage parameter, then…
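The loop described in this excerpt can be sketched roughly as follows. This is a minimal Python illustration, not the author's R code: the function names, the simulated data, the shrinkage value of 0.1, and the choice of two knots fixed at the 1/3 and 2/3 quantiles of x are all assumptions made for the example. It fits a piecewise linear, continuous learner with fixed knots to the current residuals and adds a shrunken copy of that fit at each step.

```python
import numpy as np

def linear_spline_basis(x, knots):
    # Columns: intercept, x, and one hinge max(x - k, 0) per fixed knot.
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

def boost(x, y, knots, shrinkage=0.1, n_iter=200):
    # Hypothetical helper: repeatedly fit the residuals, then take a small step.
    B = linear_spline_basis(x, knots)
    fit = np.zeros_like(y)
    for _ in range(n_iter):
        resid = y - fit
        coef, *_ = np.linalg.lstsq(B, resid, rcond=None)
        fit = fit + shrinkage * (B @ coef)
    return fit

# Simulated data (an assumption, not the data from the post).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(0, 0.3, 200)

knots = np.quantile(x, [1 / 3, 2 / 3])  # knots are fixed once, never updated
boosted = boost(x, y, knots)

# Ordinary least squares on the same fixed basis, for comparison.
B = linear_spline_basis(x, knots)
ols = B @ np.linalg.lstsq(B, y, rcond=None)[0]
max_gap = float(np.max(np.abs(boosted - ols)))
```

With the knots held fixed, every step projects the residuals onto the same basis, so in this sketch the boosted fit simply approaches the one-shot least squares fit (max_gap shrinks toward zero as n_iter grows). That may be one way to see why fixed knots make the boosting iterations uninformative here.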

Not sure how I missed this, but the Linux Foundation just announced the R Consortium for supporting the "world’s most popular language for analytics and data science and support the rapid growth of the R user community". From the Linux Foundation: The R language is used by statisticians, analysts and data scientists to unlock value

Brian MacGillivray and Nick Pidgeon write: Daniel Gilbert maintains that people generally make bad decisions on risk issues, and suggests that communication strategies and education programmes would help (Nature 474, 275–277; 2011). This version of the deficit model pervades policy-making and branches of the social sciences. In this model, conflicts between expert and public perceptions […] The post Humility needed in decision-making appeared first on Statistical Modeling, Causal Inference, and…

25 helpful pieces of advice. Comportment: Walk like you're walking away from an explosion in a Hollywood movie. Tuck your chin in, tilt your head down and look at people from out of the top of your eyes. Squint. Lecturing: Show up late, then run ove...

When is the death penalty okay?; A court with no Protestants; How much does advertising matter in presidential elections?; Bartenders are Democrats, beer wholesalers are Republicans; The ambiguity of racial categories; No, public opinion is not driven by ‘unreasoning bias and emotion’; Political science: Who is it for?; Modern campaigning has big effects on voter […] The post Recently in the sister blog appeared first on Statistical Modeling, Causal Inference,…

A natural technique to select variables in the context of generalized linear models is to use a stepwise procedure. It is natural, but controversial, as discussed by Frank Harrell in a great post, clearly worth reading. Frank lists about 10 points against a stepwise procedure. It yields R-squared values that are badly biased to be high. The F and chi-squared tests quoted next to each variable on the printout do not have the…

From Venturebeat: Back then we knew so little about the business that any insight was groundbreaking; data infrastructure was fast, stable, and real-time (I was querying our production MySQL database); the company was so small that everyone was in the loop about every decision; and the data team (me) was aligned around a singular set

When you count the outcomes of an experiment, you do not always observe all of the possible outcomes. For example, if you roll a six-sided die 10 times, it might be that the "1" face does not appear in those 10 rolls. Obviously, this situation occurs more frequently with small […] The post Merge observed outcomes into a list of all outcomes appeared first on The DO Loop.
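The idea in this excerpt can be illustrated with a short sketch. The original post is on a SAS blog; the Python below is only an analogous illustration, and the function name, the random seed, and the simulated rolls are assumptions made for the example. It tallies the observed outcomes and then reports a count for every possible outcome, so faces that never appeared show up with a count of zero.

```python
from collections import Counter
import random

def counts_over_all_outcomes(observed, all_outcomes):
    # Hypothetical helper: Counter returns 0 for outcomes that were never seen,
    # so looking up every possible outcome fills in the missing categories.
    tally = Counter(observed)
    return {outcome: tally[outcome] for outcome in all_outcomes}

random.seed(1)
rolls = [random.randint(1, 6) for _ in range(10)]  # 10 rolls of a six-sided die
table = counts_over_all_outcomes(rolls, range(1, 7))

for face, count in table.items():
    print(face, count)
```

The resulting table always has six rows, whatever the sample happens to contain, which keeps downstream tables and plots aligned across small samples.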

There are some tools that I use regularly, and I would like my research students and post-docs to learn them too. Here are some great online tutorials that might help: ggplot tutorial from Winston Chang; Writing an R package from Karl Broman; Rmarkdown from RStudio; Shiny from RStudio; git/github guide from Karl Broman; minimal make tutorial from Karl […]