## Trifacta, an attempt to simplify the analyst’s life

September 8, 2014
A LinkedIn contact and 538 reader pointed me to this demo video by Joe Hellerstein, from a Bay Area startup called Trifacta. They have a neat product that tries to automate data cleaning/processing tasks for analysts. I love that people are working on this problem. It's an area that I'm interested in getting involved in. Also, they have a sleek, well-thought-out, and innovative user interface. There is a…

## Playing with orientation and style

September 8, 2014
I saw this nifty chart in the Wall Street Journal last week. The Post Office is competing with FedEx and UPS on pricing. The nice feature about this small dataset is that the story is very clear. In almost every...

## Network Econometrics at Dinner

September 8, 2014
At a seminar dinner at Duke last week, I asked the leading young econometrician at the table for his forecast of the Next Big Thing, now that the partial-identification set-estimation literature has matured. The speed and forcefulness of his answer -- ...

## Tim Harford on forecasting

September 8, 2014
A few weeks ago I had a Skype chat with Tim Harford, the “Undercover Economist” for Britain’s Financial Times. He was working on an article for the FT on forecasting, and wanted my perspective as an academic forecaster. I mostly talked about what makes some things more predictable than others, as discussed in this blog […]

## Order variables by values of a statistic

September 8, 2014
When I create a graph of data that contains a categorical variable, I rarely want to display the categories in alphabetical order. For example, the box plot to the left is a plot of 10 standardized variables where the variables are ordered by their median value. The ordering makes it […]
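The post's own software isn't shown in the excerpt. As a minimal Python sketch of the idea, the category order can be computed by sorting variable names on a per-variable statistic (the median by default) and then handed to whatever plotting tool draws the box plot. The names `standardize` and `order_by_statistic` are hypothetical helpers introduced here. The small dataset below is only a toy example.

```python
from statistics import mean, median, stdev

def standardize(values):
    """Rescale to mean 0 and sample standard deviation 1 (values must not be constant)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def order_by_statistic(data, stat=median):
    """Return the variable names in `data` sorted in ascending order of `stat`.

    `data` maps a variable name to its list of observations. Ties keep their
    original order because sorted() is stable.
    """
    return sorted(data, key=lambda name: stat(data[name]))

# Toy example: after standardizing, every variable has mean 0, but the medians
# still differ, so ordering by median separates the variables.
raw = {"x": [1, 2, 9], "y": [0, 0, 1], "z": [1, 5, 6]}
std = {name: standardize(vals) for name, vals in raw.items()}
print(order_by_statistic(std))  # ['y', 'x', 'z']
```

Passing a different `stat` (for example `max` or `mean`) changes the ordering criterion without touching the plotting code.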

## Generating quantile forecasts in R

September 8, 2014
From today’s email: I have just finished reading a copy of ‘Forecasting: Principles and Practice’ and I have found the book really interesting. I have particularly enjoyed the case studies and focus on practical applications. After finishing the book I have joined a forecasting competition to put what I’ve learnt to the test. I do have […]

## The Semantics of the Y Axis

September 8, 2014
The vertical axis is not just important because it embodies one of the most important visual properties, but also because it is much more semantically loaded than the horizontal. Not only does the right choice of mapping help with reading a chart, it can also confuse people if done wrong. It’s not a coincidence […]

## Likelihood from quantiles?

September 7, 2014
Michael McLaughlin writes: Many observers, esp. engineers, have a tendency to record their observations as {quantile, CDF} pairs, e.g., (x, CDF(x)) = (3.2, 0.26), (4.7, 0.39), etc. I suspect that their intent is to do some kind of “least-squares” analysis by computing theoretical CDFs from a model, e.g. Gamma(a, b), then regressing the observed CDFs against […] The post Likelihood from quantiles? appeared first on Statistical Modeling, Causal Inference, and Social…
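The excerpt describes the least-squares approach the correspondent suspects engineers use. It does not give the post's answer about a likelihood. As a hedged Python sketch of that least-squares idea, the code below picks Gamma parameters to minimize the squared gap between model and observed CDF values.

Several choices here are assumptions rather than anything stated in the excerpt:

* The Gamma model is parameterized by shape and scale.
* The two quoted pairs serve only as toy input.
* The optimizer is a simple pattern search.
* The function names are hypothetical.

This is a curve fit, not a likelihood.

```python
import math

def gamma_cdf(x, shape, scale):
    """Gamma(shape, scale) CDF via the series for the lower regularized incomplete gamma.

    Adequate for moderate x/scale; not tuned for extreme arguments.
    """
    if x <= 0:
        return 0.0
    z = x / scale
    term = total = 1.0 / shape  # n = 0 term of sum_n z^n / (a (a+1) ... (a+n))
    for n in range(1, 10_000):
        term *= z / (shape + n)
        total += term
        if term < total * 1e-15:
            break
    log_p = math.log(total) + shape * math.log(z) - z - math.lgamma(shape)
    return min(1.0, math.exp(log_p))

def cdf_sse(pairs, shape, scale):
    """Sum of squared differences between observed and model CDF values."""
    return sum((gamma_cdf(x, shape, scale) - p) ** 2 for x, p in pairs)

def fit_gamma_by_cdf_least_squares(pairs, iters=60):
    """Minimize cdf_sse over (log shape, log scale) with a compass/pattern search."""
    best = (0.0, 0.0)  # start at shape = 1, scale = 1
    best_val = cdf_sse(pairs, 1.0, 1.0)
    step = 1.0
    for _ in range(iters):
        center, improved = best, False
        for da in (-1, 0, 1):
            for db in (-1, 0, 1):
                cand = (center[0] + da * step, center[1] + db * step)
                val = cdf_sse(pairs, math.exp(cand[0]), math.exp(cand[1]))
                if val < best_val:
                    best, best_val, improved = cand, val, True
        if not improved:
            step /= 2  # no better neighbour: refine the search mesh
    return math.exp(best[0]), math.exp(best[1]), best_val

pairs = [(3.2, 0.26), (4.7, 0.39)]  # only the pairs quoted in the excerpt
shape, scale, sse = fit_gamma_by_cdf_least_squares(pairs)
```

Searching on the log scale keeps both parameters positive without explicit constraints. Because the search only accepts strict improvements, the squared error never increases from one step to the next.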

## Mapping products in a space

September 7, 2014
I have read about people doing a Bayesian PCA at some point and always wondered how that would work. Then, at some point, I thought of a way to do so. As my ideas evolved, my interest shifted from PCA as such to a prefmap. As a first step in that...

## Slides of 12 tutorials at ACM SIGKDD 2014

September 7, 2014
Slides of 12 tutorials taught by data science experts and thought leaders at ACM SIGKDD 2014 are provided at http://www.kdd.org/kdd2014/tutorials.html. Below is a list of them.

1. Scaling Up Deep Learning, Yoshua Bengio
2. Constructing and mining web-scale knowledge graphs, Antoine …

## Statistical Science: The Likelihood Principle issue is out…!

September 7, 2014
Abbreviated Table of Contents: Here are some items for your Saturday-Sunday reading.  Link to complete discussion:  Mayo, Deborah G. On the Birnbaum Argument for the Strong Likelihood Principle (with discussion & rejoinder). Statistical Science 29 (2014), no. 2, 227-266. Links to individual papers: Mayo, Deborah G. On the Birnbaum Argument for the Strong Likelihood Principle. Statistical […]

## Translation Invariant of Lebesgue Outer Measure

September 7, 2014
Another proving problem, this time on Real Analysis.

Problem. Prove that the Lebesgue outer measure is translation invariant. (Use the fact that the length $l$ of an interval is translation invariant.)

Solution. Proof. The outer measure is translation ...
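The excerpt breaks off before the proof. Below is a sketch of the standard argument, which may differ from how the post organizes its proof. It assumes the usual definition of outer measure as an infimum over countable covers by open intervals, with $A \subseteq \mathbb{R}$ and $y \in \mathbb{R}$ introduced for the sketch:

$$
\begin{align*}
m^*(A) &= \inf\Big\{ \sum_k l(I_k) : A \subseteq \bigcup_k I_k,\ I_k \text{ open intervals} \Big\}\\
A \subseteq \bigcup_k I_k &\implies A + y \subseteq \bigcup_k (I_k + y), \qquad l(I_k + y) = l(I_k)\\
&\implies m^*(A + y) \le \sum_k l(I_k + y) = \sum_k l(I_k)\\
&\implies m^*(A + y) \le m^*(A) \quad \text{(infimum over all covers of } A\text{)}\\
m^*(A) &= m^*\big((A + y) + (-y)\big) \le m^*(A + y)
\end{align*}
$$

The last line applies the same inequality to the set $A + y$ shifted by $-y$, so the two inequalities together give $m^*(A + y) = m^*(A)$.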

## Some time in the past 200 years the neighborhood has changed

September 7, 2014
“In that pleasant district of Merry England which is watered by the river Don, there extended in ancient times a large forest, covering the greater part of the beautiful hills and valleys which lie between Sheffield and the pleasant town of Doncaster.  The remains of this extensive wood are still to be seen at the […] The post Some time in the past 200 years the neighborhood has changed appeared…

## How does inference for next year’s data differ from inference for unobserved data from the current year?

September 6, 2014
Juliet Price writes: I recently came across your blog post from 2009 about how statistical analysis differs when analyzing an entire population rather than a sample. I understand the part about conceptualizing the problem as involving a stochastic data generating process, however, I have a query about the paragraph on ‘making predictions about future cases, […] The post How does inference for next year’s data differ from inference for unobserved…

## 1.2 Million Deaths by Ebola projected within Six Months?

September 6, 2014
The World Health Organization, Samaritan’s Purse, Doctors Without Borders, and other international medical emergency relief programs are desperately calling for additional resources in the international fight against Ebola that has already killed thousa...

## Mathematical and Applied Statistics Lesson of the Day – The Motivation and Intuition Behind Chebyshev’s Inequality

In two recent Statistics Lessons of the Day, I introduced Markov’s inequality and explained the motivation and intuition behind it. Chebyshev’s inequality is just a special version of Markov’s inequality; thus, their motivations and intuitions are similar. Markov’s inequality roughly says that a random variable is most frequently observed near its expected value. Remarkably, it quantifies just […]
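The sense in which Chebyshev’s inequality is a special version of Markov’s is that it comes from applying Markov’s inequality to a squared deviation. The short derivation below assumes $X$ has finite mean $\mu$ and variance $\sigma^2 > 0$, and that $k > 0$; these symbols are introduced here:

$$
\begin{align*}
P(Y \ge a) &\le \frac{E[Y]}{a}, \qquad Y \ge 0,\ a > 0 \quad \text{(Markov)}\\
P(|X - \mu| \ge k\sigma) &= P\big((X - \mu)^2 \ge k^2\sigma^2\big) \le \frac{E[(X - \mu)^2]}{k^2\sigma^2} = \frac{1}{k^2} \quad \text{(Markov with } Y = (X - \mu)^2,\ a = k^2\sigma^2\text{)}
\end{align*}
$$

For example, with $k = 2$, at most a quarter of the probability lies two or more standard deviations from the mean, whatever the distribution.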

## EM Algorithm for Bayesian Lasso R Cpp Code

September 5, 2014
Bayesian Lasso:

$$
\begin{align*}
p(Y_{o}\mid\beta,\phi) &= N(Y_{o}\mid \mathbf{1}\alpha + X_{o}\beta,\ \phi^{-1} I_{n_{o}})\\
\pi(\beta_{i}\mid\phi,\tau_{i}^{2}) &= N(\beta_{i}\mid 0,\ \phi^{-1}\tau_{i}^{2})\\
\pi(\tau_{i}^{2}) &= \text{Exp}\left( \frac{\lambda}{2} \right)\\
\pi(\phi) &\propto \phi^{-1}\\
\pi(\alpha) &\propto 1
\end{align*}
$$

Marginalizing over $\alpha$ is equivalent to centering the observations, losing one degree of freedom, and working with the centered $Y_{o}$. Mixing over $\tau_{i}^{2}$ leads to a Laplace (double exponential) prior on $\beta_{i}$ with rate parameter $\sqrt{\phi\lambda}$ […] The post EM Algorithm for Bayesian Lasso R Cpp Code appeared first on Lindons…
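The excerpt cuts off before the R/C++ code. The following is a minimal pure-Python EM sketch derived from the model as written, not the post's implementation. It treats the $\tau_i^2$ as missing data and keeps $\lambda$ fixed. Under this model, the conditional posterior of $1/\tau_i^2$ given $\beta_i$ and $\phi$ is inverse Gaussian with mean $\sqrt{\lambda/\phi}/|\beta_i|$; that mean is the E-step.

The M-step has two parts:

* $\beta$ solves a ridge-like system.
* $\phi$ has a closed-form update, with exponent $(n-1)/2 + p/2 - 1$ collected from the centered likelihood, the $\beta$ prior, and $\pi(\phi) \propto \phi^{-1}$.

The names `solve` and `em_bayesian_lasso`, the ridge starting point, and the `eps` guard are illustrative assumptions.

```python
import math

def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def em_bayesian_lasso(X, y, lam, iters=200, eps=1e-12):
    """EM for the posterior mode of (beta, phi); X is n rows by p columns, n + p > 3."""
    n, p = len(y), len(X[0])
    # Marginalizing the flat prior on alpha: center y and each column of X.
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    XtX = [[sum(r[i] * r[j] for r in Xc) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(p)]

    def rss(beta):
        return sum((v - sum(r[j] * beta[j] for j in range(p))) ** 2 for r, v in zip(Xc, yc))

    # Start from a ridge fit so that no coefficient is exactly zero.
    beta = solve([[XtX[i][j] + (1.0 if i == j else 0.0) for j in range(p)] for i in range(p)], Xty)
    phi = (n - 1) / max(rss(beta), eps)
    for _ in range(iters):
        # E-step: w_i = E[1/tau_i^2 | beta_i, phi] = sqrt(lam / phi) / |beta_i|.
        w = [math.sqrt(lam / phi) / max(abs(b), eps) for b in beta]
        # M-step for beta: (X'X + diag(w)) beta = X'y.
        A = [[XtX[i][j] + (w[i] if i == j else 0.0) for j in range(p)] for i in range(p)]
        beta = solve(A, Xty)
        # M-step for phi: maximize ((n + p - 3)/2) log phi - phi/2 (RSS + sum w beta^2).
        phi = (n + p - 3) / (rss(beta) + sum(wi * b * b for wi, b in zip(w, beta)))
    return beta, phi
```

At a fixed point with nonzero coefficients, $X^\top(y - X\beta) = \sqrt{\lambda/\phi}\,\operatorname{sign}(\beta)$. This is the lasso optimality condition for a Laplace prior with rate $\sqrt{\phi\lambda}$. Coefficients heading toward zero converge only slowly under this kind of EM.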

## Confirmationist and falsificationist paradigms of science

September 5, 2014
Deborah Mayo and I had a recent blog discussion that I think might be of general interest so I’m reproducing some of it here. The general issue is how we think about research hypotheses and statistical evidence. Following Popper etc., I see two basic paradigms: Confirmationist: You gather data and look for evidence in support […] The post Confirmationist and falsificationist paradigms of science appeared first on Statistical Modeling, Causal…

## R: Image Analysis using EBImage

September 5, 2014
Currently, I am taking Statistics for Image Analysis in my master's program and have been exploring this topic in R. One package with capabilities in this field is EBImage from Bioconductor, which will be showcased in this post. Installation: For those...

September 5, 2014
I’m enthusiastic about having R notify me when my script is done. But in one of my early uses of this, my script threw an error, and I never got a text or Pushbullet notification about it. And really, I’m even more interested in being notified about such errors than anything else. It’s relatively easy to get notified […]
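The excerpt stops before showing the R approach. Here is a rough Python analogue of the same idea: run the script's main work inside a wrapper that sends one message on success and another, carrying the traceback, on failure. It then re-raises the error so the script still exits with an error. `run_with_notification` is a hypothetical name. `notify` stands for any callable you supply, such as a function wrapping a text-message or Pushbullet API call, which is not shown here.

```python
import traceback
from typing import Callable

def run_with_notification(main: Callable[[], None], notify: Callable[[str], None]) -> None:
    """Run main() and report the outcome through notify(), including failures."""
    try:
        main()
    except Exception:
        # Report the full traceback, then re-raise so the failure is not swallowed.
        notify("Script failed:\n" + traceback.format_exc())
        raise
    else:
        notify("Script finished successfully.")

# Example with print() standing in for a real notifier.
run_with_notification(lambda: None, print)
```

Putting the failure branch in `except` and the success branch in `else` means exactly one message is sent per run.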

## Bayesian First Aid: Poisson Test

September 5, 2014
As the normal distribution is sort of the default choice when modeling continuous data (but not necessarily the best choice), the Poisson distribution is the default when modeling counts of events. Indeed, when all you know is the number of events du...
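The excerpt does not show the model behind Bayesian First Aid's Poisson test, so what follows is not that package's method. It is a simple stand-in: with a Gamma prior on an event rate, observing `count` events over an exposure time `exposure` yields a Gamma posterior in closed form (the conjugate Gamma-Poisson update). Monte Carlo draws from that posterior then answer questions such as "how likely is the rate above some threshold?" The function names and the default vague prior (shape 0.5, rate 1e-5) are illustrative assumptions.

```python
import random

def poisson_rate_posterior(count, exposure, prior_shape=0.5, prior_rate=1e-5):
    """Conjugate update: Gamma(shape, rate) prior and Poisson(count | rate * exposure) data.

    Returns the posterior (shape, rate); the posterior mean is shape / rate.
    """
    return prior_shape + count, prior_rate + exposure

def prob_rate_above(threshold, shape, rate, draws=100_000, seed=1):
    """Monte Carlo estimate of P(rate > threshold) under a Gamma(shape, rate) posterior."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), so pass scale = 1 / rate.
    hits = sum(rng.gammavariate(shape, 1.0 / rate) > threshold for _ in range(draws))
    return hits / draws

# Example: 20 events observed over 10 time units.
shape, rate = poisson_rate_posterior(20, 10)
print(shape / rate, prob_rate_above(2.0, shape, rate))
```

The closed-form posterior makes the example easy to check. A sampler would only be needed for non-conjugate priors or more complex models.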

## All She Wrote (so far): Error Statistics Philosophy Contents-3 years on

September 5, 2014
Error Statistics Philosophy: Blog Contents By: D. G. Mayo[i] Each month, I will mark (in red) 3 relevant posts (from that month 3 yrs ago) for readers wanting to catch-up or review central themes and discussions. September 2011 (9/3) Frequentists in Exile: The Purpose of this Blog (9/3) Overheard at the comedy hour at the Bayesian retreat (9/4) Drilling Rule #1 […]

## Spell Checker for R…qdap::check_spelling

September 4, 2014
I have often had requests for a spell checker for R character vectors. The utils::aspell function can be used to check spelling, but many Windows users have reported difficulty with the function. I came across an article on spelling in …