## Hopper – new in the travel space

January 19, 2014

Briefly - Hopper is something new in the travel / local space. In their own words: What if you could plan an amazing trip based on a vague idea — like “spring surfing in California” or “Mediterranean cruise”? What if...

## What is volatility?

January 19, 2014

Some facts and some speculation. Definition: volatility is the annualized standard deviation of returns, and it is often expressed in percent. A volatility of 20 means that there is about a one-third probability that an asset’s price a year from now will have fallen or risen by more than 20% from its present value. In …
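Here is a minimal sketch of that definition in Python. It is not code from the original post. It annualizes the sample standard deviation of daily log returns by multiplying by the square root of 252. Both the use of log returns and the 252-trading-day year are assumptions. It also checks the "about one-third" figure: if returns over the year are roughly normal, the chance of a move larger than one standard deviation in either direction is \(P(|Z| > 1) \approx 0.317\).

```python
import math
import statistics

TRADING_DAYS_PER_YEAR = 252  # assumption: daily prices, 252 trading days per year

def annualized_volatility(prices, periods_per_year=TRADING_DAYS_PER_YEAR):
    """Annualized volatility in percent: sample std. dev. of log returns times sqrt(periods)."""
    log_returns = [math.log(later / earlier) for earlier, later in zip(prices, prices[1:])]
    return 100 * statistics.stdev(log_returns) * math.sqrt(periods_per_year)

def prob_beyond_one_sigma():
    """P(|Z| > 1) for a standard normal Z, i.e. a move larger than one volatility."""
    return math.erfc(1 / math.sqrt(2))

print(round(prob_beyond_one_sigma(), 4))  # 0.3173
```

Under these assumptions, a 20% volatility makes a year-ahead move beyond roughly ±20% a one-in-three event. Because the model is on log returns, the ±20% band is only approximate for simple percentage changes.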

## Transformations for non-normal data

January 19, 2014

Steve Peterson writes: I recently submitted a proposal on applying a Bayesian analysis to gender comparisons on motivational constructs. I had an idea on how to improve the model I used and was hoping you could give me some feedback. The data come from a survey based on 5-point Likert scales. Different constructs are measured […] The post Transformations for non-normal data appeared first on Statistical Modeling, Causal Inference, and Social…

## Sir Harold Jeffreys’ (tail area) one-liner: Sat night comedy [draft ii]

January 19, 2014

You might not have thought there could be new material for 2014, but there is, and if you look a bit more closely, you’ll see that it’s actually not Jay Leno who is standing up there at the mike… It’s Sir Harold Jeffreys himself! And his (very famous) joke, I admit, is funny. So, since […]

## Le Monde puzzle [#849]

January 18, 2014

A straightforward Le Monde mathematical puzzle: Find a pair (a,b) of integers such that a has an odd number d > 2 of digits and the product ab equals \(10^{d+1}+10a+1\) (that is, its decimal digits are 1, then the digits of a, then 1). Find the smallest possible values of a and of b. I ran the following R code, which produced a=137 (and b=83) as the unique […]
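The post's own R code is not reproduced here. Below is a hedged Python sketch of the same brute-force idea. It tries every d-digit value of a and keeps those for which \(10^{d+1}+10a+1\) is divisible by a.

```python
def solve(d):
    """All (a, b) with a having exactly d digits and a * b == 10**(d + 1) + 10 * a + 1."""
    solutions = []
    for a in range(10 ** (d - 1), 10 ** d):        # every d-digit value of a
        target = 10 ** (d + 1) + 10 * a + 1        # the digits "1", then a, then "1"
        if target % a == 0:                        # b must be an integer
            solutions.append((a, target // a))
    return solutions

print(solve(3))  # [(137, 83)]
```

The search can be cut short. Because \(ab = 10^{d+1}+10a+1\) gives \(b = 10 + (10^{d+1}+1)/a\), a must divide \(10^{d+1}+1\). For d = 3 that number is 10001 = 73 × 137, which leaves a = 137 and b = 83.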

## Measurement and Measurement Error, Weight, Success and Failure

January 18, 2014

This blogger currently weighs 200 pounds. It's inscribed in my database, so it must be true. 200 is the latest in a series of daily morning readings taken in the same clothing at the same time of my day. But how is that 200 measured? And is 200 good or ...

## Converting plots to data

January 18, 2014

It is a problem that occurs every so often in applied work: you have a plot, but you want the data. There are at least two programs that can help you there: PlotDigitizer and Engauge Digitizer. I got both on my openSUSE machine. Both are available for...
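Tools of this kind generally start by calibrating the axes. You mark two points of known value on each axis, and every later click is converted from a pixel position to a data value by interpolation. Here is a minimal Python sketch of that calibration step, assuming linear (not logarithmic) axes. The function name and all pixel numbers are hypothetical and are not taken from either program.

```python
def axis_map(pixel_a, value_a, pixel_b, value_b):
    """Return a function converting a pixel coordinate on one axis to a data value,
    calibrated by two reference points (linear axis assumed)."""
    return lambda pixel: value_a + (pixel - pixel_a) * (value_b - value_a) / (pixel_b - pixel_a)

# Hypothetical calibration: x pixels 50 -> 0 and 450 -> 10;
# y pixels grow downward, so pixel 400 -> 0 and pixel 100 -> 30.
to_x = axis_map(50, 0.0, 450, 10.0)
to_y = axis_map(400, 0.0, 100, 30.0)

clicked = [(250, 250), (450, 100)]   # hypothetical pixel positions of two data points
data = [(to_x(px), to_y(py)) for px, py in clicked]
print(data)  # [(5.0, 15.0), (10.0, 30.0)]
```

A logarithmic axis would apply the same interpolation to the logarithms of the reference values.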

## A course in sample surveys for political science

January 18, 2014

A colleague asked if I had any material for a course in sample surveys. And indeed I do. See here. It includes all the slides for a 14-week course, plus the syllabus (“surveyscourse.pdf”), the final exam (“final2012.pdf”) and various miscellaneous files. There is also more discussion of final exam questions here (keep scrolling through the “previous entries” until […] The post A course in sample surveys for political science appeared first on Statistical Modeling,…

## Machine Learning Lesson of the Day – Cross-Validation

Validation is a good way to assess the predictive accuracy of a supervised learning algorithm, and the rule of thumb of using 70% of the data for training and 30% of the data for validation generally works well. However, what if the data set is not very large, and the small amount of data for […]
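Cross-validation addresses that problem. In its common k-fold form (the lesson itself may describe a different variant), it splits the data into k folds and holds out each fold in turn. It trains on the rest and averages the held-out errors, so every observation is used for both training and validation. Below is a minimal, dependency-free Python sketch. The `fit`/`loss` callables and the mean-only model are hypothetical placeholders for a real learner.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, k, fit, loss, seed=0):
    """Average held-out loss over k folds; each point is held out exactly once (k <= n)."""
    fold_scores = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_scores.append(sum(loss(model(xs[i]), ys[i]) for i in fold) / len(fold))
    return sum(fold_scores) / len(fold_scores)

# Hypothetical learner: always predict the training mean of y.
def fit_mean(train_x, train_y):
    mean_y = sum(train_y) / len(train_y)
    return lambda x: mean_y

squared_error = lambda pred, y: (pred - y) ** 2
xs = list(range(20))
ys = [2.0 * x + 1.0 for x in xs]
print(cross_validate(xs, ys, k=5, fit=fit_mean, loss=squared_error))
```

With small data, each point still gets a held-out prediction, and the estimate does not hinge on one arbitrary 70/30 split.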

## Metaphors Matter: Factor Structure vs. Correlation Network Maps

January 17, 2014

The psych R package includes a data set called "bfi" with self-report ratings on 25 personality items along a 6-point agreement scale. All the details are provided in the documentation accompanying the package. My focus is how to represent the correlat...

## Animated choropleths using animation, ggplot2, rCharts, googleVis and Shiny to visualize violent crime rates in different US States across 5 decades

January 17, 2014

Update: the blog/site has moved to GitHub. The new link for the blog/site is patilv.github.io and the link to this post is http://bit.ly/1jccIBN. Please update any bookmarks you may have. This post uses animated choropleths to visualize violent crime r...

## Causality and T-Consistency vs. Correlation and P-Consistency

January 17, 2014

Consider a standard linear regression setting with \(K\) regressors and sample size \(N\). We will say that an estimator \(\hat{\beta}\) is consistent for a treatment effect ("T-consistent") if \(\operatorname{plim} \hat{\beta}_k = \partial E(y \mid x) / \partial x_k\)...

## An Interesting New Book

January 17, 2014

Here's a new book that looks as if it will be interesting, and I'm looking forward to reading it myself: Panel Data Analysis Using EViews, written by I Gusti Ngurah Agung. Two other related books by this author have been published previously; see here....

## Happy New Year, It’s Too Late

January 17, 2014

A couple of people wished me a happy new year yesterday, Jan 16th. But, you realize, the year is already 1/24th over? From R, rounding by W:

```r
> 16/365
[1] 0.0438
> 1/24
[1] 0.0417
> 15/365
[1] 0.0411
```

Somewhere between the 15th and the 16th we cros...
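The same arithmetic can be pushed one step further. Assume the year is counted from midnight on January 1 and 2014 has 365 days (it is not a leap year). Then 1/24 of the year is 365/24 ≈ 15.21 days, so the 1/24 mark falls at about 05:00 on January 16. A short Python check, not from the original post:

```python
from datetime import datetime, timedelta

YEAR_START = datetime(2014, 1, 1)   # assumption: the year starts at midnight, Jan 1
DAYS_IN_YEAR = 365                  # 2014 is not a leap year

def fraction_elapsed(moment):
    """Fraction of the year that has passed at a given moment."""
    return (moment - YEAR_START) / timedelta(days=DAYS_IN_YEAR)

one_24th_mark = YEAR_START + timedelta(days=DAYS_IN_YEAR / 24)
print(one_24th_mark)                                       # 2014-01-16 05:00:00
print(round(fraction_elapsed(datetime(2014, 1, 17)), 4))   # 0.0438 (16 full days)
```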

## Missing not at random data makes some Facebook users feel sad

January 17, 2014

This article, published last week, explained how "some younger users of Facebook say that using the site often leaves them feeling sad, lonely and inadequate". Being a statistician gives you an advantage here because we know that naive estimates from missing …
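A toy calculation shows why such naive estimates can mislead. Suppose, purely hypothetically, that half of a friend's days are good. A good day gets posted with probability 0.8 and a bad one with probability 0.2. Whether a day is observed depends on the outcome itself (data missing not at random), so the share of good days in your feed is 0.4 / (0.4 + 0.1) = 0.8, not 0.5. All numbers and function names below are illustrative assumptions, not figures from the article or the post.

```python
import random

def observed_share(p_good, post_if_good, post_if_bad):
    """Expected share of good outcomes among the observed (posted) ones, when
    the probability of being observed depends on the outcome (MNAR)."""
    good_seen = p_good * post_if_good
    bad_seen = (1 - p_good) * post_if_bad
    return good_seen / (good_seen + bad_seen)

def simulated_share(n_days, p_good, post_if_good, post_if_bad, seed=1):
    """Monte Carlo version of observed_share: simulate days, keep only posted ones."""
    rng = random.Random(seed)
    posted = []
    for _ in range(n_days):
        good = rng.random() < p_good
        if rng.random() < (post_if_good if good else post_if_bad):
            posted.append(good)
    return sum(posted) / len(posted)

print(observed_share(0.5, 0.8, 0.2))             # 0.8, although the true share is 0.5
print(simulated_share(100_000, 0.5, 0.8, 0.2))   # close to 0.8
```

If both posting probabilities were equal, the observed share would match the true share. The bias comes entirely from the dependence of missingness on the value.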

## How to think about the statistical evidence when the statistical evidence can’t be conclusive?

January 17, 2014

There’s a paradigm in applied statistics that goes something like this:

1. There is a scientific or policy question of some theoretical or practical importance.
2. Researchers gather data on relevant outcomes and perform a statistical analysis, ideally leading to a clear conclusion (p less than 0.05, or a strong posterior distribution, or good predictive […]

The post How to think about the statistical evidence when the statistical evidence can’t be…

## Applied Statistics Lesson of the Day – The Completely Randomized Design with 1 Factor

The simplest experimental design is the completely randomized design with 1 factor. In this design, each experimental unit is randomly assigned to one of the factor levels. This design is most useful for a homogeneous population (one that does not have major differences between any sub-populations). It is appealing because of its simplicity and flexibility: it can […]
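The randomization step is simple to sketch in Python. The units are shuffled and then dealt out to the levels in turn. Each unit receives exactly one level, and the levels get equal (or nearly equal) numbers of units. Equal replication is an assumption of this sketch; a completely randomized design does not require it. The unit and level names in the example are hypothetical.

```python
import random
from collections import Counter

def completely_randomized_design(units, levels, seed=None):
    """Assign every experimental unit to exactly one factor level at random,
    with replication as equal as possible across levels."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)                       # random order removes any systematic pattern
    return {unit: levels[i % len(levels)] for i, unit in enumerate(shuffled)}

# Hypothetical example: 12 plots, 3 fertilizer levels, 4 plots per level.
plots = [f"plot{i}" for i in range(1, 13)]
assignment = completely_randomized_design(plots, ["low", "medium", "high"], seed=2014)
print(Counter(assignment.values()))  # each level appears 4 times
```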

## Estimating the Generalized Pareto Distribution

January 16, 2014

The generalized Pareto distribution (GPD) arises in the modelling of "extremes", especially if the "peaks-over-threshold" approach is being used. Estimating the parameters of the GPD by the method of maximum likelihood is especially challenging. The ch...
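One standard way to make the maximization tractable is to reparametrize with \(\theta = \xi/\sigma\). The post may go on to describe a different approach. For fixed \(\theta\), the log-likelihood \(\ell(\xi,\sigma) = -n\log\sigma - (1 + 1/\xi)\sum_i \log(1 + \xi y_i/\sigma)\) is maximized at \(\hat\xi(\theta) = \frac{1}{n}\sum_i \log(1+\theta y_i)\), with \(\sigma = \xi/\theta\). Only a one-dimensional search over \(\theta\) then remains. The Python sketch below does that search on a plain grid. The grid bounds and size are ad hoc assumptions, the \(\xi = 0\) (exponential) limit is skipped, and the test data are simulated.

```python
import math
import random

def gpd_sample(n, xi, sigma, seed=0):
    """Simulate n exceedances from a GPD(xi, sigma), xi != 0, by inverting the CDF."""
    rng = random.Random(seed)
    return [sigma / xi * ((1.0 - rng.random()) ** (-xi) - 1.0) for _ in range(n)]

def gpd_fit_profile(ys, grid_size=400):
    """Approximate GPD maximum likelihood estimates (xi, sigma) for exceedances ys >= 0.

    Searches theta = xi / sigma on a grid; for each theta the shape is profiled
    out as xi = mean(log(1 + theta * y)) and sigma = xi / theta.
    """
    n, y_max = len(ys), max(ys)
    lo = -0.999 / y_max            # keeps 1 + theta * y > 0 for every observation
    hi = 10.0 * n / sum(ys)        # ad hoc upper end of the search
    best_ll, best = -math.inf, None
    for i in range(1, grid_size):
        theta = lo + (hi - lo) * i / grid_size
        if abs(theta) < 1e-12:
            continue               # theta = 0 is the exponential limit; not handled here
        s = sum(math.log1p(theta * y) for y in ys)
        xi = s / n
        sigma = xi / theta         # positive whether theta is positive or negative
        ll = -n * math.log(sigma) - (1.0 + 1.0 / xi) * s
        if ll > best_ll:
            best_ll, best = ll, (xi, sigma)
    return best

ys = gpd_sample(2000, xi=0.2, sigma=1.0, seed=42)
xi_hat, sigma_hat = gpd_fit_profile(ys)
print(round(xi_hat, 3), round(sigma_hat, 3))
```

A finer or adaptive search (or a general-purpose optimizer) would normally replace the grid. The point is only that profiling reduces a two-parameter problem to one.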

## Objective/subjective, dirty hands and all that: Gelman/ Wasserman blogolog (ii)

January 16, 2014

Andrew Gelman says that, as a philosopher, I should appreciate his blog post today in which he records his frustration: “Against aggressive definitions: No, I don’t think it helps to describe Bayes as ‘the analysis of subjective beliefs’…” Gelman writes: I get frustrated with what might be called “aggressive definitions,” where people use a restrictive definition of something […]

## edge.org asks famous scientists what scientific concept to throw out & they say statistics

January 16, 2014

I don't think I've ever been forwarded any single link on the web more often than the edge.org post on "What scientific idea is ready for retirement?" Here are a few of the comments with my responses. I'm …

## Against overly restrictive definitions: No, I don’t think it helps to describe Bayes as “the analysis of subjective beliefs” (nor, for that matter, does it help to characterize the statements of Krugman or Mankiw as not being “economics”)

January 16, 2014

I get frustrated when people use overly restrictive definitions of something they don’t like. [I originally used the term "aggressive definitions" but I think the whole "aggressive" thing was misleading, as it implies aggressive intent, which I did not mean to imply. So I changed to "overly restrictive definition."] Here’s an example of an […] The post Against overly restrictive definitions: No, I don’t think it helps to describe Bayes…

## My business statistics and data visualization courses

January 16, 2014

I have been busy working on syllabuses for my Spring 2014 courses at NYU, which is why posting has been more haphazard than usual. I don't think I have said much about my teaching here on the blog, so let me take this opportunity to introduce the classes that I teach. Statistics For Management I: This is an introductory statistics course with a business/management emphasis. Many students take this…

## BMHE & BCEA get a shout in published paper

January 15, 2014

Panagiotis Petrou has posted a link to a recent paper of his, which develops a cost-effectiveness analysis of a drug used as a second-line treatment of renal carcinoma. The analysis is based on a Bayesian Markov model. But (from an incredibly self...