## Le Monde puzzle [#849]

January 18, 2014

A straightforward Le Monde mathematical puzzle: Find a pair (a,b) of integers such that a has an odd number d of digits, with d larger than 2, and ab is written as \(10^{d+1}+10a+1\). Find the smallest possible values of a and of b. I ran the following R code which produced a=137 (and b=83) as the unique […]
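
The author's R code is cut off in this excerpt. As an independent check, here is a minimal brute-force sketch in Python, not the author's code. It assumes that "ab" means the product a×b and that the target is 10^(d+1) + 10a + 1; the function name `solutions` is made up for illustration.

```python
def solutions(d):
    """Return all (a, b) where a has exactly d digits and
    a * b == 10**(d + 1) + 10 * a + 1 (assumed reading of the puzzle)."""
    found = []
    for a in range(10 ** (d - 1), 10 ** d):
        target = 10 ** (d + 1) + 10 * a + 1
        # b is an integer exactly when a divides the target
        if target % a == 0:
            found.append((a, target // a))
    return found


# Smallest admissible odd digit count larger than 2 is d = 3
print(solutions(3))  # [(137, 83)]
```

Under this reading, a must divide 10^(d+1) + 1, so for d = 3 the only three-digit candidate is a divisor of 10001 = 73 × 137, which agrees with the quoted answer.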

## Measurement and Measurement Error, Weight, Success and Failure

January 18, 2014

This blog currently weighs 200 pounds. It's inscribed in my database, so it must be true. 200 is the latest in a series of daily morning readings, taken wearing the same clothing and at the same time of my day. But how is that 200 measured? And is 200 good or ...

## Converting plots to data

January 18, 2014

It is a problem which occurs ever so often in applied work: you have a plot, but you want the data. There are at least two programs which can help you there: PlotDigitizer and Engauge Digitizer. I got both on my openSUSE machine. Both are available for...

## A course in sample surveys for political science

January 18, 2014

A colleague asked if I had any material for a course in sample surveys. And indeed I do. See here. It’s all the slides for a 14-week course, also the syllabus (“surveyscourse.pdf”), the final exam (“final2012.pdf”) and various misc files. Also more discussion of final exam questions here (keep scrolling thru the “previous entries” until […] The post A course in sample surveys for political science appeared first on Statistical Modeling,…

## Machine Learning Lesson of the Day – Cross-Validation

Validation is a good way to assess the predictive accuracy of a supervised learning algorithm, and the rule of thumb of using 70% of the data for training and 30% of the data for validation generally works well.  However, what if the data set is not very large, and the small amount of data for […]
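
The teaser stops before describing cross-validation itself. As a rough sketch of the two ideas named here, the 70/30 hold-out split and the k-fold scheme that the title refers to, the following Python works on row indices only. The function names and the choice of k are assumptions for illustration, not taken from the post.

```python
import random


def holdout_split(n, train_frac=0.7, seed=0):
    """Shuffle indices 0..n-1 and cut them into training and validation parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_frac * n))
    return idx[:cut], idx[cut:]


def kfold_indices(n, k, seed=0):
    """Yield (train, validation) index lists; each index is validated exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


train, val = holdout_split(10)
print(len(train), len(val))  # 7 3
for train, val in kfold_indices(10, k=5):
    print(sorted(val))
```

With a small data set, the k-fold version lets every observation serve in validation once while still being used for training in the other k-1 rounds.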

## Metaphors Matter: Factor Structure vs. Correlation Network Maps

January 17, 2014

The psych R package includes a data set called "bfi" with self-report ratings on 25 personality items along a 6-point agreement scale. All the details are provided in the documentation accompanying the package. My focus is how to represent the correlat...

## Animated choropleths using animation, ggplot2, rCharts, googleVis and Shiny to visualize violent crime rates in different US States across 5 decades

January 17, 2014

UPDATE: THE BLOG/SITE HAS MOVED TO GITHUB. THE NEW LINK FOR THE BLOG/SITE IS patilv.github.io and THE LINK TO THIS POST IS: http://bit.ly/1jccIBN. PLEASE UPDATE ANY BOOKMARKS YOU MAY HAVE. This post uses animated choropleths to visualize violent crime r...

## Causality and T-Consistency vs. Correlation and P-Consistency

January 17, 2014

Consider a standard linear regression setting with $$K$$ regressors and sample size $$N$$. We will say that an estimator $$\hat{\beta}$$ is consistent for a treatment effect ("T-consistent") if \(plim \hat{\beta}_k = {\partial E(y|x) }/{\partial x_k}\...

## An Interesting New Book

January 17, 2014

Here's a new book that looks as if it will be interesting, and I'm looking forward to reading it myself: Panel Data Analysis Using EViews, written by I Gusti Ngurah Agung. Two other related books by this author have been published previously - see here....

## Happy New Year, It’s Too Late

January 17, 2014

Couple people wished me happy new year yesterday, Jan 16th. But, you realize, the year is already 1/24th over? From R, rounding by W: 16/365 = 0.0438, 1/24 = 0.0417, 15/365 = 0.0411. Somewhere between the 15th and the 16th we cros...
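
The arithmetic quoted from R can be re-checked with a short Python sketch. It assumes a 365-day year (2014 is not a leap year), and the helper names are invented for this example.

```python
DAYS_IN_YEAR = 365  # assumption: non-leap year, as in 2014


def fraction_elapsed(day_of_year, days_in_year=DAYS_IN_YEAR):
    """Share of the year that has passed after `day_of_year` full days."""
    return day_of_year / days_in_year


def first_day_past(fraction, days_in_year=DAYS_IN_YEAR):
    """Smallest whole number of days whose elapsed share exceeds `fraction`."""
    day = 1
    while fraction_elapsed(day, days_in_year) <= fraction:
        day += 1
    return day


print(round(fraction_elapsed(15), 4))  # 0.0411
print(round(1 / 24, 4))                # 0.0417
print(round(fraction_elapsed(16), 4))  # 0.0438
print(first_day_past(1 / 24))          # 16
```

The exact crossing point is 365/24, roughly 15.2 days, which is why the threshold falls between the 15th and the 16th.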

## Missing not at random data makes some Facebook users feel sad

January 17, 2014

This article, published last week, explained how "some younger users of Facebook say that using the site often leaves them feeling sad, lonely and inadequate". Being a statistician gives you an advantage here because we know that naive estimates from missing …

## How to think about the statistical evidence when the statistical evidence can’t be conclusive?

January 17, 2014

There’s a paradigm in applied statistics that goes something like this: 1. There is a scientific or policy question of some theoretical or practical importance. 2. Researchers gather data on relevant outcomes and perform a statistical analysis, ideally leading to a clear conclusion (p less than 0.05, or a strong posterior distribution, or good predictive […] The post How to think about the statistical evidence when the statistical evidence can’t be…

## Applied Statistics Lesson of the Day – The Completely Randomized Design with 1 Factor

The simplest experimental design is the completely randomized design with 1 factor. In this design, each experimental unit is randomly assigned to one of the factor levels. This design is most useful for a homogeneous population (one that does not have major differences between any sub-populations). It is appealing because of its simplicity and flexibility – it can […]
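
To make the random assignment concrete, here is a minimal Python sketch of a completely randomized design with one factor. It assumes a balanced allocation (equal group sizes where possible), which the teaser does not specify; the function name and the treatment labels are hypothetical.

```python
import random
from collections import Counter


def completely_randomized_design(units, levels, seed=0):
    """Shuffle the units, then deal them out across the factor levels in turn,
    so every unit lands in exactly one level and group sizes stay balanced."""
    shuffled = list(units)
    random.Random(seed).shuffle(shuffled)
    return {unit: levels[i % len(levels)] for i, unit in enumerate(shuffled)}


assignment = completely_randomized_design(range(12), ["control", "low", "high"])
print(Counter(assignment.values()))  # four units per level
```

Shuffling first and dealing afterwards keeps the allocation random while fixing the group sizes; drawing a level independently for each unit would also be random but could leave the groups unequal.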

## Estimating the Generalized Pareto Distribution

January 16, 2014

The generalized Pareto distribution (GPD) arises in the modelling of "extremes", especially if the "peaks-over-threshold" approach is being used. Estimating the parameters of the GPD by the method of maximum likelihood is especially challenging. The ch...
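
For orientation, the quantity that maximum likelihood works with can be sketched as follows. This is the standard GPD negative log-likelihood for threshold excesses with shape `xi` and scale `sigma`, written as a plain Python function; the parameter names are chosen for this example, and the sketch is not necessarily the estimation approach the post goes on to discuss.

```python
import math


def gpd_neg_log_likelihood(xi, sigma, excesses):
    """Negative log-likelihood of the generalized Pareto distribution.

    Density: (1/sigma) * (1 + xi*x/sigma) ** (-1/xi - 1) for x >= 0,
    with the exponential density as the xi -> 0 limit.
    Returns math.inf when a parameter pair puts any observation outside the support.
    """
    if sigma <= 0:
        return math.inf
    n = len(excesses)
    if abs(xi) < 1e-12:  # exponential limit
        return n * math.log(sigma) + sum(excesses) / sigma
    total = n * math.log(sigma)
    for x in excesses:
        z = xi * x / sigma
        if 1 + z <= 0:  # observation beyond the upper endpoint when xi < 0
            return math.inf
        total += (1 + 1 / xi) * math.log1p(z)
    return total


data = [0.4, 1.3, 2.2, 0.1, 5.0]
print(gpd_neg_log_likelihood(0.2, 1.5, data))
print(gpd_neg_log_likelihood(-0.5, 1.0, data))  # inf: 5.0 lies outside the support
```

The support constraint visible in the code (the likelihood drops to zero for some parameter values) is one commonly cited reason why numerical maximization for this model needs care.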

## Objective/subjective, dirty hands and all that: Gelman/ Wasserman blogolog (ii)

January 16, 2014

Andrew Gelman says that as a philosopher, I should appreciate his blog today in which he records his frustration: “Against aggressive definitions: No, I don’t think it helps to describe Bayes as ‘the analysis of subjective beliefs’…”  Gelman writes: I get frustrated with what might be called “aggressive definitions,” where people use a restrictive definition of something […]

## edge.org asks famous scientists what scientific concept to throw out & they say statistics

January 16, 2014

I don't think I've ever been forwarded one link on the web more than I have been forwarded the edge.org post on "What scientific idea is ready for retirement?". Here are a few of the comments with my responses. I'm …

## Against overly restrictive definitions: No, I don’t think it helps to describe Bayes as “the analysis of subjective beliefs” (nor, for that matter, does it help to characterize the statements of Krugman or Mankiw as not being “economics”)

January 16, 2014

I get frustrated when people use overly restrictive definitions of something they don’t like. [I originally used the term "aggressive definitions" but I think the whole "aggressive" thing was misleading as it implies aggressive intent, which I did not mean to imply. So I changed to "overly restrictive definition."] Here’s an example of an […] The post Against overly restrictive definitions: No, I don’t think it helps to describe Bayes…

## My business statistics and data visualization courses

January 16, 2014

I have been busy working on syllabuses for my Spring 2014 courses at NYU, and that's why posting has been more haphazard than usual. I don't think I have said much about my teaching here on the blog, so let me take this opportunity to introduce the classes that I teach. Statistics For Management I (link): This is an introductory statistics course with a business/management emphasis. Many students take this…

## BMHE & BCEA get a shout in published paper

January 15, 2014

Panagiotis Petrou has posted a link to a recent paper of his, which develops a cost-effectiveness analysis of a drug used as a second-line treatment of renal carcinoma. The analysis is based on a Bayesian Markov model. But (from an incredibly self...

## Clarity and Kindness

January 15, 2014

I'm editing a generally well-written, near-final draft of a biostatistics paper. Worth broadcasting are several writing problems that occur in almost all grad student writing. Don't denigrate your contributions. Original: A simple way to achieve th...

## DNS/AFNS Yield Curve Modeling FAQ’s

January 15, 2014

It's hard to believe that I haven't yet said anything about yield-curve modeling and forecasting in the dynamic Nelson-Siegel (DNS) tradition, whether the original Diebold-Li (2006) DNS version or the Christensen-Diebold-Rudebusch (2011) arbitrage...

## Postdoc involving pathbreaking work in MRP, Stan, and the 2014 election!

January 15, 2014

We’re working with polling company YouGov to track public opinion, state-by-state and district-by-district, during the 2014 campaign. We’ll be using multilevel regression and poststratification, and implementing it in Stan, and developing the necessary new parts of Stan to get this running scalably and efficiently. And we’ll be making the most detailed, up-to-date election forecasts. What […] The post Postdoc involving pathbreaking work in MRP, Stan, and the 2014 election! appeared first…

## Say hello to SAS Analytics 13.1

January 15, 2014

Late last month, while many of us were sipping eggnog and decking halls with boughs of holly, SAS released the 13.1 version of its analytical products. Readers of Maura Stokes' newsletter, SAS Statistics and Operations Research News (Nov 2013), have already been alerted to new features in products such as [...]