## Write a reusable SAS/IML module that passes values to R

November 20, 2013
By

When I call R from within the SAS/IML language, I often pass parameters from SAS into R. This feature enables me to write general-purpose, reusable modules that can analyze data from many different data sets. I've previously blogged about how to pass values to SAS procedures from PROC IML by [...]

## NYT (non)-retraction watch

November 20, 2013

Mark Palko is irritated by the Times’s refusal to retract a recounting of a hoax regarding Dickens and Dostoevsky. All I can say is, the Times refuses to retract mistakes of fact that are far more current than that! See here for two examples that particularly annoyed me, to the extent that I contacted various […]

## Raising Statistical Standards Effect on Sample Size

November 20, 2013

The failure of mainstream research to consistently reproduce results has led many to look for faults in current methodologies. One potential fault that has been identified is that the current standard significance level is too high. A standa...
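The excerpt breaks off before its own arithmetic, so what follows is a hedged sketch rather than the post's calculation. It assumes a two-sided, two-sample z-test with equal group sizes and a standardized effect size $\delta$. Under those assumptions the usual normal-approximation formula gives $n = 2\,(z_{1-\alpha/2}+z_{1-\beta})^2/\delta^2$ per group. The inputs ($\delta = 0.5$, power 0.8, $\alpha$ = 0.05 versus 0.005) are illustrative choices and are not taken from the post.

```python
import math
from statistics import NormalDist


def n_per_group(alpha: float, power: float, effect_size: float) -> int:
    """Per-group sample size for a two-sided, two-sample z-test.

    Normal-approximation formula (an assumption, not taken from the post):
        n = 2 * (z_{1 - alpha/2} + z_{power})**2 / effect_size**2
    where effect_size is the standardized mean difference.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
    return math.ceil(n)  # round up to a whole number of subjects


if __name__ == "__main__":
    for alpha in (0.05, 0.005):
        n = n_per_group(alpha, power=0.8, effect_size=0.5)
        print(f"alpha={alpha}: {n} per group")
```

With these inputs the sketch gives 63 subjects per group at $\alpha = 0.05$ and 107 at $\alpha = 0.005$. That is roughly 70% more data for the stricter threshold at the same power.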

## On the use of marginal posteriors in marginal likelihood estimation via importance-sampling

November 19, 2013

Perrakis, Ntzoufras, and Tsionas just arXived a paper on marginal likelihood (evidence) approximation (with the above title). The idea behind the paper is to base importance sampling for the evidence on simulations from the product of the (block) marginal posterior distributions. Those simulations can be directly derived from an MCMC output by randomly permuting the […]
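The excerpt stops mid-sentence, so this sketch covers only the one step it actually describes. That step turns joint MCMC draws into draws from the product of the (block) marginal posteriors by randomly permuting each block independently across iterations. Each block keeps its marginal sample, but the dependence between blocks is shuffled away. The function name and the list-of-rows layout (one row per iteration, one column per block) are hypothetical. The importance-sampling estimate of the evidence that the paper builds on top of this is not shown.

```python
import random
from typing import List, Sequence


def product_of_marginals(draws: Sequence[Sequence[float]],
                         seed: int = 0) -> List[List[float]]:
    """Permute each column (block) of MCMC output independently.

    draws[t][j] is the value of block j at iteration t (hypothetical layout).
    Each output column is a random permutation of the matching input column,
    so every block keeps its marginal sample, while the pairing across blocks
    is broken. Rows are then approximately draws from the product of the
    marginal posteriors.
    """
    rng = random.Random(seed)
    n_iter = len(draws)
    n_blocks = len(draws[0])
    columns = []
    for j in range(n_blocks):
        col = [draws[t][j] for t in range(n_iter)]
        rng.shuffle(col)  # independent permutation for each block
        columns.append(col)
    # Transpose back to one row per (pseudo-)draw
    return [[columns[j][t] for j in range(n_blocks)] for t in range(n_iter)]
```

Because every column is only reordered, the per-block histograms are identical before and after, which is exactly the property the permutation trick relies on.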

November 19, 2013

First, I saw Andrew Gelman's rant about "big bad education" (link), which led me to Mark Palko's rant about teaching "the Law of Large Numbers" in the new "Common Core" curriculum for New York schools. Mark's conclusion: If we start talking about setting aside significant time to cover probability and statistics accurately and in reasonable depth and put the ideas in proper context, you have my enthusiastic support, but…

## Practical Data Science with R: Manning Deal of the Day November 19th 2013

November 19, 2013

Manning Deal of the Day November 19: Half off Practical Data Science with R. Use code dotd1119au at www.manning.com/zumel/.

## A Survey Tool Designed Entirely in Shiny Surveying Users of R

November 19, 2013

I have written a very basic survey tool built entirely in the Shiny package of R. I hope the tool is useful. Modifying the survey for your own purposes is trivially easy (I hope). I have not commented my code yet, so it is pretty messy right now...

## More on “data science” and “statistics”

November 19, 2013

After reading Rachel and Cathy’s book, I wrote that “Statistics is the least important part of data science . . . I think it would be fair to consider statistics as a subset of data science. . . . it’s not the most important part of data science, or even close.” But then I received […]

## A letter to high-school students

November 19, 2013

Imagine Magazine, a youth-focused journal by Johns Hopkins's Center for Talented Youth, invited me to contribute an article in celebration of statistics. I try to convey the fun and joy of working with numbers and charts. You can read it...

## R and Solr Integration Using Solr’s REST APIs

November 19, 2013

Solr is the most popular open source enterprise search platform from the Apache Lucene project, and it is fast and reliable. Among its many features, we love its powerful full-text search, hit highlighting, faceted search, and near real-time indexing.
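The post works from R. As a language-neutral sketch of what such a REST client sends, the Python below only builds a query URL for Solr's standard `/select` request handler. Several values are assumptions: the host `localhost:8983` (Solr's default port), the core name `collection1`, and the field names `title` and `category`. The parameters `q`, `wt`, `rows`, `hl`, `hl.fl`, `facet`, and `facet.field` are standard Solr query parameters, and they cover the full-text search, highlighting, and faceting features mentioned above.

```python
from urllib.parse import urlencode


def solr_select_url(query: str,
                    base: str = "http://localhost:8983/solr",
                    core: str = "collection1",
                    rows: int = 10,
                    highlight_field: str = "title",
                    facet_field: str = "category") -> str:
    """Build a Solr /select URL with highlighting and faceting turned on.

    base, core, and the field names are hypothetical defaults; adjust them
    to match your own Solr installation and schema.
    """
    params = [
        ("q", query),                  # full-text query
        ("wt", "json"),                # ask for a JSON response
        ("rows", str(rows)),           # number of documents to return
        ("hl", "true"),                # enable hit highlighting
        ("hl.fl", highlight_field),    # field(s) to highlight
        ("facet", "true"),             # enable faceted search
        ("facet.field", facet_field),  # field to facet on
    ]
    return f"{base}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(solr_select_url("title:statistics"))
```

Sending a GET request to this URL (from R, Python, or `curl`) returns a JSON document. In that document, matching documents, highlighting snippets, and facet counts appear under separate top-level keys.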

## Predicting claims with a Bayesian network

November 19, 2013

Here is a little Bayesian Network to predict the claims for two different types of drivers over the next year, see also example 16.15 in [1]. Let's assume there are good and bad drivers. The probabilities that a good driver will have 0, 1 or 2 claims i...
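The excerpt is cut off before it gives the actual probabilities, so all the numbers below are hypothetical placeholders. They are not the values from the post or from example 16.15 in [1]. The sketch shows the two computations that a two-node network (driver type → number of claims) supports. The first is the marginal claim distribution, obtained by summing over driver type. The second is the posterior probability of each driver type after observing a number of claims, obtained by Bayes' rule.

```python
from typing import Dict

# Hypothetical inputs (placeholders, NOT the values from the post):
PRIOR: Dict[str, float] = {"good": 0.75, "bad": 0.25}  # P(driver type)
CLAIMS_GIVEN_TYPE: Dict[str, Dict[int, float]] = {     # P(claims | type)
    "good": {0: 0.7, 1: 0.2, 2: 0.1},
    "bad": {0: 0.5, 1: 0.3, 2: 0.2},
}


def claims_marginal(prior: Dict[str, float],
                    cpt: Dict[str, Dict[int, float]]) -> Dict[int, float]:
    """P(claims = k) = sum over types of P(type) * P(claims = k | type)."""
    out: Dict[int, float] = {}
    for driver, p_type in prior.items():
        for k, p_k in cpt[driver].items():
            out[k] = out.get(k, 0.0) + p_type * p_k
    return out


def type_posterior(prior: Dict[str, float],
                   cpt: Dict[str, Dict[int, float]],
                   observed_claims: int) -> Dict[str, float]:
    """P(type | claims = observed_claims) via Bayes' rule."""
    joint = {d: prior[d] * cpt[d][observed_claims] for d in prior}
    total = sum(joint.values())
    return {d: v / total for d, v in joint.items()}
```

With these placeholder inputs, a driver with two claims has a posterior probability of 0.4 of being a bad driver, up from the prior of 0.25.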

## Lucien Le Cam: “The Bayesians hold the Magic”

November 18, 2013

Today is Lucien Le Cam’s birthday. He was an error statistician whose remarks in an article, “A Note on Metastatistics,” in a collection on foundations of statistics (Le Cam 1977)* had some influence on me. A statistician at Berkeley, Le Cam was a co-editor with Neyman of the Berkeley Symposia volumes. I hadn’t mentioned him on […]

## Binomial regression model

November 18, 2013
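
Most of the time, when we introduce binomial models, such as the logistic or probit models, we discuss only Bernoulli variables, $Y_i\sim\mathcal{B}(p(\boldsymbol{X_i}))$. This year (actually also the year before), I discuss extensions to multinomial regressions, where $\boldsymbol{p}(\boldsymbol{X_i})$ is a function on some simplex. The multinomial logistic model was mentioned here. The idea is to consider, for instance with three possible classes, the following model: $\log\left(\frac{\mathbb{P}(Y=1\mid\boldsymbol{X})}{\mathbb{P}(Y=3\mid\boldsymbol{X})}\right)=\boldsymbol{X}'\boldsymbol{\beta}_1$ and $\log\left(\frac{\mathbb{P}(Y=2\mid\boldsymbol{X})}{\mathbb{P}(Y=3\mid\boldsymbol{X})}\right)=\boldsymbol{X}'\boldsymbol{\beta}_2$. Now, what about a real Binomial model, $Y_i\sim\mathcal{B}(n_i,p(\boldsymbol{X_i}))$, where the $n_i$'s are known. How…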
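For the "real" binomial case with known $n_i$, here is a minimal sketch. It is one standard way to fit the model and is not the post's own code, which is presumably in R. The sketch uses a logit link $p(x)=1/(1+e^{-(\beta_0+\beta_1 x)})$ with a single covariate, fitted by Newton-Raphson. The log-likelihood is $\sum_i y_i\log p_i+(n_i-y_i)\log(1-p_i)$, the score is $\sum_i (y_i-n_ip_i)(1,x_i)$, and the information matrix is $\sum_i n_ip_i(1-p_i)(1,x_i)(1,x_i)^\top$. The 2×2 solve is written out by hand to keep the block dependency-free.

```python
import math
from typing import Sequence, Tuple


def fit_binomial_logit(x: Sequence[float], y: Sequence[int],
                       n: Sequence[int], iters: int = 25,
                       tol: float = 1e-10) -> Tuple[float, float]:
    """Newton-Raphson MLE for Y_i ~ B(n_i, p(x_i)), logit p = b0 + b1 * x."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, y, n):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = yi - ni * p        # y_i - E[Y_i]
            w = ni * p * (1.0 - p)     # Var(Y_i)
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve [[h00, h01], [h01, h11]] @ step = (g0, g1)
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1
```

A quick check is available when there are only two distinct covariate values. The model is then saturated, so the fit must reproduce the observed proportions. For example, $x=(0,1)$, $n=(10,10)$, $y=(5,8)$ gives $\hat\beta_0=\operatorname{logit}(0.5)=0$ and $\hat\beta_1=\operatorname{logit}(0.8)=\log 4$.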

## Feeling optimistic after the Future of the Statistical Sciences Workshop

November 18, 2013

Last week I participated in the Future of the Statistical Sciences Workshop. I arrived feeling somewhat pessimistic about the future of our discipline. My pessimism stemmed from the emergence of the term Data Science and the small role academic …

## Graduate Course on Copulas and Extreme Values

November 18, 2013

This Winter, I will be giving a (graduate) course on extreme values, and copulas (more generally multivariate models and dependence), MAT8595. It is an ISM course, and even if it will probably be given in French, I will upload information here, in English. I will upload the (detailed) syllabus of the course during the Christmas holidays. But to give an overview, for those willing to register, the first part of the course will…

## What’s my Kasparov number?

November 18, 2013

A colleague writes: Personally, my Kasparov number is two: I beat ** in a regular tournament game, and ** beat Kasparov! That’s pretty impressive, especially given that I didn’t know this guy played chess at all! Anyway, this got me thinking, what’s my Kasparov number? OK, that’s easy. I beat Magnus Carlsen the other day […]
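Read this way, a "Kasparov number" is a shortest-path distance in a directed "beat" graph. Your number is 1 if you beat Kasparov, 2 if you beat someone who beat him, and so on. A breadth-first search computes it. The sketch uses hypothetical player names. It also leaves open a modelling choice: whether draws, blitz, or simul games count as edges.

```python
from collections import deque
from typing import Dict, List, Optional


def kasparov_number(beats: Dict[str, List[str]], player: str,
                    target: str = "Kasparov") -> Optional[int]:
    """Length of the shortest chain player -> ... -> target in which each
    arrow means "beat". Returns None if no such chain exists."""
    if player == target:
        return 0
    dist = {player: 0}
    queue = deque([player])
    while queue:  # breadth-first search: closest players are visited first
        current = queue.popleft()
        for opponent in beats.get(current, []):
            if opponent not in dist:
                dist[opponent] = dist[current] + 1
                if opponent == target:
                    return dist[opponent]
                queue.append(opponent)
    return None
```

In the colleague's case, the graph contains `colleague -> X` and `X -> Kasparov`, so the search returns 2.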

## The e-Writing Jungle Part 2: The MathML Impasse and the MathJax Solution

November 18, 2013

Back to LaTeX and MathJax and MathML and Python and Sphinx and IPython and R and knitr and Firefox and Chrome and ... In Part 1, I praised, perhaps surprisingly, e-books produced by going from LaTeX to PDF to the web. Now let's go the other way, to an e-boo...

## Historical Value at Risk versus historical Expected Shortfall

November 18, 2013

Comparing the behavior of the two on the S&P 500. Previously: there have been a few posts about Value at Risk (VaR) and Expected Shortfall (ES), including an introduction to Value at Risk and Expected Shortfall. Data and model: the underlying data are daily returns for the S&P 500 from 1950 to the present. The VaR and …
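This hedged sketch shows what "historical" means for both measures: they are read directly off the empirical distribution of past returns, with no fitted model. The sketch uses one of several quantile conventions, which may not match the post's. Let $k=\lfloor p\,n\rfloor$. VaR is the negated $k$-th worst return, and ES is the negated average of the $k$ worst returns. The toy returns in the check are made up and are not S&P 500 data.

```python
import math
from typing import Sequence, Tuple


def historical_var_es(returns: Sequence[float],
                      prob: float = 0.05) -> Tuple[float, float]:
    """Historical VaR and Expected Shortfall at tail probability `prob`.

    Both are reported as positive numbers (losses). Assumed convention:
    the k = max(1, floor(prob * n)) worst returns form the tail; VaR is
    minus the k-th worst return and ES is minus the mean of those k returns.
    """
    ordered = sorted(returns)  # worst (most negative) first
    k = max(1, math.floor(prob * len(ordered)))
    tail = ordered[:k]
    var = -tail[-1]
    es = -sum(tail) / k
    return var, es
```

Under this convention, ES is never smaller than VaR, because it averages the losses at and beyond the VaR cutoff.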

## Vectorizing the construction of a structured matrix

November 18, 2013

In using a vector-matrix language such as SAS/IML, MATLAB, or R, one of the challenges for programmers is learning how to vectorize computations. Often it is not intuitive how to program a computation so that you avoid looping over the rows and columns of a matrix. However, there are a [...]
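The excerpt does not say which structured matrix the post builds, so this sketch uses a hypothetical stand-in: the AR(1)-style correlation matrix $A_{ij}=\rho^{|i-j|}$. It also uses NumPy rather than SAS/IML. The loop version fills the matrix one element at a time. The vectorized version builds the whole $|i-j|$ matrix in one call with `np.subtract.outer` and then applies the power elementwise.

```python
import numpy as np


def ar1_matrix_loop(n: int, rho: float) -> np.ndarray:
    """Element-by-element construction (the pattern to avoid)."""
    a = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            a[i, j] = rho ** abs(i - j)
    return a


def ar1_matrix_vectorized(n: int, rho: float) -> np.ndarray:
    """Same matrix without explicit loops: build |i - j| for all pairs at once."""
    idx = np.arange(n)
    lag = np.abs(np.subtract.outer(idx, idx))  # n x n matrix of |i - j|
    return rho ** lag  # elementwise power
```

The general idea carries over to other structured matrices. First express the entry as a function of index matrices built from row and column indices, then apply that function elementwise.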

## Some Options for Testing Tables

November 18, 2013

Contingency tables are a very good way to summarize discrete data. They are quite easy to construct and reasonably easy to understand. However, there are many nuances with tables, and care should be taken when drawing conclusions from the data. Here are just a few thoughts on the topic. Dealing with sparse data: On […]
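The excerpt stops just as it reaches sparse data. One common option for a sparse 2×2 table is Fisher's exact test, though whether the post goes on to recommend it is an assumption. The test conditions on the margins and sums hypergeometric probabilities, instead of relying on the chi-square approximation. That approximation comes with the usual rule of thumb that expected counts should be at least 5. The sketch computes the expected counts and a two-sided Fisher p-value using only the standard library.

```python
from math import comb
from typing import List


def expected_counts(table: List[List[int]]) -> List[List[float]]:
    """Expected cell counts under independence: row total * column total / n."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[r * c / n for c in cols] for r in rows]


def fisher_exact_2x2(table: List[List[int]]) -> float:
    """Two-sided Fisher exact p-value for [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # P(top-left cell = x) given the fixed margins
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-7))
```

For the table `[[3, 1], [1, 3]]`, every expected count is 2, well below the rule of thumb, while the exact two-sided p-value is 34/70.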

## Alpha testing shinyapps.io – first impressions

November 18, 2013

ShinyApps.io is a new server, currently in alpha testing, for hosting Shiny applications. It is being designed by the RStudio team and provides some features distinct from those of the […] ShinyApps.io is intended for larger applications ...

## Analysis of “Deal or No Deal” results

November 18, 2013

My son, Jonathan, loves game shows, and his current favourite is Deal or No Deal, the Australian version. It has been airing for over ten years, and there is at least one episode available every weeknight …

## Hello North Carolina

November 17, 2013

This Wednesday, I'm giving the Big Data Seminar at NC State. Here is the announcement. *** In his new book Numbersense: How to Use Big Data to Your Advantage, Kaiser Fung (NYU & Vimeo statistician) calls attention to one aspect of the Big Data phenomenon that has not received media attention: the consumers of Big Data analyses, i.e. everyone, will face more confusion and less clarity as the volume of…