## More on “data science” and “statistics”

November 19, 2013

After reading Rachel and Cathy’s book, I wrote that “Statistics is the least important part of data science . . . I think it would be fair to consider statistics as a subset of data science. . . . it’s not the most important part of data science, or even close.” But then I received […] The post More on “data science” and “statistics” appeared first on Statistical Modeling, Causal Inference,…

## A letter to high-school students

November 19, 2013

Imagine Magazine, a youth-focused journal published by Johns Hopkins's Center for Talented Youth, invited me to contribute an article in celebration of statistics. In it, I try to convey the fun and joy of working with numbers and charts. You can read it...

## R and Solr Integration Using Solr’s REST APIs

November 19, 2013

Solr is the most popular, fast, and reliable open-source enterprise search platform from the Apache Lucene project. Among many other features, we love its powerful full-text search, hit highlighting, faceted search, and near real-time indexing. ...
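The post works from R; as a language-neutral sketch of what a request to Solr's REST interface looks like, the Python function below builds a query URL for Solr's standard `/select` handler using its request parameters `q`, `rows`, `wt`, `hl`/`hl.fl` (hit highlighting) and `facet`/`facet.field` (faceting). The host, the core name `collection1`, the field names, and the function name `solr_select_url` are hypothetical placeholders; this is not the post's own code.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, query, rows=10, highlight_field=None, facet_field=None):
    """Build a GET URL for Solr's /select handler with JSON output (hypothetical helper)."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    if highlight_field is not None:
        params += [("hl", "true"), ("hl.fl", highlight_field)]       # hit highlighting
    if facet_field is not None:
        params += [("facet", "true"), ("facet.field", facet_field)]  # faceted search
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


# Hypothetical host, core and field names.
url = solr_select_url("http://localhost:8983/solr", "collection1", "title:statistics",
                      rows=5, highlight_field="title", facet_field="category")
print(url)
```

Against a running Solr instance, `json.load(urllib.request.urlopen(url))` should return a JSON object whose `response` entry holds `numFound` and `docs`; that step is left out here because it needs a live server.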

## Predicting claims with a Bayesian network

November 19, 2013

Here is a little Bayesian network to predict the claims of two different types of drivers over the next year; see also Example 16.15 in [1]. Let's assume there are good and bad drivers. The probabilities that a good driver will have 0, 1 or 2 claims i...
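The post's probabilities are cut off, so the sketch below uses made-up numbers purely to show the mechanics of a small network in which the driver type (good or bad) is the parent of the claim counts in two consecutive years: observe this year's claims, update the probability of each driver type with Bayes' rule, then average the per-type claim distributions over that posterior to predict next year. The prior, the conditional probabilities, and the function names are hypothetical and are not those of Example 16.15 in [1].

```python
from fractions import Fraction as F

# Hypothetical placeholder probabilities, NOT the values from the post or from [1].
prior = {"good": F(3, 4), "bad": F(1, 4)}           # P(driver type)
claims_given_type = {                                # P(claims in a year | type)
    "good": {0: F(7, 10), 1: F(2, 10), 2: F(1, 10)},
    "bad": {0: F(5, 10), 1: F(3, 10), 2: F(2, 10)},
}


def posterior_type(observed_claims):
    """P(type | this year's claims) by Bayes' rule."""
    joint = {t: prior[t] * claims_given_type[t][observed_claims] for t in prior}
    total = sum(joint.values())
    return {t: p / total for t, p in joint.items()}


def predictive_next_year(observed_claims):
    """P(next year's claims | this year's claims); the two years are
    assumed conditionally independent given the driver type."""
    post = posterior_type(observed_claims)
    return {k: sum(post[t] * claims_given_type[t][k] for t in post) for k in (0, 1, 2)}


print(predictive_next_year(2))
```

With these placeholder numbers, a driver with 2 claims this year is good with posterior probability 3/5, and the predicted probability of no claim next year drops from 13/20 (no information) to 31/50.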

## Lucien Le Cam: “The Bayesians hold the Magic”

November 18, 2013

Today is Lucien Le Cam’s birthday. He was an error statistician whose remarks in an article, “A Note on Metastatistics,” in a collection on foundations of statistics (Le Cam 1977)* had some influence on me. A statistician at Berkeley, Le Cam was a co-editor with Neyman of the Berkeley Symposia volumes. I hadn’t mentioned him on […]

## Binomial regression model

November 18, 2013
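
Most of the time, when we introduce binomial models, such as the logistic or probit models, we discuss only Bernoulli variables, $Y_i\sim\mathcal{B}(p(\boldsymbol{X_i}))$. This year (and actually the year before as well), I discuss extensions to multinomial regressions, where $\boldsymbol{p}(\boldsymbol{x})$ is a function taking values in some simplex. The multinomial logistic model was mentioned in an earlier post. The idea is to consider, for instance with three possible classes $\{A,B,C\}$, the following model: $\log\frac{\mathbb{P}(Y=A\mid\boldsymbol{X}=\boldsymbol{x})}{\mathbb{P}(Y=C\mid\boldsymbol{X}=\boldsymbol{x})}=\boldsymbol{x}^{\top}\boldsymbol{\beta}_A$ and $\log\frac{\mathbb{P}(Y=B\mid\boldsymbol{X}=\boldsymbol{x})}{\mathbb{P}(Y=C\mid\boldsymbol{X}=\boldsymbol{x})}=\boldsymbol{x}^{\top}\boldsymbol{\beta}_B$. Now, what about a real Binomial model, $Y_i\sim\mathcal{B}(n_i,p(\boldsymbol{X_i}))$, where the $n_i$'s are known? How…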
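A minimal Python sketch of the binomial case, assuming a logit link: each $y_i$ counts successes out of a known $n_i$, and $\boldsymbol{\beta}$ is estimated by Newton-Raphson on the log-likelihood (for the logit link this coincides with iteratively reweighted least squares). The function name `fit_binomial_logit` and the toy data are hypothetical; in R the same model is usually fit with `glm(cbind(y, n - y) ~ x, family = binomial)`.

```python
import numpy as np


def fit_binomial_logit(X, y, n, max_iter=50, tol=1e-10):
    """Fit logit(p_i) = X_i @ beta for y_i ~ Binomial(n_i, p_i), with n_i known."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = np.asarray(n, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted probabilities
        score = X.T @ (y - n * p)                 # gradient of the log-likelihood
        w = n * p * (1.0 - p)                     # binomial variances n_i p_i (1 - p_i)
        info = X.T @ (w[:, None] * X)             # Fisher information X' W X
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Illustrative data: expected (not random) counts built from known coefficients.
x = np.linspace(-2.0, 2.0, 9)
X = np.column_stack([np.ones_like(x), x])
true_beta = np.array([-0.5, 1.2])
n = np.arange(10, 100, 10)                      # known numbers of trials, 10..90
y = n / (1.0 + np.exp(-(X @ true_beta)))        # y_i = n_i * p_i
beta_hat = fit_binomial_logit(X, y, n)
print(beta_hat)
```

Because these $y_i$ equal $n_i\,p_i$ under `true_beta`, the score equations are solved exactly at `true_beta`, so the fit should return approximately $(-0.5, 1.2)$; with real data the $y_i$ would be observed integer counts.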

## Feeling optimistic after the Future of the Statistical Sciences Workshop

November 18, 2013

Last week I participated in the Future of the Statistical Sciences Workshop. I arrived feeling somewhat pessimistic about the future of our discipline. My pessimism stemmed from the emergence of the term Data Science and the small role academic …

## Graduate Course on Copulas and Extreme Values

November 18, 2013

This winter, I will be giving a graduate course on extreme values and copulas (and, more generally, multivariate models and dependence), MAT8595. It is an ISM course, and even though it will probably be taught in French, I will upload information here, in English. I will upload the detailed syllabus of the course during the Christmas holidays. But to give an overview, for those willing to register, the first part of the course will…

## What’s my Kasparov number?

November 18, 2013

A colleague writes: Personally my Kasparov number is two: I beat ** in a regular tournament game, and ** beat Kasparov! That’s pretty impressive, especially given that I didn’t know this guy played chess at all! Anyway, this got me thinking, what’s my Kasparov number? OK, that’s easy. I beat Magnus Carlsen the other day […]

## The e-Writing Jungle Part 2: The MathML Impasse and the MathJax Solution

November 18, 2013

Back to LaTeX and MathJax and MathML and Python and Sphinx and IPython and R and knitr and Firefox and Chrome and ... In Part 1, I praised, perhaps surprisingly, e-books produced as LaTeX to PDF to the web. Now let's go the other way, to an e-boo...

## Historical Value at Risk versus historical Expected Shortfall

November 18, 2013

Comparing the behavior of the two on the S&P 500. Previously: there have been a few posts about Value at Risk (VaR) and Expected Shortfall (ES), including an introduction to Value at Risk and Expected Shortfall. Data and model: the underlying data are daily returns for the S&P 500 from 1950 to the present. The VaR and …
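A minimal sketch of the two historical estimators, assuming one common convention: losses are negated returns, VaR is an empirical quantile of the losses, and ES is the average loss at or beyond that quantile. Quantile and tail conventions vary (R's `quantile` alone offers several types), so this is not necessarily the definition the post uses; the function name is hypothetical and the toy returns are made up rather than S&P 500 data.

```python
import math


def historical_var_es(returns, level=0.95):
    """Historical VaR and ES of one-period losses at confidence `level`.

    Convention used here (one of several): losses are negated returns, VaR is the
    ceil(level * n)-th smallest loss, and ES is the average of the losses from
    that order statistic upward.
    """
    losses = sorted(-r for r in returns)
    n = len(losses)
    k = math.ceil(level * n) - 1      # 0-based index of the VaR order statistic
    var = losses[k]
    tail = losses[k:]                 # losses at or beyond the VaR
    es = sum(tail) / len(tail)
    return var, es


# Toy returns: -10%, -9%, ..., +9% (20 values), for illustration only.
returns = [(i - 10) / 100 for i in range(20)]
var95, es95 = historical_var_es(returns, 0.95)
print(var95, es95)
```

On these 20 toy returns, the 95% VaR is the second-largest loss, 0.09, and the ES averages the two largest losses, giving 0.095; by construction ES is never below VaR.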

## Vectorizing the construction of a structured matrix

November 18, 2013

In using a vector-matrix language such as SAS/IML, MATLAB, or R, one of the challenges for programmers is learning how to vectorize computations. Often it is not intuitive how to program a computation so that you avoid looping over the rows and columns of a matrix. However, there are a [...]
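The excerpt does not say which structured matrix the post builds, so as a hypothetical example take the AR(1) correlation matrix with entries $\rho^{|i-j|}$. The loop version visits every cell; the vectorized version forms the whole matrix of lags $|i-j|$ at once by broadcasting a column of row indices against a row of column indices, the NumPy analogue of R's `outer(1:n, 1:n, function(i, j) rho^abs(i - j))`. The function names are illustrative.

```python
import numpy as np


def ar1_corr_loop(n, rho):
    """Build the n x n matrix with entries rho**|i - j| using explicit loops."""
    A = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            A[i, j] = rho ** abs(i - j)
    return A


def ar1_corr_vectorized(n, rho):
    """Same matrix without loops, via broadcasting."""
    idx = np.arange(n)
    lags = np.abs(idx[:, None] - idx[None, :])   # n x n matrix of |i - j|
    return rho ** lags


A = ar1_corr_vectorized(4, 0.5)
print(A)
```

Here the first row of `A` is 1, 0.5, 0.25, 0.125, and the same broadcasting pattern works for any entry rule that depends only on the row and column indices.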

## Some Options for Testing Tables

November 18, 2013

Contingency tables are a very good way to summarize discrete data. They are quite easy to construct and reasonably easy to understand. However, tables have many nuances, and care should be taken when drawing conclusions from the data. Here are just a few thoughts on the topic. Dealing with sparse data. On […]
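One common option when a table is sparse is to check the expected counts under independence (a usual rule of thumb asks for at least 5 per cell before trusting the chi-square approximation) and, for a 2x2 table, to fall back on Fisher's exact test. The sketch below computes both in plain Python; it illustrates that option and is not necessarily one the post discusses. The function names are hypothetical, and in practice R's `fisher.test` or `scipy.stats.fisher_exact` would be used.

```python
from math import comb


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    return [[r * c / total for c in cols] for r in rows]


def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for [[a, b], [c, d]], conditioning on both margins."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    # Sum over all tables no more likely than the observed one (small tolerance for rounding).
    return sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))


table = [[3, 1], [1, 3]]
print(expected_counts(table))   # all cells 2.0, well under the rule-of-thumb 5
print(fisher_exact_2x2(table))
```

For this table every expected count is 2, and the two-sided exact p-value is 34/70, about 0.486.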

## Alpha testing shinyapps.io – first impressions

November 18, 2013

ShinyApps.io is a new server, currently in alpha testing, for hosting Shiny applications. It is being designed by the RStudio team and provides some features distinct from those of the … ShinyApps.io is intended for larger applications ...

## Analysis of “Deal or No Deal” results

November 18, 2013

My son, Jonathan, loves game shows, and his current favourite is Deal or No Deal, the Australian version. It has been airing now for over ten years, and there is at least one episode available every weeknight …

## Hello North Carolina

November 17, 2013

This Wednesday, I'm giving the Big Data Seminar at NC State. Here is the announcement. *** In his new book Numbersense: How to Use Big Data to Your Advantage, Kaiser Fung (NYU & Vimeo statistician) calls attention to one aspect of the Big Data phenomenon that has not received media attention: the consumers of Big Data analyses, i.e., everyone, will face more confusion and less clarity as the volume of…

## Probability and geometry

November 17, 2013

One of the most important formulas in probability (in my view) is the “law of total probability”, which simply states that $\mathbb{P}(Y=y)=\sum_x \mathbb{P}(Y=y,X=x)$, which can also be written, using Bayes' formula, as $\mathbb{P}(Y=y)=\sum_x \mathbb{P}(Y=y\mid X=x)\,\mathbb{P}(X=x)$. One consequence of this result is the “law of total expectation”, often called the double projection theorem, which is often written in the shortened form $\mathbb{E}(Y)=\mathbb{E}\big(\mathbb{E}(Y\mid X)\big)$ (in the formula on the right, the first symbol is an expectation, it is…
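A quick numerical check of both identities on a small joint distribution of $(X, Y)$, chosen arbitrarily for illustration (the table is hypothetical, not from the post): the marginal of $Y$ is recovered by summing $\mathbb{P}(Y=y\mid X=x)\,\mathbb{P}(X=x)$ over $x$, and averaging $\mathbb{E}(Y\mid X=x)$ with weights $\mathbb{P}(X=x)$ gives back $\mathbb{E}(Y)$. Exact fractions avoid rounding noise.

```python
from fractions import Fraction as F

# A small, arbitrary joint distribution P(X = x, Y = y), for illustration only.
joint = {
    (0, 0): F(1, 10), (0, 1): F(2, 10),
    (1, 0): F(3, 10), (1, 2): F(4, 10),
}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})


def marginal_x(x):
    return sum(p for (xi, _), p in joint.items() if xi == x)


def marginal_y(y):
    return sum(p for (_, yi), p in joint.items() if yi == y)


def cond_y_given_x(y, x):
    return joint.get((x, y), F(0)) / marginal_x(x)


# Law of total probability, written with conditional probabilities.
total_prob_ok = all(
    marginal_y(y) == sum(cond_y_given_x(y, x) * marginal_x(x) for x in xs) for y in ys
)

# Law of total expectation: E(Y) = E(E(Y | X)).
e_y = sum(y * marginal_y(y) for y in ys)
e_y_given_x = {x: sum(y * cond_y_given_x(y, x) for y in ys) for x in xs}
e_e = sum(e_y_given_x[x] * marginal_x(x) for x in xs)
print(total_prob_ok, e_y, e_e)
```

Here $\mathbb{E}(Y\mid X=0)=2/3$ and $\mathbb{E}(Y\mid X=1)=8/7$, and weighting them by $\mathbb{P}(X=0)=3/10$ and $\mathbb{P}(X=1)=7/10$ gives $\mathbb{E}(Y)=1$.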

## Big bad education bureaucracy does big bad things

November 17, 2013

In response to some big new push for testing schoolchildren, Mark Palko writes: The announcement of a new curriculum is invariably followed by a hearty round of self-congratulations and talk of “keeping standards high” as if adding a slide to a PowerPoint automatically made students better informed. It doesn’t work that way. […]

## Dutch Rainwater Composition 1992-2005.

November 17, 2013

After reading Blog About Stats' post on the Open Data Index, I decided to browse the Open Data Index a bit. Choosing the Netherlands and following Emission of Pollutants, I ended up on a page from the National Institute for Public Health. The page ...

## What should statistics do about massive open online courses?

November 17, 2013

Marie Davidian, the President of the American Statistical Association, writes about the JHU Biostatistics effort to deliver massive open online courses. She interviewed Jeff, Brian Caffo, and me and summarized our thoughts. All acknowledge that the future is unknown. How …

## Stein’s Method

November 16, 2013

I have mentioned Stein’s method in passing a few times on this blog. Today I want to talk about Stein’s method in a bit of detail. 1. What Is Stein’s Method? Stein’s method, due to Charles Stein, is actually quite old, going back to 1972. But there has been a great deal of interest in […]
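For readers meeting the method for the first time, the standard starting point (stated here from the general literature, not quoted from the post) is Stein's characterization of the standard normal $Z\sim\mathcal{N}(0,1)$ and the associated Stein equation for a test function $h$:

$$
W \sim \mathcal{N}(0,1) \iff \mathbb{E}\big[f'(W) - W f(W)\big] = 0 \ \text{ for all absolutely continuous } f \text{ with } \mathbb{E}|f'(Z)| < \infty,
$$

$$
f_h'(w) - w f_h(w) = h(w) - \mathbb{E}h(Z) \quad\Longrightarrow\quad \mathbb{E}h(W) - \mathbb{E}h(Z) = \mathbb{E}\big[f_h'(W) - W f_h(W)\big].
$$

Bounding the right-hand side for a random variable $W$ of interest, uniformly over a class of test functions $h$, then bounds the distance between the distribution of $W$ and $\mathcal{N}(0,1)$.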

## S. Stanley Young: More Trouble with ‘Trouble in the Lab’ (Guest post)

November 16, 2013

Stanley Young’s guest post arose in connection with Kepler’s Nov. 13 post, my November 9 post, and the associated comments. S. Stanley Young, PhD, Assistant Director for Bioinformatics, National Institute of Statistical Sciences, Research Triangle Park, NC. Much is made by some of the experimental biologists that their art is oh so sophisticated that mere mortals do not have […]