Proc tabulate for simple statistics (corrected)

October 30, 2011

Ken Beath, of Macquarie University, commented on an earlier entry that the best way to generate summary statistics is using proc tabulate. While the best tools might differ, depending on the purpose, we wanted to share Ken's code demonstrating how to ...

All your Bayes are belong to us!

October 27, 2011

This week's post contains solutions to My Favorite Bayes's Theorem Problems, and one new problem.  If you missed last week's post, go back and read the problems before you read the solutions! If you don't understand the title of this post, brush...

PAWL package on CRAN

October 26, 2011

The PAWL package (which I discussed previously, and which implements the parallel adaptive Wang-Landau algorithm and adaptive Metropolis-Hastings for comparison) is now on CRAN: http://cran.r-project.org/web/packages/PAWL/index.html. This means that within R you can easily install it by typing install.packages("PAWL"). Isn’t that amazing? It’s just amazing. Kudos to the CRAN team for their quickness and their […]

Example 9.11: Employment plot

October 25, 2011

A Facebook friend posted the picture reproduced above. It makes the case that President Obama has been a successful creator of jobs, and also paints GW Bush as a president who lost jobs. Another friend pointed out that to be fair, all of Bush's presi...

Named in Best Colleges top 50 statistics blogs of 2011!

October 25, 2011

Realizations in Biostatistics has been named one of Best Colleges' top 50 statistics blogs of 2011! The wide variety of content in this blog has been noted, and, yes, I do try to write about a lot of different aspects of statistics for technical and no...

Parameter vs. Observation Dimension?

October 24, 2011

*** Updated 10/27/11: Original text appended in strike. *** Bill Bolstad’s response to Xi’an’s review of his book Understanding Computational Bayesian Statistics included the following comment, which I found interesting: Frequentist p-values are constructed in the parameter dimension using a probability distribution defined only in the observation dimension. Bayesian credible intervals are constructed in the [...]

Support Vector Machine with GPU, Part II

October 22, 2011

In our last tutorial on SVM training with GPU, we mentioned the necessary steps of pre-scaling the data with rpusvm-scale and reversing the scaling of the prediction outcome. The latest RPUSVM simplifies this cumbersome procedure.

My favorite Bayes’s Theorem problems

October 20, 2011

This week: some of my favorite problems involving Bayes's Theorem.  Next week: solutions. 1) The first one is a warm-up problem.  I got it from Wikipedia (but it's no longer there): Suppose there are two full bowls of cookies. Bowl #1 has 10...

The Wang-Landau algorithm reaches the flat histogram in finite time.

October 20, 2011

Cross-posted from my personal blog. MCMC practitioners may be familiar with the Wang-Landau algorithm, which is widely used in Physics. This algorithm divides the sample space into “boxes”. Given a target distribution, the algorithm then samples proportionally to the target in each box, while aiming at spending a pre-defined proportion of the sample in each [...]

Parachutes

October 20, 2011

I went to a great talk today by David Goldstein, which I might write about further later since he said many things of considerable interest. But I had to quickly point to an interesting paper he mentioned: Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled [...]

October 17, 2011

I read Jason Rosenhouse's book about The Monty Hall Problem recently, and I use the problem as an example in my statistics class.  Last semester I wrote a variation of the problem that turns out to be challenging, and a motivating problem for Baye...
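
For readers who want to check the classic version of the problem numerically, here is a minimal simulation sketch in Python. It covers only the standard three-door game (the host always opens a goat door that the contestant did not pick), not the variation mentioned in the excerpt. The function names play and win_rate are made up for this illustration.

```python
import random

def play(switch, rng):
    """Play one round of the standard three-door game; return True on a win."""
    doors = [0, 1, 2]
    car = rng.choice(doors)    # door hiding the car
    pick = rng.choice(doors)   # contestant's first choice
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    """Estimate the winning probability for a fixed strategy (hypothetical helper)."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

if __name__ == "__main__":
    print("switch:", win_rate(True))
    print("stay:  ", win_rate(False))
```

Under these assumptions, switching should win close to 2/3 of the time and staying close to 1/3, which agrees with what Bayes's Theorem gives for the standard problem.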

Random art on the web

October 15, 2011

Since we explored some statistics of an abstract painting with Pierre (we even have an article in the last issue of Variances!), I have become more sensitive to art linked to randomness. Here are some pointers to related websites I have dug up. Random.org, mentioned here by Pierre, is, as it reads, a true random number service that […]

qr_multiply function in scipy.linalg

October 14, 2011

In scipy's development version there's a new function closely related to the QR-decomposition of a matrix and to the least-squares solution of a linear system. What this function does is to compute the QR-decomposition of a matrix and then multiply the...
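
As a rough illustration of the least-squares connection, the sketch below uses scipy.linalg.qr_multiply to obtain the product of the right-hand side with Q, together with R, and then solves the triangular system. The excerpt is cut off before it says what the function multiplies, so the usage here is an assumption about one typical use, and the helper name lstsq_via_qr_multiply is made up for this example.

```python
import numpy as np
from scipy.linalg import qr_multiply, solve_triangular

def lstsq_via_qr_multiply(A, b):
    """Least-squares solution of A x ~ b for a tall, full-column-rank A."""
    # mode="right" returns b @ Q (the entries of Q^T b) together with R,
    # without forming Q explicitly.
    qtb, R = qr_multiply(A, b, mode="right")
    # R is upper triangular, so R x = Q^T b is solved by back substitution.
    return solve_triangular(R, qtb)

# Small random test problem (illustrative data only).
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
b = rng.normal(size=20)
x = lstsq_via_qr_multiply(A, b)
```

The result can be compared against np.linalg.lstsq(A, b, rcond=None)[0] as a sanity check.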

Multiply Imputing an Outcome Variable

October 12, 2011

Some scholars suggest that multiply imputing an outcome variable is incorrect. I use intuition and simulation to argue that multiply imputing outcomes can drastically improve estimates, even in the case of non-ignorable missingness.

Artist view of crimes in London

October 10, 2011

At first sight, one could think this picture is a scale model of some narrow mountains, like Bryce Canyon… Actually it represents crimes in East London: a cardboard artwork by the London artist Abigail Reynolds, called Mount Fear. Here is what can be read on the artist’s webpage: The terrain of Mount Fear is generated […]

Using Sweave

October 8, 2011

If you use R and haven’t discovered Sweave, then go and find out about it. It enables R code and plots to be incorporated into a document, so the analysis and report can be combined in a single document. …

Kernel Methods and Support Vector Machines de-Mystified

October 8, 2011

We give a simple explanation of the interrelated machine learning techniques called kernel methods and support vector machines. We hope to characterize and de-mystify some of the properties of these methods. To do this we work some examples and draw a few analogies. The familiar, no matter how wonderful, is not perceived as mystical. Goals [...]

Bayesian Computation (3)

October 6, 2011

In Chapter 3 of "Bayesian Computation with R", Jim Albert talks about how to conduct two fundamental tasks of statistics, namely estimation and hypothesis testing, in a single-parameter framework. The structure of this chapter is organized as the followin...

Obtain Trace of the Projection Matrix in a Linear Regression

October 6, 2011

Recently, I have been working on SAS code for a set of regularized regressions and needed to compute the trace of the projection matrix: $$S=X(X'X + \lambda I)^{-1}X'$$. Wikipedia has a well-written introduction to the trace. To obtain the inverse of matrix ...
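
The post itself works in SAS. As a language-neutral check of the formula, here is a small NumPy sketch (not the author's code) that computes the trace of $$S$$ in two ways: through the cyclic identity $$\mathrm{tr}(S)=\mathrm{tr}\big((X'X+\lambda I)^{-1}X'X\big)$$, which avoids forming the n-by-n matrix, and through the singular values $$d_i$$ of $$X$$, where $$\mathrm{tr}(S)=\sum_i d_i^2/(d_i^2+\lambda)$$. The function names are hypothetical.

```python
import numpy as np

def ridge_hat_trace(X, lam):
    """Trace of S = X (X'X + lam I)^{-1} X' without forming the n-by-n matrix S."""
    p = X.shape[1]
    gram = X.T @ X
    # tr(X M X') = tr(M X'X) by the cyclic property of the trace.
    return np.trace(np.linalg.solve(gram + lam * np.eye(p), gram))

def ridge_hat_trace_svd(X, lam):
    """Same quantity from the singular values d_i of X: sum d_i^2 / (d_i^2 + lam)."""
    d = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(d**2 / (d**2 + lam)))

# Illustrative random design matrix (assumed data, 50 rows and 4 columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
print(ridge_hat_trace(X, 2.5), ridge_hat_trace_svd(X, 2.5))
```

With $$\lambda = 0$$ and a full-rank $$X$$, the trace equals the number of columns; any positive $$\lambda$$ shrinks it below that.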

Calling Google Maps API from R

October 5, 2011

Hi, Related to Julyan’s previous post, I want to share an easy way to access the Google Maps API through R. And then we’ll stop talking about Google, otherwise it’ll look like we’re just looking for jobs. My problem was the following: I have a database (from priceofweed.com), with locations written as “city, region, country”. What I [...]

Calculating and graphing within-subject confidence intervals for ANOVA

October 4, 2011

Psychologists are gradually coming round to the view that it is a good idea to present interval estimates alongside point estimates of statistics. The most common statistic reported in psychology research is almost certainly the me...

Drawing maps using shape files and R

October 4, 2011

Sometimes, the only thing we want is a chart that speaks for itself rather than boring regression tables in our research paper. Graphs are efficient at showing the broad picture of an issue. In fact, graphs in research papers seem to be gaining a momen...

Showing Explained Variance in Multilevel Models

October 3, 2011

In this post I will show one way to display explained variance using a line chart. To the best of my knowledge, there is no default plot for displaying the effect of a factor on the deviance of multilevel models; so this is going to be a tentative ...