I raise this question because we see calls for running segmentation with individual estimates from hierarchical Bayes choice models without any mention of the possible complications that might accompany such an approach. Actually, ...

To give you all some time to digest my review of The Art of R Programming, I thought: why not continue this trend of book reviews with a review of Learning SQL? This book came highly recommended by some colleagues of mine as a place to whet your SQL app...

My Coursera Data Analysis class is done for now! All the lecture notes are on GitHub and all the videos are on YouTube. They are tagged by week with the tag “Week x”. After ENAR the comments on how to have better stats …

This blog post uses a function and a script written in R that were shown in an earlier blog post.

Introduction

This is the second of a series of blog posts about simple linear regression; the first, written recently, covered some conceptual nuances and subtleties of this model. In this blog post, I will use […]

Here is the function that I wrote for simple linear regression, as alluded to in my blog post about simple linear regression on log-transformed data on the decay of DDT concentration in trout in Lake Michigan. My goal was to replicate the four columns (Estimate, Std. Error, t value and Pr(>|t|)) of the coefficient table produced by applying summary() to the output of lm(). […]
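As a rough illustration of what replicating those four columns involves, here is a minimal sketch in Python; it is not the author's R function, which is not reproduced in this excerpt. It assumes ordinary least squares with a single predictor and n - 2 residual degrees of freedom, and it computes the two-sided p-value from the Student t distribution via the regularized incomplete beta function so that no third-party library is needed. The names simple_lm_table and two_sided_t_pvalue are hypothetical.

```python
import math

def _betacf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use the fraction directly where it converges fast, else the symmetry
    # I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def two_sided_t_pvalue(t, df):
    """P(|T| >= |t|) for Student's t with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))

def simple_lm_table(x, y):
    """Estimate, Std. Error, t value, Pr(>|t|) for y = b0 + b1 * x fitted by OLS.

    Assumes the residuals are not all zero (otherwise the standard errors are 0).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    s2 = sse / df  # residual variance estimate
    se_b0 = math.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx))
    se_b1 = math.sqrt(s2 / sxx)
    table = {}
    for name, est, se in (("(Intercept)", b0, se_b0), ("x", b1, se_b1)):
        t = est / se
        table[name] = (est, se, t, two_sided_t_pvalue(t, df))
    return table
```

For x = 1, ..., 5 and y = 2, 4, 5, 4, 5, the slope row works out to an estimate of 0.6, a standard error of about 0.283, a t value of about 2.12 and a p-value of about 0.124; summary(lm(y ~ x)) applies the same formulas to the same data.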

“Arguments in data visualization are so fierce because the stakes are so low” is a great zinger that I’ve heard a few times recently. But it’s not always true. Data visualization influences important decisions every day. The Congressional Budget Office’s new snapshots are but one example. The role of the Congressional Budget Office (CBO) is to provide information to members of the U.S. Congress so they can make better decisions.…

When text data is in a clean CSV format, read.csv is enough to parse it into a usable format. When it is not, getting the data into a usable format is less straightforward. In this post…

In his 1938 review in The Economic Journal of Historical Development of the Graphical Representation of Statistical Data, by H. Gray Funkhouser, the great economist writes: Perhaps the most striking outcome of Mr. Funkhouser’s researches is the fact of the very slow progress which graphical methods made until quite recently. . . . In [...]

We had a great turnout yesterday for our Zero to R Hero workshop at the Quebec Centre for Biodiversity Science. We went from the absolute basics of the command line, to the intricacies of importing data, and finally we had a look at plotting using ggplot2. We didn’t have time to get to this extra module […]

Dan Kahan writes: The basic idea . . . is to promote identification of study designs that scholars who disagree about a proposition would agree would generate evidence relevant to their competing conjectures—regardless of what studies based on such designs actually find. Articles proposing designs of this sort would be selected for publication and only [...]

To explore how we can make it easier to create new visualization designs, we are running a study based on a new approach, called visualization primitives. It lets you map data to the properties of objects like rectangles and ellipses. Build something w...

Why did I use HTML5 for today’s talk? My last presentation was prepared using HTML5. This time I wanted to try something new in making the slides. I prepared the first few slides in JessyInk. Then I learned that my … The post Data visualisation talk: Presentation using reports package appeared first on Fiddling with data and code.

I am in the process of uploading the video lectures for Data Analysis. I am getting ready to send out the course wrap-up email and wanted to include the link to the YouTube playlist as well. Unfortunately, YouTube keeps …

The Maximum Sharpe Portfolio, or Tangency Portfolio, is the portfolio on the efficient frontier at the point where a line drawn from the point (0, risk-free rate) is tangent to the efficient frontier. There is a great discussion of the Maximum Sharpe (Tangency) Portfolio in the quadprog optimization question. In the general case, finding the Maximum Sharpe Portfolio […]
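For the special case where the only constraint is full investment (weights sum to 1, short sales allowed), the tangency portfolio has a well-known closed form: weights proportional to inv(Sigma) (mu - rf * 1), rescaled to sum to 1, where mu is the vector of expected returns, Sigma the covariance matrix and rf the risk-free rate. The minimal sketch below assumes exactly that unconstrained setting; the general constrained case mentioned above needs a quadratic-programming solver such as quadprog and is not covered here. The function names and the small Gaussian-elimination helper are hypothetical, written in plain Python to stay self-contained.

```python
import math

def _solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z

def tangency_weights(mu, cov, rf):
    """Unconstrained (short sales allowed) maximum-Sharpe weights summing to 1."""
    excess = [m_i - rf for m_i in mu]
    z = _solve(cov, excess)  # z = inv(cov) @ (mu - rf)
    total = sum(z)
    if total <= 0:
        # rf is at or above the minimum-variance portfolio's return, so no
        # tangency point exists on the upper branch of the frontier.
        raise ValueError("no tangency portfolio on the efficient frontier for this rf")
    return [z_i / total for z_i in z]

def sharpe_ratio(w, mu, cov, rf):
    """(expected portfolio return - rf) / portfolio standard deviation."""
    mean = sum(w_i * m_i for w_i, m_i in zip(w, mu))
    n = len(w)
    var = sum(w[i] * w[j] * cov[i][j] for i in range(n) for j in range(n))
    return (mean - rf) / math.sqrt(var)
```

With two uncorrelated assets, mu = [0.1, 0.2], variances 0.04 and 0.09 and rf = 0.02, the excess returns divided by the variances are equal, so the sketch splits the portfolio 50/50. Adding any long-only or margin constraint breaks the closed form, which is where a QP solver comes in.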

Needless to say, it is with great pleasure that I am back in beautiful Padova for the workshop Recent Advances in statistical inference: theory and case studies, organised by Laura Ventura and Walter Racugno, especially since this is one of the last places where I met George Casella, in June 2010. As we have plenty [...]

Jake Hofman writes that he saw my recent newspaper article on running (“How fast do we slow down? . . . For each doubling of distance, the world record time is multiplied by about 2.15. . . . for sprints of 200 meters to 1,000 meters, a doubling of distance corresponds to an increase of [...]
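The quoted rule of thumb can be turned into a formula under one assumption: that record time T grows as a power of distance d, T proportional to d**k. A fixed factor per doubling then means 2**k = 2.15, so k = log2(2.15), roughly 1.10, and average speed d / T scales as d**(1 - k), falling by a factor of about 2 / 2.15 (roughly 7%) per doubling. The minimal sketch below only encodes that arithmetic; the function names are hypothetical, and since the quoted article treats sprints of 200 to 1,000 meters separately, 2.15 should be read as a rough figure rather than a law for every distance.

```python
import math

# Assumption: record time T is proportional to d**k. If doubling the distance
# multiplies T by a fixed factor, then 2**k = factor, so k = log2(factor).
def power_law_exponent(doubling_factor=2.15):
    return math.log2(doubling_factor)

def scaled_record_time(t_ref, d_ref, d, doubling_factor=2.15):
    """Extrapolate a time at distance d from a reference time t_ref at d_ref."""
    k = power_law_exponent(doubling_factor)
    return t_ref * (d / d_ref) ** k

def speed_ratio_per_doubling(doubling_factor=2.15):
    """Average speed after doubling the distance, relative to before."""
    return 2.0 / doubling_factor
```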

Attention conservation notice: I look back on my works with smug complacency. I started this weblog in January 2003; I don't remember exactly when, and the dates on the files got messed up by various changes from Blogger to Movable Type to Blosxom, where it ...

The desirability of estimating not just conditional means, variances, etc., but whole distribution functions. Parametric maximum likelihood is a solution, if the parametric model is right. Histograms and empirical cumulative distribution functions a...
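The empirical cumulative distribution function mentioned above is the simplest nonparametric estimate of a whole distribution function: F_hat(x) is the fraction of observations less than or equal to x, a step function that jumps by 1/n at each data point. A minimal sketch in Python, with a hypothetical function name, looking values up in the sorted sample by binary search:

```python
from bisect import bisect_right

def ecdf(sample):
    """Return F_hat with F_hat(x) = (number of observations <= x) / n."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("empty sample")

    def F_hat(x):
        # bisect_right counts the sorted observations that are <= x
        return bisect_right(xs, x) / n

    return F_hat
```

For the sample [3, 1, 2, 2], F_hat is 0 below 1, 0.25 on [1, 2), 0.75 on [2, 3) and 1 from 3 onward. Unlike parametric maximum likelihood, nothing here depends on a model being right, at the price of a jagged estimate.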

Homework 7: in which we try to predict political orientation from bumps on the skull the volume of brain regions determined by MRI and adjusted by (unknown) formulas. Assignment; n90_pol.csv data. Advanced Data Analysis from an Elementary Point o...