The Art of R Programming review – part 3

March 18, 2013

Continuing this series, I'm going to list out a number of other interesting functions and features that I've been (re)learning while reading The Art of R Programming. cut(x,b,labels=F) - creates integer bin codes (rather than a factor, since labels=F) out of a vector x, by placing each element of x ...
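To make the binning behaviour concrete, here is a rough Python analogue of cut() with a vector of breaks. It assumes R's defaults of right-closed intervals (a,b] and include.lowest=FALSE: each value gets the 1-based index of its interval, and values outside the breaks get None where R would give NA. The function name cut_codes is hypothetical and is not part of any library.

```python
from bisect import bisect_left

def cut_codes(x, breaks):
    """Mimic R's cut(x, breaks, labels=FALSE) for a sorted vector of breaks.

    Intervals are right-closed, (breaks[i-1], breaks[i]], numbered from 1.
    Values at or below breaks[0], or above breaks[-1], map to None (NA in R).
    """
    codes = []
    for v in x:
        # bisect_left returns j with breaks[j-1] < v <= breaks[j]
        j = bisect_left(breaks, v)
        codes.append(j if 1 <= j <= len(breaks) - 1 else None)
    return codes

print(cut_codes([0.5, 1, 2.5, 0, 3.5], [0, 1, 2, 3]))  # [1, 1, 3, None, None]
```

Note that 0 maps to None because the lowest interval is open on the left; R's include.lowest=TRUE would close it.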

Jerome Cornfield: The statistician who established risk factors for lung cancer and heart disease

March 18, 2013

One purpose of this International Year of Statistics is to spread the word that the field of statistics benefits society. As part of the International Year, many organizations, including SAS and the American Statistical Association (ASA), are turning to history to illustrate how statistics is vital to the health and [...]

Read more »

Which political science journals will have a data policy?

March 18, 2013

Making available replication materials for the research you do is A Good Thing. It’s also work, and it’s quite easy to never get around to. Certainly I claim no special virtue in this department so I am always happy when there’s an institutional stick to prod my better nature in the right direction. One such institutional […]

Lecture: Additive Models (Advanced Data Analysis from an Elementary Point of View)

March 18, 2013

The "curse of dimensionality" limits the usefulness of fully non-parametric regression in problems with many variables: bias remains under control, but variance grows rapidly with dimension. Parametric models do not have this problem, but have bias a...

Testing Regression Specifications (Advanced Data Analysis from an Elementary Point of View)

March 18, 2013

Non-parametric smoothers can be used to test parametric models. Forms of tests: differences in in-sample performance; differences in generalization performance; whether the parametric model's residuals have expectation zero everywhere. Constructing ...

Elsewhere

March 18, 2013

Blogging will stay non-existent (or rather, sporadic) while I struggle to get far enough ahead of the 96 students in ADA that I can do some research (or rather, devote myself to contemplating the mysteries of the universe and helping young minds develop their own powers). In the ...

Homework: How the Hyracotherium Got Its Mass (Advanced Data Analysis from an Elementary Point of View)

March 18, 2013

In which extinct charismatic megafauna give us an excuse for specification testing.

Logistic Regression (Advanced Data Analysis from an Elementary Point of View)

March 18, 2013

Modeling conditional probabilities; using regression to model probabilities; transforming probabilities to work better with regression; the logistic regression model; maximum likelihood; numerical maximum likelihood by Newton's method and by iterative...
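As an illustration of the Newton's-method fit mentioned above, here is a minimal single-predictor sketch in plain Python rather than R; it is not the lecture's own code, and the data are made up. Each iteration computes the score (gradient of the log-likelihood) and the observed information matrix, then takes a Newton step. It assumes the classes are not perfectly separable, since otherwise the maximum likelihood estimate does not exist.

```python
import math

def fit_logistic_newton(x, y, max_iter=25, tol=1e-10):
    """Fit P(Y=1 | x) = 1 / (1 + exp(-(b0 + b1*x))) by Newton's method.

    Assumes one numeric predictor and 0/1 responses that are not
    perfectly separable.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Score vector (g0, g1) and observed information [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system (information) * step = score
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

# Made-up data for illustration only
print(fit_logistic_newton([0, 1, 2, 3, 4, 5], [0, 0, 1, 0, 1, 1]))
```

At convergence the score is zero, i.e. the fitted probabilities reproduce the observed number of successes and their x-weighted sum.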

Exam: Nice Demo City, But Will It Scale? (Advanced Data Analysis from an Elementary Point of View)

March 18, 2013

In which we compare the power-law scaling model of urban economies due to Bettencourt, West, et al. to an alternative in which city size is actually irrelevant. This was a one-week take-home exam, intended to use more or less everything taught so far...

Generalized Linear Models and Generalized Additive Models (Advanced Data Analysis from an Elementary Point of View)

March 18, 2013

Iteratively re-weighted least squares for logistic regression re-examined: coping with nonlinear transformations and model-dependent heteroskedasticity. The common pattern of generalized linear models and IRWLS. Binomial and Poisson regression. The ...
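The common IRWLS pattern can be illustrated with Poisson regression: from the current fit, form a working response and working weights, then solve a weighted least squares problem, and repeat. This is a minimal single-predictor sketch in Python with a log link and made-up counts; the function name is hypothetical and this is not the lecture's own code. With the canonical log link, these iterations coincide with Newton's method.

```python
import math

def poisson_irwls(x, y, max_iter=50, tol=1e-10):
    """Fit log E[Y | x] = b0 + b1*x by iteratively re-weighted least squares.

    Assumes one numeric predictor and non-negative counts with sum(y) > 0.
    """
    # Start from the intercept-only fit: log of the mean count
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        sw = swx = swxx = swz = swxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi          # linear predictor
            mu = math.exp(eta)          # inverse of the log link
            z = eta + (yi - mu) / mu    # working response
            w = mu                      # working weight: (dmu/deta)^2 / Var(Y) = mu
            sw += w
            swx += w * xi
            swxx += w * xi * xi
            swz += w * z
            swxz += w * xi * z
        # Weighted least squares of z on x: solve the 2x2 normal equations
        det = sw * swxx - swx * swx
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Made-up counts for illustration only
print(poisson_irwls([0, 1, 2, 3, 4], [1, 2, 2, 5, 8]))
```

Swapping in the binomial variance and logit link changes only the working response and weights, which is the shared structure of generalized linear models.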

Multivariate Distributions (Advanced Data Analysis from an Elementary Point of View)

March 18, 2013

Reminders about multivariate distributions. The multivariate Gaussian distribution: definition, relation to the univariate or scalar Gaussian distribution; effect of linear transformations on the parameters; plotting probability density contours in t...
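The effect of a linear (affine) transformation on the parameters of a multivariate Gaussian is a standard result, stated below with the mean and covariance obtained by linearity. The symbols A (a q x p matrix), b (a q-vector), p and q are introduced here for illustration. When A Sigma A^T is singular (for example when q > p), the result is a degenerate Gaussian with no density.

```latex
% X is a p-dimensional Gaussian vector; A is q x p, b is a q-vector
X \sim \mathcal{N}_p(\mu, \Sigma)
\quad\Longrightarrow\quad
AX + b \sim \mathcal{N}_q\!\left(A\mu + b,\; A \Sigma A^{\top}\right)

% Mean and covariance follow from linearity of expectation
\mathbb{E}[AX + b] = A\,\mathbb{E}[X] + b = A\mu + b,
\qquad
\operatorname{Var}(AX + b) = A \operatorname{Var}(X) A^{\top} = A \Sigma A^{\top}

% Scalar special case (p = q = 1, A = a, \Sigma = \sigma^2)
aX + b \sim \mathcal{N}\!\left(a\mu + b,\; a^2 \sigma^2\right)
```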

"Frequentist Accuracy of Bayesian Estimates" (This Week at the Statistics Seminar)

March 18, 2013

A speaker who needs no introduction (but will get one), on a topic whose closeness to my heart needs no elaboration (but will get it): Bradley Efron, "Frequentist Accuracy of Bayesian Estimates" Abstract: In the absence of prior information, popular...

A Better Definition of Chart Junk

March 18, 2013

Maximizing the data-ink ratio sounds like a good idea, but, when followed to the letter, it produces terrible and nonsensical results. Here is a more reasonable definition of chart junk that does away with the pretense of a mathematical formula and puts some common sense back into the question of good chart design. Much has been made of Tufte’s famous data-ink ratio, and many people like to rail, privately and…

Update on Higgs data analysis: statistical flukes (part 1)

March 18, 2013

I am always impressed at how researchers flout the popular philosophical conception of scientists as being happy as clams when their theories are ‘borne out’ by data, while being terribly dismayed to find any anomalies that might demand “revolutionary science” (as Kuhn famously called it). Scientists, says Kuhn, are really only trained to do “normal science”: science […]

Confidence Intervals: informal, traditional, bootstrap

March 18, 2013

Confidence intervals are needed because there is variation in the world. Nearly all natural, human or technological processes result in outputs which vary to a greater or lesser extent. Examples of this are people’s heights, students’ scores in …
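The bootstrap interval named in the title can be sketched with the percentile method: resample the data with replacement, recompute the statistic each time, and read off the 2.5th and 97.5th percentiles of the resampled statistics. This is a minimal Python illustration under those assumptions (percentile method, the mean as the statistic, made-up data), not the post's own code.

```python
import random
import statistics

def bootstrap_percentile_ci(data, stat=statistics.mean, n_boot=5000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    n = len(data)
    # Recompute the statistic on n_boot resamples drawn with replacement
    boot_stats = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    alpha = 1.0 - level
    lo_idx = round(alpha / 2 * n_boot)
    hi_idx = min(round((1 - alpha / 2) * n_boot) - 1, n_boot - 1)
    return boot_stats[lo_idx], boot_stats[hi_idx]

# Made-up measurements for illustration only
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
print(bootstrap_percentile_ci(sample))
```

Unlike the traditional interval, this makes no normality assumption about the sampling distribution of the mean, though with very small samples the percentile interval tends to be too narrow.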

R: Time To Event Simulation – Weibull Model

March 17, 2013

This was a quick Saturday afternoon project: I wanted to write the guts of a program to simulate survival data. While there are plenty of survival datasets around to play with, I wanted to make something that could eventually be used to explore models wh...
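The post's own R code is not shown in the excerpt, so this is only a hypothetical Python sketch of a common approach: draw event times from a Weibull distribution by inverting its survival function, S(t) = exp(-(t/scale)^shape), draw independent exponential censoring times, and record the earlier of the two with an event indicator. The parameter values and the exponential censoring model are assumptions for illustration.

```python
import math
import random

def simulate_weibull_survival(n, shape, scale, censor_rate, seed=42):
    """Simulate right-censored survival data.

    Event times are Weibull(shape, scale), drawn by inverting the survival
    function S(t) = exp(-(t / scale) ** shape). Censoring times are
    exponential with the given rate and independent of the event times.
    Returns a list of (observed_time, event_observed) pairs.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        u = 1.0 - rng.random()                         # uniform on (0, 1], avoids log(0)
        t_event = scale * (-math.log(u)) ** (1.0 / shape)
        t_censor = rng.expovariate(censor_rate)
        records.append((min(t_event, t_censor), t_event <= t_censor))
    return records

data = simulate_weibull_survival(1000, shape=1.5, scale=10.0, censor_rate=0.05)
print(sum(event for _, event in data), "events out of", len(data))
```

Covariates could be added by making scale depend on them, for example through a proportional hazards term, which is the kind of model such simulated data are typically used to explore.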

Variability of garch predictions

March 17, 2013

How variable are garch predictions? There have been several previous posts on garch, in particular “A practical introduction to garch modeling” and “The components garch model in the rugarch package”. Both of these posts speak about the two common prediction targets: prediction (of volatility) at the individual times (usually days), and term structure prediction, that is, the average …
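For a GARCH(1,1) model, multi-step variance forecasts follow the recursion sigma²(t+h) = omega + (alpha + beta) sigma²(t+h-1) for h ≥ 2, decaying geometrically toward the long-run variance omega / (1 - alpha - beta) when alpha + beta < 1. Below is a hypothetical Python sketch of both prediction targets, assuming known parameters and a given one-step-ahead variance; it does not reproduce the rugarch package's R functions. Taking the "average" as the square root of the mean forecast variance is one common convention, not necessarily the post's. It gives point forecasts only; the variability the post asks about (for example, from parameter estimation) is not modeled here.

```python
import math

def garch11_variance_forecasts(omega, alpha, beta, sigma2_next, horizon):
    """Variance forecasts for steps 1..horizon of a GARCH(1,1) model.

    sigma2_next is the one-step-ahead variance, omega + alpha*eps_t^2 + beta*sigma_t^2,
    computed from the last observed shock and variance. Later steps use
    E[sigma^2_{t+h}] = omega + (alpha + beta) * E[sigma^2_{t+h-1}].
    """
    forecasts = [sigma2_next]
    for _ in range(horizon - 1):
        forecasts.append(omega + (alpha + beta) * forecasts[-1])
    return forecasts

def average_volatility(forecasts):
    """Term-structure target: volatility implied by the mean forecast variance."""
    return math.sqrt(sum(forecasts) / len(forecasts))

# Hypothetical parameters; long-run variance is omega / (1 - alpha - beta) = 0.2
f = garch11_variance_forecasts(omega=0.01, alpha=0.05, beta=0.90, sigma2_next=0.5, horizon=10)
print([round(v, 4) for v in f])
print(average_volatility(f))
```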

Sunday data/statistics link roundup (3/17/13)

March 17, 2013

A post on the Revolutions blog about an analysis of worldwide email traffic patterns. The corresponding paper is also pretty interesting. The best part is that the whole analysis was done in R.
A bill in California that would require …

The disappearing or non-disappearing middle class

March 17, 2013

Despite the title, this post is mostly not about economics or even politics but rather about the central role of comparisons in statistics and statistical graphics. It started when someone pointed me to this article in which Megan McArdle points out the misleadingness of a graph that seems to show a bimodal income distribution but [...]

Ordinal Data

March 17, 2013

I expect to be getting some ordinal data, from 5- or 9-point rating scales, pretty soon, so I am looking ahead at how to treat it. Often ANOVA is used, even though it is well known not to be ideal from a statistical point of view, so that is the st...

Happy St Patrick’s Day

March 17, 2013

I love Saint Patrick’s Day for at least two reasons. The first is that, on March 17th, you can play The Pogues out loud; the second is that it’s the only day of the year when I really enjoy getting a Guinness in a pub. And Guinness is important in statistical science (I did mention a couple of hours ago, on this blog, that beers were…

“Nightshifts Linked to Increased Risk for Ovarian Cancer”

March 17, 2013

Zosia Chustecka writes: Much of the previous work on the link between cancer and nightshifts has focused on breast cancer . . . The latest report, focusing on ovarian cancer, was published in the April issue of Occupational and Environmental Medicine. This increase in the risk for ovarian cancer with nightshift work is consistent with, [...]
