A statistical problem with “nothing to hide”

June 10, 2013

One problem with the nothing-to-hide argument is that it assumes innocent people will be exonerated certainly and effortlessly. That is, it assumes that there are no errors, or if there are, they are resolved quickly and easily. Suppose the probability of correctly analyzing an email or phone call is not 100% but 99.99%. In […]
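
To make the 99.99% figure concrete, here is a minimal sketch of the arithmetic in R. The message volumes are hypothetical, chosen only for illustration, and the independence of errors across messages is an assumption rather than something the excerpt states.

```r
# From the excerpt: each message is analyzed correctly with probability 0.9999
p_correct <- 0.9999

# Hypothetical volumes, chosen only for illustration
n_messages_total  <- 1e6   # messages screened overall
n_messages_person <- 1000  # messages sent by one innocent person

# Expected number of misanalyzed messages overall (about 100)
expected_errors <- n_messages_total * (1 - p_correct)

# Probability that at least one of a person's messages is misanalyzed,
# assuming errors are independent across messages (roughly 0.095)
p_any_error <- 1 - p_correct^n_messages_person

expected_errors
p_any_error
```

Under these assumed numbers, even a very small per-message error rate leaves a noticeable chance that an innocent person is flagged at least once.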

I don’t think we get much out of framing politics as the Tragic Vision vs. the Utopian Vision

June 10, 2013

Ole Rogeberg writes: Recently read your blogpost on Pinker’s views regarding red and blue states. This might help you see where he’s coming from: The “conflict of visions” thing that Pinker repeats likely refers to Thomas Sowell’s work in the books “Conflict of Visions” and “Visions of the anointed.” The “Conflict of visions” book is [...] The post I don’t think we get much out of framing politics as the Tragic…

R: Measure of Relative Variability

June 10, 2013

The measure of relative variability is the coefficient of variation (CV). Unlike measures of absolute variability, the CV is unitless, so it can be used to compare the dispersions of two distributions measured in different units. In R, CV ...
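
Since the excerpt is cut off before any code, here is a minimal sketch of one way the CV could be computed in R. The helper name `cv` and both data vectors are hypothetical, and this is not necessarily the approach the original post takes.

```r
# Hypothetical data measured in different units
heights_cm <- c(150, 160, 170, 180, 190)
weights_kg <- c(50, 60, 70, 80, 90)

# Hypothetical helper: sample standard deviation divided by the mean
cv <- function(x) sd(x) / mean(x)

# Both vectors have the same standard deviation, but the weights vary
# more relative to their mean, so their CV is larger
cv(heights_cm)
cv(weights_kg)
```

Because the units cancel in the ratio, the two results can be compared directly even though one vector is in centimetres and the other in kilograms.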

Once more, superimposing time series creates silly theories

June 10, 2013

After I wrote the post about superimposing two time series to generate fake correlations, there was a lively discussion in the comments about whether a scatter plot would have done better. Here is the promised follow-up post. The contentious issue...

Introduction to stable distributions for finance

June 10, 2013

A few basics about the stable distribution. Previously, “The distribution of financial returns made simple” satirized ideas about the statistical distribution of returns, including the stable distribution. Origin: As “A tale of two returns” points out, the log return of a long period of time is the sum of the log returns of the shorter …

Visually comparing different data distributions: The spread plot

June 10, 2013

Suppose that you have several data distributions that you want to compare. Questions you might ask include "Which variable has the largest spread?" and "Which variables exhibit skewness?" More generally, you might be interested in visualizing how the distribution of one variable differs from the distribution of other variables. The [...]

R: Measures of Absolute Variability

June 10, 2013

Measures of absolute variability deal with the dispersion of the data points. These include the following: Range - range; Interquartile Range - IQR; Quartile Deviation; Average Deviation; Standard Deviation - sd. These measures of variability restrict to uniform u...
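
The measures listed above can be sketched in base R as follows. The data vector is hypothetical. The excerpt names no function for the quartile deviation or the average deviation, so the expressions used for them here are assumptions based on their usual definitions (half the IQR, and the mean absolute deviation from the mean).

```r
# Hypothetical data
x <- c(2, 4, 4, 5, 7, 9, 11)

range(x)                          # smallest and largest values
value_range <- diff(range(x))     # range as a single number
iqr <- IQR(x)                     # interquartile range
quartile_dev <- iqr / 2           # quartile deviation (half the IQR)
avg_dev <- mean(abs(x - mean(x))) # average deviation from the mean
std_dev <- sd(x)                  # sample standard deviation

c(range = value_range, IQR = iqr, QD = quartile_dev,
  AD = avg_dev, SD = std_dev)
```

All five results are expressed in the same units as `x`, which is what makes them absolute rather than relative measures.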

You Do Not Need to Tell Me I Have A Typo in My Documentation

June 10, 2013

So I just got yet another comment saying "you have a typo in your documentation". While I do appreciate these kind reminders, I think it might be a good exercise for those who want to try Git and GitHub pull requests, which make it possible for y...

Frontiers of Science update

June 10, 2013

This is just a local Columbia thing, so I’m posting Sunday night when nobody will read it . . . Samantha Cooney reports in the Spectator (Columbia’s student newspaper): Frontiers of Science may be in for an overhaul. After a year reviewing the course, the Educational Policy and Planning Committee has issued a report detailing [...] The post Frontiers of Science update appeared first on Statistical Modeling, Causal Inference, and Social…

The flipped classroom

June 9, 2013

Back in the mid-1980s I was a trainee teacher at a high school in Rotorua. My associate teacher commented that she didn’t like to give homework much of the time as the students tended to practise things wrong, thus entrenching …

Exploratory Data Analysis: Kernel Density Estimation – Conceptual Foundations

For the sake of brevity, this post has been created from the first half of a previous long post on kernel density estimation.  This first half focuses on the conceptual foundations of kernel density estimation.  The second half will focus on constructing kernel density plots and rug plots in R. Introduction Recently, I began a […]

June 9, 2013
The Value of Adding Randomness

In computer science it is common to use randomized algorithms. The same is true in statistics: there are many ways that adding randomness can make things easier. But the way that randomness enters varies quite a bit across methods. I thought it might be interesting to collect some specific examples of statistical procedures where […]

“Heterogeneity of variance in experimental studies: A challenge to conventional interpretations”

June 9, 2013

Avi sent along this old paper from Bryk and Raudenbush, who write: The presence of heterogeneity of variance across groups indicates that the standard statistical model for treatment effects no longer applies. Specifically, the assumption that treatments add a constant to each subject’s development fails. An alternative model is required to represent how treatment effects [...] The post “Heterogeneity of variance in experimental studies: A challenge to conventional interpretations” appeared first…

R: Quartiles, Deciles, and Percentiles

June 9, 2013

Measures of position such as quartiles, deciles, and percentiles are available through the quantile function. This function has a usage, where: x - the data points; probs - the location to measure; na.rm - if FALSE, NA (Not Available) data points are not ignored; na...
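
The excerpt's usage line is missing, so the following is only a sketch of how `quantile` might be called for each kind of position measure. The `scores` vector is hypothetical. The calls rely on the default quantile algorithm in base R.

```r
# Hypothetical data
scores <- c(70, 78, 66, 65, 50, 53, 48, 88, 95, 80)

# Quartiles: the 25th, 50th, and 75th percentiles
quartiles <- quantile(scores, probs = c(0.25, 0.5, 0.75))

# Deciles: the 10th through 90th percentiles in steps of 10
deciles <- quantile(scores, probs = seq(0.1, 0.9, by = 0.1))

# A single percentile, here the 90th
p90 <- quantile(scores, probs = 0.9)

# With missing values present, na.rm = TRUE drops them before computing
median_with_na <- quantile(c(scores, NA), probs = 0.5, na.rm = TRUE)

quartiles
deciles
p90
median_with_na
```

With the default `na.rm = FALSE`, the last call would stop with an error instead of returning a value.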

Quick and Simple D3 Network Graphs from R

June 9, 2013

Sometimes I just want to quickly make a simple D3 JavaScript directed network graph with data in R. Because D3 network graphs can be manipulated in the browser–i.e. nodes can be moved around and highlighted–they're really nice for data...

R: Mean and Median

June 9, 2013

The mean in R is computed using the function mean. Consider the scores of 20 MSU-IIT students in a Stat 101 exam with a hundred items: 70, 78, 66, 65, 50, 53, 48, 88, 95, 80, 85, 84, 81, 63, 68, 73, 75, 84, 49, and 77. Compute and interpret the mean and medi...
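
A minimal sketch of the computation for the 20 scores listed above. The variable name `scores` is an assumption, and `median` is the base R counterpart to `mean`.

```r
# The 20 exam scores given in the excerpt
scores <- c(70, 78, 66, 65, 50, 53, 48, 88, 95, 80,
            85, 84, 81, 63, 68, 73, 75, 84, 49, 77)

mean(scores)    # arithmetic mean: 71.6
median(scores)  # middle value of the sorted scores: 74
```

The mean of 71.6 is the average score out of 100, while the median of 74 says that half of the students scored at or below 74. The mean sits below the median here because a few low scores pull it down.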

New Judea Pearl journal of causal inference

June 8, 2013

Pearl reports that his Journal of Causal Inference has just posted its first issue, which contains a mix of theoretical and applied papers. Pearl writes that they welcome submissions on all aspects of causal inference. The post New Judea Pearl journa...

Richard Gill: “Integrity or fraud… or just questionable research practices?”

June 8, 2013

Professor Richard Gill Statistics Group Mathematical Institute Leiden University http://www.math.leidenuniv.nl/~gill/ I am very grateful to Richard Gill for permission to post an e-mail from him (after my “dirty laundry” post) along with slides from his talk, “Integrity or fraud… or just questionable research practices?” and associated papers. I record my own reflections on the pseudoscientific […]

Using trends in R-squared to measure progress in criminology??

June 8, 2013

Torbjørn Skardhamar writes: I am a sociologist/criminologist working at Statistics Norway. As I am not a trained statistician, I find myself sometimes in need to check basic statistical concepts. Recently, I came across an article which I found a bit strange, but I needed to check up on my statistical understanding of a very basic [...] The post Using trends in R-squared to measure progress in criminology?? appeared first on Statistical…

R: Matrix Operations

June 8, 2013

Matrix manipulations in R are very useful in linear algebra. Below is a list of common yet important functions for operations with matrices: Transpose - t; Multiplication - %*%; Determinant - det; Inverse - solve, or ginv of the MASS library; Eigenvalues and ...
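
The functions named above can be sketched on two small hypothetical matrices. `eigen` is assumed to be the function behind the truncated "Eigenvalues and ..." item, and `ginv` is left out so the example needs only base R.

```r
# Hypothetical 2 x 2 matrices; matrix() fills column by column
A <- matrix(c(2, 1, 1, 3), nrow = 2)  # symmetric: rows (2, 1) and (1, 3)
B <- matrix(c(1, 0, 2, 1), nrow = 2)  # rows (1, 2) and (0, 1)

B_t   <- t(B)       # transpose
AB    <- A %*% B    # matrix multiplication (not element-wise)
det_A <- det(A)     # determinant: 2 * 3 - 1 * 1 = 5
A_inv <- solve(A)   # inverse of a square, non-singular matrix
eig_A <- eigen(A)   # list with eigenvalues ($values) and eigenvectors ($vectors)

A %*% A_inv         # recovers the identity matrix up to rounding
eig_A$values
```

Note that `*` multiplies matrices element by element, so `%*%` is needed for the linear-algebra product.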

Hey, I Just Did a Significance Test!

June 8, 2013

I’ve seen it happen quite often. The sig test. Somebody simply needs to know the p-value and that one number will provide all of the information about the study that they need to know. The dataset is presented and the client/boss/colleague/etc invariably asks the question “is it significant?” and “what’s the correlation?”. To quote R.A. […]

Robust logistic regression

June 7, 2013

Corey Yanofsky writes: In your work, you’ve robustificated logistic regression by having the logit function saturate at, e.g., 0.01 and 0.99, instead of 0 and 1. Do you have any thoughts on a sensible setting for the saturation values? My intuition suggests that it has something to do with proportion of outliers expected in the [...] The post Robust logistic regression appeared first on Statistical Modeling, Causal Inference, and Social Science.

ENCODE ChIP-Seq Significance Tool: Which TFs Regulate my Genes?

June 7, 2013

I collaborate with several investigators on gene expression projects using both microarray and RNA-seq. After I show a collaborator which genes are dysregulated in a particular condition or tissue, the most common question I get is "what are the transc...