From Whale Calls to Dark Matter: Competitive Data Science with R and Python

July 12, 2013

Back in June I gave a fun talk at Montreal Python on some of my dabbling in the competitive data science scene. The good people at Savoir-faire Linux recorded the talk and have edited it all together into a pretty slick video. If you can spare twenty minutes or so, have a look. If you want […]

Read more »

“A tangle of unexamined emotional impulses and illogical responses”

July 12, 2013

Tyler Cowen posts the following note from a taxi driver: I learned very early on to never drive someone to their destination if it was a route they drove themselves, say to their home from the airport . . . Everyone prides themselves on driving the shortest route but they rarely do. . . . […] The post “A tangle of unexamined emotional impulses and illogical responses” appeared first on Statistical…

Read more »

Course Materials from useR! 2013 R/Bioconductor for Analyzing High-Throughput Genomic Data

July 12, 2013

At last week's 2013 useR! conference in Albacete, Spain, Martin Morgan and Marc Carlson led a course on using R/Bioconductor for analyzing next-gen sequencing data, covering alignment, RNA-seq, ChIP-seq, and sequence annotation using R. The course mate...

Read more »

Path storage in the particle filter

July 12, 2013

Hey particle lovers, With Lawrence Murray and Sylvain Rubenthaler we looked at how to store the paths in the particle filter, and the related expected memory cost. We just arXived a technical report about it. Would you like to know more? Consider a particle filter with N particles. At each step of the algorithm, positive weights […]

Read more »

Longer-history back-tests

July 12, 2013

One of the important steps in evaluating a new trading idea or strategy is to see how it behaved historically (i.e. create a back-test and examine the equity curve in different economic and market conditions). However, creating a long back-test is usually problematic because most ETFs do not have a long price history. One way to alleviate […]

Read more »

Is Particle Physics Bad Science? (memory lane)

July 11, 2013

Memory Lane: reblog July 11, 2012 (+ updates at the end).  I suppose[ed] this was somewhat of a joke from the ISBA, prompted by Dennis Lindley, but as I [now] accord the actual extent of jokiness to be only ~10%, I’m sharing it on the blog [i].  Lindley (according to O’Hagan) wonders why scientists require […]

Read more »

The Geiger Counter problem

July 11, 2013

I am supposed to turn in the manuscript for Think Bayes next week, but I couldn't resist adding a new chapter.  I was adding a new exercise, based on an example from Tom Campbell-Ricketts, author of the Maximum Entropy blog. He got the idea from E...

Read more »

Yes, worry about generalizing from data to population. But multilevel modeling is the solution, not the problem

July 11, 2013

A sociologist writes in: Samuel Lucas has just published a paper in Quality and Quantity arguing that anything less than a full probability sample of higher levels in HLMs yields biased and unusable results. If I follow him correctly, he is arguing that not only are the SEs too small, but the parameter estimates themselves […] The post Yes, worry about generalizing from data to population. But multilevel modeling is the…

Read more »

Climate change and duelling charts

July 11, 2013

Abhinav asks me to check out his blog post on a chart on global warming (I prefer the term climate change) featured on Wonkblog. The chart is sourced to a report by the World Meteorological Organization (link to PDF). Hello,...

Read more »

Testing for Interaction in Logit Models

July 11, 2013

Andrew Gelman recently posted about testing for interaction in logistic regression models. This is something I've read and thought a little about, so I'm linking to several articles on the topic and offering my quick take. The Debate in Political Science As far as I can tell, the debate started in political science when Wolfinger […]

Read more »

Don’t trust the Turk

July 10, 2013

Dan Kahan gives a bunch of reasons not to trust Mechanical Turk in psychology experiments, in particular when studying “hypotheses about cognition and political conflict over societal risks and other policy-relevant facts.” The post Don’...

Read more »

Asking good questions

July 10, 2013

I’m currently attending my third conference in three weeks. So I’ve heard a lot of talks, and I’ve heard a lot of questions asked after the talks. In this guest post, Eran Raviv reflects on what makes a good question after a talk. A few weeks back I attended the excellent ISF conference. In one of the sessions, the presenter was talking about a state-of-the-art method to prevent model overfitting,…

Read more »

Visualizing a tiny slice of India’s demographics with information from Wikipedia

July 10, 2013

UPDATE: THE BLOG/SITE HAS MOVED TO GITHUB. THE NEW LINK FOR THE BLOG/SITE IS patilv.github.io AND THE LINK TO THIS POST IS: http://bit.ly/1ib8wTl. PLEASE UPDATE ANY BOOKMARKS YOU MAY HAVE. This post presents a tiny slice of a complex and...

Read more »

Startup Universe: Connecting Startup Companies, Founders and Investors

July 10, 2013

Start Up Universe [visual.ly], developed by information design agency Accurat and graphic designer Ben Willers for visualization community aggregator Visually, provides a comprehensive view of the relationships between startup companies and their foun...

Read more »

Watch Dogs: Mapping all Publicly Available Data of a World City

July 10, 2013

Watch Dogs - We Are Data [watchdogs.com], developed by the French global video game publisher Ubisoft, is the first website to gather publicly available and real-time data about Paris, London and Berlin in a single interface. More specifically, all ...

Read more »

Please send all comments to /dev/ripley

July 10, 2013

Trey Causey asks, Has R-help gotten meaner over time?: I began by using Scrapy to download all the e-mails sent to R-help between April 1997 (the earliest available archive) and December 2012. . . . We each read 500 messages and coded them in the following categories: -2 Negative and unhelpful -1 Negative but helpful […] The post Please send all comments to /dev/ripley appeared first on Statistical Modeling, Causal Inference,…

Read more »

What are the iconic data graphs of the past 10 years?

July 10, 2013

This article in the New York Times about the supposed death of photography got me thinking about statistics. Apparently, the death of photography has been around the corner for some time now: For years, photographers have been bracing for this … Continue reading →

Read more »

Six reasons you should stop using the RANUNI function to generate random numbers

July 10, 2013

Are you still using the old RANUNI, RANNOR, RANBIN, and other "RANXXX" functions to generate random numbers in SAS? If so, here are six reasons why you should switch from these older (1970s) algorithms to the newer (late 1990s) Mersenne-Twister algorithm, which is implemented in the RAND function. The newer [...]

Read more »

Flotsam 13: early July links

July 10, 2013

Man flu kept me at home today, so I decided to do something ‘useful’ and go for a linkathon: Ed Yong discusses the effect of subject expectations in psychology experiments Nice Results, But What Did You Expect? At the beginning there was another article on The placebo phenomenon, and another one on The placebo defect. […]

Read more »

“Frontiers in Massive Data Analysis”

July 10, 2013

Mike Jordan sends along this National Academies report on “big data.” This is not a research report but it could be interesting in that it conveys what are believed to be important technical challenges. The post “Frontiers in Massive...

Read more »

Predicting season records for NFL teams – part 2

July 9, 2013

This is the second, technical, part of this series. See the first part for the overview. Introduction This post will introduce the technical details behind the NFL season record prediction that was introduced in part one. After selecting the error metric and defining an acceptable baseline, which was set up in part one, the next step is to develop a plan of attack. In order to create and develop this plan,…

Read more »

Exploratory Data Analysis: Conceptual Foundations of Histograms – Illustrated with New York’s Ozone Pollution Data

Introduction Continuing my recent series on exploratory data analysis (EDA), today’s post focuses on histograms, which are very useful plots for visualizing the distribution of a data set.  I will discuss how histograms are constructed and use histograms to assess the distribution of the “Ozone” data from the built-in “airquality” data set in R.  In […]

Read more »

Predicting season records for NFL teams – overview

July 9, 2013

This is the first, non-technical, part of this series. See the second part for more detail. Introduction I was recently looking for a good machine learning task to try out, and I thought that doing something NFL-related would be interesting, because the NFL season is about to start (finally!). Why was I looking for a good machine learning task to try out? I have mostly done my data analysis work…

Read more »

