Posts Tagged ‘Tutorials’

Neglected optimization topic: set diversity

February 8, 2016

The mathematical concept of set diversity is a somewhat neglected topic in current applied decision sciences and optimization. We take this opportunity to discuss the issue. The problem: consider the following. From a number of items $U = \{x_1, \ldots, x_n\}$, pick a small set of them $X = \{x_{i_1}, x_{i_2}, \ldots, x_{i_k}\}$ such …
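
The excerpt stops before stating the objective, so the following is only an illustration under an assumed formalization: max-min diversity, where we pick $k$ items from $U$ so that the closest pair among the picks is as far apart as possible. This minimal Python sketch uses the common greedy farthest-point heuristic. The function names `greedy_max_min` and `min_pairwise`, the use of Euclidean distance, and the max-min objective itself are assumptions for illustration, not necessarily what the article uses.

```python
from itertools import combinations
from math import dist


def min_pairwise(points, idx):
    """Smallest pairwise Euclidean distance among the selected indices."""
    return min(dist(points[a], points[b]) for a, b in combinations(idx, 2))


def greedy_max_min(points, k):
    """Greedily pick k indices of `points`, trying to keep the closest
    pair among the picks far apart (assumed max-min diversity objective)."""
    n = len(points)
    if k <= 0:
        return []
    if k > n:
        raise ValueError("k cannot exceed the number of items")
    if k == 1:
        return [0]  # any single item is trivially "diverse"
    # Seed with the two items that are farthest apart.
    first, second = max(combinations(range(n), 2),
                        key=lambda p: dist(points[p[0]], points[p[1]]))
    chosen = [first, second]
    while len(chosen) < k:
        # Add the item whose nearest already-chosen item is farthest away.
        remaining = (c for c in range(n) if c not in chosen)
        best = max(remaining,
                   key=lambda c: min(dist(points[c], points[s]) for s in chosen))
        chosen.append(best)
    return chosen


pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (10.0, 0.0), (10.0, 0.1)]
picked = greedy_max_min(pts, 3)
print(sorted(picked), min_pairwise(pts, picked))
```

Checking all $\binom{n}{k}$ subsets would be exact but quickly becomes impractical; the greedy pass costs $O(n^2)$ distance evaluations for seeding plus roughly $O(nk^2)$ afterwards, at the price of possibly missing the best subset.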

Prepping Data for Analysis using R

January 20, 2016

Nina Zumel and I are proud to share our lecture “Prepping Data for Analysis using R” from ODSC West 2015 (Nina Zumel and John Mount, ODSC West 2015). It is about 90 minutes long and covers much of the theory behind the vtreat data preparation library. We also have a GitHub repository including all the lecture …

Using Excel versus using R

January 15, 2016

Here is a video I made showing why analysts should not consider R “scarier” than Excel. One of the takeaway points: it is easier to email R procedures than Excel procedures. In the video, Win-Vector’s John Mount shows a simple analysis both in Excel and in R. A save of the “email” linking to all code …

Tutorial: RNA-seq differential expression & pathway analysis with Sailfish, DESeq2, GAGE, and Pathview

December 4, 2015

Background: This tutorial shows an example of RNA-seq data analysis with DESeq2, followed by KEGG pathway analysis using GAGE. It uses data from GSE37704, with processed data available on Figshare (DOI: 10.6084/m9.figshare.1601975). This dataset has six samp...

Fluid use of data

November 19, 2015

Nina Zumel and I recently wrote a few articles and series on best practices in testing models and data: “Random Test/Train Split is not Always Enough”, “How Do You Know if Your Data Has Signal?”, “How do you know if your model is going to work?”, “A Simpler Explanation of Differential Privacy” (explaining the reusable …

Upcoming Win-Vector Appearances

November 9, 2015

We have two public appearances coming up in the next few weeks. Workshop at ODSC, San Francisco, November 14: both of us will be giving a two-hour workshop called “Preparing Data for Analysis using R: Basic through Advanced Techniques”. We will cover key issues in this important but often neglected aspect of data science, …

Don’t use stats::aggregate()

October 31, 2015

When working with an analysis system (such as R) there are usually good reasons to prefer using functions from the “base” system over using functions from extension packages. However, base functions are sometimes locked into unfortunate design compromises that can now be avoided. In R’s case I would say: do not use stats::aggregate(). Read on …

Baking priors

October 13, 2015

There remains a bit of a two-way snobbery: Frequentist statistics is what we teach (as so-called objective statistics remain the same no matter who works with them), and Bayesian statistics is what we do (as it tends to directly estimate the posterior probabilities we are actually interested in). Nina Zumel hit the nail on the …

Using differential privacy to reuse training data

October 5, 2015

Win-Vector LLC’s Nina Zumel wrote a great article explaining differential privacy and demonstrating how to use it to enhance forward stepwise logistic regression (essentially reusing test data). This allowed her to reproduce results similar to those in the recent Science paper “The reusable holdout: Preserving validity in adaptive data analysis”. The technique essentially protects and reuses test …
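
The excerpt only names the technique, so here is a rough, simplified Python sketch of a Thresholdout-style reusable holdout in the spirit of that Science paper. It is not Nina Zumel’s code, which the excerpt does not show. The class name `Thresholdout`, its parameters (`threshold`, `sigma`, `budget`), and the exact noise scales are assumptions based on our reading of the mechanism: the analyst sees the training-set estimate unless it disagrees with the holdout estimate by more than a noisy threshold, in which case a noised holdout value is returned and the budget shrinks.

```python
import random


class Thresholdout:
    """Simplified sketch of a Thresholdout-style reusable holdout.
    Noise scales (2*sigma, 4*sigma, sigma) are assumptions for illustration."""

    def __init__(self, threshold, sigma, budget, seed=None):
        self.threshold = threshold
        self.sigma = sigma
        self.budget = budget
        self.rng = random.Random(seed)
        self.noisy_threshold = threshold + self._laplace(2 * sigma)

    def _laplace(self, scale):
        """Laplace(0, scale) noise; zero scale means no noise."""
        if scale == 0:
            return 0.0
        # The difference of two iid exponentials with rate 1/scale is Laplace(0, scale).
        return self.rng.expovariate(1 / scale) - self.rng.expovariate(1 / scale)

    def query(self, train_value, holdout_value):
        """Answer one query, e.g. a model's accuracy on train vs. holdout.
        Returns None once the overfitting budget is exhausted."""
        if self.budget < 1:
            return None
        gap = abs(holdout_value - train_value)
        if gap > self.noisy_threshold + self._laplace(4 * self.sigma):
            # Training estimate looks overfit: spend budget, answer from holdout.
            self.budget -= 1
            self.noisy_threshold = self.threshold + self._laplace(2 * self.sigma)
            return holdout_value + self._laplace(self.sigma)
        # Estimates agree closely enough: the training value leaks nothing new.
        return train_value


guard = Thresholdout(threshold=0.05, sigma=0.01, budget=10, seed=42)
print(guard.query(train_value=0.91, holdout_value=0.74))
```

The key design point is that the holdout set is consulted only through a noisy comparison, so repeated queries reveal far less about it than reporting holdout scores directly would.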

A Simpler Explanation of Differential Privacy

October 2, 2015

Differential privacy was originally developed to facilitate secure analysis over sensitive data, with mixed success. It is back in the news now, with exciting results from Cynthia Dwork et al. (see references at the end of the article) that apply results from differential privacy to machine learning. In this article we’ll work through the definition …
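
For reference, here is the standard definition in its usual textbook form (our statement, not quoted from the article). A randomized mechanism $\mathcal{M}$ is $\varepsilon$-differentially private if, for every pair of datasets $D$ and $D'$ that differ in a single record and every set of possible outputs $S$:

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S]
```

Smaller $\varepsilon$ means that changing any one record can shift the distribution of the mechanism’s outputs only slightly, which is the sense in which the output reveals little about any individual record.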