Blog Archives

Some thoughts on the downsides of current Data Science practice

April 19, 2017

Bert Huang has a nice blog post about the poor results of ML/AI algorithms on “wild” data, which echoes some of my experience and thoughts. His conclusions are worth thinking about, IMO. 1. Big data is complex data. As we go out and collect more data from a finite world, we’re necessarily going to start collecting […]

pandas “transform” using the tidyverse

April 12, 2017

Chris Moffitt has a nice blog post on how to use the transform function in pandas. He provides some (fake) sales data and asks what fraction of each order comes from each SKU. Being an R nut and a tidyverse fan, I thought to compare and contrast the code for the pandas […]

Changing names in the tidyverse: An example for many regressions

March 9, 2017

A collaborator posed an interesting R question to me today. She wanted to do several regressions using different outcomes, with models being computed on different strata defined by a combination of experimental design variables. She then just wanted to extract the p-values for the slopes for each of the models, and then filter the strata […]

Copying tables from R to Outlook

February 28, 2017

I work in an ecosystem that uses Outlook for e-mail. When I have to communicate results to collaborators, one of the most frequent tasks I face is to take tabular output in R (either a summary table or some other tabular output) and send it to them in Outlook. One method is certainly […]

A (much belated) update to plotting Kaplan-Meier curves in the tidyverse

February 28, 2017

One of the most popular posts on this blog has been my attempt to create Kaplan-Meier plots with an aligned table of persons-at-risk below them under the ggplot paradigm. That post was last updated 3 years ago. In the interim, Chris Dardis has built upon these attempts to create a much more stable and feature-rich […]

A quick exploration of the ReporteRs package

October 28, 2016

The package ReporteRs has been getting some play on the interwebs this week, though it’s actually been around for a while. The nice thing about this package is that, unlike some earlier packages, it allows writing Word and PowerPoint documents in an OS-independent fashion. It also allows the editing of documents by using bookmarks within […]

Annotated Facets with ggplot2

October 20, 2016

I was recently asked to do a panel of grouped boxplots of a continuous variable, with each panel representing a categorical grouping variable. This seems easy enough with ggplot2 and the facet_wrap function, but then my collaborator wanted p-values on the graphs! This post is my approach to the problem. First of all, one caveat. I’m a […]

A follow-up to Crowdsourcing Research

February 11, 2016

Last month I published some thoughts on crowdsourcing research, inspired by Anthony Goldbloom’s talk at Statistical Programming DC on the Kaggle experience. Today, I found a rather similar discussion (in the online version of the magazine Good) of crowdsourcing research as a potential way to increase the accuracy of scientific research and reduce bias. I think more consideration needs to […]

Crowdsourcing research

January 16, 2016

Last evening, Anthony Goldbloom, the founder of Kaggle.com, gave a very nice talk at a joint Statistical Programming DC/Data Science DC event about the Kaggle experience and what can be learned from the results of their competitions. One of the takeaway messages was that crowdsourcing data problems to a diligent and motivated group of entrepreneurial data […]

