Blog Archives

Three Quick and Simple Data Cleaning Helper Functions (December 2013)

December 6, 2013

As I clean and merge data sets with R, I often end up creating and using simple functions over and over. When this happens, I stick them in the DataCombine package. This makes it easier for me to remember how to do an operation and others ...

Showing results from Cox Proportional Hazard Models in R with simPH

September 2, 2013

Update 2 February 2014: A new version of simPH (Version 1.0) will soon be available for download from CRAN. It allows you to plot using points, ribbons, and (new) lines. See the updated package description paper for examples. Note that the ribbons argu...

GitHub renders CSV in the browser, becomes even better for social data set creation

August 23, 2013

I've written in a number of places about how GitHub can be a great place to store data. Unlike basically all other web data storage sites (many of which I really like, such as Dataverse and FigShare), GitHub enables deep social data set development and f...

Getting Started with Reproducible Research: A chapter from my new book

July 15, 2013

This is an abridged excerpt from Chapter 2 of my new book Reproducible Research with R and RStudio. It's published by Chapman & Hall/CRC Press. You can purchase it on Amazon. "Search inside this book" includes a complete table of contents. Researc...

Quick and Simple D3 Network Graphs from R

June 9, 2013

Sometimes I just want to quickly make a simple D3 JavaScript directed network graph with data in R. Because D3 network graphs can be manipulated in the browser (i.e. nodes can be moved around and highlighted), they're really nice for data...

Slide: one function for lag/lead variables in data frames, including time-series cross-sectional data

May 21, 2013

I often want to quickly create a lag or lead variable in an R data frame. Sometimes I also want to create the lag or lead variable for different groups in a data frame, for example, if I want to lag GDP for each country in a data frame. I've found ...
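
A minimal base R sketch of the grouped lag described in this excerpt, assuming a toy data frame with hypothetical columns `country`, `year`, and `gdp`. It does not use or reproduce the package function the post goes on to introduce; it only illustrates the idea of lagging a variable within each group.

```r
# Hypothetical toy data: two countries observed over three years
df <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year    = rep(2001:2003, times = 2),
  gdp     = c(10, 11, 12, 20, 22, 24)
)

# Sort so that the lag follows time order within each country
df <- df[order(df$country, df$year), ]

# Shift a vector down by one position, padding the start with NA
lag_one <- function(x) c(NA, head(x, -1))

# ave() applies lag_one separately within each country
# and returns the results in the original row order
df$gdp_lag1 <- ave(df$gdp, df$country, FUN = lag_one)

print(df)
```

Sorting before lagging matters here: the shift is positional, so rows must already be in time order within each group. A lead variable could be built the same way by shifting in the other direction.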

Reinhart & Rogoff: Everyone makes coding mistakes, we need to make it easy to find them + Graphing uncertainty

April 17, 2013

You may have already seen a lot written on the replication of Reinhart & Rogoff’s (R & R) much-cited 2010 paper done by Herndon, Ash, and Pollin. If you haven’t, here is a roundup of some of what has been written: Konczal, Y...

Dropbox & R Data

April 11, 2013

I'm always looking for ways to download data from the internet into R. Though I prefer to host and access plain-text data sets (CSV is my personal favourite) from GitHub (see my short paper on the topic), sometimes it's convenient to get data st...

FillIn: a function for filling in missing data in one data frame with info from another

February 15, 2013

Update (10 March 2013): FillIn is now part of the budding DataCombine package. Sometimes I want to use R to fill in values that are missing in one data frame with values from another. For example, I have data from the World Bank on government deficits...
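
As a rough illustration of the operation described in this excerpt, here is a base R sketch that fills missing values in one data frame with values from another. The data frames and the column names (`id`, `deficit`, `deficit_alt`) are hypothetical, and this is not the FillIn function itself, just one simple way to get the same kind of result.

```r
# Hypothetical main data with one missing deficit value
main <- data.frame(
  id      = c(1, 2, 3),
  deficit = c(-2.5, NA, -1.0)
)

# Hypothetical second source covering some of the same units
other <- data.frame(
  id          = c(2, 3),
  deficit_alt = c(-3.1, -0.9)
)

# Keep every row of main; attach the alternative values where ids match
merged <- merge(main, other, by = "id", all.x = TRUE)

# Replace only the values that are missing in the main variable
missing <- is.na(merged$deficit)
merged$deficit[missing] <- merged$deficit_alt[missing]

# Drop the helper column once the gaps are filled
merged$deficit_alt <- NULL

print(merged)
```

Values already present in the main data frame are left untouched; only the `NA` entries are overwritten.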

InstallOldPackages: a repmis command for installing old R package versions

February 3, 2013

A big problem in reproducible research is that software changes. The code you used to do a piece of research may depend on a specific version of software that has since been changed. This is an annoying problem in R because install.packages only instal...

