save() vs saveRDS()

(This article was originally published at Yihui's Blog on Yihui Xie | 谢益辉, and syndicated at StatsBlogs.)

So Jenny finally decided to write a blog post about why she would set your computer on fire, which was great. Twitter is an inferior tool for discussions or Q&A. Sadly, most people still stick to Twitter for everything. I saw Scott Gigante ask Jenny a great question on Twitter:

Why do you prefer saveRDS() to save()?

From the replies, Simon Coulombe cited Gavin Simpson’s blog post in 2012 (Yes! Write blog posts!!), which was clearly written, but I think it missed one important thing, which was later pointed out by Thomas Leeper in the same Twitter thread.

load() can overwrite objects, silently. readRDS() cannot.

That is the greatest advantage of saveRDS() over save(), and it explains why I almost always use the former. In short:

  • save() saves the objects and their names together in the same file; saveRDS() only saves the value of a single object (its name is dropped).

  • load() loads the file saved by save(), and silently creates the objects with the saved names (if you happen to have objects with the same names in your current environment, they will be overwritten); readRDS() only loads the value, and you have to assign the value to a variable yourself.

The combination save() + load() can be dangerous. You may destroy your existing objects without knowing it. readRDS() is more explicit, and safer.
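
Here is a minimal sketch of the difference. The object names and temporary files are made up for illustration; they are not from the Twitter thread.

```r
# Temporary files, so this sketch leaves nothing behind in the working directory
rdata_file <- tempfile(fileext = ".RData")
rds_file   <- tempfile(fileext = ".rds")

x <- 1:3
save(x, file = rdata_file)   # stores the value together with the name "x"
saveRDS(x, file = rds_file)  # stores the value only; the name is dropped

# Later, x holds something else that I care about
x <- "something I care about"
load(rdata_file)             # no warning, no message: x is replaced by the saved x
x_after_load <- x            # keep a copy to inspect what load() did

# Same situation, but with readRDS()
x <- "something I care about"
y <- readRDS(rds_file)       # I pick the name myself, so x is left alone

print(x_after_load)
print(x)
print(y)
```

After load(), x is the old integer vector again and the string is gone. With readRDS(), nothing changes in my environment unless I write the assignment myself.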

Some may argue that save() has the advantage of saving multiple objects. I don’t think this advantage is worth it when you consider its potentially destructive consequences. If you must save multiple objects, what I’d do is combine them into a list(), and save the single list with saveRDS(). Later I can retrieve the list, and explicitly extract the elements I need.
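
A rough sketch of the list approach, assuming a few hypothetical objects (a data frame, a model, and a note) that I want to keep together:

```r
results_file <- tempfile(fileext = ".rds")

# Bundle everything I want to keep into one named list
results <- list(
  data  = head(mtcars),
  model = lm(mpg ~ wt, data = mtcars),
  note  = "first run"
)
saveRDS(results, results_file)

# Later: read the list back under a name of my choosing,
# then pull out only the elements I need
res   <- readRDS(results_file)
model <- res$model
note  <- res$note
```

The element names travel with the list, so nothing is lost compared to save(), but they only become variables when I explicitly extract them.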
