# Methods changed – Information changed

January 22, 2013

The joint OECD–WTO Trade in Value-Added Initiative breaks with conventional measurements of trade, which record gross flows of goods and services each time they cross borders. Instead, it analyses the value a country adds in producing any good or service that is then exported, offering a fuller picture of commercial relations between nations.

The new methodology and its results are explained visually: a video and an interactive presentation offer beautifully made insights.

Filed under: 01 New on the Web, 09 Stat.Office / Organization, OECD Tagged: exports, methods


## You Do Not Need to Tell Me I Have A Typo in My Documentation

Update (2017/02/06): It is much easier to fix typos on my web pages now. There is a toolbar under the title of each post; if you find any mistakes on a page, please hit the Edit button to propose a correction through GitHub. You don't need any command-line tools, and you should no longer follow the instructions in the post below (the knitr documentation has been moved to a different repo).

So I just got yet another comment saying "you have a typo in your documentation". While I do appreciate these kind reminders, fixing the typo yourself might be a good exercise for anyone who wants to try Git and GitHub pull requests, which make it possible for you to contribute to open source and fix obvious problems with no questions asked: just do it yourself, and send the changes to the original author(s) through GitHub.

The official documentation for GitHub pull requests is a little verbose for beginners. For simple tasks, all you need to do is:

1. click the Fork button and clone the repository in your own account;
2. make the changes in your cloned version;
3. click the Pull Request button to send a request to the original author.
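For readers who prefer the command line, the three steps above can be sketched roughly as follows. The user name and file name here are placeholders for illustration, not real ones:

```shell
# 1. Fork the repository on GitHub (via the Fork button), then clone YOUR fork.
#    "yourname" is a placeholder for your GitHub user name.
git clone https://github.com/yourname/knitr.git
cd knitr

# 2. Make the change, commit it, and push it back to your fork.
#    (NEWS.md is just an example file name.)
echo "fix a typo" >> NEWS.md
git add NEWS.md
git commit -m "Fix a typo in the documentation"
git push origin master

# 3. On the GitHub page of your fork, click the Pull Request button
#    to send the change to the original author.
```

The last step happens in the browser; there is no command-line equivalent in plain git.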

For trivial changes, sometimes I accept them on my cell phone while I’m still in bed. No extra communication is needed.

Occasionally I see reports of trivial documentation problems like this on the R-devel mailing list, and I believe that is just horribly inefficient. You could have fixed it quietly and quickly, and the developers could have merged the change with a single mouse click. (Oh, okay, well, you know, SVN, mailing lists, …)

The knitr repository has two branches: master and gh-pages. The R package lives in the master branch, and the knitr website lives in the gh-pages branch. If you want to fix any problems on the website, just check out the gh-pages branch:

```shell
git checkout gh-pages
```

All the pages are written in Markdown, so you can edit them with your favorite text editor. For example, as the comment above pointed out, I omitted a right parenthesis ) in _posts/2012-02-24-sweave.md; you just add it, save the file, write a Git commit message, push to your repository, and send the pull request.
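Put together, the whole fix is only a handful of commands. This is a rough sketch, assuming you have already cloned your fork of the knitr repository; the commit message is made up:

```shell
cd knitr                  # your clone of the (forked) knitr repository
git checkout gh-pages     # the branch that holds the website sources

# Edit the Markdown file with any text editor, e.g. add the missing ")".
# Then record and publish the change:
git add _posts/2012-02-24-sweave.md
git commit -m "Add a missing right parenthesis"
git push origin gh-pages
# Finally, open a pull request from your fork on GitHub.
```

Because the website lives on its own branch, the pull request must target gh-pages rather than master.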

I know I could fix this myself in five seconds, and it took me far more time to write this blog post, but I just want everybody to know how people with different skill levels can play their roles in software development.

Let’s see how many minutes it takes for the pull request to come after I publish this blog post. Hurry!! :)

[post_title] => You Do Not Need to Tell Me I Have A Typo in My Documentation [post_excerpt] =>

Update (2017/02/06): It is much easier to fix typos on my web pages now. There is a toolbar under the title of each post, and please hit the Edit button if you find any mistakes on the page to propose a correction through Github. You don’t need to use any command-line tools. You should not follow the instructions in the post below any more (the knitr documentation has been moved to a different repo).

So I just got yet yet another comment saying “you have a typo in your documentation”. While I do appreciate these kind reminders, I think it might be a good exercise for those who want to try GIT and Github pull requests, which make it possible for you to contribute to open source and fix obvious problems with no questions being asked – just do it yourself, and send the changes to the original author(s) through Github.

The official documentation for Github pull requests is a little bit verbose for beginners. Basically what you need to do for simple tasks are:

1. click the Fork button and clone the repository in your own account;
2. make the changes in your cloned version;
4. click the Pull Request button to send a request to the original author;

For trivial changes, sometimes I accept them on my cell phone while I’m still in bed. No extra communication is needed.

Occasionally I see reports of this kind of trivial documentation changes in the R-devel mailing list, and I believe that is just horribly inefficient. You could have done this quietly and quickly, and the developers could have merged the changes with a single mouse click. (Oh, okay, well, you know, SVN, mailing lists, …)

For the knitr repository, it has two branches: master and gh-pages. The R package lives in the master branch, and the knitr website lives in the gh-pages branch. If you want to fix any problems in the website, just check out the gh-pages:

git checkout gh-pages

All pages were written in Markdown, so edit them with your favorite text editor. For example, as the above comment pointed out, I omitted a right parenthesis ) in _posts/2012-02-24-sweave.md, and you just add it, save the file, write a GIT commit message, push to your repository and send the pull request.

I know I can do this by myself in five seconds, and it takes me way more time to write this blog post, but I just want everybody to know how people with different skill levels can play their roles in software development.

Let’s see how many minutes it takes for the pull request to come after I publish this blog post. Hurry!! :)

[post_status] => publish [comment_status] => closed [ping_status] => closed [post_password] => [post_name] => you-do-not-need-to-tell-me-i-have-a-typo-in-my-documentation-3 [to_ping] => [pinged] => [post_modified] => 2013-06-10 00:00:00 [post_modified_gmt] => 2013-06-10 00:00:00 [post_content_filtered] => [post_parent] => 0 [guid] => https://yihui.name/en/2013/06/fix-typo-in-documentation/ [menu_order] => 0 [post_type] => post [post_mime_type] => [comment_count] => 0 [ancestors] => Array ( ) [filter] => raw ) ) ) [6] => Array ( [file] => /home/blogs/public_html/wp-content/plugins/add-link-to-facebook/add-link-to-facebook-class.php [line] => 1378 [function] => Publish_post [class] => WPAL2Facebook [object] => WPAL2Facebook Object ( [main_file] => /home/blogs/public_html/wp-content/plugins/add-link-to-facebook/add-link-to-facebook.php [debug] => [site_id] => [blog_id] => 1 ) [type] => -> [args] => Array ( [0] => stdClass Object ( [ID] => 36489 [post_author] => 185 [post_date] => 2013-06-10 00:00:00 [post_date_gmt] => 2013-06-10 00:00:00 [post_content] =>

Update (2017/02/06): It is much easier to fix typos on my web pages now. There is a toolbar under the title of each post, and please hit the Edit button if you find any mistakes on the page to propose a correction through Github. You don’t need to use any command-line tools. You should not follow the instructions in the post below any more (the knitr documentation has been moved to a different repo).

So I just got yet yet another comment saying “you have a typo in your documentation”. While I do appreciate these kind reminders, I think it might be a good exercise for those who want to try GIT and Github pull requests, which make it possible for you to contribute to open source and fix obvious problems with no questions being asked – just do it yourself, and send the changes to the original author(s) through Github.

The official documentation for Github pull requests is a little bit verbose for beginners. Basically what you need to do for simple tasks are:

1. click the Fork button and clone the repository in your own account;
2. make the changes in your cloned version;
4. click the Pull Request button to send a request to the original author;

For trivial changes, sometimes I accept them on my cell phone while I’m still in bed. No extra communication is needed.

Occasionally I see reports of this kind of trivial documentation changes in the R-devel mailing list, and I believe that is just horribly inefficient. You could have done this quietly and quickly, and the developers could have merged the changes with a single mouse click. (Oh, okay, well, you know, SVN, mailing lists, …)

For the knitr repository, it has two branches: master and gh-pages. The R package lives in the master branch, and the knitr website lives in the gh-pages branch. If you want to fix any problems in the website, just check out the gh-pages:

git checkout gh-pages

All pages were written in Markdown, so edit them with your favorite text editor. For example, as the above comment pointed out, I omitted a right parenthesis ) in _posts/2012-02-24-sweave.md, and you just add it, save the file, write a GIT commit message, push to your repository and send the pull request.

I know I can do this by myself in five seconds, and it takes me way more time to write this blog post, but I just want everybody to know how people with different skill levels can play their roles in software development.

Let’s see how many minutes it takes for the pull request to come after I publish this blog post. Hurry!! :)

[post_title] => You Do Not Need to Tell Me I Have A Typo in My Documentation [post_excerpt] =>

Update (2017/02/06): It is much easier to fix typos on my web pages now. There is a toolbar under the title of each post, and please hit the Edit button if you find any mistakes on the page to propose a correction through Github. You don’t need to use any command-line tools. You should not follow the instructions in the post below any more (the knitr documentation has been moved to a different repo).

So I just got yet yet another comment saying “you have a typo in your documentation”. While I do appreciate these kind reminders, I think it might be a good exercise for those who want to try GIT and Github pull requests, which make it possible for you to contribute to open source and fix obvious problems with no questions being asked – just do it yourself, and send the changes to the original author(s) through Github.

The official documentation for Github pull requests is a little bit verbose for beginners. Basically what you need to do for simple tasks are:

1. click the Fork button and clone the repository in your own account;
2. make the changes in your cloned version;
4. click the Pull Request button to send a request to the original author;

For trivial changes, sometimes I accept them on my cell phone while I’m still in bed. No extra communication is needed.

Occasionally I see reports of this kind of trivial documentation changes in the R-devel mailing list, and I believe that is just horribly inefficient. You could have done this quietly and quickly, and the developers could have merged the changes with a single mouse click. (Oh, okay, well, you know, SVN, mailing lists, …)

For the knitr repository, it has two branches: master and gh-pages. The R package lives in the master branch, and the knitr website lives in the gh-pages branch. If you want to fix any problems in the website, just check out the gh-pages:

git checkout gh-pages

All pages were written in Markdown, so edit them with your favorite text editor. For example, as the above comment pointed out, I omitted a right parenthesis ) in _posts/2012-02-24-sweave.md, and you just add it, save the file, write a GIT commit message, push to your repository and send the pull request.

I know I can do this by myself in five seconds, and it takes me way more time to write this blog post, but I just want everybody to know how people with different skill levels can play their roles in software development.

Let’s see how many minutes it takes for the pull request to come after I publish this blog post. Hurry!! :)

[post_status] => publish [comment_status] => closed [ping_status] => closed [post_password] => [post_name] => you-do-not-need-to-tell-me-i-have-a-typo-in-my-documentation-3 [to_ping] => [pinged] => [post_modified] => 2013-06-10 00:00:00 [post_modified_gmt] => 2013-06-10 00:00:00 [post_content_filtered] => [post_parent] => 0 [guid] => https://yihui.name/en/2013/06/fix-typo-in-documentation/ [menu_order] => 0 [post_type] => post [post_mime_type] => [comment_count] => 0 [ancestors] => Array ( ) [filter] => raw ) ) ) [7] => Array ( [function] => Transition_post_status [class] => WPAL2Facebook [object] => WPAL2Facebook Object ( [main_file] => /home/blogs/public_html/wp-content/plugins/add-link-to-facebook/add-link-to-facebook.php [debug] => [site_id] => [blog_id] => 1 ) [type] => -> [args] => Array ( [0] => publish [1] => new [2] => stdClass Object ( [ID] => 36489 [post_author] => 185 [post_date] => 2013-06-10 00:00:00 [post_date_gmt] => 2013-06-10 00:00:00 [post_content] =>

Update (2017/02/06): It is much easier to fix typos on my web pages now. There is a toolbar under the title of each post, and please hit the Edit button if you find any mistakes on the page to propose a correction through Github. You don’t need to use any command-line tools. You should not follow the instructions in the post below any more (the knitr documentation has been moved to a different repo).

So I just got yet yet another comment saying “you have a typo in your documentation”. While I do appreciate these kind reminders, I think it might be a good exercise for those who want to try GIT and Github pull requests, which make it possible for you to contribute to open source and fix obvious problems with no questions being asked – just do it yourself, and send the changes to the original author(s) through Github.

The official documentation for Github pull requests is a little bit verbose for beginners. Basically what you need to do for simple tasks are:

1. click the Fork button and clone the repository in your own account;
2. make the changes in your cloned version;
4. click the Pull Request button to send a request to the original author;

For trivial changes, sometimes I accept them on my cell phone while I’m still in bed. No extra communication is needed.

Occasionally I see reports of this kind of trivial documentation changes in the R-devel mailing list, and I believe that is just horribly inefficient. You could have done this quietly and quickly, and the developers could have merged the changes with a single mouse click. (Oh, okay, well, you know, SVN, mailing lists, …)

For the knitr repository, it has two branches: master and gh-pages. The R package lives in the master branch, and the knitr website lives in the gh-pages branch. If you want to fix any problems in the website, just check out the gh-pages:

git checkout gh-pages

All pages were written in Markdown, so edit them with your favorite text editor. For example, as the above comment pointed out, I omitted a right parenthesis ) in _posts/2012-02-24-sweave.md, and you just add it, save the file, write a GIT commit message, push to your repository and send the pull request.

I know I can do this by myself in five seconds, and it takes me way more time to write this blog post, but I just want everybody to know how people with different skill levels can play their roles in software development.

Let’s see how many minutes it takes for the pull request to come after I publish this blog post. Hurry!! :)

[post_title] => You Do Not Need to Tell Me I Have A Typo in My Documentation [post_excerpt] =>

Update (2017/02/06): It is much easier to fix typos on my web pages now. There is a toolbar under the title of each post, and please hit the Edit button if you find any mistakes on the page to propose a correction through Github. You don’t need to use any command-line tools. You should not follow the instructions in the post below any more (the knitr documentation has been moved to a different repo).

So I just got yet yet another comment saying “you have a typo in your documentation”. While I do appreciate these kind reminders, I think it might be a good exercise for those who want to try GIT and Github pull requests, which make it possible for you to contribute to open source and fix obvious problems with no questions being asked – just do it yourself, and send the changes to the original author(s) through Github.

The official documentation for Github pull requests is a little bit verbose for beginners. Basically what you need to do for simple tasks are:

1. click the Fork button and clone the repository in your own account;
2. make the changes in your cloned version;
4. click the Pull Request button to send a request to the original author;

For trivial changes, sometimes I accept them on my cell phone while I’m still in bed. No extra communication is needed.

Occasionally I see reports of this kind of trivial documentation changes in the R-devel mailing list, and I believe that is just horribly inefficient. You could have done this quietly and quickly, and the developers could have merged the changes with a single mouse click. (Oh, okay, well, you know, SVN, mailing lists, …)

For the knitr repository, it has two branches: master and gh-pages. The R package lives in the master branch, and the knitr website lives in the gh-pages branch. If you want to fix any problems in the website, just check out the gh-pages:

git checkout gh-pages

All pages were written in Markdown, so edit them with your favorite text editor. For example, as the above comment pointed out, I omitted a right parenthesis ) in _posts/2012-02-24-sweave.md, and you just add it, save the file, write a GIT commit message, push to your repository and send the pull request.

I know I can do this by myself in five seconds, and it takes me way more time to write this blog post, but I just want everybody to know how people with different skill levels can play their roles in software development.

Let’s see how many minutes it takes for the pull request to come after I publish this blog post. Hurry!! :)

[post_status] => publish [comment_status] => closed [ping_status] => closed [post_password] => [post_name] => you-do-not-need-to-tell-me-i-have-a-typo-in-my-documentation-3 [to_ping] => [pinged] => [post_modified] => 2013-06-10 00:00:00 [post_modified_gmt] => 2013-06-10 00:00:00 [post_content_filtered] => [post_parent] => 0 [guid] => https://yihui.name/en/2013/06/fix-typo-in-documentation/ [menu_order] => 0 [post_type] => post [post_mime_type] => [comment_count] => 0 [ancestors] => Array ( ) [filter] => raw ) ) ) [8] => Array ( [file] => /home/blogs/public_html/wp-includes/plugin.php [line] => 403 [function] => call_user_func_array [args] => Array ( [0] => Array ( [0] => WPAL2Facebook Object ( [main_file] => /home/blogs/public_html/wp-content/plugins/add-link-to-facebook/add-link-to-facebook.php [debug] => [site_id] => [blog_id] => 1 ) [1] => Transition_post_status ) [1] => Array ( [0] => publish [1] => new [2] => stdClass Object ( [ID] => 36489 [post_author] => 185 [post_date] => 2013-06-10 00:00:00 [post_date_gmt] => 2013-06-10 00:00:00 [post_content] =>

Update (2017/02/06): It is much easier to fix typos on my web pages now. There is a toolbar under the title of each post, and please hit the Edit button if you find any mistakes on the page to propose a correction through Github. You don’t need to use any command-line tools. You should not follow the instructions in the post below any more (the knitr documentation has been moved to a different repo).

So I just got yet yet another comment saying “you have a typo in your documentation”. While I do appreciate these kind reminders, I think it might be a good exercise for those who want to try GIT and Github pull requests, which make it possible for you to contribute to open source and fix obvious problems with no questions being asked – just do it yourself, and send the changes to the original author(s) through Github.

The official documentation for Github pull requests is a little bit verbose for beginners. Basically what you need to do for simple tasks are:

1. click the Fork button and clone the repository in your own account;
2. make the changes in your cloned version;
4. click the Pull Request button to send a request to the original author;

For trivial changes, sometimes I accept them on my cell phone while I’m still in bed. No extra communication is needed.

Occasionally I see reports of this kind of trivial documentation changes in the R-devel mailing list, and I believe that is just horribly inefficient. You could have done this quietly and quickly, and the developers could have merged the changes with a single mouse click. (Oh, okay, well, you know, SVN, mailing lists, …)

For the knitr repository, it has two branches: master and gh-pages. The R package lives in the master branch, and the knitr website lives in the gh-pages branch. If you want to fix any problems in the website, just check out the gh-pages:

git checkout gh-pages

All pages were written in Markdown, so edit them with your favorite text editor. For example, as the above comment pointed out, I omitted a right parenthesis ) in _posts/2012-02-24-sweave.md, and you just add it, save the file, write a GIT commit message, push to your repository and send the pull request.

I know I can do this by myself in five seconds, and it takes me way more time to write this blog post, but I just want everybody to know how people with different skill levels can play their roles in software development.

Let’s see how many minutes it takes for the pull request to come after I publish this blog post. Hurry!! :)

[post_title] => You Do Not Need to Tell Me I Have A Typo in My Documentation [post_excerpt] =>

Update (2017/02/06): It is much easier to fix typos on my web pages now. There is a toolbar under the title of each post, and please hit the Edit button if you find any mistakes on the page to propose a correction through Github. You don’t need to use any command-line tools. You should not follow the instructions in the post below any more (the knitr documentation has been moved to a different repo).

So I just got yet yet another comment saying “you have a typo in your documentation”. While I do appreciate these kind reminders, I think it might be a good exercise for those who want to try GIT and Github pull requests, which make it possible for you to contribute to open source and fix obvious problems with no questions being asked – just do it yourself, and send the changes to the original author(s) through Github.

The official documentation for Github pull requests is a little bit verbose for beginners. Basically what you need to do for simple tasks are:

1. click the Fork button and clone the repository in your own account;
2. make the changes in your cloned version;
4. click the Pull Request button to send a request to the original author;

For trivial changes, sometimes I accept them on my cell phone while I’m still in bed. No extra communication is needed.

Occasionally I see reports of this kind of trivial documentation changes in the R-devel mailing list, and I believe that is just horribly inefficient. You could have done this quietly and quickly, and the developers could have merged the changes with a single mouse click. (Oh, okay, well, you know, SVN, mailing lists, …)

For the knitr repository, it has two branches: master and gh-pages. The R package lives in the master branch, and the knitr website lives in the gh-pages branch. If you want to fix any problems in the website, just check out the gh-pages:

git checkout gh-pages

All pages were written in Markdown, so edit them with your favorite text editor. For example, as the above comment pointed out, I omitted a right parenthesis ) in _posts/2012-02-24-sweave.md, and you just add it, save the file, write a GIT commit message, push to your repository and send the pull request.

I know I can do this by myself in five seconds, and it takes me way more time to write this blog post, but I just want everybody to know how people with different skill levels can play their roles in software development.

Let’s see how many minutes it takes for the pull request to come after I publish this blog post. Hurry!! :)

[post_status] => publish [comment_status] => closed [ping_status] => closed [post_password] => [post_name] => you-do-not-need-to-tell-me-i-have-a-typo-in-my-documentation-3 [to_ping] => [pinged] => [post_modified] => 2013-06-10 00:00:00 [post_modified_gmt] => 2013-06-10 00:00:00 [post_content_filtered] => [post_parent] => 0 [guid] => https://yihui.name/en/2013/06/fix-typo-in-documentation/ [menu_order] => 0 [post_type] => post [post_mime_type] => [comment_count] => 0 [ancestors] => Array ( ) [filter] => raw ) ) ) ) [9] => Array ( [file] => /home/blogs/public_html/wp-includes/post.php [line] => 3024 [function] => do_action [args] => Array ( [0] => transition_post_status [1] => publish [2] => new [3] => stdClass Object ( [ID] => 36489 [post_author] => 185 [post_date] => 2013-06-10 00:00:00 [post_date_gmt] => 2013-06-10 00:00:00 [post_content] =>

Update (2017/02/06): It is much easier to fix typos on my web pages now. There is a toolbar under the title of each post, and please hit the Edit button if you find any mistakes on the page to propose a correction through Github. You don’t need to use any command-line tools. You should not follow the instructions in the post below any more (the knitr documentation has been moved to a different repo).

So I just got yet yet another comment saying “you have a typo in your documentation”. While I do appreciate these kind reminders, I think it might be a good exercise for those who want to try GIT and Github pull requests, which make it possible for you to contribute to open source and fix obvious problems with no questions being asked – just do it yourself, and send the changes to the original author(s) through Github.

The official documentation for Github pull requests is a little bit verbose for beginners. Basically what you need to do for simple tasks are:

1. click the Fork button and clone the repository in your own account;
2. make the changes in your cloned version;
4. click the Pull Request button to send a request to the original author;

For trivial changes, sometimes I accept them on my cell phone while I’m still in bed. No extra communication is needed.

Occasionally I see reports of this kind of trivial documentation changes in the R-devel mailing list, and I believe that is just horribly inefficient. You could have done this quietly and quickly, and the developers could have merged the changes with a single mouse click. (Oh, okay, well, you know, SVN, mailing lists, …)

For the knitr repository, it has two branches: master and gh-pages. The R package lives in the master branch, and the knitr website lives in the gh-pages branch. If you want to fix any problems in the website, just check out the gh-pages:

git checkout gh-pages

All pages were written in Markdown, so edit them with your favorite text editor. For example, as the above comment pointed out, I omitted a right parenthesis ) in _posts/2012-02-24-sweave.md, and you just add it, save the file, write a GIT commit message, push to your repository and send the pull request.

I know I can do this by myself in five seconds, and it takes me way more time to write this blog post, but I just want everybody to know how people with different skill levels can play their roles in software development.

Let’s see how many minutes it takes for the pull request to come after I publish this blog post. Hurry!! :)

[post_title] => You Do Not Need to Tell Me I Have A Typo in My Documentation [post_excerpt] =>

Update (2017/02/06): It is much easier to fix typos on my web pages now. There is a toolbar under the title of each post, and please hit the Edit button if you find any mistakes on the page to propose a correction through Github. You don’t need to use any command-line tools. You should not follow the instructions in the post below any more (the knitr documentation has been moved to a different repo).

So I just got yet yet another comment saying “you have a typo in your documentation”. While I do appreciate these kind reminders, I think it might be a good exercise for those who want to try GIT and Github pull requests, which make it possible for you to contribute to open source and fix obvious problems with no questions being asked – just do it yourself, and send the changes to the original author(s) through Github.

The official documentation for Github pull requests is a little bit verbose for beginners. Basically what you need to do for simple tasks are:

1. click the Fork button and clone the repository in your own account;
2. make the changes in your cloned version;
4. click the Pull Request button to send a request to the original author;

For trivial changes, sometimes I accept them on my cell phone while I’m still in bed. No extra communication is needed.

Occasionally I see reports of this kind of trivial documentation changes in the R-devel mailing list, and I believe that is just horribly inefficient. You could have done this quietly and quickly, and the developers could have merged the changes with a single mouse click. (Oh, okay, well, you know, SVN, mailing lists, …)

For the knitr repository, it has two branches: master and gh-pages. The R package lives in the master branch, and the knitr website lives in the gh-pages branch. If you want to fix any problems in the website, just check out the gh-pages:

git checkout gh-pages

All pages were written in Markdown, so edit them with your favorite text editor. For example, as the above comment pointed out, I omitted a right parenthesis ) in _posts/2012-02-24-sweave.md, and you just add it, save the file, write a GIT commit message, push to your repository and send the pull request.

I know I can do this by myself in five seconds, and it takes me way more time to write this blog post, but I just want everybody to know how people with different skill levels can play their roles in software development.

Let’s see how many minutes it takes for the pull request to come after I publish this blog post. Hurry!! :)




It has been a month since the UseR! 2014 conference, and I’m probably the last one who writes about it. UseR! is my favorite conference because it is technical and not too big. I have completely lost interest in big and broad conferences like JSM (to me, it has become Joint Sightseeing Meetings). Karl has written two blog posts about UseR! (1-2, 3-4), and I’m going to add a few more observations here. An important disclaimer before I move on: Karl Broman is not responsible for videotaping at UseR! 2014 (neither am I), and he is not even on the conference committee. I accidentally mentioned the videos when replying his tweet, which seemed to have caused confusion unfortunately. Any questions about the conference, including the time to publish videos, should be directed to the official organizing committee.

The conference website is hosted on Github. Awesome. Speakers can add links to their slides through pull requests. Genius. It is a little sad, though, that each UseR conference has its own website and Twitter handle. There should have been a single website, a single domain name, and a single Twitter account managing all the R conferences every year. Fragmentation is just such a natural thing in the programmers’ world.

The on-campus dorms were fantastic (oh when I wrote “dorm” I almost typed “dnorm”), which saved us time on transportation. Dining halls were on campus as well. Breakfast was perfect, although I could not stand eating sandwiches, hamburgers, or pizza for four days. Okay, let’s talk about the talks. You can find most of the slides on the conference website.

• John Chambers: the three promising projects Rcpp/Rcpp11, RLLVM, and h2o. I do not know anything about h2o. I think Rcpp has been a great success, and I have blind faith in Romain for Rcpp11. For RLLVM, I’m a little concerned: 15 stars and 4 forks on Github. That is not a good sign; the bus factor is too low. Well, I have been admiring Duncan for a huge number of his amazing packages. Perhaps he can handle this one on his own as well.
• Ramnath Vaidyanathan: all JavaScript libraries are belong to Ramnath! If you want to get a certain JS library integrated with R, tell him the night before, go to sleep, and you will see it in the R world the next morning. I’m only partially kidding :)
• Gordon Woodhull: it was great to see the substantial progress in RCloud.
• Jeroen Ooms: I came from the RWeb age, so you know how excited I was when I saw OpenCPU a couple of years ago. There were things I wished I could do for years but were too complicated before OpenCPU was launched. For example, the knitr issue #51 was probably my first experiment with OpenCPU, and I had a lot of fun with it.
• Jonathan Godfrey: I did not attend his talk, but he attended my tutorial and talked to me a couple of times during the conference. That was the first time I had talked to a blind R user, and I was surprised to learn a few facts:
  • PDF is bad for blind people, and HTML is much better (think R package vignettes);
  • it is nearly impossible for them to read raster images, and SVG graphics can be better;
  • if there is an image in an HTML document, its alt attribute is very important;
  • LaTeX math expressions rendered by MathJax look excellent to sighted people, but they seem to be hard for the blind to read (Jonathan mentioned to me afterwards that it may be possible to configure MathJax to make it readable, but he is not sure how; the math expressions on Wikipedia are displayed as images with the alt attribute, and he can read those);
• Matt Dowle: his data.table story with Patrick Burns was pretty interesting. You can read his slides.
• Aran Lunzer: LivelyR. It was not a talk. It was simply magic. I had absolutely no clue how they made it, even though I’m one of the (co-)authors of the packages that they used.
• Dirk Eddelbuettel: Rcpp and Docker. I was really glad that he mentioned Docker. Travis CI seems to have attracted a lot of attention from R package authors after I was inspired by a reader of my blog and experimented with it last year. The major missing piece on Travis CI for R users is R itself: running apt-get install r-base every time is a waste of time and resources. It would be nice if one could build one’s own virtual machine with all the necessary packages, which is pretty simple with Docker. However, I have not found a free service like Travis CI that allows users to build/test software with custom Docker containers.
• Martin Mächler: good practices in R programming. You can find the slides on his homepage. This was an excellent talk: precise and clear. I recommend that everyone read his slides. One minor and subjective issue is = vs <-. Not many people are with me, and once Alyssa Frazee’s post made me cry a little in the restroom (so excited to find another person using = in R).
• Andy Chen: RLint. Programming styles? I guess a single programming style will never happen. I insist on using = instead of <- for assignment in R, except when I collaborate with the left-arrow party. Roger Peng insists on 8 spaces for indentation. Excuse me? What is a programming style?
• John Nash: it was a great pleasure to meet John in person for the first time. I love hearing old stories from senior people, such as Jeff Laake telling me early stories about ADMB and CRAN, and John Kimmel mentioning John Tukey and the development of interactive graphics at Bell Labs. John (Nash) showed me some Fortran, Pascal, and BASIC programs that were even older than me. Personally I have no interest in these languages, but it was interesting to learn what was done before I was born, and some of those programs are still “alive” in R. Since he was so eager to run Fortran code in knitr documents, we sat together for a few minutes, he wrote a Fortran example, and I just added a quick and dirty Fortran engine in knitr.
• I gave a talk titled “Knitr Ninja”, and a few people remembered sword(2) after that. I was extremely bored by myself after I had given so many talks on knitr, so I thought I should do a completely different talk that nobody had heard of, including my bosses at RStudio. It turned out that the RStudio viewer was pretty handy for presentations, and I could show Kakashi in it.

• Katharine Mullen: JSS. Honestly I’m a little concerned about JSS, although I strongly believe it is an outstanding journal. I have only published one paper in it (the animation package), and the review process was too slow: four months go by and you hear nothing back, so you ask what is going on; another four months go by before you get the first round of reviews. Sometimes I do not quite understand how free journals work, or what motivates the anonymous reviewers. I think this is a pretty hard problem, and I would propose opening up the journal to the wild just like open source software. Even 58 editors on board is still too few compared to the number of authors and submissions. I was extremely excited to learn that Jan de Leeuw’s very first proposal for establishing such a journal was that the journal should be done in HTML!! And interactive, where possible!! That was the year 1995 (I was still in the fifth grade in elementary school, learning fractions in a village). Nearly twenty years later, I think the infrastructure is good enough (e.g. R Markdown, Shiny, Shiny Server) to go back to his original proposal. I would love to see papers in HTML instead of PDF. Typesetting with HTML is a whole lot easier and more attractive than LaTeX/PDF in my opinion, and there is a whole lot more interesting stuff to play with in HTML. With R Markdown v2, HTML and PDF are not mutually exclusive, although we will have to give up certain LaTeX markup. But man, do you really need \proglang{} and \pkg{}?

Finally it is rumor time:

• iris has been officially declared (? by whom? perhaps by the many sleepy faces in the audience) as the dataset porn in R, and the next candidate will be ggplot2::diamonds!
# library() vs require() in R

July 26, 2014 (https://yihui.name/en/2014/07/library-vs-require/)

While I was sitting in a conference room at UseR! 2014, I started counting the number of times that require() was used in the presentations, planning to rant about it after I counted to ten. With drums rolling, David won this little award (sorry, David, I did not really mean to single you out).

Ladies and gentlemen, I've said this before: require() is the wrong way to load an R package; use library() instead #useR2014

— Yihui Xie (@xieyihui) July 2, 2014

After I tweeted about it, some useRs seemed to be unhappy and asked me why. Both require() and library() can load (strictly speaking, attach) an R package. Why should one not use require()? The answer is pretty simple. If you take a look at the source code of require (use the source, Luke, as Martin Mächler mentioned in his invited talk), you will see that require() basically means “try to load the package using library() and return a logical value indicating success or failure”. In other words, library() loads a package, and require() tries to load a package. So when you want to load a package, do you load it or try to load it? It should be crystal clear.
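For illustration only, here is a rough sketch of the idea (the name require_sketch is mine, and the real require() handles many more details than this):

```r
# A sketch of what require() boils down to: try library(), catch the
# error, and return a logical instead of failing. Not the real source.
require_sketch <- function(pkg) {
  tryCatch({
    library(pkg, character.only = TRUE)
    TRUE
  }, error = function(e) {
    warning(conditionMessage(e))
    FALSE
  })
}

require_sketch('stats')             # TRUE: 'stats' ships with base R
require_sketch('surely_not_a_pkg')  # FALSE, with a warning, but no error
```

Calling it with a missing package returns FALSE and keeps the script running, which is exactly the behavior in question.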

One bad consequence of require() is that if you require('foo') at the beginning of an R script, and use a function bar() from the foo package on line 175, R will throw an error could not find function “bar” if foo was not installed. That is too late, and sometimes difficult to understand for people who use your script but are not familiar with the foo package – they may ask, what is bar, and where is it from? When your code is going to fail, fail loudly, early, and with a relevant error message. require() does not signal an error, whereas library() does.

Sometimes you do need require() to use a package conditionally (e.g. the sun is not going to explode without this package), in which case you may use an if statement, e.g.

if (require('foo')) {
  awesome_foo_function()
} else {
  warning('You missed an awesome function')
}

That should be what require() was designed for, but it is common to see R code like this as well:

if (!require('foo')) {
  stop('The package foo was not installed')
}

Sigh.

• library('foo') stops when foo was not installed
• require() is basically try(library())

Then if (!require('foo')) stop() basically means “if you failed to try to load this package, please fail”. I do not quite understand why it is worth this roundabout, except when one wants an error message different from the one library() gives; otherwise one can simply load and fail.

There is one legitimate reason to use require(), though, and that is, “require is a verb and library is a noun!” I completely agree. require should have been a very nice name to choose for the purpose of loading a package, but unfortunately… you know.

If you take a look at the StackOverflow question on this, you will see a comment on “package vs library” was up-voted a lot of times. It used to make a lot of sense to me, but now I do not care as much as I did. There have been useRs (including me up to a certain point) desperately explaining the difference between the two terms package and library, but somehow I think R’s definition of a library is indeed unusual, and the function library() makes the situation worse. Now I’m totally fine if anyone calls my packages “libraries”, because I know what you mean.

Karthik Ram suggested a GIF to express “Ah, a new library, but require? Noooooo”.

Since you have read the source code, Luke, you may have found that you can abuse require() a bit, for example:

> (require(c('MASS', 'nnet')))
Failed with error:  ‘'package' must be of length 1’
the condition has length > 1 and only the first element will be used
[1] FALSE

> (require(c('MASS', 'nnet'), character.only = TRUE))
Failed with error:  ‘'package' must be of length 1’
the condition has length > 1 and only the first element will be used
[1] FALSE

> library(c('MASS', 'nnet'), character.only = TRUE)
Error in library(c("MASS", "nnet"), character.only = TRUE) :
'package' must be of length 1

So require() failed not because MASS and nnet did not exist, but because of a different error. As long as there is an error (no matter what it is), require() returns FALSE.

One thing off-topic while I’m talking about these two functions: the argument character.only = FALSE for library() and require() is a design mistake in my eyes. It seems the original author(s) wanted to save users from typing the quotes around the package name, so library(foo) works like library("foo"). Once you show people they can be lazy, you can never pull them back. Apparently, the editors of JSS (Journal of Statistical Software) have been trying to promote the form library("foo") and discourage library(foo), but I do not think it makes much sense now, or that it will change anything. If this were the 90’s, I’d wholeheartedly support it. It is simply way too late now. Yes, two extra quotation marks will kill many kittens on this planet. If you are familiar with *nix commands, this idea is not new – just think about tar -z -x -f, tar -zxf, and tar zxf.
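A small illustration of the non-standard evaluation involved, using the base package stats purely as an example:

```r
# With the default character.only = FALSE, library() quotes its argument:
# library(foo) looks for a package literally named "foo", even if a
# variable foo holds another package name.
pkg <- 'stats'
# library(pkg)                       # error: there is no package called 'pkg'
library(pkg, character.only = TRUE)  # attaches stats, as intended
```

This is why programmatic package loading has to remember the character.only switch at all.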

One last mildly annoying issue with require() is that it is noisy by default, because of the default quietly = FALSE, e.g.

> require('nnet')
Loading required package: nnet
> require('MASS', quietly = TRUE)

So when I tell you to load a package, you tell me you are loading a package, as if you had heard me. Oh thank you!

# Markdown or LaTeX?

October 19, 2013 (https://yihui.name/en/2013/10/markdown-or-latex/)

What happens if you ask for too much power from Markdown?

R Markdown is one of the document formats that knitr supports, and it is probably the most popular one. I have been asked many times about the choice between Markdown and LaTeX, so I think I’d better wrap up my opinions in a blog post. These two languages (do you really call Markdown a language?) are at the two extremes: Markdown is super easy to learn and type, but it is primarily targeted at HTML pages, and you do not have fine control over typesetting (really? really?), because you only have a very limited number of HTML tags in the output; LaTeX is relatively difficult to learn and type, but it allows you to do precise typesetting (you have control over anything, and that is probably why a lot of time can be wasted).

## What is the problem?

What is the root problem? I think one word answers everything: page! Why do we need pages? Printing is the answer.

In my eyes, the biggest challenge for typesetting is to arrange elements properly with the restriction of pages. This restriction seems trivial, but it is really the root of all “evil”. Without having to put things on pages, life can be much easier in writing.

What is the root of this root problem in LaTeX? One concept: floating environments. If everything came in a strictly linear fashion, writing would be just writing, and typesetting would be no big deal. But because a graph cannot be broken over two pages, it is hard to find a place to put it, and by default it can float to unexpected places. The same problem can happen to tables (see the end of a previous post). You may have to add or delete some words to make sure they float to proper places. That is endless trouble in LaTeX.
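For readers who have not fought this battle, a typical floating figure in LaTeX looks like the sketch below (the file name is a placeholder); the placement options are only hints, and LaTeX is free to move the figure elsewhere:

```latex
\begin{figure}[htbp] % h=here, t=top, b=bottom, p=own page -- mere suggestions
  \centering
  \includegraphics[width=.8\textwidth]{myplot.pdf}
  \caption{A figure that may drift far away from the text that mentions it.}
\end{figure}
```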

There is no such problem in HTML/Markdown, because there are no pages. You just keep writing, and everything appears linearly.

## Can I have both HTML and PDF output?

There is no fault in being greedy, and it is natural to ask whether one can have both HTML and PDF output from a single source document. The answer is: maybe. You can go from LaTeX to HTML, or from Markdown to LaTeX/PDF.

• pandoc can convert Markdown to almost anything
• there are many tools to convert LaTeX to HTML

But remember, Markdown was designed for HTML, and LaTeX was designed for PDF and related output formats. If you ask for more power from either language, the result is not likely to be ideal; otherwise one of them would have to die.

## How to make the decision?

If your writing does not involve complicated typesetting and primarily consists of text (especially no floating environments), go with Markdown. I cannot think of a reason why you must use LaTeX to write a novel. See Hadley’s new book Advanced R programming for an excellent example of Markdown + knitr + other tools: the typesetting elements in this book are very simple – section headers, paragraphs, and code/output. That is pretty much it. Eventually it should be relatively easy to convert those Markdown files to LaTeX via Pandoc, and publish a PDF using the LaTeX class from Chapman & Hall.

For the rest of you, what I’d recommend is to think early and make a decision in the beginning; avoid having both HTML and PDF in mind. Ask yourself only one question: must I print the results nicely on paper? If the answer is yes, go with LaTeX; otherwise just choose whatever makes you comfortable. The book Text Analysis with R authored by Matthew Jockers is an example of LaTeX + knitr. Matt also asked me this question about Markdown vs LaTeX last week while he was here at Iowa State. For this particular book, I think Markdown is probably OK, although I’m not quite sure about a few environments in the book, such as the chapter abstracts.

It is not obvious whether we must print certain things. I think we are just too used to printing. For example, dear professors, must we print our homework? (Apparently Jenny does not think so; I saw her grade homework on RPubs.com!) Or dear customers, must we submit reports in PDF? … In this era, you have laptops, iPads, Kindles, tablets, and all kinds of electronic devices that can show rich media, so why must you print everything (in black and white)?

For those who are still reading this post, let me finish with a side story: Matt, a LaTeX novice, taught himself LaTeX a few months ago, and he has finished the draft of a book with LaTeX! Why are you still hesitating about the choice of tools? Shouldn’t you just go ahead and get the * done? Although all roads lead to Rome, some people die at the starting line instead of on the roads.

# Testing R Packages

September 30, 2013 (https://yihui.name/en/2013/09/testing-r-packages/)

This guy th3james claimed that Testing Code Is Simple, and I agree. In the R world, this is not anything new. As far as I can see, there are three schools of R users with different testing techniques:

1. tests are put under package/tests/, and a foo-test.Rout.save file is generated from R CMD BATCH foo-test.R; testing is done by comparing the foo-test.Rout produced by R CMD check with your foo-test.Rout.save, and R notifies you when it sees text differences; this approach is typically used by R core and its followers
2. RUnit and its followers: formal ideas were borrowed from other languages and frameworks, and it looks like there is a lot to learn before you can get started
3. the testthat family: tests are expressed as expect_something() like a natural human language
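The first approach boils down to comparing two text files line by line; here is a minimal sketch in base R (the file contents are made up for illustration):

```r
# pretend these are the lines of foo-test.Rout (new run) and
# foo-test.Rout.save (the saved copy), e.g. from readLines()
out_new  <- c('> x <- 1:3', '> print(x)', '[1] 1 2 3')
out_save <- c('> x <- 1:3', '> print(x)', '[1] 1 2 3')
diffs <- which(out_new != out_save)
if (length(diffs) > 0) {
  stop('output differs from the saved copy on lines: ',
       paste(diffs, collapse = ', '))
}
```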

At its core, testing is nothing but “tell me if something unexpected happened”. The usual way to tell you is to signal an error. In R, that means stop(). A very simple way to write a test for the function FUN() is:

if (!identical(FUN(arg1 = val1, arg2 = val2, ...), expected_value)) {
  stop('FUN() did not return the expected value!')
}

That is, when we pass the values val1 and val2 to the arguments arg1 and arg2, respectively, the function FUN() should return a value identical to our expected value, otherwise we signal an error. If R CMD check sees an error, it will stop and fail.
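As a concrete (if trivial) instance of this pattern, using a base R function:

```r
# sum() over the integer vector 1:10 should return the integer 55
if (!identical(sum(1:10), 55L)) {
  stop('sum() did not return the expected value!')
}
```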

For me, I only want one thing from unit testing: I want the non-exported functions to be visible to me during testing. Unit testing should have all “units” available, but R’s namespace mechanism intentionally restricts the objects that are visible to the end users of a package, which is a Very Good Thing for end users. It is less convenient for the package author, who has to use the triple-colon syntax, such as foo:::hidden_fun(), when testing the function hidden_fun().

I wrote a tiny package called testit after John Ramey dropped by my office one afternoon while I was doing an internship at the Fred Hutchinson Cancer Research Center last year. I thought for a while about the three testing approaches, and decided to write my own package because I did not like the first approach (text comparison), and I did not want to learn or remember the new vocabulary of RUnit or testthat. There is only one function for the testing purpose in this package: assert().

assert(
  "1 plus 1 is equal to 2",
  1 + 1 == 2
)

You can write multiple testing conditions, e.g.

assert(
  "1 plus 1 is equal to 2",
  1 + 1 == 2,
  identical(1 + 1, 2),
  (1 + 1 >= 2) && (1 + 1 <= 2), # mathematician's proof
  c(is.numeric(1 + 1), is.numeric(2))
)

There is another function, test_pkg(), which runs all tests of a package in an empty environment whose parent is the package namespace, which means all objects in the package, exported or not, are directly available without ::: in the test scripts. See the CRAN page for a list of packages that use testit, for example, my highr package, where you can find some examples of tests.

While I do not like the text comparison approach, that does not mean it is not useful. Actually it is extremely useful for testing text document output; it is just a little awkward for testing function output. The text comparison approach plays an important role in the development of knitr: I have a Github repository knitr-examples, which serves as both an example repo and a testing repo. When I push new commits to Github, I use Travis CI to test the package, and there are two parts to the tests: one is to run R CMD check on the package, which uses testit to run the test R scripts, and the other is to re-compile all the examples and run git diff to see if there are changes. I have more than 100 examples, which should have reasonable coverage of possible problems in new changes to knitr. This way, I feel comfortable when I bring new features or make changes to knitr, because I know they are unlikely to break old documents.

If you are new to testing and only have 3 minutes, I’d strongly recommend that you read at least the first two sections of Hadley’s testthat article.

# After Three Months I Cannot Reproduce My Own Book

September 5, 2013 (https://yihui.name/en/2013/09/cannot-reproduce-my-own-book/)

TL;DR I thought I could easily jump to a high standard (reproducibility), but I failed.

Some of you may have noticed that the knitr book is finally out. Amazon is offering a good price at the moment, so if you are interested, you’d better hurry up.

I avoided the phrase “Reproducible Research” in the book title, because I did not want to take that responsibility, although it is related to reproducible research in some sense. The book was written with knitr v1.3 and R 3.0.1, as you can see from my sessionInfo() in the preface.

Three months later, several things have changed, and I could not reproduce the book, but that did not surprise me. I’ll explain the details later. Here I have extracted the first three chapters, and released the corresponding source files in the knitr-book repository on Github. You can also find the link to download the PDF there. This repository may be useful to those who plan to write a book using R.

What I could not reproduce was not really important. The major change in the recent knitr versions was in the syntax highlighting commands, e.g. \hlcomment{} is \hlcom{} now, and the syntax highlighting has been improved by the highr package (sorry, Romain). This change showed up as a fair number of differences in git diff, but they are only cosmetic.

I tried my best to avoid writing anything that is likely to change in the future into the book, but as a naive programmer, I have to say sorry that I have broken two little features, although they may not really affect you:

• the preferred way to stop knitr in case of errors is to set the chunk option error = FALSE instead of the package option stop_on_error, which has been deprecated (Section 6.2.4);
• for external code chunks (Section 9.2), the preferred chunk delimiter is ## ---- instead of ## @knitr now;

Actually the backward compatibility is still there, so these features will not really break until a long time from now.

With exactly the same software environment, I think I can reproduce the book, but that does not make much sense. Things are always evolving. Then there are two types of reproducible research:

1. the “dead” reproducible research (reproduce in a very specific environment);
2. the reproducible research that evolves and generalizes;

I think the latter is more valuable. Being reproducible alone is not the goal, because you may be reproducing either real findings or simply old mistakes. As Roger Peng wrote,

[…] reproducibility cannot really address the validity of a scientific claim as well as replication

Roger’s three recent blog posts on reproducible research are well worth reading. This blog post of mine is actually not quite relevant (there is no data analysis here), so I recommend that my readers move over there after checking out the knitr-book repository.

# My first Bioconductor conference (2013)

July 21, 2013 (https://yihui.name/en/2013/07/bioconductor-2013/)

The BioC 2013 conference was held from July 17 to 19. I attended this conference for my first time, mainly because I’m working at the Fred Hutchinson Cancer Research Center this summer, and the conference venue was just downstairs! No flights, no hotels, no transportation, yeah.

Last time I wrote about my first ENAR experience; this time let me tell you why the BioC conference organizers are smart in my eyes.

## A badge that never flips

I do not need to explain this simple design – it just will not flip to the damn blank side.

## The conference program book

The program book was only four pages of the schedule (titles and speakers). The abstracts are online. Trees saved.

## Lightning talks

There were plenty of lightning talks, and you can talk about whatever you want.

## Live coding

On the developer’s day, Martin Morgan presented some buggy R code to the audience (provided by Laurent Gatto), and asked us to debug it right there. Wow!

## Everything is free after registration

The registration includes almost everything: lunch, beer, wine, coffee, fruits, snacks, and most importantly, Amazon Machine Images (AMIs)!

## AMI

This is a really shiny point of BioC! If you have ever tried to do a software tutorial, you probably know the pain of setting up the environment for your audience, because they use different operating systems, different versions of packages, and who knows what is going to happen after you are on your third slide. At a workshop last year, I had the experience of spending five minutes figuring out why a keyboard shortcut did not work for one Canadian lady in the audience, and it turned out she was using the French keyboard layout.

The BioC organizers solved this problem beautifully by installing the RStudio server on an AMI. Every participant was sent a link to the Amazon virtual machine, and all they needed was a web browser and a wireless connection in the room. Everyone ran R in exactly the same environment.

Isn’t that smart?

## Talks

I do not really know much about biology, although a few biological terms have been added to my vocabulary this summer. When a talk becomes biologically oriented, I have to give up.

Simon Urbanek talked about big data in R this year, which, as he mentioned himself, is unusual: normally he shows fancy graphics (e.g., iplots). I did not realize the significance of this R 3.0.0 news item until his talk:

> It is now possible to write custom connection implementations outside core R using R_ext/Connections.h. Please note that the implementation of connections is still considered internal and may change in the future (see the above file for details).

Given this new feature, he implemented the HDFS connections and 0MQ-based connections in R single-handedly (well, that is always his style).

You probably have noticed the previous links are Github repositories. Yes! Some R core members really appreciate the value of social coding now! I’m sure Simon does. I’m aware of other R core members using Github quietly (DB, SF, MM, PM, DS, DTL, DM), but I do not really know their attitude toward it.

Joe Cheng’s Shiny talk was shiny as usual. Each time I attend one of his talks, he shows a brand-new amazing demo. Joe is the only R programmer who makes me feel “the sky is the limit (of R)”. The audience was shocked when a heatmap they were so familiar with suddenly became interactive in a Shiny app! BTW, Joe has a special sense of humor when he talks about an area in which he is not an expert (statistics or biology).

RStudio 0.98 is going to be awesome. I’m not going to provide the links here, since it is not released yet. I’m sure you will find the preview version if you really want it.

## Bragging rights

• I met Robert Gentleman for the first time!
• I dared to fall asleep during Martin Morgan’s tutorial! (sorry, Martin)
• Some Bioconductor web pages were built with knitr/R Markdown!

## Next steps

Given Bioconductor’s open-mindedness to new technologies (GIT, Github, AMI, Shiny, …), let’s see if it is going to take over the world. Just kidding. But not completely kidding. I will keep the conversation going before I leave Seattle around mid-August, and hopefully get something done.

If you have any feature requests or suggestions for Bioconductor, I will be happy to serve as the “conductor” temporarily. I guess they should set up a blog at some point.

# R Package Versioning

June 27, 2013 · https://yihui.name/en/2013/06/r-package-versioning/

This should be what it feels like to bump the major version of your software:

For me, the main reason for package versioning is to indicate the (slight or significant) differences among different versions of the same package; otherwise we could keep releasing version 1.0 forever.

That seems to be a very obvious fact, so here are my own versioning rules, with some ideas borrowed from Semantic Versioning:

1. a version number is of the form major.minor.patch (x.y.z), e.g., 0.1.7
2. only the version x.y is released to CRAN
3. x.y.z is always the development version, and each time a new feature, a bug fix, or any other change is introduced, bump the patch version, e.g., from 0.1.3 to 0.1.4
4. when one feels it is time to release to CRAN, bump the minor version, e.g., from 0.1 to 0.2
5. when a change is crazy enough that many users are presumably going to yell at you (see the illustration above), it is time to bump the major version, e.g., from 0.18 to 1.0
6. the version 1.0 does not imply maturity; it just means the new version is potentially very different from 0.x (such as API changes); the same applies to 2.0 vs 1.0
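Rule #3 is mechanical enough to script. Below is a minimal shell sketch of bumping the patch version in a DESCRIPTION file; the package name and version here are made up for illustration:

```shell
# Create a toy DESCRIPTION file (made-up content, for illustration only)
printf 'Package: foo\nVersion: 0.1.3\n' > DESCRIPTION

# Read the current version and bump the patch component
ver=$(grep '^Version:' DESCRIPTION | cut -d' ' -f2)  # 0.1.3
patch=${ver##*.}                                     # 3
base=${ver%.*}                                       # 0.1
newver="$base.$((patch + 1))"                        # 0.1.4

# Write it back
sed -i.bak "s/^Version: .*/Version: $newver/" DESCRIPTION
grep '^Version:' DESCRIPTION                         # prints "Version: 0.1.4"
```

In a real package you would run something like this (or an R equivalent) right before committing a change.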

I learned rule #3 from Michael Lawrence (author of RGtk2) and I think it is a good idea. In particular, it is important for brave users who dare to install the development versions: when you ask them for their sessionInfo(), you will know exactly which stage of development they are at.

Rule #2 saves us a little bit of energy in the sense that we do not need to write or talk about the foo package 1.3.548, which is boring to type or speak; normally we just say foo 1.3. As a person whose first language is not English, speaking the patch version consumes my brain memory and slows down my thinking while I’m talking. When I say it in Chinese, it feels tedious and unnecessarily geeky. Yes, I know I always have weird opinions.

# You Do Not Need to Tell Me I Have A Typo in My Documentation

June 10, 2013 · https://yihui.name/en/2013/06/fix-typo-in-documentation/

Update (2017/02/06): It is much easier to fix typos on my web pages now. There is a toolbar under the title of each post, and please hit the Edit button if you find any mistakes on the page to propose a correction through Github. You don’t need to use any command-line tools. You should not follow the instructions in the post below any more (the knitr documentation has been moved to a different repo).

So I just got yet another comment saying “you have a typo in your documentation”. While I do appreciate these kind reminders, I think this might be a good exercise for those who want to try GIT and Github pull requests, which make it possible for you to contribute to open source and fix obvious problems with no questions asked – just do it yourself, and send the changes to the original author(s) through Github.

The official documentation for Github pull requests is a little bit verbose for beginners. Basically, what you need to do for simple tasks is:

1. click the Fork button and clone the repository in your own account;
2. make the changes in your cloned version;
3. push the changes to your own repository;
4. click the Pull Request button to send a request to the original author;
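To get a feel for the workflow without touching a real project, the steps can be simulated entirely locally; in this sketch, upstream.git stands in for the original author’s repository and fork for your clone (all names here are made up):

```shell
# A stand-in for the original author's repository
git init --bare upstream.git
git clone upstream.git fork           # your copy of the repository
cd fork
git config user.name "You"
git config user.email "you@example.com"

# Make the change (here: create a file with the typo fixed)
echo "a typo, fixed" > README.md
git add README.md
git commit -m "Fix a typo in the documentation"

# Push the change back to your repository
git push origin HEAD
cd ..

# On Github, the last step is just clicking the Pull Request button;
# the commit is now visible on the "upstream" side:
git --git-dir=upstream.git log --oneline -1
```

The only Github-specific part is the Pull Request button itself; everything else is plain git.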

For trivial changes, sometimes I accept them on my cell phone while I’m still in bed. No extra communication is needed.

Occasionally I see reports of this kind of trivial documentation change on the R-devel mailing list, and I believe that is just horribly inefficient. You could have done this quietly and quickly, and the developers could have merged the changes with a single mouse click. (Oh, okay, well, you know, SVN, mailing lists, …)

The knitr repository has two branches: master and gh-pages. The R package lives in the master branch, and the knitr website lives in the gh-pages branch. If you want to fix any problems in the website, just check out gh-pages:

git checkout gh-pages

All pages were written in Markdown, so edit them with your favorite text editor. For example, as the above comment pointed out, I omitted a right parenthesis ) in _posts/2012-02-24-sweave.md; you just add it, save the file, write a GIT commit message, push to your repository, and send the pull request.

I know I can do this by myself in five seconds, and it takes me way more time to write this blog post, but I just want everybody to know how people with different skill levels can play their roles in software development.

Let’s see how many minutes it takes for the pull request to come after I publish this blog post. Hurry!! :)

# A Few Tips for Writing an R Book

June 3, 2013 · https://yihui.name/en/2013/06/tips-for-writing-an-r-book/

I just finished fixing (hopefully all) the problems in the knitr book returned from the copy editor. David Smith kindly announced this book before I did. I do not have much to say about it: almost everything in the book can be found in the online documentation, questions & answers, and the source code. The point of buying this book is perhaps that you do not have time to read through all the two thousand questions and answers online – I did that for you.

This is my first book, and obviously there has been a lot for me to learn about writing one. In retrospect, I want to share a few tips that I found useful (in particular, for those who plan to write for Chapman & Hall):

1. Although it sounds like shameless self-promotion, using knitr made it a lot easier to manage R code and its output for the book; for example, I could quickly adapt to R 3.0.1 from 2.15.3 after I came back from a vacation; if I were to write a second edition, I do not think I would have big trouble with my R code in the book (it is easy to make sure the output is up-to-date);

2. I put my source documents under version control, which helped me watch the changes in the output closely; for example, I noticed the source code of the function fivenum() in base R was changed from R 2.15.3 to 3.0.0 thanks to GIT (R core have been updating base R everywhere!);

3. (opinionated) Some people might be very bored to hear this: use LyX instead of plain LaTeX… because you are writing, not coding; LaTeX code is not fun to read…

4. for the LaTeX document class krantz.cls (by Chapman & Hall):

• to solve the only stupid problem in LaTeX (i.e., floating environments float to silly places by default), use something like this:

\renewcommand{\textfraction}{0.05}
\renewcommand{\topfraction}{0.8}
\renewcommand{\bottomfraction}{0.8}
\renewcommand{\floatpagefraction}{0.75}

I’m aware of the float package and the H option, and options like !tbp; I just do not want to force LaTeX to do anything – it may or may not be happy at some point.

• put \usepackage{emptypage} in the preamble to make empty pages really empty, as required by the copy editor.
• the document class krantz.cls does not work with the hyperref package, meaning that you cannot create bookmarks in the PDF; I have posted the solution here.
5. for authors like me whose native language is not English, here is a summary of my problems in English:

• when you want to use which, use that instead, unless there is a comma ahead, or you really want to emphasize a very specific object; e.g.,

“here is a package that is helpful” (correct)

“here is a package which is helpful” (wrong)

“we will introduce an extremely important technology next, which has revolutionized the life of poor statisticians”

• it is “A, B, and C” instead of “A, B and C”

• do not forget the comma in other places, either: “e.g.,”, “i.e.,”, “foo and bar, respectively”; actually, try to use the comma whenever possible to break long sentences into shorter pieces

6. for the plots, use the cairo_pdf() device when possible; in knitr, this means the chunk option dev = 'cairo_pdf'; the reason for the choice of cairo_pdf() over the normal pdf() device is that it can embed fonts in the PDF plot files, otherwise the copy editor will require you to embed all the fonts in the final PDF file of the book; normally pdflatex will embed fonts, and if there are fonts that are not embedded, it is very likely that they are from R graphics;

7. include as many figures as possible (I have 51 figures in this 200-page book), because this will make the number of pages grow faster (I’m evil) so that you will not feel frustrated, and the readers will not fall into the hell of endless text, page after page;

8. prepare an extra monitor for copyediting;

9. learn a little bit about pdftk, because you may eventually want to use it, e.g., to replace one page in the frontmatter with a blank page;

10. learn these copy editing symbols (thanks, Matt Shotwell);

One thing I did not really understand was that punctuation marks like commas and periods should go inside quotation marks, e.g.,

I have “foo” and “bar.”

This makes me feel weird. I’m more comfortable with

I have “foo” and “bar”.

There was also one thing that version control did not catch – one figure file went wrong and I did not realize it, because normally I do not put binary files under version control. Fortunately, I caught it with my own eyes. Karl Broman mentioned the same problem to me a while ago. I know there are tools for comparing images (ImageMagick, for example), and I was just too lazy to learn them.

I will be glad to know the experience of other authors, and will try to update this post according to the comments.

# Travis CI for R! (not yet)

April 12, 2013 · https://yihui.name/en/2013/04/travis-ci-general-purpose/

A few days ago I wrote about Travis CI, and was wondering if we could integrate the testing of R packages into this wonderful platform. A reader (Vincent Arel-Bundock) pointed out in the comments that Travis runs Ubuntu, which allows you to install software packages at will.

I took a look at the documentation, and realized they were building and testing packages in virtual machines. No wonder sudo apt-get works. Remember apt-get -h | tail -n1:

This APT has Super Cow Powers. (APT有超级牛力)

## R on Travis CI

Now we are essentially system admins, and we can install anything from Ubuntu repositories, so it does not really matter that Travis CI does not support R yet. Below are a few steps to integrate your R package (on Github) into this system:

1. follow the official guide until you see .travis.yml;
2. copy my .travis.yml for the knitr package if you want, or write your own;
• I use a custom library path ~/R to install add-on R packages so that I do not have to type sudo everywhere
• at the moment I use the RDev PPA by Michael Rutter to install R 3.0.0 since his plan for R 3.0 on CRAN is in May; at that time I’ll change this PPA to a CRAN repository
• since R CMD check requires all packages in Suggests as well, I install knitr using install.packages(dep = TRUE) to make sure all relevant packages are installed
• make install and make check are wrappers of R CMD build and R CMD check respectively, defined in the Makefile
3. push this .travis.yml to Github, and Travis CI will start building your package when a worker is available (normally within a few seconds);
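Put together, such a .travis.yml can look roughly like the sketch below. This is only an illustration, not the actual file linked above: the PPA name, the CRAN mirror, and the Makefile targets are assumptions based on the steps just described.

```yaml
# A rough sketch, not the actual .travis.yml of knitr
language: c                    # Travis has no "r" language yet; pick anything
env:
  global:
    - R_LIBS_USER=~/R          # custom library path, so no sudo for R packages
before_install:
  - sudo add-apt-repository -y ppa:marutter/rdev   # Michael Rutter's RDev PPA
  - sudo apt-get update -qq
  - sudo apt-get install -y r-base-dev
  - mkdir -p ~/R
install:
  - Rscript -e "install.packages('knitr', dependencies = TRUE, repos = 'http://cran.r-project.org')"
script:
  - make install               # wrapper for R CMD build (defined in the Makefile)
  - make check                 # wrapper for R CMD check
```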

By default you will receive email notifications when there are changes in the build. You can also find the guide on the build status image in the documentation.

What I described here actually applies to any software packages (not only R), as long as the dependencies are available under Ubuntu, or you know how to build them.

## But it is still far from CRAN

OK, it works, but we are still a little bit far from what CRAN does, because Travis CI does not have official support for R. Each time we have to install one gigabyte of additional software to create the R testing environment (sigh, if only R did not have to tie itself to LaTeX). If these packages were pre-built in the virtual machines, it would save us a lot of time.

The second problem is, there is no Windows support on Travis CI (one developer told us on Twitter that it was coming). There is a page for OS X, but I did not really figure out how to build software under OS X there.

The third problem is Travis CI only builds and tests packages; it does not provide downloads like CRAN. Perhaps we can upload the packages using encryption keys to our own servers.

## R-Forge, where are you going?

I will shut up here since I realized I was not being constructive. Let me spend more time thinking about this; I would love to hear suggestions from readers as well.

So, two potential Google Summer of Code projects:

• make R an officially supported language on Travis CI (this really depends on whether the Travis team wants it)
• improve R-Forge (of course this depends on whether the R-Forge team thinks they need help)
# Travis CI for R?

April 7, 2013 · https://yihui.name/en/2013/04/travis-ci-for-r/

I’m always worried about CRAN: a system maintained by FTP and emails from real humans (basically one of Uwe, Kurt or Prof Ripley). I’m worried for two reasons:

1. The number of R packages is growing exponentially;
2. Time and time again I see frustrations from both parties (CRAN maintainers and package authors);

I have a good solution for 2: keep silent when your submission passes the check system; when it does not, say “Sorry!” whether you agree with the reason or not (which once made one maintainer unhappy), but do not argue – just go back and fix the problem if you know what it is, or use dark voodoo to hide (yes, hide, not solve) the problem if you are sure you are right. If you read the mailing list frequently, you probably remember that if (CRAN) discussion. The solution in my mind was if (Sys.getenv('USER') == 'ripley').

The key is, do not argue. Silence is golden.

The CRAN maintainers have been volunteering their time, and we should respect them. The question is, will this approach scale well with the growth of packages? Or who should be in charge of R CMD check?

We, the poor authors, cannot guarantee that our packages will pass on CRAN’s machines every time, for all kinds of reasons. Some problems are actually easy to fix without a real human yelling at us. On the other hand, if the package fortunately passes R CMD check, we do not really need an email from a real human acknowledging “thanks, on CRAN now”.

Travis CI is an excellent platform for continuous integration of software packages. You do not need to interact with a real person by email – each time you push to Github, your package will be automatically built and checked. If there are problems, you will be notified automatically.

A similar platform in the R world is Bioconductor. It has the two best components in software development: version control (although sadly SVN) and continuous checking. I do not know if CRAN will catch up one day. I’m not very optimistic about it; perhaps a more realistic approach is to start a Google Summer of Code project on introducing R into Travis CI. I have no idea how difficult that will be, but I will definitely be thrilled if it comes true this year.

Anyone?

Update on 04/16/2013: just to clarify, what Bioconductor does is not strictly continuous integration (yet) in the sense that it builds packages daily instead of immediately on changes.

# On ENAR, or Statistical Meetings in General

March 14, 2013 · https://yihui.name/en/2013/03/on-enar-or-statistical-meetings-in-general/

Last year I accepted an invitation from Ben to go to ENAR 2013 – my first ENAR. I used to go to JSM and useR!, and apparently I enjoy useR! most. The reason is not, or not only, because I’m more of a technical person. It is just hard to concentrate at large statistical conferences. I want to make a few suggestions from the perspective of a student, although it is unlikely that any future conference chairs will come here and listen to me:

1. Go green and get rid of printed programs. A program book is thick and clunky, and nobody will take it with them when they leave. The hundreds of pages of paper will only end up in the garbage. If you have to print them, print N/5 copies instead (N is the number of participants) and let the participants share with each other.
2. Improve the websites and add social network features. For example, we could “reserve” the talks in which we are interested, so all participants would immediately know which are the popular and highly anticipated talks, and organizers could schedule appropriate rooms. The discussion session by John Chambers, Duncan Temple Lang, Thomas Lumley and Michael Lawrence at the last JSM in San Diego was a failure in the sense that many people were standing outside of the room (presumably fans of John Chambers). By comparison, my session at ENAR was assigned to a room of (more than?) 400 seats but only 20 people showed up.
3. If we tell the organizers which sessions we plan to attend, the program book can be a lot thinner! If there are changes in these sessions, only a small group of people need to be notified. Notifying everybody by inserting a few pieces of announcements wastes most people’s time.
4. One thing I really wish conferences had: I want to know which people in my “circle” are also going. It is hard to go through 1000 names of participants and spot the familiar ones. I met a friend at ENAR who collaborated with me (translating an English R tutorial to Chinese) almost 10 years ago, but we had never met in person, so we did not recognize each other. I did not know he was coming either. I was just sitting on a sofa in a corner when he randomly saw my badge. We were so excited that we finally met in such an unexpected place.
5. If participants can make connections with each other beforehand, it is likely to save us money as well – we can share the costs when we rent cars, take cabs, book hotels and so on.
6. So please do not charge us upon registration – give us a deadline and charge us later. Perhaps I will change my mind later if I cannot find enough interesting people to meet, or the popular talks seem irrelevant to me.
7. I heard from Hadley that some IEEE conferences require the speakers to do a 30-second talk before the conferences, and I think that will be cool and useful for statistical conferences as well. Nowadays I still hear certain speakers read their slides word by word. Some speakers may be shy or are not confident in their oral English, but I do not think the language problem is a really big problem. My suggestion to these speakers is to spend more time preparing jokes instead of the slides: jokes make the audience concentrate and speakers relax. I have told a lot of stupid jokes that I regret afterwards, but I think the net effect is still positive.
8. If it is not possible to arrange 30-second talks, the conference website should allow speakers to upload mini versions of their talks to attract a larger audience.
9. Some people go to conferences for both presentations and sightseeing. Personally I do not care about the latter at all, but unfortunately all big conferences take place in famous big cities. This ENAR was held in the largest Marriott in the world. Am I proud of that? No, not at all, because I had to live in a much cheaper hotel three miles away. One evening I tried to walk back and it took me one hour and twenty minutes. What is more, this place is not really walkable – I had to walk on the grass on the roadside for half an hour because there was no pavement! I do not really mind walking (even for three miles), but it is not a happy memory walking on the grass. The Marriott was such a closed universe that it was hard to walk out, as Karl’s picture shows below:

There's a world outside #ENAR2013 but barbed wire to prevent us from getting there. pic.twitter.com/Yca4pCcoeW

— Karl Broman (@kwbroman) March 12, 2013

By comparison, useR! conferences often take place on a university campus. Last year it was at Vanderbilt, and they provided dorms for students. I lived happily in the dorm, because all I wanted was a place to sleep (there was free wireless too); nothing luxurious. Usually there are also inexpensive restaurants on campus. I met helpful local students/researchers there who gave me a free ride to some scenic spots, and I had to fight to pay my tickets by myself (but was still treated). It was just a touching trip, and I managed to make the acquaintance of quite a few interesting people (Frank Harrell, Bill Venables and Kevin Coombes, etc.).

ENAR “kindly” included a ticket for the Epcot theme park in the registration fee. What did I do in the theme park? I had dinner in an expensive restaurant (Karl felt guilty for taking me there and generously treated me), and watched a 10-minute fireworks show. Yes, we were so nerdy that we kept on discussing the role of measure theory and Github in a place where we were supposed to say hello to Mickey Mouse.

— Karl Broman (@kwbroman) March 13, 2013

So my major suggestion to the big statistical meetings is, create an environment which emphasizes the communication among people and do not include distracting activities in the registration package by default. There are a couple of small things you can do, for example:

1. More seats in the open place so we can sit and chat.
2. Free beer. I’m sure it is more doable than an Epcot ticket.
3. Always print the participant name on both sides of the badge. You know how stupid it is to show a blank side of your badge to other people (and the badge always flips to the damn wrong side, always, flips!!), especially given that statisticians are socially awkward and feel embarrassed to ask other people for their names.
4. Choose a university campus instead of a Marriott. If you do not know how to choose one, choose Iowa State University. I’m sure all participants will be fully concentrated unless they are interested in seeing corn and pigs on the farms.

Okay, rants ended. Positive energy coming.

As I said on Twitter, I was happy to meet Karl Broman (who later introduced me to Matthew Stephens, the smartest person in statistics and human genetics according to Karl) and John Muschelli there. I noticed Karl a long time ago, mainly due to Top ten worst graphs, but had never met him.

I did not know much about the Johns Hopkins biostat department before I visited them last year, and it has become a place that surprises me more and more. It is a weird and crazy department. I like people for bizarre reasons. For instance, I like Jeff Leek because he prefers steak to be well done (me too). Rafa hides jokes under his very serious-looking face. “Behind the Tan Door” is the best video in the history of statistics. Karl has a series of hilarious stories about the JHSPH logo. There are people in the world that you know for sure you will be excited to meet. Currently I have yet another person on my list: Tyler Rinker.

In case you have not seen it, I strongly recommend you read (and apply for) the postdoctoral fellow position in reproducible research at Hopkins. Note in particular the phrase “serious moxie”!!

So do I regret ENAR? Certainly not. If I could organize my time more efficiently and meet more people like those above, it would be even better. BTW, if you were not among the 20 people in my session, feel free to check out my slides on knitr (Brian Bot simply called them “animated gifs” instead of “slides”).

# Contribute to The R Journal with LyX/knitr

February 17, 2013 (https://yihui.name/en/2013/02/contribute-to-the-r-journal-with-lyx-knitr/)

(This paragraph is pure rant; feel free to skip it.) I had been looking forward to the one-column LaTeX style of The R Journal, and it has finally arrived. Last time I mentioned that “it does not make sense to sell cooked shrimp”; there is another thing that does not make sense in my eyes, which is the two-column LaTeX style. I just hate it. Two columns may save a little space in typesetting compared to one column, but they bring huge inconvenience to readers who do not have a big enough screen. For each single page, you have to scroll down to read the left column, scroll back up to read the right column, then scroll down again… So you just scroll up and down, up and down, until you are bored by this PITA.

I have ported the new RJournal.sty to LyX, and you can find the relevant files in my lyx repository. To write articles in LyX with knitr, check out or download the repository and follow these steps:

1. From my repository, copy the layouts folder to your LyX user directory;
2. Download RJournal.sty from the R Journal website and put it in your texmf tree so that LaTeX can find it (this might be the most challenging step if you do not know enough about LaTeX, and I do not want to explain this painful topic);
3. (For Windows users only) make sure R is in your PATH (again, a painful topic that I hate to explain) and run install.packages('knitr') in R;
4. In LyX, click Tools --> Reconfigure and restart LyX.
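The texmf step above is where people most often get stuck. Here is a minimal sketch for TeX Live on Linux/macOS, assuming the default personal tree TEXMFHOME = ~/texmf (the exact path is an assumption; adjust it for your system):

```shell
# create a folder for the style file in your personal texmf tree
mkdir -p ~/texmf/tex/latex/RJournal
# after downloading RJournal.sty into that folder, verify TeX can find it:
# kpsewhich RJournal.sty
```

On MiKTeX (Windows) the rough equivalent is to register a user texmf directory in the MiKTeX settings and refresh the filename database.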

Now you should be able to open templates/RJournal.lyx and compile it. I have made a quick video of the process below:

So you have no excuse to escape reproducible research! Contributing a reproducible article to The R Journal is now even easier than writing in Word.

P.S. I will try to submit the new layout file RJournal.layout as well as the template RJournal.lyx to the LyX development team if I do not hear of any problems from users.

# Publishing from R+knitr to WordPress

February 10, 2013 (https://yihui.name/en/2013/02/publishing-from-r-knitr-to-wordpress/)

Tal Galili asked a question on StackOverflow about publishing blog posts to WordPress from R + knitr. William K. Morris wrote a solution a long time ago, and I tweaked it a little and created a function knit2wp() in the development version of knitr.

See the WordPress page on the knitr website for details. Below is a sample screenshot:

P.S. I meant to write a separate post about this, but I probably will not find the time. In case you have not noticed, the Journal of Statistical Software recently ranked first by impact factor in the category of “Statistics & Probability”:

Although everybody agrees the impact factor is nonsense, I think this is still an indication of the impact of open-access journals. If I cannot read the full content of a paper from my Google search, I will not bother to read it at all. No, I’m lazy and I will not go to the library.

# Code Pollution With Command Prompts

January 26, 2013 (https://yihui.name/en/2013/01/code-pollution-with-command-prompts/)

This is not the first time I have ranted about command prompts, but I cannot help ranting about them whenever I see them in source code. In short, a piece of source code with command prompts is like a bag of cooked shrimp in the market: it does not make sense, and an otherwise good thing is ruined. I like cooking raw shrimp (way more tasty).

The command prompt here refers to the characters prefixed to commands, which indicate “I’m ready to take your commands”. In a Unix shell, it is often $; in R, it is > by default. A shell example from the GTK web page:

$ jhbuild bootstrap
$ jhbuild build meta-gtk-osx-bootstrap
$ jhbuild build meta-gtk-osx-core

And an R example from R and Data Mining:

> data("bodyfat", package = "mboost")
> str(bodyfat)

There are numerous examples like this on the web and in books (sorry, GTK developers and Yanchang Zhao; I picked on you randomly). Most people seem to post their code like this. I can probably spend half a day figuring out a problem in measure theory, but I could not, even if I spent two years, figure out why people include command prompts when publishing source code.

Isn’t it too obvious that you are wasting the time of your readers?

Whenever command prompts are present in the source code, the reader has to copy the code, remove the prompts, and then use the code. Why does there have to be an additional step of removing the prompts? Why can you not make your code directly usable to other people? Compare the examples above to:

jhbuild bootstrap
jhbuild build meta-gtk-osx-bootstrap
jhbuild build meta-gtk-osx-core
data("bodyfat", package = "mboost")
str(bodyfat)

I’m aware of the column-selection mode in some editors. I just do not understand why the right thing cannot happen from the very beginning.
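For readers stuck with prompt-polluted code, the cleanup can at least be automated. Here is a minimal sed sketch (the pattern is my own and only covers the $, > and + prefixes discussed here):

```shell
# strip a leading '$ ', '> ' or '+ ' prompt from each line of pasted code
printf '> for (i in 1:10) {\n+ print(i)\n+ }\n' | sed -E 's/^[>+$] ?//'
# prints:
# for (i in 1:10) {
# print(i)
# }
```

Still, this is a workaround for a problem that should not exist in the first place.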

Some may argue that the prompt helps typesetting, and that it makes the code stand out because of the common prefix. In R, + means the previous line is not complete, so > and + present the structure of the code, e.g.

> for (i in 1:10) {
+ print(i)
+ }

I believe the structure of code should be presented by the level of indentation, which does not pollute the source code. To make the code stand out, choose a background color (shading) for it. That is the only correct approach for typesetting purposes.

So, please stop code pollution, and post usable code.

# Find Out Available Usernames with R

January 5, 2013 (https://yihui.name/en/2013/01/find-out-available-usernames-with-r/)

Update on 2013/01/05: Xiao Nan pointed out in the comments that apply(combn(letters, 2), 2, paste0, collapse = '') was wrong for enumerating all two-letter usernames, and indeed it was: this is not a combination problem. Now I use his elegant outer() solution. One can also use expand.grid(letters, letters).
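To see the size of the mistake: combn(letters, 2) enumerates only choose(26, 2) = 325 unordered pairs of distinct letters, missing doubles like aa and all reversed orders, whereas the full cross product has 26 * 26 = 676 names. A quick sanity check with bash brace expansion (bash-specific; plain sh will not expand it):

```shell
# bash brace expansion enumerates all 676 ordered two-letter names
echo {a..z}{a..z} | wc -w   # counts 676 names
```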

Github decided to shut down their Downloads service, and I was very unhappy with this decision. It means I have to migrate several files to other places and update links accordingly. Bitbucket still provides this service, so I want to migrate my files there.

Sadly my name yihui was already taken on Bitbucket, so I hoped I could get a short name instead, which made me wonder how I could check the availability of a username programmatically. Here is my solution with the RCurl package:

library(RCurl)
test_user = function(site = 'https://bitbucket.org/',
                     candidates = c(0:9, letters)) {
  for (i in candidates) {
    if (!url.exists(paste0(site, i))) message(i)
    Sys.sleep(runif(1, 0, .1)) # be nice
  }
}
# examples
test_user()
# two-letter names
test_user(candidates = as.vector(outer(letters, letters, 'paste0')))
# check github
test_user('https://github.com/')

As of the time of this blog post, there are no two-letter usernames left on Github, but some are still available on Bitbucket (e.g. by and eq), and the number 4 is also available.

# ICERM Reproducibility Workshop: Day 1

December 10, 2012 (https://yihui.name/en/2012/12/icerm-reproducibility-workshop-day-1/)

I’m attending a workshop on reproducibility at ICERM (Brown University) this week. I really appreciate this great opportunity offered by ICERM, Randy and Victoria.

It is pretty exciting to meet people whom you previously knew only indirectly. One coincidence was that I met Fernando here (for the first time)! We did not know each other before I wrote the IPython post back in November, and I did not expect that we would meet so soon. Anyway, it was great to see this extremely energetic guy in person. Nerdy as I am, I immediately asked him (did I even say hello?) how IPython saves and displays plots, and he quickly showed me on the whiteboard.

Some simple notes as bullet points:

• One big question about reproducible research is: where is the reward system? E.g., why should young researchers spend more time making papers reproducible instead of publishing more papers? There seems to be no immediate reward (although some argue that papers with data/code available have higher citation rates)
• You should look at the Fortran code blacked out in Bill Rider’s slides; it is not that people do not want to share…
• Victoria has a nice historical review of reproducible research
• I learned about MacTutor from Jon Borwein’s talk, which seems to be a good old website like Wikipedia (for math)
• I enjoyed the talk by David Donoho most; they wrote a paper “Deterministic Matrices Matching the Phase Transitions of Gaussian Random Matrices”, and what is truly amazing is that this paper was done through “crowd sourcing”; guess who the “crowd” was? His Stat330/CME362 students (as well as the TA)! They used Dropbox, GIT, runmycode and clusters. The three big points:
  • Math as science: mathematicians should learn science and scientific publication (with an emphasis on publishing empirical results)
  • Research as teaching: teaching can be turned into research; see the above paper
  • Code development as science: I especially resonate with this point. Code development has had mature models and practices for a long time, and it should be the ideal (or standard) paradigm for doing science. It is rare to see an open-source software package published as one single final version; instead we see a series of versions (version control and semantic versioning) that mark the progress of the package, and we have a full history of how it was written, but papers almost always have only one version
• I learned about HOL Light from Tom Hales’ talk, which is a computer program for proving theorems (does it help me write my PhD thesis?)
• David Bailey talked about high-precision computation and reproducibility; I’m not familiar with this area, but the talk was very interesting. E.g., a change in the floating-point library can lead to different observations of particles in physics (some particles might be “gone” after you replace the library); I did not realize numeric precision has such a profound influence

Keep an eye on the workshop website if you are interested.

BTW, to follow up on David’s crowd sourcing, my advisor Di Cook did something similar earlier this year, but less seriously: the students who took Stat585 at Iowa State collaborated on Github on a work of statistical fiction while we were learning GIT in that class, which was actually a lot of fun…

# IPython vs knitr, or Python vs R

November 23, 2012 (https://yihui.name/en/2012/11/ipython-vs-knitr/)

I watched this video by Fernando Pérez a few days ago when I was reading a comment by James Correia Jr on Simply Statistics:

This is absolutely a fantastic talk that I recommend everybody watch (it is good in both form and content). Not surprisingly, I started thinking about IPython vs knitr. Corey Chivers said we could learn a lot from each other, and that is definitely true on my side, since IPython is so powerful now.

## IPython and knitr

I did not take a close look at IPython when I was designing knitr because I’m still at the “hello world” level in Python, and I did not realize until I watched the video that we ended up with some common features, such as:

• support for Markdown/MathJax (to be fair, MathJax comes from RStudio and the markdown package rather than knitr), not to mention HTML and LaTeX
• multi-language integration: *magic in IPython (rmagic, octavemagic, …) vs language engines in knitr (python, ruby, bash, C++, …)
• support for D3 (knitr example; I could not find a live example with IPython at the moment)

Obviously knitr is still much weaker than IPython in many aspects, but some aspects do not really hurt; for example, the user interface. IPython enables one to write in the browser, which looks cool and is indeed useful. We (Carlos, Simon and I) had a similar attempt called RCloud this summer while I was an intern at AT&T Labs, which was a combination of Rserve, FastRWeb, knitr, Markdown and a bunch of JS libraries. The user interface is pretty much like IPython’s; in fact, it was inspired by IPython.

The RCloud project is not completely done yet, but I believe RStudio has done a fair amount of work to make the user interface more friendly, so I’m not terribly worried.

## Python community

That being said, I felt overwhelmed when I saw the Emacs client for ipython in the talk. On one hand,

(I wrote that with Google Translate; I am not sure if it is accurate. I mean that, in terms of nationality, Japanese programmers have surprised me the most; examples include the Ruby language, Kohske Takahashi and kaz_yos.)

On the other hand, the R community is still too small compared to Python’s. I have been looking forward to R Markdown support in Emacs/ESS. The infrastructure on R’s side has been ready for quite a while. ESS developers have been working hard, but we just need more force to spin R up to a higher level in a more timely fashion (embrace the web server, EC2, D3, web sockets, Julia and all the cool stuff; not only generalized linear models).

## R community

Small as it is, the R community is also moving in interesting directions. I especially agree with Jeff Horner’s recent post, Innovation in Statistical Computing, that RStudio has been making remarkable and innovative contributions to the R community. I think one important thing is that RStudio developers are not statisticians like the R core team. The R community absolutely needs this kind of fresh power: a good sense of user interface, good knowledge of modern computing technologies and, most important of all, good project/product managers.

The shiny package is yet another example besides the ones mentioned in Jeff’s post. I think it is interesting to compare shiny with Rserve, FastRWeb, gWidgetsWWW(2), rApache, Rook and older packages like CGIwithR. From a technical point of view, each package in the latter group may be more complicated than shiny (Simon, Jeff and John are extremely smart guys), but apparently shiny has become the Gangnam Style of R web apps. Most users will not, nor do they need to, care about the technology behind the package. A developer may feel it is unfair that the user only sees the nice Twitter Bootstrap style, without noticing the websockets underneath, but that is just a fact.

I always regard R as a general computing language instead of one for statistics only. We need more geeks in the R community, both to understand non-statistical technologies such as Emacs Lisp, JavaScript and httpd, and to connect them to certain aspects of (computational) science such as reproducible research.

## Misc

When I saw the 3D barplot in the talk, I felt R graphics would be able to survive for a while longer.

# Can We Live Without Backslashes?

October 30, 2012 (https://yihui.name/en/2012/10/lyx-vs-latex/)

Two months ago there was a discussion in the ESS mailing list about Emacs/ESS, started by Paul Johnson, who claimed “Emacs Has No Learning Curve”. While this sounds impossible, he has some good points; e.g. he encourages beginners to look at the menu. I think this is a good way to learn Emacs. In the very beginning, who can remember C-x C-f? If one finds it hard even to open a file, how is one supposed to move on? (I taught my wife Emacs/ESS, unfortunately, in the wrong way.)

Then I went off-topic by discussing Emacs vs LyX as the editor for LaTeX, and I quoted an example by Chris Menzel. This is LyX:

And this is Emacs:

The difference is obvious. I find it difficult to understand why so many people enjoy reading LaTeX code, which makes my eyes bleed. My experience of promoting LyX has often been unsuccessful, and I feel some people were born to hate GUIs. Paul Thompson, the last person in that discussion thread, wrote these final comments:

There is no such thing as LaTeX without \

Sorry, folks, but that whole idea is whack. Totally whack. It’s like onions without tears. Onions have good qualities, but there is a cost. LaTeX has good qualities. The cost is \.

If you want formatting without \, use Word. You don’t see a single \ but you don’t see anything of any value.

The whole POINT of a markup language is to see the markup. If you don’t want to see the markup, use a WYSIWYG editor like Word.

Editors which do LaTeX hiding the markup are just another WYSIWYG 3rd order approximation.

Frankly speaking, I agree with none of the above comments. LaTeX is excellent, but that does not mean I must read backslashes for the rest of my life. LyX has done a wonderful job of hiding \ while giving us the full power of LaTeX. If there is anything that cannot be done with the GUI in LyX, we always have ERT (Evil Red Text), i.e. we can input raw LaTeX code in LyX (Ctrl + L). LyX gives us a human-readable interface to LaTeX without sacrificing anything. The whole point of a markup language is not to read the source. If we are to see anything, we see the output. For example, we read a web page instead of its HTML code.

It is undeniable that sometimes we need to read the source code for other purposes such as debugging. LyX has a menu View => View Source to allow one to view the LaTeX source of the current document, which is something I often do. I appreciate the importance of source code, but LaTeX is an exception in my eyes. With LyX, I do not have to start every single LaTeX document with \documentclass{} and remember all the gory commands about the geometry or hyperref package (top margin, inner margin, outer margin, page width, bookmarks, etc etc). That is why I managed to write the documentation of my R packages quickly (e.g. knitr, formatR and Rd2roxygen, etc).

Finally, I do not recommend that LaTeX novices try LyX. It is a bad idea in general to use a GUI for something you do not really understand.

I know this promotion for LyX is, as usual, not going to help. People simply walk by.

Image via. Also see Changing just a tiny little bit in my LaTeX tabular.

# Build Static HTML Help Pages for R Packages

October 29, 2012 (https://yihui.name/en/2012/10/build-static-html-help/)

Many R users may still remember the good old days when we had static HTML documentation for R packages. That was probably before R 2.10.0 (in 2009). Then we got the fancy dynamic HTML help based on the built-in httpd server, but it has never really made much sense to me. Ever since then, I have felt uncomfortable writing package documentation, because each time I update the documentation and reinstall my package, I have to quit R and restart it to see the new documentation; otherwise I see this cryptic error in my web browser if I stay in the same R session:

Error in fetch(key) : internal error -3 in R_decompress1

This is how I feel about the dynamic help system in R:

The greatest potential of the httpd server, in my opinion, is to build truly dynamic documentation for R, something like Sweave that can, for example, run examples in real time. But this has not been done yet (a design has been there for a long time, though). Anyway, it is super cool to see package demos run in the web browser when we click the links in help.start().

## Static HTML Help

I do not often need dynamic help; I just want static HTML pages so I do not have to start R each time I only want to look up the documentation of one function. Below is a function to build the HTML documentation pages for one package using Rd2HTML():

# for one package
static_help = function(pkg, links = tools::findHTMLlinks()) {
  pkgRdDB = tools:::fetchRdDB(file.path(find.package(pkg), 'help', pkg))
  topics = names(pkgRdDB)  # all help topics in the package
  for (p in topics) {
    tools::Rd2HTML(pkgRdDB[[p]], paste(p, 'html', sep = '.'),
                   package = pkg, Links = links, no_links = is.null(links))
  }
}

It is easy to build the HTML documentation for all packages based on the above function:

# for all packages
static_help_all = function() {
  owd = getwd(); on.exit(setwd(owd))
  for (p in .packages(TRUE)) {
    message('* Making static html help pages for ', p)
    setwd(system.file('html', package = p))
    static_help(p)
  }
}

## Rebuild R Help with knitr

That is still not the ideal format to me. R help could have been much more appealing and useful, so here comes knit_rd() in the knitr package, my attempt to rebuild the HTML help pages for R with the examples compiled as well. It is pretty much based on static_help() above, with one additional step: extract the examples code and run it.

My friend Tengfei Yin built his documentation for ggbio with this approach, and you can see the pages here: http://tengfei.github.com/ggbio/docs/man/.

In the development version of knitr, I also added knit_rd_all() to build static html pages for all packages. Once we have static pages, we can add them to bookmarks of our web browser, or publish them to the web (like Tengfei).

Experts would recommend that we build R from source with the --enable-prebuilt-html option. That should be the fundamental solution. As Martin said, it is not a bad exercise after all.

## Staticdocs

Hadley’s staticdocs package has taken this direction much further. It has a cool style based on Bootstrap (which has nothing to do with resampling). See the new ggplot2 documentation website for a fantastic example.

# Visiting FHCRC, JHSPH and Meeting Xi'an

October 29, 2012 (https://yihui.name/en/2012/10/visiting/)

I have been traveling for the last two weeks. I visited the Fred Hutchinson Cancer Research Center on Oct 16, and the Department of Biostatistics at Johns Hopkins, at the invitation of Simply Statistics, on Oct 23. Today Christian Robert visited our department at Iowa State, and I talked to him as well. It is really cool to talk to these guys, and here are some quick notes.

## FHCRC and Bioconductor

I went to Seattle for a tutorial at VisWeek. Since I did not have a talk there, I decided to pay a visit to Fred Hutchinson and Bioconductor. The place has a nice view of Lake Union:

These pictures were taken from the Space Needle later instead of Hutch…

I met Raphael Gottardo (and Mike Jiang in his lab), Dan Tenenbaum and Zach Stednick; the Bioc people were busy with a course at the time, so I did not manage to meet more of them. My biology went back to the kindergarten level after I went to college; it seems only four letters remain in my brain: ATCG (and I kind of remember chromosomes). So it was hopeless to talk about biology.

The real reason I visited them was that all of them had more or less tried my knitr package, which really excited me. During the visit, I saw that switching from Sweave to knitr was still a challenge for them, because R CMD build relies heavily on Sweave to build package vignettes. That is why I posted this wishlist to R core after I came back, hoping alternative packages could be allowed for building package vignettes. FWIW, Sweave was a great invention, but we can do better. Perhaps Bioconductor is a good place to experiment with better R documentation like this and this (Markdown/HTML vignettes).

Coincidentally, I learned that Zach was one of the organizers of the Seattle useR Group, and also a student of Roger Peng, whom I would be visiting the following week.

When I was just about to leave, John Ramey spotted me on his way to get a coffee. My goodness, I had totally forgotten he was at the Hutch as well. John was an early user of knitr, and I had caught him at JSM this year with Ryan Rosario. Since I had to leave to meet Joe (RStudio), I invited him to dinner in the evening, where we had a good chat together with Tengfei Yin.

## RStudio

RStudio has been developing a cool package for building interactive (or more precisely, reactive) web interfaces with R, but I probably should not write too much about it since it is still in an early beta stage. JJ and Joe are really awesome developers. It is amazing to see how quickly they adapted to reference classes in R.

Besides, they are also excellent listeners and take every chance to get feedback from users. I did not know Joe was in Seattle, and he quickly came over to meet me.

## Johns Hopkins

I was thrilled when I received an invitation from Simply Statistics to visit the Biostatistics Department at Johns Hopkins. I have been following their blog since the Duke saga, and I was shocked by the story: non-reproducible research kills people, literally!

This was probably the best visit I have ever had. I like everybody there. I agree with so many of the thoughts of the three authors behind Simply Statistics, Roger, Jeff (aka Leekasso) and Rafa, such as the skepticism about measure theory, the fast journal, the deterministic machine, the point of the point estimate, etc. Sometimes I want to yell at them because they are ruining my PhD: I need time to finish my thesis instead of thinking about all kinds of “weird” ideas. Well, I’m kidding :)

I have been asked quite a few times about my future career plans. Initially I thought I was doomed in academia, since I really do not care much about measure theory. It is a PhD core course at Iowa State, like in most other departments in the US, and we have super professors teaching it. I did a stupid thing when I was taking this course. I asked the professor this:

Have you ever seen a set which is not measurable in mathematical statistics?

Apparently I was bored from the very beginning, when we were trying to construct a non-measurable set. I did not understand why every statistics PhD has to start with measurable sets. I mean, that seems too far away from statistics, where we take many things “for granted”. We assume measurable functions and smoothness up to a certain order of derivatives; the Radon–Nikodym theorem is beautiful, but we rarely go beyond a limited number of density functions in practice, nor do we go beyond the Lebesgue and counting measures. We use the Dirichlet function to show that Riemann integration does not work while the Lebesgue integral exists, but I see no point in such a weird function; most of the time, I still use the rules of Riemann integration.
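To spell out the Dirichlet example (standard textbook material, not from the original course notes):

```latex
D(x) = \mathbf{1}_{\mathbb{Q}}(x) =
\begin{cases}
1, & x \in \mathbb{Q},\\
0, & x \notin \mathbb{Q}.
\end{cases}
```

On $[0,1]$, every upper Riemann sum of $D$ is $1$ and every lower sum is $0$ (both the rationals and the irrationals are dense), so the Riemann integral does not exist; but $\mathbb{Q}$ is countable and hence has Lebesgue measure zero, so $\int_0^1 D \, d\mu = 0$.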

Narrow-minded as I am, will I be able to survive in academia? Before my visit, I thought the answer would definitely be no. Kasper Hansen kind of convinced me this might not be correct. Doing theoretical work is one way to go, but that does not mean doing computing is hopeless. That said, it is not enough to do general computing; one may have to find a very specific direction, e.g. work on a very specific but also very challenging type of data.

### 2. The talk

I gave an informal talk titled “Can you reproduce your homework?”, primarily to students and post-docs there. I had found the hilarious website Research in Progress via Simply Statistics just a few days earlier, and I immediately used some of its funny GIF images in my HTML5 slides. They fit my talk pretty well.

### 3. Behind the tan door

This is my only regret of the visit. Damn it! I should have taken pictures with the great comedians in the department. I did not realize the video “Behind the tan door: getting ahead in academia” was made at JHSPH until I came back. I watched it in January last year, and I loved the humor so much that I watched it several more times after that, so I’m very familiar with the actors’ faces. That is why I immediately recognized Tom Louis when he passed by my door, and I told Roger I knew that face. He was surprised that I knew the video.

Later that day I saw Scott Zeger twice. His face was even more familiar; I can never forget his hilarious logistic regression model on the promotion of junior faculty in that video.

Last year I emailed Steve Goodman (who was behind the camera in the video) and asked him what “tan door” means. I learned that tan was the color of the doors at JHSPH, and that there was a well-known movie in the early years with a similar name (just as he did not dare tell me, I do not dare tell you here). At that time I noticed Steve was at JHSPH, but obviously I forgot it later. Anyway, here is the video, which I highly recommend (re-published with permission):

### 4. Point estimate has no point

I’m exaggerating a little. Someone told me that PowerPoint has neither power nor a point, and similarly I think a point estimate alone has no point. This is not part of the visit; Simply Statistics often has a good point, and their latest blog post, on Nate Silver, was such a good one. The problem is that most of the time the general public only cares about the point estimate and ignores the uncertainty associated with it. In theory, we should look at both.
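A toy illustration of that last sentence (my own numbers, not from the post): two hypothetical surveys with the same point estimate but very different uncertainty.

```r
# two hypothetical polls, both estimating the proportion as 0.52
x <- c(520, 52)    # observed "successes"
n <- c(1000, 100)  # sample sizes
for (i in 1:2) {
  ci <- binom.test(x[i], n[i])$conf.int
  cat(sprintf('estimate %.2f, 95%% CI [%.3f, %.3f]\n',
              x[i] / n[i], ci[1], ci[2]))
}
```

The smaller sample gives a much wider interval, which the point estimate alone completely hides.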

### 5. Reproducible research

In fact I’m not very confident talking about this big topic; as I said in my slides, it is deep. During the visit, they brought up a few challenging problems in practice, while what I have been doing is smallish things such as homework and websites. I’m not sure how things scale to gigabytes of complex data, and I really need to know more about the problems that data practitioners are facing. I’ll announce a website that implements my fast-journal idea in the next few days.

I just got a chance to attend the workshop Reproducibility in Computational and Experimental Mathematics, to be held at ICERM (Brown University), which seems to be a great opportunity to learn more about the broader issues in reproducible research.

## Xi’an (Christian)

Christian Robert’s blog is named Xi’an’s Og, which is why it caught my attention a long time ago. Xi’an happens to be a big city in China, but it has nothing to do with Christian. Obviously it has confused enough Chinese people. He told me that, like me, he was also confused by the road sign PED XING. It took me about two years to figure out what XING means, and Americans may never understand why the Chinese are confused by it: XING happens to be the Chinese Pinyin for the character “walk” (行). We are confused simply because we have no idea why Americans would put up a road sign in Chinese!!

I showed him my work on R packages, and he kindly promoted them on his blog later. It also surprised me that, as a Bayesian, he does not use BUGS, and he is the first person I have ever met who used TeX instead of LaTeX in the early years. I know there must be plenty of such people; they just did not tell me.

He wanted to be able to run R code inside slides, and one of my ideas was to use John Verzani’s gWidgetsWWW2 package. It will certainly work, as I saw John demonstrate at useR! 2012 (he made his slides with this package). I wonder if someone could wrap up a package specifically for creating slides with gWidgetsWWW2.

I asked for his opinion on a PhD thesis based on R packages instead of theorems. I was happy to hear him say, “Well, you know, there are theorems in theses which are never used…” I hope my thesis committee will agree with him.

Ah, a long blog post… If there is a take-home message, it is something I have said many times: you need a website; it really helps.

# A Quick Note On Large 2D Data

October 3, 2012

Two months ago I was told that one of my old blog posts had been borrowed in this post: Finding patterns in big data with SAS/GRAPH. I wrote my post four years ago just for fun. The over-plotting issue is pretty boring to me now, but two things caught my attention:

1. alpha-transparency was a new feature in SAS 9.3 (I found this hard to believe);
2. the SAS code was so much longer (not surprising to me);

Well, I do not care and do not intend to start the boring war. If I were to write that post again, I would add the hexagon plot as yet another alternative.

library(hexbin)
# x is assumed to be a data frame with two numeric columns V1 and V2
with(x, plot(hexbin(V1, V2)))
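The data frame x is not defined in the snippet above; here is a self-contained sketch with simulated data, which also shows the alpha-transparency approach mentioned earlier, in base R graphics:

```r
set.seed(123)
# simulate the kind of large 2D data the original post is about
x <- data.frame(V1 = rnorm(1e5), V2 = rnorm(1e5))
# alpha transparency reveals the density hidden by over-plotting
plot(x$V1, x$V2, pch = 19, col = rgb(0, 0, 0, 0.05))
```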

Two personal announcements:

1. I’ll be attending InfoVis Week in Seattle from Oct 14 to 17;
2. I’ll pay a visit to the Department of Biostatistics at Johns Hopkins from Oct 23 to 24;

I’m happy to meet people nearby during these days. My Seattle schedule is pretty much empty; just let me know.

# R Package Vignettes with Markdown

September 10, 2012

Note: since R 3.0.0, you can build Markdown vignettes naturally; see https://yihui.name/knitr/demo/vignette/ for details.

What is the best resource to learn an R package? Many R users know the almighty question mark ? in R. For example, type ?lm and you will see the documentation of the function lm. If you know nothing about a package, you can take a look at the HTML help by

help.start()

where you can find the complete list of documentation by clicking the link Packages. The individual help pages are often boring and difficult to read, because you cannot see the whole picture of the elephant. That is where package vignettes can help a lot. A package vignette is like a short paper (in fact some are real journal papers), which gives you an overview of the package, sometimes with examples. Package vignettes are not a required component of an R package, so you may not find them in all packages. For packages that do contain vignettes, you can find them via browseVignettes(), e.g. for the knitr package:

browseVignettes(package = 'knitr')
# or go to
system.file('doc', package = 'knitr')

You can also see links to vignettes from help.start(): click Packages and go to the package documentation, or

help.start()
browseURL(paste0('http://127.0.0.1:', tools:::httpdPort(),
                 '/library/knitr/doc/index.html'))

Most vignettes are written in LaTeX/Sweave, since that is the official approach (see Writing R Extensions). In the past Google Summer of Code, Taiyun Wei explored a few interesting directions for the knitr package, and one of them was building HTML vignettes for R packages from Markdown, which is much easier to write than LaTeX.

For package authors who are interested, Taiyun’s corrplot package (on Github) can serve as an example. The markdown vignette is vignettes/corrplot-intro.Rmd. When you run R CMD build corrplot, corrplot-intro.Rmd will be converted to corrplot-intro.html, which you can view in help.start() after R CMD INSTALL corrplot_*.tar.gz.
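For reference, per the note at the top of this post this became official in R 3.0.0; the setup under that mechanism (not the GSoC-era workaround) is roughly the following, where the index entry text is just an example:

```
# in DESCRIPTION
Suggests: knitr
VignetteBuilder: knitr

# metadata near the top of the .Rmd vignette
<!--
%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{An Introduction to corrplot}
-->
```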

Once you have this HTML vignette, you can also publish it elsewhere, for example to RPubs.com or GitHub Pages, to gain more publicity (see an example from the phyloseq package). It is important to make users aware of package vignettes, and a web link is apparently easier to share than browseVignettes() (I felt very uncomfortable when writing the first half of this post because the vignettes are hidden so deep, and hence so hard to describe).

So why not start building an HTML vignette for your package with R Markdown now? Think about animations, interactive content (e.g. googleVis), MathJax equations and other exciting web stuff.

# For the Stupid Password Rules at Iowa State

August 7, 2012

The Fall semester is coming, which means it is time to log into several stupid systems to prepare for the new semester. Time and time again I’m annoyed by the bullshit password rules at Iowa State University. I wrote to the IT staff once but no one ever responded. Here are their rules:

• Must be 8 characters in length
• Must contain at least one letter (a-z, A-Z) or a supported special character (@, #, $ only). All letters are case sensitive.
• Must contain at least one number (0-9)
• Cannot contain spaces or unsupported special characters
• Cannot be re-used after a password expires

OK. Exactly 8 characters. No more. No less. If you are a hacker, you must not try other possibilities. What the bloody hell makes it difficult to support characters other than @, #, $? Just like “All men were created equal, and some are more equal than others”, it seems that for our dear IT people, some special characters are more special?
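A quick back-of-the-envelope calculation (mine, not the IT department’s) of what the fixed length and the restricted character set cost:

```r
# 52 letters + 3 allowed special characters + 10 digits = 65 usable symbols,
# and the length is fixed at exactly 8 characters
65^8   # roughly 3.2e14 possible passwords
# compare with 12 characters drawn from all 95 printable ASCII characters
95^12  # roughly 5.4e23, about a billion times as many
```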

At least one number. You ban most special characters, and then force us to use at least one number to make the password harder to guess. Thank you.

Cannot contain spaces. You must be bitten hard by spaces. I hope you feel better now.

The last rule finally managed to make me mad, because passwords seem to expire randomly (with higher probability before a new semester). I have set new passwords about 8 times after they expired one after another. I’m exhausted and have run out of imagination now. What is worse, I often forget which password I’m using this time, because I have 8 combinations in mind. After 3 wrong attempts, you are not allowed to try further.

I do not want to write down passwords somewhere, but now I have to; otherwise I’ll be desperate next semester again. Of course I will not write them down in plain text. There are many password management software packages. The only problem is that I have to keep the password database somewhere accessible, and I may not be able to open it on other computers.

Anyway, here is my new password:

x1 = c(letters, LETTERS)  # one letter
x2 = c('@', '#', '$')  # one special char
x3 = 0:9  # one number
# 8 characters
p = c(sample(x1, 1), sample(x2, 1), sample(x3, 1), sample(c(x1, x2, x3), 5))
cat(paste(sample(p), collapse = ''), '\n')

I’ll probably leave this as an exercise to Stat579 students in the coming semester.

(via xkcd)

# Making Reproducible Research Enjoyable

June 25, 2012

Note: this is a contributed article for the ICSA Bulletin, and the basic idea can be summarized in this picture.

It is hard to convince people to think about reproducible research (RR). There are two difficulties: (1) the tools used to be for experts only, and (2) it is still common practice to copy and paste. For some statisticians, RR is almost equivalent to Sweave (R + LaTeX).
I love LaTeX, but LaTeX is still hell to many people. I once taught Sweave in a stat-computing class at Iowa State University, and I can tell you their horrible faces after I taught them LaTeX in the first half of the class. I will never do that again.

But RR is really important. If you have not heard of the Deception at Duke, I recommend you watch this video to see how improper data processing killed patients: http://www.cbsnews.com/video/watch/?id=7398476n. Then you should feel guilty when you copy and paste as a statistician.

I fully respect the seminal work of Sweave, but in my eyes it is really a half-done project which did not make much progress in the past few years. I suggested a few features to the R core team, which were often rejected. I understand R is too big to make substantial changes. As a useR, you always have the right to vote with packages, so I wrote the knitr package to fully implement what I thought would be a good engine for RR with R.

The basic idea was the same: mix code and text together, then compile the whole document with the code being executed, and you get a report without copying or pasting anything, since the code will faithfully give you the results. The design was very different from Sweave: knitr is not restricted to a specific format like LaTeX; any output format is possible, including HTML, Markdown and reStructuredText. I will ignore LaTeX in this article, although it took me much more time to work on than the other formats.

I use GitHub extensively and learned Markdown there. For those who are not familiar with Markdown, it is an extremely simple language and you can learn it in five minutes at most: http://en.wikipedia.org/wiki/Markdown. It was almost trivial for me to add support for Markdown in knitr, so we can mix R code and Markdown text together and compile reports quickly. That was the beginning of the story. Later RStudio (the IDE for R) saw the work of knitr and decided to add support for it.
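A minimal sketch of the idea, assuming knitr is installed; the file name and its content are made up for illustration:

```r
library(knitr)

# a tiny hypothetical R Markdown source: Markdown text plus one R chunk
rmd <- c('# A Tiny Report', '',
         'The mean of 1:10 is `r mean(1:10)`.', '',
         '```{r}', '1 + 1', '```')
writeLines(rmd, 'tiny.Rmd')
knit('tiny.Rmd')  # writes tiny.md, with the code executed and results inserted
```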
First we finished the work with Sweave documents, which was painful but rewarding (well, that is LaTeX!). Before that I had finished adding knitr support in LyX (an excellent front-end for LaTeX), and RR became enjoyable somehow, but only for me and perhaps some other LyX users. We could write LaTeX easily and click a button to get a PDF report from LyX, which was quite handy (https://yihui.name/knitr/demo/lyx/).

After the Sweave work was done, I suggested Markdown to the RStudio developers, and fortunately they listened. The progress was fast; soon we had a format named R Markdown in RStudio. That was when I believed RR had become accessible to the general audience. And suddenly a golden glow descended on me, and all my sins were washed away…

Many people seem to have been waiting for a simple format like R Markdown for a long time. The only thing you need to do for a reproducible report is to write code and text. When you write in LaTeX, there are tons of rules to remember, like which characters need to be escaped, or how to write a backslash or tilde, whereas in Markdown, you feel like you are writing emails.

JJ Allaire (one of the RStudio authors) and I were invited to give a talk on RR at useR! 2012 a few days ago, and we successfully converted quite a few people to RR and R Markdown. One of my points was that RR should be made enjoyable. If people suffer from the tools all the time, there is no hope for RR to become common practice. To get people onto the right way, we just need to make the right way easier than the wrong way (one smart guy in the audience said this after we gave a talk to the Twin Cities R User Group). Chris Fonnesbeck, an instructor in Biostatistics at Vanderbilt University, decided to completely ban Word documents in his Bios301 this Fall.
I admire his courage, and I am evil enough to be happy to see Word die, but I will be happier if the students can see why Word sucks and how knitr/RStudio/R Markdown can make things much easier and more beautiful. As I proposed at useR! 2012, we should really start to train students to do their homework assignments in a reproducible manner before they do research in the future. This is not hard now.

Kevin Coombes and Keith Baggerly are the two heroes (and detectives) who revealed the Duke scandal, which I mentioned before. They have been trying to promote Sweave, and I was thrilled at useR! 2012 that Kevin used one slide to introduce knitr in his invited talk. I was also excited when Keith told me R Markdown was cool and he was going to use it in his reports.

There are many other features in knitr which make RR enjoyable. For example, code is highlighted by default so that plain text will not become pain text; for users who do not care about coding styles, their code will be automatically reformatted with the formatR package to make ugly code more readable (Martin Maechler does not like this, but he is an R expert and knows how to format R code); figures will never exceed the page margin in LaTeX output; you do not have to use dirty tricks in order to get multiple figures per chunk; … In all, we get beautiful reports by default, although the beauty here is highly opinionated.

It is always enjoyable when we can embrace the web, where we have lots of fancy technologies. Markdown can be easily translated into HTML, so we can build web applications with knitr as well. Two examples:

1. RPubs.com: you can publish your reports to this website (hosted by RStudio) freely from RStudio, and you can see there have already been a couple of nice reports (just forget about emailing ugly Word documents back and forth)
2. An OpenCPU demo: http://public.opencpu.org/apps/knitr/ (you do not need anything but a web browser, and you can compile a report in the cloud)

You can see what other people have been doing with knitr at https://yihui.name/knitr/demo/showcase/. Let’s stop the old habit of copy and paste. Let the code speak, and in code we trust.

# Learn formatR in Two Minutes

May 8, 2012

Anthony made a video tutorial on how to use the formatR package, which I think is pretty cool. I wish I could speak English as fast as he does…
# How to Make HTML5 Slides with knitr

May 1, 2012

One week ago I made an early announcement about the Markdown support in the knitr package and RStudio, and now version 0.5 of knitr is on CRAN, so I’m back to show you how I made the HTML5 slides. For those who are not familiar with Markdown, you may read the traditional documentation, but RStudio has a quicker reference (see below). The problem with Markdown is that the original invention seems to be too simple, so quite a few variants were derived later (e.g. to support tables); that is another story, and you do not need to worry much about it.

Before you get started, make sure your knitr version is at least 0.5:

# install.packages(c('knitr', 'XML', 'RCurl'))
update.packages(ask = FALSE)
packageVersion('knitr') >= 0.5

## Editor: RStudio

You need to install the RStudio preview version to use its new features for Markdown support.
With this version, you will see an interface like this when you create an R Markdown file (File --> New --> R Markdown). The button MD in the toolbar shows a quick reference for the Markdown syntax, which I believe you can learn in 3 minutes. To start, you can use my example on GitHub: knitr-slides.Rmd, or quickly cook up your own with Ctrl + Shift + I to insert code chunks. You can write headers with # and bullet points with -. It is both quick to write and easy to remember (and readable too). When you are done, just hit the button Knit HTML, and you get a nice HTML page showing your R code and its output. You do not have to learn LaTeX in order to step into the realm of reproducible research. (Did you see the Binomial pmf there?!)

## Converter: Pandoc

What happens behind the scenes is that RStudio calls knitr to compile the Rmd document to a Markdown file (you can see it under the same directory as the Rmd file), and converts this file to HTML. This is a very nice feature, and we can actually go further. Pandoc claims to be a universal document converter, and it is indeed very powerful. For the above example, we can convert the Markdown output (not the Rmd source) to many other formats like HTML, LaTeX, Open Office or Microsoft Office documents. HTML5 slides are also supported. This is the single command that I used to convert knitr-slides.md to DZSlides:

pandoc -s -S -i -t dzslides --mathjax knitr-slides.md -o knitr-slides.html

Then you get an HTML file knitr-slides.html which you can view in a modern web browser. Enjoy.

## Final words

HTML5 slides are just one tiny thing you can play with in Markdown; check out the pandoc documentation to see more possibilities. That being said, I feel most excited about the RStudio integration with knitr and Markdown. LaTeX is beautiful but difficult to learn and laborious to write.
MS Word is the most widely used, but you know… I believe this combination makes reproducible research much more accessible to a general audience, and I hope to see it used in statistics courses, so that students no longer do the tedious job of copy & paste, and professors no longer suffer from ugly Word reports. Now I have done pretty much what I planned in the beginning. The next step will be our GSoC project, in which we will make the toolchain smoother, and work out better ways for R users to document packages and publish web pages (e.g. blogging like a hacker). If you want to follow our latest changes, you may

And a final ad: I will be presenting knitr at useR! 2012 with JJ from RStudio. I’m looking forward to meeting more knitters in Nashville :)

## Fancy HTML5 Slides with knitr and pandoc

*April 22, 2012 · https://yihui.name/en/2012/04/fancy-html5-slides-with-knitr-and-pandoc/*
Karthik Ram gave an Introduction to R a couple of weeks ago, and I strongly recommend you take a look at his cool HTML5 slides. I started trying HTML5 slides last year, and now it is difficult for me to go back to beamer, which I had used for a few years for my presentations. It is horrible to see beamer slides everywhere at academic conferences (especially the classic blue themes). You have probably heard of an interesting blog post by Ben Schmidt about ocean shipping animations in the 18th and 19th centuries. I also played with the dataset a little bit, and made some slides named Voyages of Sinbad the Sailor (use Left/Right or Up/Down to navigate). The source file was written in markdown, compiled by knitr, then converted to DZSlides by pandoc.

I’m using the development version of knitr, which you can install from Github. I plan to release version 0.5 this weekend, and this version will particularly feature the markdown support. You can always read the NEWS file to know what is going on in development. Another piece of news, which may be a little early to announce, is the corresponding support in RStudio. I’m not going to give any details about it right now, but I’m pretty sure the so-called reproducible research and dynamic report generation can be easier than ever very soon! No LaTeX. No worries about HTML/CSS. A simple text file and a single click will give you a reasonably beautiful HTML page. Stay tuned.
## Some Facts about Jeff Leek

*April 14, 2017 · https://yihui.name/en/2017/04/jeff-leek-facts/*
Note: What other facts about Jeff Leek do you “know”? Please feel free to click the edit button above and submit a pull request on Github, or tweet with the hashtag #jeffleekfacts.

I have not written blog posts for quite a while. It is not because I don’t have anything to write about. On the contrary, I have a huge number of things that I could have written about, e.g., how I collect and manage GIFs, and some stories behind the publication of the bookdown book. A lot of things have happened since the last time I wrote a post here. I’ll explain them later this year. Today I started this post only because I love memes and rumors (with no bad intentions), especially those about the guys behind Simply Statistics. These guys are a lot of fun.

So Jeff Leek asked on Twitter the other day about an R interface to Alexa Skills (Cc’ed me probably because of my Shiny Voice app):

“Anyone know of an R package for interfacing with Alexa Skills? @thosjleeper @xieyihui @drob @JennyBryan @HoloMarkeD ?” — Jeff Leek (@jtleek) April 12, 2017

I don’t know anything about Alexa, but the funny thing was that rumors quickly emerged in the replies:

“@DrJWolfson @jtleek @thosjleeper @xieyihui @drob @JennyBryan @HoloMarkeD Jeff Leek can do zero-fold crossvalidation” — Thomas Lumley (@tslumley) April 12, 2017

And I just wanted to collect these unknown “facts” about Jeff A-leek-sa:

• @DrJWolfson: Jeff Leek smooths densities with his bare hands.

• @tslumley: Jeff Leek can do zero-fold crossvalidation.

• @xieyihui: Jeff Leek supports both vector and matrix machines.

• @TrestleJeff: stringsAsFactors defaults to FALSE in Jeff Leek’s presence.

• @rdpeng: Jeff can convert data frames to matrices with his mind.

• @drob: Jeff Leek’s error messages contain the cure for cancer. Unfortunately, he’s never seen one.

• @drob: Any statistic is a sufficient statistic when it’s Jeff Leek using it.
• @seankross: Jeff Leek has no need for the Tidyverse. Any data he touches tidies itself out of a combination of respect and fear.

• @joranelias: All Jeff Leek sequences of random variables converge surely in probability.

• @kennyshirley: Correlation implies whatever Jeff Leek tells it to imply.

• @just_add_data: Jeff Leek doesn’t trade off bias and variance.

• @bcaffo: Jeff Leek counted to infinity. Twice.

• @DrJWolfson: Using only the irrationals.

• @bcaffo: Which reminds me that Jeff Leek can make square root of 2 rational.

• @bcaffo: Jeff Leek can fit a regression line with one point. And get a variance.

• @mjfrigaard0: Jeff Leek once won a Kaggle competition, but was disqualified for using his abacus.

• @mjfrigaard: Git commits to Jeff Leek.

• @mjfrigaard: Jeff Leek can lift the curse of dimensionality by merely glancing at your data.

• @clarkfitzg: All P-values computed by Jeff Leek are significant.

• @jrnld: Singular matrices are so named because Jeff Leek is the only one who can invert them.

• @rikturr: Jeff Leek’s cubic splines are all linear.

• @butterflyology: Jeef Leek supports S3 and S4 classes, at the same time.

• @tpoi: When Jeff Leek uses <-, it tests for equality.

• @Miao_Cai_SLU: Jeff build regression models without endogeneity.

• @brandenco: CRAN checks itself before it submits to Jeff Leek.

• @BeEngelhardt: When Jeff Leek inverts a matrix, time actually reverses in $O(n^3)$.

• @michaelhoffman: Jeff Leek is the Uniformly Most Powerful Jeff.

• @pachamaltese: Jeff Leek always obtains unbiased estimators.

• @rdpeng (a few days later): Ok, who knew that Slack was an acronym?

• @sherrirose: Jeff Leek knew that Slack was an acronym.

• @rdpeng: Nice. I deserved that.

• @EamonCaddigan: Jeff Leek can quickly process several yottabytes of data in memory. HIS memory.

• @butterflyology: If you install Jeff Leek, there are no dependencies. Because Jeff Leek depends on nothing.

• @DrJWolfson: Jeff Leek cannot be replicated, but he is fully reproducible.

• @pdalgd: Jeff Leek can make code work just by waiting for it!

• @TrestleJeff: R Core was founded as a Jeff Leek tribute band.

I’m looking forward to yet more facts.

## A Letter of Recommendation for Nan Xiao

*November 18, 2014 · https://yihui.name/en/2014/11/lor-nan-xiao/*

I hope my letter could boost this guy up like:

I’m not sure if I’m a good observer, but time and time again I feel some people are undervalued, or have not been given good opportunities to show their value. Not surprisingly, I know quite a few such people in the Chinese R/stats community, mainly because of the website Capital of Statistics (COS) that I founded a number of years ago.

I believe Nan Xiao is among these undervalued people, which is why I’m writing a public letter of recommendation for him to apply to a stats/biostats/bioinformatics program in the US. You can go to his website http://r2s.name to learn more about him, and I’m not going to repeat his information here.

As someone who went through the same application process six years ago, I know it is difficult to get an offer unless you are from a top university in China in the eyes of the admission committee. By “top” (in terms of the major of statistics), it basically means Peking U, Tsinghua, USTC, Fudan, Beijing Normal U, and perhaps one or two other universities. My alma mater was Renmin U (in Beijing), which nobody knows, unfortunately. The statistics program at Renmin actually has got the highest ranking in China this year, and I’m not surprised at all. Perhaps Renmin does not offer the best math training to students in statistics, but I think its program is well balanced between application and theory. In recent years, they have been putting more emphasis on the math training to catch up with the “top” universities. Personally I do not think this is a good idea, but it seems to make the admission committees in the US more comfortable. Anyway, the first driving force of my admission to Iowa State U was probably my work on the animation package, which was also why I was acquainted with my PhD advisors Di and Heike before I applied to Iowa State.

To some degree, I was very fortunate, since my research interest at that time, statistical graphics, was not the “mainstream” in statistics (it still is not), and it happened that there were two professors with the same research interest, so it was fairly easy to make the deal. Nan’s interests (machine learning/bioinformatics) are broader than mine, and I think he will consequently face more competition. Given his education background from a university that is not widely known, I’m trying to make him more visible, although my influence might be very limited. I believe he will make better contributions during his PhD training than I did, if his potential is well utilized.

I have known Nan for quite a few years. We have physically met only once during the 6th Chinese R conference last year, but I have been reading his forum posts in COS and blog posts since circa 2008. He is one of the best hackers that I know, with a very good sense of beauty. Apparently, hacking skills are becoming more and more important in this age of data (excuse me, but I do hate saying “big data” when “big” is meaningless). Let me enumerate some of my observations about him:

• He knows the web well (scraping data, security issues, and so on). To his future advisor/department, this means he could be very helpful if you need to obtain data from the web, and he may be able to improve the department IT support, which, in my experience, often sucks.
• He is a superb presenter, with an outstanding style of his own, which you can see from his past talks (it does not matter if you do not understand Chinese). You may underestimate the importance of this, but please recall how much you (or is it just me?) wanted to fall asleep during the Joint Statistical Meetings, when everybody was using the same blue Beamer style, with pages after pages of equations.
• My favorite illustration among his blog posts is this one: http://r2s.name/cn/r/ria.html
• He has deep interests in data visualization, in particular, network visualization. Look at his list of papers on his website! Aren’t those graphs beautiful?
• He has worked with other people on the translation of three books into Chinese. To translate a book, you certainly have to understand it. You probably should not have any doubts about how well he knows R, graphics, and data mining methods.
• I do not formally collaborate with him very often, but you may want to look at the SVD example in his projects. He did it after I said “How about a Shiny app?”. I believe there have been many other SVD examples with a similar idea, but I was still impressed by how quickly he made it. If you are familiar with Shiny, you may also be impressed by his taste in design (I love the “Crouching Tiger Hidden Dragon” picture. Looks so cool!).
• I know little about bioinformatics, chemoinformatics, or pharmacology, so I’m not going to comment on these specifics. There is one thing that I’m sure about, though, which is his eagerness to make substantial contributions to science that can eventually benefit people. I trust his sincerity.

If you are looking for a PhD student in a stats-related program in 2015, please consider this guy. A job in the industry is also a possibility for him, so please also consider him if your company has a position for an awesome hacker. Email me if you have more questions, or forward my blog post to your colleagues/friends who might be interested.

## A Few Notes on UseR! 2014

*July 26, 2014 · https://yihui.name/en/2014/07/a-few-notes-on-user2014/*

It has been a month since the UseR! 2014 conference, and I’m probably the last one to write about it. UseR! is my favorite conference because it is technical and not too big. I have completely lost interest in big and broad conferences like JSM (to me, it has become Joint Sightseeing Meetings). Karl has written two blog posts about UseR! (1-2, 3-4), and I’m going to add a few more observations here. An important disclaimer before I move on: Karl Broman is not responsible for videotaping at UseR! 2014 (neither am I), and he is not even on the conference committee. I accidentally mentioned the videos when replying to his tweet, which unfortunately seemed to have caused confusion. Any questions about the conference, including when the videos will be published, should be directed to the official organizing committee.

The conference website is hosted on Github. Awesome. Speakers can add links to their slides through pull requests. Genius. It is a little sad, though, that each UseR conference has its own website and twitter handle. There should have been a single website, a single domain name, and a single twitter account managing all the R conferences each year. Fragmentation is just such a natural thing in programmers’ world.

The on-campus dorms were fantastic (oh when I wrote “dorm” I almost typed “dnorm”), which saved us time on transportation. Dining halls were on campus as well. Breakfast was perfect, although I could not stand eating sandwiches, hamburgers, or pizza for four days. Okay, let’s talk about the talks. You can find most of the slides on the conference website.

• John Chambers: the three promising projects Rcpp/Rcpp11, RLLVM, and h2o. I do not know anything about h2o. I think Rcpp has been a great success, and I have my blind faith in Romain for Rcpp11. For RLLVM, I’m a little concerned: 15 stars, and 4 forks on Github. That is not a good sign. The bus factor is too low. Well, I have been admiring Duncan for a huge number of his amazing packages. Perhaps he can handle this one on his own as well.
• Ramnath Vaidyanathan: all JavaScript libraries are belong to Ramnath! If you want to get a certain JS library integrated with R, tell him the night before, go to sleep, and you will see it in the R world the next morning. I’m only partially kidding :)
• Gordon Woodhull: it was great to see the substantial progress in RCloud.
• Jeroen Ooms: I came from the RWeb age, so you know how excited I was when I saw OpenCPU a couple of years ago. There were things I wished I could do for years but were too complicated before OpenCPU was launched. For example, the knitr issue #51 was probably my first experiment with OpenCPU, and I had a lot of fun with it.
• Jonathan Godfrey: I did not attend his talk, but he attended my tutorial and talked to me a couple of times during the conference. That was the first time I had talked to a blind R user, and was surprised to know a few facts:
  • PDF is bad for blind people, and HTML is much better (think R package vignettes);
  • it is nearly impossible for them to read raster images, and SVG graphics can be better;
  • if there is an image in an HTML document, its alt attribute is very important;
  • LaTeX math expressions created by MathJax look excellent in the eyes of sighted people, but they seem to be hard to read for the blind (Jonathan mentioned to me afterwards that it may be possible to configure MathJax to make it readable, but he is not sure how; the math expressions on Wikipedia are displayed as images with the alt attribute, and he can read those math expressions);
• Matt Dowle: his data.table story with Patrick Burns was pretty interesting. You can read his slides.
• Aran Lunzer: LivelyR. It was not a talk. It was simply magic. I had absolutely no clue how they made it, even though I’m one of the (co-)authors of the packages that they used.
• Dirk Eddelbuettel: Rcpp and Docker. I was really glad that he mentioned Docker. Travis CI seems to have attracted a lot of attention of R package authors after I was inspired by a reader of my blog and experimented with it last year. The major missing piece on Travis CI is R for R users. apt-get install r-base every time is a waste of time and resources. It will be nice if one can build one’s own virtual machine with all the necessary packages. This is pretty simple with Docker. However, I have not found a free service like Travis CI that allows the users to build/test software with custom Docker containers.
• Martin Mächler: good practices in R programming. You can find the slides on his homepage. This was an excellent talk: precise and clear. I recommend everyone to read his slides. One minor and subjective issue is = vs <-. Not many people are with me, and once Alyssa Frazee’s post made me cry a little in the restroom (so excited to find another person using = in R).
• Andy Chen: RLint. Programming styles? I guess a single programming style will never happen. I insist on using = instead of <- for assignment in R, except when I collaborate with the left arrow party. Roger Peng insists on 8 spaces for indentation. Excuse me? What is a programming style?
• John Nash: it was a great pleasure to meet John in person for the first time. I love hearing old stories from senior people, such as Jeff Laake telling me early stories about ADMB and CRAN, and John Kimmel mentioning John Tukey and the development of interactive graphics at Bell Labs. John (Nash) showed me some Fortran, Pascal, and BASIC programs that were even older than I am. Personally I have no interest in these languages, but it was interesting to know what was done before you were born, and some of the programs are still “alive” in R. Since he was so eager to run Fortran code in knitr documents, we sat together for a few minutes, he wrote a Fortran example, and I just added a quick and dirty Fortran engine in knitr.
• I gave a talk titled “Knitr Ninja”, and a few people remembered sword(2) after that. I was extremely bored by myself after I had given so many talks on knitr, so I thought I should do a completely different talk that nobody had heard of, including my bosses at RStudio. It turned out that the RStudio viewer was pretty handy for presentations, and I could show Kakashi in it:

• Katharine Mullen: JSS. Honestly I’m a little concerned about JSS, although I strongly believe it is an outstanding journal. I have only published one paper in it (the animation package), and the review process was too slow. Four months goes by and you hear nothing back so you ask what’s up. Then four months goes by, you get the first round of review. Sometimes I do not quite understand how free journals work, or what motivates the anonymous reviewers. I think this is a pretty hard problem, and I would propose to open up the journal to the wild just like open source software. Even 58 editors on board is still too few compared to the number of authors and submissions. I was extremely excited that Jan de Leeuw’s very first proposal for establishing such a journal was that the journal should be done in HTML!! And interactive, where possible!! That was the year 1995 (I was still in the fifth grade in elementary school learning fractions in a village). Twenty years later, I think the infrastructure is good enough (e.g. R Markdown, Shiny, Shiny Server) to go back to his original proposal. I would love to see papers in HTML instead of PDF. Typesetting with HTML is a whole lot easier and more attractive than LaTeX/PDF in my opinion, and there is a whole lot more interesting stuff to play with in HTML. With R Markdown v2, HTML and PDF are not mutually exclusive, although we will have to give up certain markups in LaTeX, but man, do you really need \proglang{} and \pkg{}?

Finally it is rumor time:

• iris has been officially declared (? by whom? perhaps by the many sleepy faces in the audience) as the dataset porn in R, and the next candidate will be ggplot2::diamonds!
## library() vs require() in R

*July 26, 2014 · https://yihui.name/en/2014/07/library-vs-require/*

While I was sitting in a conference room at UseR! 2014, I started counting the number of times that require() was used in the presentations, planning to rant about it once I had counted to ten. With drums rolling, David won this little award (sorry, David, I did not really mean to pick on you).

Ladies and gentlemen, I've said this before: require() is the wrong way to load an R package; use library() instead #useR2014

— Yihui Xie (@xieyihui) July 2, 2014

After I tweeted about it, some useRs seemed to be unhappy and asked me why. Both require() and library() can load (strictly speaking, attach) an R package. Why should one not use require()? The answer is pretty simple. If you take a look at the source code of require (use the source, Luke, as Martin Mächler mentioned in his invited talk), you will see that require() basically means “try to load the package using library() and return a logical value indicating the success or failure”. In other words, library() loads a package, and require() tries to load a package. So when you want to load a package, do you load a package or try to load a package? It should be crystal clear.

One bad consequence of require() is that if you require('foo') at the beginning of an R script, and use a function bar() from the foo package on line 175, R will throw an error (object ‘bar’ not found) if foo was not installed. That is too late, and sometimes difficult to understand for other people who use your script but are not familiar with the foo package – they may ask, what is the bar object, and where is it from? When your code is going to fail, fail loudly, early, and with a relevant error message. require() does not signal an error, and library() does.
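
The contrast can be seen in a two-line session. This is a minimal sketch, assuming a package named foo that is not installed (the name is purely illustrative):

```r
# require() merely warns and hands back a logical; the script keeps running
ok <- require('foo')  # warning: there is no package called 'foo'
print(ok)             # FALSE -- easy to overlook in a long script

# library() stops right here, loudly and early, with a real error
library('foo')        # Error in library("foo"): there is no package called 'foo'
```

So a script using library() fails on the line where the missing dependency is named, instead of 170 lines later at the first undefined function.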

Sometimes you do need require() to use a package conditionally (e.g. the sun is not going to explode without this package), in which case you may use an if statement, e.g.

if (require('foo')) {
  awesome_foo_function()
} else {
  warning('You missed an awesome function')
}

That should be what require() was designed for, but it is common to see R code like this as well:

if (!require('foo')) {
  stop('The package foo was not installed')
}

Sigh.

• library('foo') stops when foo was not installed
• require() is basically try(library())

Then if (!require('foo')) stop() basically means “if you failed to try to load this package, please fail”. I do not quite understand why it is worth the trouble, except when one wants an error message different from the one given by library(); otherwise one can simply load and fail.

There is one legitimate reason to use require(), though, and that is, “require is a verb and library is a noun!” I completely agree. require should have been a very nice name to choose for the purpose of loading a package, but unfortunately… you know.

If you take a look at the StackOverflow question on this, you will see a comment on “package vs library” was up-voted a lot of times. It used to make a lot of sense to me, but now I do not care as much as I did. There have been useRs (including me up to a certain point) desperately explaining the difference between the two terms package and library, but somehow I think R’s definition of a library is indeed unusual, and the function library() makes the situation worse. Now I’m totally fine if anyone calls my packages “libraries”, because I know what you mean.

Karthik Ram suggested this GIF to express “Ah a new library, but require? Noooooo”:

Since you have read the source code, Luke, you may have found that you can abuse require() a bit, for example:

> (require(c('MASS', 'nnet')))
Failed with error:  ‘'package' must be of length 1’
the condition has length > 1 and only the first element will be used
[1] FALSE

> (require(c('MASS', 'nnet'), character.only = TRUE))
Failed with error:  ‘'package' must be of length 1’
the condition has length > 1 and only the first element will be used
[1] FALSE

> library(c('MASS', 'nnet'), character.only = TRUE)
Error in library(c("MASS", "nnet"), character.only = TRUE) :
'package' must be of length 1

So require() failed not because MASS and nnet did not exist, but because of a different error. As long as there is an error (no matter what it is), require() returns FALSE.

One off-topic remark while I’m talking about these two functions: the argument character.only = FALSE for library() and require() is a design mistake in my eyes. It seems the original author(s) wanted to save users from typing the quotes around the package name, so library(foo) works like library("foo"). Once you show people they can be lazy, you can never pull them back. Apparently, the editors of JSS (Journal of Statistical Software) have been trying to promote the form library("foo") and discourage library(foo), but I do not think it makes much sense now, nor will it change anything. If this were the 1990s, I’d wholeheartedly support it. It is simply way too late now. Yes, two extra quotation marks will kill many kittens on this planet. If you are familiar with *nix commands, this idea is not new – just think about tar -z -x -f, tar -zxf, and tar zxf.
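
For the record, character.only = TRUE is what you need when the package name is stored in a variable; a small sketch:

```r
pkg <- 'stats'  # the package name as a character string
# library(pkg) would look for a package literally named "pkg";
# character.only = TRUE makes library() use the value of pkg instead:
library(pkg, character.only = TRUE)
'stats' %in% loadedNamespaces()  # TRUE
```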

One last mildly annoying issue with require() is that it is noisy by default, because of the default quietly = FALSE, e.g.

> require('nnet')
Loading required package: nnet
> require('MASS', quietly = TRUE)

So when I tell you to load a package, you tell me you are loading a package, as if you had heard me. Oh thank you!

(The post above, “library() vs require() in R”, was published on July 26, 2014: https://yihui.name/en/2014/07/library-vs-require/)

# Markdown or LaTeX?

October 19, 2013
https://yihui.name/en/2013/10/markdown-or-latex/

What happens if you ask for too much power from Markdown?

R Markdown is one of the document formats that knitr supports, and it is probably the most popular one. I have been asked many times about the choice between Markdown and LaTeX, so I think I’d better wrap up my opinions in a blog post. These two languages (do you really call Markdown a language?) are at two extremes: Markdown is super easy to learn and type, but it is primarily targeted at HTML pages, and you do not have fine control over typesetting (really? really?), because you only have a very limited number of HTML tags in the output; LaTeX is relatively difficult to learn and type, but it allows you to do precise typesetting (you have control over anything, which is probably why a lot of time can be wasted).

## What is the problem?

What is the root problem? I think one word answers everything: page! Why do we need pages? Printing is the answer.

In my eyes, the biggest challenge for typesetting is to arrange elements properly with the restriction of pages. This restriction seems trivial, but it is really the root of all “evil”. Without having to put things on pages, life can be much easier in writing.

What is the root of this root problem in LaTeX? One concept: floating environments. If everything came in a strictly linear fashion, writing would just be writing; typesetting would be no big deal. But because a graph cannot be broken over two pages, it is hard to find a place to put it, and by default it can float to unexpected places. The same problem can happen to tables (see the end of a previous post). You may have to add or delete some words to make sure they float to proper places. That is endless trouble in LaTeX.
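
For readers less familiar with LaTeX, a floating figure looks like the sketch below; the placement options are only suggestions, and LaTeX may still move the figure elsewhere (the file name is hypothetical):

```latex
\begin{figure}[htbp] % h=here, t=top, b=bottom, p=float page -- suggestions only
  \centering
  \includegraphics{my-plot.pdf} % a graph cannot be broken across pages
  \caption{LaTeX decides where this figure actually lands.}
\end{figure}
```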

There is no such problem in HTML/Markdown, because there are no pages. You just keep writing, and everything appears linearly.

## Can I have both HTML and PDF output?

There is no harm in being greedy, and it is natural to ask whether one can have both HTML and PDF output from a single source document. The answer is: maybe. You can go from LaTeX to HTML, or from Markdown to LaTeX/PDF.

• pandoc can convert Markdown to almost anything
• many tools to convert LaTeX to HTML

But remember, Markdown was designed for HTML, and LaTeX was designed for PDF and related output formats. If you ask for more power from either language, the result is not likely to be ideal; otherwise one of them would have to die.

## How to make the decision?

If your writing does not involve complicated typesetting and primarily consists of text (especially no floating environments), go with Markdown. I cannot think of a reason why you must use LaTeX to write a novel. See Hadley’s new book Advanced R programming for an excellent example of Markdown + knitr + other tools: the typesetting elements in this book are very simple – section headers, paragraphs, and code/output. That is pretty much it. Eventually it should be relatively easy to convert those Markdown files to LaTeX via Pandoc, and publish a PDF using the LaTeX class from Chapman & Hall.

For the rest of you, what I’d recommend is to think early and make a decision at the beginning; avoid having both HTML and PDF in mind. Ask yourself only one question: must I print the results nicely on paper? If the answer is yes, go with LaTeX; otherwise just choose whatever makes you comfortable. The book Text Analysis with R by Matthew Jockers is an example of LaTeX + knitr. Matt also asked me this question about Markdown vs LaTeX last week while he was here at Iowa State. For this particular book, I think Markdown would probably be OK, although I’m not quite sure about a few environments in the book, such as the chapter abstracts.

It is not obvious whether we must print certain things; I think we are just too used to printing. For example, dear professors, must we print our homework? (Apparently Jenny does not think so; I saw her grade homework on RPubs.com!) Or dear customers, must we submit reports in PDF? In this era you have laptops, iPads, Kindles, tablets, and all kinds of electronic devices that can show rich media, so why must you print everything (in black and white)?

For those who are still reading this post, let me finish with a side story: Matt, a LaTeX novice, taught himself LaTeX a few months ago, and he has finished the draft of a book with LaTeX! Why are you still hesitating about the choice of tools? Shouldn’t you just go ahead and get the * done? Although all roads lead to Rome, some people die at the starting line instead of on the roads.

# Testing R Packages

September 30, 2013
https://yihui.name/en/2013/09/testing-r-packages/

This guy th3james claimed Testing Code Is Simple, and I agree. In the R world, this is not anything new. As far as I can see, there are three schools of R users with different testing techniques:

1. tests are put under package/tests/, with a reference output file foo-test.Rout.save generated by R CMD BATCH foo-test.R; testing is done by comparing the foo-test.Rout produced by R CMD check with your foo-test.Rout.save, and R notifies you when it sees text differences; this approach is typically used by R core and its followers
2. RUnit and its followers: formal ideas were borrowed from other languages and frameworks, and it looks like there is a lot to learn before you can get started
3. the testthat family: tests are expressed as expect_something() like a natural human language

At its core, testing is nothing but “tell me if something unexpected happened”. The usual way to tell you is to signal an error. In R, that means stop(). A very simple way to write a test for the function FUN() is:

if (!identical(FUN(arg1 = val1, arg2 = val2, ...), expected_value)) {
  stop('FUN() did not return the expected value!')
}

That is, when we pass the values val1 and val2 to the arguments arg1 and arg2, respectively, the function FUN() should return a value identical to our expected value, otherwise we signal an error. If R CMD check sees an error, it will stop and fail.
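
A runnable instance of this pattern, with a hypothetical add() function in place of FUN():

```r
add <- function(x, y) x + y  # the function to be tested
if (!identical(add(1L, 2L), 3L)) {
  stop('add() did not return the expected value!')
}
# no news is good news: nothing is printed when the test passes
```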

For me, I only want one thing from unit testing: the non-exported functions should be visible to me during testing. Unit testing should have all “units” available, but R’s namespace mechanism intentionally restricts the objects that are visible to the end users of a package, which is a Very Good Thing for end users. It is less convenient for the package author, who will have to use the triple-colon syntax such as foo:::hidden_fun() when testing the function hidden_fun().

I wrote a tiny package called testit after John Ramey dropped by my office one afternoon while I was an intern at the Fred Hutchinson Cancer Research Center last year. I thought for a while about the three testing approaches, and decided to write my own package because I did not like the first approach (text comparison), and I did not want to learn or remember the new vocabulary of RUnit or testthat. There is only one function for the testing purpose in this package: assert().

assert(
  "1 plus 1 is equal to 2",
  1 + 1 == 2
)

You can write multiple testing conditions, e.g.

assert(
  "1 plus 1 is equal to 2",
  1 + 1 == 2,
  identical(1 + 1, 2),
  (1 + 1 >= 2) && (1 + 1 <= 2), # mathematician's proof
  c(is.numeric(1 + 1), is.numeric(2))
)

There is another function test_pkg() to run all tests of a package using an empty environment with the package namespace as its parent environment, which means all objects in the package, exported or not, are directly available without ::: in the test scripts. See the CRAN page for a list of packages that use testit, for example, my highr package, where you can find some examples of tests.

While I do not like the text comparison approach, that does not mean it is not useful. Actually, it is extremely useful for testing text document output; it is just a little awkward for testing function output. The text comparison approach plays an important role in the development of knitr: I have a Github repository knitr-examples, which serves as both an example repo and a testing repo. When I push new commits to Github, I use Travis CI to test the package, and the tests have two parts: one runs R CMD check on the package, which uses testit to run the test R scripts, and the other re-compiles all the examples and runs git diff to see if there are changes. I have more than 100 examples, which should give reasonable coverage of possible problems introduced by new changes in knitr. This way, I feel comfortable when I bring new features or make changes to knitr, because I know they are unlikely to break old documents.

If you are new to testing and only have 3 minutes, I’d strongly recommend that you read at least the first two sections of Hadley’s testthat article.

# After Three Months I Cannot Reproduce My Own Book

September 5, 2013
https://yihui.name/en/2013/09/cannot-reproduce-my-own-book/

TL;DR I thought I could easily jump to a high standard (reproducibility), but I failed.

Some of you may have noticed that the knitr book is finally out. Amazon is offering a good price at the moment, so if you are interested, you’d better hurry up.

I avoided the phrase “Reproducible Research” in the book title, because I did not want to take that responsibility, although it is related to reproducible research in some sense. The book was written with knitr v1.3 and R 3.0.1, as you can see from my sessionInfo() in the preface.

Three months later, several things have changed, and I could not reproduce the book, but that did not surprise me. I’ll explain the details later. Here I have extracted the first three chapters, and released the corresponding source files in the knitr-book repository on Github. You can also find the link to download the PDF there. This repository may be useful to those who plan to write a book using R.

The things I could not reproduce were not really important. The major change in recent knitr versions was the syntax highlighting commands, e.g., \hlcomment{} is \hlcom{} now, and the syntax highlighting has been improved by the highr package (sorry, Romain). This change shows up as a fair number of differences in git diff, but they are only cosmetic.

I tried my best to avoid writing anything that is likely to change in the future into the book, but as a naive programmer, I have to say sorry that I have broken two little features, although they may not really affect you:

• the preferred way to stop knitr in case of errors is to set the chunk option error = FALSE instead of the package option stop_on_error, which has been deprecated (Section 6.2.4);
• for external code chunks (Section 9.2), the preferred chunk delimiter is ## ---- instead of ## @knitr now;

Actually, backward compatibility is still preserved, so nothing will really break for a long time.

With exactly the same software environment, I think I can reproduce the book, but that does not make much sense. Things are always evolving. Then there are two types of reproducible research:

1. the “dead” reproducible research (reproduce in a very specific environment);
2. the reproducible research that evolves and generalizes;

I think the latter is more valuable. Being reproducible alone is not the goal, because you may be reproducing either real findings or simply old mistakes. As Roger Peng wrote,

[…] reproducibility cannot really address the validity of a scientific claim as well as replication

Roger’s three recent blog posts on reproducible research are well worth reading. This blog post of mine is actually not quite relevant (no data analysis here), so I recommend that my readers move over there after checking out the knitr-book repository.

# My first Bioconductor conference (2013)

July 21, 2013
https://yihui.name/en/2013/07/bioconductor-2013/

The BioC 2013 conference was held from July 17 to 19. I attended this conference for my first time, mainly because I’m working at the Fred Hutchinson Cancer Research Center this summer, and the conference venue was just downstairs! No flights, no hotels, no transportation, yeah.

Last time I wrote about my first ENAR experience, and let me tell you why the BioC conference organizers are smart in my eyes.

## A badge that never flips

I do not need to explain this simple design – it just will not flip to the damn blank side:

## The conference program book

The program book was only four pages of the schedule (titles and speakers). The abstracts are online. Trees saved.

## Lightning talks

There were plenty of lightning talks. You can talk about whatever you want.

## Live coding

On the developer’s day, Martin Morgan presented some buggy R code to the audience (provided by Laurent Gatto), and asked us to debug it right there. Wow!

## Everything is free after registration

The registration included almost everything: lunch, beer, wine, coffee, fruits, snacks, and most importantly, Amazon Machine Images (AMIs)!

## AMI

This is a really shiny point of BioC! If you have ever tried to do a software tutorial, you probably know the pain of setting up the environment for your audience, because they use different operating systems, different versions of packages, and who knows what is going to happen after you are on your third slide. At a workshop last year, I had the experience of spending five minutes figuring out why a keyboard shortcut did not work for one Canadian lady in the audience, and it turned out she was using the French keyboard layout.

The BioC organizers solved this problem beautifully by installing the RStudio server on an AMI. Every participant was sent a link to the Amazon virtual machine, and all they needed was a web browser and a wireless connection in the room. Everyone ran R in exactly the same environment.

Isn’t that smart?

## Talks

I do not really know much about biology, although a few biological terms have been added to my vocabulary this summer. When a talk becomes biologically oriented, I have to give up.

Simon Urbanek talked about big data in R this year, which is unusual, as he mentioned himself. Normally he shows fancy graphics (e.g., iplots). I did not realize the significance of this R 3.0.0 news item until his talk:

It is now possible to write custom connection implementations outside core R using R_ext/Connections.h. Please note that the implementation of connections is still considered internal and may change in the future (see the above file for details).

Given this new feature, he implemented the HDFS connections and 0MQ-based connections in R single-handedly (well, that is always his style).

You probably have noticed the previous links are Github repositories. Yes! Some R core members really appreciate the value of social coding now! I’m sure Simon does. I’m aware of other R core members using Github quietly (DB, SF, MM, PM, DS, DTL, DM), but I do not really know their attitude toward it.

Joe Cheng’s Shiny talk was shiny as usual. Each time I attend his talk, he shows a brand new amazing demo. Joe is the only R programmer who makes me feel “the sky is the limit (of R)”. The audience was shocked when a heatmap they were so familiar with suddenly became interactive in a Shiny app! BTW, Joe has a special sense of humor when he talks about an area in which he is not an expert (statistics or biology).

RStudio 0.98 is going to be awesome. I’m not going to provide the links here, since it is not released yet. I’m sure you will find the preview version if you really want it.

## Bragging rights

• I met Robert Gentleman for the first time!
• I dared to fall asleep during Martin Morgan’s tutorial! (sorry, Martin)
• some Bioconductor web pages were built with knitr/R Markdown!

## Next steps

Given Bioconductor’s open-mindedness toward new technologies (GIT, Github, AMI, Shiny, …), let’s see if it is going to take over the world. Just kidding. But not completely kidding. I will keep the conversation going before I leave Seattle around mid-August, and hopefully get something done.

If you have any feature requests or suggestions to Bioconductor, I will be happy to serve as the “conductor” temporarily. I guess they should set up a blog at some point.

# R Package Versioning

June 27, 2013
https://yihui.name/en/2013/06/r-package-versioning/

This should be what it feels like to bump the major version of your software:

For me, the main reason for package versioning is to indicate the (slight or significant) differences among different versions of the same package; otherwise we could keep on releasing version 1.0 forever.

That seems to be a very obvious fact, so here are my own versioning rules, with some ideas borrowed from Semantic Versioning:

1. a version number is of the form major.minor.patch (x.y.z), e.g., 0.1.7
2. only the version x.y is released to CRAN
3. x.y.z is always the development version, and each time a new feature or a bug fix or a change is introduced, bump the patch version, e.g., from 0.1.3 to 0.1.4
4. when one feels it is time to release to CRAN, bump the minor version, e.g., from 0.1 to 0.2
5. when a change is crazy enough that many users are presumably going to yell at you (see the illustration above), it is time to bump the major version, e.g., from 0.18 to 1.0
6. the version 1.0 does not imply maturity; it is just because it is potentially very different from 0.x (such as API changes); same thing applies to 2.0 vs 1.0
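
These rules can be checked mechanically with base R’s package_version(), which orders major.minor.patch numbers correctly (the example version numbers are made up):

```r
# patch bumps during development (rule 3) compare as expected:
package_version('0.1.4') > package_version('0.1.3')  # TRUE
# and a major bump (rule 5) outranks any 0.x version:
package_version('1.0') > package_version('0.18')     # TRUE
```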

I learned rule #3 from Michael Lawrence (author of RGtk2), and I think it is a good idea. In particular, it is important for brave users who dare to install the development versions: when you ask them for their sessionInfo(), you will know which stage they are at.

Rule #2 saves us a little bit of energy, in the sense that we do not need to write or talk about “the foo package 1.3.548”, which is boring to type or say; normally we just say foo 1.3. As a person whose first language is not English, speaking the patch version consumes my brain memory and slows down my thinking while I’m talking. When I say it in Chinese, it feels tedious and unnecessarily geeky. Yes, I know I always have weird opinions.

# A Few Tips for Writing an R Book

June 3, 2013
https://yihui.name/en/2013/06/tips-for-writing-an-r-book/

I just finished fixing (hopefully all) the problems in the knitr book returned from the copy editor. David Smith kindly announced this book before I did. I do not have much to say about it: almost everything in the book can be found in the online documentation, questions & answers, and the source code. The point of buying this book is perhaps that you do not have time to read through all the two thousand questions and answers online, and I did that for you.

This is my first book, and obviously there has been a lot for me to learn about writing one. In retrospect, I want to share a few tips that I found useful (in particular, for those who plan to write for Chapman & Hall):

1. Although it sounds like shameless self-promotion, using knitr made it a lot easier to manage R code and its output for the book; for example, I could quickly adapt to R 3.0.1 from 2.15.3 after I came back from a vacation; if I were to write a second edition, I do not think I would have big trouble with my R code in the book (it is easy to make sure the output is up-to-date);

2. I put my source documents under version control, which helped me watch the changes in the output closely; for example, I noticed the source code of the function fivenum() in base R was changed from R 2.15.3 to 3.0.0 thanks to GIT (R core have been updating base R everywhere!);

3. (opinionated) Some people might be very bored to hear this: use LyX instead of plain LaTeX… because you are writing, not coding; LaTeX code is not fun to read…

4. for the LaTeX document class krantz.cls (by Chapman & Hall):

• to solve the only stupid problem in LaTeX (i.e., floating environments float to silly places by default), use something like this:

\renewcommand{\textfraction}{0.05}
\renewcommand{\topfraction}{0.8}
\renewcommand{\bottomfraction}{0.8}
\renewcommand{\floatpagefraction}{0.75}

I’m aware of the float package and the H option, and options like !tbp; I just do not want to force LaTeX to do anything – it may or may not be happy at some point.

• put \usepackage{emptypage} in the preamble to make empty pages really empty, as required by the copy editor.
• the document class krantz.cls does not work with the hyperref package, meaning that you cannot create bookmarks in the PDF; I have posted the solution here.
5. for authors whose native language is not English like me, here is a summary of my problems in English:

• when you want to use which, use that instead, unless there is a comma ahead, or you really want to emphasize a very specific object; e.g.,

“here is a package that is helpful” (correct)

“here is a package which is helpful” (wrong)

“we will introduce an extremely important technology next, which has revolutionized the life of poor statisticians”

• it is “A, B, and C” instead of “A, B and C”

• do not forget the comma in other places, either: “e.g.,”, “i.e.,”, “foo and bar, respectively”; actually, try to use the comma whenever possible to break long sentences into shorter pieces

6. for the plots, use the cairo_pdf() device when possible; in knitr, this means the chunk option dev = 'cairo_pdf'. The reason for choosing cairo_pdf() over the normal pdf() device is that it can embed fonts in the PDF plot files; otherwise the copy editor will ask you to embed all the fonts in the final PDF file of the book. Normally pdflatex embeds fonts, so if some fonts are not embedded, it is very likely that they come from R graphics;
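In an Rnw source document, this option can be set once for all chunks in a setup chunk; a sketch (the chunk label is arbitrary):

```r
<<setup, include=FALSE>>=
# use cairo_pdf for all chunks so that fonts are embedded in the plot files
knitr::opts_chunk$set(dev = 'cairo_pdf')
@
```

Individual chunks can still override this with their own dev option.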

7. include as many figures as possible (I have 51 figures in this 200-page book), because this makes the number of pages grow faster (I’m evil) so that you will not feel frustrated, and the readers will not fall into a hell of endless text, page after page;

8. prepare an extra monitor for copyediting;

9. learn a little bit about pdftk, because you may eventually need it, e.g., to replace one page in the frontmatter with a blank page;
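For example (the file names and page numbers here are made up), replacing page 3 of the book with a blank page could look like this:

```shell
# hypothetical files: book.pdf (the book) and blank.pdf (a single blank page)
pdftk A=book.pdf B=blank.pdf cat A1-2 B1 A4-end output book-fixed.pdf
```

The A/B handles let pdftk interleave page ranges from multiple input files.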

10. learn these copy editing symbols (thanks, Matt Shotwell);

One thing I did not really understand was why punctuation marks like commas and periods should go inside quotation marks, e.g.,

I have “foo” and “bar.”

This makes me feel weird. I’m more comfortable with

I have “foo” and “bar”.

There was also one thing that I did not catch with version control – one figure file went wrong and I did not realize it, because normally I do not put binary files under version control. Fortunately, I caught it by eye. Karl Broman mentioned the same problem to me a while ago. I know there are tools for comparing images (ImageMagick, for example); I was just too lazy to learn them.

I will be glad to hear about the experience of other authors, and will try to update this post according to the comments.

# Travis CI for R! (not yet)

April 12, 2013

A few days ago I wrote about Travis CI, and was wondering if we could integrate the testing of R packages into this wonderful platform. A reader (Vincent Arel-Bundock) pointed out in the comments that Travis runs Ubuntu, which allows you to install software packages at will.

I took a look at the documentation, and realized they were building and testing packages in virtual machines. No wonder sudo apt-get works. Remember apt-get -h | tail -n1:

This APT has Super Cow Powers.

## R on Travis CI

Now we are essentially system admins, and we can install anything from Ubuntu repositories, so it does not really matter that Travis CI does not support R yet. Below are a few steps to integrate your R package (on Github) into this system:

1. follow the official guide until you see .travis.yml;
2. copy my .travis.yml for the knitr package if you want, or write your own;
• I use a custom library path ~/R to install add-on R packages so that I do not have to type sudo everywhere
• at the moment I use the RDev PPA by Michael Rutter to install R 3.0.0, since his plan is to have R 3.0 on CRAN in May; at that time I’ll change this PPA to a CRAN repository
• since R CMD check requires all packages in Suggests as well, I install knitr using install.packages(dep = TRUE) to make sure all relevant packages are installed
• make install and make check are wrappers for R CMD build and R CMD check, respectively, defined in the Makefile
3. push this .travis.yml to Github, and Travis CI will start building your package when a worker is available (normally within a few seconds);
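Putting the steps above together, a minimal .travis.yml might look like the following sketch (the PPA name, mirror URL, and package name are placeholders from my setup; see the actual file in the knitr repository for the real thing):

```yaml
# Travis has no official R support yet, so install R ourselves on the Ubuntu worker
before_install:
  - sudo add-apt-repository -y ppa:marutter/rdev     # Michael Rutter's R PPA
  - sudo apt-get update -qq
  - sudo apt-get install -y r-base-dev texlive
install:
  - mkdir -p ~/R && echo 'R_LIBS=~/R' > ~/.Renviron  # custom library path, no sudo needed
  - Rscript -e "install.packages('knitr', dependencies = TRUE, repos = 'http://cran.rstudio.com')"
script:
  - R CMD build .
  - R CMD check knitr_*.tar.gz
```

The custom library path in ~/.Renviron is what lets install.packages() work without sudo.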

By default you will receive email notifications when there are changes in the build status. You can also find a guide on the build status image in the documentation.

What I described here actually applies to any software packages (not only R), as long as the dependencies are available under Ubuntu, or you know how to build them.

## But it is still far from CRAN

OK, it works, but we are still a little bit far from what CRAN does, because Travis CI does not have official support for R. Each time we have to install one gigabyte of additional software to create the R testing environment (sigh, if only R did not have to tie itself to LaTeX). If these packages were pre-built in the virtual machines, it would save us a lot of time.

The second problem is that there is no Windows support on Travis CI (one developer told us on Twitter that it was coming). There is a page for OS X, but I have not really figured out how to build software under OS X there.

The third problem is that Travis CI only builds and tests packages; it does not provide downloads like CRAN does. Perhaps we can upload the packages to our own servers using encryption keys.

## R-Forge, where are you going?

I will shut up here, since I realized I was not being constructive. Let me spend more time thinking about this; I would also love to hear suggestions from readers.

So, two potential Google Summer of Code projects:

• make R an officially supported language on Travis CI (this really depends on whether the Travis team wants it)
• improve R-Forge (of course this depends on whether the R-Forge team thinks they need help)