R Tip: Introduce Indices to Avoid for() Class Loss Issues

March 8, 2018

(This article was originally published at Statistics – Win-Vector Blog, and syndicated at StatsBlogs.)

Here is an R tip. Use loop indices to avoid for()-loops damaging classes.

Below is an R annoyance that occurs again and again: vectors lose class attributes when you iterate over them in a for()-loop.

d <- c(Sys.time(), Sys.time())
print(d)
#> [1] "2018-02-18 10:16:16 PST" "2018-02-18 10:16:16 PST"

for(di in d) {
  print(di)
}
#> [1] 1518977777
#> [1] 1518977777

Notice we printed numbers, not dates/times. To avoid this problem, introduce an index and loop over that, not over the vector contents.

for(ii in seq_along(d)) {
  di <- d[[ii]]
  print(di)
}
#> [1] "2018-02-18 10:16:16 PST"
#> [1] "2018-02-18 10:16:16 PST"
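The same pattern is not specific to Sys.time(). As a small sketch (the Date vector here is my own illustrative choice, not from the example above), the following compares the class seen inside a direct loop with the class seen inside an indexed loop:

```r
# Illustrative data (an assumption for this sketch): a short Date vector.
dates <- as.Date(c("2018-02-18", "2018-02-19"))

# Direct iteration: record the class each loop value arrives with.
direct <- character(0)
for (x in dates) {
  direct <- c(direct, class(x))
}

# Indexed iteration: pull each element out with [[ ]] and record its class.
indexed <- character(0)
for (ii in seq_along(dates)) {
  indexed <- c(indexed, class(dates[[ii]]))
}

print(direct)   # classes seen by the direct loop
print(indexed)  # classes seen by the indexed loop
```

The direct loop sees plain numeric values, while the indexed loop sees Date values.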

seq_along() is a handy function similar to what we discussed in R Tip: Use seq_len() to Avoid The Backwards List Trap.
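A quick sketch of why seq_along() is the safer way to build the index: on a zero-length vector, 1:length(x) counts backwards from 1 to 0, while seq_along(x) is empty. The variable names below are illustrative only.

```r
# A zero-length date/time vector (made by taking an empty subset).
empty <- Sys.time()[0]

bad_idx  <- 1:length(empty)   # counts backwards: 1, 0
good_idx <- seq_along(empty)  # empty index vector

# Count how many times each loop body runs.
n_bad <- 0
for (ii in bad_idx) {
  n_bad <- n_bad + 1
}

n_good <- 0
for (ii in good_idx) {
  n_good <- n_good + 1
}

print(c(n_bad = n_bad, n_good = n_good))
```

The 1:length() version runs its body twice on an empty input. The seq_along() version correctly runs zero times.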

The introduction of indices is ugly, as index-free iteration is generally superior. Also, as we have mentioned before, for-loops should not be considered anathema in R; they are a useful tool when used correctly.

Note that base::ifelse() also loses class attributes, though dplyr::if_else() avoids the problem. base::lapply() and base::vapply() do not have the problem either (for example, try vapply(d, as.character, character(1)) and lapply(d, class)).
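Here is a minimal base-R sketch of the ifelse() issue. The cutoff value and variable names are assumptions made up for illustration. The logical-index assignment is one possible base-R workaround, shown as an alternative when dplyr is not available rather than as the recommended fix.

```r
# Illustrative data: two times and a cutoff (fixed values so results are repeatable).
d <- as.POSIXct(c("2018-02-18 10:16:16", "2018-02-19 10:16:16"), tz = "UTC")
cutoff <- as.POSIXct("2018-02-19 00:00:00", tz = "UTC")

# base::ifelse() returns a bare numeric vector: the class is gone.
r1 <- ifelse(d < cutoff, d, cutoff)
print(class(r1))

# Workaround sketch: copy the classed vector and overwrite by logical index.
r2 <- d
late <- d >= cutoff
r2[late] <- cutoff
print(class(r2))

# lapply() hands each element to the function with its class intact.
print(lapply(d, class))
```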

In both cases (the for()-loop and ifelse()), R represents the date/time vector as a plain vector of numbers with a class attribute attached. This means the vector is a single object holding multiple times, not a list of individual time objects. Any subsetting or iteration that strips attribute values loses the class information, and the derived vector reverts to its underlying type (in this case double).
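To see this structure directly, the sketch below inspects a small date/time vector (the values are illustrative). It shows the underlying type, the attributes R attaches, and what is left once the class is removed.

```r
# Illustrative data: two fixed times in UTC.
d <- as.POSIXct(c("2018-02-18 10:16:16", "2018-02-18 10:16:17"), tz = "UTC")

print(typeof(d))             # the underlying storage type
print(names(attributes(d)))  # the attributes layered on top of the numbers

# Removing the class leaves plain numbers (seconds since the epoch).
stripped <- unclass(d)
print(class(stripped))

# Class-aware subsetting keeps the class on the result.
kept <- d[2]
print(class(kept))

# as.list() splits the vector into individually classed elements.
pieces <- as.list(d)
print(class(pieces[[1]]))
```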

For pre-allocation ideas (an important complement to for-loops) please see R Tip: Use vector() to Pre-Allocate Lists (also includes some discussion of for-loops).
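As a rough sketch of how pre-allocation combines with the index loop from this tip (the format string and the name res are illustrative choices, not taken from the linked article):

```r
d <- c(Sys.time(), Sys.time())

# Pre-allocate a list with one slot per element of d.
res <- vector(mode = "list", length = length(d))

# Fill each slot by index; d[[ii]] keeps its date/time class,
# so format() can render it as a date string.
for (ii in seq_along(d)) {
  res[[ii]] <- format(d[[ii]], "%Y-%m-%d")
}

print(res)
```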
