Beware the naked LOC

November 19, 2012

(This article was originally published at The DO Loop, and syndicated at StatsBlogs.)

The LOC function is one of the most important functions in the SAS/IML language. The LOC function finds elements of a vector or matrix that satisfy some condition. For example, if you are going to apply a logarithmic transform to data, you can use the LOC function to find all of the positive elements.
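As a minimal sketch of the logarithmic-transform example, the following statements use made-up data in a hypothetical vector y. The names y, idx, and logY are illustrative only. Elements that are not positive are left as missing values, and the IF-THEN check anticipates the caution discussed later in this article:

```sas
proc iml;
/* hypothetical data: some positive values, some not */
y = {10, 0.5, -2, 100, 0};
idx = loc(y > 0);             /* indices of the positive elements: 1, 2, and 4 */
logY = j(nrow(y), 1, .);      /* initialize the result to missing values */
if ncol(idx) > 0 then         /* transform only if at least one element is positive */
   logY[idx] = log(y[idx]);   /* natural log of the positive elements */
print y logY;
quit;
```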

In many cases, however, you do not know whether any observations satisfy the condition of your search. As an example, use the following statements to generate 10 random values from the normal distribution with mean –1 and standard deviation 1:

proc iml;
call randseed(123);
x = j(10,1); /* allocate 10 elements */ 
call randgen(x, "Normal", -1, 1); /* x ~ N(-1,1) */

Suppose that you want to extract the positive values from the vector x. You might be tempted to write the following statements:

idx = loc(x>0);  /* find indices for which x>0 */
PosX = x[idx];   /* extract those elements...BE CAREFUL! */

You need to be careful here. You can't be sure that there are any positive values in the data. As I say in my book, Statistical Programming with SAS/IML Software (p. 64):

Never use the result of the LOC function without knowing that the result is a nonempty matrix. You can use the TYPE, NROW, or NCOL functions to check whether a matrix is empty.

I colloquially refer to expressions such as x[idx] or x[loc(x>0)] as using a "naked LOC" because the return value of the LOC function is bare and unprotected. If x does not contain positive values, then the LOC function returns an empty matrix, and the statement x[EmptyMatrix] causes an error.
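To see what an empty result looks like, here is a small sketch that applies the TYPE, NROW, and NCOL functions to the output of LOC. The data are hypothetical and deliberately contain no positive values; the names t, nr, and nc are illustrative only:

```sas
proc iml;
x = {-3, -1, -2};      /* hypothetical data: no positive values */
idx = loc(x > 0);      /* no element satisfies the condition, so idx is empty */
t  = type(idx);        /* 'U' indicates a matrix that has no value */
nr = nrow(idx);        /* 0 rows */
nc = ncol(idx);        /* 0 columns */
print t nr nc;
quit;
```

Any one of these three checks is enough to detect the empty matrix before you use it as a subscript.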

The correct approach is to check that LOC returns a nonempty matrix. An empty matrix has zero rows and zero columns, so you can use the NCOL function to check whether a vector is empty:

if ncol(idx)>0 then 
   PosX = x[idx];
else print "There are no positive values";

Sometimes you might know that the LOC function will return a nonempty matrix, such as in the very useful UNIQUE-LOC technique. In those cases, you can omit the IF-THEN statement and simply extract the elements that you want. In general, however, it is a good programming practice to check that at least one element satisfies the condition.
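The following statements are one possible sketch of the UNIQUE-LOC pattern, assuming a hypothetical grouping vector (group) and a hypothetical data vector (y). Because each value of u[i] comes from the group vector itself, the LOC function finds at least one match on every iteration, so no IF-THEN check is needed:

```sas
proc iml;
group = {"A", "B", "A", "C", "B", "A"};   /* hypothetical categories */
y     = {1, 2, 3, 4, 5, 6};               /* hypothetical data values */
u = unique(group);                        /* row vector of the distinct categories */
groupMean = j(1, ncol(u), .);             /* one result per category */
do i = 1 to ncol(u);
   idx = loc(group = u[i]);               /* never empty: u[i] occurs in group */
   groupMean[i] = sum(y[idx]) / ncol(idx);
end;
print groupMean[colname=u];
quit;
```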

So beware the "naked LOC." If the LOC function might return an empty matrix, protect yourself. Don't write expressions like PosX = x[loc(x>0)] unless you know that the LOC function will always return a nonempty matrix.
