(This article was originally published at The DO Loop, and syndicated at StatsBlogs.)
Did you know that you can index into SAS/IML matrices by using unique strings that you assign via the MATTRIB statement? The MATTRIB statement associates various attributes to a matrix. Usually, these attributes are only used for printing, but you can also use the ROWNAME= and COLNAME= attributes to subset a matrix.
For example, the following statements read the names and populations (in 2005) of 197 countries into SAS/IML vectors:
proc iml;
use Sashelp.Demographics;
read all var {Name Pop};
close;
You can use the MATTRIB statement to assign the names of countries to the rows of the Pop matrix. You can then subset the rows by using a character array of names, as follows:
mattrib Pop rowname=Name; /* access rows by using the country name */
/* extract and print populations of large countries */
countries = {"China" "India" "United States" "Indonesia" "Brazil"};
BigPop = Pop[countries, ]; /* extract subset */
print BigPop[rowname=countries format=comma13.];
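The COLNAME= attribute mentioned earlier should work the same way for columns. Here is a minimal sketch on a small made-up matrix, meant to be run inside the same PROC IML session. The names x, rNames, cNames, and y are illustrative and are not part of the Sashelp.Demographics example:

```sas
/* hypothetical 2 x 3 matrix, used only for illustration */
x = {1 2 3,
     4 5 6};
rNames = {"a" "b"};          /* one name per row */
cNames = {"X" "Y" "Z"};      /* one name per column */
mattrib x rowname=rNames colname=cNames;

/* subscript by names: row "b", columns "X" and "Z" */
y = x["b", {"X" "Z"}];
print y;
```

Storing the names in vectors first, rather than writing literals inside the MATTRIB statement, keeps the statement short and lets you reuse the same vectors as subscripts.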
Using names rather than indices can make a program more readable. For example, to find the population of a continent, you can add up the populations of the countries in the continent:
/* find population of North America */
NAPop = Pop["United States", ] + Pop["Canada", ];
print NAPop[label="Population of North America (2005)" format=comma13.];
Of course, for many data analysis tasks, you don't want to hard-code the observations, but want to compute them by using some sort of query. For example, you might be interested in the populations of island nations. To find all countries that have the word "islands" as part of their name and print their populations, run the following statements:
/* find population of countries with "Islands" in their name */
idx = loc( find(Name, "ISLANDS") );
u = propcase( Name[idx] );
print (Pop[u, ])[rowname=u format=comma13.];
Using the MATTRIB statement to assign row names or column names can result in readable code. However, it isn't as efficient as directly accessing the indices of a matrix, so I would usually extract the populations of island nations by using Pop[idx].
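For comparison, the index-based version might look like the following sketch. It continues the same PROC IML session and reuses Name, Pop, and idx from the statements above. IslandPop and IslandName are names introduced only for this illustration. It assumes that at least one name matches, because otherwise LOC returns an empty matrix and the subscript fails:

```sas
/* assumes Name, Pop, and idx are already defined in this session */
IslandPop  = Pop[idx];       /* subscript by row number, no name lookup */
IslandName = Name[idx];      /* matching country names, used as row labels */
print IslandPop[rowname=IslandName format=comma13.];
```

Here the names are used only to label the printed output, which is the more common role of the ROWNAME= option.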
I sometimes use this technique for demonstrations and presentations, and I think it would be useful for teaching. Can you think of another instance in which this technique would be useful?
