Three SAS Tips


2/2011

Many statistical agencies use SAS for statistical work. In this blog post, I’ll show a few tips I’ve learned that have made reviewing data more efficient. The code below creates a “temp” dataset of hypothetical current- and prior-year data for two variables for five establishments.

data temp;
input name $ x2011 x2010 y2011 y2010;
datalines;
Estab1 10 20 40 30
Estab2 10 40 20 50
Estab3 20 50 50 40
Estab4 40 40 40 40
Estab5 30 40 30 40
;
run;

Ordering your dataset

Processing a dataset (creating new variables, sorting, setting, transposing, merging, and so on) often leaves the variables in an order you do not like. One way to reorder the columns is to open the dataset in SAS, manually drag the columns where you want them, and re-save the dataset. There is a better approach.

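The current order of the columns is name, x2011, x2010, y2011, y2010. Say we want to order them as name, x2010, y2010, x2011, y2011. The following code does just that. It works because SAS orders variables by when the DATA step first encounters them, so the RETAIN statement must come before the SET statement.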

data temp2;
retain name x2010 y2010 x2011 y2011;
set temp;
run;
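
To confirm the new order without opening the dataset, one option is to list the variables by position. This is a minimal sketch that assumes the temp2 dataset created above already exists in the WORK library.

```sas
/* VARNUM lists the variables in their stored order rather than alphabetically */
proc contents data=temp2 varnum;
run;
```

The listing should show name first, followed by the 2010 variables and then the 2011 variables.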

Keeping certain variables

Often we have datasets with many variables that we do not need for the task at hand. Say I want to keep just the x-named variables in a dataset. I could do

data temp3 (keep = x2011 x2010);
set temp;
run;

However, notice that I have to type each of the x-named variables. For a large number of variables this quickly becomes tedious. Instead, use the colon, which acts as a prefix wildcard and matches every variable whose name begins with x

data temp4 (keep = x:);
set temp;
run;
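
Two related variations may also be useful. The sketch below assumes the temp dataset from the top of this post; the output names temp4b and temp4c are made up for illustration.

```sas
/* Apply the prefix list on the input dataset, so only the x-named variables are read in */
data temp4b;
set temp (keep = x:);
run;

/* Numbered range list: selects x2010 and x2011 because the numeric suffixes are consecutive */
data temp4c (keep = x2010-x2011);
set temp;
run;
```

Putting KEEP= on the SET statement limits what is read, while KEEP= on the DATA statement limits what is written. For a small dataset like this one the result is the same.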

The task is more complicated if you want to, say, keep all variables with “sales” in the names, where “sales” can occur anywhere in the name, like sales2010, or tempsales, etc.
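
One possible approach to the “sales” case is to look up the matching variable names in the DICTIONARY.COLUMNS table and store them in a macro variable. This is a sketch rather than the only way to do it; the salesdemo dataset, its variables, and the saleslist macro variable are hypothetical names invented for this example.

```sas
/* Hypothetical dataset with "sales" at different positions in the variable names */
data salesdemo;
input name $ sales2010 tempsales other;
datalines;
Estab1 1 2 3
Estab2 4 5 6
;
run;

/* Collect every variable name containing "sales" into the macro variable saleslist */
/* LIBNAME and MEMNAME are stored in uppercase in DICTIONARY.COLUMNS */
proc sql noprint;
select name into :saleslist separated by ' '
from dictionary.columns
where libname = 'WORK'
  and memname = 'SALESDEMO'
  and upcase(name) contains 'SALES';
quit;

/* Keep only the matching variables (here sales2010 and tempsales) */
data salesonly;
set salesdemo (keep = &saleslist);
run;
```

If no variable name matches, the macro variable is not created and the final step fails, so this sketch assumes at least one match.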

Inspecting subsets of your dataset

Often we work with large datasets. To quickly check whether the contents of a dataset look reasonable, typically before moving to the next processing step, one option is to look at the first few observations. The following code copies the first 3 observations of the “temp” dataset into a new dataset.

data temp5;
set temp (obs=3);
run;
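
If you only want to look at the rows and do not need a new dataset, the same OBS= option can be used directly in a procedure. A minimal sketch, assuming the temp dataset from the top of this post:

```sas
/* Print the first 3 observations of temp; no new dataset is created */
proc print data=temp (obs=3);
run;
```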

However, if the data are sorted, the first few observations might give you a non-representative picture of the dataset. To select a few observations at random for inspection, we can do

proc surveyselect data=temp method=srs n=3 out=temp6;
run;

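Note that METHOD=SRS requests simple random sampling, which is selection with equal probability and without replacement. PROC SURVEYSELECT supports other methods as well, such as unrestricted random sampling with replacement (METHOD=URS) and systematic sampling (METHOD=SYS).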
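
Each run of the procedure draws a different sample unless a seed is given. If you want the same rows every time, for example so that a co-worker can reproduce your check, a SEED= value can be added. The sketch below assumes the temp dataset from the top of this post; the seed value and the output name temp6_seeded are arbitrary choices for illustration.

```sas
/* seed=20110201 is an arbitrary value; any positive integer gives a repeatable sample */
proc surveyselect data=temp method=srs n=3 seed=20110201 out=temp6_seeded;
run;
```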

It seems that every year, through documentation, internet searches, trial and error, and co-workers, I learn a few great SAS tips that improve how I work. I hope you found these tips useful and can put them to work for you.

