Using PROC TRANSPOSE in SAS


     As a Clinical SAS Programmer, you would come across situations where the data is not in the appropriate form to perform the comparisons of variables or derive meaningful insights wherein data has to be summarised further. Sponsor/stakeholders require only the part of the data to be produced in a more meaningful way and in most of the cases this makes life harder for the programmers to write huge lines of code writing all the conditions in an if/then & do loops which makes the program run slow and eat up a lot of CPU resources and also may become more complex for a new programmer to understand the code.

     PROC Transpose in layman terms is used when you want to see the variables transformed into observations or observations transformed into rows. Most of the clinical trials have lab data and they are very complex in their structure because of huge data which is getting collected in subsequent period/visits of the trials (which are very important in safety/efficacy related findings of a clinical trial) and to reduce the number of variables the database is designed in such a way that the values present in the variable itself represent a variable which in-turn results in having more observations in a dataset.

     Now I will assume that you have gone through the basic syntax of PROC Transpose and have known all the basic rules to implement the same we will only consider various options that can be used more effectively for example, BY statement or ID statements. The basic syntax of proc transpose can be as follows:

proc transpose data=<dataset name> out=<output dataset name> delimiter=<option> Prefix=<option> (keep,drop,where,rename statements);

by <variable names>;

copy <variable(s)>;

id <variable>;

var <variables to be transposed>;

idlabel <variable>;

run;      

     Out= option of outputting dataset should always be given as its always a good practice to output datasets separately in another dataset. DELIMITER= option is used to separate values of two variables specified in the ID statement. By variables are the ones which you need to keep in the output dataset.  COPY statement copies the variables directly without transposing them. ID statement is used to represent the values of that variable as a column. VAR statement is used to specify the variable to be transposed (A ':' keyword can be used at the end of the VAR statement which tells SAS to consider the variables which are identical/have the same prefixes). IDLABEL statement creates labels for the transposed variables.

Proc transpose data=xyz out=xyz1;

by pt;

id aeser;

var aegrpid:;

run;

     In this above example VAR statement has a ':' after the variable name aegrpid this signifies that SAS should use all the variables which contains this name (for example if you have created aegrpid1,aegrpid01,aegrpid2,aegrpid02 etc., then all these variables will be considered for the transpose hence the usage of the ':' option in VAR statement can be used when you are not sure of all the variable names but you don't want to manually specify the list  of the variables).

  Now as we know that the PROC transpose can only transpose known variables which are specified in the VAR statement. But what if the number of variables/variable names are not known to us then it will require an automated approach to achieve the output. There are many ways in which this can be achieved in one way is by making use of macros and another way is by using arrays to define the variables and calling arrays in the VAR statement.

     There are also cases when you have to use ID and VAR statements both at the same time and SAS recognises every ID vsalue should be a unique one. When you specify an ID which has repeating values or which has duplicate values then SAS throws a warning message (which is not acceptable as this will not be a good programming practice). There are very few approaches which I use when I encounter such situations. The first solution would be to use the LET option which will create the output dataset for you but it will pop up the warning messages (this approach can be used to verify whether the data can be transformed/how the data will look after transposing). The second option would be to create a unique variable such as a sequence number by SUBJID which will act as a unique identifier for each row. Thus, by using this newly created sequence number in the BY option, the warnings will disappear.   

     I have provided examples below for all the above-mentioned scenarios with the code, output and SAS logs for your reference.

SAS Logs

Typical Proc Transpose

The above example shows a typical PROC transpose code to the dataset sashelp.class. The output dataset is being processed as:

Output Dataset

     As in the dataset ID value, Sex is repeating in most of the records SAS is not able to create the output dataset throwing an error for the same. This issue can be resolved using the following approaches:

Dataset ID Value

     In the above code, option LET was used to override SAS to perform the operation. This option can only be used for testing whether the data is properly coming in the output.

     However, in the industry programs should not have any warnings/errors while being processed therefore it is recommended to use an alternative approach for resolving the issue in this scenario.

Alternative Approach

     The Second method would be by creating sequence number so that each record has a unique value. Hence by doing so I have created a column SEQ by using first.name (as name is the unique variable having different names in the dataset i will assume that you know about using first. And last. Options)

Second Method

Second Method

     In the above images, you could see that the output dataset is created as desired by us. There are two new variables Male and Female and has names within them. By creating a sequence number, you can solve the problem of repeating ID values. Always remember that sequence number should be unique per row.

Author: Nikhil, SAS Programmer

Tags