One of the reports that is run for quality assurance and audit purposes is our “Missing and Unknown

icon

9

pages

icon

English

icon

Documents

Écrit par

Publié par

Le téléchargement nécessite un accès à la bibliothèque YouScribe Tout savoir sur nos offres

icon

9

pages

icon

English

icon

Documents

Le téléchargement nécessite un accès à la bibliothèque YouScribe Tout savoir sur nos offres

An Example of Using Differing Methods to Create Excel Output from SAS Julie Kilburn, Rizvan Mamet, Anna Ter Veer, Ann Vanderplas, Rebecca Ottesen, Joyce Niland, City of Hope, Duarte, CA ABSTRACT DDE, ODS HTML, ODS XML with tagsets…Which is best? The answer depends entirely on the customer’s desired output. This paper will review, by example, the use of DDE to create various Excel reports. It will then explain in detail a more effective way to move data from SAS to Excel for our particular need. In this case, the customers were clinical research associates (CRAs) at 39 institutions collecting data for the National Comprehensive Cancer Network (NCCN) Outcomes Database. The desired output was a detailed report summarizing the number and proportion of patients with missing or unknown categories for each data element in the NCCN database, and institution-specific patient listings for the CRA to review for quality assurance purposes. The pros and cons of each approach and the rationale for our selected approach in this challenging multi-center environment will be described. INTRODUCTION The National Comprehensive Cancer Network (NCCN) Outcomes Database collects outcomes data on breast cancer, non-Hodgkin’s lymphoma, non-small cell lung, ovarian, and colorectal cancers from 19 NCCN member institutions, and 20 community sites affiliated with a NCCN member institution, across the United States. Each n has one or more clinical research ...
Voir icon arrow

Publié par

Langue

English

An Example of Using Differing Methods to Create Excel Output from SAS
Julie Kilburn, Rizvan Mamet, Anna Ter Veer, Ann Vanderplas, Rebecca Ottesen,
Joyce Niland, City of Hope, Duarte, CA
ABSTRACT
DDE, ODS HTML, ODS XML with tagsets…Which is best?
The answer depends entirely on the customer’s desired
output.
This paper will review, by example, the use of DDE to create various Excel reports.
It will then explain in
detail a more effective way to move data from SAS to Excel for our particular need.
In this case, the customers were
clinical research associates (CRAs) at 39 institutions collecting data for the National Comprehensive Cancer Network
(NCCN) Outcomes Database.
The desired output was a detailed report summarizing the number and proportion of
patients with missing or unknown categories for each data element in the NCCN database, and institution-specific
patient listings for the CRA to review for quality assurance purposes.
The pros and cons of each approach and the
rationale for our selected approach in this challenging multi-center environment will be described.
INTRODUCTION
The National Comprehensive Cancer Network (NCCN) Outcomes Database collects outcomes data on breast
cancer, non-Hodgkin’s lymphoma, non-small cell lung, ovarian, and colorectal cancers from 19 NCCN member
institutions, and 20 community sites affiliated with a NCCN member institution, across the United States.
Each
institution has one or more clinical research associates (CRAs) assigned to collecting eligible patient data from
electronic or paper medical records, and entering the data into a network-based database.
For institutions that have
more programming savvy, data can be entered into their legacy system and sent to the NCCN database via quarterly
FTP submission.
The data management of the NCCN Outcomes Database includes rigorous quality assurance (QA)
processes.
These QA processes include initial and follow-up data management training of the CRAs, edit checks
during data entry, logic checks of the data elements and tables that are relational, and quarterly QA reports for each
participating institution across the five cancer sites.
In addition to the quarterly QA reports, annual audits of a random
sample of patients comparing source documents against entered data takes place at each participating institution.
The Data Coordinating Center (DCC) of the NCCN Outcomes Database is located at City of Hope, and consists of a
team of database programmers and statisticians lead by the DCC principal investigator.
The DCC oversees the
database operations and data analysis, with the assistance of the Scientific Office (SO) at the Dana-Farber Cancer
Institute.
One of the many analytic functions of the DCC is to program QA reports for each institution and disease
site, and post the reports on the NCCN Outcomes Database webpage for institution review.
An additional function of
the DCC is to program outcomes reports, including an annual report of benchmarked data, the focus of two other
papers by ter Veer and Ottesen 2009 presented at WUSS
1,2
.
This paper introduces two examples of QA reports, both of which are output into Excel.
The focus of this paper is to
discuss three different methods of outputting data from SAS to Excel: 1) Dynamic Data Exchange, 2) ODS HTML,
and 3) ODS XML with tagsets.
For each method, a screenshot of a partial Excel report is shown, along with the SAS
code that was used to generate the Excel report.
The advantages and disadvantages (pros and cons) of each
method will be described, including the rationale for choosing a preferred method for the final resulting QA reports.
METHOD 1:
DYNAMIC DATA EXCHANGE (DDE)
To determine how complete the data collection is at a certain date, the CRAs examine results in the “Missing and
Unknown Summary Report”.
The report comes in two main sections: the first part lists a summary of the percentages
of missing and unknown for each variable in a data table, the second section lists the patients with missing and
unknown data in each table so that they CRA can review the patient level data.
This Excel report is distributed to the
CRAs on a quarterly basis for each center and across disease sites.
Clearly, with so many reports to distribute, the
programming and output needs to be automated.
In this case, the one method we will discuss for outputting and
automating the Excel reports is DDE.
Figure 1 is an example of two tables within the summary section of the Missing and Unknown Summary Report,
specifically the Demographics and Study Accession Data tables.
The report lists every data element collected in the
tables across all tables in the database.
Each portion of the summary section represents a data table contained in
the database.
The report across disease sites is not completely identical, as each disease site has additional data
tables and data elements that are collected for that type of cancer only.
The first column, named Variable, lists the
data element description.
The Missing column shows the percent of patients that do not have data entered, that is,
the CRA has not collected data for that percent of patients.
The Unknown column shows the percent of patients that
have an entry of unknown, that is, the CRA reviewed the chart but the answer for that data element is not listed in the
chart .
For example, of the 1725 patients that had data for the Study Accession Table, 0% were missing education
status, and 22% had an unknown education status, as this information was not listed in the chart.
The red font
1
signifies a percent above 25%; the green font signifies a percent from 10% to <25%.
This color classification helps
underscore the data elements that require further scrutiny by the CRAs.
Additionally rules are built into the
programming so that data elements will be checked for missing/unknown status when it makes sense to do so.
For
example racial background other, an open text field in the Demographics table, is only checked for the N=12 patients
who indicated that their race was other.
Figure 1. An Example of Missing and Unknown Summary Report
The process for creating this first section of the Missing and Unknown Summary Report requires creating an Excel
template, manipulating the data in SAS to get a dataset that contains the desired output, and outputting the data from
SAS to Excel using DDE.
The initial task is to create a template of the desired report in Excel, and is typically
completed by the project manager at the SO and does not require the knowledge of SAS programming.
All of the
formatting is defined in Excel, including header and footer information, borders, shading, color coding, and page
numbering.
The next task is to create a SAS dataset for the desired output.
In turn, the SAS variables from this
dataset are output from SAS to Excel using DDE.
The DDE portion of these tasks are programmed in the sample
SAS code listed below.
2
SAMPLE SAS CODE
Initially the coding for the Missing and Unknown Summary report involves running a series of PROCs to review the
patient level data and calculate the number of missing and unknown per variable in each data table.
The coding for
this portion is beyond the scope of this paper.
Once a data table level data set has been created which contains the
number and percentages for missing and unknown per variable we can output this information via DDE to the report.
In this data set we have the following variables:
The SAS System
The CONTENTS Procedure
Variables in Creation Order
#
Variable
Type
Len
1
var
Char
15
2
longname
Char
50
3
denom
Num
8
4
pcntmiss
Num
8
5
pcntunk
Num
8
6
total
Num
8
** Macro to output data to report via DDE;
%macro dumpnow(dsn);
data
_null_; set &dsn;
file alldump notab;
** Outputs each desired variable and adds text formatting to denom variable;
put longname
' (N=' denom ')' '09'x pcntmiss '09'x pcntunk '09'x total '09'x;
run;
%mend;
** Wakes up Excel and opens the template file for the summary report;
options noxsync noxwait;
X ' c:\nccn\WUSS\Templates\Missing
_Unknown_Summary DDE.xls';
** DDE triplet points to location in Excel where to store Demographics output;
filename alldump DDE
'Excel| c:\nccn\WUSS\Templates\
[Missing_Unknown_Summary DDE.xls]all!R3C1:R17C4';
** Calling output macro for Demographics results;
%dumpnow(MUDemographics);
** DDE triplet points to location in Excel where to store Demographics output;
filename alldump DDE
'Excel| c:\nccn\WUSS\Templates\[Missing_Unknow
n_Summary DDE.xls]all!R23C1:R43C4';
** Calling output macro for Demographics results;
%dumpnow(MUStudyacc);
In summary, for this first section of the missing/unknown report the use of Dynamic Data Exchange along with SAS
macros simplifies the creation of multiple reports across centers and disease sites from existing templates.
Almost all
of the monthly and quarterly reports generated can be run with these tools.
Table 1 lists the pros and cons of
choosing DDE for Excel reports.
Table 1. Pros and Cons for Choosing DDE for Excel Reports
DDE Pros:
Get the ‘look’ we want by creating template
Can pinpoint very specific data ranges (can output to one cell if needed)
DDE Cons:
Initially labor intensive
Changes must be updated in template and programming
Must know exactly where data will be placed
3
METHOD 2: ODS HTML
After the CRAs have reviewed the Missing and Unknown summary section, the next step is for the CRAs to review
the actual list of patients that have a high percent of missing and/or unknown data elements.
This way, the CRA can
request the charts on these patients and complete the data collection for the missing data elements.
To add this
level of detail we had to consider another method for automating the Excel reports.
This is because DDE would no
longer work for automation purposes as the number of patients from any given table and institution would vary.
Therefore we decided to use ODS HTML as our first attempt.
Figure 2 shows an example of a Missing and Unknown Patient List for the variables racial background other and
hispanic, data element s that are collected for all five disease sites.
This list is specific to a center, and the center is
listed under the Center ID column.
The patient identifiers are listed under the Patient ID column.
The last column
lists whether the patient record was missing or unknown for racial background and then again for hispanic.
In this
example, all the listed patient records have a racial background other, collected as a character variable, of missing
and Hispanic, collected as numeric, is unknown.
Figure 2. Example of Missing and Unknown Patient List
As previously mentioned the DDE method is not ideal for generating the Missing Unknown Patient List.
To begin
with, generating the specific patient list can be tedious, and does not lend itself to use for multiple centers, since each
center may need to see different data elements.
The DDE method requires the making of a template for the one-time
request of a handful of data elements. Because of the requirement to specifically place the incoming data, the
DDE/Template method necessitates the creation of one worksheet per data element (we could have any number of
4
patients from the database with a missing or unknown value, from zero to potentially thousands).
A complete
workbook would have hundreds of sheets, many of which would contain no data.
The ODS HTML method allows the flexibility of one or multiple PRINT procedures to be output to Excel. Only the lists
with more than zero observations are shown. The programming to move the data from SAS to Excel is simple.
However, the report can be unwieldy, with all lists dumping one after the other in one worksheet.
The look of the
report is controlled only with the STLYE option in the ODS statement. However, the header and footer cannot be
controlled from SAS.
As exemplified in Figure 2, the SAS-generated title outputs to the first cell and causes the list
spacing to be distorted for the Center ID column.
The sample SAS code is listed below.
SAMPLE SAS CODE
** Including a format to distinguish for variables with missing or unknown;
proc format;
value mu
.='Missing'
-1='Unknown'
other='Valid';
run;
** Close the listing destination and open an html destination;
** Note the resulting file is saved as xls;
ods listing close;
ods html file='c:\nccn\WUSS\output\MU_pt_list.xls' style=journal;
proc print data=demoX
(where=(pid^=. and compstat in(.,-1))) label noobs;
title 'Missing Unknown Patient List: Completion Status';
var cid pid compstat;
format compstat mu.;
label cid='Center ID'
pid='Patient ID'
compstat='Completion Status';
run;
** Note -1 for dob is a valid value because it is a date;
proc print data=demoX
(where=(pid^=. and dob in(.))) label noobs;
title 'Missing Unknown Patient List: Date of Birth';
var cid pid dob;
label cid='Center ID'
pid='Patient ID'
dob='Date of Birth';
run;
proc print data=demoX
(where=(compstat^=. and racebg in(.,-
1))) label noobs;
title 'Missing Unknown Patient List: Racial Background';
var cid pid racebg;
format racebg mu.;
label cid='Center ID'
pid='Patient ID'
racebg='Racial Background';
run;
** Racebgother has a subcondition for patients who mark other as their race;
** Because this is character data missing is the only value searched;
proc print data=demoX
(where=(compstat^=. and racebg=0 and racebgother=' ')) label
noobs;
title 'Missing Unknown Patient List: Racial Background Other';
var cid pid racebgother;
label cid='Center ID'
pid='Patient ID'
racebgother='Racial Background Other';
run;
5
proc print data=demoX (where=(compstat^=. and hispanic in(.,-1))) label noobs;
title 'Missing Unknown Patient List: Spanish/Hispanic Background';
var cid pid hispanic;
format hispanic mu.;
label cid='Center ID'
pid='Patient ID'
hispanic='Spanish/Hispanic Background';
run;
/* ... and so on for all data elements ... */
** Closing the html destination and reactivating the listing window;
ods html close;
ods listing;
In summary, the use of ODS HTML to create the desired Excel output for the Missing Unknown Patient Listing
section is in its simplicity.
All that is needed to designate an Excel file is the .xls extension within the ODS HTML
FILE statement.
Table 2 lists the pros and cons of choosing ODS HTML for Excel reports.
Table 2. Pros and Cons for Choosing ODS HTML for Excel Reports
ODS HTML Pros:
Easy to program
Quick turn around time
ODS HTML Cons
:
Little control over formatting
Output unwieldy
METHOD 3: ODS XML WITH TAGSET EXCELXP
In contrast to the ODS HTML method, the ODS XML with tagset ExcelXP can easily create multiple sheet Excel
workbooks, with control over tab names, formatting, and even header and footer information. A tagset is a special
type of ODS destination (HTML, RTF, etc.).
Tagsets can be modified to meet specific needs by using the
TEMPLATE procedure.
The tagset we use here is called “ExcelXP.”
It creates XML output that can be opened using
Excel.
A comprehensive lesson on what can be done with this tagset, can be found in DelGobbo’s WUSS 2008
Presentation.
3
The sample programs and data used in his workshop are available at the SAS Presents Web site.
4
Within the downloadable package is the ExcelXP.sas program, and a sample of how to incorporate it is in the SAS
program CompleteCode.sas
Figure 3 shows an example of a Missing and Unknown Patient List Report for health insurance data elements, which
are collected for all five disease sites.
This list is specific to a center, and the center is listed under the Center ID
column.
The patient identifiers are listed under the Patient ID column.
The assessment identifiers are listed under
the Assessment ID column.
The last column lists whether the patient record was Missing or Unknown for change in
health insurance.
In this example, all the listed patient records have a change in health insurance of Missing.
Also
note that there are three separate worksheets, one per data table.
With ExcelXP, the output can be divided at any
point by using an additional ODS ExcelXP statement.
Figure 4 calls attention to the header that we would like to appear.
Note the use of all sections of the header, as well
as the formatting such as bold, italic, font styles and sizes. The worksheet has a title within the header and titles for
each of the output blocks in the body. This is all handled in the SAS program. Coding for this header and footer (not
shown) is found in the sample code below.
6
Figure 3. Missing and Unknown Patient List Report
Figure 4. Print Preview of Missing and Unknown Patient List Report
For the Missing and Unknown Patient List Report, the data is manipulated into a format that is ready for output.
For
the sake of discussion, a simplified version of the actual programming has been created.
We are using a PROC
PRINT that refers to data in the W9 directory.
The output file is an XML file that can be opened in Excel, and then
can be saved as an Excel Workbook.
The sample SAS code is included below.
7
SAMPLE SAS CODE
** include a version of the ExcelXP ODS tagset;
%include "&SAMPDIR.ExcelXP.sas";
** Close listing destination, and open ExcelXP destination;
ods listing close;
ods tagsets.ExcelXP path="C:\nccn\WUSS\output\" file="&outfile" style=journal;
**Set global options;
**Descriptions of the many tagset ExcelXP options can be found in
DelGobbo’s WUSS 2008 paper as well;
ods tagsets.ExcelXP options
(embedded_titles='yes'
suppress_bylines='yes'
Print_Header='&L&"Arial,Bold Italic"&10CONFIDENTIAL
&C&"Arial,Bold Italic"&10BCA Outcomes Project Missing and
Unknown Report: Patient Listing &RFor Internal Use Only'
Print_Footer='&L&8© 2009 National Comprehensive Cancer Network
&C&8June 17, 2009 &R&8&P'
Center_Horizontal='Yes');
** Create a new worksheet for each data table by using an ODS statement;
ods tagsets.ExcelXP options(sheet_interval='none' sheet_name='Demographics');
proc print data=w9.racial noobs label split='*';
var cid pid racebg;
title 'Demographics: Racial Background';
format racebg mu.;
label
cid='Center*ID'
pid='Patient*ID'
racebg='Racial*Background';
run;
** continue to print other Demographic table variables here…;
** switch to Health Insurance data and worksheet;
ods tagsets.ExcelXP options(sheet_interval='none' sheet_name='Insurance');
proc print data=w9.hichange noobs label split='*';
var cid pid assessid inschange;
title 'Insurance: Change in Health Insurance';
format inschange mu.;
label
cid='Center*ID'
pid='Patient*ID'
assessid='Assesssment*ID'
inschange='Change in*Health*Insurance';
run;
proc print data=w9.primaryt noobs label split='*';
var cid pid assessid ins1type;
title 'Insurance: Primary Insurance Type';
format ins1type mu.;
label
cid='Center*ID'
pid='Patient*ID'
assessid='Assesssment*ID'
ins1type='Primary Health*Ins Type';
run;
** continue to print other Health Insurance table variables here…;
8
9
** switch to Study Accession data and worksheet;
ods tagsets.ExcelXP options(sheet_interval='none' sheet_name='Study
Accession');
proc print data=w9.education noobs label split='*';
var cid pid edustat;
title 'Study Accession: Education Status';
format edustat mu.;
label
cid='Center*ID'
pid='Patient*ID'
edustat='Education*Status';
run;
** continue to print other StudyAcc table variables here…;
** Close ExcelXP destination and reopen the listing destination;
ods tagsets.ExcelXP close;
ods listing;
In summary, the use of ODS XML with tagset ExcelXP to create the desired Excel output is a much better method for
the patient listings of missing and unknown data elements.
We are able to group worksheets by data table and have
better control over what is output to each worksheet. Table 3 lists the pros and cons of choosing ODS XML with
tagsets ExcelXP for Excel reports.
Table 3. Pros and Cons for Choosing ODS XML with tagsets ExcelXP for Excel Reports
TAGSET EXCELXP Pros:
Creating template not required
Can accommodate multiple worksheets
Changes are made in SAS program
Will only create worksheets where there is output available
TAGSET EXCELXP Cons:
Must save as an Excel file
May be some format editing post-creation
CONCLUSION
There are many ways to move data from SAS to Excel.
Each has its advantages and disadvantages, and each is
quite useful depending on what the output actually contains and on the need of the client.
If a quick query is
requested, a simple ODS HTML output will do.
If the report needs to be a very specific set of tables or a specific
layout, then the DDE/template method will work quite well. If flexibility is important, but so also is style, then ODS
ExcelXP is a great option.
REFERENCES
1) Ottesen R,
Custom Outcomes Benchmarking Reports Using Dynamic Data Exchange, The Only Way To Go
,
Proceedings of the WUSS 2009 Conference, HOR-Ottesen.
2) Ter Veer A,
Reporting Quality of Cancer for the NCCN – A SAS Bridge to Excel
, Proceedings of the WUSS 2009
Conference, HOR-terVeer.
3)
DelGobbo ,
Tips and Tricks for Creating Multi-Sheet Excel Workbooks the Easy Way with SAS
,
Proceedings of the WUSS 2008 Conference, HOW07.
4)
DelGobbo,
Tips and Tricks for Creating Multi-Sheet Excel Workbooks the Easy Way with SAS
,
http://support.sas.com/rnd/papers/index.html#excel2008.
CONTACT INFORMATION
Your comments and questions are valued and encouraged.
Please contact the author at:
Julie Kilburn
559.783.9660
jkilburn@coh.org
City of Hope
1500 E. Duarte Road
Duarte, CA 91010
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
Voir icon more
Alternate Text