Categorical Data Analysis Using SAS, Third Edition (ebook)

507 pages · English · Ebook · 2012


Statisticians and researchers will find Categorical Data Analysis Using SAS, Third Edition, by Maura Stokes, Charles Davis, and Gary Koch, to be a useful discussion of categorical data analysis techniques as well as an invaluable aid in applying these methods with SAS. Practical examples from a broad range of applications illustrate the use of the FREQ, LOGISTIC, GENMOD, NPAR1WAY, and CATMOD procedures in a variety of analyses. Topics discussed include assessing association in contingency tables and sets of tables, logistic regression and conditional logistic regression, weighted least squares modeling, repeated measurements analyses, loglinear models, generalized estimating equations, and bioassay analysis.
The third edition updates the use of SAS/STAT software to SAS/STAT 12.1 and incorporates ODS Graphics. Many additional SAS statements and options are employed, and graphs such as effect plots, odds ratio plots, regression diagnostic plots, and agreement plots are discussed. The material has also been revised and reorganized to reflect the evolution of categorical data analysis strategies. Additional techniques include such topics as exact Poisson regression, partial proportional odds models, Newcombe confidence intervals, incidence density ratios, and so on.
This book is part of the SAS Press program.

Publication date: 31 July 2012
EAN13: 9781612900902
Language: English
File size: 23 MB

The correct bibliographic citation for this manual is as follows: Stokes, Maura E., Charles S. Davis, and Gary G. Koch. 2012. Categorical Data Analysis Using SAS®, Third Edition. Cary, NC: SAS Institute Inc.

Categorical Data Analysis Using SAS®, Third Edition
Copyright © 2012, SAS Institute Inc., Cary, NC, USA
ISBN 978-1-61290-090-2 (electronic book)
ISBN 978-1-60764-664-8
All rights reserved. Produced in the United States of America.

For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.

For a Web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.

The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of copyrighted materials. Your support of others' rights is appreciated.

U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19, Commercial Computer Software-Restricted Rights (June 1987).

SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513-2414

1st printing, July 2012

SAS Institute Inc. provides a complete selection of books and electronic products to help customers use SAS software to its fullest potential. For more information about our e-books, e-learning products, CDs, and hard-copy books, visit the SAS Books Web site at http://support.sas.com/publishing/index.html or call 1-800-727-3228.

SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

Conversion Date: 11-Nov-2012
Contents
Chapter 1. Introduction
Chapter 2. The 2 × 2 Table
Chapter 3. Sets of 2 × 2 Tables
Chapter 4. 2 × r and s × 2 Tables
Chapter 5. The s × r Table
Chapter 6. Sets of s × r Tables
Chapter 7. Nonparametric Methods
Chapter 8. Logistic Regression I: Dichotomous Response
Chapter 9. Logistic Regression II: Polytomous Response
Chapter 10. Conditional Logistic Regression
Chapter 11. Quantal Response Data Analysis
Chapter 12. Poisson Regression and Related Loglinear Models
Chapter 13. Categorized Time-to-Event Data
Chapter 14. Weighted Least Squares
Chapter 15. Generalized Estimating Equations
References
Index
Accelerate Your SAS Knowledge with SAS Books
Preface to the Third Edition
The third edition accomplishes several purposes. First, it updates the use of SAS® software to current practices. Since the last edition was published more than 10 years ago, numerous sets of example statements have been modified to reflect best applications of SAS/STAT® software. Second, the material has been expanded to take advantage of the many graphs now provided by SAS/STAT software through ODS Graphics. Beginning with SAS/STAT 9.3, these graphs are available with SAS/STAT; no other product license is required (a SAS/GRAPH® license was required for previous releases). Graphs displayed in this edition include:
• mosaic plots
• effect plots
• odds ratio plots
• predicted cumulative proportions plot
• regression diagnostic plots
• agreement plots
Third, the book has been updated and reorganized to reflect the evolution of categorical data analysis strategies. The previous Chapter 14, “Repeated Measurements Using Weighted Least Squares,” has been combined with the previous Chapter 13, “Weighted Least Squares,” to create the current Chapter 14, “Weighted Least Squares.” The material previously in Chapter 16, “Loglinear Models,” is found in the current Chapter 12, “Poisson Regression and Related Loglinear Models.” The material in Chapter 10, “Conditional Logistic Regression,” has been rewritten, and Chapter 8, “Logistic Regression I: Dichotomous Response,” and Chapter 9, “Logistic Regression II: Polytomous Response,” have been expanded. In addition, the previous Chapter 16, “Categorized Time-to-Event Data,” is the current Chapter 13. Numerous additional techniques are covered in this edition, including:
• incidence density ratios and their confidence intervals
• additional confidence intervals for difference of proportions
• exact Poisson regression
• difference measures to reflect direction of association in sets of tables
• partial proportional odds model
• use of the QIC statistic in GEE analysis
• odds ratios in the presence of interactions
• Firth penalized likelihood approach for logistic regression
In addition, miscellaneous revisions and additions have been incorporated throughout the book. However, the scope of the book remains the same as described in Chapter 1, “Introduction.”
Computing Details
The examples in this third edition were executed with SAS/STAT 12.1, although the revision was largely based on SAS/STAT 9.3. The features specific to SAS/STAT 12.1 are:
• mosaic plots in the FREQ procedure
• partial proportional odds model in the LOGISTIC procedure
• Miettinen-Nurminen confidence limits for proportion differences in PROC FREQ
• headings for the estimates from the FIRTH option in PROC LOGISTIC
Because of limited space, not all of the output that is produced with the example SAS code is shown. Generally, only the output pertinent to the discussion is displayed. An ODS SELECT statement is sometimes used in the example code to limit the tables produced. The ODS GRAPHICS ON and ODS GRAPHICS OFF statements are used when graphs are produced. However, these statements are not needed when graphs are produced as part of the SAS windowing environment beginning with SAS 9.3. Also, the graphs produced for this book were generated with the STYLE=JOURNAL option of ODS because the book does not feature color.
For More Information
See http://www.sas.com/catbook. The website contains further information that pertains to topics in the book, including data (where possible) and errata.
Acknowledgments
We are grateful to the many people who have contributed to this revision. Bob Derr, Amy Herring, Michael Hussey, Diana Lam, Siying Li, Michela Osborn, Ashley Lauren Paynter, Margaret Polinkovsky, John Preisser, David Schlotzhauer, Todd Schwartz, Valerie Smith, Daniela Soltres-Alvarez, Donna Watts, Catherine Wiener, Laura Elizabeth Weiner, and Laura Zhou provided reviews, suggestions, proofing, and numerous other contributions that are greatly appreciated. And, of course, we remain thankful to those persons who contributed to the earlier editions. They include Diane Catellier, Sonia Davis, Bob Derr, William Duckworth II, Suzanne Edwards, Stuart Gansky, Greg Goodwin, Wendy Greene, Duane Hayes, Allison Kinkead, Gordon Johnston, Lisa LaVange, Antonio Pedroso-de-Lima, Annette Sanders, John Preisser, David Schlotzhauer, Todd Schwartz, Dan Spitzner, Catherine Tangen, Lisa Tomasko, Donna Watts, Greg Weier, and Ozkan Zengin. Anne Baxter and Ed Huddleston edited this book. Tim Arnold provided documentation programming support.
Chapter 1 Introduction
Contents
1.1 Overview
1.2 Scale of Measurement
1.3 Sampling Frameworks
1.4 Overview of Analysis Strategies
1.4.1 Randomization Methods
1.4.2 Modeling Strategies
1.5 Working with Tables in SAS Software
1.6 Using This Book
1.1 Overview
Data analysts often encounter response measures that are categorical in nature; their outcomes reflect categories of information rather than the usual interval scale. Frequently, categorical data are presented in tabular form, known as contingency tables. Categorical data analysis is concerned with the analysis of categorical response measures, regardless of whether any accompanying explanatory variables are also categorical or are continuous. This book discusses hypothesis testing strategies for the assessment of association in contingency tables and sets of contingency tables. It also discusses various modeling strategies available for describing the nature of the association between a categorical response measure and a set of explanatory variables.

An important consideration in determining the appropriate analysis of categorical variables is their scale of measurement. Section 1.2 describes the various scales and illustrates them with data sets used in later chapters. Another important consideration is the sampling framework that produced the data; it determines the possible analyses and the possible inferences. Section 1.3 describes the typical sampling frameworks and their ramifications. Section 1.4 introduces the various analysis strategies discussed in this book and describes how they relate to one another. It also discusses the target populations generally assumed for each type of analysis and what types of inferences you are able to make to them. Section 1.5 reviews how SAS software handles contingency tables and other forms of categorical data. Finally, Section 1.6 provides a guide to the material in the book for various types of readers, including indications of the difficulty level of the chapters.
1.2 Scale of Measurement
The scale of measurement of a categorical response variable is a key element in choosing an appropriate analysis strategy. By taking advantage of the methodologies available for the particular scale of measurement, you can choose a well-targeted strategy. If you do not take the scale of measurement into account, you may choose an inappropriate strategy that could lead to erroneous conclusions. Recognizing the scale of measurement and using it properly are very important in categorical data analysis. Categorical response variables can be
• dichotomous
• ordinal
• nominal
• discrete counts
• grouped survival times
Dichotomous responses are those that have two possible outcomes; most often they are yes and no. Did the subject develop the disease? Did the voter cast a ballot for the Democratic or Republican candidate? Did the student pass the exam? For example, the objective of a clinical trial for a new medication for colds is to determine whether patients obtained relief from their pain-producing ailment. Consider Table 1.1, which is analyzed in Chapter 2, “The 2 × 2 Table.”
Table 1.1 Respiratory Outcomes
The placebo group contains 64 patients, and the test medication group contains 60 patients. The columns contain the information concerning the categorical response measure: 40 patients in the Test group had a favorable response to the medication, and 20 subjects did not. The outcome in this example is thus dichotomous, and the analysis investigates the relationship between the response and the treatment.

Frequently, categorical data responses represent more than two possible outcomes, and often these possible outcomes take on some inherent ordering. Such response variables have an ordinal scale of measurement. Did the new school curriculum produce little, some, or high enthusiasm among the students? Does the water exhibit low, medium, or high hardness? In the former case, the order of the response levels is clear, but there is no clue as to the relative distances between the levels. In the latter case, there is a possible distance between the levels: medium might have twice the hardness of low, and high might have three times the hardness of low. Sometimes the distance is even clearer: a 50% potency dose versus a 100% potency dose versus a 200% potency dose. All three cases are examples of ordinal data.

An example of an ordinal measure occurs in data displayed in Table 1.2, which is analyzed in Chapter 9, “Logistic Regression II: Polytomous Response.” A clinical trial investigated a treatment for rheumatoid arthritis. Male and female patients were given either the active treatment or a placebo; the outcome measured was whether they showed marked, some, or no improvement at the end of the clinical trial. The analysis uses the proportional odds model to assess the relationship between the response variable and gender and treatment.
Table 1.2 Arthritis Data
Note that categorical response variables can often be managed in different ways. You could combine the Marked and Some columns in Table 1.2 to produce a dichotomous outcome: No Improvement versus Improvement. Grouping categories is often done during an analysis if the resulting dichotomous response is also of interest.

If you have more than two outcome categories, and there is no inherent ordering to the categories, you have a nominal measurement scale. Which of four candidates did you vote for in the town council election? Do you prefer the beach, mountains, or lake for a vacation? There is no underlying scale for such outcomes and no apparent way in which to order them. Consider Table 1.3, which is analyzed in Chapter 5, “The s × r Table.” Residents in one town were asked their political party affiliation and their neighborhood. Researchers were interested in the association between political affiliation and neighborhood. Unlike ordinal response levels, the classifications Bayside, Highland, Longview, and Sheffeld lie on no conceivable underlying scale. However, you can still assess whether there is association in the table, which is done in Chapter 5.
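As an illustration of the grouping of categories mentioned above for Table 1.2, the following is a minimal Python sketch (a stand-in for illustration, not the SAS code used in the book). The counts and the names `ordinal_table` and `collapse_to_dichotomous` are hypothetical; they are not the Table 1.2 data.

```python
# Hypothetical counts for illustration only; these are NOT the Table 1.2 data.
# Each row is a (sex, treatment) group; columns follow the ordinal response.
ordinal_table = {
    ("Female", "Active"): {"Marked": 10, "Some": 4, "None": 6},
    ("Female", "Placebo"): {"Marked": 5, "Some": 6, "None": 12},
    ("Male", "Active"): {"Marked": 4, "Some": 3, "None": 8},
    ("Male", "Placebo"): {"Marked": 1, "Some": 2, "None": 9},
}


def collapse_to_dichotomous(row):
    """Combine Marked and Some into Improvement; None becomes No Improvement."""
    return {
        "Improvement": row["Marked"] + row["Some"],
        "No Improvement": row["None"],
    }


# Apply the same grouping to every row of the table.
dichotomous_table = {
    group: collapse_to_dichotomous(counts)
    for group, counts in ordinal_table.items()
}

for group, counts in dichotomous_table.items():
    print(group, counts)
```

Collapsing keeps each row total unchanged, so the dichotomous table describes the same subjects; only the distinction between Marked and Some is given up.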
Table 1.3 Distribution of Parties in Neighborhoods
Categorical response variables sometimes contain discrete counts. Instead of falling into categories that are labeled (yes, no) or (low, medium, high), the outcomes are numbers themselves. Was the litter size 1, 2, 3, 4, or 5 members? Did the house contain 1, 2, 3, or 4 air conditioners? While the usual strategy would be to analyze the mean count, the assumptions required for the standard linear model for continuous data are often not met with discrete counts that have small range; the counts are not distributed normally and may not have homogeneous variance.

For example, researchers examining respiratory disease in children visited children in different regions two times and determined whether they showed symptoms of respiratory illness. The response measure was whether the children exhibited symptoms in 0, 1, or 2 periods. Table 1.4 contains these data, which are analyzed in Chapter 14, “Weighted Least Squares.”
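Because the response here is a count of periods (0, 1, or 2), the mean count for a group of children is a frequency-weighted average of those values. The following minimal Python sketch shows the arithmetic (again a stand-in for illustration, not the book's SAS code); the function name `mean_count` and the frequencies are hypothetical and are not the Table 1.4 data.

```python
def mean_count(frequencies):
    """Mean of a discrete count, given {count_value: number_of_subjects}."""
    total_subjects = sum(frequencies.values())
    weighted_sum = sum(value * n for value, n in frequencies.items())
    return weighted_sum / total_subjects


# Hypothetical group: number of children with symptoms in 0, 1, or 2 periods.
# These frequencies are made up for illustration.
hypothetical_group = {0: 50, 1: 30, 2: 20}

# (0*50 + 1*30 + 2*20) / 100
print(mean_count(hypothetical_group))
```

With only three possible values, the distribution of such a count is far from normal, which is the reason the text gives for not relying on the standard linear model.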
Table 1.4 Colds in Children
The table represents a cross-classification of gender, residence, and number of periods with colds. The analysis is concerned with modeling mean colds as a function of gender and residence.

Finally, another type of response variable in categorical data analysis is one that represents survival times. With survival data, you are tracking the number of patients with certain outcomes (possibly death) over time. Often, the times of the condition are grouped together so that the response variable represents the number of patients who fail during a specific time interval. Such data are called grouped survival times. For example, the data displayed in Table 1.5 are from Chapter 13, “Categorized Time-to-Event Data.” A clinical condition is treated with an active drug for some patients and with a placebo for others. The response categories are whether there are recurrences, no recurrences, or whether the patients withdrew from the study. The entries correspond to the time intervals 0-1 years, 1-2 years, and 2-3 years, which make up the rows of the table.
Table 1.5 Life Table Format for Clinical Condition Data
1.3 Sampling Frameworks
Categorical data arise from different sampling frameworks. The nature of the sampling framework determines the assumptions that can be made for the statistical analyses and in turn influences the type of analysis that can be applied. The sampling framework also determines the type of inference that is possible. Study populations are limited to target populations, those populations to which inferences can be made, by assumptions justified by the sampling framework. Generally, data fall into one of three sampling frameworks: historical data, experimental data, and sample survey data. Historical data are observational data, which means that the study