Building Regression Models with SAS (ebook)

464 pages, English, ebook, 2023


Advance your skills in building predictive models with SAS!


Building Regression Models with SAS: A Guide for Data Scientists teaches data scientists, statisticians, and other analysts who use SAS to train regression models for prediction with large, complex data. Each chapter focuses on a particular model and includes a high-level overview, followed by basic concepts, essential syntax, and examples using new procedures in both SAS/STAT and SAS Viya. By emphasizing introductory examples and interpretation of output, this book provides readers with a clear understanding of how to build the following types of models:


  • general linear models
  • quantile regression models
  • logistic regression models
  • generalized linear models
  • generalized additive models
  • proportional hazards regression models
  • tree models
  • models based on multivariate adaptive regression splines

Building Regression Models with SAS is an essential guide to learning about a variety of models that provide interpretability as well as predictive performance.


Publication date: April 18, 2023

EAN13: 9781951684006

Language: English

File size: 59 MB

sas.com/books

A Guide for Data Scientists

Robert N. Rodriguez

The correct bibliographic citation for this manual is as follows: Rodriguez, Robert N. 2023. Building Regression Models with SAS®: A
Guide for Data Scientists. Cary, NC: SAS Institute Inc.
Building Regression Models with SAS®: A Guide for Data Scientists
Copyright © 2023, SAS Institute Inc., Cary, NC, USA
ISBN 978-1-955977-94-4 (Hardcover)
ISBN 978-1-63526-155-4 (Paperback)
ISBN 978-1-63526-190-5 (Web PDF)
ISBN 978-1-951684-00-6 (EPUB)
ISBN 978-1-951684-01-3 (Kindle)
All Rights Reserved. Produced in the United States of America.
For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any
means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.
For a web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you
acquire this publication.
The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal
and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of
copyrighted materials. Your support of others’ rights is appreciated.
U.S. Government License Rights; Restricted Rights: The Software and its documentation is commercial computer software developed at
private expense and is provided with RESTRICTED RIGHTS to the United States Government. Use, duplication, or disclosure of the
Software by the United States Government is subject
to the license terms of this Agreement pursuant to, as applicable, FAR 12.212, DFAR 227.7202-1(a), DFAR 227.7202-3(a), and DFAR
227.7202-4, and, to the extent required under U.S. federal law, the minimum restricted rights as set out in FAR 52.227-19 (DEC 2007). If
FAR 52.227-19 is applicable, this provision serves as notice under clause (c) thereof and no other notice is required to be affixed to the
Software or documentation. The Government’s rights in Software and documentation shall be only those set forth in this Agreement.
SAS Institute Inc., SAS Campus Drive, Cary, NC 27513-2414
April 2023
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA
and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
SAS software may be provided with certain third-party software, including but not limited to open-source software, which is licensed under
its applicable third-party software license agreement. For license information about third-party software distributed with SAS software,
refer to http://support.sas.com/thirdpartylicenses.
Contents
Chapter 1. Introduction

I General Linear Models
Chapter 2. Building General Linear Models: Concepts
Chapter 3. Building General Linear Models: Issues
Chapter 4. Building General Linear Models: Methods
Chapter 5. Building General Linear Models: Procedures
Chapter 6. Building General Linear Models: Collinearity
Chapter 7. Building General Linear Models: Model Averaging

II Specialized Regression Models
Chapter 8. Building Quantile Regression Models
Chapter 9. Building Logistic Regression Models
Chapter 10. Building Generalized Linear Models
Chapter 11. Building Generalized Additive Models
Chapter 12. Building Proportional Hazards Models
Chapter 13. Building Classification and Regression Trees
Chapter 14. Building Adaptive Regression Models

III Appendices about Algorithms and Computational Methods
Appendix A. Algorithms for Least Squares Estimation
Appendix B. Least Squares Geometry
Appendix C. Akaike’s Information Criterion
Appendix D. Maximum Likelihood Estimation for Generalized Linear Models
Appendix E. Distributions for Generalized Linear Models
Appendix F. Spline Methods
Appendix G. Algorithms for Generalized Additive Models

IV Appendices about Common Topics
Appendix H. Methods for Scoring Data
Appendix I. Coding Schemes for Categorical Predictors
Appendix J. Essentials of ODS Graphics
Appendix K. Modifying a Procedure Graph
Appendix L. Marginal Model Plots

Glossary
References
Subject Index
Syntax Index

Quick Guide to Key Procedures

Table 1 SAS 9 Procedures for Building Regression Models

Procedure      Model                                                      Introduction   Example
GLMSELECT      General linear models including least squares regression   page 50        page 56
QUANTSELECT    Quantile regression                                        page 143       page 146
HPLOGISTIC     Logistic regression                                        page 163       page 174
HPGENSELECT    Generalized linear models                                  page 197       page 201
GAMPL          Generalized additive models                                page 228       page 230
HPSPLIT        Classification and regression trees                        page 274       page 276
ADAPTIVEREG    Multivariate adaptive regression splines                   page 302       page 304

Table 2 SAS Viya Procedures for Building Regression Models

Procedure      Model                                                      Introduction   Example
REGSELECT      General linear models including least squares regression   page 77        page 81
QTRSELECT      Quantile regression                                        page 155       page 156
LOGSELECT      Logistic regression                                        page 183       page 187
GENSELECT      Generalized linear models                                  page 215       page 216
GAMMOD         Generalized additive models                                page 239       page 239
GAMSELECT      Generalized additive models                                page 244       page 246
PHSELECT       Proportional hazards models                                page 259       page 261
TREESPLIT      Classification and regression trees                        page 282       page 283

Preface
Highway 53 in Cibola National Forest, New Mexico
If you travel in the western mountains of the United States, you will eventually encounter the
Continental Divide. When a thunderstorm drops its contents on the divide, a portion flows eastward
to the Mississippi River and then to the Atlantic Ocean; the other portion flows westward to the
Pacific Ocean. During the 1800s, the Great Divide, as it is known, was the highest hurdle faced by
settlers trekking across the American frontier until the construction of railways.
Great divides are also encountered in scientific fields, where philosophical differences impede
practical applications until they are eventually resolved, often by breakthroughs in technology. In
the field of statistics, the great divide of the 20th century was the disagreement between proponents
of frequentist and Bayesian approaches. Today, objective Bayesian methods are widely accepted due
to computational advances in the 1990s.
Machine learning has created a new divide for the practice of statistics, which relies heavily on data
from well-designed studies for modeling and inference. Statistical methods now vie with algorithms
that learn from large amounts of observational data. In particular, the new divide influences how
regression models are viewed and applied. While statistical analysts view regression models as
platforms for inference, data scientists view them as platforms for prediction. And while statistical
analysts prefer to specify the effects in a model by drawing on subject matter knowledge, data
scientists rely on algorithms to determine the form of the model.
This book equips both groups to cross the divide and find value on the other side by presenting SAS
procedures that build regression models for prediction from large numbers of candidate effects. It
introduces statistical analysts to methods of predictive modeling drawn from supervised learning,
and at the same time it introduces data scientists to a rich variety of models drawn from statistics.
Throughout, the book uses the term model building because the procedures provide far more than
sequential methods for model selection such as stepwise regression. The procedures also provide
shrinkage methods, methods for model averaging, methods for constructing spline effects, and
methods for building trees.
Motivation for the Book
The need for this book originated some years ago with the introduction of SAS/STAT procedures that
were specifically designed to build regression models for prediction. The first was the GLMSELECT
procedure, which builds general linear models (Cohen 2006). It not only equips analysts with modern
methods for prediction but also provides the scalability that is essential in data mining and business
analytics, where the number of observations can be in the millions and the number of potential
predictors can be in the tens of thousands.
The GLMSELECT procedure was followed by a series of procedures that build other types of
models. For instance, the HPLOGISTIC procedure builds logistic regression models, and the
HPGENSELECT procedure builds generalized linear models.
Naturally, with so many new tools to choose from, SAS users began to ask questions such as the
following:
How is the GLMSELECT pr
