394
pages
English
Ebooks
2017
Publication date: July 20, 2017
EAN-13: 9781635260380
Language: English
File size: 27 MB
Predictive Modeling with SAS Enterprise Miner
Practical Solutions for Business Applications
Third Edition
Kattamuri S. Sarma, PhD
sas.com/books
The correct bibliographic citation for this manual is as follows: Sarma, Kattamuri S., Ph.D. 2017. Predictive Modeling with SAS Enterprise Miner: Practical Solutions for Business Applications, Third Edition. Cary, NC: SAS Institute Inc.
Predictive Modeling with SAS Enterprise Miner: Practical Solutions for Business Applications, Third Edition
Copyright © 2017, SAS Institute Inc., Cary, NC, USA
ISBN 978-1-62960-264-6 (Hard copy)
ISBN 978-1-63526-038-0 (EPUB)
ISBN 978-1-63526-039-7 (MOBI)
ISBN 978-1-63526-040-3 (PDF)
All Rights Reserved. Produced in the United States of America.
For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.
For a web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.
The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of copyrighted materials. Your support of others' rights is appreciated.
U.S. Government License Rights; Restricted Rights: The Software and its documentation is commercial computer software developed at private expense and is provided with RESTRICTED RIGHTS to the United States Government. Use, duplication, or disclosure of the Software by the United States Government is subject to the license terms of this Agreement pursuant to, as applicable, FAR 12.212, DFAR 227.7202-1(a), DFAR 227.7202-3(a), and DFAR 227.7202-4, and, to the extent required under U.S. federal law, the minimum restricted rights as set out in FAR 52.227-19 (DEC 2007). If FAR 52.227-19 is applicable, this provision serves as notice under clause (c) thereof and no other notice is required to be affixed to the Software or documentation. The Government's rights in Software and documentation shall be only those set forth in this Agreement.
SAS Institute Inc., SAS Campus Drive, Cary, NC 27513-2414
July 2017
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
SAS software may be provided with certain third-party software, including but not limited to open-source software, which is licensed under its applicable third-party software license agreement. For license information about third-party software distributed with SAS software, refer to http://support.sas.com/thirdpartylicenses.
Contents
About This Book
About The Author
Chapter 1: Research Strategy
1.1 Introduction
1.2 Types of Inputs
1.2.1 Measurement Scales for Variables
1.2.2 Predictive Models with Textual Data
1.3 Defining the Target
1.3.1 Predicting Response to Direct Mail
1.3.2 Predicting Risk in the Auto Insurance Industry
1.3.3 Predicting Rate Sensitivity of Bank Deposit Products
1.3.4 Predicting Customer Attrition
1.3.5 Predicting a Nominal Categorical (Unordered Polychotomous) Target
1.4 Sources of Modeling Data
1.4.1 Comparability between the Sample and Target Universe
1.4.2 Observation Weights
1.5 Pre-Processing the Data
1.5.1 Data Cleaning Before Launching SAS Enterprise Miner
1.5.2 Data Cleaning After Launching SAS Enterprise Miner
1.6 Alternative Modeling Strategies
1.6.1 Regression with a Moderate Number of Input Variables
1.6.2 Regression with a Large Number of Input Variables
1.7 Notes
Chapter 2: Getting Started with Predictive Modeling
2.1 Introduction
2.2 Opening SAS Enterprise Miner 14.1
2.3 Creating a New Project in SAS Enterprise Miner 14.1
2.4 The SAS Enterprise Miner Window
2.5 Creating a SAS Data Source
2.6 Creating a Process Flow Diagram
2.7 Sample Nodes
2.7.1 Input Data Node
2.7.2 Data Partition Node
2.7.3 Filter Node
2.7.4 File Import Node
2.7.5 Time Series Nodes
2.7.6 Merge Node
2.7.7 Append Node
2.8 Tools for Initial Data Exploration
2.8.1 Stat Explore Node
2.8.2 MultiPlot Node
2.8.3 Graph Explore Node
2.8.4 Variable Clustering Node
2.8.5 Cluster Node
2.8.6 Variable Selection Node
2.9 Tools for Data Modification
2.9.1 Drop Node
2.9.2 Replacement Node
2.9.3 Impute Node
2.9.4 Interactive Binning Node
2.9.5 Principal Components Node
2.9.6 Transform Variables Node
2.10 Utility Nodes
2.10.1 SAS Code Node
2.11 Appendix to Chapter 2
2.11.1 The Type, the Measurement Scale, and the Number of Levels of a Variable
2.11.2 Eigenvalues, Eigenvectors, and Principal Components
2.11.3 Cramer's V
2.11.4 Calculation of Chi-Square Statistic and Cramer's V for a Continuous Input
2.12 Exercises
Notes
Chapter 3: Variable Selection and Transformation of Variables
3.1 Introduction
3.2 Variable Selection
3.2.1 Continuous Target with Numeric Interval-scaled Inputs (Case 1)
3.2.2 Continuous Target with Nominal-Categorical Inputs (Case 2)
3.2.3 Binary Target with Numeric Interval-scaled Inputs (Case 3)
3.2.4 Binary Target with Nominal-scaled Categorical Inputs (Case 4)
3.3 Variable Selection Using the Variable Clustering Node
3.3.1 Selection of the Best Variable from Each Cluster
3.3.2 Selecting the Cluster Components
3.4 Variable Selection Using the Decision Tree Node
3.5 Transformation of Variables
3.5.1 Transform Variables Node
3.5.2 Transformation before Variable Selection
3.5.3 Transformation after Variable Selection
3.5.4 Passing More Than One Type of Transformation for Each Interval Input to the Next Node
3.5.5 Saving and Exporting the Code Generated by the Transform Variables Node
3.6 Summary
3.7 Appendix to Chapter 3
3.7.1 Changing the Measurement Scale of a Variable in a Data Source
3.7.2 SAS Code for Comparing Grouped Categorical Variables with the Ungrouped Variables
Exercises
Note
Chapter 4: Building Decision Tree Models to Predict Response and Risk
4.1 Introduction
4.2 An Overview of the Tree Methodology in SAS Enterprise Miner
4.2.1 Decision Trees
4.2.2 Decision Tree Models
4.2.3 Decision Tree Models vs. Logistic Regression Models
4.2.4 Applying the Decision Tree Model to Prospect Data
4.2.5 Calculation of the Worth of a Tree
4.2.6 Roles of the Training and Validation Data in the Development of a Decision Tree
4.2.7 Regression Tree
4.3 Development of the Tree in SAS Enterprise Miner
4.3.1 Growing an Initial Tree
4.3.2 P-value Adjustment Options
4.3.3 Controlling Tree Growth: Stopping Rules
4.3.3.1 Controlling Tree Growth through the Split Size Property
4.3.4 Pruning: Selecting the Right-Sized Tree Using Validation Data
4.3.5 Step-by-Step Illustration of Growing and Pruning a Tree
4.3.6 Average Profit vs. Total Profit for Comparing Trees of Different Sizes
4.3.7 Accuracy/Misclassification Criterion in Selecting the Right-sized Tree (Classification of Records and Nodes by Maximizing Accuracy)
4.3.8 Assessment of a Tree or Sub-tree Using Average Square Error
4.3.9 Selection of the Right-sized Tree
4.4 Decision Tree Model to Predict Response to Direct Marketing
4.4.1 Testing Model Performance with a Test Data Set
4.4.2 Applying the Decision Tree Model to Score a Data Set
4.5 Developing a Regression Tree Model to Predict Risk
4.5.1 Summary of the Regression Tree Model to Predict Risk
4.6 Developing Decision Trees Interactively
4.6.1 Interactively Modifying an Existing Decision Tree
4.6.3 Developing the Maximal Tree in Interactive Mode
4.7 Summary
4.8 Appendix to Chapter 4
4.8.1 Pearson's Chi-Square Test
4.8.2 Calculation of Impurity Reduction using Gini Index
4.8.3 Calculation of Impurity Reduction/Information Gain using Entropy
4.8.4 Adjusting the Predicted Probabilities for Over-sampling
4.8.5 Expected Profits Using Unadjusted Probabilities
4.8.6 Expected Profits Using Adjusted Probabilities
4.9 Exercises
Notes
Chapter 5: Neural Network Models to Predict Response and Risk
5.1 Introduction
5.1.1 Target Variables for the Models
5.1.2 Neural Network Node Details
5.2 General Example of a Neural Network Model
5.2.1 Input Layer
5.2.2 Hidden Layers
5.2.3 Output Layer or Target Layer
5.2.4 Activation Function of the Output Layer
5.3 Estimation of Weights in a Neural Network Model
5.4 Neural Network Model to Predict Response
5.4.1 Setting the Neural Network Node Properties
5.4.2 Assessing the Predictive Performance of the Estimated Model
5.4.3 Receiver Operating Characteristic (ROC) Charts
5.4.4 How Did the Neural Network Node Pick the Optimum Weights for This Model?
5.4.5 Scoring a Data Set Using the Neural Network Model
5.4.6 Score Code
5.5 Neural Network Model to Predict Loss Frequency in Auto Insur