360 pages, English, ebook, 2013
Publication date: April 22, 2013
EAN13: 9781629592534
Language: English
Simulating Data with SAS®
Rick Wicklin
support.sas.com/bookstore
The correct bibliographic citation for this manual is as follows: Wicklin, Rick. 2013. Simulating Data with SAS®. Cary, NC: SAS Institute Inc.
Simulating Data with SAS®
Copyright © 2013, SAS Institute Inc., Cary, NC, USA
ISBN 978-1-61290-622-5 (electronic book)
ISBN 978-1-61290-332-3
All rights reserved. Produced in the United States of America.
For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.
For a web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.
The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of copyrighted materials. Your support of others' rights is appreciated.
U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19, Commercial Computer Software-Restricted Rights (June 1987).
SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513-2414
1st printing, April 2013
1st electronic book, May 2013
SAS provides a complete selection of books and electronic products to help customers use SAS® software to its fullest potential. For more information about our e-books, e-learning products, CDs, and hard-copy books, visit support.sas.com/bookstore or call 1-800-727-3228.
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are registered trademarks or trademarks of their respective companies.
Contents
Acknowledgments
I Essentials of Simulating Data
Chapter 1. Introduction to Simulation
Chapter 2. Simulating Data from Common Univariate Distributions
Chapter 3. Preliminary and Background Information
II Basic Simulation Techniques
Chapter 4. Simulating Data to Estimate Sampling Distributions
Chapter 5. Using Simulation to Evaluate Statistical Techniques
Chapter 6. Strategies for Efficient and Effective Simulation
III Advanced Simulation Techniques
Chapter 7. Advanced Simulation of Univariate Data
Chapter 8. Simulating Data from Basic Multivariate Distributions
Chapter 9. Advanced Simulation of Multivariate Data
Chapter 10. Building Correlation and Covariance Matrices
IV Applications of Simulation in Statistical Modeling
Chapter 11. Simulating Data for Basic Regression Models
Chapter 12. Simulating Data for Advanced Regression Models
Chapter 13. Simulating Data from Time Series Models
Chapter 14. Simulating Data from Spatial Models
Chapter 15. Resampling and Bootstrap Methods
Chapter 16. Moment Matching and the Moment-Ratio Diagram
V Appendix
Appendix A. A SAS/IML Primer
Index
About The Author
Rick Wicklin is a principal researcher in computational statistics at SAS, where he develops and supports the IML procedure and the SAS/IML Studio application. He received a PhD from Cornell University and has been a SAS user since 1997. Rick has presented numerous tutorials and papers at statistical and SAS users group conferences and is active in the American Statistical Association. Rick maintains a blog for statistical programmers at blogs.sas.com/content/iml/.
Learn more about this author by visiting his author page at http://support.sas.com/wicklin . There you can download free chapters, access example code and data, read the latest reviews, get updates, and more.
Acknowledgments
I would like to thank Robert Rodriguez and Phil Gibbs for pointing out the need for a book about simulating data in SAS. “Simulation” is a vast topic, and early discussions with them helped me to whittle down the possible topics. Bob and Maura Stokes provided many opportunities for me to develop this material by inviting me to present papers and workshops at conferences. My supervisors at SAS fully supported me as I prepared for and participated in these conferences.
I thank the many SAS users who encouraged me to write a book that emphasizes the practical side of simulation. Discussions with SAS users helped me to determine what topics are of practical importance to statisticians and analysts in business and industry.
I thank my colleagues at SAS from whom I have learned many statistical and programming techniques. Special thanks to Randy Tobias, who always provides sound advice and statistical wisdom for my naive questions. Thanks also to Tim Arnold and Warren Kuhfeld for their ‘saslatex’ documentation system that automatically produced all tables and graphs in this book from the programs that appear in the text.
I thank my editor, John West, and the other employees at SAS Press for their work producing and promoting the book. I thank two reviewers, Clement Stone and Bob Pearson, who provided insightful comments about the book's content and organization.
Thanks to several colleagues and friends who read and commented on early drafts of this book. This includes the following individuals: Rob Agnelli, Jason Brinkley, Tonya Chapman, Steve Denham, Bruce Elsheimer, Betsy Enstrom, Phil Gibbs, Emily Lada, Pushpal Mukhopadhyay, Bill Raynor, Robert Rodriguez, Jim Seabolt, Udo Sglavo, Ying So, Jill Tao, Randy Tobias, Ian Wakeling, Donna Watts, and Min Zhu.
Finally, I would like to thank my wife, Nancy, for her constant support, and my parents for instilling in me a love of learning.
Part I
Essentials of Simulating Data
Chapter 1
Introduction to Simulation
Contents
1.1 Overview of Simulation of Data
1.2 The Goal of This Book
1.3 Who Should Read This Book?
1.4 The SAS/IML Language
1.5 Comparing the DATA Step and SAS/IML Language
1.6 Overview of This Book
1.7 Obtaining the Programs Used in This Book
1.8 Specialized Simulation Tools in SAS Software
1.9 References
1.1 Overview of Simulation of Data
There are many kinds of simulation. Climate scientists use simulation to model the interactions between the earth's atmosphere, oceans, and land. Astrophysicists use simulation to model the evolution of galaxies. Biologists use simulation to model the spread of epidemics and the effects of vaccination programs. Engineers use simulation to study the safety and fuel efficiency of automobile and airplane designs. In these simulations of physical systems, scientists model reality and use a computer to study the model under various conditions.
Statisticians also build models. For example, a simple model of human height might assume that height is normally distributed in the population. This is a useful model, but it turns out that human heights are not actually normally distributed (Schilling, Watkins, and Watkins 2002). Even if you restrict the data to a single gender, there are more very tall and very short people than would be expected from a normal distribution of heights.
If a set of data is only approximately normal, what does that mean for statistical tests that assume normality? If you compute a t test to compare the means of two groups—a test that assumes that the two underlying populations are normally distributed—how sensitive is your conclusion to the actual population distribution? If the populations are slightly nonnormal, does that invalidate the t test? Or are the results fairly robust to deviations from normality?
One way to answer these questions is to simulate data from nonnormal populations. If you construct a distributional model, then you can generate random samples from the model and examine how the t test performs on the simulated data. Simulation gives you complete control over the characteristics of the population model from which the (simulated) data are drawn.
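A minimal sketch of this idea, written in Python purely for illustration (it is not one of the book's SAS programs): draw two samples of size 10 from the *same* population, so the null hypothesis of equal means is true, apply the pooled two-sample t test, and record how often it rejects at the 5% level. The function names, the group size, the number of replications, and the choice of an exponential population as the "nonnormal" case are all illustrative assumptions. The constant 2.100922 is the 0.975 quantile of Student's t distribution with 18 degrees of freedom, hard-coded so that the sketch needs only the standard library.

```python
import math
import random

N = 10             # observations per group (illustrative assumption)
T_CRIT = 2.100922  # two-sided 5% critical value: 0.975 quantile of t with 2*N - 2 = 18 df


def pooled_t(x, y):
    """Equal-variance (pooled) two-sample t statistic."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
    sp2 = ss / (n1 + n2 - 2)  # pooled variance estimate
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def normal(rng):
    """Population that satisfies the t test's normality assumption."""
    return rng.gauss(0.0, 1.0)


def exponential(rng):
    """Skewed (nonnormal) population with mean 1."""
    return rng.expovariate(1.0)


def type1_error_rate(draw, reps=10_000, seed=1):
    """Fraction of simulated data sets in which the t test rejects equal means."""
    rng = random.Random(seed)  # seeded so that the simulation is reproducible
    rejections = 0
    for _ in range(reps):
        x = [draw(rng) for _ in range(N)]
        y = [draw(rng) for _ in range(N)]  # same population, so H0 is true
        if abs(pooled_t(x, y)) > T_CRIT:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print("normal population:     ", type1_error_rate(normal))
    print("exponential population:", type1_error_rate(exponential))
```

For normal data the estimated rejection rate should be close to the nominal 0.05. If the rate for the skewed population also stays near 0.05, the test is robust to that departure from normality for this sample size; a rate far from 0.05 would signal sensitivity to the population distribution.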
Simulating data is also useful for comparing two different statistical techniques. Perhaps Technique A performs better on skewed data than Technique B. Perhaps Technique B is more robust to the presence of outliers. To a practicing statistician, this kind of information is quite valuable. As Gentle (2009, p. xi) says, “Learning to simulate data with given characteristics means that one understands those characteristics. Applying statistical methods to simulated data…helps us better to understand those methods and the principles underlying them.”
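To make such a comparison concrete, here is a hypothetical Python sketch in which "Technique A" is the sample mean and "Technique B" is the sample median, both used to estimate the center of a symmetric population whose true center is 0. The contaminated normal model (N(0, 1) with probability 0.9, otherwise N(0, 10²)) is an assumed stand-in for data with outliers, and the sample size, replication count, and seed are arbitrary choices. The sketch estimates each estimator's mean squared error (MSE) by Monte Carlo.

```python
import random
import statistics


def standard_normal(rng):
    """Well-behaved population with no outliers."""
    return rng.gauss(0.0, 1.0)


def contaminated_normal(rng, p=0.1, wide_sd=10.0):
    """N(0, 1) with probability 1 - p, otherwise N(0, wide_sd**2): an outlier-prone population."""
    sd = wide_sd if rng.random() < p else 1.0
    return rng.gauss(0.0, sd)


def compare_estimators(draw, n=20, reps=5_000, seed=2):
    """Monte Carlo MSE of the sample mean and the sample median; the true center is 0."""
    rng = random.Random(seed)
    sq_err_mean = sq_err_median = 0.0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        sq_err_mean += statistics.fmean(sample) ** 2
        sq_err_median += statistics.median(sample) ** 2
    return sq_err_mean / reps, sq_err_median / reps


if __name__ == "__main__":
    for name, draw in [("normal", standard_normal), ("contaminated", contaminated_normal)]:
        mse_mean, mse_median = compare_estimators(draw)
        print(f"{name:>12}: MSE(mean)={mse_mean:.4f}  MSE(median)={mse_median:.4f}")
```

Under the normal population the mean should have the smaller MSE, whereas under the contaminated population the median should. Neither technique dominates; which one is preferable depends on the population, which is precisely the kind of information a simulation study provides.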
This book is about simulating data in SAS software. This book demonstrates how to generate observations from populations that have specified statistical characteristics. In this book, the phrases “simulating data,” “generating a random sample,” and “sampling from a distribution” are used interchangeably.
A large portion of this book is about learning how to construct statistical models (distributions) that have certain statistical properties. Skewed distributions, fat-tailed distributions, bimodal distributions—these are a few examples of models that you can construct by using the t