155 pages · English · Ebook · 2017
Publication date: 1 May 2017
EAN13: 9781635261486
Language: English
File size: 19 MB
Access and clean up data easily using JMP®!
Data acquisition and preparation commonly consume approximately 75% of the effort and time of total data analysis. JMP provides many visual, intuitive, and even innovative data-preparation capabilities that enable you to make the most of your organization's data.
Preparing Data for Analysis with JMP® is organized within a framework of statistical investigations and model-building and illustrates the new data-handling features in JMP, such as the Query Builder. Useful to students and programmers with little or no JMP experience, or those looking to learn the new data-management features and techniques, it uses a practical approach to getting started with plenty of examples. Using step-by-step demonstrations and screenshots, this book walks you through the most commonly used data-management techniques and offers plenty of tips on how to avoid common problems.
With this book, you will learn how to:
Preparing Data for Analysis with JMP
Robert Carver
The correct bibliographic citation for this manual is as follows: Carver, Robert. 2017. Preparing Data for Analysis with JMP. Cary, NC: SAS Institute Inc.
Preparing Data for Analysis with JMP
Copyright 2017, SAS Institute Inc., Cary, NC, USA
ISBN 978-1-62960-418-3 (Hard copy) ISBN 978-1-63526-148-6 (EPUB) ISBN 978-1-63526-149-3 (MOBI) ISBN 978-1-63526-150-9 (PDF)
All Rights Reserved. Produced in the United States of America.
For a hard copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.
For a web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.
The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of copyrighted materials. Your support of others' rights is appreciated.
U.S. Government License Rights; Restricted Rights: The Software and its documentation is commercial computer software developed at private expense and is provided with RESTRICTED RIGHTS to the United States Government. Use, duplication, or disclosure of the Software by the United States Government is subject to the license terms of this Agreement pursuant to, as applicable, FAR 12.212, DFAR 227.7202-1(a), DFAR 227.7202-3(a), and DFAR 227.7202-4, and, to the extent required under U.S. federal law, the minimum restricted rights as set out in FAR 52.227-19 (DEC 2007). If FAR 52.227-19 is applicable, this provision serves as notice under clause (c) thereof and no other notice is required to be affixed to the Software or documentation. The Government's rights in Software and documentation shall be only those set forth in this Agreement.
SAS Institute Inc., SAS Campus Drive, Cary, NC 27513-2414
May 2017
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
SAS software may be provided with certain third-party software, including but not limited to open-source software, which is licensed under its applicable third-party software license agreement. For license information about third-party software distributed with SAS software, refer to http://support.sas.com/thirdpartylicenses.
Contents
About This Book
About The Author
Chapter 1: Data Management in the Analytics Process
Introduction
A Continuous Process
Asking Questions That Data Can Help to Answer
Sourcing Relevant Data
Reproducibility
Combining and Reconciling Multiple Sources
Identifying and Addressing Data Issues
Data Requirements Shaped by Modeling Strategies
Plan of the Book
Conclusion
References
Chapter 2: Data Management Foundations
Introduction
Matching Form to Function
JMP Data Tables
Data Types and Modeling Types
Data Types
Modeling Types
Basics of Relational Databases
Conclusion
References
Chapter 3: Sources of Data and Their Challenges
Introduction
Internal Data in Flat Files
Relational Databases
External Data on the World Wide Web
User-Facing Query Interfaces
Tabular Data Pages
Evolving WWW Data Standards
Ethical and Legal Considerations
Conclusion
References
Chapter 4: Single Files
Introduction
Review of JMP File Types
Common Formats Other than JMP
MS Excel
Text Files
SAS Files
Other Data File Formats
Conclusion
References
Chapter 5: Database Queries
Introduction
Sample Databases in This Chapter
Connecting to a Database
Extracting Data from One Table in a Database
Import an Entire Table
Import a Subset of a Table
Querying a Database from JMP
Query Builder
An Illustrative Scenario: Bicycle Parts
Designing a Query with Query Builder
Query Builder for SAS Server Data
Conclusion
References
Chapter 6: Importing Data from Websites
Introduction
Variety of Web Formats
Internet Open
Common Issues to Anticipate
Conclusion
References
Chapter 7: Reshaping a Data Table
Introduction
What Shape Is a Data Table?
Wide versus Long Format
Reasons for Wide and Long Formats
Stacking Wide Data
Unstacking Narrow Data
Additional Examples
Stacking Wide Data
Scripting for Reproducibility
Splitting Long Data
Transposing Rows and Columns
Reshaping the WDI Data
Conclusion
References
Chapter 8: Joining, Subsetting, and Filtering
Introduction
Combining Data from Multiple Tables with Join
Saving Memory with a Virtual Join
Why and How to Select a Subset
A Brief Detour: Creating a New Column from an Existing Column
Row Filters: Global and Local
Global Filter
Local Filter
A More Durable Subset
Combining Rows with Concatenate
Query Builder for Tables
Back to the Movies
Olympic Medals and Development Indicators
Conclusion
References
Chapter 9: Data Exploration: Visual and Automated Tools to Detect Problems
Introduction
Common Issues to Anticipate
On the Hunt for Dirty Data
Distribution
Columns Viewer
Multivariate (Correlations and Scatterplot Matrix)
More Tools within the Multivariate Platform
Principal Components
Outlier Analysis
Item Reliability
Explore Outliers
Quantile Range Outliers
Robust Fit Outliers
Multivariate Robust Outliers
Multivariate k-Nearest Neighbors Outliers
Explore Missing
Conclusion
References
Chapter 10: Missing Data Strategies
Introduction
Much Ado about Nothing?
Four Basic Approaches
Working with Complete Cases
Analysis with Sampling Weights
Imputation-based Methods
Recode
Informative Missing
Multivariate Normal Imputation
Multivariate SVD Imputation
Special Considerations for Time Series
Conclusion and a Note of Caution
References
Chapter 11: Data Preparation for Analysis
Introduction
Common Issues and Appropriate Strategies
Distribution of Observations
Noisy Data
Skewness or Outliers
Scale Differences among Model Variables
Too Many Levels of a Categorical Variable
High Dimensionality: Abundance of Columns
Correlated or Redundant Variables
Missing or Sparse Observations across Columns
A PCA Example
Abundance of Rows
Partitioning into Training, Validation, and Test Sets
Aggregating Rows with Summary Tables
Oversampling Rare Events
Date and Time-Related Issues
Formatting Dates and Times
Some Date Functions: Extracting Parts
Aggregation
Row Functions Especially Useful in Time-Ordered Data
Elapsed Time and Date Arithmetic
Conclusion
References
Chapter 12: Exporting Work to Other Platforms
Introduction
Why Export or Exchange Data?
Fit the Method to the Purpose
Save As
Export to a Database
Export to a SAS Library
Exporting Reports
Interactive Graphics
Static Images: Graphics Formats, PowerPoint, and Word
Conclusion
References
Index
About This Book
What Does This Book Cover?
In a 2008 interview, Google's chief economist, Hal Varian, remarked:
I keep saying the sexy job in the next ten years will be statisticians. People think I'm joking, but who would've guessed that computer engineers would've been the sexy job of the 1990s? The ability to take data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it – that's going to be a hugely important skill in the next decades.¹
Perhaps the least attractive aspect of the sexy job is the work involved in assembling, reconciling, tidying, cleaning, and otherwise preparing data from various sources before the serious work of processing, extracting value, visualizing, and communicating. Although data preparation typically consumes an enormous share of the time in most projects, it receives comparatively little attention in the data analytics literature.
It is as if data preparation is a dark art or a nasty family secret, widely acknowledged but not spoken about in polite company. This book is all about using the extensive capabilities of JMP to facilitate and regularize the phases of preparing data for analysis.
This book is entirely and exclusively about the stages that precede the actual analysis in a statistical investigation. It covers methods for extracting data from various sources and in different formats and converting them to JMP data tables. Because so many projects call for merging multiple data tables, we see how the powerful JMP Query Builder for Tables facilitates such operations, enabling the analyst to manage data consolidation at scale and at relatively high speed.
As practitioners know all too well, once the data are all in one place, the