An Introduction to Data Science

Jeffrey Stanton, Syracuse University

© 2012, Jeffrey Stanton

This book is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license. You are free to copy, distribute, and transmit this work. You are free to add or adapt the work. You must attribute the work to the author(s) listed above. You may not use this work or derivative works for commercial purposes. If you alter, transform, or build upon this work you may distribute the resulting work only under the same or similar license. For additional details, please see:

This book was developed for the Certificate of Data Science program at Syracuse University’s School of Information Studies. If you find errors or omissions, please contact the author, Jeffrey Stanton, at

A PDF version of this book and code examples used in the book are available at:

Data Science: Many Skills

Data Science refers to an emerging area of work concerned with the collection, preparation, analysis, visualization, management, and preservation of large collections of information. Although the name Data Science seems to connect most strongly with areas such as databases and computer science, many different kinds of skills - including non-mathematical skills - are needed.

SECTION 1

Data Science: Many Skills

Overview

1.