263
pages
English
Ebooks
2020
Publication date
03 September 2020
EAN13
9789389845655
Language
English
Hands-on Data Analysis and Visualization with Pandas
Engineer, Analyse and Visualize Data, Using Powerful Python Libraries
Purna Chander Rao. Kathula
www.bpbonline.com
FIRST EDITION 2020
Copyright © BPB Publications, India
ISBN: 978-93-89845-648
All Rights Reserved. No part of this publication may be reproduced or distributed in any form or by any means or stored in a database or retrieval system, without the prior written permission of the publisher, with the exception of the program listings, which may be entered, stored and executed in a computer system, but cannot be reproduced by means of publication.
LIMITS OF LIABILITY AND DISCLAIMER OF WARRANTY
The information contained in this book is true and correct to the best of the author’s and publisher’s knowledge. The author has made every effort to ensure the accuracy of this publication, but cannot be held responsible for any loss or damage arising from any information in this book.
All trademarks referred to in the book are acknowledged as properties of their respective owners but BPB Publications cannot guarantee the accuracy of this information.
Distributors:
BPB PUBLICATIONS
20, Ansari Road, Darya Ganj
New Delhi-110002
Ph: 23254990/23254991
MICRO MEDIA
Shop No. 5, Mahendra Chambers,
150 DN Rd. Next to Capital Cinema,
V.T. (C.S.T.) Station, MUMBAI-400 001
Ph: 22078296/22078297
DECCAN AGENCIES
4-3-329, Bank Street,
Hyderabad-500195
Ph: 24756967/24756400
BPB BOOK CENTRE
376 Old Lajpat Rai Market,
Delhi-110006
Ph: 23861747
Published by Manish Jain for BPB Publications, 20 Ansari Road, Darya Ganj, New Delhi-110002 and Printed by him at Repro India Ltd, Mumbai
www.bpbonline.com
Dedicated to
My Guru, Kalyan Ram Kuppachi, Vice President, Engineering (Pramati Technologies Pvt Ltd)
About the Author
Purna Chander is currently working as a Data Architect with Pramati Technologies Pvt Ltd, Hyderabad. He has around 17 years of experience across a wide variety of domains, including insurance, mobility, HRMS, storage, database, search engines, ad-tech, gaming, big data, and analytics. He holds a Bachelor’s degree (B.Tech) in Mechanical Engineering from College of Engineering G.I.T.A.M.
He is a data science enthusiast and a seasoned software programmer in a wide array of programming languages, including Perl, C, C++, Java, and Python. He is Coursera-certified in the Applied Data Science with Python Specialization from the University of Michigan. He is a frequent speaker at data science and data engineering user groups, and he regularly delivers webinars and conducts training on Hadoop, big data, data analysis, and visualization.
About the Reviewer
Sampath Kumar Maddula is a passionate software engineer who enjoys building analytical and data science-centric products, with strong coding skills in Python and SQL. He has 8 years of experience in building analytical dashboards, ETL and data engineering, data pipelines, and scalable cloud data processing frameworks. He has worked for MNC clients such as Standard Chartered Bank and Hyperion Insurance Group, and for fast-moving startups such as Castlight Health, Sema4Genomics, and Clara Analytics. In the past three years, he has earned certifications in Apache Spark with Scala, Machine Learning & AI Foundation, Python Programming Efficiently, Python Design Patterns, and Hadoop - Spark Starter Kit, 2017.
In addition, Sampath is well-versed in technologies related to distributed design architectures, database design, machine learning, and data science. He holds a Master’s degree in Information Systems from Birla Institute of Technology and Science, Pilani, and is currently working as a Principal Engineer at Pramati Technologies, Hyderabad.
Acknowledgement
First and foremost, I would like to thank God for giving me the courage to write this book. A warm thanks to all the members of the BPB Publications team for giving me this opportunity to publish my book.
I would like to thank my family for their support, and for helping me in numerous ways. Writing this book was not an easy task. I would also like to thank all my friends for their useful discussions and suggestions, and for providing moral support when needed.
Lastly, I would like to thank my critics. Without their criticism, I would never have been able to write this book.
Preface
Python is a multi-paradigm programming language. It supports object-oriented, procedural, functional, and imperative programming and has a large and comprehensive standard library. Python is open source, simple to learn, and runs on all major operating systems, including Windows, Linux, and macOS. It is used in many domains, such as web and internet development, the Internet of Things, desktop GUIs, gaming, DevOps, big data, web testing and automation, AI and data science, and much more.
This book focuses on a subset of data science: data analysis and visualization. Data analysis is the core area where data scientists spend most of their time, cleaning and organizing data. The main aim of the book is to teach the use of Python’s data science libraries. It guides you through basic and advanced Python concepts, such as list comprehensions, lambdas, and functional programming, that help in data manipulation, and it contains many examples and real-world datasets that help you understand the concepts better. The book is divided into 11 chapters and provides a detailed understanding of Python data science tools and libraries such as JupyterLab, NumPy, Pandas, SciPy, Matplotlib, and Seaborn, which help in cleaning and reorganizing data, data analysis, and visualization.
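As a small taste of the Python idioms named above, the following sketch cleans a handful of values with a list comprehension, a lambda, and the built-in `map` and `filter` functions. The data and the variable names (`raw_ages`, `clean_ages`, and so on) are made up for illustration and are not taken from the book's datasets.

```python
# Hypothetical raw values, as they might arrive from a messy CSV column.
raw_ages = ["22", " 38", "", "26", "n/a", "35 "]

# List comprehension: strip whitespace and keep only purely numeric entries.
clean_ages = [int(value.strip()) for value in raw_ages if value.strip().isdigit()]

# Lambda as a sort key: order the ages from oldest to youngest.
oldest_first = sorted(clean_ages, key=lambda age: -age)

# Functional style: map() transforms each item, filter() keeps the matches.
ages_next_year = list(map(lambda age: age + 1, clean_ages))
over_thirty = list(filter(lambda age: age > 30, clean_ages))

print(clean_ages)      # [22, 38, 26, 35]
print(oldest_first)    # [38, 35, 26, 22]
print(ages_next_year)  # [23, 39, 27, 36]
print(over_thirty)     # [38, 35]
```

The same filter-then-transform pattern carries over to the array and dataframe operations covered in the later chapters.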
Chapter 1 introduces the core concepts of data science, machine learning, artificial intelligence, and the different processes involved in data analysis. It also describes why Python is used as an essential tool in the data science domain, the core libraries used for data analysis, and the installation process.
Chapter 2 covers JupyterLab, the fundamental tool used as an IDE to create and share documents, from Python code to full-blown reports. This chapter covers the architecture of JupyterLab and its components, such as cells and cell modes for writing code and documenting it in Markdown, along with keyboard shortcuts and toolbars.
Chapter 3 gives an overview of Python, covering the basic data types and their methods. This chapter also covers functions, lambdas, list comprehensions, functional programming, and datetime objects, which are used in data analysis and visualization.
Chapter 4 briefly explains the NumPy library and its use for numerical computation. It compares the internal storage, type checking, and execution speed of Python lists and NumPy arrays. This chapter also covers slicing and dicing of NumPy arrays, statistical operations, fancy indexing, and broadcasting.
Chapter 5 gives a basic introduction to pandas. It covers the pandas data structures, Series and DataFrame, and their methods and attributes. We also run through a real-world sample dataset to understand the concepts better.
Chapter 6 covers handling different file formats, header manipulation, filtering data by rows, columns, and indexes, groupby operations and aggregations on groupby objects, concatenating and merging dataframes, filling missing data, pivot tables, crosstabs, and handling large datasets using various methodologies.
Chapter 7 covers creating date ranges using parameters such as start and end dates, periods, and frequencies (yearly, monthly, hourly, and by the second), converting string and Unix-based dates to datetime objects, time-series analysis on a real-world dataset to find insights about the data, and handling different time zones and holidays.
Chapter 8 gives a brief introduction to statistics. This chapter covers population, sample, measures of central tendency (mean, median, and mode), descriptive and inferential statistics, standardization, the central limit theorem, confidence intervals, and hypothesis testing, along with practical examples for each topic.
Chapter 9 explains the concept of data visualization using matplotlib. This chapter gives a brief introduction to the matplotlib architecture (the backend, artist, and scripting layers) along with examples, and the parameters and methods that control the plots. It also covers the built-in plot types: scatter plots, bar plots, line charts, pie charts, histograms, and subplots.
Chapter 10 introduces the concept of data visualization using seaborn. This chapter covers statistical visualization using a real-world Pokemon dataset. It also covers visualizing statistical relationships between variables, plotting categorical data, and visualizing univariate and bivariate distributions.
Chapter 11 combines all the previous chapters in what is called exploratory data analysis. To make the material easier to follow, we use a real-world dataset (Titanic). This chapter covers analyzing the dataset, filling blank or null values, variable identification, visualizing the information using univariate and multivariate analysis, handling outliers, finding insights about the data using visualization techniques such as scatter plots, bar plots, line plots, and heatmaps, and making inferences about the data.
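Two of the Chapter 11 steps, filling null values and handling outliers, can be sketched in a few lines. The book works with pandas on the Titanic dataset; the example below is only an illustrative sketch that uses the standard library's `statistics` module and a made-up `fares` list, with the widely used 1.5 * IQR rule assumed as the outlier criterion rather than taken from the chapter.

```python
from statistics import median, quantiles

# Hypothetical fare column with missing entries (None) and one extreme value.
fares = [7.25, 71.28, None, 8.05, 53.10, None, 8.46, 512.33, 13.00, 26.55]

# Fill missing values with the median of the observed values.
observed = [fare for fare in fares if fare is not None]
fill_value = median(observed)
filled = [fill_value if fare is None else fare for fare in fares]

# Flag outliers with the 1.5 * IQR rule (an assumed, commonly used criterion).
q1, _, q3 = quantiles(filled, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [fare for fare in filled if fare < low or fare > high]

print("gaps filled:", fares.count(None))
print("outliers:", outliers)
```

The median is used for filling because, unlike the mean, it is not pulled upward by the single extreme fare.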
Downloading the code bundle and coloured images:
Please follow the link to download the Code Bundle and the Coloured Images of the book:
https://rebrand.ly/5d390
Errata
We take immense pride in our work at BPB Publications and follow best practices to ensure the accuracy of our content and to provide an engaging reading experience to our subscribers. Our readers are our mirrors, and we use their inputs to reflect and improve upon human errors, if any, that occurred during the publishing proc