110
pages
English
Ebooks
2020
Publication date: 24 February 2020
EAN13: 9789388176613
Language: English
Fundamentals of Deep Learning and Computer Vision
A Complete Guide to become an Expert in Deep Learning and Computer Vision
by
Nikhil Singh
Paras Ahuja
FIRST EDITION 2020
Copyright © BPB Publications, India
ISBN: 978-93-88511-858
All Rights Reserved. No part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the publisher, with the exception of the program listings, which may be entered, stored and executed in a computer system but may not be reproduced by means of publication.
LIMITS OF LIABILITY AND DISCLAIMER OF WARRANTY
The information contained in this book is true and correct to the best of the author’s and publisher’s knowledge. The author has made every effort to ensure the accuracy of this publication, but cannot be held responsible for any loss or damage arising from any information in this book.
All trademarks referred to in the book are acknowledged as properties of their respective owners.
Distributors:
BPB PUBLICATIONS
20, Ansari Road, Darya Ganj
New Delhi-110002
Ph: 23254990/23254991
MICRO MEDIA
Shop No. 5, Mahendra Chambers,
150 DN Rd. Next to Capital Cinema,
V.T. (C.S.T.) Station, MUMBAI-400 001
Ph: 22078296/22078297
DECCAN AGENCIES
4-3-329, Bank Street,
Hyderabad-500195
Ph: 24756967/24756400
BPB BOOK CENTRE
376 Old Lajpat Rai Market,
Delhi-110006
Ph: 23861747
Published by Manish Jain for BPB Publications, 20 Ansari Road, Darya Ganj, New Delhi-110002 and Printed by him at Repro India Ltd, Mumbai
About the Authors
Nikhil Singh is an accomplished data scientist and is currently working as the Lead Data Scientist at Proarch IT Solutions Pvt. Ltd, London. He has experience in designing and delivering complex and innovative computer vision and NLP centered solutions for a large number of global companies. He has been an AI consultant to a few companies and mentored many apprentice Data Scientists.
LinkedIn profile: https://www.linkedin.com/in/nikhil-singh-b953ba122/
Paras Ahuja is a seasoned data science practitioner and is currently working as the Lead Data Scientist at Reliance Jio in Hyderabad. He has extensive experience in designing and deploying scalable deep learning-based solutions in the areas of Computer Vision, NLP and Recommendation Systems. He has mentored and coached dozens of data science enthusiasts and beginners.
LinkedIn profile: https://www.linkedin.com/in/parasahuja/
Acknowledgement
Firstly, we would like to thank The Almighty for giving us the courage and capability to write this book, and then our parents; none of this would have been possible without their support. We are thankful to BPB Publications for giving us this opportunity to publish our book.
We are also grateful to all our colleagues at our respective organizations.
Lastly, we thank our critics for their valuable inputs, which helped give final shape to this book.
Preface
In recent years, commendable progress has been made in the field of Computer Vision, i.e. how machines see, process, analyze and interpret images, driven by advancements in deep learning, a technique that makes it possible for computers to learn by example. This has opened up new opportunities, and Computer Vision is now used for purposes ranging from medical imaging to driverless cars to Snapchat's filters. This book discusses the fundamental concepts of Computer Vision and Deep Learning that form the basis of all such applications. The book is divided into 5 chapters and provides a lucid and intuitive explanation of the core concepts of computer vision and deep learning.
Chapter 1 introduces the deep learning framework TensorFlow and discusses its fundamental concepts.
Chapter 2 discusses the core concept of Deep Learning, neural networks, along with related concepts such as loss functions, gradient descent optimization, activation functions and how backpropagation works for training multi-layer perceptrons.
Chapter 3 introduces the convolution operation before moving on to the convolutional neural networks and thereafter describes different building blocks of the CNN architecture such as kernel size, stride, padding, and pooling and how to build a small CNN model.
Chapter 4 discusses different popular CNN architectures such as AlexNet, VGGNet, Inception, and ResNets along with different object detection algorithms such as RCNN, SSD, and YOLO.
Chapter 5 discusses sequential models, such as RNN, GRU, and LSTMs, their architectures and their applications in machine translation, image/video captioning and video classification.
Errata
We take immense pride in our work at BPB Publications and follow best practices to ensure the accuracy of our content and to provide an engaging reading experience to our subscribers. Our readers are our mirrors, and we use their inputs to reflect on and correct any human errors that occurred during the publishing process. To help us maintain quality and reach out to any readers who might be having difficulties due to unforeseen errors, please write to us at:
errata@bpbonline.com
Your support, suggestions and feedback are highly appreciated by the BPB Publications family.
Table of Contents
1. Introduction to TensorFlow
Structure
Objective
Machine learning and deep learning
What is TensorFlow?
TensorFlow installation
Virtual environment
Dataflow graphs
Tensors
Graph
Dataflow graph in TensorFlow
TensorFlow operation
Static shape
Dynamic shape
Session in TensorFlow
Create TensorFlow graph
Fetches
Feedings in TensorFlow
Placeholders
Variables
Assign value
Name scopes
Distributed computing in TensorFlow
Conclusion
2. Introduction to Neural Networks
Structure
Objective
Introduction to ANN
Feedforward Neural Network
XOR Function Using a Linear Model
Learning Based on Gradients
Cost/Loss Functions
Least Square Function
Cross-Entropy Function
Softmax Function
Optimization
Gradient Descent
Stochastic Gradient Descent
Activation Function
The Sigmoid Function
The Tanh Function
ReLU
Leaky ReLU
Backpropagation
Overfitting and Underfitting
Conclusion
3. Convolutional Neural Network
Structure
Objectives
Introduction to CNN
Convolution operation
Why CNN?
Spatial relation among pixels
Convolution in 2D
Translation equivariance
Stride
Padding
Convolution in 3D image
Notations used in CNN
Pooling layer
Architecture of CNN
Conclusion
4. CNN Architectures
Structure
Objective
AlexNet
VGG Net
GoogLeNet/Inception network
ResNets
Deep residual learning framework
Fast R-CNN
Faster R-CNN
Single Shot Detector (SSD)
You Only Look Once (YOLO)
Direct location prediction
Dimension cluster
Feature upsampling and concatenation
Conclusion
5. Sequential Models
Structure
Objective
RNN
Why RNN?
Forward pass
Variants of RNN
Bidirectional RNNs
Motivation for RNN
Language modeling
Machine translation
Image Captioning
Backward pass
Vanishing gradient problem
Long Short-Term Memory (LSTM)
Architecture of LSTM
Variants on LSTM
Application of RNNs in Image and Video Analytics
Object recognition and captioning in video
Video description
Video classification
Conclusion
Bibliography
CHAPTER 1
Introduction to TensorFlow
We live in the information age or, more precisely, the digital age. Technology has been advancing by leaps and bounds over the past few years, and this has led to the creation of various smart devices. Smart devices such as smartphones, vehicles, smartwatches, household appliances and other Internet of Things (IoT) devices are becoming ubiquitous, and they communicate with databases maintained in the cloud. These communications create a great deal of data, which is stored in huge databases, and the volume of data on the Internet grows with every passing second. Around 2.5 quintillion bytes of data are created each day at the current pace. Images and videos are the major contributors to this huge data source. With the development of the cloud and flexible storage capacity, developers are opting for a "the more the merrier" approach and actively working to gather more data, which helps them enhance their technology.
With the proliferation of IoT devices and the advent of social media, a huge amount of multimedia data is being generated, and most of it is unstructured and multimodal.
Working with this multimedia data requires substantial computation, which has created huge opportunities in storage, processing, and analysis.
Structure
In this chapter we will be covering:
Defining tensors
Basic operations using TensorFlow
Session logging and variables
TensorBoard
Objective
Learn basic manipulations such as assigning variables, multiplying and transposing matrices, and resizing vectors and matrices using TensorFlow.
Machine learning and deep learning
It is apt to say that computer vision sits at the intersection of computation, storage and the future of deep learning research. Some important applications of computer vision include the following:
Self-driving transportation
Fraud detection
Security systems
Public administration
Content analysis, management, and retrieval
Using this growing volume of data in a meaningful manner requires computationally efficient techniques. But the growth in CPU speed has not kept pace with the rate of data creation, which has led to the development of many parallel processing architectures. Lately, GPUs, which were primarily used for computer games, have increasingly been used for general computation to overcome this bottleneck, and this has helped immensely in the rise of the machine learning field.
Machine learning is a technique that uses statistical and mathematical models to extract so