I have a background in Transportation Engineering (a Civil Engineering domain), so I have to deal with lots of data. The process involves cleaning, processing and analysing data, and then drawing inferences from it. Conventional software such as Excel often becomes a hurdle when dealing with a large volume of data. To cope with this problem, I started learning programming languages.
I still remember that, initially, I had no idea where to begin. So I started searching for programming and data science blogs and came across two programming languages, Python and R, which are both open-source and free. The next question that crossed my mind was whether to start with Python or R, which remains the ultimate paradox of choice for many beginners. I picked R. I know it is a little odd, but I started with the harder one and later learned Python too. I now have practical knowledge of both R and Python as data science tools for data cleaning, manipulation, visualisation, statistical computing and machine learning.
I believe most of us will agree that it is often hard to identify the first stepping stone, especially when dreaming of starting a data science journey. Books are one of the best sources of knowledge, especially when you want to learn on your own. Still, it is often difficult to select a book that meets your requirements and provides proper guidance. There are plenty of books available online on Amazon or Flipkart (an Indian eCommerce website), but the big question is which one you should pick to begin with (the paradox of choice).
A lot of people ask me how to start their journey into data science. After all, Harvard Business Review called data scientist “the sexiest job of the 21st century”. Besides online learning, I recommend working through books. To clear the clutter, in this blog I will guide you through some books that will help you build the insight to start your programming and data science journey, or strengthen your existing knowledge base.
The book recommendations can be divided into three broad categories:
- Programming Books for Beginners
- Books for Data Science Learning (manipulation and plotting)
- Books for Machine Learning (predictive modelling)
Programming Books for Beginners
First of all, to dive into data science or machine learning, one needs to have a data-crunching tool, a programming language such as R or Python. Here, I have listed a few books that guide you through and give you a concrete understanding of R or Python.
Automate the Boring Stuff with Python (Al Sweigart)
“Automate the Boring Stuff with Python” is one of the most popular books written with beginners in mind, especially those who want to start their journey with the popular general-purpose programming language Python. The author introduces a wide variety of practical coding examples that help you grasp the concepts with minimal effort.
This book will help you grasp Python’s basic concepts, such as data types, functions, lists, dictionaries, tuples, strings, regular expressions, reading and writing different file types, and many more. Additionally, it gives you an initial overview of automation through small example projects.
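To give a flavour of how a few of these basics fit together, here is a minimal sketch that combines a list, a dictionary, a function and a regular expression. It is not taken from the book; the log lines, the station names and the helper `total_per_station` are made up purely for illustration.

```python
import re

# Hypothetical sample data: a few lines of a traffic count log.
log_lines = [
    "2021-03-01 station=A12 vehicles=340",
    "2021-03-01 station=B07 vehicles=125",
    "2021-03-02 station=A12 vehicles=410",
]

# Regular expression that captures the station id and the vehicle count.
pattern = re.compile(r"station=(\w+) vehicles=(\d+)")


def total_per_station(lines):
    """Sum the vehicle counts for each station using a dictionary."""
    totals = {}
    for line in lines:
        match = pattern.search(line)
        if match is None:
            continue  # skip lines that do not look like a count record
        station = match.group(1)
        count = int(match.group(2))
        totals[station] = totals.get(station, 0) + count
    return totals


totals = total_per_station(log_lines)
print(totals)  # {'A12': 750, 'B07': 125}
```

Small scripts like this are the kind of "boring stuff" that quickly becomes painful to do by hand in a spreadsheet.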
Hands-On Programming with R (Garrett Grolemund)
If you are beginning with the R programming language, you can start your journey with “Hands-On Programming with R”, written by Garrett Grolemund with a foreword by Hadley Wickham. Both are popular authors of books and R packages, known for their contributions to the R community.
This book gives you hands-on, practical experience of the R language and guides you through R objects (atomic vectors, doubles, integers, characters, logicals and complex), functions, loops, S3 objects, packages, value imputation, and missing values, with plenty of vivid examples.
Learning R (Richard Cotton)
“Learning R” is a beautiful introductory book. It helps you get familiar with almost every programming concept related to the R programming language. The book begins by introducing the language as a scientific calculator.
It then gives an overview of critical concepts, such as variables and environments, vectors, matrices and arrays, lists, data frames, loops, functions, strings, factors, and packages. It even introduces you to the basics of data visualization and distributions. Additionally, it makes you familiar with advanced programming concepts and package development.
Books for Data Science Learning
After learning the basics of a programming language, you are ready to dive into the pool of data science.
Python for Data Analysis (Wes McKinney)
If you are a Python enthusiast, then “Python for Data Analysis” by Wes McKinney (the creator of pandas) is a great book to start your data science journey. In data science, data collection and wrangling is one of the vital parts of the data workflow and is often said to take around 80% of the time. This book teaches you how to efficiently use popular libraries such as NumPy and pandas for data manipulation and summarization, with a large number of examples.
In short, this book gives you an overview of built-in data structures, arrays, and data frame manipulation strategies (loading, writing, cleaning, joining, combining, and reshaping). Additionally, it covers plotting and aggregation, and makes you familiar with time series and advanced pandas.
R for Data Science (Hadley Wickham & Garrett Grolemund)
If you are an R enthusiast, then “R for Data Science” by Hadley Wickham & Garrett Grolemund (well-known writers and package contributors) is a great book to start your data science journey.
This book introduces you to the concept of “tidy data” and walks you through the “tidyverse”, a popular collection of R packages that makes your data science fast, fluent and fun. Additionally, it helps you explore and visualise (ggplot2), wrangle (dplyr and tidyr), model (modelr, purrr and broom) and communicate data reproducibly (rmarkdown).
Statistical Inference via Data Science (Chester Ismay and Albert Y. Kim)
In data science, statistics often plays a vital role. As a data scientist, you often have to draw inferences from your data, and knowing statistics will help you make correct ones. The book “Statistical Inference via Data Science” will help you understand the inference process using data science tools that are widely used in industry and academia.
Further, it reintroduces the “tidyverse” and reinforces your understanding of it. After providing a solid background in data science tools, it introduces you to the world of traditional statistics: confidence intervals, hypothesis testing, and regression, with visual representations.
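To make the idea of a confidence interval a little more concrete, here is a minimal sketch of a percentile bootstrap interval for a mean. The book itself works in R; this Python version only illustrates the general idea. The function `bootstrap_ci`, its default parameters and the sample speeds are assumptions made up for this example.

```python
import random
import statistics


def bootstrap_ci(sample, n_resamples=2000, level=0.95, seed=42):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.mean(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    # Cut off an equal share of resampled means in each tail.
    tail = (1 - level) / 2
    lower = means[int(tail * n_resamples)]
    upper = means[int((1 - tail) * n_resamples) - 1]
    return lower, upper


# Hypothetical spot speeds (km/h) observed at one location.
speeds = [42.1, 38.5, 45.0, 40.2, 39.8, 44.3, 41.7, 37.9, 43.6, 40.9]

lower, upper = bootstrap_ci(speeds)
print(f"mean = {statistics.mean(speeds):.2f}")
print(f"95% bootstrap interval: ({lower:.2f}, {upper:.2f})")
```

The interval gives a range of plausible values for the average speed, instead of a single number taken from one small sample.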
Books for Machine Learning (predictive modelling)
Machine learning is the process by which computers learn a representation of the data without being explicitly programmed. A machine learning algorithm is different from a conventional rule-based system. It is presented with numerous examples, and it finds structure in these examples that eventually allows it to come up with its own set of rules for automating the task (see illustration below). There are many books available that will introduce you to the world of machine learning.
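The difference between a hand-written rule and a learned one can be sketched with a toy example. In the snippet below, the first function hard-codes a speed threshold, while `learn_threshold` picks the threshold that makes the fewest mistakes on labelled examples. The function names, the 30 km/h rule and the data are hypothetical, and a single threshold is of course far simpler than the algorithms covered in the books that follow.

```python
def rule_based_is_congested(speed_kmh):
    """Conventional approach: a human writes the rule (threshold) by hand."""
    return speed_kmh < 30


def learn_threshold(examples):
    """Learning approach: choose the threshold with the fewest errors.

    `examples` is a list of (speed_kmh, is_congested) pairs.
    """
    speeds = sorted(speed for speed, _ in examples)
    # Candidate thresholds: midpoints between neighbouring observed speeds.
    candidates = [(a + b) / 2 for a, b in zip(speeds, speeds[1:])]

    def errors(threshold):
        # Count examples where the prediction disagrees with the label.
        return sum((speed < threshold) != label for speed, label in examples)

    return min(candidates, key=errors)


# Hypothetical labelled examples: (speed in km/h, congested?).
examples = [
    (12, True), (18, True), (25, True),
    (33, False), (47, False), (60, False),
]

threshold = learn_threshold(examples)


def learned_is_congested(speed_kmh):
    """Rule derived from the examples rather than written by hand."""
    return speed_kmh < threshold


print(threshold)                 # 29.0
print(learned_is_congested(20))  # True
```

If the examples change, the learned rule changes with them, whereas the hand-written rule stays the same until someone edits the code.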
Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow (Aurélien Géron)
The book “Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow” will introduce you to the world of intelligent machines and systems. The author provides an intuitive overview of the concepts, with minimal theory and more practical, production-ready examples.
You will be introduced to a wide variety of ML algorithms, such as linear regression, support vector machines, decision trees and ensemble methods, and even to deep learning, using popular ML libraries such as scikit-learn, Keras, and TensorFlow.
Machine Learning with R (Brett Lantz)
The book “Machine Learning with R” begins by introducing the history of machine learning and describes how a machine learns a representation of the data. After an overview of ML concepts, it guides you through various ML algorithms (supervised and unsupervised), their implementation strategies, advantages and drawbacks.
The book introduces algorithms such as lazy learning (classification using nearest neighbors), probabilistic learning (classification using Naive Bayes), divide and conquer (decision trees and rules), forecasting (regression methods), black box methods (neural networks and support vector machines) and ensembles (random forests). Additionally, it makes you familiar with model evaluation and bias avoidance strategies. Finally, it introduces you to emerging big data technologies, such as Spark (a big data analytics tool), H2O (a popular ML library) and TensorFlow (a popular deep learning library).
Deep Learning with Python (François Chollet) and Deep Learning with R (François Chollet with J. J. Allaire)
Love Python or R and want to learn the concepts of deep learning? Then “Deep Learning with Python/R” is a great book to refer to. It is written by François Chollet, the creator of “Keras”, one of the most popular deep learning frameworks. Step by step, this book guides you from the concept of data representation to deep learning implementation using Keras.
Currently, deep learning is being used to solve a variety of problems, such as image recognition, object detection, text classification, speech recognition (natural language processing), sequence prediction, neural style transfer, text generation, image reconstruction, and many more.
It is the technology behind self-driving cars, the speech recognition used in Siri, Alexa or Google, photo tagging on Facebook, song recommendation on Spotify, and product recommendation engines. Researchers are now even using deep learning to understand complex patterns in data, for example in detecting glaucoma in diabetes patients, disaster management (earthquake and flood predictions), fake news detection, robotics, and biomechanics. To better understand the practical applications of deep learning, I recommend watching the YouTube series “The Age of A.I.”.
Always learn something new
I believe that you should never stop learning, especially when technology is constantly changing. There is always room for learning as well as improvement. Learning bit by bit over a long time produces extraordinary cumulative results. I tend to read one book every two months. Feel free to post your own recommendations, and share this post with your friends when they start asking you the same question.