Data Science Reference Guide

Alex Zieky
6 min read · Nov 30, 2020

Data Science can be an intimidating world to jump into. That is why I have listed some helpful resources below to help someone get started. While Data Science is a challenge, there are countless resources around the internet, in your local bookstores, or available for purchase that can help someone break into the industry.

Non-Technical Books

Thinking, Fast and Slow by Daniel Kahneman

This international bestseller by noted economist and psychologist Daniel Kahneman takes readers on a fascinating journey through the mind and goes on to explain two distinct systems that affect the way we think and make choices. Of these two systems, one is fast, intuitive, and emotional, while the other is slower, more logical, and deliberative. As every data scientist has to have the qualities of a storyteller as well as a decision-maker, this book is a perfect read for them.

Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O’Neil

Weapons of Math Destruction is all about what goes wrong when people use the tools of data science without a firm conceptual understanding of their roles and responsibilities. Data science is a SCIENCE, and poor procedures, unclear objectives, and individual biases lead to unviable and often destructive outcomes. After reading this book, you begin to see the unintended weaponization of data everywhere. While it may induce some anxiety to see how poorly implemented the world’s most powerful data systems are, none of us can fix the problems we fail to recognize.

Factfulness: Ten Reasons We’re Wrong About the World – and Why Things Are Better Than You Think by Hans Rosling

In the book, Rosling suggests the vast majority of human beings are wrong about the state of the world. He demonstrates that his test subjects believe the world is poorer, less healthy, and more dangerous than it actually is, attributing this not to random chance but to misinformation.

The Signal and the Noise by Nate Silver

The Signal and the Noise is probably one of the most popular statistics books around. ‘The signal in the noise’ is a metaphor that is often used in data science: identifying the relevant information, the ‘signal’, that is correlated with the solution of a given problem from within a ‘noisy’ data set or system. The world is full of distractions, and many of the things that end up affecting our decision making divert our attention away from indicators that are more closely correlated with our objectives.

The Functional Art by Alberto Cairo

Unlike any time before in our lives, we have access to vast amounts of free information. With the right tools, we can start to make sense of all this data to see patterns and trends that would otherwise be invisible to us. By transforming numbers into graphical shapes, we allow readers to understand the stories those numbers hide. In this practical introduction to understanding and using information graphics, you’ll learn how to use data visualizations as tools to see beyond lists of numbers and variables and achieve new insights.

Technical Books

Statistical Rethinking by Richard McElreath

Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron

Applied Predictive Modeling by Max Kuhn & Kjell Johnson

The Visual Display of Quantitative Information by Edward R. Tufte

Blogs & Websites

“Data Science Central” (datasciencecentral.com)

Data Science Central does exactly what its name suggests and acts as an online resource hub for just about everything related to data science and big data. The site covers a wide array of data science topics regarding analytics, technology, tools, data visualization, code, and job opportunities. Industry experts contribute discussion and insights about key topics.

“SmartData Collective” (smartdatacollective.com)

SmartData Collective is a community site focused on trends in business intelligence and data management. Similar to Data Science Central, it also features insights into data science through contributions by industry experts. Where Data Science Central focuses directly on data science as a whole, SmartData Collective looks at the wider field and how data science can intersect with business.

FlowingData (flowingdata.com)

FlowingData is one of the best sources available online, offering everything from book recommendations to online tutorials. On FlowingData, Nathan Yau, PhD, explores various aspects of data science to help budding data lovers around the world gain a better understanding of data. The site discusses the mistakes often made in data analysis, how to tackle them, and how to handle the challenges that come up when working with data.

Kaggle (kaggle.com)

For online competitions, Kaggle is the go-to website, with plenty of ongoing competitions in data science, big data, machine learning, and Hadoop. Compete with the best in the world to see where you stand, and learn something new by competing with data lovers around the world. Beyond learning, you can also win prizes.

Information is Beautiful (informationisbeautiful.net)

TED Talks

The Best Stats You’ve Ever Seen by Hans Rosling

To most of the world, statistics and data science can be dry and difficult to understand due to the complexity and use of jargon. However, this classic TED Talk by Hans Rosling presents data with the drama and urgency of a sportscaster and breaks down the mythology and common misconceptions about the world as it is through the use of data analytics.

The Beauty of Data Visualization by David McCandless

David McCandless is one of the most renowned data journalists in the world, and his love for complex data sets and appealing data visualizations shines through in this TED Talk, as he makes use of data and design to create value by reducing information silos from a wide range of sources. Through this presentation, you’ll find data navigation more interesting than ever.

We’re All Data Scientists by Rebecca Nugent

In this talk, Rebecca Nugent concentrates on how data science has changed education at a fundamental level, empowering students and employees from all backgrounds, including the humanities and social sciences. In fact, according to Nugent, data science is the “science of the people,” as the power of data can be harnessed by everyone regardless of what field they are in.

Subreddits

r/datascience

r/MachineLearning

r/learnmachinelearning

r/datasets

r/dataisbeautiful

r/visualization

r/learnpython

Twitter Accounts

Andrew Ng (@AndrewYNg)

KDnuggets (@kdnuggets)

Lillian Pierson (@Strategy_Gal)

Wes McKinney (@wesmckinn)

Yann LeCun (@ylecun)

Kirk Borne (@KirkDborne)

Movies/Documentaries/TV Shows

Moneyball

The Imitation Game

Westworld

The Social Dilemma

The Human Face of Big Data

Humans Need Not Apply

Alex Zieky

Financial professional with experience in data acquisition, data modeling, statistical analysis, machine learning, deep learning, and NLP.