Data Science Reference Guide
Data science can be an intimidating world to jump into, so I have listed some helpful resources below for anyone getting started. While data science is a challenge, there are countless resources online, in your local bookstore, or available for purchase that can help someone break into the industry.
Non-Technical Books
Thinking, Fast and Slow by Daniel Kahneman
This international bestseller by noted psychologist and economist Daniel Kahneman takes readers on a fascinating journey through the mind, explaining the two distinct systems that shape how we think and make choices. One system is fast, intuitive, and emotional; the other is slower, more logical, and deliberative. Because every data scientist needs to be both a storyteller and a decision-maker, this book is a perfect read for anyone in the field.
Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O’Neil
Weapons of Math Destruction is all about what goes wrong when people use the tools of data science without a firm conceptual understanding of their roles and responsibilities. Data science is a SCIENCE, and poor procedures, unclear objectives, and individual biases lead to nonviable and often destructive outcomes. After reading this book, you begin to see the unintended weaponization of data everywhere. While it may induce some anxiety to see how poorly implemented the world’s most powerful data systems are, none of us can fix problems we fail to recognize.
Factfulness: Ten Reasons We’re Wrong About the World - and Why Things Are Better Than You Think by Hans Rosling
In the book, Rosling suggests that the vast majority of people are wrong about the state of the world. He demonstrates that his test subjects believe the world is poorer, less healthy, and more dangerous than it actually is, attributing this not to random chance but to misinformation.
The Signal and the Noise by Nate Silver
The Signal and the Noise is probably one of the most popular statistics books around. ‘The signal in the noise’ is a metaphor often used in data science: identifying the relevant information, the ‘signal’, that is correlated with the solution to a given problem from within a ‘noisy’ data set or system. The world is full of distractions, and many of the things that end up affecting our decision making divert our attention away from indicators that are more closely correlated with our objectives.
The Functional Art by Alberto Cairo
Unlike any time before in our lives, we have access to vast amounts of free information. With the right tools, we can start to make sense of all this data to see patterns and trends that would otherwise be invisible to us. By transforming numbers into graphical shapes, we allow readers to understand the stories those numbers hide. In this practical introduction to understanding and using information graphics, you’ll learn how to use data visualizations as tools to see beyond lists of numbers and variables and achieve new insights into
Technical Books
Statistical Rethinking by Richard McElreath
Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron
Applied Predictive Modeling by Max Kuhn & Kjell Johnson
The Visual Display of Quantitative Information by Edward R. Tufte
Blogs & Websites
“Data Science Central” (datasciencecentral.com)
Data Science Central does exactly what its name suggests and acts as an online resource hub for just about everything related to data science and big data. The site covers a wide array of data science topics regarding analytics, technology, tools, data visualization, code, and job opportunities. Industry experts contribute discussion and insights about key topics.
“SmartData Collective” (smartdatacollective.com)
SmartData Collective is a community site focused on trends in business intelligence and data management. Similar to Data Science Central, it also features insights into data science through contributions by industry experts. Where Data Science Central focuses directly on data science as a whole, SmartData Collective looks at the wider field and how data science can intersect with business.
FlowingData (flowingdata.com)
FlowingData is one of the best sources available online, covering everything from book recommendations to tutorials. On FlowingData, Dr. Nathan Yau explores various aspects of data science to help budding data lovers around the world better understand data. Common mistakes in data analysis, how to tackle them, and how to handle the challenges that come up when working with data are all discussed here.
Kaggle (kaggle.com)
For online competitions, Kaggle is the best website, with plenty of ongoing competitions in data science, big data, machine learning, and Hadoop. Compete with the best in the world to see where you stand, and learn something new by competing with data lovers from around the world. Beyond learning, you can also win prizes.
Information is Beautiful (informationisbeautiful.net)
TED Talks
The Best Stats You’ve Ever Seen by Hans Rosling
To most of the world, statistics and data science can be dry and difficult to understand due to the complexity and use of jargon. However, this classic TED Talk by Hans Rosling presents data with the drama and urgency of a sportscaster and breaks down the mythology and common misconceptions about the world as it is through the use of data analytics.
The Beauty of Data Visualization by David McCandless
One of the most renowned data journalists in the world, David McCandless’ love for complex data sets and appealing data visualizations shines through in this TED Talk, as he makes use of data and design to create value by reducing information silos from a wide range of sources. Through this presentation, you’ll find data navigation more interesting than ever.
We’re All Data Scientists by Rebecca Nugent
In this talk, Rebecca Nugent concentrates on how data science has changed education at a fundamental level, empowering students and employees from all backgrounds, including the humanities and social sciences. In fact, according to Nugent, data science is the “science of the people,” as the power of data can be harnessed by everyone regardless of what field they are in.
Subreddits
r/datascience
r/MachineLearning
r/learnmachinelearning
r/datasets
r/dataisbeautiful
r/visualization
r/learnpython
Twitter Accounts
Andrew Ng (@AndrewYNg)
KDnuggets (@kdnuggets)
Lillian Pierson (@Strategy_Gal)
Wes McKinney (@wesmckinn)
Yann LeCun (@ylecun)
Kirk Borne (@KirkDborne)
Movies/Documentaries/TV Shows
Moneyball
The Imitation Game
Westworld
The Social Dilemma
The Human Face of Big Data
Humans Need Not Apply