Intro to Data Visualization

Alex Zieky
Oct 30, 2020 · 3 min read

When working with massive amounts of data, it is nearly impossible to get an understanding of the data by simply looking at the raw values. That is why, as a data scientist, it is important to be able to visualize the data properly. There are many ways to visualize data effectively in Python; in this article I will summarize a few of them.

Matplotlib

First, and probably the easiest, way for us to visualize data is by using the Python library Matplotlib. When beginning with Matplotlib, it is important to understand its basic anatomy.

A typical Matplotlib visualization is structured as follows:

Figure: The outermost container

Axes: The real plots. Each “Figure” can contain one or more “Axes”

Axis: The x- and y-axis number lines, including the axis labels and the major & minor ticks.

Coding a visualization in Python

Step 1. Create the Figure

import matplotlib.pyplot as plt

fig = plt.figure()

Step 2. Add the axes, where all quantities are in fractions of figure width and height

axes = fig.add_axes([left, bottom, width, height], projection="Your Choice")

Step 3. Plot y versus x as lines and/or markers

axes.plot(x, y)

Step 4. Set the label for the x-axis

axes.set_xlabel("X Label")

Step 5. Set the label for the y-axis

axes.set_ylabel("Y Label")

Step 6. Set the title of the current axes

axes.set_title("Title")
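
Putting the six steps together, here is a minimal end-to-end sketch. The data and the axes rectangle are placeholder values of my own choosing, not part of the walkthrough above.

import numpy as np
import matplotlib.pyplot as plt

# Placeholder data for illustration
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Step 1: create the Figure
fig = plt.figure()

# Step 2: add the Axes; [left, bottom, width, height] are fractions of the figure size
axes = fig.add_axes([0.1, 0.1, 0.8, 0.8])

# Step 3: plot y versus x as a line
axes.plot(x, y)

# Steps 4-6: label the axes and set the title
axes.set_xlabel("X Label")
axes.set_ylabel("Y Label")
axes.set_title("Title")

plt.show()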

This is just a basic demonstration of how to code a Matplotlib visualization; there are many more intricacies that can go into the process. There are many advantages to using Matplotlib. It is the most widely used data visualization library in Python, and because it was the first Python data visualization library, many other libraries are built on top of it or designed to work in tandem with it during analysis. That being said, there are also some cons to working with the library, the most notable being that while its visualizations give a good sense of the data, it is not very easy to produce publication-quality charts quickly.

Seaborn

Seaborn is another Python library; it builds on the power of Matplotlib to create more appealing charts. Where Matplotlib lacks in visual aesthetics, Seaborn excels. That being said, Seaborn is built on top of Matplotlib, so you'll need to know Matplotlib to work with Seaborn.
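
As a quick illustration, here is a minimal sketch using Seaborn's bundled "tips" example dataset; the dataset and column names are just convenient assumptions for demonstration.

import seaborn as sns
import matplotlib.pyplot as plt

# Load one of Seaborn's bundled example datasets
tips = sns.load_dataset("tips")

# One call produces a styled scatter plot with an automatic legend
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time")

plt.title("Tip vs. Total Bill")
plt.show()

Notice that because Seaborn sits on top of Matplotlib, the figure is still finished off with plain Matplotlib calls such as plt.title and plt.show.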

Plotly

Plotly is another library that can be accessed from Python. It is somewhat more complex than Matplotlib and Seaborn, but it offers unique functionality such as contour plots, dendrograms, and 3D charts, alongside the standard visualizations: scatter plots, line charts, bar charts, error bars, box plots, histograms, multiple axes, subplots, and many others.
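
For example, here is a minimal sketch using Plotly Express, the library's high-level interface, which produces an interactive chart; the bundled iris sample data and its column names are assumed here purely for illustration.

import plotly.express as px

# Plotly ships small sample datasets, e.g. the classic iris measurements
df = px.data.iris()

# Build an interactive scatter plot; hovering shows the underlying values
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species")
fig.show()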

Effective Data Storytelling

“The ability to take data — to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it — that’s going to be a hugely important skill in the next decades.” (Dykes, 2019)

The points below, adapted from Dykes, explain how to use data to tell a story properly.

· When narrative is coupled with data, it helps to explain to your audience what’s happening in the data and why a particular insight is important.

· When visuals are applied to data, they can enlighten the audience to insights that they wouldn’t see without charts or graphs.

· When narrative and visuals are merged together, they can engage or even entertain an audience.

· When you combine the right visuals and narrative with the right data, you have a data story that can influence and drive change.

I have included some professional examples below that demonstrate how excellent data visualizations can tell a story.

  • https://flowingdata.com/
  • https://informationisbeautiful.net/
  • https://junkcharts.typepad.com/
  • https://pudding.cool/

References

Dykes, B. (2019, December 20). Data Storytelling: The Essential Data Science Skill Everyone Needs. Retrieved October 30, 2020, from https://www.forbes.com/sites/brentdykes/2016/03/31/data-storytelling-the-essential-data-science-skill-everyone-needs/?sh=3b1e2b9a52ad

