In this class we will cover data visualization with Python. This class follows on from Part 1: Your First Data Science Class. If you are a beginner, I recommend you work through that class first.
The following is covered in this class:
- Our Dataset
- Univariate Plots: Understanding Attributes Independently
- Density Plots
- Box Plots (Whisker Plot)
- Multivariate Plots: Relationships Between Variables
- Correlation Matrix Plot
- Scatter Matrix Plot
- Using Heatmap – Seaborn
- Using Matshow
1. Our Dataset – The Wine Dataset
The wine dataset was obtained from the UCI Machine Learning Repository. We will use the wine.csv file, which is available for free from here. According to the documentation of this dataset, the data consists of 13 physicochemical parameters measured in 178 different wine samples from three distinct cultivars (varieties produced by selective breeding) grown in Italy.
Use the code below to import your dataset:
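```python
import pandas as pd

# Path to your local copy of the dataset (adjust it to your machine)
path = r"/Users/kindsonmunonye/Datasets/wine.xlsx"

# header=0 means the first row holds the column names
wine_data = pd.read_excel(path, header=0)
# If your copy is the CSV file, use pd.read_csv(path, header=0) instead
```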
Some of the parameters are listed below:
- Malic Acid
- Alkalinity of Ash
- Total Phenols
- Flavanoid Phenols
- Color Intensity
- OD280/OD315 of Diluted Wines
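The text above mentions wine.csv, while the loading code reads an Excel file. If your copy is a CSV file, something like the following sketch may help. It reads a tiny in-memory sample instead of a real file, so the values are made up for illustration, and the column names are only assumed to match the ones used later in this class.

```python
import io
import pandas as pd

# Hypothetical three-row sample; values are invented for illustration and
# the column names are assumed to match those used later in this class.
sample_csv = io.StringIO(
    "Wine,Alcohol,Malic.acid,Ash\n"
    "1,14.2,1.7,2.4\n"
    "2,12.4,1.0,2.1\n"
    "3,13.1,3.5,2.6\n"
)

# read_csv accepts a file path or any file-like object
wine_sample = pd.read_csv(sample_csv, header=0)

print(wine_sample.shape)   # (rows, columns)
print(wine_sample.dtypes)  # data type of each column
print(wine_sample.head())  # first few rows
```

With the real file you would pass its path to pd.read_csv instead of the in-memory sample. Checking the shape and the column types right after loading is a cheap way to confirm that the header row was picked up correctly.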
2. Univariate Plots: Visualising Individual Features
Univariate plots help us understand each variable of our dataset independently of the other variables. Examples of univariate plots include histograms, univariate scatter plots and line plots. In this class we use histograms, density plots and box plots.
3. Histograms
We begin with the histogram. A histogram groups the values of an attribute into bins and draws one vertical bar per bin. The height of each bar is the number of observations that fall into that bin, not the values themselves. A histogram is useful, for example, for getting the count of observations in each range of values of a column, and for seeing at a glance whether an attribute looks symmetric or skewed.
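To make the idea of bins concrete, here is a minimal sketch of the counting step that happens before any bars are drawn. It uses NumPy's histogram function on a handful of made-up readings (the numbers and the variable name alcohol are assumptions for illustration, not taken from the wine dataset).

```python
import numpy as np

# Made-up alcohol readings, for illustration only
alcohol = np.array([11.2, 11.8, 12.1, 12.4, 12.5, 13.0, 13.2, 13.9, 14.1, 14.8])

# Four equal-width bins between 11 and 15
counts, edges = np.histogram(alcohol, bins=4, range=(11.0, 15.0))

# Each bar of the histogram would have the height printed here
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.1f} to {right:.1f}: {count} observations")
```

The counts always add up to the number of observations, which is a quick sanity check when you choose your own bins.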
Use the code below to get a histogram plot of the wine dataset:
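```python
# HISTOGRAM
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(15, 20))
ax = fig.gca()  # get the current axes of the figure

# One histogram per numeric column, drawn on the figure that owns ax
wine_data.hist(ax=ax)
# wine_data.hist(ax=ax, column='Wine')  # for a single data column
plt.show()
```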
4. Density Plot
A density plot is similar to a histogram, but it uses a smooth curve to represent the distribution of each attribute. It uses a kernel density estimate (KDE) to approximate the probability density function (PDF) of the variable.
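The following is a rough sketch of what a kernel density estimate does, written by hand with NumPy: it places a small Gaussian bump on every observation and averages the bumps. The function name gaussian_kde_1d, the sample values and the bandwidth of 0.5 are all assumptions chosen for illustration; pandas delegates the real work to SciPy and picks the bandwidth automatically.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps, one centred on each sample (illustrative helper)."""
    samples = np.asarray(samples, dtype=float)
    # Distance from every grid point to every sample, in units of the bandwidth
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return bumps.mean(axis=1)

# Made-up readings, for illustration only
samples = [11.2, 11.8, 12.1, 12.4, 12.5, 13.0, 13.2, 13.9, 14.1, 14.8]
grid = np.linspace(8.0, 18.0, 1001)
density = gaussian_kde_1d(samples, grid, bandwidth=0.5)

# A probability density should integrate to roughly 1 over its range
area = np.sum(density) * (grid[1] - grid[0])
print(round(area, 3))
```

A smaller bandwidth gives a bumpier curve that follows individual points, while a larger one gives a smoother curve that may hide detail.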
The code below produces density plots for the wine dataset:
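```python
# DENSITY PLOT (the kernel density estimate requires SciPy)
fig = plt.figure(figsize=(15, 20))
ax = fig.gca()

# subplots=True draws one panel per column in a 4 x 4 grid;
# sharex=False gives every panel its own x-axis range
wine_data.plot(ax=ax, kind='density', subplots=True, layout=(4, 4), sharex=False)
plt.show()
```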
The kind argument can be changed to any of the following:
- ‘line’ : line plot (default)
- ‘bar’ : vertical bar plot
- ‘barh’ : horizontal bar plot
- ‘hist’ : histogram
- ‘box’ : boxplot
- ‘kde’ : Kernel Density Estimation plot
- ‘density’ : same as ‘kde’
- ‘area’ : area plot
- ‘pie’ : pie plot
- ‘scatter’ : scatter plot (DataFrame only)
- ‘hexbin’ : hexbin plot (DataFrame only)
I recommend you try them out yourself to see what you get. Note that the scatter and hexbin kinds also need x and y arguments naming the columns to plot.
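As a starting point for experimenting, the sketch below draws two of the kinds listed above side by side. The DataFrame is a small made-up stand-in (column names assumed to match the wine dataset), and the Agg backend is selected only so the sketch runs without a display; in a notebook you can drop that line and call plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values; column names assumed to match the wine dataset
df = pd.DataFrame({
    "Alcohol": [14.2, 12.4, 13.1, 13.7, 12.0],
    "Hue": [1.04, 1.05, 0.86, 0.89, 1.23],
})

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# 'hist' works on a single column
df["Alcohol"].plot(kind="hist", ax=ax_hist, title="Alcohol")

# 'scatter' needs explicit x and y columns
df.plot(kind="scatter", x="Alcohol", y="Hue", ax=ax_scatter, title="Alcohol vs Hue")

fig.tight_layout()
```

Kinds such as line, bar, box and density can be tried the same way by changing only the kind argument.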
5. Box Plots
This is also called a box-and-whisker plot. It summarizes the distribution of each attribute in the dataset. It draws a line at the median (the middle value) of the attribute and a box from the 25th to the 75th percentile (the 1st and 3rd quartiles). Whiskers extend from the box to indicate the spread of the remaining data, and points beyond the whiskers are drawn individually as possible outliers.
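The numbers behind a box plot can be computed directly, which may help when reading one. The sketch below uses made-up values with one deliberate outlier, and assumes matplotlib's default rule that whiskers reach the most extreme points within 1.5 times the interquartile range of the box.

```python
import numpy as np

# Made-up values; 30 is a deliberate outlier
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

# Box edges and the line inside the box
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1  # interquartile range, the height of the box

# Whiskers stop at the most extreme points inside these limits
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
inside = values[(values >= lower_limit) & (values <= upper_limit)]
outliers = values[(values < lower_limit) | (values > upper_limit)]

print("box:", q1, "to", q3, "median:", median)
print("whiskers:", inside.min(), "to", inside.max())
print("outliers:", outliers.tolist())
```

Here the box spans 3.25 to 7.75 with the median at 5.5, the whiskers run from 1 to 9, and the value 30 is drawn as a separate point.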
Use the code below to get a box plot of the wine dataset.
```python
# BOX AND WHISKER PLOT
fig = plt.figure(figsize=(15, 20))
ax = fig.gca()
wine_data.plot(ax=ax, kind='box', subplots=True, layout=(4, 4), sharex=False)
plt.show()
```
6. Multivariate Plots
These plots are used to visualize several variables at once. Multivariate plots provide insight into the relationships and interactions between the variables in a dataset.
Some multivariate plots include the correlation matrix plot, the scatter matrix plot and the pairwise plot (pairplot).
7. Correlation Matrix
Correlation provides insight into the relationship between two variables: how does a change in one variable relate to a change in the other? A correlation matrix plot uses the correlation coefficient (Pearson’s correlation coefficient), which ranges from -1 to +1. Its magnitude indicates how strong or weak the linear relationship between two variables is, and its sign indicates the direction of that relationship.
A correlation matrix plot can be created using matshow from matplotlib or the heatmap function from the seaborn module.
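To see where the numbers in a correlation matrix come from, the sketch below computes Pearson's coefficient by hand and compares it with the corr method of a pandas DataFrame. The helper name pearson and the three toy columns are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))

# Toy data, for illustration only
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises in step with a
    "c": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as a rises
})

toy_corr = toy.corr()  # Pearson is the default method

print(toy_corr)
print(pearson(toy["a"], toy["b"]))  # close to +1
print(pearson(toy["a"], toy["c"]))  # close to -1
```

The diagonal of the matrix is always 1, because every variable is perfectly correlated with itself, and the matrix is symmetric.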
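```python
# Plot using Seaborn
import seaborn as sb

# Pairwise Pearson correlations between the columns
correlations = wine_data.corr()

fig = plt.figure(figsize=(15, 15))
ax = fig.gca()
sb.heatmap(correlations, annot=True, ax=ax)  # annot=True writes each coefficient in its cell
plt.show()
```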
```python
# Using matshow
import numpy as np

correlations = wine_data.corr()  # pairwise Pearson correlations

fig = plt.figure(figsize=(15, 15))
ax = fig.gca()  # get the current axes of the figure
cax = ax.matshow(correlations, vmin=-1, vmax=1)  # draw the matrix as a colored grid
fig.colorbar(cax)

# One tick per column (14 for the wine dataset)
ticks = np.arange(len(correlations.columns))
ax.set_xticks(ticks)
ax.set_yticks(ticks)
ax.set_xticklabels(correlations.columns)
ax.set_yticklabels(correlations.columns)
plt.show()
```
The output of either code block above is also referred to as a heatmap.
8. Scatter Matrix Plot
A scatter plot shows the relationship between two variables, using dots in two dimensions. Scatter plots are similar to x-y graphs, since they use the horizontal (x) and the vertical (y) axes. A scatter matrix arranges the scatter plots of every pair of variables in a grid.
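The grid structure of a scatter matrix is easier to see on a small example. The sketch below builds three synthetic columns (the names x, related and unrelated are made up, as are the values) and passes them to pandas' scatter_matrix. The Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in an interactive session
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
x = rng.normal(size=50)
df = pd.DataFrame({
    "x": x,
    "related": 2.0 * x + rng.normal(scale=0.3, size=50),  # follows x closely
    "unrelated": rng.normal(size=50),                      # independent noise
})

# Returns a grid of axes: one row and one column per variable
axes = scatter_matrix(df, figsize=(6, 6), diagonal="hist")
print(axes.shape)
```

In the resulting grid, the panel for x against related should show dots lying close to a line, while the panels involving unrelated should look like a shapeless cloud. The diagonal shows a histogram of each variable instead of plotting it against itself.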
The code below produces a scatter matrix.
```python
# Using scatter_matrix from pandas
import pandas.plotting as pp

fig = plt.figure(figsize=(15, 15))
ax = fig.gca()
pp.scatter_matrix(wine_data, ax=ax)
# pp.scatter_matrix(wine_data[['Wine', 'Alcohol', 'Ash', 'Malic.acid']], ax=ax)  # taking a subset
plt.show()
```
A scatter matrix plot can also be created using seaborn:
```python
# Using Seaborn
import seaborn as sb

sb.pairplot(wine_data[['Ash', 'Wine', 'Hue', 'Acl']])  # taking a subset
plt.show()
```
Complete Video Tutorial on Plotting
- Tutorial 1 – Introduction and Basics of Plotting https://youtu.be/gWieyVShHHk
- Tutorial 2 – Formatting Your Plot https://youtu.be/IgzpJ8C2cRo
- Tutorial 3 – Formatting Your Plot Using shorthand https://youtu.be/4yExxAqRkc8
- Tutorial 4 – PyPlot Functions https://youtu.be/x8nd4knAtXI
- Tutorial 5 – Plotting the Heart Curve https://youtu.be/JtAXEaiOdwo
- Tutorial 6 – Plotting the Figure 8 Shape https://youtu.be/p3B7SKipNTA
- Tutorial 7 – Working with Subplots https://youtu.be/qMYum724N8g
- Tutorial 8 – Creating a Scatterplot https://youtu.be/D3rJwgY2R8E
- Tutorial 9 – Creating a Histogram https://youtu.be/dEWKFi7TIyY
- Tutorial 10 – Creating a Bar Chart https://youtu.be/sannLieIIPU