Data Visualisations: Part II

Photo by Nick Fewings on Unsplash

The question is not what you look at, but what you see.

-Henry Thoreau

In this second article of our introductory series on visualisations, we discuss how heatmaps use a colour-coded matrix to reveal relationships among different variables. We also discuss how pie charts offer a simple and effective way to compare the shares of different values of the same variable.

Pie Charts

Pie charts are mainly used to show how the parts of a whole compare. For example, the chart below shows the percentage share of each vaccine used to inoculate the population, based on data from this source. It is clear that Pfizer was the main vaccine used in Ireland until April 21.

Pie chart showing the percentage share of each vaccine
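A chart like this can be drawn with matplotlib's `Axes.pie`. The sketch below is only illustrative: the dose counts are placeholders, not the real Irish figures, and the helper names `vaccine_shares` and `plot_vaccine_pie` are hypothetical. Replace the counts with the totals from the source data.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def vaccine_shares(doses: pd.Series) -> pd.Series:
    """Convert raw dose counts per vaccine into percentage shares."""
    return doses / doses.sum() * 100


def plot_vaccine_pie(doses: pd.Series, title: str = "Share of vaccine doses administered"):
    """Draw a pie chart with one slice per vaccine, labelled with its percentage."""
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.pie(
        doses.values,
        labels=doses.index,
        autopct="%1.1f%%",  # print each slice's percentage on the slice
        startangle=90,      # start the first slice at 12 o'clock
        counterclock=False, # then go clockwise
    )
    ax.set_title(title)
    return fig, ax


# Placeholder counts for illustration only (NOT real data).
doses = pd.Series({"Pfizer": 600, "AstraZeneca": 300, "Moderna": 100})
shares = vaccine_shares(doses)
print(shares.round(1))
fig, ax = plot_vaccine_pie(doses)
# fig.savefig("vaccine_share_pie.png")  # hypothetical output file name
```

`Axes.pie` normalises the values itself, so raw counts can be passed directly; `vaccine_shares` is only there to print the same percentages that appear on the slices.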

Heatmaps

Heatmaps are colour-coded maps or matrices in which the shade of each cell represents its value. With the colour scheme used here, darker shades indicate higher (worse) values and lighter shades indicate lower (better) values. The main purpose of heatmaps is to show the volume, count or rate of events in the data, with each cell holding an aggregated value.
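To make the "aggregated cell" idea concrete, the sketch below counts events per (row, column) combination with `pandas.crosstab`; the resulting matrix is exactly the kind of table a heatmap colours. The `region` and `week` columns and their values are hypothetical.

```python
import pandas as pd

# Hypothetical event log: one row per recorded event (e.g. one reported case).
events = pd.DataFrame({
    "region": ["A", "A", "B", "B", "B", "C"],
    "week":   [1, 2, 1, 1, 2, 2],
})

# Each cell holds the number of events for that (region, week) pair;
# combinations with no events get a count of 0.
counts = pd.crosstab(events["region"], events["week"])
print(counts)
```

Passing this matrix to a heatmap function would colour region B in week 1 (two events) more strongly than the cells with one or zero events.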

To plot the heatmap, we use historical data for Ireland's Local Electoral Areas (LEAs) with their 14-day incidence rates per 100k population. The heatmap gives a comparative view of the incidence rates over three weeks of April 2021. We follow the steps below to plot it:

  1. Read the data with the read_csv() function from pandas.
  2. Extract the months and years from the EventDate column. This can be done with the pandas DatetimeIndex() function, passing the EventDate column as the argument.
  3. Filter the dataset so that it contains only the data for April 2021.
  4. Reshape the filtered data according to LEA, since the heatmap needs a matrix. We use the pandas pivot() function with LEA as the index and the incidence rates as the values, mapped against the counties.
  5. The dataset is now ready to plot. We plot a heatmap with the seaborn heatmap() function, passing the reshaped dataset as the argument. Other arguments customise the look of the plot: annot (True/False) displays each cell's value in the grid; linewidths (numeric) inserts gaps between the grid cells; cmap (a colour map name) changes the colour scheme.
Heatmap of 14-day incidence rates per 100k population across LEAs in Ireland
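The steps above can be sketched end to end as follows. This is a minimal sketch under several assumptions: the column names `County`, `LEA` and `IncidenceRate` (and the exact spelling `EventDate`) are hypothetical, the inline CSV holds made-up values in place of the real file, and the pivot puts the weekly `EventDate` values in the columns so the three April weeks sit side by side. If your file is laid out against counties instead, change the `columns=` argument to suit it.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up sample in the assumed layout. With the real data, pass the file
# path to pd.read_csv instead of this StringIO buffer.
SAMPLE_CSV = """EventDate,County,LEA,IncidenceRate
2021-03-31,County 1,LEA_A,110.2
2021-04-07,County 1,LEA_A,95.4
2021-04-14,County 1,LEA_A,80.1
2021-04-21,County 1,LEA_A,72.9
2021-04-07,County 2,LEA_B,60.3
2021-04-14,County 2,LEA_B,55.0
2021-04-21,County 2,LEA_B,48.7
2021-04-07,County 2,LEA_C,130.5
2021-04-14,County 2,LEA_C,121.0
2021-04-21,County 2,LEA_C,99.8
"""


def prepare_april_matrix(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Step 2: extract month and year from EventDate.
    dates = pd.DatetimeIndex(df["EventDate"])
    df["Month"] = dates.month
    df["Year"] = dates.year
    # Step 3: keep only April 2021.
    april = df[(df["Month"] == 4) & (df["Year"] == 2021)]
    # Step 4: reshape to one row per LEA and one column per reporting date.
    matrix = april.pivot(index="LEA", columns="EventDate", values="IncidenceRate")
    matrix.columns = matrix.columns.strftime("%Y-%m-%d")  # readable axis labels
    return matrix


def plot_incidence_heatmap(matrix: pd.DataFrame):
    # Step 5: "Reds" makes higher incidence rates darker.
    fig, ax = plt.subplots(figsize=(8, 6))
    sns.heatmap(matrix, annot=True, fmt=".1f", linewidths=0.5, cmap="Reds", ax=ax)
    ax.set_title("14-day incidence rate per 100k population, April 2021")
    ax.set_xlabel("EventDate")
    ax.set_ylabel("Local Electoral Area")
    fig.tight_layout()
    return fig, ax


# Step 1: read the data (here from the in-memory sample).
raw = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["EventDate"])
matrix = prepare_april_matrix(raw)
print(matrix)
fig, ax = plot_incidence_heatmap(matrix)
```

Note that `pivot()` raises an error if an (LEA, date) pair occurs more than once; `pivot_table()` with an aggregation function such as `mean` is the usual alternative in that case.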

The entire code for the series can be found here.

Takeaways

We illustrated how heatmaps can help visualise the volume and location of events within a dataset. Their colour-coded scheme makes them self-explanatory, with darker shades representing higher values and lighter shades representing lower values, so the key insights can be grasped quickly. Pie charts, in turn, are an effective way of comparing a set of values of a single variable: the bigger the slice, the more samples or instances take that value.

Do you have any questions?

Kindly ask your questions via email or comments and we will be happy to answer :)

Insights on Modern Computation
Perspectives on data science

A communal initiative by Meghana Kshirsagar (BDS | Lero | UL, Ireland) and Gauri Vaidya (Intern | BDS). Each concept is accompanied by sample datasets and Python code.