Facts are stubborn, but statistics are more pliable
— Mark Twain
This article shows how to visualise the distribution of observations in a dataset with histograms. It also discusses bubble charts, which convey a third variable per observation: the size of each bubble indicates the magnitude of that variable.
A bubble chart, or bubble plot, is a variation of a scatter plot used when three variables are visualised together. Each data point is positioned according to two of the variables and drawn as a bubble whose size varies with the third.
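As a minimal sketch of the size encoding, the third variable can be rescaled linearly to marker areas before plotting. The function name `bubble_areas`, the area limits and the sample numbers are all assumptions made for illustration; the resulting list is the kind of value a plotting library accepts as marker size (for example the `s` argument of matplotlib's `scatter`, which is given in points squared).

```python
def bubble_areas(values, min_area=20.0, max_area=2000.0):
    """Linearly rescale values to marker areas in [min_area, max_area]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so every bubble gets the same mid-range area.
        return [(min_area + max_area) / 2.0 for _ in values]
    span = hi - lo
    return [min_area + (v - lo) / span * (max_area - min_area) for v in values]


# Made-up observations: x and y position each point, size_variable sets its area.
x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 14.0, 9.0, 20.0]
size_variable = [0.5, 2.0, 1.0, 5.0]

areas = bubble_areas(size_variable)
for xi, yi, area in zip(x, y, areas):
    print(f"point ({xi}, {yi}) -> bubble area {area:.1f}")
```

Scaling the area rather than the radius keeps the visual weight of each bubble proportional to the value it represents.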
I like elegance. I like art nouveau; a stretched line or curve. These things are very much in the foreground of my work
-H. R. Giger
In the fourth article of our current visualisation series, we introduce two kinds of plots: bar-line graphs and line graphs with different scales on the y-axis. We will continue to use COVID-19 datasets to discuss the different chart types.
The graph below shows the total number of new COVID-19 cases in Ireland for May 2021. The dataset used for this graph is referenced from here. The same also depicts…
Mankind invented a system to cope with the fact that we are so intrinsically lousy at manipulating numbers. It’s called the graph. — Charlie Munger
In the third article of our data visualisation series with Python, we introduce line plots and box plots as powerful tools for representing quantitative values. Line plots are mostly used for time-series data, whereas box plots give summary statistics about the data. We continue with the same dataset from our previous articles.
Line graphs are best suited when we have to analyse quantitative data with respect to time…
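The summary statistics that a box plot draws can be computed directly with the standard library. The sketch below is illustrative only: the helper name `five_number_summary` and the sample values are assumptions, and the "inclusive" quantile method is one of several conventions, so a plotting library may place the quartiles slightly differently.

```python
import statistics


def five_number_summary(values):
    """Return the minimum, quartiles and maximum that a box plot summarises."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Made-up daily counts, used only to exercise the function.
daily_counts = [3, 7, 8, 5, 12, 14, 21, 13, 18]
summary = five_number_summary(daily_counts)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The box spans q1 to q3, the line inside it marks the median, and the whiskers extend towards the minimum and maximum.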
The question is not what you look at, but what you see.
In the second article of our introductory series on visualisations, we discuss how heatmaps, a colour-coded matrix representation, help identify relationships among different variables. We also discuss how pie charts offer a simple and effective way to compare the different values of a single variable.
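A heatmap of relationships is usually drawn from a correlation matrix. As a rough sketch, the matrix can be built with plain Python. The names `pearson` and `correlation_matrix` and the three sample columns are hypothetical, and the code assumes that no column is constant (a constant column would lead to a division by zero).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map every pair of column names to their Pearson correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up columns: "b" rises with "a", while "c" falls as "a" rises.
columns = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 4, 3, 2, 1],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Each cell of the matrix becomes one coloured square in the heatmap, with the colour scale running from -1 to 1.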
Open your eyes and see the beauty!
This article aims to introduce readers to smart approaches to visualising data. Whether we perform exploratory data analysis, where the goal is to understand the data ourselves, or explanatory data analysis, where we need to communicate findings to end users, data visualisations help convey ideas clearly. Because data can be visualised in so many ways, it is easy to be confused about the purpose and context in which a given representation is suitable. Choosing the wrong method of data visualisation can cause the data to be misinterpreted, leading to bad decisions. In the…
One of the secrets of successful living is found in the word balance, referring to the avoidance of harmful extremes.
— James C. Dobson
Getting a balanced dataset to train machine learning models continues to pose challenges. However, there is no lack of methods and theories discussed among research communities to address such challenges effectively. Data from the financial domain remain highly imbalanced: when training models for credit risk assessment, almost all of the data belong to the genuine group, leaving only a minuscule fraction for the fraudulent group. …
If you do not know how to ask the right question, you discover nothing. — W. Edwards Deming
There are many real-life applications in which we encounter datasets with an uneven distribution of samples across target labels. Since the minority class is usually both the more important and the underrepresented one, this article introduces a data-driven method for handling minority groups, the Synthetic Minority Oversampling Technique (SMOTE). It is one of the popular oversampling techniques for addressing imbalanced datasets.
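The core idea of SMOTE is to create new minority samples by interpolating between an existing minority sample and one of its nearest minority neighbours. The sketch below is a simplified illustration of that idea, not the reference algorithm or any library's implementation: the function name `smote`, its parameters and the sample points are assumptions, and the base sample is drawn at random rather than iterated over systematically.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority points by interpolating towards neighbours.

    Assumes `minority` holds at least two points of equal dimension.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base_index = rng.randrange(len(minority))
        base = minority[base_index]
        # The k nearest minority neighbours of the base point, excluding itself.
        others = [p for i, p in enumerate(minority) if i != base_index]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # A random position on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Made-up two-dimensional minority samples.
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5)]
synthetic = smote(minority, n_synthetic=6, k=2, seed=42)
for point in synthetic:
    print(point)
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the region the minority class already occupies.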
Balance is the key to everything — Koi Fresco
After a comprehensive look at some key data preprocessing tasks in our previous articles, it is now time to understand imbalanced datasets, a common problem with real-world data. After a gentle introduction, we will follow up in the next articles with the techniques needed to deal with them.
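A quick way to see whether a dataset is imbalanced is to look at the share of each class and the ratio between the largest and smallest class. The following sketch uses made-up labels; the helper names `class_distribution` and `imbalance_ratio` are hypothetical.

```python
from collections import Counter


def class_distribution(labels):
    """Return the share of each class label as a fraction of all samples."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def imbalance_ratio(labels):
    """Return the size of the largest class divided by the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Made-up labels: 95 samples of one class and 5 of the other.
labels = ["genuine"] * 95 + ["fraud"] * 5

distribution = class_distribution(labels)
ratio = imbalance_ratio(labels)
print(distribution)
print(f"imbalance ratio: {ratio:.1f}")
```

A ratio close to 1 indicates a balanced dataset; the larger the ratio, the more the majority class dominates.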
The key to artificial intelligence has always been the representation. — Jeff Hawkins
The primary objective of this article is to enhance the accuracy of ML models trained for object detection by annotating unique objects within an image. We discuss how using different shapes for image annotation within a single image can lead to a smart, clutter-free and better user experience.
The following code demonstrates scaling an image and the resulting change in resolution.
import cv2 as cv
# read the image
image = cv.imread('traffic.jpeg')
# define percentage for scaling the image (resolution changes)
percentage = 50
# calculate width and height of image according to percentage
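The snippet above ends before the calculation it announces. One common way to derive the new dimensions from a percentage is sketched here; this is an assumption for illustration, not the original article's code, and the helper name `scaled_dimensions` is hypothetical. The resulting pair is the kind of value typically passed as the target size to a resize function, for example OpenCV's `cv.resize(image, (width, height))`.

```python
def scaled_dimensions(width, height, percentage):
    """Return (new_width, new_height) after scaling both sides by percentage."""
    new_width = int(width * percentage / 100)
    new_height = int(height * percentage / 100)
    return new_width, new_height


# Assumed example size; an image loaded with OpenCV reports its shape
# as (height, width, channels), so width is shape[1] and height is shape[0].
original_width, original_height = 1280, 720
new_size = scaled_dimensions(original_width, original_height, 50)
print(new_size)
```

Scaling both sides by the same percentage preserves the aspect ratio of the image.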
“Go down deep into anything and you will find mathematics.” — Dean Schlicter
In this article, we discuss morphological operators and their usefulness in extracting the most accurate shape of the underlying object or element, using examples of digits written in four different languages.
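The two basic morphological operators, erosion and dilation, can be sketched on a small binary image in plain Python. This is an illustration under stated assumptions: it uses a fixed 3x3 square structuring element, treats pixels outside the image as background, and the function names `erode` and `dilate` are chosen here for clarity rather than taken from any library.

```python
OFFSETS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def erode(image):
    """Keep a pixel only if its whole 3x3 neighbourhood is foreground."""
    rows, cols = len(image), len(image[0])
    return [
        [
            int(all(
                0 <= r + dr < rows and 0 <= c + dc < cols and image[r + dr][c + dc]
                for dr, dc in OFFSETS
            ))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def dilate(image):
    """Set a pixel if any pixel in its 3x3 neighbourhood is foreground."""
    rows, cols = len(image), len(image[0])
    return [
        [
            int(any(
                0 <= r + dr < rows and 0 <= c + dc < cols and image[r + dr][c + dc]
                for dr, dc in OFFSETS
            ))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# A 5x5 image containing a 3x3 square of foreground pixels.
square = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
eroded = erode(square)
opened = dilate(eroded)
for row in eroded:
    print(row)
```

Erosion shrinks shapes and removes thin noise, dilation grows them back; applying erosion followed by dilation is known as opening.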
The samples from the four language classes are consolidated in a single dataset with the following features (the displayed images follow the sequence below):
1. English-MNIST with 60,000 training samples on the 10 digit classes (1, 2, 3, 4, 5, 6, 7, 8, 9, 0)
2. Arabic MADBase with 60,000 training samples represented as…
A communal initiative by Meghana Kshirsagar (BDS | Lero | UL, Ireland) and Gauri Vaidya (Intern | BDS). Each concept is accompanied by sample datasets and Python code.