Photo by Pixabay from Pexels

Facts are stubborn, but statistics are more pliable

— Mark Twain

This article illustrates how to visualise the distribution of observations in a dataset with histograms. It also discusses bubble charts, which convey a third variable for each observation through the size of the bubble.

Bubble plot

A bubble chart, or bubble plot, is a variation of a scatter plot used when three variables are visualised. Each data point is positioned by two of the variables, and the size of its bubble represents the magnitude of the third.
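As a minimal sketch of the sizing step, the third variable can be scaled so that each bubble's area is proportional to its value. The function name `bubble_areas` and the `max_area` parameter are hypothetical and chosen only for illustration; the example values are made up.

```python
def bubble_areas(values, max_area=1000.0):
    """Scale non-negative values to marker areas proportional to each value.

    The largest value maps to max_area (hypothetical default); all others
    are scaled linearly relative to it.
    """
    largest = max(values)
    if largest <= 0:
        raise ValueError("at least one value must be positive")
    return [max_area * v / largest for v in values]


# Illustrative third-variable values for three observations
areas = bubble_areas([10, 20, 40])
print(areas)
```

With matplotlib, a list like this could be passed as the `s` argument of `scatter`, which interprets it as marker area in points squared.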


Photo by Jan Huber on Unsplash

I like elegance. I like art nouveau; a stretched line or curve. These things are very much in the foreground of my work

-H. R. Giger

In the fourth article in our visualisation series, we introduce two kinds of plots: bar-line graphs and line graphs with different scales on the y-axis. We continue to use COVID-19 datasets to discuss the different chart types.

Line graphs with a dual y-axis

The graph below shows the total number of new COVID-19 cases in Ireland for May 2021. The dataset used for this graph is referenced from here. The same also depicts…


Photo by Edward Howell on Unsplash

Mankind invented a system to cope with the fact that we are so intrinsically lousy at manipulating numbers. It’s called the graph. - Charlie Munger

Introduction

In the third article in our data visualisation series with Python, we introduce line plots and box plots as powerful tools for representing quantitative values. Line plots are mostly used for time-series data, whereas box plots give summary statistics about the data. We continue with the same dataset from our previous articles.
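The summary statistics a box plot draws are usually the five-number summary: minimum, first quartile, median, third quartile, and maximum. The sketch below computes them with the standard library, assuming the "inclusive" quartile method; plotting libraries may use a slightly different quartile convention, and the sample values are made up for illustration.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers."""
    data = sorted(values)
    # n=4 splits the data into quartiles; "inclusive" treats the data
    # as the whole population, including its minimum and maximum.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


print(five_number_summary([9, 1, 5, 3, 7, 2, 8, 4, 6]))
```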

Line Graphs

Line graphs are best suited when we have to analyse quantitative data with respect to time…


Photo by Nick Fewings on Unsplash

The question is not what you look at, but what you see.

-Henry Thoreau

In the second article in our introductory series on visualisations, we discuss how heatmaps reveal relationships among variables through a colour-coded matrix representation. We also discuss how pie charts offer a simple and effective way to compare the different types of values of a single variable.

Pie Charts

Pie charts are used when we want to show a comparison. For example, the chart below shows the percentage of different types of vaccines being used to inoculate the population with…


Photo by Luca Micheli on Unsplash

Open your eyes and see the beauty!

This article aims to introduce readers to smart approaches to visualising data. Whether we perform exploratory data analysis, where the goal is to understand the data ourselves, or explanatory data analysis, where we need to communicate with end users, data visualisations help convey ideas clearly. Because data can be visualised in so many ways, it can be hard to judge which representation suits a given purpose and context. Choosing the wrong visualisation method can cause data to be misinterpreted, leading to bad decisions. In the…


Photo by Ibrahim Rifath on Unsplash

One of the secrets of successful living is found in the word balance, referring to the avoidance of harmful extremes.

James C. Dobson

Introduction

Getting a balanced dataset to train machine learning models continues to pose challenges. However, there is no lack of methods and theories discussed in the research community for addressing such challenges effectively. Data from the financial domain continues to be highly imbalanced: when training models for credit risk assessment, almost all of the data belongs to the genuine group. We are left with only a minuscule fraction of the data for the fraudulent group. …


Photo by Elena Mozhvilo on Unsplash

If you do not know how to ask the right question, you discover nothing. - W. Edwards Deming

There are many real-life applications in which we encounter datasets with an uneven distribution of samples across target labels. Because the minority class is usually the more important class and yet is underrepresented, this article introduces a data-driven method named Synthetic Minority Oversampling Technique (SMOTE) for handling minority groups. It is one of the popular oversampling techniques for addressing imbalanced datasets.
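The core idea of SMOTE can be sketched as follows: pick a minority sample, pick one of its nearest minority neighbours, and create a synthetic point at a random position on the line segment between them. This is a simplified illustration rather than a reference implementation; the function name `smote_sketch`, its parameters, and the sample points are hypothetical.

```python
import random
from math import dist


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Random position along the segment from base towards the neighbour
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Made-up minority-class samples with two features each
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=5)
print(new_points)
```

Because every synthetic point lies between two existing minority samples, the new data stays inside the region the minority class already occupies.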

Approaches for handling imbalance


Photo by Christophe Hautier on Unsplash

Balance is the key to everything — Koi Fresco

After a comprehensive look at some key data preprocessing tasks in our previous articles, it’s now time to understand the concept of imbalanced datasets, a common problem with real-world datasets. After a gentle introduction, the following articles will cover the techniques needed to deal with it.


Photo by billow926 on Unsplash

The key to artificial intelligence has always been the representation. — Jeff Hawkins

The primary objective of this article is to enhance the accuracy of ML models trained for object detection by annotating unique objects within an image. We discuss how the different shapes used for image annotation within a single image can lead to a smart, clutter-free and better user experience.

Image processing: Basics

The following code demonstrates the scaling of images and the resulting change in resolution.

# import OpenCV under the alias used below
import cv2 as cv
# read the image
image = cv.imread('traffic.jpeg')
# define the percentage for scaling the image (resolution changes)
percentage = 50
# calculate the width and height of the image according to the percentage
width…
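The arithmetic behind percentage scaling can be sketched independently of OpenCV. The helper below is hypothetical (the name `scaled_size` and the example dimensions are assumptions for illustration) and only computes the target dimensions.

```python
def scaled_size(width, height, percentage):
    """Return the (width, height) of an image scaled to the given percentage."""
    new_width = int(width * percentage / 100)
    new_height = int(height * percentage / 100)
    return new_width, new_height


# A hypothetical 1280x720 image scaled to 50% of its size
print(scaled_size(1280, 720, 50))
```

With OpenCV, an image's `shape` is ordered as (height, width, channels), and a tuple like the one returned here is the `(width, height)` size that `cv.resize` expects as its second argument.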


ImgSource: https://pubs.asha.org/doi/10.1044/2014_LSHSS-13-0003

“Go down deep into anything and you will find mathematics.” — Dean Schlicter

In this article, we discuss morphological operators and how they help extract the most accurate shape of an underlying object or element, using digits written in four different languages as examples.
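As a minimal sketch of what morphological operators do, the code below implements binary erosion and dilation with a 3x3 square structuring element on a small grid, and combines them into an opening, which removes isolated noise while keeping the main shape. The structuring element, the zero-padding at the border, and the toy image are assumptions made for illustration.

```python
def _window(image, r, c):
    """Yield the 3x3 neighbourhood of (r, c); cells outside the image count as 0."""
    rows, cols = len(image), len(image[0])
    for rr in range(r - 1, r + 2):
        for cc in range(c - 1, c + 2):
            inside = 0 <= rr < rows and 0 <= cc < cols
            yield image[rr][cc] if inside else 0


def erode(image):
    """Keep a pixel only if every pixel in its 3x3 neighbourhood is set."""
    return [[int(all(_window(image, r, c))) for c in range(len(image[0]))]
            for r in range(len(image))]


def dilate(image):
    """Set a pixel if any pixel in its 3x3 neighbourhood is set."""
    return [[int(any(_window(image, r, c))) for c in range(len(image[0]))]
            for r in range(len(image))]


def opening(image):
    """Erosion followed by dilation: removes specks smaller than the element."""
    return dilate(erode(image))


# Toy image: a 3x3 square plus one isolated noise pixel at row 3, column 5
noisy = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
for row in opening(noisy):
    print(row)
```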

Dataset Details

The samples from the four language classes are consolidated into a single dataset with the following features (the displayed images follow the same sequence):

1. English-MNIST with 60,000 training samples on the 10 digit classes (1, 2, 3, 4, 5, 6, 7, 8, 9, 0)

2. Arabic MADBase with 60,000 training samples represented as…

Insights on Modern Computation

A communal initiative by Meghana Kshirsagar (BDS | Lero | UL, Ireland) and Gauri Vaidya (Intern | BDS). Each concept is accompanied by sample datasets and Python code.
