According to Wikipedia, an Electronic Health Record (EHR) is the systematized collection of patient and population health information stored electronically in a digital format. These records are shared through network-connected information systems or other information networks and exchanges. EHRs may include a range of data: demographics, medical history, medication and allergies, immunization status, laboratory test results, radiology images, vital signs, personal statistics like age and weight, and billing information.

This project is a hypothetical case of a data scientist working with EHR data for patient selection for diabetes, and it is one of my projects from the AI for Healthcare Nanodegree program. This…

Feature scaling in machine learning is the process of bringing features onto a comparable scale, which matters especially for algorithms that compute distances between data points. There are many methods of scaling data, but in this practice I worked with the StandardScaler from scikit-learn.

The standard scaler standardizes a feature by subtracting the mean and then scaling to unit variance, i.e. z = (x - mean) / standard deviation. The result has a mean of 0 and a standard deviation of 1. The variance is also 1, because variance is the standard deviation squared, and 1 squared is 1. If the feature is roughly normally distributed, about 68% of the scaled values will lie between -1 and 1.
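A minimal sketch of the idea with scikit-learn's `StandardScaler`, using a small made-up matrix (the data is hypothetical, not from the original practice). It also applies z = (x - mean) / standard deviation by hand to show that both give the same result; note that `StandardScaler` uses the population standard deviation (ddof=0).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two features on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0],
              [4.0, 800.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # learns mean_ and scale_, then applies (x - mean) / std

# The same computation by hand (population std, ddof=0, as StandardScaler uses)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(scaler.mean_)           # column means: 2.5 and 500
print(X_scaled.mean(axis=0))  # approximately 0 for each column
print(X_scaled.std(axis=0))   # approximately 1 for each column
```

In a real project the scaler would be fitted on the training data only and then reused with `transform` on the test data, so that no information from the test set leaks into the scaling.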

…

The novel coronavirus disease, also known as COVID-19, is caused by SARS-CoV-2, a new strain of coronavirus, and it has ravaged the world as a global pandemic with its rapid spread and high mortality. This has brought together stakeholders around the world, from governments to academia, researchers and scientists, to curb the virus through vaccine development and testing. Currently, a variety of vaccines are in use in different countries around the world, and this analysis seeks to monitor the progress of vaccination.

Tracking the progress of COVID-19 vaccination in Nigeria in comparison to Africa and the world.

Time series analysis is a statistical analysis of data points recorded over time, used to identify patterns such as trends and seasonality. It is done using time series data, i.e. observations collected at successive points across a period of time. In summary, it involves studying how your dependent variable changes with time.

Prophet is an open-source time series forecasting algorithm developed by Facebook. It builds a model by fitting a smooth curve made up of additive components:

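y(t) = g(t) + s(t) + h(t) + ϵ

where:

g(t) = overall growth trend (non-periodic changes)

s(t) = periodic changes such as yearly and weekly seasonality

h(t) = holiday effects

ϵ = error term, covering changes the model does not capture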
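The sketch below is not Prophet itself. It is a simplified, hypothetical illustration of the same additive structure, using plain NumPy least squares on synthetic daily data built from a linear trend g(t), a single weekly Fourier term for s(t), three made-up holiday dates for h(t) and Gaussian noise for ϵ. Prophet uses a piecewise trend with changepoints, richer Fourier seasonality and a Stan-based fit; with the real library you would pass a DataFrame with `ds` (date) and `y` columns to `Prophet().fit(...)`.

```python
import numpy as np

rng = np.random.default_rng(0)

t = np.arange(140, dtype=float)           # 140 days (20 full weeks)
holidays = np.array([30, 65, 100])        # hypothetical holiday day indices
weekly = 2 * np.pi * t / 7                # phase of the weekly cycle
is_holiday = np.isin(t, holidays).astype(float)

# Simulate y(t) = g(t) + s(t) + h(t) + eps
g = 10.0 + 0.5 * t                        # linear growth trend
s = 3.0 * np.sin(weekly)                  # weekly seasonality
h = 8.0 * is_holiday                      # holiday bump
eps = rng.normal(0.0, 0.5, size=t.size)   # noise
y = g + s + h + eps

# Additive design matrix: intercept, trend, weekly Fourier pair, holiday indicator
X = np.column_stack([np.ones_like(t), t, np.sin(weekly), np.cos(weekly), is_holiday])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Estimates of the true values [10, 0.5, 3, 0, 8]
# (intercept, slope, sin term, cos term, holiday effect)
print(np.round(coef, 2))
```

Because the components simply add up, each one can be estimated as a separate column of the design matrix and read off independently, which is what makes Prophet's forecasts easy to decompose into trend, seasonality and holiday plots.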

In this…

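One-hot encoding is a process that converts categorical variables into a numeric form that can be fed to a machine learning algorithm: each category becomes its own binary (0/1) column, so the model does not read an artificial order into the category labels.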

I demonstrated the concept on a dummy dataset that I curated.

In this sample dataset, ‘ascites’, ‘edema’, and ‘stage’ are categorical variables while ‘cholesterol’ is a continuous variable, since it can be any decimal value greater than zero.

In this dataset, I applied one-hot encoding to the edema column because it had three categories. One-hot encoding this column creates a separate feature column for each of the possible outcomes. …
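A minimal pandas sketch of the step described above. The column names follow the text, but the values (including the edema labels "none", "slight" and "severe") are hypothetical, since the original dummy data is not shown. `pd.get_dummies` replaces the `edema` column with one 0/1 column per category.

```python
import pandas as pd

# Hypothetical dummy data using the column names from the text
df = pd.DataFrame({
    "ascites": [0, 1, 0, 1],
    "edema": ["none", "slight", "severe", "none"],
    "stage": [1, 3, 4, 2],
    "cholesterol": [180.5, 261.2, 302.0, 199.9],
})

# One-hot encode only the three-category 'edema' column;
# the original 'edema' column is replaced by edema_none, edema_severe, edema_slight
encoded = pd.get_dummies(df, columns=["edema"], dtype=int)
print(encoded)
```

Each row has exactly one 1 across the new edema columns. For linear models, passing `drop_first=True` drops one of them, since it can be inferred from the other two.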

This is the second phase of my image classification model. The article on how the first phase was done can be found here.

For the second phase, I deployed the model to Streamlit. To make this possible, you need to apply for an invite; once you have received it, you can start your deployment.

First, install Streamlit on your local machine using:

`pip install streamlit`

Then run: `streamlit hello`

This will open a demo web page on your localhost. For more information, you can check out their documentation.

Data wrangling is a preprocessing phase in which raw data is transformed from one form into another, more usable one. Its aim is to make data ready for analytics, and it includes steps such as data collection, cleaning and exploratory data analysis.

In this project, I performed data wrangling using data from Gapminder, a Swedish non-profit organization.

The data folder contained three CSV files, namely cell_phones.csv, population1.csv and ddf--entities--geo--country.csv.

Loading the datasets and visualizing them.
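A minimal pandas sketch of how the three tables could be combined. To keep it self-contained, the files are replaced by small in-memory DataFrames; the column names (`geo`, `time`, `cell_phones`, `population_total`, `country`, `name`) and all numbers are hypothetical toy values, not real Gapminder statistics. In the project itself, each table would come from `pd.read_csv(...)`.

```python
import pandas as pd

# Toy stand-ins for cell_phones.csv, population1.csv and the country entities file
cell_phones = pd.DataFrame({
    "geo": ["nga", "nga", "swe"],
    "time": [2017, 2018, 2018],
    "cell_phones": [100, 120, 30],
})
population = pd.DataFrame({
    "geo": ["nga", "nga", "swe", "swe"],
    "time": [2017, 2018, 2017, 2018],
    "population_total": [200, 200, 10, 10],
})
countries = pd.DataFrame({
    "country": ["nga", "swe"],
    "name": ["Nigeria", "Sweden"],
})

# 1. Join phones and population on country code and year
#    (an inner join keeps only country-years present in both tables)
df = cell_phones.merge(population, on=["geo", "time"], how="inner")

# 2. Attach readable country names, then drop the duplicate key column
df = df.merge(countries, left_on="geo", right_on="country", how="left").drop(columns="country")

# 3. Derive a feature for analysis
df["phones_per_person"] = df["cell_phones"] / df["population_total"]

print(df.head())
```

The inner join silently drops Sweden's 2017 row because it has no phone count, so it is worth checking row counts before and after each merge.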

Principal Component Analysis (PCA) is a method employed to reduce the dimensionality of a data set consisting of a large number of interrelated variables, while retaining as much as possible of the variation present in the data set. This is achieved by transforming to a new set of variables, the principal components (PCs), which are uncorrelated, and which are ordered so that the first few retain most of the variation present in all of the original variables. Source.

To demonstrate this, I used a dataset from Kaggle and focused only on the color variable. …
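A minimal scikit-learn sketch of the definition above, on synthetic data rather than the Kaggle dataset (the data is hypothetical): three variables driven by one shared latent factor are standardized and projected onto two principal components. The explained-variance ratios come out in decreasing order, and the component scores are uncorrelated.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical data: three strongly interrelated variables driven by one latent factor
latent = rng.normal(size=(200, 1))
X = np.hstack([latent + 0.1 * rng.normal(size=(200, 1)) for _ in range(3)])

# Standardize so each variable contributes on the same scale, then keep 2 components
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
pcs = pca.fit_transform(X_std)

print(pca.explained_variance_ratio_)  # first component carries almost all of the variance
print(np.corrcoef(pcs[:, 0], pcs[:, 1])[0, 1])  # correlation between PCs, essentially 0
```

Because the three variables move together, one component is enough to retain almost all of their variation, which is exactly the dimensionality reduction PCA is designed for.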

Performance metrics are used to measure how well machine learning models perform. They are applied to two main kinds of models: classification models and regression models.

I demonstrated the use of performance metrics on a regression model using the scikit-learn library. The metrics used in this project are mean absolute error (MAE), mean squared error (MSE) and root mean squared error (RMSE).
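A minimal sketch of the three metrics using scikit-learn on a handful of toy predictions (the numbers are hypothetical, not from the housing dataset). RMSE is taken as the square root of MSE by hand; newer scikit-learn releases (1.4+) also provide `root_mean_squared_error`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Toy true values and model predictions
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

mae = mean_absolute_error(y_true, y_pred)  # mean of |errors|: (0.5 + 0 + 0.5 + 1) / 4
mse = mean_squared_error(y_true, y_pred)   # mean of squared errors: (0.25 + 0 + 0.25 + 1) / 4
rmse = np.sqrt(mse)                        # back in the units of the target

print(mae)   # 0.5
print(mse)   # 0.375
print(rmse)  # about 0.612
```

MSE and RMSE penalize the single error of 1.0 more heavily than MAE does, which is why RMSE comes out larger than MAE here.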

The data used for this project is the USA housing dataset. After importing the required libraries, I loaded the dataset and displayed the first five rows using the head() function. …

Transfer learning is the reuse of a model pre-trained on one task as the starting point for a different but related task. It is common across many machine learning applications, especially when the dataset is relatively small.

In this project, I built an image classification model from scratch using transfer learning. By "from scratch", I mean that the dataset was my own custom dataset, which I scraped using the IDT tool with my own custom classes. You can read about this tool here.

Most people tend to have issues classifying bags and carry-ons. …

Data Scientist || Machine Learning enthusiast and hobbyist