Time series analysis is a statistical technique that deals with trend analysis. It is performed on time series data, i.e. data collected over a period of time. In short, it examines the correlation between your dependent variable and time.
Facebook Prophet is an open-source time series forecasting algorithm designed by Facebook. It builds a model by finding the best smooth line, represented by:
y(t) = g(t) + s(t) + h(t) + ϵ
g(t) = overall growth trend
s(t) = seasonality (yearly and weekly)
h(t) = holiday effects
ϵ = error term
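The additive structure can be illustrated numerically. This is only a sketch of the decomposition, not the Prophet library itself; the component shapes below are made up for illustration:

```python
import numpy as np

# Hypothetical illustration of Prophet's additive model:
# y(t) = g(t) + s(t) + h(t) + eps
t = np.arange(365, dtype=float)                 # one year of daily observations

g = 10 + 0.05 * t                               # linear growth trend
s = 2 * np.sin(2 * np.pi * t / 7)               # weekly seasonality
h = np.where(t % 100 == 0, 5.0, 0.0)            # spikes on made-up "holidays"
eps = np.random.default_rng(0).normal(0, 0.5, t.size)  # error term

y = g + s + h + eps                             # the observed series
```

Prophet fits each component from the data; here they are hand-written only to show how the pieces add up.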
One-hot encoding is a process where categorical variables are converted into a binary-vector form that can be fed to a machine learning algorithm, often improving prediction accuracy.
I demonstrated the concept on a dummy dataset I curated.
In this sample dataset, ‘ascites’, ‘edema’, and ‘stage’ are categorical variables while ‘cholesterol’ is a continuous variable, since it can be any decimal value greater than zero.
In this dataset, I applied one-hot encoding to the edema column because it had three categories. One-hot encoding this column creates a feature column for each possible outcome. …
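As a sketch of that expansion, pandas' get_dummies does the work; the edema labels below are hypothetical stand-ins, not necessarily the dataset's actual category values:

```python
import pandas as pd

# Dummy frame mimicking the dataset; the edema labels are made up.
df = pd.DataFrame({"edema": ["N", "S", "Y", "N"],
                   "cholesterol": [180.5, 221.0, 175.3, 198.2]})

# One new column per category; each row gets a 1 in exactly one of them.
encoded = pd.get_dummies(df, columns=["edema"], prefix="edema")
print(encoded.columns.tolist())
# → ['cholesterol', 'edema_N', 'edema_S', 'edema_Y']
```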
This is the second phase of my image classification model. The article on how it was done can be found here.
For the second phase, I deployed the model to Streamlit. To make this possible, you will need to apply for an invite; once you have received it, you can start your deployment.
First, you will need to install Streamlit on your local machine using
pip install streamlit
Then run:
streamlit hello
This will open up the web page on your localhost. For more information, you can check out their documentation.
Data wrangling is a preprocessing phase where data is transformed from one form to another. The aim of this phase is to make the data ready for analytics, and it includes data collection, exploratory data analysis, etc.
In this project, I performed data wrangling using data from Gapminder, a Swedish non-profit organization.
This data folder contained three CSV files, namely: cell_phones.csv, population1.csv and ddf--entities--geo--country.csv.
Loading the datasets and visualizing them.
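Loading can be sketched with pandas. The file names come from the folder above, but the inline sample here is a made-up stand-in so the sketch is self-contained; the real column names may differ:

```python
import io
import pandas as pd

# In the project the files are read from disk, e.g.:
#   cell_phones = pd.read_csv("cell_phones.csv")
# Inline stand-in data (hypothetical rows and columns):
cell_phones_csv = io.StringIO(
    "geo,time,cell_phones_total\n"
    "usa,2010,285600000\n"
    "swe,2010,10650000\n"
)
cell_phones = pd.read_csv(cell_phones_csv)

print(cell_phones.head())   # quick visual check of the first rows
```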
Principal Component Analysis (PCA) is a method employed to reduce the dimensionality of a data set consisting of a large number of interrelated variables, while retaining as much as possible of the variation present in the data set. This is achieved by transforming to a new set of variables, the principal components (PCs), which are uncorrelated, and which are ordered so that the first few retain most of the variation present in all of the original variables. (Source.)
To demonstrate this, I used a dataset from Kaggle. I focused on only the color variable to achieve the above. …
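As a generic sketch of PCA with scikit-learn (the data below is synthetic, not the Kaggle dataset):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic correlated data: 3 features, the third nearly a mix of the first two.
rng = np.random.default_rng(42)
base = rng.normal(size=(200, 2))
third = base @ np.array([0.7, 0.3]) + rng.normal(scale=0.05, size=200)
X = np.column_stack([base, third])

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)   # project onto the top 2 uncorrelated PCs

# Because the third feature is nearly redundant, two components
# retain almost all of the variance.
print(pca.explained_variance_ratio_.sum())
```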
Performance metrics are used to measure the performance of machine learning models. There are two different applications of performance metrics: in classification models and in regression models.
I demonstrated the use of performance metrics in a regression model using the Scikit-learn library; the metrics used in this project are mean absolute error, mean squared error and root mean squared error.
The data used for this project is the USA housing dataset. After I imported the required libraries, I loaded the dataset and visualized the first five rows using the head() function. …
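The three metrics can be computed as in this sketch; the target values here are toy numbers, not the housing data:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # made-up targets
y_pred = np.array([2.5, 5.0, 4.0, 8.0])   # made-up predictions

mae = mean_absolute_error(y_true, y_pred)   # average |error|
mse = mean_squared_error(y_true, y_pred)    # average squared error
rmse = np.sqrt(mse)                         # back in the target's units

print(mae, mse, rmse)
# → 0.75 0.875 0.9354143466934853
```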
Transfer learning is repurposing a pre-trained model for another, similar task. This method is seen in various machine learning applications, especially in situations where the dataset is relatively small.
In this project, I built an image classification model from scratch using transfer learning. By "from scratch" I mean the dataset was my own custom dataset, which I scraped using the IDT tool with my own custom classes. You can read about this tool here.
Most people tend to have issues classifying bags and carry-ons. …
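A typical transfer-learning setup in Keras looks roughly like this sketch; the backbone, input size and class count here are assumptions for illustration, not the project's exact configuration:

```python
import tensorflow as tf

# Assumed: a pretrained ImageNet backbone reused for a 2-class task
# (e.g. bag vs. carry-on). weights=None keeps the sketch offline;
# in practice you would pass weights="imagenet".
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights=None)
base.trainable = False   # freeze the pretrained feature extractor

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),  # new task-specific head
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

Only the small head is trained; the frozen base supplies general image features, which is what makes small custom datasets workable.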
Leveraging Long Short-Term Memory (LSTM) to generate novel jazz solos.
The dataset is a corpus of jazz music, with generation using 78 values; here, values can be thought of as musical notes.
This model was designed to learn musical patterns, so the LSTM was set up with a 64-dimensional hidden state. Input and output layers were then created for the network.
Next was to create a djmodel() function which:
a. creates a custom Lambda layer.
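The core recurrence can be sketched in plain NumPy: a single LSTM cell step with a 64-dimensional hidden state. The weights and input below are illustrative placeholders, not the project's trained model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_a, n_x = 64, 78          # hidden-state size; 78 note "values" as one-hot input
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * n_a, n_a + n_x))  # stacked gate weights
b = np.zeros(4 * n_a)

def lstm_step(x_t, a_prev, c_prev):
    """One LSTM step: gates decide what to forget, write, and output."""
    z = W @ np.concatenate([a_prev, x_t]) + b
    f = sigmoid(z[:n_a])                 # forget gate
    i = sigmoid(z[n_a:2 * n_a])          # input gate
    o = sigmoid(z[2 * n_a:3 * n_a])      # output gate
    c_tilde = np.tanh(z[3 * n_a:])       # candidate cell state
    c_t = f * c_prev + i * c_tilde       # new cell state
    a_t = o * np.tanh(c_t)               # new hidden state
    return a_t, c_t

x = np.zeros(n_x); x[10] = 1.0           # a one-hot "note"
a, c = lstm_step(x, np.zeros(n_a), np.zeros(n_a))
```

During generation, the model samples a note from the output at each step and feeds it back in as the next input.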
This project was inspired by a problem statement on Upwork, though not this exact task. An advert was placed for a data scientist who could analyze the interconnections between connections in one's network. It gave me the idea to analyze my own LinkedIn growth; I searched online and found this repo by Guillaume Chevalier.
I navigated to my LinkedIn profile, went to Settings and then Privacy to request my data. It took 13 minutes before I was notified that it was ready. The file I got was in .txt format, so I had to use an online extractor for it, probably because I requested more than just my connections dataset.
Using the COVID-19 dataset from Kaggle, I built sentiment analysis models using logistic regression and an LSTM.
I imported the needed libraries to make the preprocessing possible.
I loaded the CSV file and read it with latin1 encoding, since UTF-8 (or reading without specifying an encoding) returns errors. I used the .info() function to get information about the dataset.
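The logistic-regression half can be sketched as follows. The tiny inline tweets and labels are made up; the real project instead reads the Kaggle CSV, e.g. with pd.read_csv(path, encoding="latin1"):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Made-up stand-in tweets; the real data comes from the Kaggle CSV.
texts = ["masks keep us safe", "lockdown again, terrible news",
         "grateful for the vaccine", "this pandemic is awful",
         "great work by health workers", "horrible week, cases rising"]
labels = [1, 0, 1, 0, 1, 0]   # 1 = positive, 0 = negative

vec = TfidfVectorizer()
X = vec.fit_transform(texts)          # TF-IDF-weighted bag-of-words features

clf = LogisticRegression().fit(X, labels)
print(clf.predict(vec.transform(["awful news about cases"])))
```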
Data Scientist || Machine Learning enthusiast and hobbyist