Starbucks Rewards Analysis

As a way of promoting sales and rewarding customer loyalty, Starbucks sends its customers offers in three forms: BOGO (buy one, get one), discount, and informational. Each offer has a difficulty and a validity period.

Their goal is to increase sales by encouraging customers to purchase through these offers, while reducing the economic loss from rewarding customers who were already about to make a purchase.

Problem Statement:

Given simulated data, the goal is to find the demographics with the best outcome, that is, the customers most likely to complete an offer once it is sent.


This project is purely analytical; its results answer the following questions:

  • What age demographics complete their offers?
  • What gender demographics complete their offers?
  • What income demographics complete their offers?

Data Source:

The data was sourced from the Starbucks data simulation available in the Udacity Data Science Nanodegree program.

Data Exploration:

The data files were explored to view their contents.

This is the table from portfolio.json showing 10 different offer IDs. Each offer has a reward value between 0 and 10 and a difficulty between 0 and 20. Each offer also has a specific channel bundle associated with it, which is ignored in this study because it is out of scope.

Data Visualization:

Since our focus is on demographics, I visualized the age distribution by gender to establish how customers are distributed across age and gender. The visualization shows that the age distributions are similar across genders, but that there are more male customers.


Data preprocessing:

The preprocessing stage included cleaning the data by removing the outliers present in the age demographic, dropping columns that are not useful to the analysis, renaming columns, and dropping the duplicates that resulted from dummy data creation. Grouping and data transformation were done on the transcript file.
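The cleaning steps described above can be sketched roughly as follows. This is a dependency-free illustration, not the code from the project: the sample rows, the field names (`id`, `age`, `gender`, `income`), the renamed column `customer_id`, and the age cutoff of 100 are all assumptions made for the example.

```python
# Hypothetical profile rows; field names and values are assumptions for illustration.
profiles = [
    {"id": "a1", "age": 34, "gender": "F", "income": 62000},
    {"id": "b2", "age": 118, "gender": None, "income": None},  # implausible age
    {"id": "c3", "age": 57, "gender": "M", "income": 48000},
    {"id": "c3", "age": 57, "gender": "M", "income": 48000},   # exact duplicate
]

MAX_PLAUSIBLE_AGE = 100  # assumed cutoff, not taken from the article


def clean_profiles(rows, max_age=MAX_PLAUSIBLE_AGE):
    """Drop age outliers, rename 'id' to 'customer_id', and remove duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        if row["age"] > max_age:
            continue  # outlier removal on the age column
        # Rename the 'id' column; every other column keeps its name.
        renamed = {("customer_id" if key == "id" else key): value
                   for key, value in row.items()}
        fingerprint = tuple(sorted(renamed.items()))
        if fingerprint in seen:
            continue  # duplicate of a row already kept
        seen.add(fingerprint)
        cleaned.append(renamed)
    return cleaned


cleaned = clean_profiles(profiles)
for row in cleaned:
    print(row)
```

In a notebook, the same steps would more likely be expressed with a dataframe library; the plain-Python version is used here only to keep the example self-contained.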

Post-outlier removal


This project was implemented in Python 3.6.2 on Google Colaboratory, using the following data files:

  • Portfolio.json: The offers and their attributes (reward, difficulty, channels).
  • Transcript.json: Simulated transaction data.
  • Profile.json: A database of customers.


After the preprocessing and implementation stage, I started the analysis stage.

This phase started by checking the overall performance of offers that were sent out and ended up being completed.

I checked the distribution of these offers and found out that it was balanced.

Then I compared the offers by type and by how often they were viewed, and found that BOGO offers are viewed ~20% more than the others.

This was surprising, because the number of completed offers was slightly higher for discount offers than for BOGO offers, which raised the question: was it intentional?

To understand this phenomenon, I evaluated the timeline of each received offer by calculating when each offer began and ended.

From these numbers, ~1/3 of the completed offers were not viewed by the user in time, which means that Starbucks sent offers to users who were already going to make the purchases.
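One way to make the timeline check concrete is sketched below. It assumes each received offer has been reduced to a single record with the hour it was received, its validity in hours, and the hours at which it was viewed and completed (or `None`). The field names and the sample records are hypothetical, and "viewed in time" is taken here to mean viewed after receipt and no later than the completion.

```python
# Hypothetical per-offer records; times are in hours since the start of the simulation.
records = [
    {"received": 0,   "duration_hours": 168, "viewed": 6,    "completed": 30},    # viewed, then completed
    {"received": 0,   "duration_hours": 120, "viewed": None, "completed": 48},    # completed, never viewed
    {"received": 168, "duration_hours": 240, "viewed": 300,  "completed": 200},   # viewed only after completing
    {"received": 0,   "duration_hours": 72,  "viewed": 10,   "completed": None},  # viewed, never completed
]


def offer_end(record):
    """End of the validity window: time received plus the validity period."""
    return record["received"] + record["duration_hours"]


def completed_in_window(record):
    """True if the offer was completed between its beginning and its end."""
    done = record["completed"]
    return done is not None and record["received"] <= done <= offer_end(record)


def viewed_in_time(record):
    """True if the offer was viewed after receipt and no later than its completion."""
    seen = record["viewed"]
    return (completed_in_window(record)
            and seen is not None
            and record["received"] <= seen <= record["completed"])


completed = [r for r in records if completed_in_window(r)]
not_viewed_in_time = [r for r in completed if not viewed_in_time(r)]
share = len(not_viewed_in_time) / len(completed)
print(f"{len(not_viewed_in_time)} of {len(completed)} completed offers were not viewed in time")
```

The share computed from these made-up records has no bearing on the ~1/3 figure reported above; the sketch only shows how such a figure could be derived.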

Demography by gender:

The total number of completed offers is higher for male customers, but when normalized by the number of customers in the database, the numbers are higher for female and other (‘O’) customers.
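The effect of normalizing can be shown with a small sketch. The counts below are invented purely for illustration (they are not the figures from the dataset), and only two gender labels are included to keep the example short.

```python
from collections import Counter

# Invented counts for illustration only: one gender label per customer
# and one gender label per completed offer.
customer_genders = ["M"] * 6 + ["F"] * 4
completion_genders = ["M"] * 9 + ["F"] * 8

customers = Counter(customer_genders)
completions = Counter(completion_genders)

# Completed offers per customer: divide each group's total by the group's size.
per_customer = {gender: completions[gender] / customers[gender] for gender in customers}

for gender in customers:
    print(gender, "total:", completions[gender], "per customer:", per_customer[gender])
```

Here the larger group has more completions in total, while the smaller group completes more offers per customer, which is the pattern the normalization is meant to expose.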

Demography by age:

The distribution shows the total number of completed offers, and it points to the 50–60 age group as having the most completed offers.
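A minimal sketch of how completed offers could be grouped into age brackets is shown below. The ten-year bracket width, the bracket labels, and the sample ages are assumptions for the example; the article's own binning may differ.

```python
from collections import Counter


def age_bracket(age, width=10):
    """Map an age to a label such as '50-59' (the bracket width is an assumption)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Invented ages, one per completed offer, for illustration only.
completed_offer_ages = [23, 51, 58, 55, 67, 34, 52]

counts = Counter(age_bracket(age) for age in completed_offer_ages)
top_bracket, top_count = counts.most_common(1)[0]
print(top_bracket, top_count)
```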

Demography by income:

Model Evaluation:

There was no model trained in this analysis, so nothing to evaluate.


I decided to analyze the data rather than train a model because analysis gives more insight into the customer demographics. Using past patterns of reward completion, the company can tailor each reward to the demographics that redeem it the most.


  • The other gender, labelled ‘O’, has incomplete data, which makes its analysis inconclusive.
  • There is a higher number of completed offers for income levels between $50k and $70k, especially for female customers.
  • For income levels below $50k, male customers make up more of the completed offers than female customers.


There is room for improvement on this project, especially data-wise. The incomplete data for the ‘O’ gender has a significant impact on the overall analysis: these customers are part of the customer database, but their behavior cannot be taken into account when their results are inconclusive.


Thank you for reading. This is the repo to this project and you can connect with me on LinkedIn.



Nwosu Rosemary


Data Scientist || Machine Learning enthusiast and hobbyist