Starbucks Rewards Analysis
To promote sales and reward customer loyalty, Starbucks sends its customers offers in three forms: BOGO (buy one, get one), discount, and informational. Each offer has a difficulty and a validity period.
The goal is to increase sales by encouraging customers to purchase merchandise through these offers, while reducing the economic loss from rewarding customers who were already about to make a purchase.
We are given simulated data and use it to find the target demographics with the best outcome, which here means completing the offer after it is sent.
This project is based on analysis alone, and its results answer the following questions:
- What age demographics complete their offers?
- What gender demographics complete their offers?
- What income demographics complete their offers?
The data was sourced from the Starbucks data simulation available in the Udacity Data Science Nanodegree program.
The data files were explored to view their contents.
This is the table from portfolio.json, which shows 10 different offer IDs. Each offer has a reward value between 0 and 10 and a difficulty between 0 and 20. Each offer also has a specific channel bundle associated with it, which is ignored in this study as it is out of scope.
Since our focus is on demographics, I chose the age demographic and visualized it by gender to establish how customers are distributed across age and gender. This visualization shows that the age distributions are similar across genders, but that there are more male customers.
The data preprocessing stage involved removing outliers in the age demographic, dropping columns that were not useful to the analysis, renaming columns, dropping duplicates that resulted from dummy data creation, and grouping and transforming the data in the transcript file.
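As a rough illustration of two of these cleaning steps (removing age outliers and dropping duplicates), here is a minimal sketch in plain Python. The field names, the sample records, and the age cutoff of 100 are all assumptions made for this example; they are not taken from the project's notebook.

```python
# Hypothetical customer records; field names and values are illustrative only.
profiles = [
    {"id": "a", "gender": "F", "age": 55, "income": 72000},
    {"id": "b", "gender": "M", "age": 118, "income": None},  # implausible age
    {"id": "c", "gender": "M", "age": 34, "income": 48000},
    {"id": "c", "gender": "M", "age": 34, "income": 48000},  # duplicate row
]

MAX_PLAUSIBLE_AGE = 100  # assumed cutoff, not taken from the original analysis


def clean_profiles(rows, max_age=MAX_PLAUSIBLE_AGE):
    """Drop age outliers and duplicate customers, keeping first occurrences."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        if row["age"] > max_age:
            continue  # treat implausibly high ages as outliers
        if row["id"] in seen_ids:
            continue  # skip repeated customer ids
        seen_ids.add(row["id"])
        cleaned.append(row)
    return cleaned


cleaned = clean_profiles(profiles)
print([row["id"] for row in cleaned])
```

With the sample records above, only customers "a" and "c" remain after cleaning.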
This project was implemented using Python 3.6.2 (Google Colaboratory) with the following files:
- Transcript.json: simulated transaction data.
- Profile.json: a database of customers.
After the preprocessing and implementation stage, I started the analysis stage.
This phase began with checking the overall performance of the offers: how many of those sent out ended up being completed.
I checked the distribution of these offers and found that it was balanced.
Then I checked the offers by type, reward, and views, and found that BOGO offers are viewed ~20% more than the others.
This was surprising, as the number of completed offers for the discount type was slightly higher than for BOGO, which raised the question: was it intentional?
To understand this phenomenon, I evaluated the timeline of each received offer by calculating when it began and ended.
From these numbers, ~1/3 of the completed offers were not viewed by the user in time, which means that Starbucks made offers to users who were already going to make the purchases.
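The timeline check can be sketched as follows. This is only one possible way to express the idea, assuming every event time is measured in hours and that an offer's window runs from the time it is received until the received time plus its duration. The function name `classify_completion` and the sample offers are hypothetical.

```python
def classify_completion(received_at, duration_hours, viewed_at, completed_at):
    """Classify one received offer from its event times (all in hours).

    viewed_at and completed_at are None when the event never happened.
    """
    window_end = received_at + duration_hours
    # An offer only counts as completed if completion falls inside its window.
    if completed_at is None or not (received_at <= completed_at <= window_end):
        return "not completed"
    # The view must happen after receipt and no later than the completion.
    if viewed_at is not None and received_at <= viewed_at <= completed_at:
        return "completed after viewing"
    return "completed without viewing"


# (received_at, duration_hours, viewed_at, completed_at): made-up examples
offers = [
    (0, 168, 6, 30),     # viewed, then completed
    (0, 120, 90, 40),    # completed before it was viewed
    (0, 72, None, 50),   # completed, never viewed
    (0, 96, 10, None),   # viewed, never completed
]
labels = [classify_completion(*offer) for offer in offers]
print(labels)
```

Offers labelled "completed without viewing" are the ones where the customer would likely have made the purchase anyway.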
Demography by gender:
The total number of completed offers is higher for males, but when normalized by the number of customers in the database, completed offers per customer are higher for female customers and customers of the other gender.
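The normalization step amounts to dividing each gender's completed-offer count by its number of customers. A minimal sketch, using made-up counts that are not the project's actual figures:

```python
from collections import Counter

# Illustrative data only: one gender label per customer in the database,
# and one gender label per completed offer.
customer_genders = ["M"] * 5 + ["F"] * 2 + ["O"]
completed_offer_genders = ["M"] * 4 + ["F"] * 3 + ["O"]

customers_per_gender = Counter(customer_genders)
completed_per_gender = Counter(completed_offer_genders)

# Completed offers per customer, so groups of different sizes are comparable.
completed_per_customer = {
    gender: completed_per_gender[gender] / customers_per_gender[gender]
    for gender in customers_per_gender
}
print(completed_per_customer)
```

In this toy data, males have the most completed offers in total (4), yet the lowest number of completed offers per customer (0.8, against 1.5 for females and 1.0 for the other gender).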
Demography by age:
The distribution shows the total number of completed offers by age, and the 50–60 age group has the most completed offers.
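Grouping ages into ten-year bands and counting completed offers per band could look like the sketch below. The band width, the helper name `age_band`, and the sample ages are assumptions for illustration, and each band includes its lower bound but not its upper bound.

```python
from collections import Counter


def age_band(age, width=10):
    """Return a label such as '50-60' for the band [50, 60)."""
    low = (age // width) * width
    return f"{low}-{low + width}"


# Made-up ages, one per completed offer.
ages_of_completed_offers = [23, 51, 58, 59, 64, 37, 55]

completed_per_band = Counter(age_band(age) for age in ages_of_completed_offers)
top_band, top_count = completed_per_band.most_common(1)[0]
print(top_band, top_count)
```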
Demography by income:
No model was trained in this analysis, so there is nothing to evaluate.
I decided to analyze the data rather than train a model because analysis gives more insight into customer demographics: past patterns of reward completion can help the company tailor each reward to the demographics that complete it most often.
- The other gender, labelled ‘O’, has incomplete data, which makes its analysis inconclusive.
- There is a higher number of completed offers for income levels between $50k and $70k, especially for female customers.
- For income levels below $50,000, male customers account for more of the completed offers than female customers.
There is room for improvement on this project, especially data-wise. The incomplete data for the ‘O’ gender has a significant impact on the overall analysis: these customers make up part of the customer database, yet their impact cannot be taken into account when their results are inconclusive.