Bellabeat Case Study | Google Data Analytics Capstone
Summary
This is the capstone project of the Google Data Analytics Certificate. The consumption patterns of 33 Fitbit users were analyzed over 30 days to find trends that will help plan the next marketing campaign for Bellabeat products.
Date
-
Category
Data Analytics
Tools
SQL, Excel, Power BI
The Sky is The Limit
In 2013, Urška Sršen and Sandro Mur founded Bellabeat, a high-tech company that manufactures health-focused smart products for women.
By 2016, Bellabeat had opened offices around the world and launched multiple products that track activity, sleep, stress, meditation and reproductive health.
As of 2024, the company has more than 10 million users and more than 1 million synchronized devices, and it shows no sign of slowing down.
The Opportunity
Time is a hybrid watch that tracks menstrual cycle, meditation, and hydration. It gives insights into your health and lifestyle with guidance on how to improve them.
Sršen wants to add new in-app features to be used alongside Time. She has asked the analytics team to look for trends in Fitbit usage data to gain insights into how people already use their smart devices, and then offer high-level recommendations for a better marketing strategy and in-app features.
Problem
- We want to improve the features of Time compared to similar products, thereby improving the marketing of the product.
- It is unclear which in-app features would be most useful for Bellabeat users.
Solution
- Conduct an analysis to understand how smartwatch users are already using their devices.
- Provide in-app feature recommendations based on the lifestyles of each type of smartwatch user.
Workflow
The development of this project was divided into 6 phases:
Ask
The business task was defined in this phase, along with a series of guiding questions for analyzing the data and delivering a report to stakeholders.
Guiding Questions
- What are some trends in smart device usage?
- On average, how long do people exercise?
- On average, what is a person’s caloric consumption throughout the day?
- Is it possible to segment the data population according to caloric consumption?
- How could these trends apply to Bellabeat customers?
- How could these trends help influence Bellabeat marketing strategy?
- Can I suggest new in-app features?
- Can I improve/modify the way a specific product is advertised?
Business Task
Analyze the Fitbit data to find patterns in how consumers already use smart devices. Then apply these insights to give high-level recommendations for Bellabeat’s marketing strategy for Time (Bellabeat’s smartwatch).
Prepare
Data sources were defined, their credibility was confirmed, and issues with bias, privacy, and accessibility were identified.
How was the data collected?
These datasets were generated by respondents to a survey distributed via Amazon Mechanical Turk between 12/Apr/2016 and 12/May/2016. Thirty-three eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring.
How is the data organized?
It is organized in long and wide format. However, I’ll be using only the long format for data consistency.
It includes data about the users’ calories, heart rate, intensity, sleep, steps, weight, and daily activity, in CSV format.
Metadata for this data set can be found in Fitabase.
Are there issues with bias or credibility in this data?
The datasets raise no credibility concerns, since they were generated by respondents to a survey distributed via Amazon Mechanical Turk. However, there may be bias: we don’t know the age or gender of the participants, so the dataset might not be a representative sample of the average Bellabeat user.
How am I addressing licensing, privacy, security, and accessibility?
This is open data, distributed under a CC0 licence (public domain), as shown on the Kaggle website (Licence section).
The participants don’t share personal information (name, age, gender, address, etc.), so the data is anonymous.
Process
The working tools were defined:
- Data sets were organized and cleaned (using SQL and Excel).
- Graphs were made in Power BI.
Complete Process
Before starting the analysis, I had to check the integrity and consistency of the data. I looked for duplicate values, null values, and outliers, and made sure the dates were in the same format. To do this, I used SQL (via BigQuery) and Excel.
Below is the general structure of the commands used for each task.
Null Values
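-- Count NULLs per column; column_1 ... column_n are placeholders
SELECT
  COUNT(*) AS total_rows,
  COUNT(CASE WHEN column_1 IS NULL THEN 1 END) AS null_column_1,
  COUNT(CASE WHEN column_2 IS NULL THEN 1 END) AS null_column_2,
  ...
  COUNT(CASE WHEN column_n IS NULL THEN 1 END) AS null_column_n
FROM
  -- BigQuery table paths are quoted with backticks, not double quotes
  `bellabeat-case-study-gdac.FitBit_Data.my-table`;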
Duplicate Values
-- Rows that appear more than once for the same Id and column values
SELECT
  Id,
  column_1,
  ...
  COUNT(*) AS duplicate_count
FROM
  `bellabeat-case-study-gdac.FitBit_Data.my-table`
GROUP BY
  Id, column_1, ...
HAVING COUNT(*) > 1;
Outliers
-- Search for field_1 values outside a reasonable range
-- (min_value and max_value are placeholders for the chosen bounds)
SELECT
  *
FROM
  `bellabeat-case-study-gdac.FitBit_Data.my-table`
WHERE
  field_1 < min_value OR field_1 > max_value;
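The date-format check mentioned earlier has no command listed. As a rough illustration only, the sketch below shows one way to bring date strings into a single format in Python. The candidate formats in KNOWN_FORMATS and the function name normalize_date are assumptions made for this example, not the steps actually used in the project.

```python
from datetime import datetime

# Assumed input formats (hypothetical; adjust to what the CSV files contain).
KNOWN_FORMATS = ("%m/%d/%Y %I:%M:%S %p", "%m/%d/%Y", "%Y-%m-%d")


def normalize_date(raw: str) -> str:
    """Return the date part of `raw` as an ISO 8601 string (YYYY-MM-DD)."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # not this format, try the next candidate
    raise ValueError(f"Unrecognized date format: {raw!r}")


if __name__ == "__main__":
    for sample in ("4/12/2016", "4/12/2016 12:00:00 AM", "2016-04-12"):
        print(sample, "->", normalize_date(sample))
```

Values that match none of the assumed formats raise an error instead of being silently guessed, so inconsistent rows surface during cleaning.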
After the data cleaning process, I made a basic data model (in Power BI) to connect all the tables through the primary key Id, which will allow me to perform the analysis in the next phase.
Limitations
There are multiple limitations with the dataset that must be considered before taking this analysis as useful:
- It is not known how many of the people in the study are male or female. The analysis may be biased towards a specific gender.
- Since Bellabeat products are for women, the dataset may not be a representative sample of the average Bellabeat user.
- Given the low number of participants (33) used to represent the total population (more than 1,000,000 users) and the unknown number of women in the data, the margin of error of the analysis is up to 22.46%.
The margin of error was calculated for a sample size of 33 at a 99% confidence level, with a population proportion of 50% (since a Fitbit user can be male or female) and a population size of 1,000,000 (the population of users with Bellabeat smart devices).
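The 22.46% figure can be reproduced with the standard margin-of-error formula for a proportion, including a finite population correction. The sketch below is an assumption about how the number was obtained: it uses z = 2.58 as the rounded critical value for 99% confidence, which is the value that matches the reported result. The function name margin_of_error is illustrative.

```python
import math


def margin_of_error(n: int, population: int, z: float = 2.58, p: float = 0.5) -> float:
    """Margin of error for a proportion, with finite population correction.

    z = 2.58 is an assumed (rounded) critical value for 99% confidence.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    # The correction is close to 1 when the sample is tiny next to the population.
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc


if __name__ == "__main__":
    moe = margin_of_error(n=33, population=1_000_000)
    print(f"{moe:.2%}")  # 22.46%
```

Because the sample is so small relative to the population, the result is driven almost entirely by the sample size of 33. A more precise critical value (about 2.576) gives a marginally smaller figure.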
Analyze
The data was analyzed using Power BI. Trends were identified in 3 groups of people: Athletes, Fitness Enthusiasts and Sedentary People.
This is explained in detail in the following section, “Insights”.
Share
Bellabeat in-app feature recommendations were made based on the data findings.
This is explained in detail in the following section, “Data-driven Recommendations”.
Insights
The users in the dataset were divided into 3 groups based on their daily physical activity: Athletes, Fitness Enthusiasts, and Sedentary People.
Grouping Process
Since there is no way to tell whether a user is a man or a woman, we cannot segment the users by calories burned throughout the day (for example, a woman who exercises occasionally could be mistaken for a sedentary man due to differences in energy expenditure).
Because of this, I decided to use the number of minutes per day in which the user was very active (according to Fitbit parameters). The groups are defined as follows:
- Athlete: Over 30 very active minutes a day.
- Fitness Enthusiast: Between 5 and 30 very active minutes a day.
- Sedentary Person: Less than 5 very active minutes a day.
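The three thresholds above can be written as a small classification function. This is a minimal sketch: the function name classify_user is hypothetical, and it assumes the boundary values of exactly 5 and 30 minutes belong to the Fitness Enthusiast group, which the list does not state explicitly.

```python
def classify_user(very_active_minutes: float) -> str:
    """Assign a user to a group from their daily very active minutes.

    Assumption: exactly 5 and exactly 30 minutes count as Fitness Enthusiast.
    """
    if very_active_minutes < 0:
        raise ValueError("very_active_minutes cannot be negative")
    if very_active_minutes > 30:
        return "Athlete"
    if very_active_minutes >= 5:
        return "Fitness Enthusiast"
    return "Sedentary Person"


if __name__ == "__main__":
    for minutes in (0, 5, 18, 30, 45):
        print(minutes, "->", classify_user(minutes))
```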
After grouping users, I decided to define their daily routine using percentiles. For example, since the average athlete sleeps 7 hours, they are asleep for about 29% of the day (7 / 24). Because of this, the 29th percentile was used to define the athlete’s routine.
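The percentile step can be sketched as follows. The text does not say which variable the percentile was applied to, so this example assumes hourly step averages and a nearest-rank percentile. The hourly_steps values are invented purely for illustration and the function names are hypothetical.

```python
import math


def sleep_percentile(avg_sleep_hours: float) -> int:
    """Share of a 24-hour day spent asleep, as a whole-number percentile."""
    return round(avg_sleep_hours / 24 * 100)


def nearest_rank_percentile(values, percentile):
    """Smallest value with at least `percentile` percent of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical average steps per hour (index 0 = 12:00 AM ... index 23 = 11:00 PM).
hourly_steps = [0, 0, 0, 0, 0, 120, 300, 450, 600, 550, 500, 520,
                480, 510, 530, 560, 620, 900, 950, 400, 250, 150, 20, 0]

p = sleep_percentile(7)  # 7 hours of sleep out of 24
threshold = nearest_rank_percentile(hourly_steps, p)
# Hours at or below the threshold are treated as rest hours (an assumption).
resting_hours = [hour for hour, steps in enumerate(hourly_steps) if steps <= threshold]

if __name__ == "__main__":
    print(p, threshold, resting_hours)
```

Under these assumptions, the hours at or below the threshold mark the likely sleep window, and the remaining hours describe the daytime routine.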
Having these 3 groups of the dataset defined, we have the following insights.
- Athletes
On average, an athlete wakes up at 5:00 AM. Their daytime activity takes place between 8:00 AM and 7:00 PM. They exercise between 5:00 PM and 7:00 PM. Finally, they sleep between 10:00 PM and 5:00 AM.
- Fitness Enthusiasts
On average, a fitness enthusiast wakes up at 6:00 AM. Their daytime activity takes place between 8:00 AM and 8:00 PM. They exercise at 12:00 PM and between 5:00 PM and 7:00 PM. Finally, they sleep between 11:00 PM and 6:00 AM.
- Sedentary People
Trends
Two positive correlations were found. According to Devore J. (2008), these are moderate correlations.
Daily Sport Minutes vs Distance
Correlation coefficient: 0.73
The more exercise a person does, the more distance they cover, which suggests that their exercise is distance-based, such as running.
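For reference, a correlation coefficient like the one above can be computed with the Pearson formula. The sketch below is a plain-Python illustration, not the Power BI calculation used in the project. The sample minutes and distances are made up, and the function name pearson_r is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return covariance / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical daily sport minutes and distances, for illustration only.
    sport_minutes = [5, 20, 35, 60, 12]
    distance_km = [3.1, 4.0, 6.5, 9.2, 5.0]
    print(round(pearson_r(sport_minutes, distance_km), 2))
```

A value close to 1 means the two measures rise together. The coefficient describes association only and says nothing about which activity produced the distance.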
Data-driven Recommendations
Considering the limitations of the dataset proposed in the capstone, I’d recommend looking for more public data sets that could be used for this business task.
However, let’s do a hypothetical exercise. Suppose this data set was a representative sample of the average Bellabeat user. In that case I would make the following recommendations.
- Water Intake
Depending on the user’s daily routine, the app will notify them to refill their water bottle before periods of highest energy consumption so that they are always hydrated.
- Sleep Health
The app will notify users about the importance of sleep for their health and provide some recommendations to improve sleep quality in case they have irregular sleep periods.
- Custom Articles
The app will show articles related to each user’s lifestyle to improve their health. For example, sedentary people will be shown some stretches they can do at the office, while athletes will be recommended the most popular routes for runners in their city.
- Quote of the Day
The app will display inspirational messages timed to when users begin their daily activities.