Bellabeat Case Study | Google Data Analytics Capstone

Summary

This is the capstone project of the Google Data Analytics Certificate. The consumption patterns of 33 Fitbit users were analyzed over 30 days to find trends that will help plan the next marketing campaign for Bellabeat products.

Tools

SQL, Excel, Power BI

The Sky is The Limit

In 2013, Urška Sršen and Sando Mur founded Bellabeat, a high-tech company that manufactures health-focused smart products for women.

By 2016, Bellabeat had opened offices around the world and launched multiple products that track activity, sleep, stress, meditation and reproductive health.

As of 2024, the company has more than 10 million users and more than 1 million synchronized devices, and it shows no sign of slowing down.

The Opportunity

Time is a hybrid watch that tracks menstrual cycle, meditation, and hydration. It gives insights into your health and lifestyle with guidance on how to improve them.

Sršen wants to add new in-app features to be used alongside Time. She has asked the analytics team to look for trends in Fitbit usage data to learn how people already use their smart devices, and then to offer high-level recommendations for a better marketing strategy and new in-app features.

Workflow

The development of this project was divided into 6 phases:

Ask

The business task was defined in this phase, along with a series of guiding questions to analyze the data and deliver a report to stakeholders.

  1. What are some trends in smart device usage?
    • On average, how long do people exercise?
    • On average, what is a person’s caloric consumption throughout the day?
    • Is it possible to segment the data population according to caloric consumption?
  2. How could these trends apply to Bellabeat customers?
  3. How could these trends help influence Bellabeat marketing strategy?
    • Can I suggest new in-app features?
    • Can I improve/modify the way a specific product is advertised?

Business task: analyze the Fitbit data to find patterns in how consumers already use smart devices, then apply these insights to give high-level recommendations for Bellabeat’s marketing strategy for Time (Bellabeat’s smartwatch).

Prepare

Data sources were defined, their credibility was confirmed, and issues with bias, privacy, and accessibility were identified.

The data is stored on Kaggle and Zenodo.

These datasets were generated by respondents to a survey distributed via Amazon Mechanical Turk between 12/Apr/2016 and 12/May/2016. Thirty-three eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring.

The data is organized in both long and wide formats. I use only the long format for consistency.

It includes data about each user’s calories, heart rate, intensity, sleep, steps, weight, and daily activity, in CSV format.

Metadata for this data set can be found in Fitabase.

I found no credibility issues, since these datasets were generated by respondents to a survey distributed via Amazon Mechanical Turk. However, there may be bias, since the age and gender of the participants are unknown; in other words, the dataset might not be a representative sample of the average Bellabeat user.

This is open data, distributed under a CC0 (public domain) license, as shown in the License section of the Kaggle page.

The participants don’t share personal information (name, age, gender, address, etc.), so the data is anonymous.

Process

The working tools were defined:

  • Data sets were organized and cleaned using SQL (via BigQuery) and Excel.
  • Graphs were made in Power BI.

Before starting the analysis, I checked the integrity and consistency of the data. I looked for duplicate values, null values, and outliers, and made sure the dates were in a consistent format. To do this, I used SQL (via BigQuery) and Excel.

Below is the general structure of the commands used for each task.

Null Values

-- Count the rows and the NULLs in each column
SELECT
  COUNT(*) AS total_rows,
  COUNT(CASE WHEN column_1 IS NULL THEN 1 END) AS null_column_1,
  COUNT(CASE WHEN column_2 IS NULL THEN 1 END) AS null_column_2,
  -- ... one COUNT per remaining column
  COUNT(CASE WHEN column_n IS NULL THEN 1 END) AS null_column_n
FROM
  `bellabeat-case-study-gdac.FitBit_Data.my-table`;

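Duplicate Values

-- List every combination of values that appears more than once
SELECT
  Id,
  column_1,
  -- ... remaining columns that identify a row
  COUNT(*) AS duplicate_count
FROM
  `bellabeat-case-study-gdac.FitBit_Data.my-table`
GROUP BY
  Id, column_1 -- , ... same columns as in SELECT
HAVING COUNT(*) > 1;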

Outliers

-- Search for field1 values outside a reasonable range
-- (min_value and max_value are placeholders for the chosen bounds)
SELECT
  *
FROM
  `bellabeat-case-study-gdac.FitBit_Data.my-table`
WHERE
  field1 < min_value OR field1 > max_value;

After the data cleaning process, I built a basic data model in Power BI that connects all the tables through the shared Id column, which allows me to perform the analysis in the next phase.
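The date-format check mentioned in this phase is not shown as a query. A minimal Python sketch of the idea follows, assuming the dates are strings in month/day/year form (`%m/%d/%Y`); the function name `find_bad_dates` and the sample values are hypothetical, not taken from the project.

```python
from datetime import datetime


def find_bad_dates(values, fmt="%m/%d/%Y"):
    """Return the values that do not parse with the expected date format."""
    bad = []
    for value in values:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            # Wrong layout or an impossible date (e.g. month 13)
            bad.append(value)
    return bad


# Illustrative values only: two valid dates, one ISO date, one impossible date
sample = ["4/12/2016", "4/13/2016", "2016-04-14", "13/40/2016"]
print(find_bad_dates(sample))
```

An empty result would mean every value follows the assumed format; anything returned needs to be converted or inspected before the tables are joined.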

Limitations

The dataset has several limitations that must be considered before relying on this analysis:

  • It is not known how many of the people in the study are male or female, so the analysis may be biased towards one gender.
  • Since Bellabeat products are for women, the dataset may not be a representative sample of the average Bellabeat user.
  • With only 33 participants representing a population of more than 1,000,000 users, and an unknown number of women in the sample, the margin of error is up to 22.46%.
    The margin of error was calculated for a sample size of 33, a confidence level of 99%, a population proportion of 50% (since a Fitbit user can be male or female), and a population size of 1,000,000 (taken as the population of users with Bellabeat smart devices).
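The 22.46% figure can be reproduced with the standard margin-of-error formula for a proportion with a finite population correction. The sketch below assumes that formula and a z-score of 2.58 for the 99% confidence level; the function name `margin_of_error` is hypothetical. With the more precise z-score of 2.576 the result is about 22.42%.

```python
import math


def margin_of_error(n, population, p=0.5, z=2.58):
    """Margin of error for a proportion, with finite population correction.

    n: sample size, population: population size,
    p: assumed population proportion, z: z-score of the confidence level
    (2.58 is a rounded value for 99%; this is an assumption).
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    correction = math.sqrt((population - n) / (population - 1))
    return z * standard_error * correction


moe = margin_of_error(n=33, population=1_000_000)
print(f"{moe:.2%}")  # 22.46%
```

Because the population is so much larger than the sample, the correction factor is almost exactly 1, so the result is driven by the sample size alone.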

Analyze

The data was analyzed using Power BI. Trends were identified in 3 groups of people: Athletes, Fitness Enthusiasts and Sedentary People.

This is explained in detail in the “Insights” section below.

Share

Bellabeat in-app feature recommendations were made based on the data findings.

This is explained in detail in the “Data-driven Recommendations” section below.

Insights

The users in the dataset were divided into 3 groups based on their daily physical activity: Athletes, Fitness Enthusiasts, and Sedentary People.

Since there is no way to tell whether a user is a man or a woman, the users cannot be reliably separated by calories burned throughout the day (for example, a woman who exercises occasionally could be mistaken for a sedentary man because of differences in energy expenditure).

Instead, users were grouped by the number of minutes of high activity per day (“very active minutes”, according to Fitbit parameters):

  • Athlete: Over 30 very active minutes.
  • Fitness Enthusiast: Between 5 and 30 very active minutes.
  • Sedentary Person: Less than 5 very active minutes a day.
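The thresholds above can be written as a small function. This is a sketch under two assumptions: the input is a user’s average very active minutes per day, and values of exactly 5 or exactly 30 fall into the Fitness Enthusiast group. The function name `activity_group` is hypothetical.

```python
def activity_group(very_active_minutes):
    """Map average daily very active minutes to one of the three groups."""
    if very_active_minutes > 30:
        return "Athlete"
    if very_active_minutes >= 5:
        # Boundary values 5 and 30 are treated as inclusive (an assumption)
        return "Fitness Enthusiast"
    return "Sedentary Person"


# Illustrative inputs close to the group averages reported in this study
for minutes in (56, 13, 1.5):
    print(minutes, activity_group(minutes))
```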

After grouping users, I defined each group’s daily routine using percentiles. For example, the average athlete sleeps 7 hours, which is 29% of the day, so the 29th percentile was used to define the athlete’s routine.
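The conversion from hours asleep to a percentile is simple arithmetic: hours asleep divided by 24, expressed as a whole percentage. The sketch below applies it to the average sleep of each group reported in this study; the function name `sleep_percentile` is hypothetical, and rounding to the nearest whole percentile is an assumption.

```python
def sleep_percentile(hours_asleep):
    """Share of a 24-hour day spent asleep, as a whole-number percentile."""
    return round(hours_asleep / 24 * 100)


groups = [("Athlete", 7), ("Fitness Enthusiast", 6.7), ("Sedentary Person", 8)]
for name, hours in groups:
    print(name, sleep_percentile(hours))
```

For the athlete group this gives 29, matching the percentile used above.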

Having these 3 groups of the dataset defined, we have the following insights.

On average, an athlete wakes up at 5:00 AM. Their daytime activity is between 8:00 AM and 7:00 PM, and they exercise between 5:00 PM and 7:00 PM. Finally, they sleep from 10:00 PM to 5:00 AM.

Average Caloric Burn During the Day: Athlete

  • 2.8K kcal burned a day (average)
  • 56 minutes of exercise a day
  • 7 hours asleep a day

On average, a fitness enthusiast wakes up at 6:00 AM. Their daytime activity is between 8:00 AM and 8:00 PM, and they exercise at 12:00 PM and between 5:00 PM and 7:00 PM. Finally, they sleep from 11:00 PM to 6:00 AM.

Average Caloric Burn During the Day: Fitness Enthusiast

  • 2.2K kcal burned a day (average)
  • 13 minutes of exercise a day
  • 6.7 hours asleep a day

On average, a sedentary person wakes up at 7:00 AM. Their daytime activity is between 9:00 AM and 7:00 PM. Finally, they sleep from 11:00 PM to 7:00 AM.

Average Caloric Burn During the Day: Sedentary People

  • 1.9K kcal burned a day (average)
  • < 2 minutes of exercise a day
  • 8 hours asleep a day

Trends

Two positive correlations were found. According to Devore (2008), both are moderate correlations.

Daily Sport Minutes vs Distance (correlation coefficient: 0.73)

The more exercise a person does, the more distance they cover, which suggests that much of their exercise involves covering distance, such as running.

Daily Sport Minutes vs Caloric Burn (correlation coefficient: 0.63)

The more exercise a person does, the more calories they burn. This is the expected behavior.
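For reference, a correlation coefficient like the ones above can be computed and labeled as follows. This is a sketch, not the Power BI calculation used in the project: it assumes the Pearson coefficient and the rule of thumb from Devore (weak if |r| ≤ 0.5, strong if |r| ≥ 0.8, moderate otherwise). The function names and the sample values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


def strength(r):
    """Label a coefficient using the assumed rule-of-thumb thresholds."""
    size = abs(r)
    if size <= 0.5:
        return "weak"
    if size >= 0.8:
        return "strong"
    return "moderate"


# Made-up values for illustration only (minutes of exercise, distance in km)
minutes = [5, 12, 20, 35, 50, 60]
distance = [3.1, 2.4, 6.0, 5.2, 9.8, 7.5]
r = pearson_r(minutes, distance)
print(round(r, 2), strength(r))
```

Under this rule, both 0.73 and 0.63 fall in the moderate range.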

Data-driven Recommendations

Considering the limitations of the dataset proposed in the capstone, I recommend looking for additional public data sets that could be used for this business task.

However, let’s do a hypothetical exercise. Suppose this data set were a representative sample of the average Bellabeat user. In that case, I would make the following recommendations.

Depending on the user’s daily routine, the app will notify them to refill their water bottle before periods of highest energy consumption so that they are always hydrated.
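One way such a reminder could be scheduled is sketched below. It assumes the app has an hourly calorie-burn profile for the user (24 values, one per hour) and sends a reminder a fixed time before each of the hours with the highest burn. The function name `reminder_hours`, its parameters, and the sample profile are all hypothetical.

```python
def reminder_hours(hourly_kcal, top_n=3, lead_hours=1):
    """Return the hours (0-23) at which to send a hydration reminder.

    hourly_kcal: 24 values, where index h is the calories burned in hour h.
    top_n: how many peak hours to cover; lead_hours: how early to notify.
    """
    peak_hours = sorted(range(24), key=lambda h: hourly_kcal[h], reverse=True)[:top_n]
    # A set avoids duplicate reminders; the modulo wraps around midnight
    return sorted({(h - lead_hours) % 24 for h in peak_hours})


# Made-up profile: low baseline, a midday peak and an evening workout
profile = [60] * 24
profile[12] = 150
profile[17] = 300
profile[18] = 280
print(reminder_hours(profile))  # [11, 16, 17]
```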

The app will notify users about the importance of sleep for their health and provide some recommendations to improve sleep quality in case they have irregular sleep periods.

The app will show articles related to each user’s lifestyle to improve their health. For example, sedentary people will be shown some stretches they can do at the office, while athletes will be recommended the most popular routes for runners in their city.

The app will display inspirational messages timed to when users begin their daily activities.

References

  1. Devore, J. (2008). Probabilidad y Estadística para Ingeniería y Ciencias (séptima edición). México: Cengage Learning.
  2. Furberg, R., Brinton, J., Keating, M., & Ortiz, A. (2016). Crowd-sourced Fitbit datasets 03.12.2016-05.12.2016. Zenodo.
  3. Möbius (2024). FitBit Fitness Tracker Data. Kaggle.