Google Data Analytics Professional Capstone

Kal Woods

2022-10-16

Purpose

Bellabeat is a health-focused smart products company aimed at empowering women with knowledge about their own health and habits. Founded in 2013 with a drive to develop beautifully designed technology for its core market, it has grown rapidly and stood out within its segment.

Bellabeat wants to analyze smart device usage in order to gain insight into how consumers use non-Bellabeat smart devices and get high-level recommendations that can inform marketing strategies.

Data

The publicly available FitBit Fitness Tracker Data set on Kaggle was used for the analysis. The data were generated by respondents to a survey distributed via Amazon Mechanical Turk, and each participant is identified across the data sets by a unique Id.

Cleaning and Initial Insights

While the data set contains 18 CSVs, the analysis was limited to those with enough unique users to provide relevant insight. Some of the CSVs are also simply more granular versions of others, e.g., Minute Intensities vs. Hourly Intensities vs. Daily Intensities. The files chosen were:

  • Daily Activity
  • Daily Intensities
  • Daily Sleep

Creation of data frames:

daily_activity <- read.csv("dailyActivity_merged.csv")
daily_intensities <- read.csv("dailyIntensities_merged.csv")
daily_sleep <- read.csv("sleepDay_merged.csv")

Number of unique users per set:

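library(dplyr)  # provides n_distinct()
# Named list of the three data frames, reused by the checks below
df_list <- list(daily_activity = daily_activity, daily_intensities = daily_intensities, daily_sleep = daily_sleep)
lapply(df_list, function(x) n_distinct(x$Id))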
## $daily_activity
## [1] 33
## 
## $daily_intensities
## [1] 33
## 
## $daily_sleep
## [1] 24

Check for missing data:

lapply(df_list, function(x) {apply(is.na(x), 2, sum)})
## $daily_activity
##                       Id             ActivityDate               TotalSteps 
##                        0                        0                        0 
##            TotalDistance          TrackerDistance LoggedActivitiesDistance 
##                        0                        0                        0 
##       VeryActiveDistance ModeratelyActiveDistance      LightActiveDistance 
##                        0                        0                        0 
##  SedentaryActiveDistance        VeryActiveMinutes      FairlyActiveMinutes 
##                        0                        0                        0 
##     LightlyActiveMinutes         SedentaryMinutes                 Calories 
##                        0                        0                        0 
## 
## $daily_intensities
##                       Id              ActivityDay         SedentaryMinutes 
##                        0                        0                        0 
##     LightlyActiveMinutes      FairlyActiveMinutes        VeryActiveMinutes 
##                        0                        0                        0 
##  SedentaryActiveDistance      LightActiveDistance ModeratelyActiveDistance 
##                        0                        0                        0 
##       VeryActiveDistance 
##                        0 
## 
## $daily_sleep
##                 Id           SleepDay  TotalSleepRecords TotalMinutesAsleep 
##                  0                  0                  0                  0 
##     TotalTimeInBed 
##                  0

Summaries of data sets:

lapply(df_list, function(x) {summary(x)})
## $daily_activity
##        Id            ActivityDate         TotalSteps    TotalDistance   
##  Min.   :1.504e+09   Length:940         Min.   :    0   Min.   : 0.000  
##  1st Qu.:2.320e+09   Class :character   1st Qu.: 3790   1st Qu.: 2.620  
##  Median :4.445e+09   Mode  :character   Median : 7406   Median : 5.245  
##  Mean   :4.855e+09                      Mean   : 7638   Mean   : 5.490  
##  3rd Qu.:6.962e+09                      3rd Qu.:10727   3rd Qu.: 7.713  
##  Max.   :8.878e+09                      Max.   :36019   Max.   :28.030  
##  TrackerDistance  LoggedActivitiesDistance VeryActiveDistance
##  Min.   : 0.000   Min.   :0.0000           Min.   : 0.000    
##  1st Qu.: 2.620   1st Qu.:0.0000           1st Qu.: 0.000    
##  Median : 5.245   Median :0.0000           Median : 0.210    
##  Mean   : 5.475   Mean   :0.1082           Mean   : 1.503    
##  3rd Qu.: 7.710   3rd Qu.:0.0000           3rd Qu.: 2.053    
##  Max.   :28.030   Max.   :4.9421           Max.   :21.920    
##  ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
##  Min.   :0.0000           Min.   : 0.000      Min.   :0.000000       
##  1st Qu.:0.0000           1st Qu.: 1.945      1st Qu.:0.000000       
##  Median :0.2400           Median : 3.365      Median :0.000000       
##  Mean   :0.5675           Mean   : 3.341      Mean   :0.001606       
##  3rd Qu.:0.8000           3rd Qu.: 4.782      3rd Qu.:0.000000       
##  Max.   :6.4800           Max.   :10.710      Max.   :0.110000       
##  VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
##  Min.   :  0.00    Min.   :  0.00      Min.   :  0.0        Min.   :   0.0  
##  1st Qu.:  0.00    1st Qu.:  0.00      1st Qu.:127.0        1st Qu.: 729.8  
##  Median :  4.00    Median :  6.00      Median :199.0        Median :1057.5  
##  Mean   : 21.16    Mean   : 13.56      Mean   :192.8        Mean   : 991.2  
##  3rd Qu.: 32.00    3rd Qu.: 19.00      3rd Qu.:264.0        3rd Qu.:1229.5  
##  Max.   :210.00    Max.   :143.00      Max.   :518.0        Max.   :1440.0  
##     Calories   
##  Min.   :   0  
##  1st Qu.:1828  
##  Median :2134  
##  Mean   :2304  
##  3rd Qu.:2793  
##  Max.   :4900  
## 
## $daily_intensities
##        Id            ActivityDay        SedentaryMinutes LightlyActiveMinutes
##  Min.   :1.504e+09   Length:940         Min.   :   0.0   Min.   :  0.0       
##  1st Qu.:2.320e+09   Class :character   1st Qu.: 729.8   1st Qu.:127.0       
##  Median :4.445e+09   Mode  :character   Median :1057.5   Median :199.0       
##  Mean   :4.855e+09                      Mean   : 991.2   Mean   :192.8       
##  3rd Qu.:6.962e+09                      3rd Qu.:1229.5   3rd Qu.:264.0       
##  Max.   :8.878e+09                      Max.   :1440.0   Max.   :518.0       
##  FairlyActiveMinutes VeryActiveMinutes SedentaryActiveDistance
##  Min.   :  0.00      Min.   :  0.00    Min.   :0.000000       
##  1st Qu.:  0.00      1st Qu.:  0.00    1st Qu.:0.000000       
##  Median :  6.00      Median :  4.00    Median :0.000000       
##  Mean   : 13.56      Mean   : 21.16    Mean   :0.001606       
##  3rd Qu.: 19.00      3rd Qu.: 32.00    3rd Qu.:0.000000       
##  Max.   :143.00      Max.   :210.00    Max.   :0.110000       
##  LightActiveDistance ModeratelyActiveDistance VeryActiveDistance
##  Min.   : 0.000      Min.   :0.0000           Min.   : 0.000    
##  1st Qu.: 1.945      1st Qu.:0.0000           1st Qu.: 0.000    
##  Median : 3.365      Median :0.2400           Median : 0.210    
##  Mean   : 3.341      Mean   :0.5675           Mean   : 1.503    
##  3rd Qu.: 4.782      3rd Qu.:0.8000           3rd Qu.: 2.053    
##  Max.   :10.710      Max.   :6.4800           Max.   :21.920    
## 
## $daily_sleep
##        Id              SleepDay         TotalSleepRecords TotalMinutesAsleep
##  Min.   :1.504e+09   Length:413         Min.   :1.000     Min.   : 58.0     
##  1st Qu.:3.977e+09   Class :character   1st Qu.:1.000     1st Qu.:361.0     
##  Median :4.703e+09   Mode  :character   Median :1.000     Median :433.0     
##  Mean   :5.001e+09                      Mean   :1.119     Mean   :419.5     
##  3rd Qu.:6.962e+09                      3rd Qu.:1.000     3rd Qu.:490.0     
##  Max.   :8.792e+09                      Max.   :3.000     Max.   :796.0     
##  TotalTimeInBed 
##  Min.   : 61.0  
##  1st Qu.:403.0  
##  Median :463.0  
##  Mean   :458.6  
##  3rd Qu.:526.0  
##  Max.   :961.0
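The summaries above report identical statistics for the intensity columns shared by daily_activity and daily_intensities, which is consistent with the daily intensities file being a subset of daily activity. A minimal base R sketch of how that could be confirmed row by row follows. The helper intensities_redundant() is hypothetical, it assumes ActivityDay and ActivityDate use the same date format, and it is demonstrated on made-up toy rows rather than the real files.

```r
# Hypothetical helper: TRUE if every column that daily_intensities shares with
# daily_activity holds the same values for the same Id and date.
# Assumes ActivityDay and ActivityDate use the same date format.
intensities_redundant <- function(activity, intensities) {
  # Rename the date column so both frames share the same join keys
  names(intensities)[names(intensities) == "ActivityDay"] <- "ActivityDate"
  shared <- setdiff(intersect(names(activity), names(intensities)),
                    c("Id", "ActivityDate"))
  joined <- merge(activity, intensities, by = c("Id", "ActivityDate"),
                  suffixes = c(".act", ".int"))
  # Every intensities row must find a match, and every shared column must agree
  nrow(joined) == nrow(intensities) &&
    all(vapply(shared, function(col) {
      isTRUE(all.equal(joined[[paste0(col, ".act")]],
                       joined[[paste0(col, ".int")]]))
    }, logical(1)))
}

# Made-up toy rows standing in for the real data frames
activity <- data.frame(Id = c(1, 1, 2),
                       ActivityDate = c("4/12/2016", "4/13/2016", "4/12/2016"),
                       TotalSteps = c(100, 200, 300),
                       VeryActiveMinutes = c(5, 0, 12))
intensities <- data.frame(Id = c(1, 1, 2),
                          ActivityDay = c("4/12/2016", "4/13/2016", "4/12/2016"),
                          VeryActiveMinutes = c(5, 0, 12))
intensities_redundant(activity, intensities)
```

On the real data the call would be intensities_redundant(daily_activity, daily_intensities); a TRUE result would indicate that daily_intensities adds no values beyond those already in daily_activity.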

Presenting key summary figures as tables makes them easier to read and simplifies exporting them to a presentation.

| Measure                         | Value   |
|---------------------------------|---------|
| Average Daily Steps             | 7637.91 |
| Average Daily Miles             | 5.49    |
| Average Logged Activities Miles | 0.11    |
| Average Daily Calories Burned   | 2303.61 |
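A table like the one above can be produced with a few base R calls. This is a minimal sketch assuming the daily_activity column names shown in the summary; the helper summary_table() is hypothetical and is shown on made-up toy rows.

```r
# Hypothetical helper: builds a two-column measure/value table of averages
summary_table <- function(df) {
  data.frame(
    measure = c("Average Daily Steps",
                "Average Daily Miles",
                "Average Logged Activities Miles",
                "Average Daily Calories Burned"),
    value = round(c(mean(df$TotalSteps),
                    mean(df$TotalDistance),
                    mean(df$LoggedActivitiesDistance),
                    mean(df$Calories)), 2)
  )
}

# Made-up toy rows standing in for daily_activity
toy <- data.frame(TotalSteps = c(7000, 8000),
                  TotalDistance = c(5.0, 6.0),
                  LoggedActivitiesDistance = c(0, 0.2),
                  Calories = c(2000, 2600))
summary_table(toy)
```

Called as summary_table(daily_activity), it should return the averages in the table above, and the result can be passed to knitr::kable() for a cleaner presentation layout.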

One figure that stands out is the number of calories burned. A 2,000-calorie day is commonly used as a general reference for daily intake, and both the mean (2,304) and the median (2,134) of daily calories burned exceed it, so more than half of the logged days were above that mark. If users were regularly burning more than that amount, the tracker may indeed have helped keep them aware of their calorie use and helped them burn more per day than they consumed, assuming a balanced diet.
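The mean alone does not show how many days or users cross the 2,000-calorie mark, so a quick check of both can help. This is a minimal sketch: the helper calorie_share() is hypothetical, the threshold is taken from the text, and the toy rows are made up. Because the summary shows a minimum of 0 calories, some days may reflect a device that was not worn, so filtering those out first could be worth considering.

```r
# Hypothetical helper: share of days at or above a calorie threshold,
# plus how many users average at or above it
calorie_share <- function(df, threshold = 2000) {
  per_user <- tapply(df$Calories, df$Id, mean)  # mean calories per user
  list(
    share_of_days = mean(df$Calories >= threshold),
    users_above   = sum(per_user >= threshold),
    users_total   = length(per_user)
  )
}

# Made-up toy rows: three users, five days in total
toy <- data.frame(Id = c(1, 1, 2, 2, 3),
                  Calories = c(1800, 2400, 2100, 2500, 1500))
calorie_share(toy)
```

On the real data, calorie_share(daily_activity) would report the share of days and the number of the 33 users at or above the threshold.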

Because so many logged days met or exceeded the 2,000-calorie mark, it may be worthwhile to let users share when they hit milestones: not as direct calorie-to-calorie comparisons, but as friendly notifications of how many people in their circle have hit their daily, weekly, or monthly goals.

| Measure                           | Value  |
|-----------------------------------|--------|
| Average Sedentary Minutes         | 991.21 |
| Average Light Active Minutes      | 192.81 |
| Average Moderately Active Minutes | 13.56  |
| Average Very Active Minutes       | 21.16  |
| Average Light Active Distance     | 3.34   |
| Average Moderately Active Distance| 0.57   |
| Average Very Active Distance      | 1.50   |

| Measure                      | Value  |
|------------------------------|--------|
| Average Daily Sleep Sessions | 1.12   |
| Average Daily Sleep Minutes  | 419.47 |
| Average Daily Sleep Hours    | 6.99   |
| Average Daily Minutes In Bed | 458.64 |
| Average Daily Hours In Bed   | 7.64   |

Summary of Tables

On average, participants spent:

  • About 3.2 hours per day engaged in light activity
  • About 14 minutes per day on moderate activity and about 21 minutes per day on very intense activity
  • About 16.5 hours per day sedentary. Among the 24 participants with sleep data, time in bed averaged 7.64 hours, of which 6.99 hours were spent asleep.
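The hour figures in this summary come from dividing the table averages by 60. A quick check in R, with the inputs copied from the tables above:

```r
# Table averages in minutes, copied from the summary tables
avg_minutes <- c(sedentary = 991.21, light = 192.81,
                 moderate = 13.56, very_active = 21.16,
                 asleep = 419.47, in_bed = 458.64)
# Convert to hours and round to two decimals
avg_hours <- round(avg_minutes / 60, 2)
avg_hours
# sedentary 16.52, light 3.21, moderate 0.23, very_active 0.35, asleep 6.99, in_bed 7.64
```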

Other findings:

  • Participants rarely used the Logged Activities feature: the median logged-activity distance was 0, and at least three quarters of the recorded days had no logged-activity distance at all. It may be worth investigating whether potential users don’t want such a feature or simply need to be made aware of it.
  • An average of 1.12 sleep records per day (median 1) suggests that few participants have time for, or choose to take, a daily nap. Encouraging a regular sleep pattern through Bellabeat wearables and related apps could be a useful strategy to test.
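Both observations can be turned into simple proportions. This is a minimal sketch; the helpers logged_share() and nap_share() are hypothetical, the toy rows are made up, and treating more than one sleep record on a day as a nap is an assumption rather than something the data set confirms.

```r
# Hypothetical helper: share of days with any logged-activity distance
logged_share <- function(activity) mean(activity$LoggedActivitiesDistance > 0)

# Hypothetical helper: share of sleep days with more than one sleep record,
# used here as a rough (assumed) proxy for napping
nap_share <- function(sleep) mean(sleep$TotalSleepRecords > 1)

# Made-up toy rows
toy_activity <- data.frame(LoggedActivitiesDistance = c(0, 0, 0, 1.2))
toy_sleep <- data.frame(TotalSleepRecords = c(1, 1, 2, 1, 3))
logged_share(toy_activity)
nap_share(toy_sleep)
```

On the real data these would be called as logged_share(daily_activity) and nap_share(daily_sleep).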

Additional Points of Interest

When are users most active? Answering this question could reveal useful information about potential customers who would be interested in fitness trackers.

Average intensity peaks from 5-7 PM, with a smaller peak from 12-2 PM. Combining this type of data with personas derived from customer profiles, matched to others in the same segment who achieved measurable results, could help new users reach positive outcomes. Data like this could easily be updated daily or weekly so wearers can see how they’re performing and whether they’re on track for their goals. It would also show customers whether they’re keeping a regular schedule from day to day or week to week, helping them sustain their momentum.
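The hourly peaks come from a more granular file than the three loaded earlier. A minimal sketch of computing average intensity by hour of day follows, assuming the Kaggle file hourlyIntensities_merged.csv with columns Id, ActivityHour, and AverageIntensity, and timestamps formatted like "4/12/2016 1:00:00 PM". The helper hourly_profile() is hypothetical and is demonstrated on made-up toy rows.

```r
# Hypothetical helper: mean intensity for each hour of the day, highest first.
# Assumes ActivityHour looks like "4/12/2016 1:00:00 PM"; parsing "%p"
# assumes an English or C locale for the AM/PM markers.
hourly_profile <- function(hourly) {
  ts <- as.POSIXct(hourly$ActivityHour,
                   format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
  hour <- as.integer(format(ts, "%H"))  # 0-23
  profile <- aggregate(hourly$AverageIntensity,
                       by = list(hour = hour), FUN = mean)
  names(profile)[2] <- "mean_intensity"
  profile[order(-profile$mean_intensity), ]
}

# Made-up toy rows: two users, three hours each
toy_hourly <- data.frame(
  Id = c(1, 1, 1, 2, 2, 2),
  ActivityHour = c("4/12/2016 12:00:00 PM", "4/12/2016 5:00:00 PM",
                   "4/12/2016 3:00:00 AM", "4/12/2016 12:00:00 PM",
                   "4/12/2016 5:00:00 PM", "4/12/2016 3:00:00 AM"),
  AverageIntensity = c(0.4, 0.6, 0.0, 0.2, 0.5, 0.1)
)
hourly_profile(toy_hourly)
```

With the real file, the first rows of hourly_profile(read.csv("hourlyIntensities_merged.csv")) would list the most active hours.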

Do those who are more active burn more calories regardless of intensity level? Health professionals and fitness instructors recommend that people be active daily, even if that activity isn’t strenuous. The tracker data may be able to show this simple relationship: whether more movement of any kind, including daily steps taken, is associated with more calories burned.

Let’s look at the correlation coefficients from three different tests (Pearson, Kendall, and Spearman), then a scatter plot of calories burned against steps taken.

cor.test(daily_activity$TotalSteps, daily_activity$Calories, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  daily_activity$TotalSteps and daily_activity$Calories
## t = 22.472, df = 938, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5483688 0.6316184
## sample estimates:
##       cor 
## 0.5915681
cor.test(daily_activity$TotalSteps, daily_activity$Calories, method = "kendall")
## 
##  Kendall's rank correlation tau
## 
## data:  daily_activity$TotalSteps and daily_activity$Calories
## z = 18.179, p-value < 2.2e-16
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##       tau 
## 0.3974441
cor.test(daily_activity$TotalSteps, daily_activity$Calories, method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  daily_activity$TotalSteps and daily_activity$Calories
## S = 61010776, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.5592679

All three correlation tests, as well as the scatter plot with its fitted trend line, show a positive relationship between the number of steps taken and total calories burned. The coefficients (Pearson r = 0.59, Spearman rho = 0.56, Kendall tau = 0.40) indicate a moderate positive association, and p-values below 2.2e-16, far under the 0.05 threshold, indicate statistical significance. One would expect someone who simply takes more steps per day to burn more total calories than someone who takes fewer. Simple is a great starting point when it comes to getting people to take up and stay consistent with a daily activity routine.
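To put a number on the relationship, a linear model can express it as additional calories per 1,000 extra steps. This is a minimal sketch that swaps in ordinary least squares via lm(); it is not necessarily the fit drawn on the original scatter plot, and calories_per_1000_steps() is a hypothetical helper shown on made-up toy data.

```r
# Hypothetical helper: slope of a linear fit of calories on steps,
# rescaled to calories per 1,000 steps
calories_per_1000_steps <- function(df) {
  fit <- lm(Calories ~ TotalSteps, data = df)
  unname(coef(fit)["TotalSteps"]) * 1000
}

# Made-up toy rows lying exactly on a line (0.05 calories per step)
toy <- data.frame(TotalSteps = c(2000, 4000, 6000, 8000),
                  Calories   = c(1800, 1900, 2000, 2100))
calories_per_1000_steps(toy)  # 50
```

On the real data this would be calories_per_1000_steps(daily_activity). The same fit can be overlaid on a base R scatter plot with plot(Calories ~ TotalSteps, data = daily_activity) followed by abline(lm(Calories ~ TotalSteps, data = daily_activity)).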

Recommendations

Bellabeat can combine the above findings with its own user-informed research to develop a marketing strategy focused on its target audience of female customers. Collecting similar data from its own users could let Bellabeat offer feedback that actively engages each individual.

Specific Recommendations Summary:

  • Create ad campaigns, as well as device and app designs, that raise awareness of core features users might not otherwise notice. For instance, display information on activity tracking and offer optional “get on the move”/sedentary duration reminders during device setup. Conduct A/B tests to see whether such campaigns and feature awareness produce better results and more consistent device use.
  • Consider surveying customers about their sleep habits, and trial “sleep companion” features to gauge interest and engagement. Present napping recommendations to users whose activity patterns suggest a need.
  • Highlight users’ most active hours to further drive engagement and help them establish activity routines that align with their goals.
  • Provide shared highlights via available devices and apps. Create features that encourage a “better together” ideal of moving forward as a group toward common goals and milestones, without a scoreboard mentality.