This is an optional Capstone project in partial fulfillment for the Google Data Analytics certification. this case study follows the six steps of data analysis process: Ask,Prepare,Process,Analyse,Share and Act.
I am a junior data analyst working on the marketing analyst team at Bellabeat, a high-tech manufacturer of health-focused products for woman. Bellabeat has the goal to become a larger player the global smart device market,they intend to do this by analyzing smart device fitness device. I am asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how customers are using their smart devices. These insights will help guide marketing strategy for the company.
Analyze data from FitBit fitness tracker, a non-Bellabeat smart product, to gain insights into consumer’s trends. This trends are to be applied to one Bellabeat product to help guide marketing strategy for Bellabeat to grow as a global player.
Primary Stakeholders: Urska Srska, Sando Mur and every member of the executive team.
Secondary Stakeholders: Bellabeat marketing analytics team.
The data used was taken from the following public data set: FitBit Fitness Tracker Data from Mobius. The data is publicly available on Kaggle and stored in 18 csv files.The data collected include Physical activity, Heart rate, Sleep monitoring, Daily activities, Weigh, Calorie burned, Distance and Steps.
A good data set should Reliable,Original,Comprehensive,Current and Cited (ROCCC). The FitBit fitness Tracker Data set does not have the aforementioned qualities.
It is not Reliable because it has the sample size of only 30 respondents, which is not large enough subject to get accurate insights. Moreover, some of respondents could be men, which are not Bellabet’s target customer base.
It is not Original because it is from a third party provider (Amazon Mechanical Turk)
Although the data parameters match most of Bellabeat products parameters, the data set is still not comprehensive. Data such as sex,height,age e.t.c not present in the data set would have been useful to create a more accurate analysis.
The data is not also current. The data is from 2016 (6 years ago), and likely does not reflect current smart device usage, hence it is outdated.
Data is collected from a third party. It is not cited.
I will be using R for the data cleaning, transformation and visualization. The following packages were installed:
install.packages("tidyverse", repos = "http://cran.us.r-project.org")
install.packages("lubridate", repos = "http://cran.us.r-project.org")
install.packages("dplyr", repos = "http://cran.us.r-project.org")
install.packages("ggplot2",repos= "http://cran.us.r-project.org")
install.packages("tidyr",repos = "http://cran.us.r-project.org")
install.packages("here",repos="http://cran.us.r-project.org")
install.packages("skimr", repos = "http://cran.us.r-project.org")
install.packages("janitor", repos = "http://cran.us.r-project.org")
install.packages("readr", repos = "http://cran.us.r-project.org")
install.packages("scales", repos = "http://cran.us.r-project.org")
loading Packages
library(tidyverse)
library(lubridate)
library(dplyr)
library(ggplot2)
library(tidyr)
library(here)
library(skimr)
library(janitor)
library(readr)
library(scales)
I will be using the daily activity data set.
daily_activity <- read_csv("dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(daily_activity)
colnames(daily_activity)
glimpse(daily_activity)
str(daily_activity)
Looking for null values
is.null(daily_activity)
## [1] FALSE
Counting the unique IDs to check if they are up to 30.
n_distinct(daily_activity$Id)
## [1] 33
No null or missing values.
Data has 15 columns and 940 rows.
Activity date is classified as Character.
The unique IDs are 33 instead of 30. It means that some users may have more than 1 IDs.
Activitydate is in character format and needs to be converted to datetime type.
daily_activity$ActivityDate=as.POSIXct(daily_activity$ActivityDate, format="%m/%d/%Y", tz=Sys.timezone())
daily_activity$date <- format(daily_activity$ActivityDate, format = "%m/%d/%y")
daily_activity$ActivityDate=as.Date(daily_activity$ActivityDate, format="%m/%d/%Y", tz=Sys.timezone())
daily_activity$date=as.Date(daily_activity$date, format="%m/%d/%Y")
class(daily_activity$ActivityDate)
## [1] "Date"
Create new column ‘Week_day’ using the activitydate as reference.
daily_activity$week_day <- weekdays(daily_activity$ActivityDate)
Create new column ‘Totalmins’ which will be the total minutes users were logged in, it will be gotten by adding very active minutes,fairly active minutes,lightly active minutes,and sedentary minutes.
daily_activity$Totalmins <- daily_activity$VeryActiveMinutes + daily_activity$FairlyActiveMinutes + daily_activity$LightlyActiveMinutes + daily_activity$SedentaryMinutes
Create new column ‘Totalhrs’ by converting Totalmins to hours.
daily_activity$Totalhrs <- daily_activity$Totalmins/60
At this stage I will be performing some calculations to get insights and spot trends from the data. First,I will condense the dataframe into only the columns that i want to use for more focused analysis.
df_activity <- daily_activity %>%
select(Id,week_day,TotalSteps, Calories, VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, SedentaryMinutes, Totalmins,Totalhrs)
head(df_activity)
## # A tibble: 6 x 10
## Id week_day TotalSteps Calories VeryActiveMinutes FairlyActiveMinutes
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1503960366 Tuesday 13162 1985 25 13
## 2 1503960366 Wednesday 10735 1797 21 19
## 3 1503960366 Thursday 10460 1776 30 11
## 4 1503960366 Friday 9762 1745 29 34
## 5 1503960366 Saturday 12669 1863 36 10
## 6 1503960366 Sunday 9705 1728 38 20
## # ... with 4 more variables: LightlyActiveMinutes <dbl>,
## # SedentaryMinutes <dbl>, Totalmins <dbl>, Totalhrs <dbl>
Performing calculations.
summary(df_activity)
## Id week_day TotalSteps Calories
## Min. :1.504e+09 Length:940 Min. : 0 Min. : 0
## 1st Qu.:2.320e+09 Class :character 1st Qu.: 3790 1st Qu.:1828
## Median :4.445e+09 Mode :character Median : 7406 Median :2134
## Mean :4.855e+09 Mean : 7638 Mean :2304
## 3rd Qu.:6.962e+09 3rd Qu.:10727 3rd Qu.:2793
## Max. :8.878e+09 Max. :36019 Max. :4900
## VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:127.0 1st Qu.: 729.8
## Median : 4.00 Median : 6.00 Median :199.0 Median :1057.5
## Mean : 21.16 Mean : 13.56 Mean :192.8 Mean : 991.2
## 3rd Qu.: 32.00 3rd Qu.: 19.00 3rd Qu.:264.0 3rd Qu.:1229.5
## Max. :210.00 Max. :143.00 Max. :518.0 Max. :1440.0
## Totalmins Totalhrs
## Min. : 2.0 Min. : 0.03333
## 1st Qu.: 989.8 1st Qu.:16.49583
## Median :1440.0 Median :24.00000
## Mean :1218.8 Mean :20.31255
## 3rd Qu.:1440.0 3rd Qu.:24.00000
## Max. :1440.0 Max. :24.00000
The average total steps by the users is 7,638 steps which is not adequate. An average adult female has to aim at least 10,000 steps per day, according to Medical News Today article.
The average minutes for very active is 21.16, for fairly active is 13.56, for lightly active is 192.8 and for sedentary is 991.2. Sedentary users logins makes up for 81% of total average minutes.
Users mostly use the FitBit app to track sedentary activities (81.3%) and not using it for tracking their health habits. better put, it can be said that users are not usually active.
Users tend to track their activities more during weekdays compared to weekends.
On average, the total staps per day for the users was 7,638, which is almost 2,500 steps below the suggested minimum total total steps per day.
The trends found from the FitBit data can very well be applied to Bellabeat customers because just like FitBit, Bellabeat company develops products focused on providing women with their health and fitness data to help them keep track of their habits and make healthier decisions.
Bellabeat marketing team can encourage users by educating and equipping them with knowledge about fitness benefits on the bellabeat app.
Bellabeat could offer incentives for consistent tracking, they could organise in-app competitions between users and offer them points.
Since users rarely login on weekdays, Bellabeat app can prompt notification to encourage and remind users to exercise.
Bellabeat app should have a BMR or TDEE calculator built in, this will help to obtain more data such as sex, age, weight, height, e.t.c for more detailed and accurate analysis.
Thank You