Joe Yong

May 24, 2022

Google Data Analytics: Case Study 2 (Using RStudio)

Hello again! This will be another documentation of how I approached the 2nd case study within the Google Data Analytics Professional Certificate on Coursera.

As usual, I will be showcasing my understanding of the data analysis process which is: Ask, Prepare, Process, Analyze, Share & Act.

I will be using RStudio & Tableau in this article. I will also publish a separate article on how I used BigQuery SQL for the data processing phase.

You are a junior data analyst working on the marketing analyst team at Bellabeat, a high-tech manufacturer of health-focused products for women. Urška Sršen (aliased as U.S, the cofounder & Chief Creative Officer) believes that analyzing smart device fitness data could help unlock new growth opportunities for the company.

You have been asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices which will help guide the marketing strategy for the company.

About the company

Bellabeat is a high-tech company that manufactures health-focused smart products. Beautifully designed by U.S, these products inform and inspire women around the world about their activity, sleep, stress & reproductive health.

Bellabeat has also invested in traditional advertising media such as radio, out-of-home billboards, print, and television, but focuses extensively on digital marketing: investing in Google Search, maintaining active Facebook & Instagram pages, consistently engaging consumers on Twitter, and running video ads on YouTube and display ads on the Google Display Network to support campaigns around key marketing dates.

U.S knows that an analysis of Bellabeat’s available consumer data would reveal more opportunities for growth. She has asked the marketing analytics team to focus on a Bellabeat product and analyze smart device usage data in order to gain insight into how people are using their smart devices. Using this information, she would like high-level recommendations for how these trends can inform Bellabeat marketing strategy.

U.S asks you to analyze smart device usage data in order to gain insight into how consumers use non-Bellabeat smart devices. She wants you to select one Bellabeat product to apply these insights to your presentation. These questions will guide your analysis:

We will be using the Fitbit Fitness Tracker Data, made available by Mobius, which contains personal fitness tracker data from thirty Fitbit users who consented to the submission of their personal tracker data, including minute-level output for:

It also includes information about daily activity, steps, and heart rate that can be used to explore user habits.

Since the files are already grouped in a folder, there’s no need to organize them. The names of the files are also fairly easy to recognize given the context of the data, so we will not be modifying them as well.

Let's install and load the packages required for this process: Tidyverse, Janitor, Lubridate & Skimr

Disclaimer: Text that follows a # is a comment/explanation and is not a line of code

After this, we need to import the datasets into RStudio using read.csv(). I will also be making slight name changes.

A disclaimer: my CSVs were located inside the folder “Fitbit Data” and my working directory is “Google Case Study 2”, to separate the file types as shown below:
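
A minimal sketch of what the import step could look like. The file name dailyActivity_merged.csv matches the Kaggle download, but the tiny CSV written to a temporary folder is only a stand-in so the snippet runs on its own, and the two sample rows are illustrative. In the real project the path would point at the "Fitbit Data" folder inside the working directory.

```r
# Stand-in for the "Fitbit Data" folder so the example is self-contained
data_dir <- file.path(tempdir(), "Fitbit Data")
dir.create(data_dir, showWarnings = FALSE)

# Two illustrative rows in the shape of dailyActivity_merged.csv
writeLines(
  c("Id,ActivityDate,TotalSteps,Calories",
    "1503960366,4/12/2016,13162,1985",
    "1503960366,4/13/2016,10735,1797"),
  file.path(data_dir, "dailyActivity_merged.csv")
)

# Import the file under a shorter, descriptive data frame name
daily_activity <- read.csv(file.path(data_dir, "dailyActivity_merged.csv"))

# Inspect column types before doing any cleaning
str(daily_activity)
```

The same pattern would apply to the sleep and weight files, each read into its own data frame.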

Let's inspect our data to see if there are any errors with formatting by using str()

and we would get the following output:

After a brief view of the output, there are a few issues that we need to address:

To clean the column names, we would use clean_names()
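
As a small illustration of what clean_names() does, here is a sketch on a one-row data frame. The columns follow the raw Fitbit naming style and the values are illustrative only.

```r
library(janitor)

# Illustrative columns in the raw CamelCase naming style
daily_activity <- data.frame(
  Id = 1503960366,
  ActivityDate = "4/12/2016",
  TotalSteps = 13162,
  VeryActiveMinutes = 25
)

# clean_names() converts column names to snake_case by default
daily_activity <- clean_names(daily_activity)
names(daily_activity)
```

After this step the columns can be referenced with consistent lowercase names such as activity_date and total_steps.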

Let's also convert daily_activity$ActivityDate, daily_sleep$SleepDay and weight_log$Date into proper date formats using as.Date() & as.POSIXct()

format(as.Date()) & format(as.POSIXct()) gave me errors, which is why I use as.Date() & as.POSIXct() instead

For weight_log$date, it's a little tricky because, if you look closely, there’s a PM indicator at the end. In my attempts, as.POSIXct() did not recognize it and returned every value as NA, so we will use parse_date_time() from Lubridate instead.
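
Here is a hedged sketch of the three conversions on single example strings written in the style of each column. The format strings and the orders value are my assumptions about what fits this data, not necessarily the exact calls used in the article.

```r
library(lubridate)

activity_date <- "4/12/2016"              # style of daily_activity$ActivityDate
sleep_day     <- "4/12/2016 12:00:00 AM"  # style of daily_sleep$SleepDay
weight_date   <- "5/2/2016 11:59:59 PM"   # style of weight_log$Date

# Date only: tell as.Date() the text is month/day/year
parsed_activity <- as.Date(activity_date, format = "%m/%d/%Y")

# Only the date part is parsed here; the trailing time text is ignored
parsed_sleep <- as.POSIXct(sleep_day, format = "%m/%d/%Y", tz = "UTC")

# 12-hour clock: I = hour (01-12), M = minute, S = second, p = AM/PM
parsed_weight <- parse_date_time(weight_date, orders = "mdy IMS p", tz = "UTC")
```

Keeping the time zone explicit (UTC here, as an assumption) avoids results that shift depending on the machine's local settings.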

And to convert weight_log$is_manual_report to a logical type, we will use as.logical()

After a quick look at our current data, let's add a day of the week, sedentary hours & total active hours column for further analysis in daily_activity. I will not be adding a month column since the dataset only provides information collected within a month.
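
A sketch of how these three columns could be added with mutate(). The sample rows are illustrative, and the definition of total_active_hours (very + fairly + lightly active minutes, divided by 60) is my assumption, since the formula is not spelled out.

```r
library(dplyr)

# Illustrative rows using the cleaned (snake_case) column names
daily_activity <- data.frame(
  activity_date = as.Date(c("2016-04-12", "2016-04-13")),
  very_active_minutes = c(25, 21),
  fairly_active_minutes = c(13, 19),
  lightly_active_minutes = c(328, 217),
  sedentary_minutes = c(728, 776)
)

daily_activity <- daily_activity %>%
  mutate(
    # Day name comes from the system locale (English names assumed)
    day_of_week = weekdays(activity_date),
    # Minutes converted to hours, rounded to 2 decimals
    sedentary_hours = round(sedentary_minutes / 60, 2),
    # Assumed definition: all non-sedentary minutes combined
    total_active_hours = round(
      (very_active_minutes + fairly_active_minutes + lightly_active_minutes) / 60, 2
    )
  )
```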

Let's also add new columns which convert the current minutes of collection to hours and round it using round() in daily_sleep. I will also be adding a column to indicate the time taken to fall asleep in daily_sleep as well.

We will also remove weight_log$fat using select(-c()), as it has little to no context and would not be helpful during the analysis phase.

Lastly, I will also be adding a new column in weight_log called bmi2, which will indicate whether the user is underweight, healthy, or overweight, by using a function I recently learned about: case_when()!
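
A minimal sketch of the bmi2 column with case_when(). The BMI values are made up, and the cut-offs assume the common convention of under 18.5 for underweight and 25 or above for overweight, which lines up with the healthy range of 18.5 to 24.9 mentioned later.

```r
library(dplyr)

# Made-up BMI values, one per category boundary of interest
weight_log <- data.frame(id = 1:4, bmi = c(17.9, 22.65, 25.0, 47.54))

weight_log <- weight_log %>%
  mutate(bmi2 = case_when(
    bmi < 18.5 ~ "Underweight",  # conditions are checked top to bottom
    bmi < 25   ~ "Healthy",
    bmi >= 25  ~ "Overweight"    # a missing bmi matches nothing and stays NA
  ))
```

Writing the last condition explicitly, rather than using a catch-all, keeps rows with a missing BMI from being labelled overweight by accident.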

A̶n̶a̶l̶y̶z̶e̶

Before we move onto the phase where we actually start to analyze the dataframe, we need to remove any outliers from the data.

In this case, let's remove rows in which total_active_hours & calories burned are 0. The reasoning is that the data comes from Fitbits, which are wearables: if users don’t wear their smart devices, no information is collected, so we will remove that clutter from the data frame. Users might also have disabled the GPS/accelerometer functions that allow steps to be counted.
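
One way this filter could be written, as a sketch on made-up rows. I am assuming a row is dropped when either value is 0, and the name daily_activity_cleaned is hypothetical.

```r
library(dplyr)

# Made-up rows: the second has no activity, the third has no calories
daily_activity <- data.frame(
  id = c(1, 2, 3),
  total_active_hours = c(6.1, 0, 4.2),
  calories = c(1985, 1347, 0)
)

# Keep only days where the device recorded both activity and calories
daily_activity_cleaned <- daily_activity %>%
  filter(total_active_hours != 0, calories != 0)
```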

If you’re using an external visualization tool such as Tableau or PowerBI, we need to export our dataframe using
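
A sketch of the export using base R's write.csv(). The data frame contents, the file name and the temporary output folder are placeholders for illustration.

```r
# Placeholder for the cleaned data frame produced earlier
daily_activity_cleaned <- data.frame(
  id = c(1, 2),
  total_active_hours = c(6.1, 4.2),
  calories = c(1985, 1797)
)

# Write to a temporary folder here; a project path would be used in practice
out_file <- file.path(tempdir(), "daily_activity_cleaned.csv")

# row.names = FALSE avoids an extra unnamed index column in the CSV
write.csv(daily_activity_cleaned, out_file, row.names = FALSE)
```

The resulting file can then be opened as a text data source in Tableau or Power BI.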

Analyze & Share (RStudio)

I will be using ggplot for this section of the analysis phase. I will also be including another section in which I used Tableau instead.

As per usual, let's revisit our business tasks to ensure we are not plotting or hypothesizing about information/relationships that will not help solve them. The business tasks are:

After having a brief view of the current data, I will be plotting a few observations revolving around:

Let's have a quick look at the average steps taken, sedentary hours, very active minutes & total hours of sleep using summary() .

With a brief view of the outputs above:

Now let's have a look at which days are users most active:

which produces the following:

As we can see, the Fitbit users were most active on Sunday, with a slow decline throughout the week. This could be due to motivation levels being fairly high at the end of the week.

Next, let's investigate the relationship between total active hours, total steps taken, and sedentary hours against calories burned by using the following:

Which produces the following:

At a glance, we can tell that there is a positive correlation between calories burned and total steps taken/total active hours. However, in the last chart, the correlation is less clear.

I was expecting the last chart to show the inverse of the relationship in the first 2 charts, but I was wrong. The relationship between sedentary hours and calories burned was fairly positive up until about the 17-hour mark.

For the relationship between weight & physical activity we would use:

Which would produce:

From the chart above, we can infer that users weighing around 60kg & 85kg are the most active.

As a disclaimer, I will not be using/recommending violin charts, as they often communicate information differently from what we think. There is further explanation in the Tableau viz.

We will carry out descriptive analysis to see how many users are overweight and how many are healthy by using the following

Out of the 30 users, only 8 submitted their responses regarding weight. 5 users are overweight and only 3 are within the healthy BMI range of 18.5–24.9

Analyze & Share (Tableau)

Here are the visualizations I've made from Tableau. My findings are shown below:

Above are the distributions of the selected variables. As shown:

Something to take note of: this is information collected throughout a month, which is then grouped by day.

These are the visualizations to find out the activity of users in order to identify which days they spend the most time being active.

Here we can see that they spend a lot of time engaged in physical activity starting from Sunday, which then slowly trails lower and lower. This could be due to the fact that motivation levels were higher on the weekends.

Here we can see a positive trend with the first 2 charts, which indicates that the more time you spend engaged in physical activity, the more calories you tend to burn.

For the last chart, I was expecting the inverse of the relationship in the first 2 charts. However, I was proven wrong; the data speaks for itself.

As a disclaimer, this only displays the relationship between 2 variables. We do not have height data, which means we cannot calculate BMR, so we cannot claim that walking x steps burns y calories and can only hypothesize that walking more steps burns more calories. I suspect that the calories column is calories burned THROUGHOUT the day, which would be TDEE. I’ve come to this conclusion because to burn 3000 calories, you would need to walk the equivalent of 100k steps.

The thicker the lines, the more recorded counts of activity.

As you can see here, while the 2 violin charts are plotted differently, the data is in fact exactly the same. Violin charts often “smooth” the distribution of data to make it look more pleasing to the eye. The width of the violin plot doesn’t always equate to a bigger count; in fact, it will often mean that there is a “wider” distribution (min to max).

As we can see from the 2 charts above, the most active users are within the 50kg–85kg. We also see a sharp decline in activity (physically and in count) for users over 90kg.

In the last chart, we have the BMI of users. Out of the 30 users, only 8 submitted their weight records of which 5 of them are overweight and only 3 have a healthy BMI.

In the previous section of Analyze & Share, we have covered the 1st and 2nd business task which are:

Based on my findings after my analysis, I would like to share my hypothesis on this matter.

Motivation levels & free time are higher on the weekends, which would provide an opportunity for users to sneak in a workout. As workload decreases, a window of opportunity to exercise presents itself midweek (Thursdays). We see an all-time low of recorded activity on Fridays, possibly due to social engagements with friends/coworkers after working hours.

Now to answer the final business task, I would like to share my recommendations based on my findings to help influence Bellabeat’s marketing strategy.

Next, I would provide some general recommendations to further improve Bellabeat’s products:

Additional remarks:

Author's note: That concludes the case study! I hope I have helped shed some light on how to approach this case study, as it was a real challenge for me! Figuring out an answer to the marketing question was really tough!

Bellabeat Case Study - Google Data Analytics Capstone

emily1618/Google-Data-Analytics-Bellabeat-Case-Study

CASE STUDY: Bellabeat Fitness Data Analysis

Author: Emi Ly

Date: October 5, 2021

Tableau Dashboard

Tableau Story Presentation to Stakeholders

The case study follows the six step data analysis process:

Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women. The company has 5 focus products: Bellabeat app, Leaf, Time, Spring and Bellabeat membership. Bellabeat is a successful small company, but it has the potential to become a larger player in the global smart device market. Our team has been asked to analyze smart device data to gain insight into how consumers are using their smart devices. The insights we discover will then help guide marketing strategy for the company.

💡 BUSINESS TASK: Analyze Fitbit data to gain insight and help guide marketing strategy for Bellabeat to grow as a global player.

Primary stakeholders: Urška Sršen and Sando Mur, executive team members.

Secondary stakeholders: Bellabeat marketing analytics team.

Data Source: 30 participants FitBit Fitness Tracker Data from Mobius: https://www.kaggle.com/arashnic/fitbit

The dataset has 18 CSV files. The data also follow a ROCCC approach:

⛔ The dataset has limitations:

Examine the data, check for NA, and remove duplicates for three main tables: daily_activity, sleep_day and weight:

Convert ActivityDate into date format and add a column for day of the week:

Check whether we have 30 users using n_distinct(). The dataset has data for 33 users in daily activity, 24 in sleep and only 8 in weight. If there is a discrepancy, such as in the weight table, check how the data were recorded. The way users record their data may give you insight into why data is missing.
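
A minimal sketch of the user count check on toy tables. The Id values are made up, so the counts here are not the 33, 24 and 8 found in the real data.

```r
library(dplyr)

# Toy tables: Id repeats because each user logs several days
daily_activity <- data.frame(Id = c(1, 1, 2, 3))
sleep_day      <- data.frame(Id = c(1, 2, 2))
weight         <- data.frame(Id = c(1))

# n_distinct() counts unique values, i.e. unique users per table
activity_users <- n_distinct(daily_activity$Id)
sleep_users    <- n_distinct(sleep_day$Id)
weight_users   <- n_distinct(weight$Id)
```

Comparing the three counts side by side makes it easy to spot which table covers only a subset of users.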

An additional insight to be aware of is how often users record their data. We can see from the ggplot() bar graph that the most data is recorded from Tuesday to Thursday. We need to investigate the distribution of data recording: Monday and Friday are both weekdays, so why are there fewer recordings than on the other weekdays?

⛔ From the weekday totals of minutes asleep, we can see the graph looks almost the same as the graph above! We can confirm that most sleep data is also recorded from Tuesday to Thursday. This raises the question: "How comprehensive is the data for forming an accurate analysis?"

Merge the three tables:

Clean the data to prepare for analysis in 4. Analyze!

Check min, max, mean, median and any outliers. The average weight is 135 pounds with a BMI of 24, and users burn 2,050 calories on average. Average steps are 10,200, and the max is more than triple that at 36,000 steps. Users spend on average 12 hours a day sedentary, 4 hours lightly active, and only half an hour fairly + very active! Users also get about 7 hours of sleep.

Active Minutes:

Percentage of active minutes in the four categories: very active, fairly active, lightly active and sedentary. From the pie chart, we can see that users spent 81.3% of their tracked daily minutes sedentary and only 1.74% very active.

The American Heart Association and World Health Organization recommend at least 150 minutes of moderate-intensity activity or 75 minutes of vigorous activity, or a combination of both, each week. That translates to a daily goal of 21.4 minutes of FairlyActiveMinutes or 10.7 minutes of VeryActiveMinutes.
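
The daily goals follow from dividing the weekly guideline by 7. The sketch below shows that arithmetic and one possible way to flag users who reach either goal. The per-user averages are made up, and the "either goal" rule is my assumption about how the check was done.

```r
# Weekly guideline minutes quoted above
moderate_per_week <- 150
vigorous_per_week <- 75

# Spread evenly across 7 days and rounded to 1 decimal
fairly_goal <- round(moderate_per_week / 7, 1)
very_goal   <- round(vigorous_per_week / 7, 1)

# Made-up per-user daily averages (not values from the dataset)
user_avg <- data.frame(
  Id = c("a", "b", "c"),
  FairlyActiveMinutes = c(25.0, 5.2, 12.0),
  VeryActiveMinutes   = c(3.0, 14.1, 2.5)
)

# A user meets the goal if either daily average reaches its target
user_avg$meets_goal <- user_avg$FairlyActiveMinutes >= fairly_goal |
  user_avg$VeryActiveMinutes >= very_goal
users_meeting_goal <- sum(user_avg$meets_goal)
```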

In our dataset, 30 users met the daily goal for fairly active minutes or very active minutes.

Noticeable Day:

The bar graph shows that there is a jump on Saturday: users spent LESS time sedentary and took MORE steps. Users are out and about on Saturday.

Total Steps:

Let's look at how active the users are by hour in total steps. From 5PM to 7PM the users take the most steps.

Next, how active the users are by day of the week in total steps: users take the most steps on Tuesdays and Saturdays.

Interesting Finds:

The more active you are, the more steps you take, and the more calories you burn. This is an obvious fact, but we can still look into the data to find anything interesting. Here we see that some users who are sedentary and take minimal steps are still able to burn 1500 to 2500 calories, similar to users who are more active and take more steps.

Comparing the four activity levels to the total steps, we see most data is concentrated on users who take about 5000 to 15000 steps a day. These users spent on average between 8 and 13 hours sedentary, 5 hours lightly active, and 1 to 2 hours fairly and very active.

According to this healthline.com article, a moderately active woman between the ages of 26–50 needs to eat about 2,000 calories per day and a moderately active man between the ages of 26–45 needs 2,600 calories per day to maintain his weight. Comparing the four activity levels to the calories, we see most data is concentrated on users who burn 2000 to 3000 calories a day. These users also spent on average between 8 and 13 hours sedentary, 5 hours lightly active, and 1 to 2 hours fairly and very active. Additionally, we see that the sedentary line levels off toward the end while the fairly + very active line curves back up. This indicates that the users who burn more calories spend less time sedentary and more time fairly + very active.

According to the article Fitbit Sleep Study, 55 minutes are spent awake in bed before going to sleep. In our dataset, 13 users spend 55 minutes awake before falling asleep.

We can use regression analysis to look at the variables and their correlation. For R-squared, 0% indicates that the model explains none of the variability of the response data around its mean, and a higher percentage indicates that the model explains more of that variability. A positive slope means the variables increase or decrease together, and a negative slope means one variable goes up as the other goes down. We want to see whether users who spend more time sedentary also spend more time sleeping. We can use lm() to fit a regression on the dependent and independent variables. We find that how many minutes a user is asleep has a very weak correlation with how long they spend sedentary during the day.
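
A minimal sketch of reading the slope and R-squared from lm(). The six rows are made up to show the mechanics only, so they do not reproduce the very weak correlation found in the real data. The column names follow the Fitbit naming style.

```r
# Made-up daily records: sedentary minutes and minutes asleep
daily <- data.frame(
  SedentaryMinutes   = c(700, 650, 800, 720, 600, 750),
  TotalMinutesAsleep = c(420, 450, 380, 400, 470, 410)
)

# Dependent variable on the left of ~, independent variable on the right
model <- lm(TotalMinutesAsleep ~ SedentaryMinutes, data = daily)

# Sign of the slope gives the direction of the relationship
slope <- coef(model)[["SedentaryMinutes"]]

# Share of variability in sleep explained by the model (0 to 1)
r_squared <- summary(model)$r.squared
```

Calling summary(model) prints both values together with standard errors and p-values.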

How about calories vs. minutes asleep? Do people who sleep more burn fewer calories? Plotting the two variables, we can see that there is not much of a correlation.

🎨 Bellabeat Data Analysis Dashboard

🎨 Bellabeat Data Presentation in Tableau

Conclusion based on our analysis:

Marketing recommendations to expand globally:

🔢 Obtain more data for an accurate analysis, encouraging users to use a wifi-connected scale instead of manual weight entries.

🚲 An educational healthy-lifestyle campaign encourages users to have short active exercises during the week and longer ones during the weekends, especially on Sunday, where we see the lowest steps and most sedentary minutes.

🎁 The educational healthy-lifestyle campaign can pair with a point-award incentive system. Users completing the whole week's exercise will receive Bellabeat points on products/memberships.

🏃‍♂️ The product, such as the Leaf wellness tracker, can beep or vibrate after a prolonged period of sedentary minutes, signaling to the user that it's time to get active. Similarly, it can also remind the user it's time to sleep after sensing a prolonged awake time in bed.

Google Data Analytics Capstone Project

Updated: Feb 22

I worked on the Google Data Analytics Capstone Project, Track 1, Case Study 1. I will be diving into the background, my full process of cleaning, analyzing and visualizing the data, along with my final suggestions and summary of the data.

Quick Links :

Tableau Dashboard | Github R Code for Analysis | Github R Code for Tableau Visualization | LinkedIn Post

Below is a table of contents in case you want to go to a specific section.

Table of Contents:

Microsoft Excel

Finished Project

Summary of Data

Business Suggestions

What I Learned

Cyclistic is a bike sharing program which features more than 5,800 bikes and 600 docking stations. It offers reclining bikes, hand tricycles, and cargo bikes, making it more inclusive to people with disabilities and riders who can't use a standard two-wheeled bike. It was founded in 2016 and has grown tremendously into a fleet of bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.

Previously, Cyclistic's marketing strategy tried to build the general awareness and appeal to broad consumers. It has flexible pricing plans: single-ride passes, full-day passes, and annual memberships. Those who purchase single-ride or full-day passes are referred to as casual riders while those who purchase annual memberships are Cyclistic members .

My Role : In this scenario I am a junior data analyst at Cyclistic and my team has been tasked with the overall goal (see below) of designing marketing strategies

Overall Goal : Design marketing strategies aimed at converting casual riders into annual members.

Business Question : "How do annual members and casual riders use Cyclistic bikes differently?"

Below I will describe step-by-step the process I used for this project. If you want to skip ahead to the business suggestions, move on to the section "Insights".

Overview : I first analyzed the data separately (each month) in Excel, then used R to analyze the data as a whole (one year). Finally I created a dashboard in Tableau and used Figma to support the design elements.

I initially wanted to gather and analyze my data in Excel because it was the tool I was most familiar with and I could get a general understanding of the data quicker. I did not combine all of the spreadsheets into one because that would've taken more processing power than my computer had.

I began by downloading the data from divvy-tripdata and turning the .csv files into Excel spreadsheets. I downloaded the most recent year of data at the time of starting my project:

August 2020

September 2020

October 2020

November 2020

December 2020

January 2021

February 2021

Added two columns to all of the months:

ride_length: calculated the total ride length for each trip as ending time minus starting time, using the started_at and ended_at columns.

day_of_week: calculated the day of the week for each trip using the date in the started_at column.

Went over the business task and the information I had at hand and how that could be used to figure out how members and casual riders use the bike service differently

Came up with metrics to look at such as :

total number of rides per hour, per day of the month, per season, per day of the week, and for different bike types

Average ride length between members and casual

For every month in Excel created pivot tables and charts to go with the analysis on (this took the longest):

Total Rides per Weekday - calculated the total rides for members and casual and separated it by day of the week; used a cluster column chart

Average Ride Length - calculated the average ride length for members and casual and separated it by day of the week; used a cluster column chart

Total Rides per Hour - calculated the total rides for members and casual separated by the time of the day (24hr); used a line comparison chart

Total Rides per Day - calculated the total rides for members and casual separated by the day of the month; used a line comparison chart

Total Rides per Bike Type - calculated the total rides for members and casual separated by Bike type; used stacked column chart

I also created a Google docs Notes list where I wrote down the exact steps for each month (had a checklist) and included my insights for each month

Time Spent:

535 minutes or just under 9 hours to complete.

I originally wanted to use SQL but the files were too big to upload and I couldn't figure out how to utilize Google Cloud Platform. Instead I used R to analyze the data because it could handle all of the information quicker than Excel, and I wanted to work on my R skills. Below is my general process in R, I didn't include my mistakes/missteps or errors for the sake of brevity. If you are interested in my full process including my mistakes, you can email me at: [email protected] and I would be happy to discuss it.

View my full code on my Github for this capstone project here .

Load all of the libraries I used: tidyverse, lubridate, hms, data.table

Uploaded all of the original data from the data source divvy-tripdata into R, using the read_csv function to load each individual csv file and save it in a separate data frame. For August 2020 data I saved it into aug08_df, September 2020 into sep09_df and so on.

Merged the 12 months of data together using rbind to create a one year view
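
A small sketch of the merge with two toy monthly data frames. The rows are made up and the combined name cyclistic_df is hypothetical. rbind() stacks rows, so every monthly data frame needs the same column names.

```r
# Toy monthly data frames with identical columns
aug08_df <- data.frame(
  ride_id = c("A1", "A2"),
  member_casual = c("member", "casual")
)
sep09_df <- data.frame(
  ride_id = "B1",
  member_casual = "casual"
)

# Stack the months on top of each other; more months are added as
# further arguments, e.g. rbind(aug08_df, sep09_df, oct10_df, ...)
cyclistic_df <- rbind(aug08_df, sep09_df)
nrow(cyclistic_df)
```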

Created a new data frame called cyclistic_date that would contain all of my new columns

Created new columns for:

Ride Length - did this by subtracting the started_at time from the ended_at time

Day of the Week

Time - convert the time to HH:MM:SS format

Season - Spring, Summer, Winter or Fall

Time of Day - Night, Morning, Afternoon or Evening
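
The new columns listed above could be derived roughly as follows. This is a sketch on two made-up rides. The season months and the time-of-day cut-offs (night before 6:00, morning before 12:00, afternoon before 18:00) are my assumptions, not the exact boundaries used in the project.

```r
library(dplyr)
library(lubridate)

# Two made-up rides
rides <- data.frame(
  started_at = ymd_hms(c("2020-08-01 08:15:00", "2021-01-15 17:40:00")),
  ended_at   = ymd_hms(c("2020-08-01 08:45:00", "2021-01-15 17:52:30"))
)

cyclistic_date <- rides %>%
  mutate(
    # End minus start, expressed in minutes
    ride_length = as.numeric(difftime(ended_at, started_at, units = "mins")),
    # Abbreviated day name as an ordered factor
    day_of_week = wday(started_at, label = TRUE),
    # Clock time in HH:MM:SS format
    time = format(started_at, "%H:%M:%S"),
    # Assumed meteorological seasons
    season = case_when(
      month(started_at) %in% 3:5  ~ "Spring",
      month(started_at) %in% 6:8  ~ "Summer",
      month(started_at) %in% 9:11 ~ "Fall",
      TRUE                        ~ "Winter"
    ),
    # Assumed cut-offs for the four parts of the day
    time_of_day = case_when(
      hour(started_at) < 6  ~ "Night",
      hour(started_at) < 12 ~ "Morning",
      hour(started_at) < 18 ~ "Afternoon",
      TRUE                  ~ "Evening"
    )
  )
```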

Cleaned the data by:

Removing duplicate rows

Remove rows with NA values (blank rows)

Remove where ride_length is 0 or negative (ride_length should be a positive number)

Remove unnecessary columns: ride_id, start_station_id, end_station_id, start_lat, start_lng, end_lat, end_lng
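
The cleaning steps above can be sketched as one pipeline. The rows are made up, and only two of the unnecessary columns are included to keep the example short.

```r
library(dplyr)

# Made-up rides: one duplicate, one zero-length ride, one row with an NA
rides <- data.frame(
  ride_id = c("A1", "A1", "A2", "A3", "A4"),
  start_station_id = c(1, 1, 2, NA, 4),
  member_casual = c("member", "member", "casual", "casual", "member"),
  ride_length = c(12.5, 12.5, 0, 8, 30)
)

cleaned <- rides %>%
  distinct() %>%                            # remove duplicate rows
  na.omit() %>%                             # remove rows with NA values
  filter(ride_length > 0) %>%               # keep positive ride lengths only
  select(-c(ride_id, start_station_id))     # drop unnecessary columns
```

Dropping the columns last means an NA in a column that is about to be removed still removes the row, which matches the order of the steps listed above.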

Calculated Total Rides for:

Total number of rides which was just the row count = 4,152,139

Member type - casual riders vs. annual members

Type of Bike - classic vs docked vs electric; separated by member type and total rides for each bike type

Hour - separated by member type and total rides for each hour in a day

Time of Day - separated by member type and total rides for each time of day (morning, afternoon, evening, night)

Day of the Week - separated by member type and total rides for each day of the week

Day of the Month - separated by member type and total rides for each day of the month

Month - separated by member type and total rides for each month

Season - separated by member type and total rides for each season (spring, summer, fall, winter)

Calculated Average Ride Length for:

Total average ride length

Type of Bike - separated by member type and average ride length for each bike type

Hour - separated by member type and average ride length for each hour in a day

Time of Day - separated by member type and average ride length for each time of day (morning, afternoon, evening, night)

Day of the Week - separated by member type and average ride length for each day of the week

Day of the Month - separated by member type and average ride length for each day of the month

Month - separated by member type and average ride length for each month

Season - separated by member type and average ride lengths for each season (spring, summer, fall, winter)

Then using all of this data I created my own summary in my case notes and took note of the: total rides for each variable, average ride lengths for each variable, and the difference between members versus casual riders. I originally wanted to create a report using R Markdown as well but for the sake of time (I had already spent over 20 hours on the project so far), I decided to skip this step, and write this article instead.

1045 minutes or about 17 and a half hours to complete.

While I learned the basics of Tableau in the Google Course I wanted more practice with visualizing data and creating dashboards.

To view my completed dashboard click here .

I created a separate R code (you can view it here on Github) that made some changes for specifically the Tableau portion.

For ride length I rounded to 1 decimal place, meaning my numbers looked like 29.8 or 12.5.

Revised how I created my "month" column. I used mutate() to create a column that had the month in ___ format and not number format. So instead of 01 it would say "January"

Cleaned the data: removed rows with NA values, removed duplicate rows, removed rows where ride_length was 0 or negative and removed unnecessary columns like: ride_id, start_station_id, end_station_id, start_lat, start_lng, end_lat, end_lng

Created a new dataframe with this information so I could test the difference between the original data frame (cyclistic_date) that I used for my analysis and the data frame I would use for Tableau (cyclistic_tableau).

In this new data frame I removed more columns to make calculations quicker in Tableau. I removed: start_station_name, end_station_name, time, started_at, ended_at

Downloaded this data frame into a .csv file which I uploaded to Tableau

Created graphs similar to those I created in Excel but added a few:

Total Rides by Bike Type

Ride Length by Weekday

Total Rides by Weekday

Total rides by hour, total rides by month.

Then I created a basic dashboard with all of that information, a prototype for me to view while I was creating the final dashboard ( Figure 1 below).

Created a prototype mockup in Figma, added in values like average ride length, busiest month, season, hour, time of day, and most used bike type

Created a final version of the mockup in Figma

Edited Dashboard in Tableau to reflect design in Figma

Edited graphs in Tableau

Made bar graphs round

Added annotations

Highlights to specific important notes

Got rid of labels for visual purposes

Combined Figma and Tableau (used dashboard created in Figma as the background for my Tableau Dashboard) to create a final prototype ( Figure 2 below)

Made minor edits to design elements and created final dashboard (see Finished Project below)

765 minutes or almost 13 hours to complete.

Prototype of my dashboard for my google capstone project

I am including the other tools I used.

Figma to create my background and help develop the dashboard aesthetics.

Google Docs helped me keep track of all of my documents for this project like:

Date Log - I wrote down what I did that day related to my project

Resources - A list of resources I frequently used

Case Notes - Notes for the case study including the final insights, what I was looking for, and anything else having to do with the case

Evernote to draft this article before I uploaded it here.

FINISHED PROJECT

Here is my finished project: Google Capstone Project | Cyclistic . You can view the links to my R code on Github used for analysis here and the code for Tableau here .

Final dashboard for capstone project

SUMMARY OF DATA

Those who purchase single-ride or full-day passes are referred to as casual riders while those who purchase annual memberships are Cyclistic members .

Total Rides by User Type

Average Ride Length per User Type

Average Ride per Weekday

Members had more rides with 2,328,763 total rides or 56% and casual riders had 1,823,376 total rides or 44%.
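
As a quick check, the shares can be recomputed from the two ride counts quoted above. This is a sketch of the arithmetic only, and the variable names are my own.

```r
# Ride counts quoted in the summary
member_rides <- 2328763
casual_rides <- 1823376

# The two groups together make up the full row count
total_rides <- member_rides + casual_rides

# Percentage share of each rider type, rounded to whole percent
member_share <- round(100 * member_rides / total_rides)
casual_share <- round(100 * casual_rides / total_rides)
```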

Total Rides by Rider Type Pie chart

Total Rides per Bike Type

Both casual riders and members used the classic bike the most with 1,777,593 rides or 43% of total rides, followed by docked bikes with 1,545,936 rides or 37% of total rides, and lastly with electric bikes at 828,610 rides or 20% of total rides.

Total Rides per Bike Type - bar chart

Average Ride Length by User Type

The total average ride length was 24 minutes. For casual riders it was longer at 37 minutes, while for members it was 14 minutes.

Average ride length by rider type

Average Ride Length per Weekday

For the average ride length per weekday both casual riders and members had an increase in the average ride length on the weekends. For both Sunday was the longest at 31 minutes.

average ride length per weekday - bar chart

Saturday was the most popular weekday combining casual riders and member rides with 784,239 rides or 19% of total rides. But for member rides only Wednesday was the most popular day with 356,060 rides, 5,407 rides more than Saturday.

Total rides by weekday - bar chart

5PM (17:00) was the busiest hour for both members and casual riders, with 426,685 rides or 10% of total rides. Rides typically began increasing at 6AM and rose until 5PM, then dropped afterwards. The afternoon was the busiest part of the day for both rider types, with 1,905,797 rides or 46% of total rides. 4AM was the least popular hour.

Total rides by hour

July was the busiest month for casual riders and members combined, with 691,476 rides or 17% of total rides, and summer was the most popular season for both, with 1,903,446 rides or 46% of total rides. Looking at members only, August was actually the busiest month with 323,140 rides, 816 more than July. Winter was the least popular season and February the least popular month.

Total bike rides per month - bar chart

Final Summary

The most popular bike among riders was the classic.

Busiest time was afternoon and the peak time was at 5PM for both casual riders and members.

Busiest weekday was Saturday, and casual riders used the service the most on the weekends.

Busiest season was Summer for both types of riders.

Members took the most rides by user type, but casual riders weren't far behind.

The average ride length was 24 minutes but casual riders on average rode 23 minutes longer than members.
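As a quick consistency check on the last point, the overall average should equal the ride-weighted average of the two group averages. The sketch below is in Python and uses only figures stated in this summary: the ride counts, the 14-minute member average and the 23-minute gap. The variable names are illustrative assumptions, and because the inputs are rounded to whole minutes the result is approximate.

```python
# Figures taken from the summary: ride counts per rider type, the member
# average (14 minutes) and the stated gap between casual riders and members.
member_rides = 2_328_763
casual_rides = 1_823_376
member_avg_min = 14
gap_min = 23  # casual riders rode this much longer on average

# Casual average implied by the member average plus the gap.
casual_avg_min = member_avg_min + gap_min

# Overall mean = ride-weighted mean of the two group means.
overall_avg_min = (
    member_rides * member_avg_min + casual_rides * casual_avg_min
) / (member_rides + casual_rides)

print(casual_avg_min, round(overall_avg_min))
```

The weighted mean lands on roughly 24 minutes, which matches the overall figure quoted above.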

BUSINESS SUGGESTIONS

This was the hardest part for me for the whole project. I have never provided suggestions for a business nor worked in marketing. Any feedback here would be appreciated.

These are my suggestions for the marketing team to convert casual riders to annual members:

Personalize discounts and show perks in the membership program based on their preferences and riding habits.

Emphasize the benefits of memberships, including discounts during busy times of the year like during Summer, or on the weekends.

Have existing members share stories about how using Cyclistic's system has changed their lives to create a sense of community, and offer them a discount for doing so. This will help encourage new riders to join the program.

WHAT I LEARNED

Below is what I learned/practiced from over 40 hours spent on this project:

Pivot Tables in Microsoft Excel

Using R for data cleaning and analysis, specifically with the tidyverse package

Building graphs in Tableau, including editing visual elements and creating different charts and filters

Design elements of an effective dashboard

Combining the design feature of Figma with the functionality of Tableau

For the R portion of my project, I found Itamar's case study on Kaggle, which also uses R, to be a helpful resource.

For the Tableau portion, I used Navneet Singh's Tableau Dashboard as inspiration.

Google Data Analytics Professional Certificate Course 8: Capstone – quiz answers

Coursera Google Data Analytics Professional Certificate Course 8 – Google Data Analytics Capstone: Complete a Case Study quiz answers (weeks 1 – 4):

Course 8 – Google Data Analytics Capstone: Complete a Case Study

This course is the eighth course in the Google Data Analytics Certificate. You’ll have the opportunity to complete an optional case study, which will help prepare you for the data analytics job hunt. Case studies are commonly used by employers to assess analytical skills. For your case study, you’ll choose an analytics-based scenario. You’ll then ask questions, prepare, process, analyze, visualize and act on the data from the scenario. You’ll also learn other useful job hunt skills through videos with common interview questions and responses, helpful materials to build a portfolio online, and more. Current Google data analysts will continue to instruct and provide you with hands-on ways to accomplish common data analyst tasks with the best tools and resources.

Course content

Professional case studies

Fill in the blank: A _ is a collection of case studies that you can share with potential employers.

A portfolio is a collection of case studies that you can share with potential employers. A capstone is a final project that brings everything you’ve learned together.

Which of the following are important strategies when completing a case study? Select all that apply.

When completing a case study, it’s important to answer the question being asked. It’s also important to communicate the steps you’ve taken to reach your conclusion and the assumptions you made about the data.

To successfully complete a case study, your answer to the question the case study asks has to be perfect.

To successfully complete a case study, your answer to the question the case study asks does not have to be perfect. It’s more important to show off your thought process so that the interviewers can understand how you approach the problem.

Which of the following are qualities of the best portfolios for a junior data analyst? Select all that apply.

The best portfolios are personal, unique, and simple. Your portfolio’s a chance to show people who you are and what you’re interested in. You want to keep your portfolio pretty simple, and focus on your skills as a data analyst.

Which of the following are places where you can store and share your portfolio? Select all that apply.

Portfolios can be stored and shared on public websites, including Github, Kaggle and Tableau, or on your personal website.

Case Study 1: How Does a Bike-Share Navigate Speedy Success?

Case Study 2: How Can a Wellness Technology Company Play It Smart?

Case Study 3: Follow Your Own Case Study Path

Your portfolio and case study checklist

Google Data Analytics Capstone: Complete a Case Study

About this Course

Learners who complete this certificate program will be equipped to apply for introductory-level jobs as data analysts. No previous experience is necessary. By the end of this course, you will:

Learn the benefits and uses of case studies and portfolios in the job search.

Explore real world job interview scenarios and common interview questions.

Discover how case studies can be a part of the job interview process.

Examine and consider different case study scenarios.

Have the chance to complete your own case study for your portfolio.

What you will learn

Differentiate between a capstone, case study, and a portfolio

Identify the key features and attributes of a completed case study

Apply the practices and procedures associated with the data analysis process to a given set of data

Discuss the use of case studies/portfolios when communicating with recruiters and potential employers

Grow with Google is an initiative that draws on Google's decades-long history of building products, platforms, and services that help people and businesses grow. We aim to help everyone – those who make up the workforce of today and the students who will drive the workforce of tomorrow – access the best of Google’s training and tools to grow their skills, careers, and businesses.

Syllabus - What you will learn from this course

Learn about capstone basics.

A capstone is a crowning achievement. In this part of the course, you’ll be introduced to capstone projects, case studies, and portfolios, as well as how they help employers better understand your skills and capabilities. You’ll also have an opportunity to explore online portfolios of real data analysts.

Optional: Building your portfolio

In this part of the course, you’ll get an overview of two possible tracks to complete your case study. You can use a dataset from one of the business cases provided or search for a public dataset and develop a business case for an area of personal interest. In addition, you'll be introduced to several platforms for hosting your completed case study.

Optional: Using your portfolio

Your portfolio is meant to be seen and explored. In this part of the course, you’ll learn how to discuss your portfolio and highlight specific skills in interview scenarios. You’ll also create and practice an elevator pitch for your case study. Finally, you’ll discover how to position yourself as a top applicant for data analyst jobs with useful and practical interview tips.

Putting your certificate to work

Earning your Google Data Analytics Certificate is a badge of honor. It's also a real badge. In this part of the course, you'll learn how to claim your certificate badge and display it in your LinkedIn profile. You'll also be introduced to job search benefits that you can claim as a certificate holder, including access to the Big Interview platform and Byteboard interviews.

TOP REVIEWS FROM GOOGLE DATA ANALYTICS CAPSTONE: COMPLETE A CASE STUDY

It is a great starting point toward a career in Analytics. For people with no IT background, this will be their first project & the topics. The simplicity of the project also adds to its credit.

I think it's a great course that teaches you a lot. My favorite part was the coding exercises and the Case Study. Thanks a lot to all of the Teachers, Mentors and Developers of this course :)

I found a new passion in data analytics. I already signed up for a data analytics boot camp to further develop my data analytics team. Thank you to the amazing Google team that taught the courses.

Helpful course to get one started in their data analytics career. I do recommend other training along with this course for a complete understanding of the data analytics career field.

Frequently Asked Questions

When will I have access to the lectures and assignments?

Access to lectures and assignments depends on your type of enrollment. If you take a course in audit mode, you will be able to see most course materials for free. To access graded assignments and to earn a Certificate, you will need to purchase the Certificate experience, during or after your audit. If you don't see the audit option:

The course may not offer an audit option. You can try a Free Trial instead, or apply for Financial Aid.

The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

What will I get if I subscribe to this Certificate?

When you enroll in the course, you get access to all of the courses in the Certificate, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

What is data analytics?

Data is a group of facts that can take many different forms, such as numbers, pictures, words, videos, observations, and more. We use and create data every day, like when we stream a show or song or post on social media.

Data analytics is the collection, transformation, and organization of these facts to draw conclusions, make predictions, and drive informed decision-making.

Why start a career in data analytics?

The amount of data created each day is tremendous. Any time you use your phone, look up something online, stream music, shop with a credit card, post on social media, or use GPS to map a route, you’re creating data. Companies must continually adjust their products, services, tools, and business strategies to meet consumer demand and react to emerging trends. Because of this, data analyst roles are in demand and competitively paid.

Data analysts make sense of data and numbers to help organizations make better business decisions. They prepare, process, analyze, and visualize data, discovering patterns and trends and answering key questions along the way. Their work empowers their wider team to make better business decisions.

Why enroll in the Google Data Analytics Certificate?

You will learn the skill set required for becoming a junior or associate data analyst in the Google Data Analytics Certificate. Data analysts know how to ask the right question; prepare, process, and analyze data for key insights; effectively share their findings with stakeholders; and provide data-driven recommendations for thoughtful action.

You’ll learn these job-ready skills in our certificate program through interactive content (discussion prompts, quizzes, and activities) in under six months, with under 10 hours of flexible study a week. Along the way, you'll work through a curriculum designed with input from top employers and industry leaders, like Tableau, Accenture, and Deloitte. You’ll even have the opportunity to complete a case study that you can share with potential employers to showcase your new skill set.

After you’ve graduated from the program, you’ll have access to career resources and be connected directly with employers hiring for open entry-level roles in data analytics.

What background is required?

No prior experience with spreadsheets or data analytics is required. All you need is high-school level math and a curiosity about how things work.

Do you need to be strong at math to succeed in this certificate?

You don't need to be a math all-star to succeed in this certificate. You need to be curious and open to learning with numbers (the language of data analysts). Being a strong data analyst is more than just math, it's about asking the right questions, finding the best sources to answer your questions effectively, and illustrating your findings clearly in visualizations.

What tools and platforms are taught in the curriculum?

You'll learn to use analysis tools and platforms such as spreadsheets (Google Sheets or Microsoft Excel), SQL, presentation tools (Powerpoint or Google Slides), Tableau, RStudio, and Kaggle.

Which “spreadsheet” platform is being taught?

Learners can self-select which platform they want to use throughout the program: Google Sheets or Microsoft Excel. It’s up to the learner’s preference, and all activities throughout the syllabus can be performed on either platform.

Why would I choose to complete the optional capstone project in this certificate?

In the data analyst job hunt, it’s important to demonstrate that you’re able to ask the right questions and that you have the right skills to find the answers. Hiring managers often want proof that you can apply concepts in a meaningful way. Because of this, during the job application process, many employers ask for a link to a portfolio. Our optional capstone project will help learners produce meaningful artifacts for employers to reference during the job interview process. Learners will be encouraged to post these on a public Kaggle portfolio or on GitHub.

Do you need to take each course in course order?

We highly recommend completing the courses in the order presented because the content in each course builds on information from earlier lessons.
