Google Business Intelligence Capstone Project: Google Fiber Scenario

Daniel Poe 2025-06-17

This is a capstone project I completed as part of the Google Business Intelligence Professional Certificate. The programme builds on the foundations of the Google Data Analytics Certificate and strengthens technical skills in tools such as BigQuery, Tableau, and SQL.

Introduction

I am interviewing for a job with Google Fiber (also known as GFiber), which provides people and businesses with fibre optic internet. As part of the interview process, the Fiber customer service team has asked me to design a dashboard using fictional data. The position I am interviewing for is in the customer call centre, where GFiber uses business intelligence to monitor and improve customer satisfaction. To demonstrate both BI value and organisational data maturity to the interviewers, I will draw on my knowledge of the three BI stages: Capture, Analyse, and Monitor.

The team needs to understand how often customers phone customer support again after their first inquiry; this will help leaders understand whether the team is able to answer customer questions the first time. Further, leaders want to explore trends in repeat calls to identify why customers are having to call more than once, as well as how to improve the overall customer experience. I will create a dashboard to reveal insights about repeat callers.

The team’s ultimate goal is to reduce call volume by increasing customer satisfaction and improving operational efficiency. My dashboard should demonstrate an understanding of this goal and give stakeholders insights into repeat-caller volumes across different markets and the types of problems those calls represent.

Characters

Stakeholders and Team Members

My primary contacts are Emma and Keith.

Business Task

To create a dashboard that reveals insights about repeat callers across three different market cities.

Stakeholder Questions

Relevant Business Intelligence Documents

The relevant documents are listed below and can be viewed by clicking on the links:

About the Dataset

This fictional dataset is a version of actual data the team works with. Because of this, the data is already anonymised and approved. It includes:

In order to anonymise and fictionalise the data, the datasets use the columns market_1, market_2, and market_3 to indicate the three different city service areas the data represents.

The data also lists five problem types:

- type_1: account_management
- type_2: technician_troubleshooting
- type_3: scheduling
- type_4: construction
- type_5: internet_and_wifi

Additionally, the dataset records repeat calls over seven-day periods. The number of initial contacts on a given date is recorded in contacts_n. The remaining call columns follow the pattern contacts_n_<number of days since first call>; for example, contacts_n_6 records calls made six days after first contact.
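
To make the naming scheme concrete, here is a short Python sketch (purely illustrative, not part of the original analysis) that generates the full set of call columns the convention implies:

# The call-count columns implied by the contacts_n_<days> naming scheme:
# contacts_n for the first contact, then offsets of 1-7 days.
call_columns = ["contacts_n"] + [f"contacts_n_{days}" for days in range(1, 8)]
print(call_columns)
## ['contacts_n', 'contacts_n_1', 'contacts_n_2', 'contacts_n_3', 'contacts_n_4', 'contacts_n_5', 'contacts_n_6', 'contacts_n_7']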

Preparing the Data for Dashboarding

Combining the Datasets with Google BigQuery

Having received the datasets from Google Fiber, I used Google BigQuery to survey them.

SELECT * FROM `mimetic-retina-458409-c3.google_fiber_market_data.market_1` LIMIT 10;
date_created contacts_n contacts_n_1 contacts_n_2 contacts_n_3 contacts_n_4 contacts_n_5 contacts_n_6 contacts_n_7 new_type new_market
2022-02-04 199 21 6 11 7 14 5 6 type_5 market_1
2022-01-30 19 2 0 2 1 0 0 0 type_1 market_1
2022-02-14 29 0 2 2 0 1 0 1 type_1 market_1
2022-01-16 120 6 6 5 4 7 4 NA type_2 market_1
2022-02-03 182 27 13 NA 14 4 3 2 type_5 market_1
2022-03-08 3 NA NA NA NA NA NA 1 type_4 market_1
2022-03-05 2 1 NA 1 NA NA NA NA type_4 market_1
2022-02-01 262 14 8 7 3 2 3 8 type_2 market_1
2022-01-04 15 2 3 1 NA NA 1 2 type_3 market_1
2022-02-10 258 11 10 4 7 7 7 5 type_2 market_1


SELECT * FROM `mimetic-retina-458409-c3.google_fiber_market_data.market_2` LIMIT 10;
date_created contacts_n contacts_n_1 contacts_n_2 contacts_n_3 contacts_n_4 contacts_n_5 contacts_n_6 contacts_n_7 new_type new_market
2022-03-16 NA NA NA NA NA 1 NA NA type_4 market_2
2022-02-01 28 1 NA 1 NA NA 2 NA type_2 market_2
2022-01-05 11 2 0 0 0 0 0 0 type_1 market_2
2022-03-30 38 3 2 NA NA NA NA 1 type_5 market_2
2022-01-24 1 0 0 0 0 0 0 0 type_1 market_2
2022-02-27 NA NA NA NA NA NA NA NA type_4 market_2
2022-03-28 29 NA 2 NA 1 NA NA 1 type_2 market_2
2022-01-10 NA NA NA NA NA NA NA NA type_4 market_2
2022-01-03 5 NA NA NA NA NA NA NA type_3 market_2
2022-02-23 29 NA 1 NA 1 NA NA NA type_2 market_2


SELECT * FROM `mimetic-retina-458409-c3.google_fiber_market_data.market_3` LIMIT 10;
date_created contacts_n contacts_n_1 contacts_n_2 contacts_n_3 contacts_n_4 contacts_n_5 contacts_n_6 contacts_n_7 new_type new_market
2022-02-17 9 3 1 NA 2 2 NA 2 type_3 market_3
2022-01-04 8 3 3 NA NA NA NA 1 type_3 market_3
2022-03-16 27 3 2 2 NA NA NA 1 type_2 market_3
2022-01-23 9 NA NA NA NA NA 1 1 type_2 market_3
2022-02-06 2 NA NA NA NA NA NA NA type_4 market_3
2022-03-16 NA NA NA NA NA NA NA NA type_4 market_3
2022-03-13 8 3 5 NA 2 NA NA NA type_3 market_3
2022-02-09 27 1 2 NA 3 1 NA NA type_2 market_3
2022-03-30 33 2 2 1 1 NA NA NA type_2 market_3
2022-02-11 18 5 10 4 5 5 5 3 type_1 market_3


Since all three datasets share the same column names, I used a UNION ALL statement to combine them into a single table. (The LIMIT 10 in the query below is only for previewing the combined result; the full union was exported.)

SELECT * FROM `mimetic-retina-458409-c3.google_fiber_market_data.market_1`
UNION ALL
SELECT * FROM `mimetic-retina-458409-c3.google_fiber_market_data.market_2`
UNION ALL
SELECT * FROM `mimetic-retina-458409-c3.google_fiber_market_data.market_3`
LIMIT 10;
date_created contacts_n contacts_n_1 contacts_n_2 contacts_n_3 contacts_n_4 contacts_n_5 contacts_n_6 contacts_n_7 new_type new_market
2022-02-04 199 21 6 11 7 14 5 6 type_5 market_1
2022-01-30 19 2 0 2 1 0 0 0 type_1 market_1
2022-02-14 29 0 2 2 0 1 0 1 type_1 market_1
2022-01-16 120 6 6 5 4 7 4 NA type_2 market_1
2022-02-03 182 27 13 NA 14 4 3 2 type_5 market_1
2022-03-08 3 NA NA NA NA NA NA 1 type_4 market_1
2022-03-05 2 1 NA 1 NA NA NA NA type_4 market_1
2022-02-01 262 14 8 7 3 2 3 8 type_2 market_1
2022-01-04 15 2 3 1 NA NA 1 2 type_3 market_1
2022-02-10 258 11 10 4 7 7 7 5 type_2 market_1


I then exported the combined dataset as a CSV file named google_fiber_market_data.csv. The dataset is now ready for cleaning before proceeding with any form of visualisation.
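
The combination and export can also be scripted end-to-end. Below is a minimal sketch using the google-cloud-bigquery Python client; it assumes the google-cloud-bigquery and db-dtypes packages are installed and application default credentials are configured, and it mirrors the UNION ALL query above without the preview LIMIT.

from google.cloud import bigquery

# Run the full union (no LIMIT) and download the result as a CSV file.
client = bigquery.Client(project="mimetic-retina-458409-c3")

query = """
SELECT * FROM `mimetic-retina-458409-c3.google_fiber_market_data.market_1`
UNION ALL
SELECT * FROM `mimetic-retina-458409-c3.google_fiber_market_data.market_2`
UNION ALL
SELECT * FROM `mimetic-retina-458409-c3.google_fiber_market_data.market_3`
"""

combined = client.query(query).to_dataframe()
combined.to_csv("google_fiber_market_data.csv", index=False)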

Data Cleaning with Python

At the time of writing, I have been working through the Google Advanced Data Analytics Professional Certificate programme, where I have been learning new Python skills. I decided to put those skills to use in preparing the data for visualisation in Tableau, using the pandas library for data cleaning.

import pandas as pd
# Importing the dataset exported from Google BigQuery
google_fiber_data=pd.read_csv('google_fiber_market_data.csv')
google_fiber_data.head()
##   date_created  contacts_n  contacts_n_1  ...  contacts_n_7  new_type  new_market
## 0   2022-02-03        58.0          20.0  ...           1.0    type_5    market_3
## 1   2022-02-22       227.0          69.0  ...           1.0    type_5    market_3
## 2   2022-03-04       181.0          44.0  ...           6.0    type_5    market_3
## 3   2022-02-14       373.0          63.0  ...           1.0    type_5    market_3
## 4   2022-02-15       158.0          17.0  ...           4.0    type_5    market_3
## 
## [5 rows x 11 columns]
# Obtaining the shape of the dataset
google_fiber_data.shape
## (1350, 11)

There are 1,350 rows and 11 columns in the combined dataset.

# Obtaining the data types of each column
google_fiber_data.dtypes
## date_created     object
## contacts_n      float64
## contacts_n_1    float64
## contacts_n_2    float64
## contacts_n_3    float64
## contacts_n_4    float64
## contacts_n_5    float64
## contacts_n_6    float64
## contacts_n_7    float64
## new_type         object
## new_market       object
## dtype: object

The date_created column has the wrong datatype: it is stored as object (a string) rather than as a date. I will convert it to datetime using the pd.to_datetime function.

# Changing the data type of 'date_created' to datetime
google_fiber_data["date_created"]=pd.to_datetime(google_fiber_data["date_created"])
google_fiber_data.head()
##   date_created  contacts_n  contacts_n_1  ...  contacts_n_7  new_type  new_market
## 0   2022-02-03        58.0          20.0  ...           1.0    type_5    market_3
## 1   2022-02-22       227.0          69.0  ...           1.0    type_5    market_3
## 2   2022-03-04       181.0          44.0  ...           6.0    type_5    market_3
## 3   2022-02-14       373.0          63.0  ...           1.0    type_5    market_3
## 4   2022-02-15       158.0          17.0  ...           4.0    type_5    market_3
## 
## [5 rows x 11 columns]
# Confirming the data type change
google_fiber_data.dtypes
## date_created    datetime64[ns]
## contacts_n             float64
## contacts_n_1           float64
## contacts_n_2           float64
## contacts_n_3           float64
## contacts_n_4           float64
## contacts_n_5           float64
## contacts_n_6           float64
## contacts_n_7           float64
## new_type                object
## new_market              object
## dtype: object

Now I will check the number of null values in the dataset using the .isna() and .sum() methods.

# Checking for null values in the dataset
google_fiber_data.isna().sum()
## date_created      0
## contacts_n      182
## contacts_n_1    458
## contacts_n_2    529
## contacts_n_3    572
## contacts_n_4    611
## contacts_n_5    631
## contacts_n_6    634
## contacts_n_7    617
## new_type          0
## new_market        0
## dtype: int64

In this dataset, a null value indicates that no calls were recorded on that day, so I will replace the null values with 0 using the .fillna() function.

# Replace NaN values with 0
google_fiber_data.fillna(value=0, inplace=True)
google_fiber_data
##      date_created  contacts_n  contacts_n_1  ...  contacts_n_7  new_type  new_market
## 0      2022-02-03        58.0          20.0  ...           1.0    type_5    market_3
## 1      2022-02-22       227.0          69.0  ...           1.0    type_5    market_3
## 2      2022-03-04       181.0          44.0  ...           6.0    type_5    market_3
## 3      2022-02-14       373.0          63.0  ...           1.0    type_5    market_3
## 4      2022-02-15       158.0          17.0  ...           4.0    type_5    market_3
## ...           ...         ...           ...  ...           ...       ...         ...
## 1345   2022-01-23         1.0           0.0  ...           0.0    type_4    market_1
## 1346   2022-01-10         1.0           0.0  ...           0.0    type_4    market_1
## 1347   2022-03-29         3.0           0.0  ...           0.0    type_4    market_1
## 1348   2022-02-16         3.0           0.0  ...           0.0    type_4    market_1
## 1349   2022-01-02       141.0          17.0  ...           4.0    type_5    market_1
## 
## [1350 rows x 11 columns]
# Verifying that there are no more null values
google_fiber_data.isna().sum()
## date_created    0
## contacts_n      0
## contacts_n_1    0
## contacts_n_2    0
## contacts_n_3    0
## contacts_n_4    0
## contacts_n_5    0
## contacts_n_6    0
## contacts_n_7    0
## new_type        0
## new_market      0
## dtype: int64

The original dataset stored the call-count columns as integers, but after importing the .csv file the datatype was found to be float, as observed earlier (pandas promotes integer columns to float when they contain NaN values). Now that the nulls are filled, I can convert the columns back to integer using the .astype(int) function.

# Changing the data type of the contacts columns to integer
contact_cols = ["contacts_n"] + [f"contacts_n_{i}" for i in range(1, 8)]
google_fiber_data[contact_cols] = google_fiber_data[contact_cols].astype(int)

google_fiber_data.head()
##   date_created  contacts_n  contacts_n_1  ...  contacts_n_7  new_type  new_market
## 0   2022-02-03          58            20  ...             1    type_5    market_3
## 1   2022-02-22         227            69  ...             1    type_5    market_3
## 2   2022-03-04         181            44  ...             6    type_5    market_3
## 3   2022-02-14         373            63  ...             1    type_5    market_3
## 4   2022-02-15         158            17  ...             4    type_5    market_3
## 
## [5 rows x 11 columns]
# Confirming the data type change
google_fiber_data.dtypes
## date_created    datetime64[ns]
## contacts_n               int64
## contacts_n_1             int64
## contacts_n_2             int64
## contacts_n_3             int64
## contacts_n_4             int64
## contacts_n_5             int64
## contacts_n_6             int64
## contacts_n_7             int64
## new_type                object
## new_market              object
## dtype: object

Now that all the datatypes are correct, I will replace the coded problem types in the new_type column with their descriptive names, using the .replace() function. The mappings are shown in the code below:

# Replacing problem types with the exact type names
google_fiber_data=google_fiber_data.replace({"new_type":{"type_1":"account_management",
                                       "type_2":"technician_troubleshooting",
                                       "type_3":"scheduling",
                                       "type_4":"construction",
                                       "type_5":"internet_and_wifi"}})
google_fiber_data
##      date_created  contacts_n  ...           new_type  new_market
## 0      2022-02-03          58  ...  internet_and_wifi    market_3
## 1      2022-02-22         227  ...  internet_and_wifi    market_3
## 2      2022-03-04         181  ...  internet_and_wifi    market_3
## 3      2022-02-14         373  ...  internet_and_wifi    market_3
## 4      2022-02-15         158  ...  internet_and_wifi    market_3
## ...           ...         ...  ...                ...         ...
## 1345   2022-01-23           1  ...       construction    market_1
## 1346   2022-01-10           1  ...       construction    market_1
## 1347   2022-03-29           3  ...       construction    market_1
## 1348   2022-02-16           3  ...       construction    market_1
## 1349   2022-01-02         141  ...  internet_and_wifi    market_1
## 
## [1350 rows x 11 columns]

Given that the column contacts_n records calls on the first day of contact, while contacts_n_1, contacts_n_2, and so on record calls made that many days after the first contact, I renamed the columns for greater clarity using the .rename() function.

#Rename columns for clarity
google_fiber_data=google_fiber_data.rename(columns={
    "contacts_n":"call_day_1",
    "contacts_n_1":"call_day_2",
    "contacts_n_2":"call_day_3",
    "contacts_n_3":"call_day_4",
    "contacts_n_4":"call_day_5",
    "contacts_n_5":"call_day_6",
    "contacts_n_6":"call_day_7",
    "contacts_n_7":"call_day_8",
    "new_type":"problem_type",
    "new_market":"market_area"})
google_fiber_data.head()
##   date_created  call_day_1  ...       problem_type  market_area
## 0   2022-02-03          58  ...  internet_and_wifi     market_3
## 1   2022-02-22         227  ...  internet_and_wifi     market_3
## 2   2022-03-04         181  ...  internet_and_wifi     market_3
## 3   2022-02-14         373  ...  internet_and_wifi     market_3
## 4   2022-02-15         158  ...  internet_and_wifi     market_3
## 
## [5 rows x 11 columns]

Now that the dataset has been cleaned and formatted, it is ready to be used in Tableau for creating the dashboard. I exported it as google_fiber_market_data_final.csv using the .to_csv() function.

# Exporting the cleaned dataset to a new CSV file
google_fiber_data.to_csv('google_fiber_market_data_final.csv', index=False)
# Reading the new CSV file to confirm the export
pd.read_csv('google_fiber_market_data_final.csv')
##      date_created  call_day_1  ...       problem_type  market_area
## 0      2022-02-03          58  ...  internet_and_wifi     market_3
## 1      2022-02-22         227  ...  internet_and_wifi     market_3
## 2      2022-03-04         181  ...  internet_and_wifi     market_3
## 3      2022-02-14         373  ...  internet_and_wifi     market_3
## 4      2022-02-15         158  ...  internet_and_wifi     market_3
## ...           ...         ...  ...                ...          ...
## 1345   2022-01-23           1  ...       construction     market_1
## 1346   2022-01-10           1  ...       construction     market_1
## 1347   2022-03-29           3  ...       construction     market_1
## 1348   2022-02-16           3  ...       construction     market_1
## 1349   2022-01-02         141  ...  internet_and_wifi     market_1
## 
## [1350 rows x 11 columns]

With the data cleaned and prepared, I can proceed to create the dashboard in Tableau.

Low-Fidelity Mockup of the Dashboard

Making the Tableau Dashboard

I uploaded the final dataset into Tableau and created calculated fields for the total number of repeat calls and the total number of calls.

I also created a calculated field called First Call Resolution (FCR). FCR is defined by SQM Group as a metric or key performance indicator (KPI) that measures a call centre’s ability to resolve customer interactions on the first call or contact, eliminating the need for follow-up contacts; in other words, it is the percentage of calls resolved on the first contact.
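
The exact Tableau formulas are not reproduced here, but a rough pandas equivalent illustrates the logic. The sketch below rests on two assumptions of mine, not on the dashboard's actual formulas: that repeat calls are the totals in the call_day_2 through call_day_8 columns, and that FCR can be approximated as the share of first-day calls that generated no follow-up.

# Sketch of the calculated fields in pandas. Assumptions (mine, not the
# dashboard's exact formulas): repeat calls = all calls on days 2-8, and
# FCR ~= 1 - repeat calls / first-day calls.
repeat_cols = [f"call_day_{day}" for day in range(2, 9)]

total_first_calls = google_fiber_data["call_day_1"].sum()
total_repeat_calls = google_fiber_data[repeat_cols].sum().sum()
total_calls = total_first_calls + total_repeat_calls

fcr = 1 - (total_repeat_calls / total_first_calls)
print(f"First Call Resolution: {fcr:.2%}")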

The interactive Tableau dashboard can be found here. I have included filters by year, quarter, month, and week, as requested by the stakeholders, along with additional filters by weekday, market city, and problem type. The FCR metric, first calls, and repeat calls update automatically as filters are applied.
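
Tableau generates the year, quarter, month, and week filters from its built-in date hierarchy. Purely for reference (this is a sketch, not part of the actual dashboard), the equivalent fields could be derived in pandas from date_created:

# Deriving the filter fields from date_created (Tableau produces these
# automatically from its date hierarchy; shown here only for reference).
google_fiber_data["year"] = google_fiber_data["date_created"].dt.year
google_fiber_data["quarter"] = google_fiber_data["date_created"].dt.quarter
google_fiber_data["month"] = google_fiber_data["date_created"].dt.month_name()
google_fiber_data["week"] = google_fiber_data["date_created"].dt.isocalendar().week
google_fiber_data["weekday"] = google_fiber_data["date_created"].dt.day_name()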

Analysis and Observations

First Call Resolution (FCR)

As mentioned earlier, FCR is a metric or KPI that measures a call centre’s ability to resolve customer interactions on the first call or contact, without the need for follow-up; it is expressed as the percentage of calls resolved on the first contact.

According to SQM Group, although FCR benchmarks vary across industries, the typical industry standard for a good FCR rate is 70-79%. This indicates efficient processes, quality agent training, and effective problem-solving strategies that resolve most issues on the first call. A world-class FCR rate is above 80%; this is difficult to achieve, but brings significant rewards in customer satisfaction, loyalty, and reduced operational costs.

The following is the FCR table for each market and problem type. Observations: based on the three months of data provided by GFiber, the overall FCR is around 76.24%, which initially indicates that the GFiber call centres are performing well. However, breaking the figure down by problem type and market city reveals further insights:

Market City

Observations: Market City 1 received the greatest number of calls within the quarter, followed by Market Cities 3 and 2.

Observations: Throughout the quarter, the greatest number of repeat calls came from Market City 1. Market City 3 consistently received the fewest repeat calls.

Problem Types

Observations: Across the three market cities, the two most significant issues repeat callers raised during the quarter were internet and Wi-Fi, and technician troubleshooting.

Executive Summary

The executive summary for this capstone project can be found here.

Key Findings and Recommendations

The key findings of this project are as follows:

Based on these key findings, I would like to make the following recommendations: