Data Analytics with Google Cloud BigQuery and Looker Studio
1 Data Analytics with Google Cloud BigQuery and Looker Studio Certification
2 Introduction
This report summarizes what I learned from the LinkedIn Learning course Data Analytics with Google Cloud BigQuery and Looker Studio.
The main goal of the course was to understand how to:
Analyze large datasets using SQL in BigQuery
Create visualizations in Looker Studio
Use both tools together to tell a clear story with data
The course was structured around a real-world scenario where I had to analyze how global temperatures have changed over time and present those findings visually.
3 Course Setup and Big Picture
At the beginning of the course, the instructor framed everything around a practical question:
To answer this, we needed to: 1. Find and explore data
2. Analyze it using SQL
3. Visualize it using charts
4. Communicate the results clearly
The two main tools used were:
BigQuery
Used for data analysis and SQL queries
Looker Studio
Used for creating visualizations and dashboards
4 Understanding BigQuery
4.1 What BigQuery Is
BigQuery is a cloud-based data warehouse that allows you to run SQL queries on very large datasets. One thing that stood out to me is that it is fully managed, so you don’t need to worry about infrastructure at all. You can just focus on writing your queries and analyzing the data.
Another important point is that BigQuery is column-based, which affects performance and cost. When you run a query, you are charged based on the amount of data processed, so it’s important to write efficient queries.
Because of this: - Using SELECT * can be expensive.
- It is better to select only the columns needed for analysis.
4.2 Accessing Public Datasets
Instead of uploading our own dataset, we worked with a public dataset available directly in BigQuery. This is important because it allows us to practice analyzing real-world data without needing to collect or clean it ourselves first.
The dataset used in this course was:
- NOAA GSOD (Global Surface Summary of the Day)
This dataset contains daily weather observations collected from over 9,000 stations around the world. It includes a large amount of historical data, covering the period from 1929 to the present, which makes it ideal for analyzing long-term trends like global temperature change.
4.3 What the Dataset Includes
The GSOD dataset contains several key variables:
- Temperature → average daily temperature (main variable used in analysis)
- Wind speed → useful for weather pattern analysis
- Air pressure → another environmental measurement
Each row in the dataset represents a single daily observation from a specific weather station.
4.4 How the Data is Structured
One important detail is that the dataset is not stored as one large table. Instead, it is split into multiple tables by year, such as:
gsod1929gsod1930gsod1940- etc.
This structure is important because it affects how we query the data later. Instead of querying one table, we often need to query multiple tables at once using wildcards.
4.5 Key Things I Learned
- BigQuery provides access to over 200 public datasets, which makes it a powerful tool for learning and practice.
- Before writing any queries, it is important to:
- Preview the data
- Understand the schema (column names and types)
- The GSOD dataset is large (tens of gigabytes), but BigQuery can still process it quickly.
- Real-world datasets are often not perfectly organized, so understanding structure is a key step before analysis.
4.6 Why This Step Matters
This step helped me understand that data analysis always starts with finding and understanding the data, not just writing queries.
Before doing any calculations or visualizations, I need to: 1. Know what the dataset contains
2. Understand how it is structured
3. Identify which variables are actually useful
This foundation made the later SQL queries and visualizations much easier to understand.
4.7 Running SQL Queries in BigQuery
After exploring the dataset, the next step was to begin writing SQL queries to analyze the data.
4.8 Basic Query Example
This was the first query used to look at temperature data from one year:
SELECT
year,
AVG(temp) AS mean_temp
FROM `bigquery-public-data.noaa_gsod.gsod*`
GROUP BY year
ORDER BY yearThis query does the following: - Selects the year and calculates the average temperature(mean_temp) for each year.
Uses a wildcard (
gsod*) to query all tables that start withgsod, which allows us to analyze data across multiple years.Groups the results by year and orders them chronologically.
4.9 Creating Visualizations in Looker Studio
After analyzing the data in BigQuery, the next step was to visualize the results using Looker Studio.
4.10 Steps I followed:
- Open Looker Studio and create a blank report
- Click Add Data → BigQuery
- Select:
- Project
- Dataset
- Saved view
Saving Queries as Views
After writing queries, we saved them as views in BigQuery.
This is useful because: - It allows the query results to be reused - It makes it easier to connect data to Looker Studio - It avoids rewriting the same query multiple times
This step connects the data analysis stage to the visualization stage.
4.11 Building a Line Chart
- Dimension →
year - Metric →
mean_temp - Chart type → Line chart
- Sort → ascending
4.12 Improvements Made:
- Removed incomplete year (current year)
- Adjusted axis range for better readability
- Added a reference line (average temperature)
4.13 Key Insight:
The line chart clearly showed that global temperature has increased over time, which supports the overall analysis.
4.13.1 Updating Data Sources in Looker Studio
After modifying queries or adding new fields, the data source in Looker Studio needs to be refreshed.
Steps: - Go to Resource → Manage Data Sources
- Click Edit → Reconnect
This ensures that new variables (like decade or temp_delta) are available for visualization.
5 Joining Datasets for Location Data
One important challenge was that the GSOD dataset does not directly include country names in a usable way. Instead, it uses station IDs.
To solve this, we joined the dataset with a reference table:
SELECT
year,
country,
AVG(temp) AS mean_temp
FROM `bigquery-public-data.noaa_gsod.gsod194*` AS gsod
JOIN `bigquery-public-data.noaa_gsod.stations` AS stations
ON gsod.stn = stations.usaf
GROUP BY year, country
ORDER BY year, country5.1 What This Query Does
This query combines the GSOD dataset with the stations reference table to add location information.
- The
JOINconnects the two tables using the station ID
gsod.stn = stations.usaflinks each weather observation to a specific station
- The
countryfield comes from the stations table
5.2 Why This Step Is Important
The original GSOD dataset does not directly include country names in a usable format. Without this join:
- We would only have station IDs
- We would not be able to analyze data by country
- The results would be harder to interpret
By joining the tables, the data becomes much more meaningful and easier to work with.
5.3 How This Connects to the Next Step
After adding country information, we can:
- Group data by country and year
- Compare temperature trends across different regions
- Create visualizations that show differences between countries
This step is essential for building charts in Looker Studio, especially when using:
- Breakdown dimensions (country)
- Geo charts (mapping data by location)
5.4 Key Takeaway
This step showed me that real-world data is often split across multiple tables, and combining data using joins is necessary to perform more meaningful analysis.
5.4.1 Interpreting Visualizations
The visualizations created in Looker Studio were not just for display, but for identifying patterns.
For example: - Line charts showed long-term temperature trends - Multiple lines (by country) allowed comparison across regions - Geo charts highlighted where temperature changes were more significant
This helped turn raw data into meaningful insights.
5.5 Overall Takeaways from the Course
This course showed me that data analytics is not just about writing SQL queries or creating charts. It is a complete process that involves:
- Understanding the data
- Structuring queries correctly
- Transforming raw data into useful metrics
- Visualizing results in a clear way
- Communicating insights effectively
One of the most important things I learned is that tools like BigQuery and Looker Studio work best together. BigQuery handles the data processing, while Looker Studio helps present the results in a way that is easy to understand.
This experience helped me see how data can be used to answer real-world questions, such as how global temperatures have changed over time.