Lab - Interactive Visualizations with Plotly

Author: Parsa Keyvani
Affiliation: Georgetown University

Plotly with R

Plot 1

Question

How has the transistor count (in millions) of CPUs and GPUs changed over time, and how does this reflect Moore's Law?

Code
# Load the packages used throughout the lab
library(tidyverse)   # readr, dplyr, tidyr, purrr
library(lubridate)   # month(), year()
library(tsibble)     # yearmonth(), yearquarter()
library(tidymodels)  # linear_reg(), set_engine(), fit()
library(plotly)

# Importing the dataset
chip_data <- read_csv("/Users/parsakeyvani/Desktop/Adv Data viz/spring-2024-lab5-plotly-keyvanip/data/chip_dataset.csv")

# Data cleaning and manipulation to answer the first question
q1_data <- chip_data %>%
  select(`Release Date`, Type, `Transistors (million)`) %>%
  mutate(`Release Date` = as.Date(`Release Date`, "%m/%d/%y")) %>%
  filter(month(`Release Date`) == 12 & year(`Release Date`) < 2021) %>%
  mutate(Date = yearmonth(`Release Date`)) %>%
  group_by(Date, Type) %>%
  summarise(Transistors = round(mean(as.integer(`Transistors (million)`))), .groups = "drop") %>%
  # Year-over-year growth within each chip type (a value of 2 means the count doubled since the previous year)
  arrange(Type, Date) %>%
  group_by(Type) %>%
  mutate(Growth = round(Transistors / lag(Transistors), 2)) %>%
  ungroup()


# Create the Plotly plot
plot_ly(data = q1_data, x = ~as.Date(Date), y = ~Transistors, type = 'scatter',
        mode = 'lines+markers', color = ~Type, hoverinfo = 'text',
        text = ~paste('Date:', Date, "<br>Growth (if val = 2, matches Moore's Law):", Growth)) %>%
  layout(title = 'Transistors Growth Over Time by Type',
         xaxis = list(title = 'Date'),
         yaxis = list(title = 'Transistors (Million)'))
Figure 1: This graph illustrates Moore's Law, showing that the number of transistors on an integrated circuit has doubled roughly every two years. Hovering over a data point reveals the underlying values, including the 'Growth' metric, which compares each point with the preceding observation for the same chip type.
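One optional way to make the comparison with Moore's Law explicit is to overlay a reference curve that doubles every two years. The sketch below is not part of the original lab; it assumes the q1_data frame built above and anchors the reference curve at the earliest CPU observation, which is an arbitrary choice.

# A minimal, hypothetical sketch: overlay a Moore's Law reference curve
# (transistor count doubling every two years) anchored at the earliest
# CPU observation in q1_data. The anchor point is an assumption.
cpu_ref <- q1_data %>%
  filter(Type == "CPU") %>%
  arrange(Date) %>%
  mutate(RefDate = as.Date(Date),
         years_elapsed = as.numeric(RefDate - min(RefDate)) / 365.25,
         Moore = first(Transistors) * 2 ^ (years_elapsed / 2))

plot_ly(data = q1_data, x = ~as.Date(Date), y = ~Transistors, type = 'scatter',
        mode = 'lines+markers', color = ~Type) %>%
  add_lines(data = cpu_ref, x = ~RefDate, y = ~Moore, inherit = FALSE,
            name = "Moore's Law reference (x2 every 2 years)",
            line = list(dash = 'dash', color = 'black')) %>%
  layout(title = 'Transistors Growth Over Time by Type',
         xaxis = list(title = 'Date'),
         yaxis = list(title = 'Transistors (Million)'))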

Plot 2

Question

How does the base clock speed (Freq, in GHz) correlate with the transistor count for CPUs and GPUs, given the available data?

Code
# Data cleaning and manipulation to answer the second question
q2_data <- chip_data %>%
  select(`Release Date`, Type, `Transistors (million)`, `Freq (GHz)`) %>%
  mutate(`Release Date` = as.Date(`Release Date`, "%m/%d/%y")) %>%
  filter(month(`Release Date`) == 12 & year(`Release Date`) < 2021) %>%
  mutate(Date = yearmonth(`Release Date`)) %>%
  group_by(Date, Type) %>%
  summarise(Transistors = round(mean(as.integer(`Transistors (million)`))),
            Freq = round(mean(as.numeric(`Freq (GHz)`)), 2)) %>%  # keep frequency as fractional GHz
  drop_na()


# Separate data by Type and fit linear models for each
models <- q2_data %>%
  split(.$Type) %>%
  map(~linear_reg() %>% 
        set_engine("lm") %>%
        set_mode("regression") %>%
        fit(Transistors ~ Freq, data = .x))

# Prepare a sequence of Freq values for predictions for each Type
freq_range <- range(q2_data$Freq)
x_range <- seq(from = freq_range[1], to = freq_range[2], length.out = 100)

# Predict using the models for each Type and prepare for plotting
predictions <- map2(models, names(models), ~{
  new_data <- tibble(Freq = x_range)
  predicted <- predict(.x, new_data) %>% 
    bind_cols(new_data, .) %>% 
    mutate(Type = .y)
}) %>% bind_rows()

predictions <- predictions %>%
  mutate(Type = ifelse(Type == "CPU", "Regression Fit CPU", "Regression Fit GPU")) %>%
  mutate(Line_Color = ifelse(Type == "Regression Fit CPU", "blue", "grey"))


# Create the initial scatter plot
fig <- plot_ly(data = q2_data, x = ~Freq, y = ~Transistors, type = 'scatter', mode = 'markers',
               color = ~Type, colors = c("CPU" = "blue", "GPU" = "grey"), alpha = 0.65) %>%
  layout(title = 'Clock Speed (Freq in GHz) Correlation with the<br>Transistor Count for CPUs and GPUs',
         xaxis = list(title = 'Freq (GHz)'),
         yaxis = list(title = 'Transistors (Million)'))

# Add the regression lines for each Type with specified colors
fig <- fig %>%
  add_lines(data = predictions, x = ~Freq, y = ~.pred, line = list(color = ~Line_Color), name = ~Type)

fig
Figure 2: The plot shows the correlation between base clock speed and transistor count for CPUs and GPUs. Both chip types exhibit a positive relationship between frequency and transistor count, but the fitted line for GPUs is steeper, indicating that transistor count rises faster with clock speed for GPUs than for CPUs.
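To quantify the visual impression that the GPU fit rises more steeply, the fitted slopes can be pulled out of the two parsnip models with broom. This is an optional, hedged check rather than part of the original lab; it assumes the models list created above.

# Hypothetical follow-up: extract the Freq slope from each fitted model to
# compare how fast transistor count rises with clock speed per chip type.
library(broom)

map_dfr(models, tidy, .id = "Type") %>%
  filter(term == "Freq") %>%
  select(Type, slope = estimate, std.error)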
Code
# Question 3 (used for Python Plot 1): quarterly average die size by chip type
q3_data <- chip_data %>%
  select(`Release Date`, `Die Size (mm^2)`, Type) %>%
  mutate(`Release Date` = as.Date(`Release Date`, "%m/%d/%y")) %>%
  mutate(Date = yearquarter(`Release Date`)) %>%
  group_by(Date, Type) %>%
  summarise(die_size = round(mean(as.integer(`Die Size (mm^2)`)))) %>%
  drop_na() %>%
  mutate(Date = as.Date(Date))  # convert the yearquarter index back to a Date for plotting
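As a quick optional check (not required by the lab) of the claim made later in Figure 3 that GPU dies are generally larger, the quarterly averages can be summarised per chip type before handing the data to Python. This sketch assumes the q3_data frame built above.

# Hypothetical summary: overall average and maximum of the quarterly mean
# die sizes per chip type, to back up the reading of Figure 3 below.
q3_data %>%
  group_by(Type) %>%
  summarise(mean_die_size = round(mean(die_size)),
            max_die_size  = max(die_size),
            .groups = "drop")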
Code
# Question 4 (used for Python Plot 2): quarterly average TDP and process size per GPU vendor
q4_data <- chip_data %>%
  select(Vendor, `Release Date`, `TDP (W)`, Type, `Process Size (nm)`) %>%
  mutate(`Release Date` = as.Date(`Release Date`, "%m/%d/%y")) %>%
  mutate(Date = yearquarter(`Release Date`)) %>%
  filter(Type == "GPU") %>%
  group_by(Date, Vendor) %>%
  summarise(TDP = round(mean(as.integer(`TDP (W)`))),
            Process_size = round(mean(as.integer(`Process Size (nm)`)))) %>%
  drop_na() %>%
  mutate(Date = as.Date(Date)) %>%  # convert the yearquarter index back to a Date for plotting
  filter(Vendor != "3dfx")
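Before plotting in Python, an optional coverage check (not part of the original lab) shows each vendor's most recent quarter and smallest quarterly average process size in q4_data. This is useful context when reading Figure 4 below, for example because the ATI series ends much earlier than the others.

# Hypothetical coverage check: latest quarter and smallest quarterly average
# process size recorded per vendor, for context when interpreting Figure 4.
q4_data %>%
  group_by(Vendor) %>%
  summarise(latest_quarter = max(Date),
            smallest_process_nm = min(Process_size),
            .groups = "drop") %>%
  arrange(smallest_process_nm)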

Plotly with Python

Plot 1

Question

How do CPUs and GPUs differ in terms of die size and process size for the same time period?

Code
import numpy as np
import pandas as pd
import plotly.express as px

# Moving the cleaned data from R to Python via reticulate
q3_data_py = pd.DataFrame(r.q3_data)

# Creating the line plot
fig = px.line(q3_data_py, x='Date', y='die_size', color='Type', facet_col="Type",
              title='Die Size Over Time (quarterly) by Type',
              labels={'die_size': 'Die Size', 'Date': 'Date'},
              template='plotly_white',
              markers=True) # Adding markers to points on the line for clarity

fig.show()
Figure 3: This graph tracks the quarterly changes in average die size for CPUs and GPUs over roughly two decades. Die sizes fluctuate but trend upward overall, reflecting advances in semiconductor fabrication and manufacturing capability during the observed period. GPUs generally exhibit larger die sizes than CPUs, consistent with their different roles and performance requirements in computing.

Plot 2

Question

Which vendor has shown the most significant improvement in chip technology (specifically process size) over the years, and what are the trends?

Code
# Moving the cleaned data from R to Python via reticulate
q4_data_py = pd.DataFrame(r.q4_data)

# Convert the Date column to datetime type for better plotting
q4_data_py['Date'] = pd.to_datetime(q4_data_py['Date'])

# Create the line plot
fig = px.line(q4_data_py, x='Date', y='Process_size', color='Vendor', 
              facet_col='Vendor', facet_col_wrap=2,
              title='Process Size Over Time by Vendor',
              labels={'Process_size': 'Process Size (nm)', 'Date': 'Date'},
              template='plotly_white',
              hover_data=['Date'],
              markers=True)

fig.show()
Figure 4: This graph displays the progressive reduction in semiconductor process size (measured in nanometers) for four major vendors: ATI, NVIDIA, Intel, and AMD. Each line traces a vendor's trajectory in shrinking the process size of its chips, a key indicator of technological advancement in semiconductor manufacturing. All vendors have reduced their process sizes over the years, with varying steepness in their trajectories. As of the most recent data, NVIDIA has reached the smallest process size at 5 nm, with Intel and AMD close behind at 6 nm each. ATI shows the largest process size at 40 nm, but this is mainly because the ATI data end in 2012, so the comparison is not entirely fair given that more recent data are available for the other three vendors.