Course: “DATA 101”, institution: “Montgomery College”, term: “Spring 2021”.
Moore’s Law is the observation that the number of transistors in an integrated circuit doubles approximately every two years. The law illustrates how quickly semiconductor and chip technology has improved. Moore’s prediction has proven highly accurate over the decades; in recent years (the 2020s), however, the trend has changed because of physical limits on how far components can be miniaturized. This project examines how well Moore’s Law has held up over the years and predicts whether the trend will continue in the foreseeable future, using datasets from the Wikipedia article on transistor count and from the TOP500 supercomputer list.
“Moore’s law is the observation that the number of transistors in a dense integrated circuit doubles approximately every two years.” [1].
Moore’s law is an observation made by, and named after, Intel co-founder Gordon E. Moore. In 1965 he projected that the number of components on an integrated circuit would double every year, and in 1975 he revised the rate to a doubling approximately every two years. So far this prediction has proven highly accurate, which implies exponential growth. However, this growth is unlikely to continue indefinitely: some studies suggest that physical limits could be reached within the next few years, at which point the law would no longer hold.
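The doubling rule can be written as N(t) = N0 * 2^((t - t0) / 2). The short R sketch below evaluates it. The starting point of 2,300 transistors in 1971 (the Intel 4004) and the function name moore_projection are assumptions chosen for illustration, not values taken from the datasets used later.

```r
# Moore's law as a doubling rule: N(t) = N0 * 2^((t - t0) / doubling_years)
# n0 and year0 are illustrative assumptions (Intel 4004, 1971)
moore_projection <- function(year, n0 = 2300, year0 = 1971, doubling_years = 2) {
  n0 * 2^((year - year0) / doubling_years)
}

# Projected counts at ten-year intervals across the period studied here
years <- seq(1971, 2021, by = 10)
data.frame(year = years, projected_count = moore_projection(years))
```

Each ten-year step multiplies the projected count by 2^5 = 32, which is why the later plots use a log scale on the y axis.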
In this regression analysis we assess whether processor performance over time behaves as Moore’s law describes, and how feasible the law will remain in the near future.
This regression analysis is divided into four parts:
Observation analysis, to see how each processor behaves relative to Moore’s law, using the dataset from the Wikipedia transistor count article and covering a 50-year period (1971-2021).
Performance analysis, to examine the performance of the TOP500 supercomputers, using the TOP500 dataset and covering an 11-year period (2010-2021).
Prediction analysis, to predict how feasible Moore’s law will be in the near future, using the TOP500 dataset and covering an 11-year period (2010-2021).
A conclusion to summarize the report and to predict the foreseeable future of transistors.
library("dplyr")
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library("ggplot2")
library("ggpubr")
library("ggrepel")
library("janitor")
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library("rvest")
library("readxl")
library("tidyverse")
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ tibble 3.1.1 ✓ purrr 0.3.4
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x readr::guess_encoding() masks rvest::guess_encoding()
## x dplyr::lag() masks stats::lag()
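# Scrape every table from the Wikipedia "Transistor count" article
url <- "https://en.wikipedia.org/wiki/Transistor_count"
html <- read_html(url)
html_nodeset <- html %>%
  html_nodes("table")
# The processor table was the second table on the page when this was run
df_transistors <- html_nodeset[[2]] %>%
  html_table() %>%
  as_tibble()
# Tidy the column names, drop parenthetical notes from processor names,
# and convert the numeric columns (cells that cannot be parsed become NA)
df_transistors <- df_transistors %>%
  janitor::clean_names() %>%
  mutate(processor = str_squish(str_remove(processor, "\\(.*"))) %>%
  mutate_at(c("mos_transistor_count", "date_ofintroduction", "mos_process_nm", "area_mm2"), parse_number)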
## Warning: 14 parsing failures.
## row col expected actual
## 1 -- a number ?
## 3 -- a number ?
## 12 -- a number ?
## 13 -- a number ?
## 17 -- a number ?
## ... ... ........ ......
## See problems(...) for more details.
## Warning: 31 parsing failures.
## row col expected actual
## 1 -- a number ?
## 5 -- a number ?
## 12 -- a number ?
## 13 -- a number ?
## 17 -- a number ?
## ... ... ........ ......
## See problems(...) for more details.
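The warnings above report cells whose actual value is "?", which parse_number() cannot convert. A minimal sketch of one way to handle this, assuming the "?" cells are simply unknown values: declare "?" as a missing-value marker through the na argument. The vector raw_counts is made up for illustration and is not taken from the scraped table.

```r
library(readr)

# Made-up cell values resembling the scraped table, including a "?" placeholder
raw_counts <- c("2,300", "?", "4,500 [12]", "1971")

# Listing "?" in `na` makes it an explicit missing value instead of a parsing failure;
# parse_number() keeps the first number in each string and drops surrounding text
parsed <- parse_number(raw_counts, na = c("", "NA", "?"))
parsed
```

Either way the affected rows end up as NA; the difference is only whether the gap is recorded silently by design or surfaced as a warning.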
df_transistors %>%
count(designer, sort = TRUE)
## # A tibble: 40 x 2
## designer n
## <chr> <int>
## 1 Intel 63
## 2 AMD 18
## 3 Apple 14
## 4 IBM 13
## 5 Fujitsu 10
## 6 Huawei 7
## 7 Motorola 7
## 8 Qualcomm 7
## 9 Texas Instruments 5
## 10 Acorn 4
## # … with 30 more rows
df_transistors <- df_transistors %>%
mutate(designer = fct_lump(designer, 5))
set.seed(2021) # arbitrary seed so the same 50 labels are drawn on every run
process_labels <- sample(df_transistors$processor, 50)
cleanup = theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line(color = "black"))
df_transistors %>%
ggplot(aes(date_ofintroduction, mos_transistor_count)) +
geom_smooth(group = 1, alpha = .1, method = "loess", formula = y ~ x) +
geom_point(aes(color = designer, shape = designer), cex = 2) +
geom_text_repel(data = df_transistors %>%
filter(processor %in% process_labels),
aes(date_ofintroduction, mos_transistor_count, label = processor),
cex = 2, max.overlaps = 30, color = "#505050") +
scale_color_brewer(palette = "Dark2") +
scale_shape_manual(values = seq(15,20, by = 1)) +
scale_y_log10(labels = scales::comma_format()) +
labs(x = "Years",
y = "Transistor Count (log scale)",
shape = "Chip Designer",
color = "Chip Designer",
caption = "Source: Wikipedia - Transistor Count: https://en.wikipedia.org/wiki/Transistor_count",
title = "Moore's Law",
subtitle = "Transistors count by Year, 1971-2021")+
cleanup
The plot shows a positive relationship between year and transistor count. Moore’s Law holds until about 2005, after which the growth in transistor count slows. Although more chips are being produced, the rate at which transistor counts increase is diminishing.
df_transistors %>%
ggplot(aes(x = date_ofintroduction, y = mos_transistor_count)) +
geom_smooth(method=lm, formula = y ~ x, se = FALSE) +
geom_point(aes(color = designer, shape = designer), cex = 3) +
scale_color_brewer(palette = "Dark2") +
scale_shape_manual(values = seq(15,20, by = 1)) +
scale_y_log10(labels = scales::comma_format()) +
labs(x = "Years",
y = "Transistor Count (log scale)",
shape = "Chip Designer",
color = "Chip Designer",
caption = "Source: Wikipedia - Transistor Count: https://en.wikipedia.org/wiki/Transistor_count",
title = "Moore's Law",
subtitle = "Transistors count by Year, 1971-2021")
fit1 <- lm(date_ofintroduction ~ mos_transistor_count, data = df_transistors)
summary(fit1)
##
## Call:
## lm(formula = date_ofintroduction ~ mos_transistor_count, data = df_transistors)
##
## Residuals:
## Min 1Q Median 3Q Max
## -41.989 -8.819 5.097 9.438 18.784
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.998e+03 1.009e+00 1979.02 < 2e-16 ***
## mos_transistor_count 1.598e-09 1.828e-10 8.74 1.19e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.42 on 191 degrees of freedom
## Multiple R-squared: 0.2857, Adjusted R-squared: 0.2819
## F-statistic: 76.39 on 1 and 191 DF, p-value: 1.185e-15
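Plotting the transistor count on a log scale linearises the relationship, so a least-squares line can be fit. The scatterplot shows a strong, positive, linear association between year and log transistor count. Note that fit1 regresses the year on the raw transistor count, so its R-squared of 0.2857 describes that untransformed model and not the log-scale line in the plot. There are about a dozen outliers, but most belong to the Other designer group, which does not closely follow Moore’s Law.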
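One way to tie a fitted line back to Moore’s law is to regress log2 of the transistor count on the year; the reciprocal of the slope is then an estimate of the doubling time in years. The sketch below uses synthetic data generated from a two-year doubling rule plus noise. The data frame sim, the seed, the starting count, and the noise level are all assumptions for illustration, not the scraped Wikipedia table.

```r
set.seed(101)  # arbitrary seed so the synthetic example is reproducible

# Synthetic stand-in for the scraped table: counts double every 2 years from 1971,
# with random scatter added on the log2 scale
sim <- data.frame(year = rep(1971:2021, each = 2))
sim$count <- 2300 * 2^((sim$year - 1971) / 2 + rnorm(nrow(sim), sd = 0.5))

# Response is log2(count) and the predictor is year (the reverse of fit1 above)
fit_log <- lm(log2(count) ~ year, data = sim)

slope <- unname(coef(fit_log)["year"])  # doublings per year
doubling_time <- 1 / slope              # years per doubling
r_squared <- summary(fit_log)$r.squared

c(slope = slope, doubling_time = doubling_time, r_squared = r_squared)
```

Applied to the real table, the same call would take mos_transistor_count and date_ofintroduction in place of count and year; the estimated doubling time could then be compared directly with the two years that Moore’s law states.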
df_transistors %>%
ggplot(aes(date_ofintroduction, mos_transistor_count)) +
geom_point(aes(color = designer, shape = designer), cex = 3)+
geom_smooth(group = 1, alpha = .1, method = loess, formula = y ~ x) +
facet_wrap(~designer, nrow = 3, scales = "free_y") +
scale_color_brewer(palette = "Dark2") +
scale_shape_manual(values = seq(15,20, by = 1)) +
scale_y_log10(labels = scales::comma_format()) +
labs(x = "Years",
y = "Transistor Count (log scale)",
shape = "Chip Designer",
color = "Chip Designer",
caption = "Source: Wikipedia - Transistor Count: https://en.wikipedia.org/wiki/Transistor_count",
title = "Moore's Law",
subtitle = "Transistors count by Year, 1971-2021")
Each chip designer shows a strong positive relationship between year and transistor count.
Based on the graphs, and setting aside the Other group, growth at Fujitsu, IBM, and Intel has slowed since 2010, while AMD and Apple continue to increase gradually.
AMD’s counts have increased thanks to TSMC’s 7nm+ node [5].
Apple’s counts have increased thanks to its M1 processors [6].
TOP500_202011 <- read_excel("TOP500_202011.xlsx")
view(TOP500_202011)
TOP500_202011 %>%
ggplot(aes(Year,Rmax)) +
geom_jitter(aes(color = cut(Rank,
breaks = c(1, 100, 200, 300, 400, 500),
include.lowest = TRUE))) +
geom_smooth(method=lm, formula = y ~ x, se = FALSE) +
scale_color_brewer(name = "TOP500 Rank", palette = "Paired") +
scale_y_log10(labels = scales::comma_format()) +
labs(x = "Year",
y = "Rmax (TFlop/s, log scale)",
caption = "Source: TOP500 - Rmax: https://www.top500.org/",
title = "Moore's Law",
subtitle = "Rmax by Year, 2010-2021")
Rmax is the maximum performance a system achieved on the LINPACK benchmark. Plotting Rmax on a log scale linearises the trend, so a least-squares line can be fit; it is shown in blue. Since 2015 the growth in Rmax has been noticeably exponential.
TOP500_202011 %>%
ggplot(aes(Year,Rpeak)) +
geom_jitter(aes(color = cut(Rank,
breaks = c(1, 100, 200, 300, 400, 500),
include.lowest = TRUE))) +
geom_smooth(method=lm, formula = y ~ x, se = FALSE) +
scale_color_brewer(name = "TOP500 Rank", palette = "Paired") +
scale_y_log10(labels = scales::comma_format()) +
labs(x = "Year",
y = "Rpeak (TFlop/s, log scale)",
caption = "Source: TOP500 - Rpeak: https://www.top500.org/",
title = "Moore's Law",
subtitle = "Rpeak by Year, 2010-2021")
Rpeak is the theoretical peak speed of a system, measured in floating-point operations per second. Rpeak is always at least as large as Rmax, usually by a modest margin, so the two graphs are expected to look similar. Rpeak also shows noticeable exponential growth.
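The relationship between the two measures can be summarised as an efficiency ratio, Rmax / Rpeak, which cannot exceed 1. The sketch below computes it for three hypothetical systems; the names and values in the systems data frame are made up for illustration and do not come from the TOP500 list.

```r
# Hypothetical systems; the performance figures are invented for illustration
systems <- data.frame(
  name  = c("A", "B", "C"),
  rmax  = c(400, 150, 20),   # achieved benchmark performance
  rpeak = c(500, 200, 32)    # theoretical peak performance
)

# Efficiency is the share of the theoretical peak that the benchmark run achieved
systems$efficiency <- systems$rmax / systems$rpeak
systems

# Sanity check: no system should report Rmax above Rpeak
all(systems$rpeak >= systems$rmax)
```

With the real data, the same ratio could be computed from the Rmax and Rpeak columns of TOP500_202011 to check how closely the two plots should track each other.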
TOP500_202011 %>%
ggplot(aes(Year,Cores)) +
geom_jitter(aes(color = cut(Rank,
breaks = c(1, 100, 200, 300, 400, 500),
include.lowest = TRUE))) +
geom_smooth(method=lm, formula = y ~ x, se = FALSE) +
scale_color_brewer(name = "TOP500 Rank", palette = "Paired") +
scale_y_log10(labels = scales::comma_format()) +
labs(x = "Year",
y = "Total Cores",
caption = "Source: TOP500 - Total Cores: https://www.top500.org/",
title = "Moore's Law",
subtitle = "Total Cores by Year, 2010-2021")
Overall, the average number of cores per system decreases over the years, as the negative slope of the regression line shows.
fit2 <- lm(Year ~ Rmax + Rpeak + Cores, data = TOP500_202011)
summary(fit2)
##
## Call:
## lm(formula = Year ~ Rmax + Rpeak + Cores, data = TOP500_202011)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.1723 -0.2200 -0.1618 0.8086 1.8779
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.018e+03 7.468e-02 27022.808 < 2e-16 ***
## Rmax -3.913e-05 1.791e-05 -2.185 0.02933 *
## Rpeak 4.247e-05 1.484e-05 2.863 0.00438 **
## Cores -4.913e-07 1.652e-07 -2.974 0.00308 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.552 on 496 degrees of freedom
## Multiple R-squared: 0.02733, Adjusted R-squared: 0.02145
## F-statistic: 4.646 on 3 and 496 DF, p-value: 0.003253
Mfit <- lm(Rmax ~ Year, data = TOP500_202011)
coef(Mfit)
## (Intercept) Year
## -1357036.4418 674.7962
Mfit2 <- lm(Rmax ~ I(Year - mean(Year)), data = TOP500_202011)
coef(Mfit2)
## (Intercept) I(Year - mean(Year))
## 4857.5237 674.7962
Mfit3 <- lm(Rmax ~ I(Year * 10), data = TOP500_202011)
coef(Mfit3)
## (Intercept) I(Year * 10)
## -1.357036e+06 6.747962e+01
newx <- c(2, 4, 6, 8, 10)
coef(Mfit)[1] + coef(Mfit)[2] * newx
## [1] -1355687 -1354337 -1352988 -1351638 -1350288
predict(Mfit, newdata = data.frame(Year = newx))
## 1 2 3 4 5
## -1355687 -1354337 -1352988 -1351638 -1350288
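The values in newx are passed to predict() as Year, so 2, 4, ..., 10 are treated as calendar years 2 through 10, which is why the predictions above are large negative numbers. Below is a minimal sketch of the same calculation at calendar years 2022-2030. It uses the Mfit coefficients as printed above, so the results are only as precise as that rounding, and the names future_years and predicted_rmax are introduced here for illustration.

```r
# Coefficients copied from the coef(Mfit) output above (rounded as printed)
intercept <- -1357036.4418
slope     <- 674.7962

# Calendar years, on the same scale as the Year column used to fit the model
future_years <- c(2022, 2024, 2026, 2028, 2030)

# Same arithmetic as coef(Mfit)[1] + coef(Mfit)[2] * newx, with calendar years
predicted_rmax <- intercept + slope * future_years
data.frame(Year = future_years, Rmax = predicted_rmax)

# With the fitted model object available, the equivalent call would be:
# predict(Mfit, newdata = data.frame(Year = future_years))
```

These are straight-line extrapolations of a model fit on the raw scale, so they should be read as rough indications of direction and not as forecasts.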
Pfit <- lm(Rpeak ~ Year, data = TOP500_202011)
coef(Pfit)
## (Intercept) Year
## -2250358.307 1118.827
Pfit2 <- lm(Rpeak ~ I(Year - mean(Year)), data = TOP500_202011)
coef(Pfit2)
## (Intercept) I(Year - mean(Year))
## 7692.072 1118.827
Pfit3 <- lm(Rpeak ~ I(Year * 10), data = TOP500_202011)
coef(Pfit3)
## (Intercept) I(Year * 10)
## -2250358.3068 111.8827
newx <- c(2, 4, 6, 8, 10)
coef(Pfit)[1] + coef(Pfit)[2] * newx
## [1] -2248121 -2245883 -2243645 -2241408 -2239170
predict(Pfit, newdata = data.frame(Year = newx))
## 1 2 3 4 5
## -2248121 -2245883 -2243645 -2241408 -2239170
Cfit <- lm(Cores ~ Year, data = TOP500_202011)
coef(Cfit)
## (Intercept) Year
## 25708866.21 -12666.51
Cfit2 <- lm(Cores ~ I(Year - mean(Year)), data = TOP500_202011)
coef(Cfit2)
## (Intercept) I(Year - mean(Year))
## 144932.25 -12666.51
Cfit3 <- lm(Cores ~ I(Year * 10), data = TOP500_202011)
coef(Cfit3)
## (Intercept) I(Year * 10)
## 25708866.213 -1266.651
newx <- c(2, 4, 6, 8, 10)
coef(Cfit)[1] + coef(Cfit)[2] * newx
## [1] 25683533 25658200 25632867 25607534 25582201
predict(Cfit, newdata = data.frame(Year = newx))
## 1 2 3 4 5
## 25683533 25658200 25632867 25607534 25582201
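The fitted slopes indicate that Rmax and Rpeak increase by about 675 and 1,119 units per year, while the core count decreases by about 12,667 cores per year. If the trend in core count continues, a decrease should be visible by 2024.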
Based on the observation and performance analyses, transistor counts have grown exponentially over time as Moore’s law describes, but that growth may end soon, possibly by 2024, if the trend continues as it is. Many commentators have concluded that Moore’s law is dead, and such predictions are becoming more common as lithography advances stall and process technology approaches atomic limits. However, companies such as AMD and Apple are innovating with processors built on their own silicon [6], and these specialized chips can offer a range of improvements. In the end the predictions are not wrong: Gordon Moore himself stated that “It cannot continue forever” [10]. If the trends shown in the graphs above continue, Moore’s law will likely end within the next decade unless algorithms and software can match the exponential growth it describes.
[1] https://en.wikipedia.org/wiki/Transistor_count
[2] https://top500.org/lists/top500/
[3] https://www.sciencehistory.org/sites/default/files/understanding_moores_law.pdf
[4] https://www.osti.gov/servlets/purl/1766372
[5] https://www.pcgamesn.com/amd/zen-3-cpu-tsmc-7nm
[8] https://ggplot2.tidyverse.org/reference/geom_smooth.html
[9] https://everythingcomputerscience.com/books/regmods.pdf
[10] https://www.eetimes.com/moores-law-dead-by-2022-expert-says/