Correlation of CPU Price and Performance

Lorenzo, de Leon, Oba, Calpito, Casenas

July 21, 2022

Introduction


          Computers have become very significant nowadays especially when much of the learning and working set-up transitioned to the online space due to the COVID-19 pandemic. The demand for computers continues to expand and several types of computers with varying CPU performance continue to exist in the market at varied price markings. CPU stands for central processing unit, which serves as the computer’s control center and is sometimes referred to as the main processor that operates the computer.\(^{[1]}\) The price is the amount of money that must be paid in exchange for a product.\(^{[2]}\) The CPU performance and pricing of a computer are two factors that are mostly being considered in acquiring a computer unit.\(^{[3]}\)
          Multiple studies including a 1991 study of Dave and Fitzpatrick sought to test Grosch’s law from another study of Knight (1996) defining computer power as “a function of memory, computer time, and input/output time.” This is then used to obtain a single value for the system’s power and determining the relationship between computer power and price. Several studies followed through to model other factors that affect price which proved that some factors like memory size(RAM) greatly affects computer prices.\(^{[4]}\) In this study, the group also sought to determine and model the relationship between computer performance and price with the latest data available. Also this research was conducted to provide consumers up-to-date useful statistical information about computer prices and performance. This study aims to aid the public in the buying process and in deciding to purchase a computer unit.

Methodology


          The data set used in this study was gathered from PassMark’s collection of user benchmarks.\(^{[5]}\) The researchers decided to employ Pearson correlation to determine the significance of the relationship between the predictor variable (the price of the CPU) and the response variable CPU Mark (PassMark’s Measure of CPU performance). Subsequently, the study will be able to produce a linear model and equation that can be used to predict the price of a CPU depending on its performance, which is usually measured through benchmarks. Finally, through residual analysis, the researchers should be able to determine the accuracy of the linear model. The study, therefore, assumes a null hypothesis which states that the linear model obtained does not account for the variation of the CPU prices.

Results


          The price and CPU mark (measured by PassMark) of 1,967 different CPUs were analyzed in this study. Through simple linear regression, the following regression model is obtained:
cpuData<-read.csv("CPU_benchmark_v4.csv")
model<-lm(cpuData$price~cpuData$cpuMark)
plot(x = cpuData$cpuMark, y = cpuData$price, main="CPU Mark vs Price", xlab="cpu mark", ylab="price")
abline(model, col=2, lwd=3)

\(Price = -26.0569634 + 0.0580751*(CPU Mark)\)

          The Pearson correlation coefficient and r-squared are:
## [1] "r = 0.692600"
## [1] "r-squared = 0.479694"
          This model of correlation will be tested with significance level α = 0.05 (95% confidence interval). The test goes as follows:
  • Null Hypothesis \(H_0\): \(β_0=0\)
  • Alternative Hypothesis \(H_1\): \(β_0\neq0\)
  • Test Statistic:
  • \(F=\frac{\frac{{\rm RSS}_R-{\rm RSS}_{UR}}{\rho_{fit}-\rho_{mean}}}{\frac{{\rm RSS}_{UR}}{n-\rho_{fit}}}\)
  • We reject \(H_0\) if \(F>F_{0.025,1965}\)

    F=qf(0.025, 1, 1965, lower.tail = FALSE)
    sprintf('F_{0.025,1965} = %f',F)
    ## [1] "F_{0.025,1965} = 5.031595"
    Therefore, we reject \(H_0\) if \(F>5.031595\).

Calculation:

## 
## Call:
## lm(formula = cpuData$price ~ cpuData$cpuMark)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2007.0  -183.7   -20.2    73.7  6714.3 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -26.056963  18.400896  -1.416    0.157    
## cpuData$cpuMark   0.058075   0.001364  42.563   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 654.7 on 1965 degrees of freedom
## Multiple R-squared:  0.4797, Adjusted R-squared:  0.4794 
## F-statistic:  1812 on 1 and 1965 DF,  p-value: < 2.2e-16
          The F-statistic is calculated to be 1812 which is far greater than 5.032. Therefore, we reject \(H_0\). We can also notice a p-value of \(2x10^{-16}\) for cpuMark which is way less than 0.025.

Residual Analysis

library(ggfortify)
autoplot(object = model) + theme_classic()

          Looking at the Residuals vs Fitted plot, we see a very slight upward concavity on the line within the 1000 to 3000 values of x. In the normal Q-Q plot, a lot of residuals are seen to deviate away from the line especially on the far right.

Discussion


          Upon analysis of data, we arrived at the linear model: \(Price = -26.0569634 + 0.0580751*CPU Mark\) to predict the price of a CPU, given a measure of its performance. It has a Pearson correlation coefficient of 0.6925996 and an r-squared of 0.4796942. This indicates a low to moderate positive correlation between CPU mark and price. We can say that around half (~48%) of the variation in price can be explained by our model.
          Using F-test, we rejected the null hypothesis (that the linear model obtained doesn’t account for the variation of the price) with strong evidence. This proves that the correlation found between CPU mark and price in this analysis is statistically significant. Along with this, we have a statistically significant p-value of CPU Mark, which means it will give a reliable prediction of CPU price.
          Through residual analysis, we were able to point out that the regression model we obtained predicts a slightly higher price at around 10000 to 30000 CPU marks. We also observed that the data became less normally distributed as the CPU mark increased, implying that CPU prices increased faster than the model predicted.

Conclusion


          The data set is able to show the price, performance, and general information of CPUs. The study showed a positive correlation between CPU price and performance. The analysis was able to exhibit this relationship using a linear model. This can be useful for those looking into purchasing computers or CPUs, since it allows them to determine which CPU will offer the most performance at the price point of their choosing and vice versa.

References


[1]Arm. n.d. “Central Processing Unit.” Accessed July 21, 2022. https://www.arm.com/glossary/cpu

[2]Britannica. n.d. “Price.” Accessed July 21, 2022. https://www.britannica.com/topic/price-economics

[3]McLellan, Charles. 2020. “How to choose the right PC: Everything you need to know about picking the right computer for work.” ZDNet. https://www.zdnet.com/article/how-to-chose-the-right-pc-everything-you-need-to-know-about-picking-the-right-computer-for-work/

[4]Dave, Dinesh S., and Kathy E. Fitzpatrick. 1991. “Price/Performance of Desktop Computers in the U.S. Computer Industry.” Information & Management 20 (3): 161–82. https://doi.org/10.1016/0378-7206(91)90053-5.

[5]“PassMark Software - List of Benchmarked CPUs.” n.d. www.cpubenchmark.net. https://www.cpubenchmark.net/cpu_list.php.