1 Introduction

In semiconductor manufacturing, the thickness of thin films is a critical parameter that affects electrical resistance. This report analyzes the relationship between thickness (independent variable) and electrical resistance (dependent variable) through a simple linear regression model, assessing the stability of the sputtering deposition process.

2 Data Import

2.1 Data Import

First, we import the data from the GitHub repository. To simplify the analysis that follows, we rename the variables: film thickness (nm) becomes x and electrical resistance (mΩ) becomes y.

library(dplyr)
ur1 <- "https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/semiconductor_SLR_dataset.csv"
data <- read.csv(ur1)
data <- data %>%
  rename( x = Film_Thickness_nm, y = Electrical_Resistance_mOhm)
head(data)
##        x      y
## 1  87.45 15.118
## 2 145.07 23.601
## 3 123.20 19.904
## 4 109.87 16.103
## 5  65.60 12.901
## 6  65.60 13.278

3 EDA

In this section, we investigate whether there is a relationship between thickness and resistance and, if so, what functional form it takes.

3.1 Distribution of resistance y

First, we examine the distribution of the electrical resistance data.

# Histogram
hist(data$y,
     main = "Histogram of Electrical_Resistance",
     xlab = "Electrical_Resistance")

#Boxplot
boxplot(data$y)

The histogram and boxplot show that the data are slightly right-skewed and concentrated between 14 and 20 mΩ. There are no extreme outliers, indicating a fairly consistent distribution.
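The visual reading of skewness and outliers can be backed up numerically. The following is a minimal sketch, not part of the original analysis: the vector `y` is simulated stand-in data (an assumption), and in the report it would be replaced by `data$y`. It uses a moment-based sample skewness and `boxplot.stats()`, which applies the same 1.5 × IQR rule that `boxplot()` uses to flag outlier points.

```r
# Simulated stand-in for data$y (an assumption for illustration only)
set.seed(1)
y <- rgamma(100, shape = 30, rate = 1.75)

# Five-number summary plus the mean
print(summary(y))

# Moment-based sample skewness; a positive value indicates right skew
skewness <- mean((y - mean(y))^3) / sd(y)^3

# Points that boxplot() would draw as outliers (1.5 * IQR rule)
outliers <- boxplot.stats(y)$out

print(skewness)
print(length(outliers))
```

A skewness close to zero and an empty `outliers` vector would support the description given above; the exact values depend on the data supplied.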

3.2 Scatterplot of resistance (Y) versus thickness (X)

We plot resistance against thickness, with a fitted least-squares line overlaid, to see whether the two variables are related.

library(ggplot2)
ggplot(data, aes(x, y)) +
  geom_point(color = "darkgreen", alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Scatterplot of resistance (Y) versus thickness (X)", x = "Thickness (nm)", y = "Resistance (mΩ)")

The scatterplot shows an approximately linear relationship between resistance and thickness: as thickness increases, resistance increases.

4 Regression Model Fitting

4.1 Least Squares Estimation

Having observed an approximately linear relationship between resistance and thickness, we fit a simple linear regression model. We assume the model \[ Y = \beta _0 + \beta _1 X + \epsilon \] and use least squares to estimate \(\beta _0\) and \(\beta _1\). To evaluate the significance of the relationship between the variables, we also examine R-squared and the p-value.
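For reference, the least-squares estimates in simple linear regression have a standard closed form. The notation \(\bar{x}\), \(\bar{y}\) for the sample means and the hats for estimates are introduced here and are not used elsewhere in the report:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

These are the values that minimise the residual sum of squares \(\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2\), and they are what `lm()` reports as the coefficient estimates.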

model <- lm(y ~ x, data = data)
summary(model)
## 
## Call:
## lm(formula = y ~ x, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.27640 -0.75508 -0.08631  0.70422  2.69671 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4.870489   0.356848   13.65   <2e-16 ***
## x           0.122954   0.003518   34.95   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.041 on 98 degrees of freedom
## Multiple R-squared:  0.9257, Adjusted R-squared:  0.925 
## F-statistic:  1221 on 1 and 98 DF,  p-value: < 2.2e-16

From the output, the estimates are \(\hat{\beta}_0 = 4.8704887\) and \(\hat{\beta}_1 = 0.1229536\), so the fitted regression line is \[ \hat{Y} = 4.8704887 + 0.1229536 X \] The intercept means that the model predicts a resistance of 4.8704887 mΩ at a thickness of 0 nm, although this is an extrapolation beyond the observed thicknesses and should not be interpreted physically. The slope means that each 1 nm increase in thickness is associated with an increase of 0.1229536 mΩ in resistance.

For the evaluation, R-squared = 0.926, meaning that thickness explains 92.6% of the variance in resistance, so the model fits well. In addition, the p-value is less than 0.001: under the null hypothesis that thickness has no effect on resistance, the probability of observing data at least as extreme as ours is below 0.1%. Because this probability is extremely low, we reject the null hypothesis and regard thickness as a significant predictor of resistance.
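As a quick consistency check, the statistics in the summary output can be recovered from one another. The relations below are the standard ones for a regression with a single predictor and use only the numbers printed in the output (98 is the residual degrees of freedom):

\[ t = \frac{\hat{\beta}_1}{\mathrm{SE}(\hat{\beta}_1)} = \frac{0.122954}{0.003518} \approx 34.95, \qquad F = t^2 \approx 1221, \qquad R^2 = \frac{F}{F + 98} = \frac{1221}{1319} \approx 0.9257 \]

All three agree with the reported t value, F-statistic and Multiple R-squared, so the slope test and the overall F test are the same test in this model.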

5 Assumption Checking

After fitting the linear regression model, we check whether the residuals have constant variance and follow a normal distribution, since the validity of regression inference depends on these assumptions.

plot(model)

In the residuals vs fitted plot, the points are not randomly scattered, which suggests that the variance is not constant. In the normal Q-Q plot, the points do not follow a straight line, which suggests that the residuals are not normally distributed. We conclude that the assumptions of the linear model are not adequately met, so we transform the response to improve the validity of the model.
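The visual diagnostics can be complemented with simple numerical checks. The sketch below is an illustration only: it runs on simulated stand-in data (an assumption, not the report's dataset), and `fit` would be replaced by `model` in the report. It shows the four diagnostic plots in one grid, applies the Shapiro-Wilk test to the residuals, and uses a rough auxiliary regression of squared residuals on fitted values as an informal check for non-constant variance. The auxiliary regression is a simplified stand-in for a formal heteroscedasticity test, not a replacement for one.

```r
# Simulated stand-in data (an assumption for illustration only)
set.seed(42)
x <- runif(100, 50, 150)
y <- 5 + 0.12 * x + rnorm(100, sd = 1)
fit <- lm(y ~ x)

# Draw all four diagnostic plots in a 2 x 2 grid, then restore the layout
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(residuals(fit))

# Informal variance check: do squared residuals trend with the fitted values?
aux <- lm(residuals(fit)^2 ~ fitted(fit))
aux_p <- coef(summary(aux))[2, 4]

print(sw$p.value)
print(aux_p)
```

Small p-values from either check would point in the same direction as the plots; large p-values do not prove the assumptions hold, so the plots remain the primary evidence.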

6 Model Transformation

We consider a power transformation of the response: \[ Y^\lambda = \beta _0 + \beta _1 X + \epsilon \]

We now find the optimal Box-Cox transformation parameter λ, so that the transformed model conforms more closely to the normality assumption and the regression inference becomes more reliable.

library(MASS)
b <- boxcox(model)

#Find the value of best lambda 
best_lambda <- b$x[which.max(b$y)]
print(best_lambda)
## [1] -0.8282828

The best value of lambda is therefore \[\lambda = -0.8282828\]
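The point estimate of λ comes from a grid search, so it helps to look at how precisely it is determined. The sketch below is an illustration on simulated stand-in data (an assumption, not the report's dataset). It evaluates the Box-Cox profile log-likelihood on a finer grid through the `lambda` argument of `MASS::boxcox()` and derives an approximate 95% interval as the set of λ values whose log-likelihood lies within `qchisq(0.95, 1) / 2` of the maximum, which is the same cutoff `boxcox()` draws on its plot.

```r
library(MASS)

# Simulated stand-in data (an assumption for illustration only)
set.seed(7)
x <- runif(100, 50, 150)
y <- (0.16 - 0.0006 * x + rnorm(100, sd = 0.004))^(-1 / 0.8)
fit <- lm(y ~ x)

# Profile log-likelihood on a grid with step 0.01, without drawing the plot
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# Grid value that maximises the log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval from the likelihood-ratio cutoff
in_ci <- bc$y > max(bc$y) - qchisq(0.95, df = 1) / 2
lambda_ci <- range(bc$x[in_ci])

print(lambda_hat)
print(lambda_ci)
```

If the interval contains a convenient value such as -1 (the reciprocal), some analysts prefer that value for interpretability; this report keeps the grid maximum.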

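The transformed response is \(y^{-0.8282828}\), so the model to fit is \[ Y^{-0.8282828} = \beta _0 + \beta _1 X + \epsilon \] where \(\beta _0\) and \(\beta _1\) must be re-estimated on the transformed scale; the Model 1 coefficients no longer apply.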

# Overwrite y with the transformed response y^lambda
data$y <- (data$y)^best_lambda
head(data)
##        x          y
## 1  87.45 0.10544964
## 2 145.07 0.07291642
## 3 123.20 0.08396726
## 4 109.87 0.10007828
## 5  65.60 0.12025129
## 6  65.60 0.11741633

Using the transformed response, we fit a new model.

model2 <- lm(y~x, data)
summary(model2)
## 
## Call:
## lm(formula = y ~ x, data = data)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0108944 -0.0019298 -0.0002015  0.0024314  0.0091823 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.561e-01  1.303e-03  119.75   <2e-16 ***
## x           -5.763e-04  1.285e-05  -44.85   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.003804 on 98 degrees of freedom
## Multiple R-squared:  0.9535, Adjusted R-squared:  0.9531 
## F-statistic:  2011 on 1 and 98 DF,  p-value: < 2.2e-16

The statistical results improve. R² rises to 0.9535, although the two R² values are computed on different response scales, so this comparison is only indicative. The p-value is below 2.2e-16 for both models. More importantly, the diagnostic plots for Model 2 follow the normal probability line more closely, and the residual variance appears more stable, suggesting that the model assumptions are better satisfied.

plot(model2)

7 Prediction Interval

For Model 2, we compute the 95% confidence interval (CI) and prediction interval (PI) at a thickness of 100 nm. Because Model 2 is fitted to the transformed response, both intervals are on the \(y^{\lambda}\) scale.

new_data <- data.frame(x = 100)
# Calculate 95% Confidence Interval
Ci_100 <- predict(model2, newdata = new_data, interval = "confidence", level = 0.95)
# Calculate 95% Prediction Interval
Pi_100 <- predict(model2, newdata = new_data, interval = "prediction", level = 0.95)
Pi_100
##          fit        lwr       upr
## 1 0.09845334 0.09086715 0.1060395

The 95% prediction interval on the transformed scale runs from 0.09086715 to 0.1060395, and the point prediction is 0.09845334.
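These values are on the transformed scale, so they need to be mapped back to read them as resistances. The sketch below uses only the numbers printed in the output above and inverts the power transformation; the variable names are introduced here for illustration. Because λ is negative, the transformation is decreasing, so the lower and upper bounds swap places after back-transformation.

```r
# Values copied from the report's output
lambda <- -0.8282828
pi_transformed <- c(fit = 0.09845334, lwr = 0.09086715, upr = 0.1060395)

# Invert y_t = y^lambda, i.e. y = y_t^(1 / lambda)
pi_original <- pi_transformed^(1 / lambda)

# lambda < 0 makes the transform decreasing, so re-order the bounds
pi_mOhm <- c(fit = unname(pi_original["fit"]),
             lwr = min(pi_original[c("lwr", "upr")]),
             upr = max(pi_original[c("lwr", "upr")]))

print(round(pi_mOhm, 2))
```

On the original scale this gives a point prediction of about 16.4 mΩ with a prediction interval of roughly 15.0 to 18.1 mΩ at 100 nm. The back-transformed point value is best read as a median-type prediction rather than a mean, since a nonlinear transformation does not preserve means.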

8 Summary

We conclude that Model 2, represented by the equation \(y^{-0.8283} = 0.1561 - 0.0005763x\), provides a good fit for our data. The slope is negative on the transformed scale, but because the exponent \(\lambda\) is also negative, a smaller value of \(y^{\lambda}\) corresponds to a larger resistance. On the original scale, resistance therefore increases as film thickness increases, which agrees with the positive slope of Model 1.

Controlling film thickness is therefore crucial for controlling resistance accurately.
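One way to check the direction of the relationship on the original scale is to solve the fitted Model 2 for \(Y\) and differentiate, ignoring the error term. This is a sketch using the coefficients from the Model 2 output:

\[ Y = (\beta_0 + \beta_1 X)^{1/\lambda}, \qquad \frac{dY}{dX} = \frac{\beta_1}{\lambda} (\beta_0 + \beta_1 X)^{1/\lambda - 1} \]

With \(\hat{\beta}_1 = -0.0005763\) and \(\lambda = -0.8282828\), the ratio \(\hat{\beta}_1 / \lambda \approx 0.0007\) is positive, and the power term is positive whenever \(\hat{\beta}_0 + \hat{\beta}_1 X > 0\), which holds for thicknesses below about 271 nm. The derivative is therefore positive over that range, so the fitted resistance rises with thickness.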