1 Background

A sputter deposition process is used to deposit thin metallic films on silicon wafers, where the films act as a conductive layer. Process input parameters, such as deposition rate, chamber pressure, or plasma intensity, can change during manufacturing, producing variation in the final film thickness. This variation can affect the electrical resistance of the final component, which is a key output of the process.

1.1 Objective

To use linear regression to predict the electrical resistance of the final component from the thickness of the metallic film deposited on the silicon wafer.

2 Methods

2.1 Exploratory Data Analysis

A histogram and a box plot were used to examine the distribution of the measured electrical resistance across the sample.

An initial scatter plot of electrical resistance versus film thickness was used to explore potential relationships between the two parameters.

2.2 Regression Modeling

Simple linear regression modeling using least squares estimation was used to regress electrical resistance on film thickness.

\[ y = \beta _{0} + \beta _{1}x +\varepsilon \]

Model adequacy was evaluated by checking whether the residual variance is constant across the range of the independent variable and whether the residuals are approximately normally distributed. The models were also compared using R-squared, the proportion of the variance in the response explained by the model.

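\[ R^{2}=1-\frac{SSE}{SST} \]

where SSE is the residual (error) sum of squares and SST is the total sum of squares of the response.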
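The R-squared definition and the adequacy checks above can be illustrated with a short R sketch. It uses synthetic data in place of the CSV (the simulated coefficients are arbitrary assumptions), confirms that 1 - SSE/SST matches the value reported by summary(), and adds two numeric checks that complement the residual plots: a Shapiro-Wilk test for residual normality and a studentized Breusch-Pagan statistic for constant variance. These two tests are a swapped-in supplement, not part of the original analysis.

```r
# Synthetic stand-in for the measurements (assumed values, not the study data)
set.seed(1)
x <- runif(100, 50, 150)                  # film thickness, nm
y <- 5 + 0.12 * x + rnorm(100, sd = 1)    # resistance, mOhm
fit <- lm(y ~ x)

# R-squared from its definition, 1 - SSE/SST
sse <- sum(residuals(fit)^2)
sst <- sum((y - mean(y))^2)
r2 <- 1 - sse / sst
all.equal(r2, summary(fit)$r.squared)     # TRUE if the two agree

# Normality of residuals: Shapiro-Wilk (a small p-value suggests non-normality)
sw <- shapiro.test(residuals(fit))
sw$p.value

# Constant variance: studentized (Koenker) Breusch-Pagan statistic,
# n * R^2 from regressing squared residuals on x, referred to chi-squared(1)
aux <- lm(residuals(fit)^2 ~ x)
bp_stat <- length(y) * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
bp_p                                      # a small p-value suggests non-constant variance
```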

2.3 Prediction

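A 95% confidence interval on the mean response at a film thickness \(x_{j}\) was calculated with the following formula, where MSE is the mean squared error of the fitted model and \(\alpha = 0.05\).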

\[(\hat{\beta}_{0}+\hat{\beta}_{1}x_{j})\pm t_{1-\alpha /2,n-2}\left ( \sqrt{MSE\left [ \frac{1}{n}+\frac{(x_{j}-\bar{x})^{2}}{\sum_{i=1}^{n} (x_{i}-\bar{x})^{2}} \right]} \right )\]

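A 95% prediction interval for a single new observation at \(x_{j}\) was calculated with the following formula. The additional 1 inside the brackets accounts for the scatter of individual observations about the mean response, so the prediction interval is always wider than the confidence interval.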

\[(\hat{\beta}_{0}+\hat{\beta}_{1}x_{j})\pm t_{1-\alpha /2,n-2}\left ( \sqrt{MSE\left [1+ \frac{1}{n}+\frac{(x_{j}-\bar{x})^{2}}{\sum_{i=1}^{n} (x_{i}-\bar{x})^{2}} \right]} \right )\]
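The two interval formulas can be checked by hand against R's predict(). The R sketch below applies only to the simple linear model; the synthetic data and the evaluation point x0 = 100 nm are illustrative assumptions.

```r
# Synthetic data (assumed values, not the study data)
set.seed(2)
x <- runif(100, 50, 150)
y <- 5 + 0.12 * x + rnorm(100, sd = 1)
fit <- lm(y ~ x)

n   <- length(x)
b   <- unname(coef(fit))
mse <- sum(residuals(fit)^2) / (n - 2)        # MSE with n - 2 residual df
x0  <- 100                                    # point at which to predict
y0  <- b[1] + b[2] * x0                       # estimated mean response
tq  <- qt(1 - 0.05 / 2, df = n - 2)           # t quantile for alpha = 0.05
h   <- 1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2)

conf_int <- y0 + c(-1, 1) * tq * sqrt(mse * h)        # mean response
pred_int <- y0 + c(-1, 1) * tq * sqrt(mse * (1 + h))  # single new observation

# The same intervals from predict(); columns are fit, lwr, upr
ci_r <- predict(fit, data.frame(x = x0), interval = "confidence")
pi_r <- predict(fit, data.frame(x = x0), interval = "prediction")
rbind(by_hand_conf = c(y0, conf_int), predict_conf = ci_r[1, ],
      by_hand_pred = c(y0, pred_int), predict_pred = pi_r[1, ])
```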

3 Exploratory Data Analysis

The data set was read from https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/semiconductor_SLR_dataset.csv

semi_df <- read.csv('https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/semiconductor_SLR_dataset.csv')

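The histogram and box plot of Electrical Resistance show a right-skewed distribution, so the resistance values are not normally distributed. Linear regression does not require the response itself to be normal, only the residuals; residual normality is assessed with the model diagnostics in Section 4.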
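The visual impression of right skew can be backed by a number. The R sketch below defines a hypothetical helper, skewness(), implementing the moment-based sample skewness (base R has no built-in function for this); positive values indicate a right skew. The illustration uses synthetic data, and the commented line shows how it would be applied to semi_df.

```r
# Hypothetical helper: moment-based sample skewness, m3 / m2^(3/2)
skewness <- function(v) {
  v <- v[!is.na(v)]
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

# Usage with the report's data (requires the CSV loaded as semi_df):
# skewness(semi_df$Electrical_Resistance_mOhm)

# Illustration on synthetic data
set.seed(3)
skewness(rexp(1000))    # right-skewed: theoretical skewness of an exponential is 2
skewness(rnorm(1000))   # symmetric: close to 0
```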

hist(semi_df$Electrical_Resistance_mOhm,
     main="Frequency of Electrical Resistance", xlab="Electrical Resistance (mOhm)") #skewed data, not normal

boxplot(semi_df$Electrical_Resistance_mOhm, main="Boxplot of Electrical Resistance",
        ylab="Electrical Resistance (mOhm)")

A scatter plot of Electrical Resistance versus Film Thickness shows a positive relationship, with what appears to be a consistent spread of the data across the range of the independent variable.

plot(semi_df$Film_Thickness_nm, semi_df$Electrical_Resistance_mOhm, main = "Electrical Resistance vs Film Thickness",
     xlab= "Film Thickness (nm)", ylab="Electrical Resistance (mOhm)")

4 Regression

Several regression models were evaluated to determine which best predicts the response, Electrical Resistance, from the predictor, Film Thickness.

4.1 Simple Linear Model

The first model regressed Electrical Resistance on Film Thickness using simple linear regression, represented by the formula \(\text{Electrical Resistance} = \beta_{0}+\beta_{1}(\text{Film Thickness})+\varepsilon\).
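For this one-predictor model the least squares estimates have a closed form, \(\hat{\beta}_{1}=S_{xy}/S_{xx}\) and \(\hat{\beta}_{0}=\bar{y}-\hat{\beta}_{1}\bar{x}\). The R sketch below computes them directly on synthetic data (an assumption standing in for semi_df) and compares them with lm().

```r
# Synthetic data (assumed values, not the study data)
set.seed(4)
x <- runif(100, 50, 150)
y <- 5 + 0.12 * x + rnorm(100, sd = 1)

# Closed-form least squares estimates
sxy <- sum((x - mean(x)) * (y - mean(y)))
sxx <- sum((x - mean(x))^2)
b1  <- sxy / sxx
b0  <- mean(y) - b1 * mean(x)

c(b0 = b0, b1 = b1)
coef(lm(y ~ x))   # should agree with b0 and b1
```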

model <- lm(semi_df$Electrical_Resistance_mOhm~semi_df$Film_Thickness_nm)
plot(semi_df$Film_Thickness_nm, semi_df$Electrical_Resistance_mOhm,
     main="Regression of Electrical Resistance on Film Thickness", xlab="Film Thickness (nm)",
     ylab="Electrical Resistance (mOhm)")
abline(model, col="red")
legend("bottomright", legend=c( "Regression Line"), fill=c("red"))

plot(model)

summary(model)
## 
## Call:
## lm(formula = semi_df$Electrical_Resistance_mOhm ~ semi_df$Film_Thickness_nm)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.27640 -0.75508 -0.08631  0.70422  2.69671 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               4.870489   0.356848   13.65   <2e-16 ***
## semi_df$Film_Thickness_nm 0.122954   0.003518   34.95   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.041 on 98 degrees of freedom
## Multiple R-squared:  0.9257, Adjusted R-squared:  0.925 
## F-statistic:  1221 on 1 and 98 DF,  p-value: < 2.2e-16

The regression line plotted over the observed data appears to fit well. However, the Residuals vs Fitted plot shows a weak pattern in the residuals, suggesting that the variance may not be constant across the fitted values. The model summary shows that both regression coefficients are statistically significant, and the R-squared value indicates that the model explains 92.6% of the variation in Electrical Resistance.

4.2 Transformed Linear Model

Although the initial model had an R-squared value of 0.926 and statistically significant regression coefficients, a Box-Cox transformation of the response was applied to try to stabilize the variance and improve the model.

#install.packages("MASS")
library("MASS")
boxcox(model)

From the Box-Cox log-likelihood plot, a \(\lambda\) of approximately -0.8 was selected, and the response was transformed as \(y^{-0.8}\).
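Rather than reading \(\lambda\) off the plot, the profile log-likelihood returned by MASS::boxcox() with plotit = FALSE can be searched directly. The R sketch below does this on synthetic data (generated so that a power near -0.8 is plausible, an assumption) and also reports the approximate 95% interval for \(\lambda\): the values whose log-likelihood lies within qchisq(0.95, 1)/2 of the maximum, which is the interval marked on the boxcox() plot.

```r
library(MASS)   # provides boxcox()

# Synthetic data (assumed values): linear on the y^-0.8 scale
set.seed(5)
x <- runif(100, 50, 150)
z <- 0.17 - 0.0006 * x + rnorm(100, sd = 0.004)
y <- z^(-1 / 0.8)                  # synthetic resistance, mOhm
fit <- lm(y ~ x)

# Profile log-likelihood over a grid of lambda values, without plotting
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas within qchisq(0.95, 1) / 2 of the maximum
lambda_ci <- range(bc$x[bc$y > max(bc$y) - qchisq(0.95, 1) / 2])
lambda_hat
lambda_ci
```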

semi_df$transformed <- (semi_df$Electrical_Resistance_mOhm)^-.8
hist(semi_df$transformed, main="Histogram of Transformed Electrical Resistance",
     xlab="Transformed Electrical Resistance") #better normality

model_trans <-lm(semi_df$transformed~semi_df$Film_Thickness_nm)
plot(model_trans) #variance improved, more constant

summary(model_trans)
## 
## Call:
## lm(formula = semi_df$transformed ~ semi_df$Film_Thickness_nm)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0113638 -0.0019719 -0.0002117  0.0025504  0.0095043 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.668e-01  1.363e-03  122.39   <2e-16 ***
## semi_df$Film_Thickness_nm -6.026e-04  1.343e-05  -44.86   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.003976 on 98 degrees of freedom
## Multiple R-squared:  0.9536, Adjusted R-squared:  0.9531 
## F-statistic:  2012 on 1 and 98 DF,  p-value: < 2.2e-16

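The transformation produced a more symmetric distribution of the response; however, the tails appear heavy, which is also evident in the Q-Q Residuals plot. The transformed model has an R-squared value of 0.954, but because this value is computed on the transformed scale (resistance raised to the power -0.8), it is not directly comparable with the 0.926 of the untransformed model. The negative slope is expected, since \(y^{-0.8}\) decreases as resistance increases.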
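One way to compare the transformed model with the untransformed one on equal terms is to back-transform its fitted values to mOhm and recompute 1 - SSE/SST. The R sketch below does this on synthetic data (an assumption). If the errors are symmetric on the transformed scale, the back-transformed fitted values estimate the median, not the mean, of resistance at each thickness.

```r
# Synthetic data (assumed values, not the study data)
set.seed(6)
x <- runif(100, 50, 150)
y <- (0.17 - 0.0006 * x + rnorm(100, sd = 0.004))^(-1 / 0.8)

lambda <- -0.8
z <- y^lambda                          # transformed response
fit_z <- lm(z ~ x)
y_back <- fitted(fit_z)^(1 / lambda)   # back-transform fitted values to mOhm

# R-squared on the original scale versus on the transformed scale
r2_original_scale <- 1 - sum((y - y_back)^2) / sum((y - mean(y))^2)
r2_transformed    <- summary(fit_z)$r.squared
c(original = r2_original_scale, transformed = r2_transformed)
```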

4.3 Model using Quadratic Relationship

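The scatter plot of Electrical Resistance versus Film Thickness suggests some curvature, so a quadratic relationship between the two variables was also fit. This can be represented by the formula \(\text{Electrical Resistance} = \beta_{0}+\beta_{1}(\text{Film Thickness})+\beta_{2}(\text{Film Thickness})^{2}+\varepsilon\). The model was fit with poly(), which uses orthogonal polynomials by default: its fitted values match those of the formula above, but the reported coefficients belong to the orthogonal basis rather than being \(\beta_{1}\) and \(\beta_{2}\) directly.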
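The difference between the orthogonal and raw polynomial bases can be seen in a short R sketch on synthetic data (an assumption). Setting raw = TRUE in poly(), or writing x + I(x^2), gives coefficients that are estimates of \(\beta_{0}\), \(\beta_{1}\), \(\beta_{2}\), while the fitted curve is unchanged.

```r
# Synthetic data (assumed values, not the study data)
set.seed(7)
x <- runif(100, 50, 150)
y <- 8 + 0.02 * x + 0.0005 * x^2 + rnorm(100, sd = 0.8)

fit_orth <- lm(y ~ poly(x, 2))               # orthogonal basis (R default)
fit_raw  <- lm(y ~ poly(x, 2, raw = TRUE))   # raw basis: 1, x, x^2
fit_I    <- lm(y ~ x + I(x^2))               # same raw model, written out

coef(fit_orth)   # coefficients of the orthogonal basis
coef(fit_raw)    # estimates of beta0, beta1, beta2
all.equal(fitted(fit_orth), fitted(fit_raw))   # identical fitted curve
```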

#regression with quadratic relationship between x and y
model_sq <- lm(Electrical_Resistance_mOhm~poly(Film_Thickness_nm,2), data=semi_df)
plot(model_sq)

summary(model_sq)
## 
## Call:
## lm(formula = Electrical_Resistance_mOhm ~ poly(Film_Thickness_nm, 
##     2), data = semi_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.17338 -0.53515 -0.02541  0.36803  2.18576 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 16.79921    0.07677 218.827  < 2e-16 ***
## poly(Film_Thickness_nm, 2)1 36.39389    0.76769  47.407  < 2e-16 ***
## poly(Film_Thickness_nm, 2)2  7.00731    0.76769   9.128 1.03e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7677 on 97 degrees of freedom
## Multiple R-squared:   0.96,  Adjusted R-squared:  0.9592 
## F-statistic:  1165 on 2 and 97 DF,  p-value: < 2.2e-16
#plot scatter with fitted quadratic curve
plot(semi_df$Film_Thickness_nm, semi_df$Electrical_Resistance_mOhm, main="Electrical Resistance vs Film Thickness",
     xlab= "Film Thickness (nm)", ylab="Electrical Resistance (mOhm)")
lines(sort(semi_df$Film_Thickness_nm),
      fitted(model_sq)[order(semi_df$Film_Thickness_nm)], col="red")

The quadratic model produced residuals with more nearly constant variance, similar to the model transformed with the Box-Cox derived \(\lambda\). It also explained the highest proportion of the variance in the response, with an R-squared value of 0.96 (adjusted R-squared 0.959), and its quadratic term is statistically significant (p = 1.03e-14).
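One way to formalize the comparison between the straight-line and quadratic models is a partial F-test, since the linear model is nested in the quadratic one. The R sketch below uses synthetic data (an assumption); for a single added term this F-test is equivalent to the t-test on the second poly() coefficient in the summary output.

```r
# Synthetic data (assumed values, not the study data)
set.seed(8)
x <- runif(100, 50, 150)
y <- 8 + 0.02 * x + 0.0005 * x^2 + rnorm(100, sd = 0.8)

fit_lin  <- lm(y ~ x)
fit_quad <- lm(y ~ poly(x, 2))

# Partial F-test: does the quadratic term reduce the residual sum of squares?
cmp <- anova(fit_lin, fit_quad)
cmp   # a small Pr(>F) favours the quadratic model

# Adjusted R-squared penalizes the extra parameter
c(linear = summary(fit_lin)$adj.r.squared,
  quadratic = summary(fit_quad)$adj.r.squared)
```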

5 Results and Conclusion

In conclusion, Film Thickness appears to be an excellent predictor of Electrical Resistance: a linear regression model with a quadratic term in Film Thickness explains 96% of the variation in Electrical Resistance. The relationship is highly significant, with an overall F-test p-value below \(2.2\times10^{-16}\).
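The value printed as "< 2.2e-16" is a display bound rather than the actual p-value: R formats p-values below .Machine$double.eps (about 2.2e-16) this way. The R sketch below, on synthetic data (an assumption), recovers the overall F-test p-value from summary()$fstatistic with pf().

```r
# Synthetic data (assumed values, not the study data)
set.seed(9)
x <- runif(100, 50, 150)
y <- 8 + 0.02 * x + 0.0005 * x^2 + rnorm(100, sd = 0.8)
fit <- lm(y ~ poly(x, 2))

f <- summary(fit)$fstatistic   # named vector: value, numdf, dendf
p_overall <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)
unname(p_overall)              # the actual (tiny) p-value
format.pval(p_overall)         # shown as a "<" bound when below the threshold
.Machine$double.eps            # the threshold used by format.pval() by default
```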

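For a Film Thickness of 100 nm, a common nominal value for this process, the model predicts an Electrical Resistance of 16.33 mOhm. The table below gives the 95% confidence interval for the mean resistance and the 95% prediction interval for a single new measurement at this thickness. These intervals were computed with R's predict(), which applies the general (matrix) form of the Section 2.3 formulas to the quadratic model, with n - 3 residual degrees of freedom.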

Parameter                  Value (mOhm)
Expected Value             16.33
95% Confidence Interval    16.10 to 16.57
95% Prediction Interval    14.79 to 17.87

If the prediction interval of 14.79-17.87 mOhm lies within the acceptable range of electrical resistance set by the design requirements, then the process is capable, provided the film thickness can be reliably held at 100 nm. If the process regularly yields a range of thickness values, that range can be used to predict the corresponding range of electrical resistance, which can then be compared with the design requirements to assess process capability.
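Translating a range of film thickness into a range of resistance can be sketched in R. Everything here is an assumption for illustration: the data are synthetic, and the 95-105 nm thickness window and the 14-19 mOhm design limits are hypothetical placeholders. Taking the smallest lower bound and largest upper bound of pointwise prediction intervals gives only a rough envelope, without a joint coverage guarantee.

```r
# Synthetic data (assumed values, not the study data)
set.seed(10)
sim <- data.frame(Film_Thickness_nm = runif(100, 50, 150))
sim$Electrical_Resistance_mOhm <- 8 + 0.02 * sim$Film_Thickness_nm +
  0.0005 * sim$Film_Thickness_nm^2 + rnorm(100, sd = 0.8)
fit <- lm(Electrical_Resistance_mOhm ~ poly(Film_Thickness_nm, 2), data = sim)

# Hypothetical process window and design limits
thickness_window <- data.frame(Film_Thickness_nm = seq(95, 105, by = 1))
spec_limits <- c(lower = 14, upper = 19)

# Pointwise 95% prediction intervals across the window
pi_grid <- predict(fit, newdata = thickness_window, interval = "prediction")
expected_range <- c(lower = min(pi_grid[, "lwr"]), upper = max(pi_grid[, "upr"]))

# Capable if the whole envelope sits inside the design limits
capable <- expected_range[["lower"]] >= spec_limits[["lower"]] &&
  expected_range[["upper"]] <= spec_limits[["upper"]]
expected_range
capable
```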

See the graph below for the confidence and prediction intervals of this model for the full range of the data sampled.

# new data point at the nominal film thickness of 100 nm
new_df <- data.frame(Film_Thickness_nm = 100)
# 95% confidence and prediction bands at the observed thicknesses
conf_df <- as.data.frame(predict(model_sq, interval = "confidence"))
colnames(conf_df) <- c("fit", "lower", "upper")
# R warns that prediction intervals on the fitting data refer to future responses
pred_df <- as.data.frame(predict(model_sq, interval = "prediction"))
colnames(pred_df) <- c("fit", "lower", "upper")
# intervals at 100 nm (the values reported in the table above)
conf_value <- predict(model_sq, newdata = new_df, interval = "confidence")
pred_value <- predict(model_sq, newdata = new_df, interval = "prediction")
conf_value
pred_value


plot(semi_df$Film_Thickness_nm, semi_df$Electrical_Resistance_mOhm, main = "Regression with Confidence and Prediction Intervals",
     xlab= "Film Thickness (nm)", ylab="Electrical Resistance (mOhm)")
lines(sort(semi_df$Film_Thickness_nm),
      fitted(model_sq)[order(semi_df$Film_Thickness_nm)], col="red")
lines(sort(semi_df$Film_Thickness_nm),
      (conf_df$upper)[order(semi_df$Film_Thickness_nm)], col="blue")
lines(sort(semi_df$Film_Thickness_nm),
      (conf_df$lower)[order(semi_df$Film_Thickness_nm)], col="blue")
lines(sort(semi_df$Film_Thickness_nm),
      (pred_df$upper)[order(semi_df$Film_Thickness_nm)], col="purple")
lines(sort(semi_df$Film_Thickness_nm),
      (pred_df$lower)[order(semi_df$Film_Thickness_nm)], col="purple")



legend("bottomright", legend=c("Regression Line", "Confidence Interval", "Prediction Interval"), fill=c("red", "blue", "purple"))

6 Complete R Code

semi_df <- read.csv('https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/semiconductor_SLR_dataset.csv')
summary(semi_df)
str(semi_df)
plot(semi_df$Film_Thickness_nm, semi_df$Electrical_Resistance_mOhm, main = "Electrical Resistance vs Film Thickness",
     xlab= "Film Thickness (nm)", ylab="Electrical Resistance (mOhm)")
hist(semi_df$Electrical_Resistance_mOhm,
     main="Frequency of Electrical Resistance", xlab="Electrical Resistance (mOhm)") #skewed data, not normal
boxplot(semi_df$Electrical_Resistance_mOhm, main="Boxplot of Electrical Resistance",
        ylab="Electrical Resistance (mOhm)")


#simple linear model
model <- lm(semi_df$Electrical_Resistance_mOhm~semi_df$Film_Thickness_nm)
plot(model)  # not constant variance, but not horrible

  #plot scatter with lm regression line
plot(semi_df$Film_Thickness_nm, semi_df$Electrical_Resistance_mOhm,
     main="Regression of Electrical Resistance on Film Thickness", xlab="Film Thickness (nm)",
     ylab="Electrical Resistance (mOhm)")
abline(model, col="red")
legend("bottomright", legend=c( "Regression Line"), fill=c("red"))
summary(model)

#transform linear regression line model
#install.packages("MASS")
library("MASS")
boxcox(model)
  #lambda roughly -.8
semi_df$transformed <- (semi_df$Electrical_Resistance_mOhm)^-.8
hist(semi_df$transformed) #better normality
model_trans <-lm(semi_df$transformed~semi_df$Film_Thickness_nm)
plot(model_trans) #variance improved, more constant
summary(model_trans)

#regression with quadratic relationship between x and y
model_sq <- lm(Electrical_Resistance_mOhm~poly(Film_Thickness_nm,2), data=semi_df)
plot(model_sq)
summary(model_sq)
  #plot scatter with lm line
plot(semi_df$Film_Thickness_nm, semi_df$Electrical_Resistance_mOhm, main="Electrical Resistance vs Film Thickness",
     xlab= "Film Thickness (nm)", ylab="Electrical Resistance (mOhm)")
lines(sort(semi_df$Film_Thickness_nm),
      fitted(model_sq)[order(semi_df$Film_Thickness_nm)], col="red")


# predict using model_sq
# new data point at the nominal film thickness of 100 nm
new_df <- data.frame(Film_Thickness_nm = 100)
# 95% confidence and prediction bands at the observed thicknesses
conf_df <- as.data.frame(predict(model_sq, interval = "confidence"))
colnames(conf_df) <- c("fit", "lower", "upper")
# R warns that prediction intervals on the fitting data refer to future responses
pred_df <- as.data.frame(predict(model_sq, interval = "prediction"))
colnames(pred_df) <- c("fit", "lower", "upper")
# intervals at 100 nm
conf_value <- predict(model_sq, newdata = new_df, interval = "confidence")
pred_value <- predict(model_sq, newdata = new_df, interval = "prediction")
conf_value
pred_value


plot(semi_df$Film_Thickness_nm, semi_df$Electrical_Resistance_mOhm, main = "Regression with Confidence and Prediction Intervals",
     xlab= "Film Thickness (nm)", ylab="Electrical Resistance (mOhm)")
lines(sort(semi_df$Film_Thickness_nm),
      fitted(model_sq)[order(semi_df$Film_Thickness_nm)], col="red")
lines(sort(semi_df$Film_Thickness_nm),
      (conf_df$upper)[order(semi_df$Film_Thickness_nm)], col="blue")
lines(sort(semi_df$Film_Thickness_nm),
      (conf_df$lower)[order(semi_df$Film_Thickness_nm)], col="blue")
lines(sort(semi_df$Film_Thickness_nm),
      (pred_df$upper)[order(semi_df$Film_Thickness_nm)], col="purple")
lines(sort(semi_df$Film_Thickness_nm),
      (pred_df$lower)[order(semi_df$Film_Thickness_nm)], col="purple")
legend("bottomright", legend=c("Regression Line", "Confidence Interval", "Prediction Interval"), fill=c("red", "blue", "purple"))

#Box-Cox check on the quadratic model
boxcox(model_sq) #lambda roughly zero, i.e. a log transform
semi_df$sq_transformed <- log(semi_df$Electrical_Resistance_mOhm)
#note: this model uses a linear (not quadratic) thickness term
model_sqln <-lm(sq_transformed~Film_Thickness_nm, data=semi_df)
summary(model_sqln) #model performance decreases
plot(model_sqln)