1 Introduction

A critical performance characteristic for our semiconductor device customers is electrical resistance that stays within the acceptable range set by design specifications. Following our process engineering guidebook, we seek to minimize defects by controlling Electrical Resistance, which is a Key Process Output Variable (KPOV) of our wafer fabrication process.

In our approach we use the PGA method, which stands for Practical, Graphical and Analytical reviews.

Practically, we know from past empirical evidence that resistance depends strongly on how well we deposit metal films on wafers. If a production sample does not show this strong dependency, we question the reliability of our process. The purpose of this project is to validate this relationship between the input, measured as film thickness, and the output, measured as electrical resistance, using the data set at the following URL: https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/semiconductor_SLR_dataset.csv

We use one of the common forms of depositing metal films on wafers, called Sputter Deposition, which is a physical vapor deposition process. This process is very precise in thickness control. We have invested in it to improve our first-time-through quality and to reduce scrap, so we gain in both quality and cost. Therefore, validating the supposed strong relationship between electrical resistance and film thickness will help us reduce material expenses, improve the quality of our semiconductor devices, and gain market share.

2 Exploratory Data Analysis (EDA)

First, using R, we extract the data by reading the csv file, which contains 100 pairs of film thickness in nm and electrical resistance in ohm.

#download csv file
#install.packages("ggplot2")
library(ggplot2)
df<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/semiconductor_SLR_dataset.csv")
colnames(df)<-c("Thickness","Electrical_Resistance")

Next, for the graphical step, we explore the data with three common graphs: a histogram, a box plot, and a scatterplot.

First we look at the histogram of Electrical Resistance, which is the output we are looking to analyze.

hist(df$Electrical_Resistance,
     main = "Electrical_Resistance",
     xlab = "Electrical_Resistance in Ohm")

The histogram above does not look normally distributed. It is right-skewed and appears closer to a log-normal distribution.

Now let’s take a look at the box plot.

boxplot(df$Electrical_Resistance)

The box plot is consistent with the histogram, and it shows outliers at the high end of resistance.

Now let’s take a look at the scatterplot.

ggplot(df,aes(x=Thickness,y=Electrical_Resistance))+
  geom_point(color = "blue", size = 3)

The scatter plot suggests a relationship between film thickness and electrical resistance.

The graphical displays so far support moving on to the next step of the PGA method, the Analytical review.

3 Regression Model Fitting and Assumption Checking

The hypothesis we are testing, that a relationship exists between film thickness and resistance, involves only two variables. Simple Linear Regression is therefore sufficient as a starting model.

3.1 Modeling the Initial Simple Linear Regression

We start by regressing electrical resistance on thickness. Although graphical displays will be used, this is a totally analytical procedure in the PGA scheme.

We start the analysis by using the executable R script cell below:

LinRegression<-lm(Electrical_Resistance~Thickness,data=df)
summary(LinRegression)
## 
## Call:
## lm(formula = Electrical_Resistance ~ Thickness, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.27640 -0.75508 -0.08631  0.70422  2.69671 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4.870489   0.356848   13.65   <2e-16 ***
## Thickness   0.122954   0.003518   34.95   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.041 on 98 degrees of freedom
## Multiple R-squared:  0.9257, Adjusted R-squared:  0.925 
## F-statistic:  1221 on 1 and 98 DF,  p-value: < 2.2e-16

The summary table above provides us with the first parameters to start analyzing the hypothesized Simple Linear Regression in the following form:

\[ y = B_0 + B_1x + \varepsilon \]

With y = electrical resistance in ohm

and x = film thickness in nm

The first parameter of interest is R-squared, known as the coefficient of determination. It estimates the proportion of variation in y (electrical resistance) explained by the regressor x (film thickness). The table above gives a value of 92.6%, a strong indication that the two variables are correlated. Correlation does not necessarily mean causality, however. All we can infer at this point is that the two are correlated, not that the film thickness deposited on the wafers causes the electrical resistance observed in this dataset. That requires further investigation.
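As a consistency check, in a simple linear regression with a single regressor the coefficient of determination can be recovered from the F-statistic and the residual degrees of freedom. The figures below are copied from the summary table above, and the symbol \(\mathrm{df}_{res}\) is introduced here only for illustration:

\[ R^2 = \frac{F}{F + \mathrm{df}_{res}} = \frac{1221}{1221 + 98} \approx 0.9257 \]

This agrees with the Multiple R-squared reported in the table.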

\(B_{0} = 4.87\)

\(B_{1} = 0.123\)

Therefore, our Simple Linear Regression using least squares estimation is:

\[ y = 4.87 + 0.123x + \varepsilon \]

This is a hypothesized model with the following tests:

\(H_{0}: B_{1} = 0\)

\(H_{a}: B_{1} \neq 0\)

With a p-value from the table above < 2.2e-16, far below the 5% significance level, we reject the null hypothesis \(H_{0}: B_{1} = 0\) and conclude that there is a statistically significant relationship between film thickness and electrical resistance. However, before accepting it we must check the adequacy of the fitted model:

\[ y = 4.87 + 0.123x + \varepsilon \]
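The t value and p-value for the slope can be reproduced by hand from the Estimate and Std. Error columns of the summary table. The following is a minimal sketch, with the numbers copied from the table above and variable names chosen here for illustration only:

```r
# Values copied from the summary(LinRegression) table
slope_est <- 0.122954   # Estimate for Thickness
slope_se  <- 0.003518   # Std. Error for Thickness
df_resid  <- 98         # residual degrees of freedom (n - 2)

# t statistic for H0: B1 = 0
t_stat <- slope_est / slope_se

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)

# In simple linear regression the squared t statistic equals the F-statistic
f_stat <- t_stat^2

round(c(t = t_stat, F = f_stat), 2)
p_value < 0.05
```

Because the inputs are rounded, the squared t statistic matches the reported F-statistic only up to rounding.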

3.2 Model Adequacy Review

There are four assumptions that we need to verify to test the adequacy of any regression model. All of these checks use graphical displays of the residuals, defined as the differences between the observed values and the corresponding fitted values from the model. The residuals serve as estimates of the error terms of the model, \(\varepsilon_{i}\).

  1. Linearity and independence: the residuals scatter randomly around zero with no systematic pattern. The graph to check for this is the Residuals vs Fitted.
  2. Normality: the residuals of the model are normally distributed. The graph to check for this is the Q-Q Residuals.
  3. Homoscedasticity, or constant variance: the variance of the error terms is constant across all values of film thickness, which is the predictor variable. The graph to check for this is the Scale-Location.
  4. Absence of influential outliers. The Residuals vs Leverage graph checks for this.

plot(LinRegression)

Almost all of these graphs show departures from the assumptions.

When a model is inadequate, we transform the data, refit the model, and check the assumptions again.
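The graphical checks can be complemented with a few numerical ones. The sketch below is illustrative only: it runs on simulated stand-in data (the data frame `sim` and the model `fit` are hypothetical names, not the report's dataset or model), and the Cook's distance cutoff of 4/n is a common rule of thumb rather than a fixed standard.

```r
# Stand-in data (simulated for illustration, NOT the report's dataset)
set.seed(42)
sim <- data.frame(Thickness = runif(100, min = 50, max = 150))
sim$Electrical_Resistance <- 1 / (0.104 - 0.00043 * sim$Thickness +
                                    rnorm(100, sd = 0.003))

fit <- lm(Electrical_Resistance ~ Thickness, data = sim)

# Draw the four default diagnostic plots on a single page
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)

# Shapiro-Wilk test: a small p-value suggests non-normal residuals
sw <- shapiro.test(residuals(fit))

# Cook's distance: flag observations with unusually large influence
cd <- cooks.distance(fit)
influential <- which(cd > 4 / nrow(sim))

sw$p.value
influential
```

The same calls could be applied to the fitted model in place of `fit`.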

4 The Development of a New Model using Data Transformation

To assess the proper transformation, we use the executable R script cell below:

library(MASS)
boxcox(df$Electrical_Resistance~df$Thickness)

The graph above shows the Box-Cox log-likelihood as a function of \(\lambda\), together with its 95% confidence interval. Based on this graph, we take \(\lambda = -1\), which corresponds to the reciprocal transformation \(1/y\).
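The value of \(\lambda\) can also be read off numerically instead of from the graph. The sketch below is a hedged illustration on simulated stand-in data (the data frame `sim` and the variable names are hypothetical, not the report's dataset). It uses the `x` and `y` components returned by `boxcox()` and the usual chi-square cutoff for an approximate 95% interval.

```r
library(MASS)  # provides boxcox()

# Stand-in data (simulated for illustration, NOT the report's dataset)
set.seed(42)
sim <- data.frame(Thickness = runif(100, min = 50, max = 150))
sim$Electrical_Resistance <- 1 / (0.104 - 0.00043 * sim$Thickness +
                                    rnorm(100, sd = 0.003))

# Profile log-likelihood over a grid of lambda values, without drawing the plot
bc <- boxcox(Electrical_Resistance ~ Thickness, data = sim,
             lambda = seq(-3, 3, by = 0.05), plotit = FALSE)

# Lambda that maximizes the log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas within qchisq(0.95, 1) / 2 of the maximum
in_ci <- bc$y > max(bc$y) - qchisq(0.95, df = 1) / 2
lambda_ci <- range(bc$x[in_ci])

lambda_hat
lambda_ci
```

A convenient value inside the interval, such as -1, 0, or 0.5, is usually preferred over the exact maximizer because it is easier to interpret.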

We then transformed the original dataset using the executable R script cell below:

df$Electrical_Resistance<-(df$Electrical_Resistance)^-1
head(df)
##   Thickness Electrical_Resistance
## 1     87.45            0.06614632
## 2    145.07            0.04237109
## 3    123.20            0.05024116
## 4    109.87            0.06210023
## 5     65.60            0.07751337
## 6     65.60            0.07531255

We reconstructed a new linear model with the transformed dataset and pulled the new summary table using the executable R script cell below:

LinRegressionTransf<-lm(Electrical_Resistance~Thickness,data=df)
summary(LinRegressionTransf)
## 
## Call:
## lm(formula = Electrical_Resistance ~ Thickness, data = df)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0082499 -0.0015050 -0.0000331  0.0016006  0.0072755 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.041e-01  9.771e-04  106.56   <2e-16 ***
## Thickness   -4.300e-04  9.633e-06  -44.63   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.002851 on 98 degrees of freedom
## Multiple R-squared:  0.9531, Adjusted R-squared:  0.9526 
## F-statistic:  1992 on 1 and 98 DF,  p-value: < 2.2e-16

4.1 Interpretations of the Table

Our coefficient of determination R-squared has increased and is now 95%, suggesting an even stronger linear relationship between the regressor x (film thickness) and the transformed response (the reciprocal of electrical resistance). As before, we need to check model adequacy before making any inference. We plot the residuals again by executing the R script cell below:

plot(LinRegressionTransf)

We noticed that 2 of the 4 plots have improved significantly: the Residuals vs Fitted and the Residuals vs Leverage plots. The first indicates that the residuals are now randomly distributed, with variance of the error terms that is consistent across all fitted values. The second shows that outliers now have less influence.

These residual plots support the data transformation and the use of the new least squares estimates as a better representation of the relationship between film thickness and electrical resistance.

4.2 Our New Model

The new estimates from the table are now:

\(B_{0} = 0.1041\)

\(B_{1} = -0.00043\)

Therefore, our new Simple Linear Regression model using least squares estimation, with the reciprocal of electrical resistance as the response, is:

\[ \frac{1}{y} = 0.1041 - 0.00043x + \varepsilon \]

5 Answers to Key Questions

1&2. How well does film thickness predict electrical resistance and how significant is their relationship?

After we transformed our dataset, the new simple linear regression model has a coefficient of determination R-squared of 95% and the same p-value of < 2.2e-16. This is far below the 5% significance level, so we reject the null hypothesis \(H_{0}: B_{1} = 0\) and conclude that there is a strongly significant relationship between film thickness and electrical resistance. With the model adequacy validated, we can say that film thickness predicts electrical resistance very well.

Last Question on Point Estimates

In semiconductor manufacturing, common target film thickness values depend on the specific application and material being deposited. For conductive metal layers, typical thickness values range between 50 nm and 150 nm. A frequently monitored value in many deposition processes is 100 nm, as it serves as a midpoint in the standard process range and is often used as a reference for quality control.

For this analysis, calculate a 95% confidence interval (CI) and prediction interval (PI) for resistance at 100 nm of thickness. This will help assess the expected variation in resistance at this critical level, allowing engineers to determine whether the process is stable and meets design specifications. Additionally, generate a scatterplot of the data with the fitted regression line, including both confidence and prediction intervals.

datapoint<-data.frame(Thickness=100)
CI_100<-predict(LinRegressionTransf, newdata = datapoint, interval = "confidence", level = 0.95)
CI_100
##          fit        lwr        upr
## 1 0.06112794 0.06055924 0.06169665

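At a film thickness of 100 nm, the fitted value on the transformed scale (the reciprocal of resistance) is 0.06112794, with a 95% confidence interval of 0.06055924 to 0.06169665 for the mean response.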

PI_100<-predict(LinRegressionTransf,newdata = datapoint, interval = "prediction", level = 0.95)
PI_100
##          fit        lwr        upr
## 1 0.06112794 0.05544105 0.06681484

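At a film thickness of 100 nm, the 95% prediction interval for a single new observation on the transformed scale is 0.05544105 to 0.06681484, around the same fitted value of 0.06112794.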
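Because the model was fitted to the reciprocal of resistance, the intervals above are on the reciprocal scale. To express them in ohm, each endpoint can be inverted, keeping in mind that taking reciprocals swaps the lower and upper bounds. The sketch below copies the endpoints from the `predict()` output above. The helper `to_ohm` and the variable names are hypothetical, introduced only for this illustration.

```r
# Endpoints copied from the predict() output above (reciprocal scale)
ci_recip <- c(fit = 0.06112794, lwr = 0.06055924, upr = 0.06169665)
pi_recip <- c(fit = 0.06112794, lwr = 0.05544105, upr = 0.06681484)

# Invert each endpoint; the upper reciprocal bound becomes the lower
# resistance bound, so the two bounds are reordered before renaming
to_ohm <- function(iv) {
  out <- 1 / iv[c("fit", "upr", "lwr")]
  names(out) <- c("fit", "lwr", "upr")
  out
}

ci_ohm <- to_ohm(ci_recip)
pi_ohm <- to_ohm(pi_recip)

round(rbind(CI = ci_ohm, PI = pi_ohm), 2)
```

On this basis the fitted resistance at 100 nm is roughly 16.4 ohm, with a confidence interval of about 16.2 to 16.5 ohm and a prediction interval of about 15.0 to 18.0 ohm. The back-transformed fit is best read as a typical value rather than the exact mean resistance, since the reciprocal is a nonlinear transformation.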

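# 95% prediction interval at every observed thickness (transformed scale)
pi_band <- predict(LinRegressionTransf, newdata = df, interval = "prediction", level = 0.95)
plot_df <- cbind(df, pi_band)

ggplot(plot_df, aes(x = Thickness, y = Electrical_Resistance)) +
  geom_point(color = "blue") +
  # fitted line with its 95% confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, fill = "yellow", alpha = 0.7) +
  # lower and upper 95% prediction limits
  geom_line(aes(y = lwr), linetype = "dotted", color = "darkgreen") +
  geom_line(aes(y = upr), linetype = "dotted", color = "darkgreen") +
  geom_vline(xintercept = 100, linetype = "dashed", color = "red") +
  labs(title = "Resistance vs Thickness with CI and PI",
       x = "Film Thickness (nm)",
       y = "1 / Electrical Resistance (1/ohm)") +
  theme_minimal()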

6 Conclusion

Following the step-by-step Linear Regression process proved critical for our company in developing the right regression model between film thickness on wafers and electrical resistance. This not only helps us achieve quality; it will also reduce material costs by applying the right film thickness to meet design specifications, and it will help eliminate scrap.