1 Introduction

In semiconductor manufacturing, multiple processes contribute to the successful production of high-quality wafers. One of the more critical steps is Physical Vapor Deposition (PVD), where metals are vaporized within a chamber environment and deposited in thin layers onto a wafer surface. These thin metallic and transition-metal nitride films, commonly referred to as thin films, play a crucial role in semiconductor effectiveness and functionality.

Several PVD processes exist, such as thermal evaporation, cobalt deposition, and copper barrier deposition; however, the focus of this analysis is sputter deposition. In sputter deposition, energetic ions bombard a metal target inside a vacuum-sealed chamber, ejecting target atoms that then deposit onto a wafer and form thin, uniform films. This method is highly regarded for its versatility, scalability, and film uniformity.

Thin film uniformity is critical as it may affect electrical resistance, a key performance metric in semiconductor devices. Variations in thickness can lead to inconsistencies in resistance, impacting the functionality and reliability of the semiconductor circuits produced.

This study aims to evaluate if there is a direct linear relationship between film thickness and electrical resistance in the sputter deposition process. A Simple Linear Regression (SLR) model will be used to analyze collected data.

2 Methodology

2.1 Dataset

The dataset contains film thickness (nm) and electrical resistance (mOhm) measurements taken after the sputter process. In any graph or equation, film thickness is the x-value and electrical resistance is the y-value, because we are evaluating whether electrical resistance depends on film thickness.

# Load required libraries for data handling and visualization
library(ggplot2)
library(dplyr)

# Read the dataset from the given URL
url <- "https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/semiconductor_SLR_dataset.csv"
data <- read.csv(url)

# Display the first few rows of the dataset
head(data)
##   Film_Thickness_nm Electrical_Resistance_mOhm
## 1             87.45                     15.118
## 2            145.07                     23.601
## 3            123.20                     19.904
## 4            109.87                     16.103
## 5             65.60                     12.901
## 6             65.60                     13.278
# Summary statistics of the dataset
summary(data)
##  Film_Thickness_nm Electrical_Resistance_mOhm
##  Min.   : 50.55    Min.   :11.68             
##  1st Qu.: 69.32    1st Qu.:13.60             
##  Median : 96.42    Median :15.74             
##  Mean   : 97.02    Mean   :16.80             
##  3rd Qu.:123.02    3rd Qu.:19.82             
##  Max.   :148.69    Max.   :25.74
# Check for missing values
sum(is.na(data))
## [1] 0

The snippet above reads the data from an online source and loads it into R so it is ready to use. It also displays basic summary statistics, such as the mean, minimum, and maximum, for each of the two variables, and confirms that there are no missing values. This helps us verify that the data is complete and plausible before we begin the analysis.

2.2 Exploratory Data Analysis

Before we build our SLR model, we need to conduct an exploratory data analysis (EDA). An EDA helps us understand the dataset, check for anomalies, and visualize the relationship between film thickness and electrical resistance before fitting the model. Visualizing the dataset makes it easier to spot patterns between our two variables.

The code below produces a histogram and a boxplot that help us analyze the distribution of electrical resistance. The histogram appears right-skewed, meaning most resistance values sit at the lower end with a tail of larger values on the right. The highest frequencies occur from roughly 12 to 16 mOhm, so most of the data falls in this range. The boxplot shows a median resistance of around 15 to 16 mOhm, which matches the peak seen in the histogram. The boxplot is also consistent with right skew: the interquartile range runs from about 13.6 to 19.8 mOhm, and the median sits closer to the lower quartile than to the upper one. This leads us to check what could be driving the skew in electrical resistance.

In the scatterplot, electrical resistance increases as film thickness increases, which suggests a positive correlation between the two variables. This is worth noting because, in conductive materials, electrical resistance is typically expected to decrease as thickness increases.
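The visual impressions above can also be checked numerically. The sketch below is a minimal illustration on simulated stand-in values, not the report's data: the names thickness, resistance, and sample_skewness, the seed, and the simulation parameters are all assumptions made for the example. It computes a moment-based sample skewness and the Pearson correlation.

```r
# Hypothetical stand-in data; the real analysis would use the loaded data frame.
set.seed(42)
thickness  <- runif(100, min = 50, max = 150)
resistance <- 5 + 0.12 * thickness + rnorm(100, sd = 1)

# Moment-based sample skewness: third central moment divided by
# the second central moment raised to the power 1.5.
sample_skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / (mean(centered^2))^1.5
}

skew_resistance <- sample_skewness(resistance)
pearson_r <- cor(thickness, resistance)  # Pearson correlation is the default method

cat("Skewness of resistance:", round(skew_resistance, 3), "\n")
cat("Pearson correlation:   ", round(pearson_r, 3), "\n")
```

With the report's dataset, the same two calls would be applied to data$Electrical_Resistance_mOhm and data$Film_Thickness_nm. A positive skewness would agree with the right-skewed histogram, and a correlation near 1 would agree with the upward trend in the scatterplot.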

# Histogram of Electrical Resistance
ggplot(data, aes(x = Electrical_Resistance_mOhm)) +
  geom_histogram(binwidth = 0.5, fill = "blue", alpha = 0.7, color = "black") +
  labs(title = "Distribution of Electrical Resistance", x = "Electrical Resistance (mOhm)", y = "Count")

# Boxplot of Electrical Resistance
ggplot(data, aes(y = Electrical_Resistance_mOhm)) +
  geom_boxplot(fill = "red", alpha = 0.6) +
  labs(title = "Boxplot of Electrical Resistance", y = "Electrical Resistance (mOhm)")

# Scatterplot of Thickness vs. Resistance
ggplot(data, aes(x = Film_Thickness_nm, y = Electrical_Resistance_mOhm)) +
  geom_point(color = "blue", alpha = 0.6) +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(title = "Scatterplot of Film Thickness vs. Electrical Resistance", 
       x = "Film Thickness (nm)", y = "Electrical Resistance (mOhm)")

2.3 Simple Linear Regression

Now that we have completed the EDA, we can build a simple linear regression (SLR) model. An SLR models the relationship between one independent variable and one dependent variable as a straight line. In our case, as described earlier, film thickness (x-value) is the independent variable and electrical resistance (y-value) is the dependent variable, because we are measuring the direct effect of film thickness on electrical resistance. The SLR model equation is:

Y = β0 + β1X + ϵ

where:

  • Y = Electrical Resistance (mOhm)

  • X = Film Thickness (nm)

  • β0 = Intercept (resistance when thickness is zero)

  • β1 = Slope (change in resistance per thickness unit)

  • ϵ = Error term
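To make the roles of β0 and β1 concrete, the sketch below estimates them with the closed-form least-squares formulas and compares the result with lm(). It runs on simulated values (the seed, the variable names x and y, and the true coefficients 5 and 0.12 are assumptions chosen for illustration only).

```r
# Hypothetical simulated data standing in for thickness (x) and resistance (y)
set.seed(1)
x <- runif(100, min = 50, max = 150)
y <- 5 + 0.12 * x + rnorm(100, sd = 1)

# Closed-form least-squares estimates for simple linear regression
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0 <- mean(y) - b1 * mean(x)                                      # intercept

# The same estimates from lm()
fit <- lm(y ~ x)

print(c(intercept = b0, slope = b1))
print(coef(fit))
```

Both print statements should show the same two numbers, which is a quick way to see that lm() is fitting exactly the equation written above.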

# Fit a Simple Linear Regression model
model <- lm(Electrical_Resistance_mOhm ~ Film_Thickness_nm, data = data)

# Display regression model summary
summary(model)
## 
## Call:
## lm(formula = Electrical_Resistance_mOhm ~ Film_Thickness_nm, 
##     data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.27640 -0.75508 -0.08631  0.70422  2.69671 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       4.870489   0.356848   13.65   <2e-16 ***
## Film_Thickness_nm 0.122954   0.003518   34.95   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.041 on 98 degrees of freedom
## Multiple R-squared:  0.9257, Adjusted R-squared:  0.925 
## F-statistic:  1221 on 1 and 98 DF,  p-value: < 2.2e-16

From our output we can see the following for the relationship between film thickness and electrical resistance:

  • The intercept of 4.8705 is the predicted electrical resistance when film thickness is zero. This is an extrapolation, since the thinnest film in the data is 50.55 nm

  • The slope of 0.1230 means that for each 1 nm increase in thickness, electrical resistance increases by about 0.12 mOhm on average

  • The p-value for the slope is <2e-16, far below 0.05. If there were no linear relationship, a slope this far from zero would be extremely unlikely, so the relationship between our two variables is statistically significant

  • The R-squared value is 0.9257, which means that 92.57% of the variation in electrical resistance is explained by film thickness. A value this close to 1 implies a strong linear relationship
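The numbers in the regression summary are tied together by a few identities that hold for any simple linear regression. As a rough check, the sketch below recomputes the t value, F-statistic, R-squared, and p-value using only the slope estimate, its standard error, and the residual degrees of freedom copied from the printed output (the variable names are chosen for this example).

```r
# Values copied from the regression summary in the report
slope_est <- 0.122954
slope_se  <- 0.003518
df_resid  <- 98

t_value   <- slope_est / slope_se                  # t statistic for H0: slope = 0
f_value   <- t_value^2                             # in SLR, F equals t squared
r_squared <- f_value / (f_value + df_resid)        # R-squared implied by F and df
p_value   <- 2 * pt(-abs(t_value), df = df_resid)  # two-sided p-value

cat("t value:  ", round(t_value, 2), "\n")
cat("F value:  ", round(f_value, 0), "\n")
cat("R-squared:", round(r_squared, 4), "\n")
cat("p < 2e-16:", p_value < 2e-16, "\n")
```

The recomputed values agree with the summary to rounding (t of about 34.95, F of about 1221, R-squared of about 0.9257), which confirms that the bullets above are all describing the same underlying fit.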

2.4 Model Assumption Checks

For our SLR model to be considered valid, we need to perform a residual analysis that checks two key assumptions: residual normality and homoscedasticity. Residual normality is the assumption that the model errors are normally distributed, and homoscedasticity is the assumption that the variance of the errors is constant across fitted values.

# Extract residuals
residuals <- model$residuals

# Histogram of residuals
ggplot(data.frame(residuals), aes(x = residuals)) +
  geom_histogram(fill = "blue", alpha = 0.7, color = "black", bins = 20) +
  labs(title = "Histogram of Residuals", x = "Residuals", y = "Count")

# Q-Q Plot for normality check
qqnorm(residuals)
qqline(residuals, col = "red", lwd = 2)

# Shapiro-Wilk test for normality (p > 0.05 means no evidence against normality)
shapiro.test(residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals
## W = 0.98927, p-value = 0.6059
# Residuals vs. Fitted plot
ggplot(data, aes(x = fitted(model), y = residuals)) +
  geom_point(color = "blue", alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs. Fitted Values", x = "Fitted Values", y = "Residuals")

From our normality and homoscedasticity checks we can see the following:

  • The histogram is not perfectly bell-shaped, since slight deviations are present

  • The Q-Q plot mostly follows the diagonal line, which means the residuals are close to normal, with slight deviations at the extremes

  • The Shapiro-Wilk test gave p = 0.6059. Since p > 0.05, we fail to reject normality, so the residuals are approximately normal

  • The Residuals vs. Fitted plot shows a rough funnel shape that widens at higher fitted values, suggesting some heteroscedasticity

Although the residuals appear approximately normal, the non-constant variance violates the homoscedasticity assumption. To address this, we apply a Box-Cox transformation to the response and refit the regression.
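The funnel shape was judged by eye. A formal check is possible with a Breusch-Pagan style test, which the report did not run; the sketch below is offered only as an illustration of how such a check could look. It uses the studentized (Koenker) form computed by hand on simulated data whose noise grows with x by construction. The seed, variable names, and simulation settings are assumptions.

```r
# Hypothetical data with noise that grows with x (heteroscedastic by construction)
set.seed(7)
x <- runif(100, min = 50, max = 150)
y <- 5 + 0.12 * x + rnorm(100, sd = 0.01 * x)

fit <- lm(y ~ x)

# Auxiliary regression: squared residuals on the predictor
aux <- lm(resid(fit)^2 ~ x)

# Studentized Breusch-Pagan statistic: n * R-squared of the auxiliary regression,
# compared against a chi-squared distribution with 1 degree of freedom
bp_stat <- length(x) * summary(aux)$r.squared
bp_p    <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

cat("BP statistic:", round(bp_stat, 3), "\n")
cat("p-value:     ", signif(bp_p, 3), "\n")
```

A small p-value would indicate that the residual variance changes with the predictor, which would support the visual reading of the Residuals vs. Fitted plot.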

2.5 Box-Cox Transformation and Model Refinement

In our model assumption check, we saw that the variance was not constant, so we will apply the Box-Cox transformation. Box-Cox is a family of power transformations applied to the response variable, after which the SLR model is refit. It is used to stabilize variance and bring residuals closer to normality, which strengthens the validity of the model. In essence, we use it to determine the optimal transformation parameter, lambda (λ).
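For reference, the standard Box-Cox family is (y^λ - 1)/λ for λ not equal to 0 and log(y) for λ = 0. The sketch below implements that form in a helper named box_cox (a name chosen for this example) and contrasts it with the plain power y^λ, which is the form applied later in this report. The sample values are the resistance readings shown by head(data).

```r
# Standard Box-Cox transform; requires strictly positive values
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Resistance values taken from the head(data) output, sorted ascending
y <- sort(c(15.118, 23.601, 19.904, 16.103, 12.901, 13.278))

scaled_form <- box_cox(y, -0.95)  # standard (scaled) Box-Cox form
plain_power <- y^(-0.95)          # plain power form

# The scaled form keeps the ordering of y; the plain power with a
# negative exponent reverses it.
print(round(scaled_form, 4))
print(round(plain_power, 4))
```

Both forms lead to the same fit quality in a regression, because one is a linear function of the other. The practical difference is direction: with a negative λ the plain power turns large resistances into small transformed values, which matters when reading the sign of the slope after refitting.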

#install.packages("MASS")
library(MASS)
#perform the boxcox transformation
boxcox_result <- boxcox(model, lambda = seq(-2, 2, by = 0.1))

# Find the lambda that maximizes log-likelihood
lambda_opt <- boxcox_result$x[which.max(boxcox_result$y)]
print(lambda_opt)  
## [1] -0.8282828
# This gave a best lambda value of about -0.83
# Box-Cox suggests a lambda of -0.83, close to -1, so we apply a power transformation

# Since -0.83 didn't fully normalize residuals, trying -0.95
lambda_final <- -0.95  

# Apply the power transformation to the response variable
# (note: this overwrites the original resistance column with resistance^lambda_final)
data$Electrical_Resistance_mOhm <- (data$Electrical_Resistance_mOhm)^lambda_final

# Refit the model with transformed response
model <- lm(Electrical_Resistance_mOhm ~ Film_Thickness_nm, data = data)
# Verify model output
summary(model)
## 
## Call:
## lm(formula = Electrical_Resistance_mOhm ~ Film_Thickness_nm, 
##     data = data)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0089765 -0.0016543 -0.0000973  0.0018175  0.0078148 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1.172e-01  1.066e-03  109.98   <2e-16 ***
## Film_Thickness_nm -4.699e-04  1.051e-05  -44.72   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.00311 on 98 degrees of freedom
## Multiple R-squared:  0.9533, Adjusted R-squared:  0.9528 
## F-statistic:  2000 on 1 and 98 DF,  p-value: < 2.2e-16
# Check residual plots
par(mfrow = c(2,2))  # Arrange plots in a 2 x 2 grid
plot(model, which = 1)  # Residuals vs Fitted
qqnorm(resid(model)); qqline(resid(model))  # QQ plot for normality check

The Box-Cox procedure gave an optimal lambda of -0.83. However, with -0.83 we were still left with non-normal residuals, so we adjusted slightly and found that -0.95 worked. With lambda set to -0.95, we transformed the resistance data and refit the SLR model.

Some of the key takeaways from the regression output:

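  • The small residuals indicate that the model's predictions are close to the observed values on the transformed scale (resistance raised to the power -0.95)

  • The intercept of 0.1172 is the predicted transformed resistance when film thickness is 0 nm (not physically possible, however)

  • The film thickness coefficient of -4.699e-04 indicates that the transformed resistance decreases by about 0.00047 for each 1 nm increase in thickness

    • Because the exponent -0.95 is negative, a smaller transformed value corresponds to a larger resistance in mOhm. The negative slope therefore does not indicate an inverse relationship; it is consistent with the positive relationship seen in the untransformed model, where resistance increases with thickness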

This regression model supports the conclusion that film thickness is a significant predictor of a wafer's electrical resistance.

We also created two more plots: a residuals vs. fitted plot and a normal Q-Q plot. In the residuals vs. fitted plot, the residuals look good, as they are mainly centered around zero. There is a small curve in the red smooth line, which suggests minor non-linearity. The spread of the residuals appears even, so there is no clear sign of heteroscedasticity (variance issues). In the Q-Q plot, the points mainly align with the diagonal line, which means the residuals are approximately normal and the normality assumption is reasonably met. There are some minor deviations at the tails, indicating slightly heavier tails, but they are not severe.

2.6 Confidence and Prediction Intervals

To assess whether the sputter process is stable, we computed a 95% confidence interval (CI) and prediction interval (PI) at a film thickness of 100 nm. The confidence interval gives a range in which the mean resistance at 100 nm is expected to fall. The prediction interval gives the range in which an individual resistance measurement at 100 nm is expected to fall, so it is always wider than the CI.
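The difference between the two intervals comes down to one extra term under the square root. The sketch below computes both by hand from the textbook formulas and compares them with predict(). It uses simulated data; the seed, the names x, y, and x0, and the simulation parameters are assumptions for illustration.

```r
# Hypothetical simulated data
set.seed(3)
x <- runif(100, min = 50, max = 150)
y <- 5 + 0.12 * x + rnorm(100, sd = 1)
fit <- lm(y ~ x)

x0     <- 100                             # point at which to predict
n      <- length(x)
s      <- summary(fit)$sigma              # residual standard error
t_crit <- qt(0.975, df = n - 2)           # critical value for 95% intervals
y_hat  <- sum(coef(fit) * c(1, x0))       # fitted value at x0

# Variance factor for the estimated mean response at x0
mean_factor <- 1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2)

ci_manual <- y_hat + c(-1, 1) * t_crit * s * sqrt(mean_factor)      # mean response
pi_manual <- y_hat + c(-1, 1) * t_crit * s * sqrt(1 + mean_factor)  # new observation

new_point  <- data.frame(x = x0)
ci_predict <- predict(fit, newdata = new_point, interval = "confidence", level = 0.95)
pi_predict <- predict(fit, newdata = new_point, interval = "prediction", level = 0.95)

print(ci_manual); print(ci_predict)
print(pi_manual); print(pi_predict)
```

The extra 1 inside the square root for the prediction interval accounts for the scatter of a single new measurement around the mean, which is why the PI is wider than the CI at the same thickness.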

# Define target thickness
target_thickness <- data.frame(Film_Thickness_nm = 100)

# Compute 95% Confidence Interval (CI) and Prediction Interval (PI)
ci <- predict(model, newdata = target_thickness, interval = "confidence", level = 0.95)
pi <- predict(model, newdata = target_thickness, interval = "prediction", level = 0.95)

# Print results
print(ci)  # Confidence interval
##          fit        lwr        upr
## 1 0.07021841 0.06959814 0.07083869
print(pi)  # Prediction interval
##          fit        lwr        upr
## 1 0.07021841 0.06401583 0.07642099
# Generate scatterplot with fitted line and 95% confidence band
# (the resistance column now holds the transformed values)
library(ggplot2)

ggplot(data, aes(x = Film_Thickness_nm, y = Electrical_Resistance_mOhm)) +
  geom_point(alpha = 0.6) +  # Scatter plot
  geom_smooth(method = "lm", formula = y ~ x, level = 0.95, color = "blue", fill = "lightblue") +
  labs(title = "Film Thickness vs. Transformed Electrical Resistance",
       x = "Film Thickness (nm)",
       y = "Electrical Resistance^(-0.95)") +
  theme_minimal()

For the CI output, the fit is 0.0702, which is the predicted mean of the transformed resistance (resistance raised to the power -0.95) at 100 nm thickness, not a value in mOhm. The 95% CI has a lower bound of 0.0696 and an upper bound of 0.0708. Raising these values to the power 1/(-0.95) returns them to the original scale: the fit corresponds to approximately 16.4 mOhm, and we are 95% confident that the typical resistance at 100 nm lies between roughly 16.2 and 16.5 mOhm.

For the PI output, the fit is still 0.0702 on the transformed scale, with a lower bound of 0.0640 and an upper bound of 0.0764. Back-transformed, this indicates that a new individual measurement at 100 nm is expected, with 95% confidence, to fall between roughly 15.0 and 18.1 mOhm.
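Because the model was fit to resistance raised to the power -0.95, the interval values printed by predict() are on that transformed scale. A minimal sketch of converting them back to mOhm follows; it uses the printed interval values directly, and the helper name back_transform is an assumption for this example. Since the exponent is negative, the lower and upper bounds trade places after conversion.

```r
lambda_final <- -0.95

# Interval values copied from the printed predict() output
ci_transformed <- c(fit = 0.07021841, lwr = 0.06959814, upr = 0.07083869)
pi_transformed <- c(fit = 0.07021841, lwr = 0.06401583, upr = 0.07642099)

# Invert y_transformed = y^lambda, then restore lwr < upr ordering
back_transform <- function(interval, lambda) {
  original <- interval^(1 / lambda)
  if (lambda < 0) {
    # A negative exponent reverses ordering, so swap the bounds
    original[c("lwr", "upr")] <- original[c("upr", "lwr")]
  }
  original
}

ci_mohm <- back_transform(ci_transformed, lambda_final)
pi_mohm <- back_transform(pi_transformed, lambda_final)

print(round(ci_mohm, 2))  # confidence interval in mOhm
print(round(pi_mohm, 2))  # prediction interval in mOhm
```

The back-transformed values are in the same range as the raw resistance measurements near 100 nm, which is a useful sanity check on the transformed model.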

3 Conclusions

This study confirms that film thickness influences electrical resistance in the sputter process. We confirmed this by fitting a Simple Linear Regression model, which showed a strong relationship with an R-squared value of 0.9257 and a statistically significant slope (p < 2e-16). The R-squared of 0.9257 indicates that 92.57% of the variation in electrical resistance is explained by film thickness. The p-value supports the relationship, but the residual plot suggested heteroscedasticity, which led us to apply a transformation.

We used a Box-Cox transformation, which gave a lambda of -0.83; however, we used -0.95 because the residuals were still not satisfactory at -0.83. After the transformation we refit the model, R-squared rose to 0.9533, and the better-behaved residuals further supported the relationship between our two variables.

4 Complete Code

#install.packages("ggplot2", dependencies = TRUE, repos = "http://cran.rstudio.com")
# Load libraries, including ggplot2
library(ggplot2)  # For visualization
library(dplyr)    # For data manipulation

# This will read the csv from URL rather than upload
url <- "https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/semiconductor_SLR_dataset.csv"
data <- read.csv(url)

# View the first few rows
head(data)
##   Film_Thickness_nm Electrical_Resistance_mOhm
## 1             87.45                     15.118
## 2            145.07                     23.601
## 3            123.20                     19.904
## 4            109.87                     16.103
## 5             65.60                     12.901
## 6             65.60                     13.278
# Perform summary of statistics
summary(data)
##  Film_Thickness_nm Electrical_Resistance_mOhm
##  Min.   : 50.55    Min.   :11.68             
##  1st Qu.: 69.32    1st Qu.:13.60             
##  Median : 96.42    Median :15.74             
##  Mean   : 97.02    Mean   :16.80             
##  3rd Qu.:123.02    3rd Qu.:19.82             
##  Max.   :148.69    Max.   :25.74
# Check for missing values
sum(is.na(data))
## [1] 0
# Check data set structure
str(data)
## 'data.frame':    100 obs. of  2 variables:
##  $ Film_Thickness_nm         : num  87.5 145.1 123.2 109.9 65.6 ...
##  $ Electrical_Resistance_mOhm: num  15.1 23.6 19.9 16.1 12.9 ...
colnames(data)
## [1] "Film_Thickness_nm"          "Electrical_Resistance_mOhm"
# Resistance Histogram
ggplot(data, aes(x = Electrical_Resistance_mOhm)) +
  geom_histogram(binwidth = 0.5, fill = "blue", alpha = 0.7, color = "black") +
  labs(title = "Distribution of Electrical Resistance", x = "Electrical Resistance (mOhm)", y = "Count")

# Resistance Box plot
ggplot(data, aes(y = Electrical_Resistance_mOhm)) +
  geom_boxplot(fill = "red", alpha = 0.6) +
  labs(title = "Boxplot of Electrical Resistance", y = "Electrical Resistance (mOhm)")

#Checking relationship between thickness and resistance
ggplot(data, aes(x = Film_Thickness_nm, y = Electrical_Resistance_mOhm)) +
  geom_point(color = "blue", alpha = 0.6) +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(title = "Scatterplot of Film Thickness vs. Electrical Resistance", 
       x = "Film Thickness (nm)", y = "Electrical Resistance (mOhm)")

# Fit a linear regression model
model <- lm(Electrical_Resistance_mOhm ~ Film_Thickness_nm, data = data)

# View summary data of the model
summary(model)
## 
## Call:
## lm(formula = Electrical_Resistance_mOhm ~ Film_Thickness_nm, 
##     data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.27640 -0.75508 -0.08631  0.70422  2.69671 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       4.870489   0.356848   13.65   <2e-16 ***
## Film_Thickness_nm 0.122954   0.003518   34.95   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.041 on 98 degrees of freedom
## Multiple R-squared:  0.9257, Adjusted R-squared:  0.925 
## F-statistic:  1221 on 1 and 98 DF,  p-value: < 2.2e-16
# Extract residuals
residuals <- model$residuals

# Histogram of residuals
ggplot(data.frame(residuals), aes(x = residuals)) +
  geom_histogram(fill = "blue", alpha = 0.7, color = "black", bins = 20) +
  labs(title = "Histogram of Residuals", x = "Residuals", y = "Count")

# Q-Q Plot to see if normal or not
qqnorm(residuals)
qqline(residuals, col = "red", lwd = 2)

# Shapiro-Wilk test for normality (p value > 0.05 means no evidence against normality)
shapiro.test(residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals
## W = 0.98927, p-value = 0.6059
# Residuals vs. Fitted Plot (Homoscedasticity Check)
ggplot(data, aes(x = fitted(model), y = residuals)) +
  geom_point(color = "blue", alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs. Fitted Values", x = "Fitted Values", y = "Residuals")

ls()  # Lists all objects in R environment
##  [1] "boxcox_result"    "ci"               "data"             "lambda_final"    
##  [5] "lambda_opt"       "model"            "pi"               "residuals"       
##  [9] "target_thickness" "url"
colnames(data)
## [1] "Film_Thickness_nm"          "Electrical_Resistance_mOhm"
#install.packages("MASS")
library(MASS)
#perform the boxcox transformation
boxcox_result <- boxcox(model, lambda = seq(-2, 2, by = 0.1))

# Find the lambda that maximizes log-likelihood
lambda_opt <- boxcox_result$x[which.max(boxcox_result$y)]
print(lambda_opt)  
## [1] -0.8282828
# This gave a best lambda value of about -0.83
# Box-Cox suggests a lambda of -0.83, close to -1, so we apply a power transformation

# Since -0.83 didn't fully normalize residuals, trying -0.95
lambda_final <- -0.95  

# Apply the power transformation to the response variable
# (note: this overwrites the original resistance column with resistance^lambda_final)
data$Electrical_Resistance_mOhm <- (data$Electrical_Resistance_mOhm)^lambda_final

# Refit the model with transformed response
model <- lm(Electrical_Resistance_mOhm ~ Film_Thickness_nm, data = data)
summary(model)  # Verify model output
## 
## Call:
## lm(formula = Electrical_Resistance_mOhm ~ Film_Thickness_nm, 
##     data = data)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0089765 -0.0016543 -0.0000973  0.0018175  0.0078148 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1.172e-01  1.066e-03  109.98   <2e-16 ***
## Film_Thickness_nm -4.699e-04  1.051e-05  -44.72   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.00311 on 98 degrees of freedom
## Multiple R-squared:  0.9533, Adjusted R-squared:  0.9528 
## F-statistic:  2000 on 1 and 98 DF,  p-value: < 2.2e-16
# Check residual plots
par(mfrow = c(2,2))  # Arrange plots in a 2 x 2 grid
plot(model, which = 1)  # Residuals vs Fitted
qqnorm(resid(model)); qqline(resid(model))  # QQ plot for normality check

# Define target thickness
target_thickness <- data.frame(Film_Thickness_nm = 100)

# Compute 95% Confidence Interval (CI) and Prediction Interval (PI)
ci <- predict(model, newdata = target_thickness, interval = "confidence", level = 0.95)
pi <- predict(model, newdata = target_thickness, interval = "prediction", level = 0.95)

# Print results
print(ci)  # Confidence interval
##          fit        lwr        upr
## 1 0.07021841 0.06959814 0.07083869
print(pi)  # Prediction interval
##          fit        lwr        upr
## 1 0.07021841 0.06401583 0.07642099
# Generate scatterplot with fitted line and 95% confidence band
# (the resistance column now holds the transformed values)
library(ggplot2)

ggplot(data, aes(x = Film_Thickness_nm, y = Electrical_Resistance_mOhm)) +
  geom_point(alpha = 0.6) +  # Scatter plot
  geom_smooth(method = "lm", formula = y ~ x, level = 0.95, color = "blue", fill = "lightblue") +
  labs(title = "Film Thickness vs. Transformed Electrical Resistance",
       x = "Film Thickness (nm)",
       y = "Electrical Resistance^(-0.95)") +
  theme_minimal()