Project 1: Simple Linear Regression

Modeling Electrical Resistance in Semiconductor Wafer Testing

Author: Tafadzwa Banga
Course: IE 5344
Date: 16/03/2025

Introduction

Semiconductor manufacturing relies heavily on precise control of thin film deposition processes. In this project, we investigate the relationship between film thickness and electrical resistance in semiconductor wafers. The goal is to determine whether film thickness is a strong predictor of resistance, which has implications for process control and quality improvement in semiconductor fabrication.

Dataset Description

The dataset consists of 100 observations collected from a wafer fabrication process. The variables include:
- Film Thickness (X): Measured in nanometers (nm).
- Electrical Resistance (Y): Measured in milliohms (mΩ), as recorded in the Electrical_Resistance_mOhm column.

The dataset is hosted on GitHub; the URL appears in the data-loading code below.

Exploratory Data Analysis (EDA)

Distribution of Electrical Resistance

We begin by examining the distribution of electrical resistance using a histogram and box plot.

# Load necessary libraries 
library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Load the dataset

# URL of the dataset on GitHub
url <- "https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/semiconductor_SLR_dataset.csv"

# Download the CSV file to the working directory
download.file(url, destfile = "semiconductor_SLR_dataset.csv")

# Load the dataset into a data frame
semiconductor_SLR_dataset <- read.csv("semiconductor_SLR_dataset.csv")

# Preview the first six rows
head(semiconductor_SLR_dataset)
##   Film_Thickness_nm Electrical_Resistance_mOhm
## 1             87.45                     15.118
## 2            145.07                     23.601
## 3            123.20                     19.904
## 4            109.87                     16.103
## 5             65.60                     12.901
## 6             65.60                     13.278

Histogram of Electrical Resistance

ggplot(semiconductor_SLR_dataset, aes(x = Electrical_Resistance_mOhm)) +
  geom_histogram(binwidth = 5, fill = "blue", color = "black") +
  labs(title = "Histogram of Electrical Resistance", x = "Resistance (mΩ)", y = "Frequency")

Box plot of Electrical Resistance

ggplot(semiconductor_SLR_dataset, aes(y = Electrical_Resistance_mOhm)) +
  geom_boxplot(fill = "orange") +
  labs(title = "Box Plot of Electrical Resistance", y = "Resistance (mΩ)")
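As a numeric complement to the histogram and box plot, the quartiles and the 1.5 × IQR rule used to flag potential outliers can be computed directly. The sketch below is illustrative only: `head_rows` is a hypothetical name holding just the six resistance values printed by `head()` above, so its numbers do not describe the full 100-observation dataset.

```r
# Illustration only: the six resistance values shown by head() above.
# `head_rows` is a hypothetical name; use semiconductor_SLR_dataset for the real analysis.
head_rows <- data.frame(
  Electrical_Resistance_mOhm = c(15.118, 23.601, 19.904, 16.103, 12.901, 13.278)
)
y <- head_rows$Electrical_Resistance_mOhm

# Quartiles using R's default quantile definition
q <- quantile(y, probs = c(0.25, 0.50, 0.75))
iqr <- IQR(y)

# Values more than 1.5 * IQR beyond the quartiles are flagged as potential outliers
lower_fence <- q[["25%"]] - 1.5 * iqr
upper_fence <- q[["75%"]] + 1.5 * iqr
outliers <- y[y < lower_fence | y > upper_fence]

print(q)
print(c(lower = lower_fence, upper = upper_fence))
print(outliers)
```

Replacing `head_rows` with `semiconductor_SLR_dataset` applies the same check to the full data.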

Scatterplot of Resistance vs. Film Thickness

ggplot(semiconductor_SLR_dataset, aes(x = Film_Thickness_nm, y = Electrical_Resistance_mOhm)) +
  geom_point(color = "darkgreen") +
  labs(title = "Scatterplot of Resistance vs. Film Thickness", x = "Film Thickness (nm)", y = "Resistance (mΩ)")
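The strength of the linear association seen in a scatterplot can be summarized with the Pearson correlation coefficient. The sketch below computes it from its definition and compares the result with `cor()`. It is not part of the original analysis, and `head_rows` is a hypothetical name containing only the six rows printed by `head()` above, so the value is illustrative rather than the correlation for the full dataset.

```r
# Illustration only: the six rows shown by head() above (hypothetical name `head_rows`).
head_rows <- data.frame(
  Film_Thickness_nm = c(87.45, 145.07, 123.20, 109.87, 65.60, 65.60),
  Electrical_Resistance_mOhm = c(15.118, 23.601, 19.904, 16.103, 12.901, 13.278)
)
x <- head_rows$Film_Thickness_nm
y <- head_rows$Electrical_Resistance_mOhm

# Pearson correlation from its definition: S_xy / sqrt(S_xx * S_yy)
s_xy <- sum((x - mean(x)) * (y - mean(y)))
s_xx <- sum((x - mean(x))^2)
s_yy <- sum((y - mean(y))^2)
r_manual <- s_xy / sqrt(s_xx * s_yy)

# Built-in equivalent
r_builtin <- cor(x, y)

print(c(manual = r_manual, builtin = r_builtin))
```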

Simple Linear Regression Model

We fit a simple linear regression of electrical resistance on film thickness. The aim is to quantify how much of the variation in resistance the model explains, which helps validate its usefulness.

# Fit the linear regression model 
model <- lm(Electrical_Resistance_mOhm ~ Film_Thickness_nm, data = semiconductor_SLR_dataset) 
summary(model)
## 
## Call:
## lm(formula = Electrical_Resistance_mOhm ~ Film_Thickness_nm, 
##     data = semiconductor_SLR_dataset)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.27640 -0.75508 -0.08631  0.70422  2.69671 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       4.870489   0.356848   13.65   <2e-16 ***
## Film_Thickness_nm 0.122954   0.003518   34.95   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.041 on 98 degrees of freedom
## Multiple R-squared:  0.9257, Adjusted R-squared:  0.925 
## F-statistic:  1221 on 1 and 98 DF,  p-value: < 2.2e-16

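The estimated slope is positive (0.123), so each additional nanometer of film thickness is associated with an increase of about 0.123 in resistance, in the units of the Electrical_Resistance_mOhm column. The R-squared value indicates that 92.57% of the variation in resistance is explained by film thickness.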
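Several figures in the `summary(model)` output are tied together by simple identities, which makes for a quick consistency check. The sketch below uses only numbers copied from that output; the variable names are chosen here for illustration. It also computes a 95% confidence interval for the slope, which the original analysis does not report.

```r
# Values copied from the summary(model) output above
b1        <- 0.122954   # slope estimate
se_b1     <- 0.003518   # standard error of the slope
df_resid  <- 98         # residual degrees of freedom
r_squared <- 0.9257     # multiple R-squared

# t statistic = estimate / standard error (reported: 34.95)
t_slope <- b1 / se_b1

# In simple linear regression the F statistic equals the squared slope t statistic (reported: 1221)
f_stat <- t_slope^2

# With one predictor, the correlation magnitude is sqrt(R^2); its sign follows the slope
r <- sqrt(r_squared)

# 95% confidence interval for the slope: estimate +/- t critical value * standard error
t_crit   <- qt(0.975, df_resid)
slope_ci <- b1 + c(-1, 1) * t_crit * se_b1

print(c(t = t_slope, F = f_stat, r = r))
print(slope_ci)
```

On the fitted object itself, `confint(model)` returns the same kind of interval without the rounding introduced by copying printed values.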

The regression equation for this model is:

\[ \text{Resistance} = \beta_0 + \beta_1 \times \text{Thickness} + \varepsilon \]
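Substituting the coefficient estimates from the `summary(model)` output, rounded to three decimals, gives the fitted line. Resistance is in the units of the `Electrical_Resistance_mOhm` column and thickness is in nm; the error term drops out because the fitted line estimates the mean response.

\[ \widehat{\text{Resistance}} = 4.870 + 0.123 \times \text{Thickness} \]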

Assumption Checking

To check the linear regression assumptions of normally distributed residuals and homoscedasticity, we start by overlaying the fitted regression line on the data.

 ggplot(semiconductor_SLR_dataset, aes(x = Film_Thickness_nm, y = Electrical_Resistance_mOhm)) +
   geom_point() +
   stat_smooth(method = "lm", col = "red")
## `geom_smooth()` using formula = 'y ~ x'

Normality of Residuals

We check the normality of residuals using a Q-Q plot.

qqnorm(resid(model)) 
qqline(resid(model))
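A Q-Q plot is judged by eye, so a formal test can serve as a complement. One common choice, not used in the original analysis, is the Shapiro-Wilk test. The sketch below runs it on a demonstration fit to the six rows printed by `head()` above; `head_rows` and `demo_fit` are hypothetical names, and with only six residuals the test has very little power.

```r
# Illustration only: the six rows shown by head() above (hypothetical name `head_rows`).
head_rows <- data.frame(
  Film_Thickness_nm = c(87.45, 145.07, 123.20, 109.87, 65.60, 65.60),
  Electrical_Resistance_mOhm = c(15.118, 23.601, 19.904, 16.103, 12.901, 13.278)
)

# Demonstration fit; for the real analysis use the `model` object fitted above
demo_fit <- lm(Electrical_Resistance_mOhm ~ Film_Thickness_nm, data = head_rows)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normally distributed
sw <- shapiro.test(resid(demo_fit))
print(sw)
```

For the full dataset the call would be `shapiro.test(resid(model))`; a large p-value means the test finds no evidence against normality.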

Residual vs. Fitted plot

plot(model, which = 1)
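The residuals vs. fitted plot can be supplemented with a numeric check for non-constant variance. The sketch below implements the studentized (Koenker) form of the Breusch-Pagan test in base R: the squared residuals are regressed on the model's predictors, and n times the R-squared of that auxiliary regression is compared with a chi-squared distribution. This test is an addition, not part of the original analysis. `studentized_bp`, `head_rows`, and `demo_fit` are hypothetical names, and the six-row demonstration only shows that the function runs, since the chi-squared approximation needs a larger sample.

```r
# Studentized Breusch-Pagan test for a fitted lm object (hypothetical helper)
studentized_bp <- function(fit) {
  squared_resid <- resid(fit)^2
  X <- model.matrix(fit)                 # design matrix, intercept column included
  aux <- lm.fit(X, squared_resid)        # auxiliary regression of squared residuals on predictors
  rss <- sum(aux$residuals^2)
  tss <- sum((squared_resid - mean(squared_resid))^2)
  r2 <- 1 - rss / tss                    # R-squared of the auxiliary regression
  statistic <- length(squared_resid) * r2
  df <- ncol(X) - 1                      # number of predictors, excluding the intercept
  list(statistic = statistic, df = df,
       p.value = pchisq(statistic, df, lower.tail = FALSE))
}

# Illustration only: the six rows shown by head() above
head_rows <- data.frame(
  Film_Thickness_nm = c(87.45, 145.07, 123.20, 109.87, 65.60, 65.60),
  Electrical_Resistance_mOhm = c(15.118, 23.601, 19.904, 16.103, 12.901, 13.278)
)
demo_fit <- lm(Electrical_Resistance_mOhm ~ Film_Thickness_nm, data = head_rows)

bp <- studentized_bp(demo_fit)
print(bp)
```

Calling `studentized_bp(model)` applies the check to the full fit; a small p-value would suggest that the residual variance changes with film thickness.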

Confidence and Prediction Intervals

We calculate the 95% confidence interval (CI) and prediction interval (PI) for resistance at a film thickness of 100 nm.

  • The analysis includes a scatterplot with the regression line, CI, and PI.

    new_data <- data.frame(Film_Thickness_nm = 100) 
    predict(model, new_data, interval = "confidence") 
    ##        fit      lwr      upr
    ## 1 17.16585 16.95815 17.37354
    predict(model, new_data, interval = "prediction")
    ##        fit      lwr      upr
    ## 1 17.16585 15.08893 19.24276
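One way to draw a scatterplot with the regression line, CI, and PI together is to evaluate `predict()` over a grid of thickness values and plot the limits as ribbons. The sketch below is a minimal version under stated assumptions: `head_rows`, `demo_fit`, `thickness_grid`, and `bands` are hypothetical names, and the demonstration uses only the six rows printed by `head()` earlier, so the bands are much wider than they would be for the full dataset.

```r
library(ggplot2)

# Illustration only: the six rows shown by head() earlier (hypothetical name `head_rows`).
head_rows <- data.frame(
  Film_Thickness_nm = c(87.45, 145.07, 123.20, 109.87, 65.60, 65.60),
  Electrical_Resistance_mOhm = c(15.118, 23.601, 19.904, 16.103, 12.901, 13.278)
)
demo_fit <- lm(Electrical_Resistance_mOhm ~ Film_Thickness_nm, data = head_rows)

# Evaluate both interval types over a grid spanning the observed thickness range
thickness_grid <- data.frame(
  Film_Thickness_nm = seq(min(head_rows$Film_Thickness_nm),
                          max(head_rows$Film_Thickness_nm), length.out = 50)
)
ci_band <- predict(demo_fit, thickness_grid, interval = "confidence")
pi_band <- predict(demo_fit, thickness_grid, interval = "prediction")

bands <- data.frame(
  Film_Thickness_nm = thickness_grid$Film_Thickness_nm,
  fit    = ci_band[, "fit"],
  ci_lwr = ci_band[, "lwr"], ci_upr = ci_band[, "upr"],
  pi_lwr = pi_band[, "lwr"], pi_upr = pi_band[, "upr"]
)

# Outer ribbon: prediction interval; inner ribbon: confidence interval
p <- ggplot(head_rows, aes(x = Film_Thickness_nm, y = Electrical_Resistance_mOhm)) +
  geom_ribbon(data = bands, aes(x = Film_Thickness_nm, ymin = pi_lwr, ymax = pi_upr),
              inherit.aes = FALSE, fill = "grey70", alpha = 0.5) +
  geom_ribbon(data = bands, aes(x = Film_Thickness_nm, ymin = ci_lwr, ymax = ci_upr),
              inherit.aes = FALSE, fill = "steelblue", alpha = 0.5) +
  geom_line(data = bands, aes(x = Film_Thickness_nm, y = fit),
            inherit.aes = FALSE, color = "red") +
  geom_point() +
  labs(title = "Regression Line with 95% CI and PI",
       x = "Film Thickness (nm)", y = "Resistance (mΩ)")
print(p)
```

Swapping `head_rows` for `semiconductor_SLR_dataset` and `demo_fit` for `model` would produce the plot for the full data.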

At 100 nm thickness, the expected resistance is 17.17 mΩ, with a 95% confidence interval for the mean resistance of [16.96, 17.37] mΩ. The 95% prediction interval is [15.09, 19.24] mΩ, the range within which a single future observation at this thickness is expected to fall.
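The reported fit and intervals can be cross-checked by hand from the printed output. The sketch below recomputes the point prediction from the coefficients, backs out the standard errors implied by the two intervals, and confirms that the prediction variance is approximately the residual variance plus the variance of the estimated mean. All numbers are copied from the output above, the variable names are chosen here for illustration, and small differences are expected because the printed values are rounded.

```r
# Values copied from the summary(model) and predict() output above
b0        <- 4.870489
b1        <- 0.122954
sigma_hat <- 1.041      # residual standard error
df_resid  <- 98
x0        <- 100        # film thickness in nm

# Point prediction: intercept + slope * x0 (reported fit: 17.16585)
fit_manual <- b0 + b1 * x0

# Reported 95% interval limits at 100 nm
conf_int <- c(16.95815, 17.37354)
pred_int <- c(15.08893, 19.24276)

# Half-width = t critical value * standard error, so divide to recover each standard error
t_crit  <- qt(0.975, df_resid)
se_mean <- diff(conf_int) / 2 / t_crit   # standard error of the estimated mean response
se_pred <- diff(pred_int) / 2 / t_crit   # standard error for a single new observation

# Prediction variance = residual variance + variance of the estimated mean
se_pred_check <- sqrt(sigma_hat^2 + se_mean^2)

print(c(fit = fit_manual, se_mean = se_mean, se_pred = se_pred, check = se_pred_check))
```

The prediction interval is much wider than the confidence interval because it includes the scatter of individual wafers around the line, not only the uncertainty in the line itself.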

Conclusion

The simple linear regression model indicates that film thickness is a statistically significant predictor of electrical resistance: as film thickness increases, electrical resistance increases. The R-squared value of 92.57% means that film thickness alone explains approximately 92.57% of the variability in electrical resistance. This is a strong result, suggesting that film thickness is a key factor influencing resistance in this context.

The statistical significance of the model is further confirmed by the p-value for the slope coefficient (< 2e-16), which is far below common significance thresholds (e.g., 0.001, 0.01, 0.05, and 0.1). This provides strong evidence against the null hypothesis that the slope is zero, supporting a linear relationship between film thickness and resistance.
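The slope's p-value can be recomputed from the reported t statistic and residual degrees of freedom. The sketch below uses the rounded values from the model summary, with variable names chosen for illustration. It also prints the machine epsilon, since R's coefficient table reports any p-value smaller than that as `<2e-16`.

```r
# Values copied from the summary(model) output
t_slope  <- 34.95
df_resid <- 98

# Two-sided p-value for the null hypothesis that the slope is zero
p_value <- 2 * pt(abs(t_slope), df_resid, lower.tail = FALSE)
print(p_value)

# Smallest relative spacing of double-precision numbers, about 2.2e-16
print(.Machine$double.eps)
```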

Additionally, the 95% confidence and prediction intervals calculated for a film thickness of 100 nm, a critical value in semiconductor manufacturing, provide valuable insights for process control. The narrow confidence interval indicates that the mean resistance at this thickness is estimated precisely, and the prediction interval gives the range in which individual wafers are expected to fall. Comparing that range against the design specification limits, which are not part of this dataset, would show whether devices can be expected to meet specifications and perform reliably.

From a theoretical perspective, these findings do not quite align with the fundamental principles of electrical conductivity at the macro scale: thicker films generally provide more pathways for current flow, leading to lower resistance, whereas the fitted slope is positive. This may be understandable, since materials can behave differently at the nanoscale. The model should therefore be read as an empirical description of this dataset, and the discrepancy with theory is worth investigating before the model is relied on for process control.
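For reference, the macroscale expectation mentioned above can be written out for an idealized uniform film. The symbols are assumptions introduced for illustration and do not come from the dataset: \(\rho\) is the resistivity of the material, \(L\) and \(w\) are the length and width of the conducting path, and \(t\) is the film thickness.

\[ R = \frac{\rho L}{A} = \frac{\rho L}{w \, t} \]

With \(\rho\), \(L\), and \(w\) held fixed, \(R\) is inversely proportional to \(t\), so this idealized relation predicts a negative association between thickness and resistance, the opposite sign to the fitted slope.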