1 INTRODUCTION

This report evaluates the relationship between film thickness (x) and electrical resistance (y) using simple linear regression. We apply relevant mathematical models and provide supporting R code for analysis.

  • x: film thickness

  • y: electrical resistance

Let’s first load the library needed for this project and download the data set from the link below.

https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/semiconductor_SLR_dataset.csv

#install ggplot2 library
#install.packages("ggplot2")
library(ggplot2)

#download the dataset
data_url <- "https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/semiconductor_SLR_dataset.csv"
data <- read.csv(data_url)
head(data)
##   Film_Thickness_nm Electrical_Resistance_mOhm
## 1             87.45                     15.118
## 2            145.07                     23.601
## 3            123.20                     19.904
## 4            109.87                     16.103
## 5             65.60                     12.901
## 6             65.60                     13.278
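Before going further, it can help to confirm that the file loaded as expected. The sketch below is one possible check and is not part of the original analysis: `check_slr_data` is a hypothetical helper, and the six rows are copied from the `head(data)` output above so that the example runs without a network connection.

```r
# Hypothetical helper (not from the original report): basic sanity checks
# for a two-column simple-linear-regression data set.
check_slr_data <- function(df, x_col, y_col) {
  stopifnot(is.data.frame(df), all(c(x_col, y_col) %in% names(df)))
  list(
    n_rows      = nrow(df),
    n_missing   = sum(is.na(df[[x_col]])) + sum(is.na(df[[y_col]])),
    all_numeric = is.numeric(df[[x_col]]) && is.numeric(df[[y_col]]),
    # A power transformation of the response requires strictly positive values
    y_positive  = all(df[[y_col]] > 0, na.rm = TRUE)
  )
}

# First six rows, copied from the head(data) output
first_rows <- data.frame(
  Film_Thickness_nm          = c(87.45, 145.07, 123.20, 109.87, 65.60, 65.60),
  Electrical_Resistance_mOhm = c(15.118, 23.601, 19.904, 16.103, 12.901, 13.278)
)

checks <- check_slr_data(first_rows, "Film_Thickness_nm", "Electrical_Resistance_mOhm")
str(checks)
```

In the report's own session the same call would take `data` instead of `first_rows`.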

2 EXPLORATORY DATA ANALYSIS (EDA)

In this section we explore the data visually, using a histogram, a box plot, and a scatter plot, to see what relationship exists between film thickness and resistance.

2.1 Exploratory data analysis on Electrical Resistance (y)

Let’s first draw a histogram and a box plot of our response variable, electrical resistance.

#Histogram of electrical resistance (y)
hist(data$Electrical_Resistance_mOhm,
     main = "Histogram of Electrical_Resistance_mOhm",
     xlab = "Electrical_Resistance_mOhm")

#Boxplot of electrical resistance (y)
boxplot(data$Electrical_Resistance_mOhm) 

Looking at these graphs, we can say that the distribution of Electrical_Resistance_mOhm appears skewed rather than symmetric.
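A visual impression of skewness can be backed by a number. The sketch below uses the moment-based sample skewness, where a positive value indicates a longer right tail and a negative value a longer left tail. `sample_skewness` is a hypothetical helper, and the six values are only the rows shown in `head(data)`, used here for illustration.

```r
# Moment-based sample skewness: mean cubed deviation divided by the
# 1.5 power of the mean squared deviation.
sample_skewness <- function(v) {
  v <- v[!is.na(v)]
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

# Illustration on the six resistance values shown in head(data)
resistance_head <- c(15.118, 23.601, 19.904, 16.103, 12.901, 13.278)
sample_skewness(resistance_head)
```

In the report's session, `sample_skewness(data$Electrical_Resistance_mOhm)` would give the value for the full column.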

2.2 Exploratory data analysis on Electrical_Resistance (y) versus Thickness (x)

Let’s draw a scatter plot of our response (y = electrical resistance) versus our predictor (x = film thickness), treating resistance as a function of thickness:

\[ y=f(x) \]

#scatterplot of resistance (y) versus thickness (x)
ggplot(data,aes(x=Film_Thickness_nm, y=Electrical_Resistance_mOhm))+
  geom_point(color="red", size=3) 

In this plot we can recognize a curved pattern.

To summarize this section, the plots suggest that the response is skewed, so it may not follow a normal distribution, and that its relationship with thickness is curved rather than perfectly linear.

In the following part, we fit a simple linear regression model and check its assumptions to investigate this further.

3 REGRESSION MODEL FITTING AND ASSUMPTION CHECKING

Let’s fit a simple linear regression model using least squares estimation to determine the nature of the relationship between film thickness and resistance. The simple linear regression model has the following form:

\[ y=B_0+B_1x \]

with:

  • x: Film_Thickness

  • y: Electrical_Resistance
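For reference, least squares chooses \(B_0\) and \(B_1\) to minimize the sum of squared residuals. The standard closed-form estimates are given below; the symbols \(\bar{x}\), \(\bar{y}\), \(S_{xx}\), \(S_{xy}\) and \(n\) are notation introduced here and do not appear elsewhere in the report.

\[ S_{xx}=\sum_{i=1}^{n}(x_i-\bar{x})^2, \qquad S_{xy}=\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) \]

\[ B_1=\frac{S_{xy}}{S_{xx}}, \qquad B_0=\bar{y}-B_1\bar{x} \]

These are the quantities that lm() reports in the Estimate column of its summary.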

model1 <- lm(Electrical_Resistance_mOhm~Film_Thickness_nm, data)
summary(model1)
## 
## Call:
## lm(formula = Electrical_Resistance_mOhm ~ Film_Thickness_nm, 
##     data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.27640 -0.75508 -0.08631  0.70422  2.69671 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       4.870489   0.356848   13.65   <2e-16 ***
## Film_Thickness_nm 0.122954   0.003518   34.95   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.041 on 98 degrees of freedom
## Multiple R-squared:  0.9257, Adjusted R-squared:  0.925 
## F-statistic:  1221 on 1 and 98 DF,  p-value: < 2.2e-16

Looking at the summary statistics, the p-value is very small (\(p < 2.2 \times 10^{-16}\)), which means that our predictor variable is statistically significant. In addition, \(R^2=0.9257\) means that the model explains 92.57% of the variability in electrical resistance, which indicates a good fit.
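The figures in the summary are linked, which gives a quick consistency check. The arithmetic below uses only numbers printed in the output above; the relations themselves (the t value is the estimate divided by its standard error, \(F=t^2\) for a single predictor, and \(R^2=F/(F+\text{df})\) with 98 residual degrees of freedom) are standard results for simple linear regression.

\[ t=\frac{0.122954}{0.003518}\approx 34.95, \qquad F=t^2\approx 34.95^2\approx 1221 \]

\[ R^2=\frac{F}{F+98}\approx\frac{1221}{1221+98}\approx 0.9257 \]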

The fitted model equation is:

\[ y=4.8705+0.12295x \]

with:

  • \(B_0= 4.8705\)

  • \(B_1=0.12295\)

Let’s plot the model diagnostics and see what we get.

#Let's observe the normal probability plot and the constancy of the variance
plot(model1)

By examining the diagnostic plots of the model, we observe that while the residuals generally follow the reference line of the normal probability plot, the variance of the errors is not constant. This indicates a violation of the homoscedasticity (constant variance) assumption, so the model does not fully satisfy the assumptions of linear regression. Therefore, a transformation of the data is necessary to improve the model’s validity.
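Reading diagnostic plots is somewhat subjective, so numeric checks can complement them. The sketch below is an assumption-laden illustration rather than part of the original analysis: `check_residuals` is a hypothetical helper that runs a Shapiro-Wilk test on the residuals and a studentized Breusch-Pagan-style check (squared residuals regressed on the fitted values, with n times R-squared compared against a chi-square distribution with 1 degree of freedom). The data are simulated, not the report's data.

```r
# Hypothetical helper: numeric companions to the diagnostic plots of an lm fit
check_residuals <- function(model) {
  res <- residuals(model)
  # Shapiro-Wilk test: a small p-value suggests non-normal residuals
  normality <- shapiro.test(res)
  # Auxiliary regression of squared residuals on the fitted values;
  # n * R^2 is referred to a chi-square(1) distribution
  aux     <- lm(I(res^2) ~ fitted(model))
  bp_stat <- length(res) * summary(aux)$r.squared
  c(shapiro_p = normality$p.value,
    bp_stat   = bp_stat,
    bp_p      = pchisq(bp_stat, df = 1, lower.tail = FALSE))
}

# Simulated data (an assumption, not the report's data) whose error spread
# grows with x, so a small bp_p would be expected
set.seed(42)
x <- runif(200, 50, 150)
y <- 5 + 0.12 * x + rnorm(200, sd = 0.02 * x)
diagnostics <- check_residuals(lm(y ~ x))
diagnostics
```

In the report's session the call would be `check_residuals(model1)`; a small `bp_p` would support the visual impression of non-constant variance.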

4 MODEL TRANSFORMATION

Because the previous model does not satisfy the constant variance assumption, we modify the response variable y = Electrical_Resistance using a Box-Cox transformation.

The equation of the transformed model is:

\[ y^\lambda =B_0+B_1x \]
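For context, the Box-Cox family is usually written in the normalized form below; the symbol \(y^{(\lambda)}\) is notation introduced here. For \(\lambda \neq 0\) it is a linear rescaling of the plain power \(y^\lambda\) used in this report, so both versions give the same quality of fit. Only the scale of the coefficients differs and, when \(\lambda<0\), their sign.

\[ y^{(\lambda)}=\begin{cases}\dfrac{y^\lambda-1}{\lambda}, & \lambda\neq 0\\[6pt] \ln y, & \lambda=0\end{cases} \]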

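#let's look at the value of lambda that gives us the maximum likelihood
#(b is not defined in the original; it is assumed to come from MASS::boxcox with its default lambda grid)
library(MASS)
b <- boxcox(Electrical_Resistance_mOhm ~ Film_Thickness_nm, data = data)
b$x[which.max(b$y)]
## [1] -0.8282828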

Using the Box-Cox procedure, we found that the value of lambda that maximizes the likelihood is \(\lambda=-0.8282828\).

So the transformed model, whose coefficients \(B_0\) and \(B_1\) must be re-estimated, is:

\[ y^{-0.8282828}=B_0+B_1x \]
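The maximizing lambda is itself an estimate, so it is common to look at an approximate 95% interval for it before settling on a value. The sketch below shows one way to do this with `boxcox()` from the MASS package, keeping the lambdas whose log-likelihood lies within `qchisq(0.95, 1) / 2` of the maximum. The data are simulated and the object names (`sim`, `bc`, `lambda_hat`, `lambda_ci`) are hypothetical.

```r
library(MASS)  # provides boxcox()

# Simulated positive response (an assumption, not the report's data)
set.seed(1)
sim <- data.frame(x = runif(100, 50, 150))
sim$y <- exp(2 + 0.006 * sim$x + rnorm(100, sd = 0.05))

# Profile log-likelihood over a fine grid of lambda values, without plotting
bc <- boxcox(y ~ x, data = sim, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# Lambda with the highest log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas within qchisq(0.95, 1) / 2 of the maximum
cutoff    <- max(bc$y) - qchisq(0.95, df = 1) / 2
lambda_ci <- range(bc$x[bc$y > cutoff])

lambda_hat
lambda_ci
```

With the report's data, the formula and `data` arguments would be `Electrical_Resistance_mOhm ~ Film_Thickness_nm` and the original, untransformed data frame.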

#Power transformation with the value of lambda that maximizes the likelihood
data$Electrical_Resistance_mOhm <- (data$Electrical_Resistance_mOhm)^(-0.8282828)
head(data)
##   Film_Thickness_nm Electrical_Resistance_mOhm
## 1             87.45                 0.10544965
## 2            145.07                 0.07291642
## 3            123.20                 0.08396727
## 4            109.87                 0.10007829
## 5             65.60                 0.12025129
## 6             65.60                 0.11741634

With the transformation applied, let’s build the new model.

model2 <- lm(Electrical_Resistance_mOhm~Film_Thickness_nm, data)
summary(model2)
## 
## Call:
## lm(formula = Electrical_Resistance_mOhm ~ Film_Thickness_nm, 
##     data = data)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0108944 -0.0019298 -0.0002015  0.0024314  0.0091823 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1.561e-01  1.303e-03  119.75   <2e-16 ***
## Film_Thickness_nm -5.763e-04  1.285e-05  -44.85   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.003804 on 98 degrees of freedom
## Multiple R-squared:  0.9535, Adjusted R-squared:  0.9531 
## F-statistic:  2011 on 1 and 98 DF,  p-value: < 2.2e-16

After the transformation, we have this model2 equation:

\[ y^{-0.8282828}=0.1561-0.0005763x \]

Some of the summary statistics have changed: \(R^2=0.9535\) is a little higher than in model1 (although the two values are computed on different response scales, so they are not strictly comparable), and the p-value is again below \(2.2 \times 10^{-16}\). More importantly, the diagnostic plots of model2 below show that the residuals follow the normal probability plot and that the variance looks more constant.

#Let's observe the normal probability plot and the constancy of the variance
plot(model2)

5 ANALYSIS

Given our model2, let’s calculate a 95% confidence interval (CI) and a 95% prediction interval (PI) for the resistance at a film thickness of 100 nm.

#Create a new data point at 100 nm of thickness
new_data <- data.frame(Film_Thickness_nm = 100)

#calculate 95% Confidence Interval
Ci_100 <- predict(model2, newdata = new_data, interval = "confidence", level = 0.95)

Ci_100
##          fit        lwr        upr
## 1 0.09845335 0.09769471 0.09921198

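This confidence interval is on the transformed scale of model2: we are 95% confident that the mean of the transformed resistance \(y^{-0.8282828}\) for all wafers with 100 nm film thickness lies between 0.09769471 and 0.09921198. These limits are not resistances; they must be back-transformed to obtain values in mΩ.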

# Calculate 95% Prediction Interval
Pi_100 <- predict(model2, newdata = new_data, interval = "prediction", level = 0.95)
Pi_100
##          fit        lwr       upr
## 1 0.09845335 0.09086716 0.1060395

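This prediction interval is also on the transformed scale: we are 95% confident that a single new wafer with 100 nm thickness will have a transformed resistance \(y^{-0.8282828}\) between 0.09086716 and 0.1060395. As with the confidence interval, these limits must be back-transformed to obtain values in mΩ.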
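Because model2 was fitted to \(y^{-0.8282828}\), the interval limits can be returned to the original units (mΩ, going by the column name) by applying the inverse power \(1/\lambda\). The sketch below does this for the limits printed above; `back_transform` and `to_mohm` are hypothetical helpers. Since \(\lambda\) is negative the transformation is decreasing, so the lower and upper limits swap. Strictly, a back-transformed confidence interval describes a typical (median-like) resistance rather than the arithmetic mean.

```r
lambda <- -0.8282828  # exponent used for model2

# Inverse of the power transformation z = y^lambda (valid for y > 0)
back_transform <- function(z, lambda) z^(1 / lambda)

# Interval limits on the transformed scale, copied from the output above
ci_t <- c(fit = 0.09845335, lwr = 0.09769471, upr = 0.09921198)
pi_t <- c(fit = 0.09845335, lwr = 0.09086716, upr = 0.1060395)

# lambda < 0 makes the transformation decreasing, so the transformed upper
# limit becomes the lower limit in mOhm and vice versa
to_mohm <- function(v) {
  c(fit = unname(back_transform(v["fit"], lambda)),
    lwr = unname(back_transform(v["upr"], lambda)),
    upr = unname(back_transform(v["lwr"], lambda)))
}

ci_mohm <- to_mohm(ci_t)
pi_mohm <- to_mohm(pi_t)
ci_mohm
pi_mohm
```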

Additionally let’s generate a scatterplot of the data with the fitted regression line, including both confidence and prediction intervals.

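library(ggplot2)

# 95% prediction-interval limits at each observed thickness (transformed scale)
pi_band <- cbind(data, predict(model2, newdata = data, interval = "prediction", level = 0.95))

# Scatterplot + regression line + 95% CI band (green) + 95% PI limits (dashed blue)
ggplot(pi_band, aes(x = Film_Thickness_nm, y = Electrical_Resistance_mOhm)) +
  geom_point(color = "black") +
  geom_smooth(method = "lm", se = TRUE, fill = "green", alpha = 0.4) +
  geom_line(aes(y = lwr), linetype = "dashed", color = "blue") +
  geom_line(aes(y = upr), linetype = "dashed", color = "blue") +
  geom_vline(xintercept = 100, linetype = "dashed", color = "red") +
  labs(title = "Transformed Resistance vs Thickness with 95% CI and PI",
       x = "Film Thickness (nm)",
       y = "Transformed Electrical Resistance (y^-0.8282828)") +
  theme_minimal()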

This plot shows a strong and consistent linear trend between the transformed resistance and film thickness, and it shows that model2 fits the data well.

6 SUMMARY

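We can conclude that model2, given by the equation \(y^{-0.8282828}=0.1561-0.0005763x\), provides a good fit for our data. The slope is negative on the transformed scale, but because the exponent \(\lambda\) is negative, the transformed response decreases as resistance increases. In the original units, therefore, Electrical Resistance increases as Film Thickness increases, in agreement with the positive slope of model1.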