Assignment_2_hw

#Task2

results1 <- read.csv("bootstrap_data1.csv")
results2 <- read.csv("bootstrap_data2.csv")
results3 <- read.csv("bootstrap_data3.csv")

analyze_results <- function(results, dataset_name) {
  # dataset_name labels the call; it is not used in the printed output

  # Column-wise mean and standard deviation of the bootstrap estimates
  means <- apply(results, 2, mean)
  sds <- apply(results, 2, sd)

  # Summary table with one row per coefficient
  summary_table <- data.frame(
    Coefficient = names(results),
    Mean = round(means, 4),
    SD = round(sds, 4)
  )
  print(summary_table)
}

analyze_results(results1, "Data Set 1")

##           Coefficient    Mean     SD
## Intercept   Intercept -0.1104 0.0256
## x1                 x1 -0.3081 0.0747
## x2                 x2  0.2242 0.0475
## x3                 x3  1.2875 0.0620

analyze_results(results2, "Data Set 2")

##           Coefficient     Mean      SD
## Intercept   Intercept -25.1467  1.7721
## x1                 x1  31.0996 11.2264
## x2                 x2   8.7000 11.9418
## x3                 x3  12.3147  2.7542
## x4                 x4  47.5258  2.2409

analyze_results(results3, "Data Set 3")

##           Coefficient     Mean     SD
## Intercept   Intercept  25.0574 3.9762
## x1                 x1  -2.5108 3.4102
## x2                 x2   5.6524 4.4081
## x3                 x3  -5.3723 5.0465
## x4                 x4   3.7050 3.5555
## x5                 x5 -16.9922 5.6090
## x6                 x6   4.7713 5.1905

# Show the three boxplots side by side
par(mfrow = c(1, 3))

# Boxplot of data set 1
boxplot(results1, main = "Data Set 1", las = 2, col = "blue")
abline(h = 0, lty = 2, col = "red")

# Boxplot of data set 2
boxplot(results2, main = "Data Set 2", las = 2, col = "green")
abline(h = 0, lty = 2, col = "red")

# Boxplot of data set 3
boxplot(results3, main = "Data Set 3", las = 2, col = "yellow")
abline(h = 0, lty = 2, col = "red")

# Restore the default single-panel layout
par(mfrow = c(1, 1))

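Comment: In all three data sets, every box lies entirely above or entirely below zero. Each box spans the interquartile range (Q1 to Q3), which contains the middle 50% of the bootstrap estimates, so a box on one side of zero means that at least 75% of the estimates share the same sign. A box above the zero line indicates a possible positive effect and a box below it indicates a possible negative effect. The whiskers in R's default boxplot extend to the most extreme estimates within 1.5 times the interquartile range of the box, which covers nearly all of the estimates (not exactly 95%), so whiskers that stay on one side of zero are stronger evidence that a coefficient differs from zero.

For data set 1, the boxes of the intercept and the three covariates are all on one side of zero, and so are the whiskers, which makes it more certain that these coefficients are different from 0.

For data set 2, all the boxes are above or below the zero line, which suggests that all the coefficients are likely to be significant. The whiskers also stay on one side of zero for every coefficient except x2, so x2 is the least certain; this agrees with the summary table, where x2 has mean 8.7000 and SD 11.9418.

For data set 3, all the boxes are again either above or below 0, so all the coefficients are likely to be significant, but only the intercept and x5 have whiskers that also stay on one side of zero.

Overall, no box for any coefficient in the three data sets crosses the zero line, so the bootstrap samples do not point to any probably insignificant coefficient. The conclusion is weakest for the coefficients whose whiskers cross zero.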
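The visual check described in the comment can also be done numerically. The sketch below is not part of the original analysis: `zero_check` is a hypothetical helper, and `demo` is simulated data standing in for the bootstrap results. For each column it reports whether zero lies outside the box (Q1 to Q3), outside the whiskers that `boxplot` draws by default, and outside a 95% percentile interval, which is added here as an assumption about what a "95%" statement would normally refer to.

```r
# Hypothetical helper: for each column of bootstrap estimates, report whether
# zero lies outside the box, the default whiskers, and the 95% percentile interval.
zero_check <- function(results) {
  # TRUE when the interval [lo, hi] does not contain zero
  outside <- function(lo, hi) unname(lo > 0 | hi < 0)

  # Rows: lower whisker, Q1, median, Q3, upper whisker (one column per coefficient)
  stats <- sapply(results, function(x) boxplot.stats(x)$stats)
  # Rows: 2.5% and 97.5% quantiles of the bootstrap estimates
  ci <- sapply(results, quantile, probs = c(0.025, 0.975))

  data.frame(
    Coefficient = names(results),
    box_excludes_0 = outside(stats[2, ], stats[4, ]),
    whisker_excludes_0 = outside(stats[1, ], stats[5, ]),
    ci95_excludes_0 = outside(ci[1, ], ci[2, ])
  )
}

# Simulated stand-in for a bootstrap results data frame (assumption, not real data)
set.seed(1)
demo <- data.frame(
  far_from_zero = rnorm(1000, mean = 5, sd = 1),
  near_zero = rnorm(1000, mean = 1, sd = 1),
  centred_zero = rnorm(1000, mean = 0, sd = 1)
)

zero_check(demo)
```

On the real data the same call would be `zero_check(results1)`, `zero_check(results2)`, and `zero_check(results3)`. A coefficient such as `near_zero` in the simulated example shows why the box alone is a weak criterion: its box can sit above zero while both the whiskers and the 95% percentile interval still include zero.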

Assignment_2_hw_2

Sabuj Ganguly

2025-10-01