Using R, build a multiple regression model for data that interests you. Include in this model at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. Interpret all coefficients. Conduct residual analysis. Was the linear model appropriate? Why or why not?

#Loading data from CSV file
data <- read.csv("C:/Users/aleja/Desktop/forestfires.csv")

#View of the first few rows of dataset
head(data)
##   X Y month day FFMC  DMC    DC  ISI temp RH wind rain area
## 1 7 5   mar fri 86.2 26.2  94.3  5.1  8.2 51  6.7  0.0    0
## 2 7 4   oct tue 90.6 35.4 669.1  6.7 18.0 33  0.9  0.0    0
## 3 7 4   oct sat 90.6 43.7 686.9  6.7 14.6 33  1.3  0.0    0
## 4 8 6   mar fri 91.7 33.3  77.5  9.0  8.3 97  4.0  0.2    0
## 5 8 6   mar sun 89.3 51.3 102.2  9.6 11.4 99  1.8  0.0    0
## 6 8 6   aug sun 92.3 85.3 488.0 14.7 22.2 29  5.4  0.0    0
#Structure of the dataset
str(data)
## 'data.frame':    517 obs. of  13 variables:
##  $ X    : int  7 7 7 8 8 8 8 8 8 7 ...
##  $ Y    : int  5 4 4 6 6 6 6 6 6 5 ...
##  $ month: chr  "mar" "oct" "oct" "mar" ...
##  $ day  : chr  "fri" "tue" "sat" "fri" ...
##  $ FFMC : num  86.2 90.6 90.6 91.7 89.3 92.3 92.3 91.5 91 92.5 ...
##  $ DMC  : num  26.2 35.4 43.7 33.3 51.3 ...
##  $ DC   : num  94.3 669.1 686.9 77.5 102.2 ...
##  $ ISI  : num  5.1 6.7 6.7 9 9.6 14.7 8.5 10.7 7 7.1 ...
##  $ temp : num  8.2 18 14.6 8.3 11.4 22.2 24.1 8 13.1 22.8 ...
##  $ RH   : int  51 33 33 97 99 29 27 86 63 40 ...
##  $ wind : num  6.7 0.9 1.3 4 1.8 5.4 3.1 2.2 5.4 4 ...
##  $ rain : num  0 0 0 0.2 0 0 0 0 0 0 ...
##  $ area : num  0 0 0 0 0 0 0 0 0 0 ...
#Fitting the multiple regression model with log-transformed response variable
model <- lm(log(area + 1) ~ FFMC + DMC + DC + ISI + temp + RH + wind + rain, data = data)

#Summary of the model
summary(model)
## 
## Call:
## lm(formula = log(area + 1) ~ FFMC + DMC + DC + ISI + temp + RH + 
##     wind + rain, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.5203 -1.1129 -0.6158  0.8787  5.7121 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  0.2224140  1.3604350   0.163    0.870  
## FFMC         0.0077082  0.0144884   0.532    0.595  
## DMC          0.0011915  0.0014642   0.814    0.416  
## DC           0.0002737  0.0003570   0.767    0.444  
## ISI         -0.0239494  0.0169248  -1.415    0.158  
## temp         0.0024618  0.0172593   0.143    0.887  
## RH          -0.0051729  0.0051889  -0.997    0.319  
## wind         0.0757669  0.0366155   2.069    0.039 *
## rain         0.0965122  0.2121461   0.455    0.649  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.395 on 508 degrees of freedom
## Multiple R-squared:  0.01988,    Adjusted R-squared:  0.004446 
## F-statistic: 1.288 on 8 and 508 DF,  p-value: 0.2472
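
Because the response is log(area + 1), each slope is easier to read after back-transforming. The sketch below converts a slope into an approximate percent change in (area + 1) per one-unit increase in the predictor, holding the other predictors fixed. The helper `pct_change_per_unit` is hypothetical and not part of the original analysis. The wind estimate is copied from the summary output.

```r
# Hypothetical helper: convert a slope from a log(y + 1) model into the
# percent change in (y + 1) for a one-unit increase in the predictor.
pct_change_per_unit <- function(beta) {
  (exp(beta) - 1) * 100
}

# Wind slope copied from the summary output above.
wind_beta <- 0.0757669
wind_pct <- pct_change_per_unit(wind_beta)
print(round(wind_pct, 2))
```

For wind this works out to roughly a 7.9% increase in (area + 1) per one-unit increase in wind. The same helper can be applied to `coef(model)` to read the remaining slopes on the same scale, keeping in mind that none of them is statistically distinguishable from zero here.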

Residual analysis

# Residual analysis
par(mfrow=c(2,2))
plot(model)
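
The diagnostic plots can be backed up with numeric checks. The following is a minimal sketch, assuming a fitted `lm` object with an intercept. It runs a Shapiro-Wilk test on the residuals and computes a studentized Breusch-Pagan statistic by hand in base R. The function name `residual_checks` is illustrative and not part of the original analysis.

```r
# Illustrative helper (an assumption, not from the original report):
# numeric companions to the four diagnostic plots.
residual_checks <- function(fit) {
  res <- residuals(fit)

  # Shapiro-Wilk test: a small p-value suggests non-normal residuals.
  sw <- shapiro.test(res)

  # Studentized Breusch-Pagan statistic: regress the squared residuals on
  # the model's own design matrix, then compare n * R^2 against a
  # chi-squared distribution with (number of predictors) degrees of freedom.
  X <- model.matrix(fit)
  sq <- res^2
  aux <- lm.fit(X, sq)
  r2 <- 1 - sum(aux$residuals^2) / sum((sq - mean(sq))^2)
  bp_stat <- length(res) * r2
  bp_df <- aux$rank - 1
  bp_p <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

  list(shapiro_p = sw$p.value, bp_stat = bp_stat, bp_df = bp_df, bp_p = bp_p)
}

# Usage with the model fitted above:
# residual_checks(model)
```

A small `shapiro_p` points to non-normal residuals and a small `bp_p` points to non-constant variance, which would put numbers behind what the Q-Q and scale-location plots show.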

Conclusion:

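The multiple regression on the forest fires dataset found wind to be the only predictor with a statistically significant effect on log-transformed burned area (estimate 0.0758, p = 0.039). No other predictor was significant at the 0.05 level. The model as a whole explains almost none of the variation (multiple R-squared 0.0199, adjusted 0.0044), and the overall F-test is not significant (F = 1.288 on 8 and 508 DF, p = 0.2472). The residual analysis also revealed violations of key linear regression assumptions, namely heteroscedasticity and non-normal residuals. Taken together, these results indicate that the linear model as specified is not appropriate for these data.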
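
The prompt also asks for a quadratic term, a dichotomous term, and a dichotomous-by-quantitative interaction. One way such terms could be added is sketched below, under stated assumptions. The `summer` indicator (June through September) and the function name `fit_extended` are illustrative choices, not part of the original analysis, and no results are claimed for this specification.

```r
# Sketch only: `fit_extended` and the `summer` split are assumptions.
fit_extended <- function(df) {
  # Dichotomous term: 1 for June through September, 0 otherwise.
  df$summer <- as.integer(df$month %in% c("jun", "jul", "aug", "sep"))

  # I(temp^2) is the quadratic term; wind:summer is the
  # dichotomous-by-quantitative interaction.
  lm(log(area + 1) ~ temp + I(temp^2) + wind + summer + wind:summer,
     data = df)
}

# Usage with the data frame loaded above:
# summary(fit_extended(data))
```

In this specification the `I(temp^2)` coefficient captures curvature in the temperature effect. The `summer` coefficient is the shift in expected log(area + 1) between summer and non-summer months when wind is zero. The `wind:summer` coefficient is the difference in the wind slope between the two groups.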
