Mini_Practice

Define & Check the data set

mini <- read.csv("C:/Users/grace/Downloads/mini_DATA.csv", stringsAsFactors=TRUE)
nrow(mini)

[1] 108

names(mini)

 [1] "Time"            "fb.friends"      "insta.followers" "insta.follows"  
 [5] "bored"           "thirsty"         "tired"           "satlife"        
 [9] "oski.love"       "r.love"          "socmeduse"       "attention"      
[13] "hrs.sleep"       "selfes1"         "data.power"      "corp.power"     
[17] "success.work"    "success.priv"    "selfes2"         "lovetapwater"   
[21] "catdog"          "tuhobura"        "cal.sports"      "caffeine"       
[25] "had.breakfast"   "is.female"       "long.hair"       "has.water"      
[29] "shoe.size"       "height"

Check for outliers in the data set

hist(mini$satlife)

hist(mini$selfes1)
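Histograms show outliers visually. A numeric cross-check is base R's `boxplot.stats()`, which flags values lying more than 1.5 × IQR beyond the lower or upper hinge (the same rule `boxplot()` uses). The sketch below runs on a small hypothetical vector, not the survey data. On the real data the call would be, for example, `boxplot.stats(mini$satlife)$out`.

```r
# Hypothetical scores standing in for a column such as mini$satlife
scores <- c(4, 5, 5, 6, 6, 7, 7, 8, 20)

# boxplot.stats() flags points more than 1.5 * IQR beyond the lower/upper hinges
flagged <- boxplot.stats(scores)$out
flagged
# [1] 20
```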

Define & Check the Linear Model

plot(satlife ~ selfes1, data=mini, xlab = "Self-Esteem", ylab = "Satisfaction with Life")
mod <- lm(satlife ~ selfes1, data=mini)
abline(mod, col="red", lwd = 3)

coef(mod)

(Intercept)     selfes1 
  2.4218015   0.6560048

summary(mod)$r.squared

[1] 0.5005193
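`summary(mod)$r.squared` is R² = 1 − SS_res / SS_tot: the share of the variation in the outcome that the fitted line accounts for. With a single predictor it also equals the squared Pearson correlation. The minimal sketch below uses small hypothetical vectors, not the survey data.

```r
# Hypothetical predictor and outcome (illustration only)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 4, 3, 6, 7, 8)
fit <- lm(y ~ x)

ss_res <- sum(residuals(fit)^2)     # variation left unexplained by the line
ss_tot <- sum((y - mean(y))^2)      # total variation around the mean
r2_manual <- 1 - ss_res / ss_tot

r2_manual
summary(fit)$r.squared              # same value
cor(x, y)^2                         # same value (true for one predictor only)
```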

Interpretations

We found a positive slope:

For a person with a self-esteem score of 0, the expected life satisfaction is 2.42 (the intercept)
Each 1-point increase in self-esteem is associated with a 0.66-point increase in expected life satisfaction (the slope)
Self-esteem explains about 50% of the variation in life satisfaction (R² = 0.50)
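The intercept and slope combine into one prediction equation: expected satisfaction = 2.4218 + 0.6560 × self-esteem. The sketch below plugs the coefficients reported by `coef(mod)` into that equation. The self-esteem values 0, 1 and 5 are arbitrary illustrations. With the fitted model itself, `predict(mod, newdata = data.frame(selfes1 = c(0, 1, 5)))` should return the same numbers.

```r
# Coefficients copied from coef(mod) above
b0 <- 2.4218015   # intercept: expected satlife when selfes1 = 0
b1 <- 0.6560048   # slope: change in expected satlife per 1-point increase in selfes1

selfes_values <- c(0, 1, 5)           # illustrative self-esteem scores
predicted <- b0 + b1 * selfes_values
round(predicted, 2)
# [1] 2.42 3.08 5.70
```

Moving from 0 to 1 adds exactly the slope (3.08 − 2.42 ≈ 0.66).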

Some reasons we might have found this relationship:

Causation: higher self-esteem causes greater life satisfaction
Chance: this sample showed the relationship by luck and may not represent the whole population
Reverse causation: greater life satisfaction causes higher self-esteem
Third variable: another variable causes both, making them appear related

Define & Check the data set (Categorical)

plot(mini$is.female)

summary(mini$is.female)  # one respondent left this item blank (the unnamed level); the question allows only two responses, so the blank is not a valid answer

     No Yes 
  1  27  80

Recode the Blank Response as Missing (categorical)

levels(mini$is.female)[1] <- NA  # level 1 is the blank "" response; those rows become NA
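Assigning `NA` to a level removes that level and turns every response that carried it into `NA`. The minimal check below uses a hypothetical four-response factor. As in the summary above, the blank string sorts first, so it is level 1.

```r
# Hypothetical responses; "" (blank) sorts first, so it becomes level 1
answers <- factor(c("", "No", "Yes", "Yes"))
levels(answers)            # "", "No", "Yes"

levels(answers)[1] <- NA   # drop the blank level; its responses become NA
levels(answers)            # now only "No" and "Yes"
summary(answers)           # counts for No and Yes, plus the NA
```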

Define Linear Models

library(gplots)

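mod2 <- lm(satlife ~ is.female, data = mini)
plotmeans(satlife ~ is.female, data = mini, xlab = "Female", ylab = "Satisfaction with Life")
# plotmeans() draws "No" at x = 1 and "Yes" at x = 2, while the model codes them 0 and 1,
# so shift the intercept back one unit to make the line pass through both group means
abline(a = coef(mod2)[1] - coef(mod2)[2], b = coef(mod2)[2], col = "red", lwd = 3)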

coef(mod2)

 (Intercept) is.femaleYes 
    5.296296     1.678704

summary(mod2)$r.squared

[1] 0.1262859

is.female is dummy-coded: 0 = No, 1 = Yes

When x is 0 (is.female = No), we are describing males

The expected life satisfaction for males is 5.30 (the intercept)

Going from male to female, expected life satisfaction increases by 1.68 (the slope), to about 6.98 for females
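With a two-level factor, `lm()` builds a single 0/1 dummy column (here `is.femaleYes`). The intercept is therefore the mean of the reference group, and the slope is the difference between the two group means. The sketch below checks this on hypothetical values, not the survey data.

```r
# Hypothetical data: a two-level factor and an outcome
grp <- factor(c("No", "No", "No", "Yes", "Yes", "Yes"))
sat <- c(4, 5, 6, 6, 7, 8)

fit <- lm(sat ~ grp)
group_means <- tapply(sat, grp, mean)   # mean outcome in each group

coef(fit)[["(Intercept)"]]              # equals the "No" group mean (5)
coef(fit)[["grpYes"]]                   # equals "Yes" mean minus "No" mean (7 - 5 = 2)
group_means
```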

Our model explains about 13% of the variation in life satisfaction (R² = 0.13)

Self-esteem predicts life satisfaction better: it has a higher R² (explaining about 50% of the variation), while gender explains only about 13%

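Four possible reasons for this relationship:

Chance: this sample showed the relationship by luck and may not represent the whole population
Causation: being female causes greater life satisfaction
Reverse causation: greater life satisfaction would have to make someone female, which is not plausible, so this explanation does not apply
Third variable: another variable related to both gender and life satisfaction could make them appear related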

Graph Categorical IV

plot(mini$caffeine)

Relevel the Data

mini$caffeine <- relevel(mini$caffeine, ref="Never")

"Never" (no caffeine) is the reference level, coded 0 on every dummy variable

mini$caffeine <- factor(mini$caffeine, levels=c("Never", "Rarely", "Sometimes", "Always"))  # order from least to most caffeine; "Never" (listed first) stays the reference
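`lm()` treats the first factor level as the baseline. Because `factor(..., levels = ...)` sets the full order, "Never" ends up first. The quick check below uses a hypothetical vector of responses. One caveat: any response not listed in `levels` would become `NA`.

```r
# Hypothetical caffeine responses (alphabetical order would put "Always" first)
caf <- factor(c("Sometimes", "Never", "Always", "Rarely", "Never"))
levels(caf)                       # alphabetical: "Always" "Never" "Rarely" "Sometimes"

caf <- factor(caf, levels = c("Never", "Rarely", "Sometimes", "Always"))
levels(caf)[1]                    # "Never" is now the reference level

# Dummy columns lm() would create: rows for "Never" are 0 on every dummy
model.matrix(~ caf)
```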

Plot

mod3 <- lm(satlife ~ caffeine, data = mini)
plotmeans(satlife ~ caffeine, data = mini, xlab = "Caffeine Use", ylab = "Satisfaction with Life")

Statistics

coef(mod3)

      (Intercept)    caffeineRarely caffeineSometimes    caffeineAlways 
       6.28571429       -0.07518797        0.24369748        0.42261905

summary(mod3)$r.squared

[1] 0.008563796

Interpretation (comparing each category, coded 1, to the baseline, coded 0)

When all three dummy variables are 0 (someone who never drinks caffeine), the expected life satisfaction is 6.29 (the intercept)
Compared to the baseline (someone who never drinks caffeine), someone who rarely drinks caffeine has an expected life satisfaction 0.08 lower
Compared to the baseline, someone who sometimes drinks caffeine has an expected life satisfaction 0.24 higher
Compared to the baseline, someone who always drinks caffeine has an expected life satisfaction 0.42 higher
Our model (caffeine as the predictor) explains less than 1% (about 0.9%) of the variation in life satisfaction
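Adding each dummy coefficient to the intercept gives the expected (mean) life satisfaction in each caffeine category. The sketch below uses the coefficients reported by `coef(mod3)` above. The four expected values come out at about 6.29, 6.21, 6.53 and 6.71, all within half a point of each other, which fits the very small R².

```r
# Coefficients copied from coef(mod3) above
b <- c(Intercept = 6.28571429, Rarely = -0.07518797,
       Sometimes = 0.24369748, Always = 0.42261905)

# "Never" is the baseline (all dummies 0); each other group adds its own coefficient
expected <- c(Never     = b[["Intercept"]],
              Rarely    = b[["Intercept"]] + b[["Rarely"]],
              Sometimes = b[["Intercept"]] + b[["Sometimes"]],
              Always    = b[["Intercept"]] + b[["Always"]])
round(expected, 2)
```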