---
title: "Basics of Correlation and Regression"
subtitle: UNG MATH 3350 (online)
author: Robb Sinn
date: September 2020
output: html_notebook
---

# <span style="color: blue;">I. Initialization Block</span>

<div style="float:right; margin: 8px; border:2px black solid; padding: 0px 10px 5px">
### <span style="color: red;">Initializing RStudio</span>
The primary data set is **Data3350**, which was produced in 2015 during an undergraduate research project about personality and humor. The **VarsData3350** PDF file describes each variable in the Data3350 file. Both are available for download in D2L. Be sure to put the Data3350 file in your R folder in Documents, and make sure your working directory is set to that same folder (Session menu). The code block below uses the **library** function to load the **mosaic** and **readxl** packages, then imports the data frames used in this module: **Data3350** and the U.S. wages data frames **Med**, **Perc** and **PercB**.

```{r}
library(mosaic)
library(readxl)
Data3350 = read_excel("Data3350.xlsx")
Med = read_excel("Med.xlsx")
Perc = read_excel("Perc.xlsx")
PercB = read_excel("PercB.xlsx")
```
</div>
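
Before running the import block, it can help to confirm that R is looking in the folder you expect. The following is a minimal sketch in base R, assuming the four workbook names used in the import block; the names `needed` and `found` are illustrative only and are not part of the course materials.

```r
# Workbook names taken from the import block; adjust if yours differ.
needed <- c("Data3350.xlsx", "Med.xlsx", "Perc.xlsx", "PercB.xlsx")

# Show the current working directory (change it from the Session menu).
print(getwd())

# Check which workbooks are visible from the working directory.
found <- file.exists(needed)
names(found) <- needed
print(found)

# Report anything missing instead of letting read_excel() fail later.
if (!all(found)) {
  message("Missing from working directory: ",
          paste(needed[!found], collapse = ", "))
}
```

If any entry prints `FALSE`, move that file into the working directory or reset the working directory before importing.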

# <span style="color: blue;">II. Exercises</span>

1. Use the Data3350 data frame to build and evaluate a linear model for narcissism (dependent variable) vs. thrill-seeking (independent variable) using the **Thrill** and **Narc** variables. Be sure to check the linearity and normality assumptions and analyze all regression statistics. Construct a confidence interval for the slope of the regression line using an appropriate method.

2. Use the Data3350 data frame to build and evaluate a linear model for neuroticism (dependent variable) vs. optimism (independent variable) using the **Neuro** and **Opt** variables. Be sure to check the linearity and normality assumptions and analyze all regression statistics. Construct a confidence interval for the slope of the regression line using an appropriate method.

3. Using the built-in Mosaic data set **Dimes**, test the significance of the correlation between **mass** and **year**. Would it be appropriate to build a linear model in this case? Why or why not?

4. Use the **Perc** data frame to create linear models for the 25th percentile wage earners from 2000 Q1 through 2019 Q4. Be sure to look carefully at all diagnostic plots. Create models for the Trump era and the last 3 years of the Obama era, and conduct hypothesis tests of whether the Trump era growth was significantly greater than the historic trend as well as greater than the Obama era growth.

5. Use the **Perc** data frame to create linear models for the 75th and 90th percentile wage earners from 2000 Q1 through 2019 Q4. Be sure to look carefully at all diagnostic plots. Create models for the Trump era and the last 3 years of the Obama era, and conduct hypothesis tests of whether the Trump era growth was significantly greater than the historic trend as well as greater than the Obama era growth.

6. The **PercB** data is similar to **Perc** in that it includes the 10th, 25th, 50th, 75th and 90th percentile wages, but **PercB** includes Black wage earners only. Use the **PercB** data frame to create a linear model for the median wage growth (50th percentile) from 2000 Q1 through 2019 Q4. Be sure to look carefully at all diagnostic plots. Create models for the Trump era and the last 3 years of the Obama era, and conduct hypothesis tests of whether the Trump era growth was significantly greater than the historic trend as well as greater than the Obama era growth.
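
Exercises 4 through 6 ask whether growth in one era exceeded growth in another. One possible way to frame that question, which is not necessarily the method intended for this course, is a one-sided test on the difference between two slopes fitted to non-overlapping periods. The sketch below uses simulated wages rather than the course workbooks. The data frames `era_a` and `era_b`, the column `Wage`, and the choice to treat the statistic as approximately t-distributed with n1 + n2 - 4 degrees of freedom are all assumptions made for illustration.

```r
set.seed(3350)

# Simulated quarterly wages for two 12-quarter eras (hypothetical data).
era_a <- data.frame(Period = 132:143)
era_a$Wage <- 800 + 2.0 * (era_a$Period - 132) + rnorm(12, sd = 3)
era_b <- data.frame(Period = 144:155)
era_b$Wage <- 825 + 4.0 * (era_b$Period - 144) + rnorm(12, sd = 3)

fit_a <- lm(Wage ~ Period, data = era_a)
fit_b <- lm(Wage ~ Period, data = era_b)

# Slope estimate and standard error from each coefficient table.
est_a <- coef(summary(fit_a))["Period", c("Estimate", "Std. Error")]
est_b <- coef(summary(fit_b))["Period", c("Estimate", "Std. Error")]

# H0: slope_b <= slope_a   versus   Ha: slope_b > slope_a
t_stat <- unname((est_b["Estimate"] - est_a["Estimate"]) /
  sqrt(est_a["Std. Error"]^2 + est_b["Std. Error"]^2))

# Approximate degrees of freedom: two parameters estimated per model.
df_total <- nrow(era_a) + nrow(era_b) - 4
p_value <- pt(t_stat, df = df_total, lower.tail = FALSE)

t_stat
p_value
```

A small `p_value` would support the claim that the later era's slope is larger. Comparing confidence intervals for the two slopes is another reasonable approach.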

# <span style="color: blue;">III. Code Blocks</span>

```{r}
xyplot(Anx ~ Opt, data = Data3350 , type = c("p","r"),
       main = "Optimism vs. Anxiety",
       xlab = "Optimism",
       ylab = "Anxiety")
```

```{r}
lm(Anx ~ Opt, data = Data3350)
```


```{r}
mod = lm(Anx ~ Opt, data = Data3350)
```


```{r}
summary(mod)
```


```{r}
histogram(~ resid(mod))
```


```{r}
qqmath( ~ resid(mod))
```


```{r}
mod1 = lm(SE ~ Opt, data = Data3350)
qqmath( ~ resid(mod1), type = c("p","r"))
```


```{r}
summary(mod1)
```



```{r}
xyplot(Earners ~ Period, data = Med, 
       xlab = "Quarters since 1981 Q1", 
       ylab = "Number of Wage Earners (in thousands)")
```


```{r}
# Keep periods up to 155, the last quarter used in the Trump era subset below
newMed = subset(Med, Period <= 155)
```

```{r}
xyplot( Median ~ Period, data = newMed , 
        type = c("p","r") ,
        xlab = "Quarters since 1981 Q1" , 
        ylab = "U.S. Median Wage")
```


```{r}
lm( Median ~ Period, data = newMed)
cor(Median ~ Period, data = Med)
cor(Median ~ Period, data = newMed)
```
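
The `cor` calls above report the size of a correlation but not whether it is statistically significant. A minimal sketch of a significance test follows, using base R's `cor.test` on simulated data. The vectors `period` and `wage` are hypothetical stand-ins, not columns from the course workbooks.

```r
set.seed(1981)

# Simulated series standing in for a wage variable (hypothetical data).
period <- 0:155
wage <- 300 + 2 * period + rnorm(length(period), sd = 10)

# Pearson correlation with a two-sided test of H0: rho = 0.
ct <- cor.test(period, wage)

ct$estimate   # sample correlation r
ct$p.value    # p-value for the test
ct$conf.int   # 95% confidence interval for rho
```

The same call pattern should work with a pair of columns from any data frame, for example `cor.test(df$x, df$y)` for a hypothetical data frame `df`.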


```{r}
mod = lm( Median ~ Period, data = newMed)
summary(mod)
```


```{r}
# Periods 144 to 155 are the twelve quarters from 2017 to 2019
TrumpMed = subset(Med, Period >= 144 & Period <= 155)
```


```{r}
xyplot(Median ~ Period, data = TrumpMed , 
        type = c("p","r") ,
        xlab = "All Quarters 2017 to 2019" ,
        ylab = "Trump Era Median Wage") 
```


```{r}
modT = lm(Median ~ Period, data = TrumpMed)
summary(modT)
```


```{r}
qqmath( ~ resid(modT), type = c("p","r"))
```


```{r}
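# Periods 112 to 143: all quarters from 2009 to 2016
ObamaMedTotal = subset(Med, Period >= 112 & Period <= 143)
# Periods 132 to 143: the last three years only, 2014 to 2016
ObamaMedL3 = subset(Med, Period >= 132 & Period <= 143)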
```


```{r}
xyplot(Median ~ Period, data = ObamaMedTotal , 
        type = c("p","r") ,
        xlab = "All Quarters 2009 to 2016" ,
        ylab = "Obama Era Median Wage") 
```


```{r}
xyplot(Median ~ Period, data = ObamaMedL3 , 
        type = c("p","r") ,
        xlab = "All Quarters 2014 to 2016" ,
        ylab = "Obama Era Median Wage") 
```


```{r}
modB = lm(Median ~ Period, data = ObamaMedTotal)
modBL3 = lm(Median ~ Period, data = ObamaMedL3)
summary(modB)
summary(modBL3)
```


```{r}
confint(modT)
```

```{r}
confint(modT, level = 0.90)
```


```{r}
confint(modB)
```


```{r}
confint(modBL3, level = 0.95)
```
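
The intervals reported for a linear model's slope are the usual t-based intervals: estimate plus or minus a critical t value times the standard error. As a sketch of that arithmetic, the block below rebuilds a 95% interval by hand on simulated data and compares it with `confint`. The data frame `sim` and the model `fit` are hypothetical and do not come from the **Med** workbook.

```r
set.seed(2020)

# Simulated 12-quarter series (hypothetical data, not the Med workbook).
sim <- data.frame(Period = 144:155)
sim$Median <- 850 + 3 * (sim$Period - 144) + rnorm(12, sd = 4)

fit <- lm(Median ~ Period, data = sim)

# Slope estimate and standard error from the coefficient table.
b  <- coef(summary(fit))["Period", "Estimate"]
se <- coef(summary(fit))["Period", "Std. Error"]

# Critical t value for a 95% interval with n - 2 degrees of freedom.
t_crit <- qt(0.975, df = df.residual(fit))

manual_ci <- c(lower = b - t_crit * se, upper = b + t_crit * se)
manual_ci
confint(fit, "Period", level = 0.95)  # should agree with manual_ci
```

With only 12 quarters the critical value is noticeably larger than 1.96, which is one reason short eras produce wide intervals.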


```{r}
qqmath( ~ resid(modB), type = c("p","r"))
```

```{r}
qqmath( ~ resid(modBL3), type = c("p","r"))
```



```{r}
# One bootstrap replicate: resample 10 rows with replacement and refit
coef(lm(Median ~ Period, data = resample(TrumpMed, size = 10)))
```


```{r}
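# Refit the model on 500 resamples of TrumpMed and collect the coefficients
bootstrap = do(500) * coef(lm(Median ~ Period, data = resample(TrumpMed)))
# Distribution of the bootstrapped slopes
densityplot(~ Period, data = bootstrap)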
```


```{r}
# Percentile bootstrap interval: the middle 95% of the resampled slopes
qdata(~ Period, p = c(0.025, 0.975), data = bootstrap)
```
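
To see what the resampling steps above are doing, here is a rough base R equivalent on simulated data. It is a sketch of the percentile bootstrap idea under stated assumptions, not a reproduction of the mosaic internals. The data frame `sim` and the helper `boot_slope` are hypothetical names introduced for illustration.

```r
set.seed(500)

# Simulated 12-quarter series (hypothetical data, not the Med workbook).
sim <- data.frame(Period = 144:155)
sim$Median <- 850 + 3 * (sim$Period - 144) + rnorm(12, sd = 4)

# One bootstrap replicate: sample rows with replacement, refit, keep the slope.
boot_slope <- function(d) {
  rows <- sample(nrow(d), replace = TRUE)
  unname(coef(lm(Median ~ Period, data = d[rows, ]))["Period"])
}

# Repeat 500 times to approximate the sampling distribution of the slope.
slopes <- replicate(500, boot_slope(sim))

# Percentile interval: the middle 95% of the bootstrap slopes.
boot_ci <- quantile(slopes, probs = c(0.025, 0.975), na.rm = TRUE)
boot_ci
```

Because the resamples are random, the endpoints shift slightly from run to run unless a seed is set.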

