Welcome to homework 2! This homework assignment will predominantly involve plotting and graphing using the package ‘ggplot2.’ If you have not done so already, make sure this package is installed so you can load it successfully below!

Load in Packages
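
As a minimal sketch of this step: the chunk below installs ggplot2 only when it is missing and then loads it. The install guard is one common pattern, not a requirement of the assignment.

```r
# Install ggplot2 only if it is not already available on this machine.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Load the package so that ggplot(), aes(), geom_*() etc. are available.
library(ggplot2)
```

If `library(ggplot2)` runs without an error message, the package loaded successfully.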


Load in Data


For this assignment, use the ‘HW2data.csv’ file on Canvas to answer the following six questions. This dataset consists of 12 variables, two of which are categorical and must be identified as such in R (see the key below). Please make sure you answer each question thoroughly!

Key:

gender (1=male; 2=female)

marital (0 = single; 1 = married)

Load in data here:
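
A hedged sketch of reading the file, assuming ‘HW2data.csv’ has been downloaded from Canvas into your working directory. The object name `HW2` and the variable `data_path` are illustrative choices, not names required by the assignment.

```r
# Assumption: the CSV sits in the current working directory (see getwd()).
data_path <- "HW2data.csv"

if (file.exists(data_path)) {
  HW2 <- read.csv(data_path)  # 'HW2' is an illustrative object name
  print(dim(HW2))             # number of rows and columns
  print(head(HW2))            # first six rows
} else {
  message("File not found in ", getwd(), "; check your working directory.")
}
```

If the file is not found, `setwd()` or the RStudio Files pane can be used to point R at the folder where the CSV was saved.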



Homework Questions (10 points total)


1. Check if there are any missing values in the data. If so, listwise delete all missing values and create a new dataset called ‘HW2.Clean’ (1pt).

2. Check the structure of the data. Are variables that should be categorical read as factors? If not, change them in the original dataset to be a factor with appropriate labels. (1pt)

3. Describe the characteristics of depression using a histogram created with ggplot2 (1pt).

4. Now, using the same histogram you just created, separate ‘depression’ by ‘gender’ in the same histogram plot. Make sure to add a title and legend to the histogram (2pts).

5. Create a bar plot using the ‘marital’ variable in the dataset. Make sure to include a title and label your x and y axes. (2pts)

6. Create a scatterplot of the variables “age” and “tenure” using ggplot2 (1pt). Make sure to add a ‘smoothing’ fit line to your plot with standard errors, as well as a title and labels for the x and y axes (1pt). Finally, describe the relationship between “age” and “tenure” based on the plot (1pt).

Reminders:


You can write your responses directly into the Rmarkdown document, along with your code.

  • Code will go in the ‘gray boxes,’ whereas your interpretation or answer to the question should come after!
  • You will need to turn in both the knitted HTML file and the source R Markdown file that ends in ‘.Rmd’ (both should be saved to your working directory).



Start Actual Coding Below!!!

Question 1: Check if there are any missing values in the data. If so, listwise delete all missing values and create a new dataset called ‘HW2.Clean’
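
To illustrate the technique only, here is a sketch on a small made-up data frame (`toy` and its values are hypothetical and stand in for the homework data). It counts missing values and then applies listwise deletion with `na.omit()`.

```r
# Toy data standing in for the homework file (values are made up).
toy <- data.frame(
  age        = c(25, 31, NA, 44),
  depression = c(10, NA, 7, 12)
)

# Count missing values per column, then overall.
colSums(is.na(toy))
sum(is.na(toy))

# Listwise deletion: keep only rows with no missing values in any column.
toy_clean <- na.omit(toy)
nrow(toy_clean)
```

The same two steps apply to the real data, with the cleaned result stored under the name the question asks for.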




Question 2: Check the structure of the data. Are variables that should be categorical correctly read as factors? If not, change them in the dataset to be a ‘factor’ with appropriate labels.

Check structure of data

Change levels of one variable here

Change levels of the other variable here

Check to make sure you changed the structure of the variables correctly
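
The steps above can be sketched on a made-up data frame (`toy` is hypothetical). The labels follow the key given earlier: gender is coded 1 = male, 2 = female, and marital is coded 0 = single, 1 = married.

```r
# Toy data: numeric codes as they might appear after read.csv().
toy <- data.frame(
  gender  = c(1, 2, 2, 1),
  marital = c(0, 1, 1, 0)
)

# Check structure: both columns are numeric at this point.
str(toy)

# Convert each coded variable to a factor with descriptive labels.
toy$gender  <- factor(toy$gender,  levels = c(1, 2), labels = c("male", "female"))
toy$marital <- factor(toy$marital, levels = c(0, 1), labels = c("single", "married"))

# Check again: both columns should now be factors with two levels.
str(toy)
```

Listing `levels` explicitly keeps the mapping between codes and labels visible, which makes a mislabeled category easier to spot.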




Question 3: Describe the characteristics of depression using a histogram created with ggplot2
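
A minimal sketch of a ggplot2 histogram, using simulated scores in a hypothetical data frame `toy`. The number of bins and the colors are arbitrary choices for illustration.

```r
library(ggplot2)

# Simulated scores standing in for the real 'depression' variable.
set.seed(1)
toy <- data.frame(depression = rnorm(100, mean = 20, sd = 5))

# Histogram of a single numeric variable.
p <- ggplot(toy, aes(x = depression)) +
  geom_histogram(bins = 20, fill = "grey60", color = "black")
p

# Numeric summary to support the written description (center, spread, range).
summary(toy$depression)
```

When describing a histogram, it usually helps to comment on the shape (symmetric or skewed), the center, the spread, and any unusual values.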




Question 4: Using the same histogram you just created, separate ‘depression’ by ‘gender’ in the same histogram plot. Make sure to add a title and legend to the histogram
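
One possible way to split a histogram by group is to map the grouping factor to `fill`, which also produces the legend. The sketch below uses simulated data in a hypothetical data frame `toy`; overlapping semi-transparent bars are just one layout option.

```r
library(ggplot2)

# Simulated scores for two groups (made-up values).
set.seed(2)
toy <- data.frame(
  depression = c(rnorm(50, mean = 18, sd = 4), rnorm(50, mean = 22, sd = 4)),
  gender     = factor(rep(c("male", "female"), each = 50))
)

# Mapping 'gender' to fill colors the bars by group and adds a legend.
p <- ggplot(toy, aes(x = depression, fill = gender)) +
  geom_histogram(bins = 20, alpha = 0.5, position = "identity") +
  labs(
    title = "Depression scores by gender (toy data)",
    x     = "Depression",
    y     = "Count",
    fill  = "Gender"
  )
p
```

The grouping variable needs to be a factor for this to work as intended, which is why the conversion in Question 2 comes first.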




Question 5: Create a bar plot using the ‘marital’ variable in the dataset. Make sure to include a title and label your x and y axes.
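
A hedged sketch of a bar plot for a categorical variable, again on a made-up data frame `toy`. `geom_bar()` counts the rows in each category, so no y variable is supplied.

```r
library(ggplot2)

# Toy factor using the coding from the key (0 = single, 1 = married).
toy <- data.frame(
  marital = factor(c(0, 1, 1, 0, 1), levels = c(0, 1),
                   labels = c("single", "married"))
)

# Bar heights are the number of rows in each category.
p <- ggplot(toy, aes(x = marital)) +
  geom_bar() +
  labs(
    title = "Marital status (toy data)",
    x     = "Marital status",
    y     = "Number of participants"
  )
p

# The counts behind the bars.
table(toy$marital)
```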




Question 6: Create a scatterplot of the variables “age” and “tenure.” Add a ‘smoothing’ fit line to your plot with standard errors, as well as a title and labels for the x and y axes. Finally, describe in your own words the relationship between “age” and “tenure” based on the plot (3pts total)
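
A sketch of a scatterplot with a fit line and its standard-error band, using simulated values in a hypothetical data frame `toy`. A linear fit (`method = "lm"`) is assumed here for illustration; leaving `method` unset lets ggplot2 choose a default smoother instead.

```r
library(ggplot2)

# Simulated data: tenure loosely increases with age (made-up values).
set.seed(3)
age <- round(runif(80, min = 22, max = 65))
toy <- data.frame(
  age    = age,
  tenure = pmax(0, 0.4 * (age - 22) + rnorm(80, sd = 3))
)

# Points plus a fitted line; se = TRUE draws the shaded standard-error band.
p <- ggplot(toy, aes(x = age, y = tenure)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(
    title = "Age and tenure (toy data)",
    x     = "Age (years)",
    y     = "Tenure (years)"
  )
p

# A correlation can back up the verbal description of direction and strength.
cor(toy$age, toy$tenure)
```

A written description would typically cover the direction of the relationship (positive or negative), its approximate strength, and whether a straight line seems to fit the points well.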


Description of Scatterplot [[insert your explanation anywhere below here]]



Again, don’t forget to ‘knit’ your current file, and turn in BOTH the .html and .Rmd files on Canvas!


 




A work by Your Name Here