Homework 7 problems

My Turn 1

Read in the diabetes data.

library(tidyverse)  # loads readr (read_csv), ggplot2, dplyr, and tidyr

diabetes <- read_csv("~/Desktop/Rstudio work/project 7/data/diabetes.csv")
## Rows: 403 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): location, gender, frame
## dbl (16): id, chol, stab.glu, hdl, ratio, glyhb, age, height, weight, bp.1s,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
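
The message above names two ways to quiet it. Here is a minimal sketch of both, using a small temporary CSV as a stand-in for diabetes.csv (the file and its three columns are made up for illustration):

```r
library(readr)

# Hypothetical stand-in for diabetes.csv, written to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,gender,chol", "1,female,203", "2,male,165", "3,female,228"), tmp)

# Option 1: keep the guessed column types but suppress the message
quiet <- read_csv(tmp, show_col_types = FALSE)

# Option 2: declare the column types up front, which also silences the message
typed <- read_csv(
  tmp,
  col_types = cols(id = col_double(), gender = col_character(), chol = col_double())
)
```

Declaring the types is more typing, but it makes the read fail loudly if a column ever changes type.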

Run the code on the slide to make a graph. Pay strict attention to spelling, capitalization, and parentheses!

ggplot(data = diabetes, mapping = aes(x = weight, y = hip)) +
  geom_point()

My Turn 2

Add color, size, alpha, and shape aesthetics to your graph using the gender variable. Experiment.

ggplot(
  data = diabetes,
  mapping = aes(x = weight, y = hip, 
                color = gender, 
                size = gender, 
                alpha = gender, 
                shape = gender)
) +
  geom_point()

Try moving the mapping argument to geom_point(). Add in any aesthetics you found helpful.

ggplot(data = diabetes, aes(x = weight, y = hip)) +
  geom_point(aes(color = gender, shape = gender), alpha = 0.8, size = 2.75) +
  scale_color_manual(values = c("female" = "pink", "male" = "#87CEEB")) +
  theme_minimal()
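
The plot above mixes two kinds of aesthetics: ones mapped to a variable (inside aes()) and ones set to a fixed value (outside aes()). A small sketch of the difference, using the mpg data that ships with ggplot2 in place of diabetes:

```r
library(ggplot2)

# Mapped: color depends on a variable, so it goes inside aes() and gets a legend
mapped <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = drv))

# Set: alpha and size are fixed values, so they go outside aes()
fixed <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = drv), alpha = 0.8, size = 2.75)

# layer_data() shows the values ggplot2 computed for each point
unique(layer_data(fixed)$alpha)
```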

My Turn 3

Replace this scatterplot with one that draws boxplots. Try typing “geom_” and hitting the tab button to see the functions that start with “geom_”. Try to guess which one to use. If you need help, there’s a copy of the cheatsheet in this folder.

ggplot(diabetes, aes(gender, chol)) +
  geom_boxplot() +
  theme_minimal()

My Turn 4

Make a histogram of the glyhb variable in diabetes.

ggplot(data = diabetes, aes(x = glyhb)) +
  geom_histogram(binwidth = .5, fill = "skyblue", color = "black", alpha = 0.8) +
  labs(title = "Histogram of Glyhb Levels", x = "Glyhb", y = "Frequency") +
  theme_minimal()

Redo the glyhb plot as a density plot.

ggplot(data = diabetes, aes(x = glyhb)) +
  geom_density(fill = "skyblue", color = "black", alpha = 0.8) +
  labs(title = "Density Plot of Glyhb Levels", x = "Glyhb", y = "Density") +
  theme_minimal()
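
To compare the two views directly, the histogram can be put on the density scale and the density curve drawn over it. This is a sketch only: it uses the built-in faithful data in place of glyhb, and the binwidth of 5 is an arbitrary choice.

```r
library(ggplot2)

# Built-in Old Faithful data stands in for the glyhb column
p <- ggplot(faithful, aes(x = waiting)) +
  # after_stat(density) rescales bar heights so the total bar area is 1
  geom_histogram(aes(y = after_stat(density)), binwidth = 5,
                 fill = "skyblue", color = "black", alpha = 0.8) +
  geom_density() +
  theme_minimal()

# Check the rescaling: bar height times bar width, summed over all bars
bars <- layer_data(p, 1)
sum(bars$density * (bars$xmax - bars$xmin))
```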


My Turn 5

Make a bar chart of frame colored by gender. Then, try it with the fill aesthetic instead of color.

diabetes %>% 
  drop_na(frame) %>%  # a bare drop_na() would drop rows with an NA in any column
  ggplot(aes(x = frame, color = gender)) +
  geom_bar() +
  labs(title = "Bar Chart of Frame by Gender", x = "Frame", y = "Count") +
  theme_minimal()

diabetes %>% 
  drop_na(frame) %>%  # a bare drop_na() would drop rows with an NA in any column
  ggplot(aes(x = frame, fill = gender)) +
  geom_bar(alpha = 0.7) +
  labs(title = "Bar Chart of Frame by Gender", x = "Frame", y = "Count") +
  theme_minimal()
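
One thing to watch with drop_na(): called with no arguments, it removes every row that has a missing value in any column, not only in the columns being plotted. A small sketch with a made-up tibble (the column named other is hypothetical):

```r
library(tidyr)
library(tibble)

toy <- tibble(
  frame  = c("small", "medium", NA, "large"),
  gender = c("female", "male", "male", "female"),
  other  = c(NA, NA, 120, 135)   # a sparsely recorded column, not used in the plot
)

nrow(drop_na(toy))         # 1 row: only fully complete rows are kept
nrow(drop_na(toy, frame))  # 3 rows: only a missing frame drops a row
```

Naming the column keeps the counts in the bar chart from shrinking because of gaps elsewhere in the data.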

My Turn 6

Take your code for the bar chart above (using the fill aesthetic). Experiment with different position values: “dodge”, “fill”, “stack”

diabetes %>% 
  drop_na(frame) %>%  # a bare drop_na() would drop rows with an NA in any column
  ggplot(aes(x = frame, fill = gender)) +
  geom_bar(position = "dodge", alpha = 0.8) +
  labs(title = "Bar Chart of Frame by Gender", x = "Frame", y = "Count") +
  scale_fill_manual(values = c("female" = "pink", "male" = "#87CEEB")) +
  theme_minimal()
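
The three position values can be compared side by side. A sketch using mpg from ggplot2 instead of diabetes, with class and drv standing in for frame and gender:

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = class, fill = drv))

stacked <- base + geom_bar(position = "stack")   # default: counts stacked
dodged  <- base + geom_bar(position = "dodge")   # counts side by side
filled  <- base + geom_bar(position = "fill") +  # each bar rescaled to height 1
  labs(y = "Proportion")

# With "fill", the tallest stacked segment tops out at 1
max(layer_data(filled)$ymax)
```

"fill" is the one to reach for when the question is about shares within each group rather than raw counts.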

My Turn 7

Run the code after every change you make.

  1. Predict what this code will do. Then run it.
  2. Add a linetype aesthetic for gender. Run it again.
  3. Set the color of geom_smooth() to “black”.
  4. Add se = FALSE to geom_smooth().
  5. It’s hard to see the lines well now. How about setting alpha = .2 in geom_point()?
  6. Jitter the points. You can either change the geom or change the position argument.
  7. Add another layer, theme_bw(). Remember to use +.

ggplot(diabetes, aes(weight, hip)) +
  # alpha is a fixed value, so it is set outside aes(); inside aes() it would be
  # treated as a mapping and add a spurious legend
  geom_jitter(alpha = .2) +
  geom_smooth(aes(linetype = gender), color = "black", se = FALSE) +
  labs(title = "Jitter Plot of Weight vs Hip with Gender-Specific Smooth Lines",
       x = "Weight",
       y = "Hip") +
  theme_bw()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
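
Step 6 offers two routes to jittering. A sketch showing that they produce the same layer, using mpg from ggplot2 and a fixed seed (an addition for reproducibility, not part of the exercise) so the random offsets match:

```r
library(ggplot2)

# Route 1: swap the geom
p_geom <- ggplot(mpg, aes(cty, hwy)) +
  geom_jitter(position = position_jitter(seed = 1), alpha = 0.2)

# Route 2: keep geom_point() and change its position argument
p_pos <- ggplot(mpg, aes(cty, hwy)) +
  geom_point(position = position_jitter(seed = 1), alpha = 0.2)

# Same seed, same data: the jittered coordinates agree
all.equal(layer_data(p_geom)$x, layer_data(p_pos)$x)
```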

My Turn 8

Use a facet grid by gender and location.

ggplot(diabetes, aes(weight, hip)) +
  geom_point() +
  geom_smooth() +
  facet_grid(gender ~ location) +
  labs(title = "Scatter Plot of Weight vs Hip with Smooth Line, Faceted by Gender and Location", 
       x = "Weight", 
       y = "Hip")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
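
For comparison, here is a sketch of facet_grid() next to facet_wrap(), using mpg from ggplot2 with drv, year, and class standing in for gender and location:

```r
library(ggplot2)

base <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Grid: rows by drv, columns by year; every combination gets its own cell
grid <- base + facet_grid(drv ~ year)

# Wrap: one variable, with panels flowing into as many rows as needed
wrapped <- base + facet_wrap(~ class, ncol = 4)

# Count the panels that actually contain points
length(unique(layer_data(grid)$PANEL))
length(unique(layer_data(wrapped)$PANEL))
```

A grid suits two crossed variables; a wrap suits one variable with many levels.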

My Turn 9: Challenge!

  1. Load the datasauRus package. This package includes a data set called datasaurus_dozen.
  2. Use dplyr to summarize the correlation between x and y. First, group it by dataset, and then summarize with the cor() function. Call the new variable corr. What’s it look like?
  3. Mutate corr. Round it to 2 digits. Then, mutate it again (or wrap it around your first change) using: paste("corr:", corr)
  4. Save the summary data frame as corrs.

library(datasauRus)

corrs <- datasaurus_dozen %>%
  group_by(dataset) %>%
  summarize(corr = cor(x, y)) %>%
  mutate(
    corr = round(corr, 2), # Round the correlation to 2 digits
    corr = paste("corr:", corr) # Add "corr:" before the rounded value
  ) %>%
  print()
## # A tibble: 13 × 2
##    dataset    corr       
##    <chr>      <chr>      
##  1 away       corr: -0.06
##  2 bullseye   corr: -0.07
##  3 circle     corr: -0.07
##  4 dino       corr: -0.06
##  5 dots       corr: -0.06
##  6 h_lines    corr: -0.06
##  7 high_lines corr: -0.07
##  8 slant_down corr: -0.07
##  9 slant_up   corr: -0.07
## 10 star       corr: -0.06
## 11 v_lines    corr: -0.07
## 12 wide_lines corr: -0.07
## 13 x_shape    corr: -0.07
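
Step 3 mentions wrapping the rounding inside paste() instead of mutating twice. A sketch of that variant on a made-up two-group data set, where the correlations are known in advance (the names toy and toy_corrs are illustrative only):

```r
library(dplyr)
library(tibble)

toy <- tibble(
  dataset = rep(c("down", "up"), each = 5),
  x = rep(1:5, times = 2),
  y = c(5:1, 1:5)   # "down" falls as x rises, "up" rises with x
)

toy_corrs <- toy %>%
  group_by(dataset) %>%
  summarize(corr = cor(x, y)) %>%
  # Rounding and labelling wrapped into a single expression
  mutate(corr = paste("corr:", round(corr, 2)))

toy_corrs
```

Note that after paste() the corr column is character, so any numeric work has to happen before this step.
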

  1. Pass datasaurus_dozen to ggplot() and add a point geom.
  2. Use a facet (wrap) for dataset.
  3. Add a text geom. For this geom, set data = corrs. You also need to use aes() in this call to set label = corr, x = 50, and y = 110.

datasaurus_dozen |>
  ggplot(aes(x, y)) +
  geom_point() +
  facet_wrap(~ dataset) +
  geom_text(data = corrs, aes(label = corr, x = 50, y = 110), color = "red") +
  labs(title = "Scatter Plot of x vs y by Dataset with Correlation Text", x = "x", y = "y")
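
The text layer lands in the right panel because the label data frame carries the faceting column. A minimal sketch of that mechanism on made-up data (the points and labels data frames are hypothetical):

```r
library(ggplot2)

points <- data.frame(
  dataset = rep(c("a", "b"), each = 3),
  x = c(1, 2, 3, 1, 2, 3),
  y = c(1, 2, 3, 3, 2, 1)
)

# One label row per facet; the dataset column routes each label to its panel
labels <- data.frame(dataset = c("a", "b"), label = c("corr: 1", "corr: -1"))

p <- ggplot(points, aes(x, y)) +
  geom_point() +
  facet_wrap(~ dataset) +
  geom_text(data = labels, aes(label = label, x = 2, y = 3.5), color = "red")

# Each label ends up in a different panel
layer_data(p, 2)[, c("PANEL", "label")]
```

If the label data frame lacked the dataset column, every label would be drawn in every panel.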

My Turn 10

  1. Change the color scale by adding a scale layer. Experiment with scale_color_distiller() and scale_color_viridis_c(). Check the help pages for different palette options.
  2. Set the color aesthetic to gender. Try scale_color_brewer().
  3. Set the colors manually with scale_color_manual(). Use values = c("#E69F00", "#56B4E9") in the function call.
  4. Change the legend title for the color legend. Use the name argument in whatever scale function you’re using.

diabetes %>%
  ggplot(aes(waist, hip, color = weight)) +
  geom_point() +
  scale_color_viridis_c() +
  labs(title = "Scatter Plot of Waist vs Hip (Viridis Color Scale)") +
  theme_minimal()

diabetes %>% 
  ggplot(aes(waist, hip, color = weight)) + 
  geom_point() +
  scale_color_distiller(palette = "Spectral") + 
  labs(title = "Scatter Plot of Waist vs Hip (Distiller Color Scale)") +
  theme_minimal()

diabetes %>% 
  ggplot(aes(waist, hip, color = gender)) + 
  geom_point() +
  scale_color_brewer(palette = "Set1") +  # Use 'Set1' color palette for gender
  labs(title = "Scatter Plot of Waist vs Hip (Gender Color Scale)") +
  theme_minimal()

diabetes %>% 
  ggplot(aes(waist, hip, color = gender)) + 
  geom_point() +
  scale_color_manual(values = c("#E69F00", "#56B4E9"), name = "Gender") + 
  labs(title = "Scatter Plot of Waist vs Hip with Custom Legend Title") +
  theme_minimal()
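
With an unnamed values vector, colors are assigned in level order. Naming the vector pins each color to a level instead. A sketch using mpg from ggplot2, with drv standing in for gender (the third color, "#009E73", is an extra choice for the third level):

```r
library(ggplot2)

# Names pin each color to a level regardless of level order
drive_colors <- c("4" = "#E69F00", "f" = "#56B4E9", "r" = "#009E73")

p <- ggplot(mpg, aes(displ, hwy, color = drv)) +
  geom_point() +
  scale_color_manual(values = drive_colors, name = "Drive train")

# Count the points drawn in the color assigned to front-wheel drive
sum(layer_data(p)$colour == "#56B4E9")
```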

My Turn 11

  1. Change the theme using one of the built-in theme functions.
  2. Use theme() to change the legend to the bottom with legend.position = "bottom".
  3. Remove the axis ticks by setting the axis.ticks argument to element_blank()
  4. Change the font size for the axis titles. Use element_text(). Check the help page if you don’t know what option to change.

diabetes %>%
  ggplot(aes(waist, hip, color = weight)) +
  geom_point() +
  theme_minimal() +  # step 1: built-in theme, added before theme() so the tweaks below are kept
  theme(
    legend.position = "bottom", 
    axis.ticks = element_blank(),
    axis.title.x = element_text(size = 20), 
    axis.title.y = element_text(size = 20)
  )
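
Order matters when combining a built-in theme with theme() tweaks: a complete theme added later replaces whatever came before it. A short sketch that works on theme objects directly, with no plot needed:

```r
library(ggplot2)

# Tweak after the complete theme: the tweak survives
kept <- theme_minimal() + theme(legend.position = "bottom")

# Complete theme added last: it replaces the earlier tweak
lost <- theme(legend.position = "bottom") + theme_minimal()

kept$legend.position
lost$legend.position
```

So a built-in theme such as theme_minimal() generally belongs before any theme() call in the chain.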

My Turn 12

Take a look at the diamonds data set from ggplot2. How many rows does it have?

str(diamonds)
## tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
##  $ carat  : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
##  $ depth  : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ table  : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
##  $ price  : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
##  $ x      : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ y      : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ z      : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...

  1. Make a scatterplot of carat vs. price. How’s it look?
  2. Try adjusting the transparency.
  3. Replace geom_point() with 2d bins.
  4. Try hex bins.

# Basic scatterplot
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  labs(title = "Scatterplot of Carat vs. Price", x = "Carat", y = "Price")

# Scatterplot with transparency
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.3) + 
  labs(title = "Scatterplot of Carat vs. Price with Transparency", x = "Carat", y = "Price")

# 2D bins for carat vs. price
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_bin2d() +
  labs(title = "2D Binned Plot of Carat vs. Price", x = "Carat", y = "Price")

# Hexagonal binning for carat vs. price
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_hex() +
  labs(title = "Hexagonal Binned Plot of Carat vs. Price", x = "Carat", y = "Price")
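
Both binned geoms take a bins argument that controls how coarse the picture is. A sketch of how it can be varied (the specific bin counts are arbitrary choices); geom_hex() depends on the hexbin package, so the sketch checks for it first:

```r
library(ggplot2)

base <- ggplot(diamonds, aes(x = carat, y = price))

coarse <- base + geom_bin2d(bins = 20)   # fewer, larger rectangles
fine   <- base + geom_bin2d(bins = 60)   # more, smaller rectangles

# geom_hex() relies on the hexbin package, so check before using it
if (requireNamespace("hexbin", quietly = TRUE)) {
  hexes <- base + geom_hex(bins = 40)
}

# Every diamond falls into exactly one bin, so the counts add up to the row count
sum(layer_data(coarse)$count) == nrow(diamonds)
```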

My Turn 13

  1. Add a title.
  2. Change the x and y axis labels to include the units (inches for hip and pounds for weight). You can use either labs() or xlab() and ylab()
  3. Add scale_linetype() and set the name argument to “Sex”.

HW7plot <- ggplot(diabetes, aes(weight, hip, linetype = gender)) +
  geom_jitter(alpha = .2, size = 2.5) +
  geom_smooth(color = "black", se = FALSE) +
  labs(
    title = "Scatterplot of Weight vs Hip Size by Gender",
    x = "Weight (pounds)",
    y = "Hip (inches)"
  ) +
  scale_linetype(name = "Sex") +
  theme_bw(base_size = 12)

print(HW7plot)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

My Turn 14

Save the last plot and then locate it in the files pane.

ggsave("~/Desktop/Rstudio work/project 7/HW7plot.png", plot = HW7plot)
## Saving 7 x 5 in image
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
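
The "Saving 7 x 5 in image" message means ggsave() fell back to the size of the current graphics device. A sketch of setting the size explicitly so the file comes out the same on any machine, using a throwaway plot and a temporary file (both hypothetical stand-ins for HW7plot and its path):

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) + geom_point()

out_file <- file.path(tempdir(), "example-plot.png")  # hypothetical output path
ggsave(out_file, plot = p, width = 7, height = 5, units = "in", dpi = 300)

file.exists(out_file)
```
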

datasaurus_dozen |>
  filter(dataset == "dino") |>
  ggplot(aes(x = x, y = y)) +  # no color mapping needed: the fixed color below would override it
  geom_point(size = 7, shape = 18, color = "#228B22") +
  theme_minimal() +
  theme(
    axis.text = element_blank(),
    axis.ticks = element_blank(),
    axis.title = element_blank(),
    panel.grid = element_blank()
  )