Read in the diabetes data.
library(tidyverse) # loads readr, dplyr, tidyr, and ggplot2
diabetes <- read_csv("~/Desktop/Rstudio work/project 7/data/diabetes.csv") # absolute path, so here() is not needed
## Rows: 403 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): location, gender, frame
## dbl (16): id, chol, stab.glu, hdl, ratio, glyhb, age, height, weight, bp.1s,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Run the code on the slide to make a graph. Pay strict attention to spelling, capitalization, and parentheses!
ggplot(data = diabetes, mapping = aes(x = weight, y = hip)) +
  geom_point()
Add color, size, alpha, and
shape aesthetics to your graph using the
gender variable. Experiment.
ggplot(
  data = diabetes,
  mapping = aes(
    x = weight, y = hip,
    color = gender,
    size = gender,
    alpha = gender,
    shape = gender
  )
) +
  geom_point()
Try moving the mapping argument to geom_point(). Add in
any aesthetics you found helpful.
ggplot(data = diabetes, aes(x = weight, y = hip)) +
  geom_point(aes(color = gender, shape = gender), alpha = 0.8, size = 2.75) +
  scale_color_manual(values = c("female" = "pink", "male" = "#87CEEB")) +
  theme_minimal()
Replace this scatterplot with one that draws boxplots. Try typing “geom_” and hitting the tab button to see the functions that start with “geom_”. Try to guess which one to use. If you need help, there’s a copy of the cheatsheet in this folder.
ggplot(diabetes, aes(gender, chol)) +
  geom_boxplot() +
  theme_minimal()
Make a histogram of the glyhb variable in
diabetes.
ggplot(data = diabetes, aes(x = glyhb)) +
  geom_histogram(binwidth = .5, fill = "skyblue", color = "black", alpha = 0.8) +
  labs(title = "Histogram of Glyhb Levels", x = "Glyhb", y = "Frequency") +
  theme_minimal()
Redo the glyhb plot as a density plot.
ggplot(data = diabetes, aes(x = glyhb)) +
  geom_density(fill = "skyblue", color = "black", alpha = 0.8) +
  labs(title = "Density Plot of Glyhb Levels", x = "Glyhb", y = "Density") +
  theme_minimal()
Make a bar chart of frame colored by
gender. Then, try it with the fill aesthetic
instead of color.
diabetes %>%
  drop_na(frame) %>% # drop only rows missing frame; a bare drop_na() discards any row with an NA in any column
  ggplot(aes(x = frame, color = gender)) +
  geom_bar() +
  labs(title = "Bar Chart of Frame by Gender", x = "Frame", y = "Count") +
  theme_minimal()
diabetes %>%
  drop_na(frame) %>% # drop only rows missing frame; a bare drop_na() discards any row with an NA in any column
  ggplot(aes(x = frame, fill = gender)) +
  geom_bar(alpha = 0.7) +
  labs(title = "Bar Chart of Frame by Gender", x = "Frame", y = "Count") +
  theme_minimal()
Take your code for the bar chart above (using the fill
aesthetic). Experiment with different position values:
“dodge”, “fill”, “stack”
diabetes %>%
  drop_na(frame) %>% # drop only rows missing frame; a bare drop_na() discards any row with an NA in any column
  ggplot(aes(x = frame, fill = gender)) +
  geom_bar(position = "dodge", alpha = 0.8) +
  labs(title = "Bar Chart of Frame by Gender", x = "Frame", y = "Count") +
  scale_fill_manual(values = c("female" = "pink", "male" = "#87CEEB")) +
  theme_minimal()
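Only "dodge" is drawn in this solution. As a rough sketch of how the three position values differ, the block below builds the same bar chart once per value. It uses the built-in diamonds data so that it runs without the CSV, and `bar_with_position()` is a hypothetical helper name, not a ggplot2 function. Swapping in diabetes, frame, and gender should give the homework version.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): same bar chart, different position.
bar_with_position <- function(position) {
  ggplot(diamonds, aes(x = cut, fill = clarity)) +
    geom_bar(position = position) +
    labs(title = paste("position =", position), x = "Cut") +
    theme_minimal()
}

p_stack <- bar_with_position("stack") # default: segments stacked, bar height is the total count
p_dodge <- bar_with_position("dodge") # segments side by side, heights are the group counts
p_fill  <- bar_with_position("fill")  # stacked and rescaled to 1, so y reads as a proportion

print(p_stack)
print(p_dodge)
print(p_fill)
```

With "fill" the y axis is still labelled "count" by default, so a `labs(y = "Proportion")` is probably worth adding when using it.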
Run the code after every change you make.
1. Add a linetype aesthetic for gender. Run it again.
2. Set the color of geom_smooth() to “black”.
3. Add se = FALSE to the geom_smooth().
4. What about alpha = .2 in geom_point()?
5. Jitter the points, either by changing the geom or by using the position argument.
6. Add theme_bw(). Remember to use +.
ggplot(diabetes, aes(weight, hip)) +
  geom_jitter(alpha = .2) + # set outside aes(): a constant, not a mapped variable
  geom_smooth(aes(linetype = gender), color = "black", se = FALSE) +
  labs(
    title = "Jitter Plot of Weight vs Hip with Gender-Specific Smooth Lines",
    x = "Weight",
    y = "Hip"
  ) +
  theme_bw()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
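A constant such as alpha = .2 behaves differently inside and outside aes(). The sketch below contrasts the two on the built-in mpg data (used here only so the block runs without the CSV). Inside aes(), ggplot2 treats the value as data, builds an alpha scale and legend for it, and the points are not guaranteed to be drawn at 20% opacity. Outside aes(), the value is applied directly to the layer.

```r
library(ggplot2)

# Mapped: aes() treats .2 as a data value, so an alpha scale and legend are created.
p_mapped <- ggplot(mpg, aes(displ, hwy)) +
  geom_jitter(aes(alpha = .2))

# Set: outside aes(), alpha = .2 is a fixed property of the layer.
p_set <- ggplot(mpg, aes(displ, hwy)) +
  geom_jitter(alpha = .2)

print(p_mapped)
print(p_set)
```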
Use a facet grid by gender and location
ggplot(diabetes, aes(weight, hip)) +
  geom_point() +
  geom_smooth() +
  facet_grid(gender ~ location) +
  labs(
    title = "Scatter Plot of Weight vs Hip with Smooth Line, Faceted by Gender and Location",
    x = "Weight",
    y = "Hip"
  )
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
1. Load the datasauRus package. This package includes a data set called datasaurus_dozen.
2. Summarize the correlation between x and y. First, group it by dataset, and then summarize with the cor() function. Call the new variable corr. What’s it look like?
3. Mutate corr. Round it to 2 digits. Then, mutate it again (or wrap it around your first change) using: paste("corr:", corr)
4. Save the result as corrs.
library(datasauRus)
corrs <- datasaurus_dozen %>%
  group_by(dataset) %>%
  summarize(corr = cor(x, y)) %>%
  mutate(
    corr = round(corr, 2), # Round the correlation to 2 digits
    corr = paste("corr:", corr) # Add "corr:" before the rounded value
  ) %>%
  print()
## # A tibble: 13 × 2
## dataset corr
## <chr> <chr>
## 1 away corr: -0.06
## 2 bullseye corr: -0.07
## 3 circle corr: -0.07
## 4 dino corr: -0.06
## 5 dots corr: -0.06
## 6 h_lines corr: -0.06
## 7 high_lines corr: -0.07
## 8 slant_down corr: -0.07
## 9 slant_up corr: -0.07
## 10 star corr: -0.06
## 11 v_lines corr: -0.07
## 12 wide_lines corr: -0.07
## 13 x_shape corr: -0.07
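The correlations above are nearly identical across the 13 datasets. A minimal sketch, assuming dplyr and datasauRus are installed, that extends the same group_by() and summarize() pattern to the means and standard deviations, to check whether the other summary statistics agree as closely. The column names (mean_x, sd_x, and so on) are arbitrary choices for this example.

```r
library(dplyr)
library(datasauRus)

# One row per dataset, with several summary statistics side by side.
summary_stats <- datasaurus_dozen %>%
  group_by(dataset) %>%
  summarize(
    mean_x = mean(x),
    mean_y = mean(y),
    sd_x = sd(x),
    sd_y = sd(y),
    corr = cor(x, y)
  )

print(summary_stats)
```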
1. Pass datasaurus_dozen to ggplot() and add a point geom.
2. Facet by dataset.
3. Add a text geom with data = corrs. You also need to use aes() in this call to set label = corr, x = 50, and y = 110.
datasaurus_dozen |>
  ggplot(aes(x, y)) +
  geom_point() +
  facet_wrap(~ dataset) +
  geom_text(data = corrs, aes(label = corr, x = 50, y = 110), color = "red") +
  labs(title = "Scatter Plot of x vs y by Dataset with Correlation Text", x = "x", y = "y")
1. Try scale_color_distiller() and scale_color_viridis_c(). Check the help pages for different palette options.
2. Set the color aesthetic to gender. Try scale_color_brewer().
3. Set the colors manually with scale_color_manual(). Use values = c("#E69F00", "#56B4E9") in the function call.
4. Change the legend title with the name argument in whatever scale function you’re using.
diabetes %>%
  ggplot(aes(waist, hip, color = weight)) +
  geom_point() +
  scale_color_viridis_c() +
  labs(title = "Scatter Plot of Waist vs Hip (Viridis Color Scale)") +
  theme_minimal()
diabetes %>%
  ggplot(aes(waist, hip, color = weight)) +
  geom_point() +
  scale_color_distiller(palette = "Spectral") +
  labs(title = "Scatter Plot of Waist vs Hip (Distiller Color Scale)") +
  theme_minimal()
diabetes %>%
  ggplot(aes(waist, hip, color = gender)) +
  geom_point() +
  scale_color_brewer(palette = "Set1") + # Use 'Set1' color palette for gender
  labs(title = "Scatter Plot of Waist vs Hip (Gender Color Scale)") +
  theme_minimal()
diabetes %>%
  ggplot(aes(waist, hip, color = gender)) +
  geom_point() +
  scale_color_manual(values = c("#E69F00", "#56B4E9"), name = "Gender") +
  labs(title = "Scatter Plot of Waist vs Hip with Custom Legend Title") +
  theme_minimal()
1. Use theme() to change the legend to the bottom with legend.position = "bottom".
2. Remove the axis ticks by setting the axis.ticks argument to element_blank().
3. Change the size of the axis titles with element_text(). Check the help page if you don’t know what option to change.
diabetes %>%
  ggplot(aes(waist, hip, color = weight)) +
  geom_point() +
  theme(
    legend.position = "bottom",
    axis.ticks = element_blank(),
    axis.title.x = element_text(size = 20),
    axis.title.y = element_text(size = 20)
  )
Take a look at the diamonds data set from ggplot2. How
many rows does it have?
str(diamonds)
## tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
## $ carat : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
## $ depth : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
## $ table : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
## $ price : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
## $ x : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
## $ y : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
## $ z : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
Make a scatterplot of carat vs. price. How’s it look?
Try adding transparency, then replacing geom_point() with 2d bins.
# Plain scatterplot
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  labs(title = "Scatterplot of Carat vs. Price", x = "Carat", y = "Price")
# Scatterplot with transparency
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.3) +
  labs(title = "Scatterplot of Carat vs. Price with Transparency", x = "Carat", y = "Price")
# 2D bins for carat vs. price
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_bin2d() +
  labs(title = "2D Binned Plot of Carat vs. Price", x = "Carat", y = "Price")
# Hexagonal binning for carat vs. price (requires the hexbin package)
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_hex() +
  labs(title = "Hexagonal Binned Plot of Carat vs. Price", x = "Carat", y = "Price")
1. Change the x and y axis labels to include units (inches for hip and pounds for weight). You can use either labs() or xlab() and ylab().
2. Add scale_linetype() and set the name argument to “Sex”.
HW7plot <- ggplot(diabetes, aes(weight, hip, linetype = gender)) +
  geom_jitter(alpha = .2, size = 2.5) +
  geom_smooth(color = "black", se = FALSE) +
  labs(
    title = "Scatterplot of Weight vs Hip Size by Gender",
    x = "Weight (pounds)",
    y = "Hip (inches)"
  ) +
  scale_linetype(name = "Sex") +
  theme_bw(base_size = 12)
print(HW7plot)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Save the last plot and then locate it in the files pane.
ggsave("~/Desktop/Rstudio work/project 7/HW7plot.png", plot = HW7plot)
## Saving 7 x 5 in image
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
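The "Saving 7 x 5 in image" message appears because no size was given, so ggsave() used the dimensions of the current graphics device. A minimal sketch of pinning the size and resolution explicitly so the file does not depend on the plot pane: it uses a stand-in plot on the built-in mpg data and a tempfile() path only so that the block runs without the CSV or the project folder. The 7 x 5 inch size and 300 dpi are example values.

```r
library(ggplot2)

# Stand-in plot so the block runs without the diabetes CSV.
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

# tempfile() keeps the example self-contained; in the homework this
# would be the project path used above.
out_file <- tempfile(fileext = ".png")

# Explicit width, height, and dpi instead of the current device size.
ggsave(out_file, plot = p, width = 7, height = 5, units = "in", dpi = 300)

file.exists(out_file)
```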
datasaurus_dozen |>
  filter(dataset == "dino") |>
  ggplot(aes(x = x, y = y, color = dataset)) +
  geom_point(size = 7, shape = 18, color = "#228B22") +
  theme_minimal() +
  theme(
    axis.text = element_blank(),
    axis.ticks = element_blank(),
    axis.title = element_blank(),
    panel.grid = element_blank()
  )