Replace “Your Name” with your actual name.

Objective:

In this lab, you will practice creating APA-compliant graphs using ggplot2. Please complete each exercise by filling in the code chunks and interpreting the resulting graphs. Once you have completed the exercises, knit this document to HTML and publish it to RPubs. Make sure your YAML header includes a title, your name, and the date.

Run the chunk below to create the apa_theme object.

# Load ggplot2
library(ggplot2)
apa_theme <- theme_classic() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    legend.position = "bottom",
    text = element_text(family = "serif", size = 12),
    axis.title = element_text(size = 12),
    plot.title = element_text(size = 14, hjust = 0.5)
  )
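
As a quick check that the theme works, the following is a minimal sketch that adds apa_theme to a throwaway plot. The choice of mtcars variables, the labels, and the name demo_plot are illustrative assumptions, not part of the lab's required output. Because apa_theme is a theme object rather than a function, it is added without parentheses.

```r
library(ggplot2)

# Same theme definition as in the chunk above, repeated so this sketch runs on its own
apa_theme <- theme_classic() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    legend.position = "bottom",
    text = element_text(family = "serif", size = 12),
    axis.title = element_text(size = 12),
    plot.title = element_text(size = 14, hjust = 0.5)
  )

# Illustrative plot only (assumed example, not one of the exercises)
demo_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(
    title = "Fuel Economy by Vehicle Weight",
    x = "Weight (1,000 lb)",
    y = "Miles per Gallon"
  ) +
  apa_theme  # note: no parentheses, apa_theme is an object

demo_plot
```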

Exercise 1: Creating an APA-Compliant Bar Graph

Dataset: Use the built-in iris dataset to compare the mean sepal length across different species.

Tasks:
1. Create a bar graph that shows the average sepal length for each species in the iris dataset.
2. Add error bars representing the standard error of the mean.
3. Apply APA formatting, including a descriptive title, axis labels, and removing unnecessary grid lines.
4. Interpretation: Describe which species has the longest average sepal length and which has the shortest. Discuss the reliability of these comparisons based on the error bars.

head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
# Create the APA-compliant bar graph

# Load necessary packages
library(ggplot2)
library(dplyr)

# Compute mean and standard error for sepal length by species
iris_summary <- iris %>%
  group_by(Species) %>%
  summarise(
    mean_sepal_length = mean(Sepal.Length),
    se = sd(Sepal.Length) / sqrt(n())
  )

# Create APA-style bar graph with standard-error bars
ggplot(iris_summary, aes(x = Species, y = mean_sepal_length)) +
  geom_col(fill = "skyblue", color = "black", width = 0.6) +
  geom_errorbar(aes(ymin = mean_sepal_length - se, ymax = mean_sepal_length + se),
                width = 0.2, color = "black") +
  # Apply the apa_theme object defined at the top of the lab (serif text, no grid lines, centered title)
  apa_theme +
  labs(
    title = "Average Sepal Length by Iris Species",
    x = "Species",
    y = "Mean Sepal Length (cm)"
  )

Interpretation:
The bar graph shows that Iris setosa has the shortest average sepal length, followed by Iris versicolor, while Iris virginica has the longest average sepal length.

The error bars, representing the standard error of the mean, are small relative to the differences between species, and the intervals do not overlap. This suggests that the observed differences in average sepal length are reliable, although non-overlapping standard-error bars are an informal check rather than a significance test.
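
The error-bar reading can also be checked numerically. The sketch below uses base R to compute each species' mean ± 1 SE interval, tests whether neighboring intervals are disjoint, and then runs a one-way ANOVA as one possible formal test. The ANOVA is an illustrative addition that the exercise does not require, and the names se_fun, bounds, bars_disjoint, and fit are hypothetical.

```r
# Standard error of the mean for a numeric vector (hypothetical helper)
se_fun <- function(x) sd(x) / sqrt(length(x))

# Per-species mean and standard error of sepal length
means <- tapply(iris$Sepal.Length, iris$Species, mean)
ses   <- tapply(iris$Sepal.Length, iris$Species, se_fun)

# Lower and upper ends of the mean +/- 1 SE error bars
bounds <- data.frame(
  species = names(means),
  lower   = as.numeric(means - ses),
  upper   = as.numeric(means + ses)
)

# Order the intervals by their lower bound; the bars are disjoint when
# each upper bound lies below the next interval's lower bound
bounds <- bounds[order(bounds$lower), ]
bars_disjoint <- all(head(bounds$upper, -1) < tail(bounds$lower, -1))
print(bounds)
print(bars_disjoint)

# One-way ANOVA of sepal length on species (illustrative formal test)
fit <- aov(Sepal.Length ~ Species, data = iris)
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]
print(p_value)
```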

Exercise 2: Modifying a ggplot2 Scatter Plot

Dataset: Use the built-in mtcars dataset to create a scatter plot of hp (horsepower) versus qsec (quarter-mile time).

Tasks:
1. Create a scatter plot showing the relationship between horsepower and quarter-mile time.
2. Add a line that summarizes the overall trend in the data (a line that shows the general direction the data points are heading).
3. Apply APA formatting to ensure the plot is clear and professional.
4. Interpretation: Explain whether cars with higher horsepower tend to have faster or slower quarter-mile times based on the direction of the trend line.

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
# Create the APA-compliant scatter plot
# Load necessary library
library(ggplot2)

# Plot quarter-mile time against horsepower with a fitted linear trend line
ggplot(mtcars, aes(x = hp, y = qsec)) +
  geom_point(color = "steelblue", size = 3) +
  geom_smooth(method = "lm", color = "darkred", se = FALSE, linetype = "solid") +
  # Apply the apa_theme object defined at the top of the lab
  apa_theme +
  labs(
    title = "Relationship Between Horsepower and Quarter-Mile Time",
    x = "Horsepower",
    y = "Quarter-Mile Time (seconds)"
  )
## `geom_smooth()` using formula = 'y ~ x'

Interpretation:
The scatter plot shows a clear negative relationship between horsepower (hp) and quarter-mile time (qsec). This means that as horsepower increases, the time it takes to complete a quarter-mile generally decreases.

This pattern aligns with performance expectations in vehicles: higher horsepower contributes to better speed and acceleration, making the car more capable of quick sprints like the quarter-mile test.
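
The direction of the trend can be confirmed outside the plot. As a minimal sketch, the code below computes the Pearson correlation between hp and qsec and fits the same qsec ~ hp linear model that geom_smooth(method = "lm") draws. The object names r, trend, and slope are hypothetical. A negative slope means quarter-mile time falls as horsepower rises, and a lower qsec means a faster car.

```r
# Pearson correlation between horsepower and quarter-mile time
r <- cor(mtcars$hp, mtcars$qsec)

# Linear model with the same formula that geom_smooth(method = "lm") uses: y ~ x
trend <- lm(qsec ~ hp, data = mtcars)

# Change in quarter-mile time (seconds) per additional unit of horsepower
slope <- unname(coef(trend)["hp"])

print(round(r, 2))
print(slope)
```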

Exercise 3: Creating an APA-Compliant Line Graph

Dataset: Use the built-in airquality dataset to create a line graph showing the trend of average temperature (Temp) over the months (Month).

Tasks:
1. Calculate the average temperature for each month in the airquality dataset.
2. Create a line graph that shows the trend of average temperature over the months.
3. Apply APA formatting, ensuring the title is centered, axis labels are clear, and the graph is free of unnecessary grid lines.
4. Interpretation: Describe how temperature changes over the months. Identify which month has the highest average temperature and discuss any patterns you observe.

head(airquality)
##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 3    12     149 12.6   74     5   3
## 4    18     313 11.5   62     5   4
## 5    NA      NA 14.3   56     5   5
## 6    28      NA 14.9   66     5   6
# Create the APA-compliant line graph
# Load necessary packages
library(ggplot2)
library(dplyr)

# Calculate average temperature by month
temp_by_month <- airquality %>%
  group_by(Month) %>%
  summarise(avg_temp = mean(Temp, na.rm = TRUE))

# Convert numeric month to factor with labels for better readability
temp_by_month$Month <- factor(temp_by_month$Month,
                               levels = 5:9,
                               labels = c("May", "June", "July", "August", "September"))

# Plot the monthly averages as a line with point markers
ggplot(temp_by_month, aes(x = Month, y = avg_temp, group = 1)) +
  # linewidth replaces size for lines (size was deprecated for lines in ggplot2 3.4.0)
  geom_line(color = "firebrick", linewidth = 1.2) +
  geom_point(size = 3, color = "firebrick") +
  # Apply the apa_theme object defined at the top of the lab (no grid lines, centered title)
  apa_theme +
  labs(
    title = "Average Temperature Across Months in New York (1973)",
    x = "Month",
    y = "Average Temperature (°F)"
  )

Interpretation: The line graph shows a seasonal temperature trend in New York during 1973. Average temperatures rise from May (about 65.5 °F) through June (79.1 °F) to a midsummer plateau in July (about 83.9 °F) and August (about 84.0 °F), then fall in September (76.9 °F). August has the highest average temperature, although it exceeds July by less than 0.1 °F, so the two months are effectively tied as the hottest.
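
Because the summer months are close together on the plot, it helps to read the monthly means as numbers. The following base R sketch computes the average temperature for each month, reports which month has the largest mean, and prints the month-to-month changes. The names monthly_mean, hottest, and changes are hypothetical.

```r
# Mean temperature (degrees F) for each month; names are month numbers 5 (May) to 9 (September)
monthly_mean <- tapply(airquality$Temp, airquality$Month, mean)
print(round(monthly_mean, 2))

# Month number with the largest mean, translated to a month name
hottest <- names(which.max(monthly_mean))
print(month.name[as.integer(hottest)])

# Change in mean temperature from each month to the next
changes <- diff(as.numeric(monthly_mean))
print(round(changes, 2))
```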

Submission Instructions:

Be sure to knit your document to HTML and check that all content displays correctly before submitting. Submit the RPubs URL to Canvas Assignments.