---
title: "R Notebook"
output: html_notebook
---
```{r}
# One-time setup: install the packages used below (skip if they are already installed)
install.packages(c("ggplot2", "dplyr", "GGally"))
```


```{r}
# Infection Data Visualizations in R

# Create the data frame
infections <- c(245, 215, 2076, 5023, 189, 195, 123, 116, 3298, 430, 502, 126, 112, 67, 52, 39, 54, 2356, 6781, 120, 2389, 279, 257, 290, 234, 5689, 261, 672, 205)
ufo2010 <- c(2, 6, 2, 59, 0, 1, 1, 0, 115, 0, 0, 0, 0, 0, 0, 0, 6, 4, 2, 7, 2, 9, 2, 29, 10, 169, 1, 40, 16)
pop <- c(25101, 61912, 33341, 409061, 7481, 18675, 25581, 22286, 459598, 3915, 67197, 34365, 3911, 32122, 31459, 2311, 28350, 101482, 19005, 20679, 36745, 162812, 15927, 251417, 153920, 1554720, 16148, 305455, 37276)
```





```{r}
df <- data.frame(infections, ufo2010, pop)

# Load necessary libraries
library(ggplot2)
library(dplyr)

# --- 1. Bar Graph: Comparing Infections and UFO Sightings ---
ggplot(df, aes(x = 1:nrow(df))) +
  geom_bar(aes(y = infections, fill = "Infections"), stat = "identity", position = "dodge") +
  geom_bar(aes(y = ufo2010, fill = "UFO Sightings (2010)"), stat = "identity", position = "dodge", alpha = 0.7) +
  scale_fill_manual("Variables", values = c("Infections" = "skyblue", "UFO Sightings (2010)" = "salmon")) +
  labs(x = "Data Point Index", y = "Count", title = "Comparison of Infections and UFO Sightings") +
  theme_minimal() +
  theme(legend.position = "top")
```

```{r}
# Plotting both series in one bar chart is a reasonable starting point for comparison, but the x-axis is a numeric index (1–29) rather than a meaningful label (county, region, etc.), which makes it nearly impossible for a reader to interpret the data in context.
# Note also that the two geom_bar() layers are drawn on top of each other: position = "dodge" only dodges bars within a single layer, so the result is an overlay rather than a true grouped chart.
# The scale disparity between infections (up to 6,781) and UFO sightings (at most 169, mostly near zero) is so extreme that the salmon UFO bars are practically invisible; a dual-axis approach or separate panels would communicate both variables far more effectively.
```
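
The separate-panels suggestion could look roughly like the sketch below. It assumes `df` from the earlier chunk is in the workspace; `df_long` and `row_id` are hypothetical names introduced here, and `row_id` is only a stand-in for a real county or region label, which the data does not include.

```{r}
library(ggplot2)

# Reshape to long format with base R (one row per observation per variable).
# `row_id` is a placeholder index, not a real identifier from the data.
df_long <- data.frame(
  row_id   = rep(seq_len(nrow(df)), times = 2),
  variable = rep(c("Infections", "UFO Sightings (2010)"), each = nrow(df)),
  count    = c(df$infections, df$ufo2010)
)

# One panel per variable, each with its own y-axis range
ggplot(df_long, aes(x = row_id, y = count, fill = variable)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ variable, ncol = 1, scales = "free_y") +
  scale_fill_manual(values = c("Infections" = "skyblue", "UFO Sightings (2010)" = "salmon")) +
  labs(x = "Data Point Index", y = "Count", title = "Infections and UFO Sightings, Separate Panels") +
  theme_minimal()
```

Free y-scales keep the UFO bars visible, at the cost of making bar heights not directly comparable across the two panels.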







```{r}

# --- 2. Line Chart: Trends in Infections and Population ---
ggplot(df, aes(x = 1:nrow(df))) +
  geom_line(aes(y = infections, color = "Infections"), linewidth = 1) +
  geom_line(aes(y = pop, color = "Population"), linewidth = 1, linetype = "dashed") +
  scale_color_manual("Variables", values = c("Infections" = "green", "Population" = "purple")) +
  labs(x = "Data Point Index", y = "Count", title = "Trends in Infections and Population") +
  theme_minimal() +
  theme(legend.position = "top")
```


```{r}
# The same scale problem appears here, and it is arguably worse in a line chart.
# The infection line (green) is flattened against the bottom of the chart because population values run from a few thousand to about 1.55 million, dwarfing infection counts that never exceed 6,781.
# The chart is essentially showing only the population series.
# A secondary y-axis or normalization (e.g., infections per 1,000 residents) is needed to make it meaningful.
# The dashed line style for population is a nice touch, but it cannot overcome the fundamental axis issue.
```
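
One way to apply the normalization idea is sketched below. It assumes `df` from the earlier chunk; `df_rate`, `row_id`, and `infections_per_1000` are hypothetical names introduced for illustration.

```{r}
library(ggplot2)

# Infections per 1,000 residents puts every row on a comparable scale
df_rate <- transform(
  df,
  row_id = seq_len(nrow(df)),
  infections_per_1000 = infections / pop * 1000
)

ggplot(df_rate, aes(x = row_id, y = infections_per_1000)) +
  geom_line(color = "green", linewidth = 1) +
  geom_point(color = "green") +
  labs(x = "Data Point Index", y = "Infections per 1,000 Residents",
       title = "Infection Rate by Data Point") +
  theme_minimal()
```

A rate removes the need for a second axis altogether, which is usually easier to read than a dual-axis chart.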




```{r}
# --- 3. Scatter Plot: Relationship between Population and Infections ---
ggplot(df, aes(x = pop, y = infections)) +
  geom_point(color = "blue", alpha = 0.6) +
  labs(x = "Population", y = "Number of Infections", title = "Relationship between Population and Number of Infections") +
  theme_minimal()
```


```{r}
# This is one of the stronger visuals in the set.
# Mapping population to the x-axis and infections to the y-axis is conceptually sound for exploring a potential relationship.
# The clustering of points near the origin shows that most areas have both low population and low infections, with a few clear outliers.
# Adding a regression line with geom_smooth() would strengthen the interpretation significantly; right now the reader has to guess at the direction of the relationship.
```
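
A minimal sketch of the regression-line suggestion follows, assuming `df` from the earlier chunk. A straight-line fit (`method = "lm"`) is an assumption made here for simplicity; the commentary does not say which smoother to use, and `fit` is a hypothetical name.

```{r}
library(ggplot2)

ggplot(df, aes(x = pop, y = infections)) +
  geom_point(color = "blue", alpha = 0.6) +
  # Linear fit with a confidence band; swap method for "loess" to relax linearity
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, color = "darkblue") +
  labs(x = "Population", y = "Number of Infections",
       title = "Relationship between Population and Number of Infections") +
  theme_minimal()

# The same model fitted directly, to inspect the intercept and slope
fit <- lm(infections ~ pop, data = df)
coef(fit)
```

With so few large-population points, the fitted line is likely to be driven heavily by the outliers, so it is best read as a rough guide.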








```{r}

# --- 4. Box Plot: Distribution of Infections ---
ggplot(df, aes(y = infections)) +
  geom_boxplot(fill = "lightcoral") +
  labs(y = "Number of Infections", title = "Distribution of Number of Infections") +
  theme_minimal()
```




```{r}
# This plot is hard to read, but the cause is the data and the axis setup rather than the box orientation.
# Because only y is mapped, geom_boxplot() draws a single vertical box on a continuous x-axis running from roughly -0.4 to 0.4; those ticks are just a position scale and have nothing to do with infection counts.
# The box itself is squashed into a thin band at the bottom because the distribution is heavily right-skewed: the median is 245 while the maximum is 6,781.
# Several points are flagged as outliers above 2,000, which is consistent with the data.
# Hiding the meaningless x-axis ticks and putting the y-axis on a log scale would make the plot much easier to interpret.
```
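
The box plot could be cleaned up along these lines. This is a sketch that assumes `df` from the earlier chunk; the log scale is one possible remedy for the skew, not the only one.

```{r}
library(ggplot2)

# Mapping x to an empty string gives a discrete axis with a single blank tick,
# which replaces the meaningless -0.4 to 0.4 position scale.
ggplot(df, aes(x = "", y = infections)) +
  geom_boxplot(fill = "lightcoral", width = 0.4) +
  # All infection counts are positive, so a log10 axis is safe here.
  # Note: box statistics are computed on the transformed values,
  # so which points are flagged as outliers can change.
  scale_y_log10() +
  labs(x = NULL, y = "Number of Infections (log scale)",
       title = "Distribution of Number of Infections") +
  theme_minimal()
```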



```{r}
# --- 5. Histogram: Frequency Distribution of UFO Sightings ---
ggplot(df, aes(x = ufo2010)) +
  geom_histogram(binwidth = 5, fill = "orange", color = "black", alpha = 0.7) +
  labs(x = "Number of UFO Sightings (2010)", y = "Frequency", title = "Frequency Distribution of UFO Sightings (2010)") +
  theme_minimal()
```



```{r}
# This is one of the cleaner charts in the assignment.
# The heavy right skew is clearly visible: the vast majority of data points cluster near zero sightings, with very few observations extending past 100 (the maximum is 169).
# The orange fill is distinct and readable.
# One improvement would be a narrower bin width for more granularity near zero, where most of the observations fall, plus annotations on the outlier bars for context.
```
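
A narrower bin width with one annotated outlier might look like the sketch below, assuming `df` from the earlier chunk. The bin width of 2 and the label position are illustrative choices, and `ufo_max` is a hypothetical helper name.

```{r}
library(ggplot2)

ufo_max <- max(df$ufo2010)

ggplot(df, aes(x = ufo2010)) +
  # Narrower bins aligned at zero give more detail where most rows sit
  geom_histogram(binwidth = 2, boundary = 0, closed = "left",
                 fill = "orange", color = "black", alpha = 0.7) +
  # Label the largest observation just above its bar
  annotate("text", x = ufo_max, y = 1.5,
           label = paste("max =", ufo_max), hjust = 1, size = 3.5) +
  labs(x = "Number of UFO Sightings (2010)", y = "Frequency",
       title = "Frequency Distribution of UFO Sightings (2010)") +
  theme_minimal()
```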





```{r}
# --- 6. Scatter Plot: Relationship between Population and UFO Sightings ---
ggplot(df, aes(x = pop, y = ufo2010)) +
  geom_point(color = "purple", alpha = 0.6) +
  labs(x = "Population", y = "Number of UFO Sightings (2010)", title = "Relationship between Population and UFO Sightings (2010)") +
  theme_minimal()
```




```{r}
# Structurally similar to Visual 3, and it works for the same reasons: the axes are correctly assigned and the question being asked (does population predict UFO sightings?) is valid.
# The one outlier at about 1.55M population and 169 sightings is visually prominent and worth calling out.
# As with Visual 3, a trend line would help the reader assess the relationship.
# The purple color is distinct and works well here.
```
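
Calling out the outlier and adding a trend line could be done as in the sketch below, assuming `df` from the earlier chunk. The name `top` and the label wording are hypothetical, and the linear fit is an assumption rather than something the commentary prescribes.

```{r}
library(ggplot2)

# Row with the most UFO sightings (the visually prominent outlier)
top <- df[which.max(df$ufo2010), ]

ggplot(df, aes(x = pop, y = ufo2010)) +
  geom_point(color = "purple", alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
              color = "grey40", linetype = "dashed") +
  # Place the label to the left of the point so it stays inside the panel
  geom_text(data = top, aes(label = paste0(ufo2010, " sightings")),
            hjust = 1.2, size = 3.5) +
  labs(x = "Population", y = "Number of UFO Sightings (2010)",
       title = "Relationship between Population and UFO Sightings (2010)") +
  theme_minimal()
```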





```{r}
# --- 7. Scatter Plot: Infections vs. UFOs with Population Size ---
ggplot(df, aes(x = ufo2010, y = infections, size = pop)) +
  geom_point(alpha = 0.6, color = "maroon") +
  scale_size_continuous(name = "Population Size") +
  labs(x = "Number of UFO Sightings (2010)", y = "Number of Infections", title = "Infections vs. UFO Sightings, Size by Population") +
  theme_minimal()
```




```{r}
# Encoding population as point size is a smart design choice and shows a stronger grasp of multivariate visualization than the earlier charts.
# The legend is clean and appropriately scaled.
# Two observations deserve an annotation or callout: the point in the upper left (high infections, near-zero UFO sightings) and the large bubble in the upper right (high infections, high UFO sightings, large population).
# They tell a story the reader currently has to hunt for.
```
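
The two callouts could be added roughly as follows, assuming `df` from the earlier chunk. The 5,500 cutoff is a hypothetical threshold chosen only because it isolates the two points discussed above; `highlights` and `label` are names introduced for this sketch.

```{r}
library(ggplot2)

# Pick out the two high-infection observations (hypothetical cutoff)
highlights <- df[df$infections > 5500, ]
highlights$label <- paste0("pop. ", format(highlights$pop, big.mark = ",", trim = TRUE))

# Size is mapped only inside geom_point() so the text labels keep a fixed size
ggplot(df, aes(x = ufo2010, y = infections)) +
  geom_point(aes(size = pop), alpha = 0.6, color = "maroon") +
  geom_text(data = highlights, aes(label = label),
            hjust = "inward", vjust = 2, size = 3.5) +
  scale_size_continuous(name = "Population Size") +
  labs(x = "Number of UFO Sightings (2010)", y = "Number of Infections",
       title = "Infections vs. UFO Sightings, Size by Population") +
  theme_minimal()
```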




```{r}
# --- 8. Pair Plot: Overview of Relationships ---
library(GGally)
ggpairs(df) +
  ggtitle("Pair Plot of Infections, UFO Sightings, and Population") +
  theme_minimal()
```




```{r}
# This is the most analytically rich visual in the set.
# Using ggpairs() from GGally is a strong choice: it compactly shows pairwise scatter plots, density curves, and correlation coefficients in one matrix.
# The standout finding is the very high correlation between UFO sightings and population (0.942***), which is far stronger than either variable's correlation with infections (~0.58–0.60).
# This plot essentially summarizes the whole dataset and should arguably be the centerpiece of the analysis, not the last chart.
```
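
The correlations read off the pair plot can be cross-checked numerically with base R. The sketch below assumes `df` from the earlier chunk and uses Pearson correlation, the default method for `cor()`; `cor_mat` is a hypothetical name.

```{r}
# Pairwise Pearson correlations for all three variables, rounded for display
cor_mat <- cor(df)
round(cor_mat, 3)

# Significance test for the strongest pair highlighted above
cor.test(df$ufo2010, df$pop)
```

Because all three variables are heavily right-skewed, rerunning `cor()` with `method = "spearman"` is a reasonable robustness check.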



