For each of the following pairs of graphs, identify features that communicate better on one version than the other.
The graph on the left is a mosaic plot with distinct colors for survived and not survived for easy distinction between groups. The graph on the right has a larger font for the labels for easier readability. (Answer doesn’t have to be exact but something similar).
The graph on the left has shorter axis labels, which makes it easier to read. The graph on the right has larger data points, which show the distribution of the data more clearly. (Answer doesn’t have to be exact but something similar).
Let’s use the data from “countries.csv” to practice making some graphs.
countries <- read.csv("C:\\BI412L\\ABDLabs\\ABDLabs\\DataForLabs\\countries.csv", stringsAsFactors = TRUE)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.2
The ggplot2 package lets us customize our visuals to convey the message we want, so it’s important to load it before using the ggplot functions to create our graphs.
ggplot(countries, aes(x = measles_immunization_oneyearolds)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_bin()`).
Distribution is left (or negatively) skewed.
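The skew can also be checked numerically, and the `stat_bin()` message can be avoided by choosing a bin width explicitly. The sketch below uses a small made-up data frame (`toy` and its values are hypothetical, not taken from countries.csv), and `binwidth = 5` is only one reasonable choice.

```r
library(ggplot2)

# Hypothetical immunization percentages (NOT the values in countries.csv)
toy <- data.frame(
  measles = c(99, 98, 97, 96, 95, 95, 93, 92, 90, 85, 75, 60, 45, NA)
)

# In a left-skewed distribution the long lower tail pulls the mean below the median
toy_mean   <- mean(toy$measles, na.rm = TRUE)
toy_median <- median(toy$measles, na.rm = TRUE)

# binwidth = 5 makes each bar 5 percentage points wide;
# na.rm = TRUE drops the missing row without a warning
p <- ggplot(toy, aes(x = measles)) +
  geom_histogram(binwidth = 5, na.rm = TRUE)
print(p)
```

A mean below the median is consistent with left skew, but the histogram itself remains the main evidence.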
ggplot(countries, aes(x = continent)) + geom_bar(stat = "count")
ggplot(countries, aes(x = life_expectancy_at_birth_male, y = life_expectancy_at_birth_female)) + geom_point()
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_point()`).
There is a strong positive correlation/association between male life expectancy at birth and female life expectancy at birth.
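To back up a visual impression like this with a number, one option is the Pearson correlation coefficient. The sketch below uses made-up life expectancies (the `toy` data frame is hypothetical) and shows how `use = "complete.obs"` handles missing values of the kind that triggered the warning above.

```r
# Hypothetical male/female life expectancies (NOT the values in countries.csv)
toy <- data.frame(
  male   = c(55, 60, 64, 70, 74, NA, 78),
  female = c(58, 63, 69, 75, 80, 82, 83)
)

# use = "complete.obs" drops any row where either value is missing
r <- cor(toy$male, toy$female, use = "complete.obs")
print(r)
```

A value of `r` close to 1 indicates a strong positive linear association.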
The ecological footprint is a widely-used measure, developed at UBC, of the impact a person has on the planet. It measures the area of land (in hectares) required to generate the food, shelter, and other resources used by a typical person and required to dispose of that person’s wastes. Larger values of the ecological footprint indicate that the typical person from that country uses more resources. The countries data set has two variables for many countries showing the ecological footprint of an average person in each country. ecological_footprint_2000 and ecological_footprint_2012 show the ecological footprints for the years 2000 and 2012, respectively.
ggplot(countries, aes(x = ecological_footprint_2000 , y = ecological_footprint_2012)) + geom_point()
## Warning: Removed 150 rows containing missing values or values outside the scale range
## (`geom_point()`).
We can see a positive correlation, but possible outliers keep it from looking strong. I would not say that the ecological footprint in 2000 predicts its value in 2012, because of the variation toward the right end of the data.
ggplot(countries, aes(x = ecological_footprint_2000 , y = ecological_footprint_2012)) + geom_point() + geom_abline(intercept = 0, slope = 1)
## Warning: Removed 150 rows containing missing values or values outside the scale range
## (`geom_point()`).
Most points seem to be close to or above the one-to-one line, suggesting that the ecological footprint tends to have either remained similar or increased between 2000 and 2012. The points farther from the one-to-one line (in either direction) represent the countries with larger changes in their ecological footprint. Based on the graph, it seems that countries with a lower ecological footprint in 2000 tend to show more variation, both increasing and decreasing. In contrast, countries with a higher ecological footprint in 2000 tend to show smaller changes over this period, as most of these points remain closer to the one-to-one line.
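The reading of the one-to-one line can be cross-checked by computing each country's change directly: points above the line have a positive change and points below it have a negative change. This is a minimal sketch on made-up footprints (the `toy` data frame and its column names `fp_2000` and `fp_2012` are hypothetical).

```r
# Hypothetical footprints in hectares (NOT the values in countries.csv)
toy <- data.frame(
  country = c("A", "B", "C", "D", "E"),
  fp_2000 = c(1.0, 2.5, 4.0, 6.0, NA),
  fp_2012 = c(1.4, 2.5, 3.5, 6.2, 3.0)
)

# Vertical distance from the one-to-one line: positive = above, negative = below
toy$change <- toy$fp_2012 - toy$fp_2000

n_above <- sum(toy$change > 0, na.rm = TRUE)  # footprint increased
n_below <- sum(toy$change < 0, na.rm = TRUE)  # footprint decreased

# Country farthest from the line in either direction (which.max skips NA)
largest <- toy$country[which.max(abs(toy$change))]
print(toy)
```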
Use the countries data again. Plot the relationship between continent and female life expectancy at birth. Describe the patterns that you see.
ggplot(countries, aes(x = continent , y = life_expectancy_at_birth_female)) + geom_boxplot()
## Warning: Removed 13 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
This box plot shows the life expectancy at birth for females across different continents. Africa has the lowest median life expectancy with a wide range, indicating significant variability among countries, and it also includes lower outliers. Asia displays a higher median than Africa but with a similarly large range, suggesting varied life expectancies across countries within the continent. Europe and North America have higher and more consistent life expectancies, with narrower interquartile ranges, indicating less variability. Oceania has a similar range and median to Europe and North America but includes a few lower outliers. South America shows a somewhat lower median than Europe and North America, with a moderate range and one outlier. Overall, Europe and North America have the highest and most stable life expectancies, while Africa and Asia show greater variability and generally lower life expectancies. (Can be something similar, but the plot must be a boxplot.)
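The medians and spreads described here can be tabulated per group as a check on the boxplot. The following sketch uses base R's `tapply()` on a small made-up data frame (`toy` and `female_le` are hypothetical names and values).

```r
# Hypothetical female life expectancies (NOT the values in countries.csv)
toy <- data.frame(
  continent = c("Africa", "Africa", "Africa", "Europe", "Europe", "Europe"),
  female_le = c(52, 60, 71, 80, 82, NA)
)

# Median and interquartile range per continent; na.rm = TRUE is passed on to each function
med <- tapply(toy$female_le, toy$continent, median, na.rm = TRUE)
iqr <- tapply(toy$female_le, toy$continent, IQR, na.rm = TRUE)

print(med)
print(iqr)
```

A larger IQR corresponds to a taller box, which is what "more variability" means in the description.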
Muchhala (2006) measured the length of the tongues of eleven different species of South American bats, as well as the length of their palates (to get an indication of the size of their mouths). All of these bats use their tongues to feed on nectar from flowers. Data from the article are given in the file “BatTongues.csv”. In this file, both Tongue Length and Palate Length are given in millimeters.
bat_tounges <- read.csv("C:\\BI412L\\ABDLabs\\ABDLabs\\DataForLabs\\BatTongues.csv", stringsAsFactors = TRUE)
ggplot(bat_tounges, aes(x = palate_length , y = tongue_length)) + geom_point()
The scatter plot displays the relationship between palate length (x-axis) and tongue length (y-axis). The association appears to be weak, as the points are widely scattered with no clear pattern or trend. There is no strong, consistent increase or decrease in tongue length as palate length changes, suggesting that if there is any relationship, it is weak. Additionally, the association could be considered positive since there is a slight tendency for higher palate lengths to correspond with higher tongue lengths, but the relationship is not pronounced.
In nature, biological variation is bound to exist. The species represented by the outlying data point might naturally have a disproportionately long tongue relative to its palate length, making it an exceptional case.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
filter(bat_tounges, tongue_length > 80)
##            species palate_length tongue_length
## 1 Anoura fistulata          12.4          85.2
The species that is an outlier is Anoura fistulata.
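Students who have not loaded dplyr can reach the same answer with base R. This sketch is an alternative, not the method the lab requires; only the Anoura fistulata row comes from the output above, and the other two rows are hypothetical placeholders.

```r
# Only the last row matches the lab output; "Species A" and "Species B" are made up
toy <- data.frame(
  species       = c("Species A", "Species B", "Anoura fistulata"),
  palate_length = c(11.0, 12.0, 12.4),
  tongue_length = c(30.0, 35.0, 85.2)
)

# Row with the single longest tongue
longest <- toy[which.max(toy$tongue_length), ]

# Base R equivalent of dplyr::filter(toy, tongue_length > 80)
over_80 <- subset(toy, tongue_length > 80)

print(longest)
```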
Import the data set collected on your class from the first lab. (We’ll return to some of the other variables later in the term.)
student_data_lab1 <- read.csv("C:\\BI412L\\Lab 3 Graphics\\BI412L Student Data.csv", stringsAsFactors = TRUE)
ggplot(student_data_lab1, aes(x = Sex , y = Height_cm)) + geom_boxplot()
Based on the boxplot alone, there seem to be no outliers (no outlier points are drawn), and it appears that everyone used the same unit, since the y-axis values are within a reasonable range.
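The same check can be made numerically with `boxplot.stats()`, which lists the values that fall beyond 1.5 times the box length from the box. The heights below are hypothetical, and this is base R's rule, which is close to but not necessarily identical to how `geom_boxplot()` computes its hinges.

```r
# Hypothetical heights in cm (NOT the class data)
heights_clean <- c(158, 160, 163, 165, 168, 170, 173, 175, 178, 180)

# Same data plus one student who entered height in inches by mistake
heights_mixed <- c(66, heights_clean)

# $out holds the values that would be drawn as outlier points
out_clean <- boxplot.stats(heights_clean)$out
out_mixed <- boxplot.stats(heights_mixed)$out

print(out_clean)
print(out_mixed)
```

A value entered in the wrong unit usually shows up as an extreme outlier, which is why an outlier-free boxplot supports the conclusion that everyone used centimeters.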
ggplot(student_data_lab1, aes(x = Head_Circumference_cm)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Note: Students may have picked a different binwidth, and that is okay; you may mark them correct.
Pick one of the plots you made using R today. What could be improved about this graph to make it a more effective presentation of the data?
Answers may vary. As long as the answer seems reasonable (i.e., they’ve justified it), you may mark it correct.