(a) Population: All current UTSA students, the entire group the study aims to describe.
(b) Sample: The 150 randomly selected UTSA students whose study hours were recorded.
(c) Observation: The number of weekly study hours reported by one student.
(d) Parameter: The true average weekly study hours for all UTSA students (unknown).
(e) Statistic: The sample mean of weekly study hours for the 150 students.
Population: 50,000 households in San Antonio.
(a) Simple Random Sampling (SRS): Randomly select 100 distinct households from the list of 50,000 so that every possible set of 100 households is equally likely to be chosen.
(b) Stratified Sampling: Divide households into non-overlapping groups (strata), e.g., by region or income, and take a simple random sample from each stratum, with each stratum's sample size proportional to its share of the population.
(c) Cluster Sampling: Randomly select clusters (e.g., city blocks) and survey every household within the selected clusters.
(d) Systematic Sampling: From the ordered list, randomly choose a starting household between 1 and 500 (the sampling interval, 50,000 / 100 = 500), then select every 500th household after it.
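The four designs can be sketched in base R on a sampling frame of household IDs 1 to 50,000. This is a minimal illustration, not a survey workflow: the strata (four equal "regions") and clusters (500 blocks of 100 households) are hypothetical and stand in for whatever grouping a real frame would supply.

```r
set.seed(2024)
N <- 50000                 # households in the frame
n <- 100                   # target sample size
frame <- seq_len(N)        # hypothetical household IDs 1..50000

# (a) SRS: every subset of 100 IDs is equally likely
srs <- sample(frame, n)

# (b) Stratified, proportional allocation.
# Hypothetical strata: four regions of 12,500 households, so 25 drawn per region
region <- rep(c("North", "South", "East", "West"), each = N / 4)
strat <- unlist(lapply(split(frame, region), function(ids) {
  sample(ids, round(n * length(ids) / N))
}), use.names = FALSE)

# (c) Cluster: hypothetical blocks of 100 households (500 blocks);
# pick one block at random and keep every household in it
block <- rep(seq_len(N / 100), each = 100)
chosen_block <- sample(unique(block), 1)
clus <- frame[block == chosen_block]

# (d) Systematic: random start in 1..k, then every k-th household
k <- N / n                 # sampling interval = 500
start <- sample(seq_len(k), 1)
sys <- seq(start, by = k, length.out = n)
```

All four return 100 IDs; they differ in which samples are possible. The systematic design, for instance, can only produce 500 distinct samples, one per starting point.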
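library(dplyr)    # %>%, arrange(), mutate()
library(ggplot2)  # ggplot(), geoms, sec_axis()
library(scales)   # percent_format()

taxa <- tibble::tribble(
  ~Taxon,        ~Species,
  "Birds",        92,
  "Clams",        70,
  "Reptiles",     36,
  "Fish",         115,
  "Snails",       32,
  "Plants",       745,
  "Insects",      44,
  "Crustaceans",  21,
  "Mammals",      74,
  "Amphibians",   22,
  "Arachnids",    12
)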
total_species <- sum(taxa$Species)
total_species
## [1] 1263
taxa_pareto <- taxa %>%
  arrange(desc(Species)) %>%
  mutate(Taxon = factor(Taxon, levels = Taxon),  # keep the sorted order on the x-axis
         CumCount = cumsum(Species),
         CumPerc = CumCount / sum(Species))
ggplot(taxa_pareto, aes(x = Taxon, y = Species)) +
  geom_col(fill = "steelblue") +
  # Cumulative share, rescaled onto the count axis so bars and line share one panel
  geom_line(aes(y = CumPerc * max(Species), group = 1), color = "red") +
  geom_point(aes(y = CumPerc * max(Species)), color = "red") +
  scale_y_continuous(
    name = "Number of Species",
    # Undo the rescaling so the right-hand axis reads as a percentage
    sec.axis = sec_axis(~ . / max(taxa_pareto$Species),
                        labels = percent_format(accuracy = 1),
                        name = "Cumulative Percent")
  ) +
  labs(title = "Pareto Chart of Endangered Species by Taxon",
       subtitle = paste("Total Species =", total_species)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
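The numbers behind the red cumulative line can be tabulated directly. This base-R sketch (no packages needed) re-enters the counts from the tribble above, sorts them in decreasing order as the Pareto chart does, and accumulates counts and percentages; the column names mirror those used in `taxa_pareto`.

```r
# Species counts per taxon, as listed in the tribble above
taxa <- data.frame(
  Taxon = c("Birds", "Clams", "Reptiles", "Fish", "Snails", "Plants",
            "Insects", "Crustaceans", "Mammals", "Amphibians", "Arachnids"),
  Species = c(92, 70, 36, 115, 32, 745, 44, 21, 74, 22, 12)
)

# Sort descending, then accumulate counts and percentages
pareto <- taxa[order(taxa$Species, decreasing = TRUE), ]
pareto$CumCount <- cumsum(pareto$Species)
pareto$CumPerc <- round(100 * pareto$CumCount / sum(pareto$Species), 1)
pareto
```

The first row shows that plants alone account for 745 of 1,263 species, about 59%, which is why the cumulative line starts so high.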
taxa %>%
  mutate(perc = Species / sum(Species)) %>%
  ggplot(aes(x = "", y = perc, fill = Taxon)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  labs(title = "Pie Chart of Endangered Species by Taxon", fill = "Taxon") +
  theme_void()
utsa_points <- c(28, 10, 7, 45, 20, 27, 38, 45, 44, 48, 51, 24, 44)
summary(utsa_points)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7.00 24.00 38.00 33.15 45.00 51.00
df_points <- data.frame(points = utsa_points)
ggplot(df_points, aes(x = points)) +
  geom_histogram(binwidth = 5, fill = "blue", color = "black") +
  labs(title = "Histogram of UTSA Points", x = "Points", y = "Frequency") +
  theme_minimal()
ggplot(df_points, aes(y = points, x = "")) +
  geom_boxplot(fill = "orange", width = 0.3) +
  labs(title = "Boxplot of UTSA Points", y = "Points") +
  theme_minimal() +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
Interpretation:
The mean (33.15) is lower than the median (38). Most scores fall between 38 and 51, while a few low scores (7, 10, 20) stretch out the lower tail and pull the mean down.
This indicates the distribution is left-skewed (negatively skewed), not symmetric.
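The direction of skew can be checked numerically rather than by eye. This sketch compares mean and median, computes a moment-based sample skewness coefficient (the g1 definition; other software may use an adjusted formula, so treat the exact value as definition-dependent), and lists any points the boxplot's 1.5 × IQR rule would flag.

```r
utsa_points <- c(28, 10, 7, 45, 20, 27, 38, 45, 44, 48, 51, 24, 44)

m   <- mean(utsa_points)
med <- median(utsa_points)

# Moment-based skewness g1: negative values indicate a longer left tail
g1 <- mean((utsa_points - m)^3) / mean((utsa_points - m)^2)^(3/2)

# Values outside the whiskers drawn by boxplot() (1.5 * IQR from the hinges)
flagged <- boxplot.stats(utsa_points)$out

c(mean = m, median = med, skew = g1)
flagged
```

Here the mean falls below the median and g1 comes out negative, both pointing to a left-skewed distribution; no individual score is extreme enough to be flagged as an outlier.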
Given \(X \sim \mathcal{N}(\mu=10, \sigma=2)\):
1 - pnorm(12, mean = 10, sd = 2)     # P(X > 12)
## [1] 0.1586553
pnorm(14, 10, 2) - pnorm(8, 10, 2)   # P(8 < X < 14)
## [1] 0.8185946
pnorm(6, 10, 2)                      # P(X < 6)
## [1] 0.02275013
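The same three probabilities follow from standardizing, \(Z = (X - \mu)/\sigma\), and using the standard normal CDF. This makes the z-scores visible: 12 is one standard deviation above the mean, 8 and 14 are at \(-1\) and \(+2\), and 6 is two standard deviations below. The helper `z()` is introduced here for illustration only.

```r
mu <- 10
sigma <- 2
z <- function(x) (x - mu) / sigma   # hypothetical helper: standardize x

p_gt_12 <- pnorm(z(12), lower.tail = FALSE)  # P(X > 12) = P(Z > 1)
p_8_14  <- pnorm(z(14)) - pnorm(z(8))        # P(8 < X < 14) = P(-1 < Z < 2)
p_lt_6  <- pnorm(z(6))                       # P(X < 6) = P(Z < -2)

round(c(p_gt_12, p_8_14, p_lt_6), 4)
```

Using `lower.tail = FALSE` for the upper tail avoids the `1 - ...` subtraction, which matters only for very small tail probabilities but is a good habit.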
Let \(S=\{0,1,2,3,4,5,6,7,8,9\}\).
\(A=\{2,3,5,8\}\), \(B=\{1,3,5,7,9\}\), \(C=\{3,4,5,6\}\), \(D=\{0,1,5,8\}\).
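Whatever combinations of these sets an exercise asks for, they can be checked with base R's set functions. A minimal sketch; the `complement()` helper is hypothetical and simply takes the difference from the sample space \(S\).

```r
S <- 0:9
A <- c(2, 3, 5, 8)
B <- c(1, 3, 5, 7, 9)
C <- c(3, 4, 5, 6)
D <- c(0, 1, 5, 8)

complement <- function(X) setdiff(S, X)   # hypothetical helper: X^c relative to S

sort(union(A, B))                  # A ∪ B
sort(intersect(A, B))              # A ∩ B
complement(A)                      # A^c
BCD <- Reduce(intersect, list(B, C, D))
BCD                                # B ∩ C ∩ D
length(intersect(A, D)) == 0       # TRUE only if A and D are disjoint
```

For these sets, \(A \cup B = \{1,2,3,5,7,8,9\}\), \(A \cap B = \{3,5\}\), \(A^c = \{0,1,4,6,7,9\}\), \(B \cap C \cap D = \{5\}\), and \(A\) and \(D\) are not disjoint because they share 5 and 8.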
Total students: 60. Five friends: J, S, M, K, C.
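# P(all five friends end up together): all 5 in the selected 30, or all 5 among the other 30
choose(55,25)/choose(60,30) + choose(55,30)/choose(60,30)
## [1] 0.05218555
# P(a 4-1 split): exactly 4 of the friends in the selected 30, or exactly 1
(choose(5,4)*choose(55,26) + choose(5,1)*choose(55,29)) / choose(60,30)
## [1] 0.3010705
# P(one particular pair together on one side and the other three together on the other side)
(choose(55,28) + choose(55,27)) / choose(60,30)
## [1] 0.0646744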
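These counts are hypergeometric probabilities, so they can be cross-checked with `dhyper()`. The sketch assumes, as the `choose(60, 30)` denominators imply, that the 60 students are split at random into two groups of 30; X is the number of the five friends who land in a given group. A short Monte Carlo simulation confirms the first answer independently.

```r
# X = number of the 5 friends in a given group of 30 (from 60 students)
# dhyper(x, m, n, k): m = 5 friends, n = 55 other students, k = 30 places
p <- dhyper(0:5, m = 5, n = 55, k = 30)   # p[1] is P(X = 0), ..., p[6] is P(X = 5)

p_all_together <- p[6] + p[1]             # X = 5 or X = 0
p_four_one     <- p[5] + p[2]             # X = 4 or X = 1
# A particular pair in one group and the other three in the other:
# one specific 2-subset out of choose(5, 2) equally likely ones, for X = 2 or X = 3
p_specific_pair <- (p[3] + p[4]) / choose(5, 2)

# Simulation: students 1..5 play the role of the friends
set.seed(1)
sims <- replicate(20000, {
  group <- sample(rep(1:2, each = 30))    # random split into two groups of 30
  length(unique(group[1:5])) == 1         # TRUE if all five share a group
})
c(exact = p_all_together, simulated = mean(sims))
```

The simulated proportion should land within about one percentage point of the exact 0.0522; increasing the number of replicates tightens the agreement.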
| Outcome | Union | Nonunion |
|---|---|---|
| Same company | 28 | 24 |
| New company (same field) | 15 | 10 |
| New field | 5 | 11 |
| Unemployed (one year) | 2 | 5 |
| Total | 50 | 50 |
15 / (15 + 10)   # P(union | new company, same field)
## [1] 0.6
5 / 50           # P(new field | union)
## [1] 0.1
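Both conditional probabilities come from dividing a cell by a row total or a column total. Entering the table as a matrix and using `prop.table()` with a `margin` argument produces every such conditional probability at once; the object names below are chosen for illustration.

```r
# Counts from the table above: rows = outcome, columns = group
jobs <- matrix(c(28, 15, 5, 2,     # Union column
                 24, 10, 11, 5),   # Nonunion column
               ncol = 2,
               dimnames = list(
                 Outcome = c("Same company", "New company (same field)",
                             "New field", "Unemployed (one year)"),
                 Group = c("Union", "Nonunion")))

# margin = 1: divide by row totals, giving P(group | outcome)
p_group_given_outcome <- prop.table(jobs, margin = 1)
# margin = 2: divide by column totals, giving P(outcome | group)
p_outcome_given_group <- prop.table(jobs, margin = 2)

p_group_given_outcome["New company (same field)", "Union"]   # 15 / 25
p_outcome_given_group["New field", "Union"]                   # 5 / 50
```

Choosing the margin is the whole decision: conditioning on the outcome uses row totals, conditioning on union status uses column totals.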