Homework 1

Q1. Statistical Terms

(a) Population: All current UTSA students, considered as the full group of interest.

(b) Sample: The 150 randomly selected UTSA students whose study hours were recorded.

(c) Observation: The number of weekly study hours reported by one student.

(d) Parameter: The true average weekly study hours for all UTSA students (unknown).

(e) Statistic: The sample mean of weekly study hours for the 150 students.

Q2. Sampling Designs

Population: 50,000 households in San Antonio.

(a) Simple Random Sampling (SRS): Randomly select 100 distinct households from the list of 50,000 so that every possible subset of 100 households is equally likely to be chosen.

(b) Stratified Sampling: Divide households into groups (e.g., by region or income). Take a random sample from each group, proportional to group size.

(c) Cluster Sampling: Randomly select clusters (e.g., city blocks) and survey every household within selected clusters.

(d) Systematic Sampling: For a sample of 100, the sampling interval is 50,000 / 100 = 500. Randomly select a starting point between 1 and 500 on the ordered list, then select every 500th household thereafter.
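
The four designs can be sketched in base R as follows. This is only an illustration under stated assumptions: the frame of household IDs, the `region` variable (5 strata), and the `block` variable (1,000 clusters of 50 households) are hypothetical and not part of the problem data; the sample size of 100 comes from part (a).

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

N <- 50000   # households in the sampling frame
n <- 100     # target sample size

# Hypothetical frame: region and block are invented for illustration
frame <- data.frame(
  id     = 1:N,
  region = rep(1:5, each = N / 5),
  block  = rep(1:1000, each = 50)
)

# (a) SRS: draw n distinct households, every subset equally likely
srs <- sample(frame$id, n)

# (b) Stratified, proportional allocation: n * (stratum size / N) per region
strat <- unlist(lapply(split(frame$id, frame$region), function(ids) {
  sample(ids, round(n * length(ids) / N))
}))

# (c) Cluster: pick 2 whole blocks and keep every household in them
chosen_blocks <- sample(unique(frame$block), 2)
clus <- frame$id[frame$block %in% chosen_blocks]

# (d) Systematic: random start in 1..k, then every k-th household
k <- N / n
start <- sample(1:k, 1)
sys <- seq(start, N, by = k)
```

With these made-up strata and blocks, each design returns 100 household IDs, which makes the four samples directly comparable.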

Q3. Endangered Species by Taxon

taxa <- tibble::tribble(
  ~Taxon, ~Species,
  "Birds", 92,
  "Clams", 70,
  "Reptiles", 36,
  "Fish", 115,
  "Snails", 32,
  "Plants", 745,
  "Insects", 44,
  "Crustaceans", 21,
  "Mammals", 74,
  "Amphibians", 22,
  "Arachnids", 12
)

total_species <- sum(taxa$Species)
total_species

## [1] 1263

Pareto Chart

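library(dplyr)    # %>%, arrange(), mutate()
library(ggplot2)  # ggplot() and geoms
library(scales)   # percent_format()

# Sort taxa by count and add cumulative count and cumulative share
taxa_pareto <- taxa %>%
  arrange(desc(Species)) %>%
  mutate(Taxon = factor(Taxon, levels = Taxon),
         CumCount = cumsum(Species),
         CumPerc = CumCount / sum(Species))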

ggplot(taxa_pareto, aes(x = Taxon, y = Species)) +
  geom_col(fill = "steelblue") +
  geom_line(aes(y = CumPerc * max(Species), group = 1), color = "red") +
  geom_point(aes(y = CumPerc * max(Species)), color = "red") +
  scale_y_continuous(
    name = "Number of Species",
    sec.axis = sec_axis(~ . / max(taxa_pareto$Species),
                        labels = percent_format(accuracy = 1),
                        name = "Cumulative Percent")
  ) +
  labs(title = "Pareto Chart of Endangered Species by Taxon",
       subtitle = paste("Total Species =", total_species)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Pie Chart

taxa %>%
  mutate(perc = Species / sum(Species)) %>%
  ggplot(aes(x = "", y = perc, fill = Taxon)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  labs(title = "Pie Chart of Endangered Species by Taxon", fill = "Taxon") +
  theme_void()

Q4. UTSA Football Scores

utsa_points <- c(28, 10, 7, 45, 20, 27, 38, 45, 44, 48, 51, 24, 44)
summary(utsa_points)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.00   24.00   38.00   33.15   45.00   51.00

Histogram

df_points <- data.frame(points = utsa_points)
ggplot(df_points, aes(x = points)) +
  geom_histogram(binwidth = 5, fill = "blue", color = "black") +
  labs(title = "Histogram of UTSA Points", x = "Points", y = "Frequency") +
  theme_minimal()

Boxplot

ggplot(df_points, aes(y = points, x = "")) +
  geom_boxplot(fill = "orange", width = 0.3) +
  labs(title = "Boxplot of UTSA Points", y = "Points") +
  theme_minimal() +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

Interpretation:
The mean (33.15) is lower than the median (38): a few low scores (7 and 10) pull the mean down, while most scores cluster at the high end (44–51).
This indicates the distribution is left-skewed, not symmetric.
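
As a rough numeric check of the shape, the sketch below compares the mean with the median and computes a moment-based sample skewness. The skewness formula used here is one common convention, chosen for illustration rather than taken from the assignment.

```r
utsa_points <- c(28, 10, 7, 45, 20, 27, 38, 45, 44, 48, 51, 24, 44)

# Mean minus median: a negative value means the mean sits below the median
mean(utsa_points) - median(utsa_points)

# Moment-based skewness: m3 / m2^(3/2), with m_k the k-th central moment
dev  <- utsa_points - mean(utsa_points)
skew <- mean(dev^3) / mean(dev^2)^(3 / 2)
skew  # negative values point to a longer left tail, positive to a longer right tail
```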

Q5. Normal Model for Watermelon Weights

Given \(X \sim \mathcal{N}(\mu=10, \sigma=2)\):

(a) \(P(X > 12)\)

1 - pnorm(12, mean = 10, sd = 2)

## [1] 0.1586553

(b) \(P(8 \leq X \leq 14)\)

pnorm(14, 10, 2) - pnorm(8, 10, 2)

## [1] 0.8185946

(c) \(P(X < 6)\)

pnorm(6, 10, 2)

## [1] 0.02275013
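
The same probabilities can be cross-checked by standardizing, \(Z = (X - \mu)/\sigma\), and using the standard normal CDF. The sketch below shows that equivalence; the helper `z()` and the names `p_a`, `p_b`, `p_c` are illustrative.

```r
mu    <- 10
sigma <- 2

# z-score of a cut point under the N(10, 2) model
z <- function(x) (x - mu) / sigma

p_a <- pnorm(z(12), lower.tail = FALSE)  # P(X > 12)       = P(Z > 1)
p_b <- pnorm(z(14)) - pnorm(z(8))        # P(8 <= X <= 14) = P(-1 <= Z <= 2)
p_c <- pnorm(z(6))                       # P(X < 6)        = P(Z < -2)

c(a = p_a, b = p_b, c = p_c)
```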

Q6. Set Operations

Let \(S=\{0,1,2,3,4,5,6,7,8,9\}\).
\(A=\{2,3,5,8\}\), \(B=\{1,3,5,7,9\}\), \(C=\{3,4,5,6\}\), \(D=\{0,1,5,8\}\).

(a) \(A \cup B = \{1,2,3,5,7,8,9\}\)
(b) \(A \cap D = \{5,8\}\)
(c) \((A^c \cap D) \cup B = \{0,1,3,5,7,9\}\)
(d) \((S \cap C)^c = \{0,1,2,7,8,9\}\)
(e) \(B \cap C \cap D^c = \{3\}\)
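
These results can be verified with base R's set functions. In this sketch the helper `comp()` (complement relative to \(S\)) and the `res_*` names are illustrative.

```r
S <- 0:9
A <- c(2, 3, 5, 8)
B <- c(1, 3, 5, 7, 9)
C <- c(3, 4, 5, 6)
D <- c(0, 1, 5, 8)

# Complement relative to the sample space S
comp <- function(x) setdiff(S, x)

res_a <- sort(union(A, B))                          # A union B
res_b <- sort(intersect(A, D))                      # A intersect D
res_c <- sort(union(intersect(comp(A), D), B))      # (A^c intersect D) union B
res_d <- sort(comp(intersect(S, C)))                # (S intersect C)^c
res_e <- sort(intersect(intersect(B, C), comp(D)))  # B intersect C intersect D^c
```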

Q7. Probability with Two Classes of 30 Students

Total students: 60, assigned to two classes of 30 (all assignments assumed equally likely). Five friends: J, S, M, K, C.

(a) All five in the same class

choose(55,25)/choose(60,30) + choose(55,30)/choose(60,30)

## [1] 0.05218555

(b) Exactly four together

(choose(5,4)*choose(55,26) + choose(5,1)*choose(55,29)) / choose(60,30)

## [1] 0.3010705

(c) J and S together, M/K/C together in the other class

(choose(55,28) + choose(55,27)) / choose(60,30)

## [1] 0.0646744
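
As a cross-check, the number of friends who land in the first class follows a hypergeometric distribution (5 friends, 55 other students, 30 seats filled without replacement), so the same answers can be obtained from `dhyper()`. The helper name `p_in_class1` is illustrative, and the sketch assumes all class assignments are equally likely.

```r
# P(exactly j of the five friends are in class 1)
p_in_class1 <- function(j) dhyper(j, m = 5, n = 55, k = 30)

# (a) All five together: 5 in class 1 or 0 in class 1
p_a <- p_in_class1(5) + p_in_class1(0)

# (b) Exactly four together: a 4-1 split in either direction
p_b <- p_in_class1(4) + p_in_class1(1)

# (c) A 2-3 split has choose(5, 2) = 10 equally likely pairs in class 1,
#     of which only {J, S} qualifies; likewise only {M, K, C} among the
#     choose(5, 3) = 10 possible triples in class 1
p_c <- p_in_class1(2) / choose(5, 2) + p_in_class1(3) / choose(5, 3)

c(a = p_a, b = p_b, c = p_c)
```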

Q8. Employment Transitions

Outcome                     Union   Nonunion
Same company                   28         24
New company (same field)       15         10
New field                       5         11
Unemployed (one year)           2          5
Total                          50         50

(a) \(P(\text{Union} \mid \text{New company, same field})\)

15 / (15 + 10)

## [1] 0.6

(b) \(P(\text{Unemployed one year} \mid \text{Nonunion})\)

5 / 50

## [1] 0.1
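
Both conditional probabilities can also be read off the full table with `prop.table()`. The sketch below enters the counts from the table above as a matrix; the object names are illustrative.

```r
counts <- matrix(
  c(28, 24,
    15, 10,
     5, 11,
     2,  5),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Outcome = c("Same company", "New company (same field)",
                "New field", "Unemployed (one year)"),
    Status  = c("Union", "Nonunion")
  )
)

# Condition on the outcome: each row sums to 1
by_outcome <- prop.table(counts, margin = 1)
by_outcome["New company (same field)", "Union"]

# Condition on union status: each column sums to 1
by_status <- prop.table(counts, margin = 2)
by_status["Unemployed (one year)", "Nonunion"]
```

Conditioning on a row versus a column changes the denominator, which is why part (a) divides by the row total 25 and part (b) by the column total 50.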