Homework2_RazanAlmasood

Problem 1: Step 1

Figure 1. Average bus boardings by hour of day, separated by day of week and colored by month (Sep–Nov 2018). Each facet displays the mean number of boardings in 15-minute intervals from 6 AM to 10 PM, with lines for September, October, and November. On weekdays the hour of peak boardings is essentially constant, falling between 3 PM and 4 PM (hours 15-16) with only minor variation around the peak. This reflects the after-work and after-class rush hour. Saturday and Sunday are the exception because they are weekends. Average boardings on Mondays in September (orange line) are lower than on October and November Mondays, and also lower than on other weekdays in September. This could be due to the adjustment period students and faculty go through at the beginning of the academic year. Another reason could be Labor Day, which is observed on the first Monday of September. Boardings on Wednesdays, Thursdays, and Fridays in November drop, most likely due to the Thanksgiving holiday.

Problem 1: Step 2

Figure 2. Bus boardings per 15-minute interval vs. temperature (°F), faceted by hour of day from 6 AM to 10 PM. Each point shows the number of boardings in a given 15-minute interval, color-coded by type of day (weekday or weekend). Holding hour of day and weekend status constant, the plot shows no strong or consistent relationship between temperature and number of boardings. It seems that factors such as class schedules are stronger drivers of bus boarding activity.

Problem 2: Part (A)

Table 1. Top 10 Billboard songs by total weeks on the Top 100 chart (1958–mid-2021). Songs are ranked by total number of weeks on the chart, with the performer, title, and total number of weeks on the Top 100 listed for each song. (Apologies to Abel and Taylor Swift Version)
Performer	Song	Weeks on Chart
Imagine Dragons	Radioactive	87
AWOLNATION	Sail	79
Jason Mraz	I’m Yours	76
The Weeknd	Blinding Lights	76
LeAnn Rimes	How Do I Live	69
LMFAO Featuring Lauren Bennett & GoonRock	Party Rock Anthem	68
OneRepublic	Counting Stars	68
Adele	Rolling In The Deep	65
Jewel	Foolish Games/You Were Meant For Me	65
Carrie Underwood	Before He Cheats	64

Problem 2: Part (B)

Figure 3. Musical diversity on the Billboard Top 100, measured as the number of unique songs appearing on the chart each year (1959-2020). The graph shows a spike in the mid-1960s, which, based on my own research, could be explained by the hippies becoming a national phenomenon and by the rise of British boy bands such as The Beatles (sadly, the members of One Direction were not born yet). After that, the chart shows an almost constant decline, likely associated with the rise of superstars such as Michael Jackson in the 1980s (not complaining). The chart now features a much higher number of unique songs per year, possibly thanks to streaming, short attention spans, and TikTok making random artists famous for only a 10-second clip of their song.

Problem 2: Part (C)

Figure 4. Number of ten-week hits per artist, for all performers with at least 30 songs that remained on the Billboard Top 100 for ten or more weeks (1958-mid-2021). Each artist is represented by a bar showing their total count of ten-week hits. Elton John is literally off the charts, followed by Madonna and Kenny Chesney. (I’m sorry, I only know Madonna; I’m guessing the other two were way before my time. I’m just glad to see Taylor Swift is on the chart.)

Problem 3: Part (A)

Expected creatinine clearance for a 55‑year‑old: 113.7 mL/min

Scatter plot of creatinine clearance rate (mL/min) versus age (years), with fitted regression line. The negative slope shows creatinine clearance declines with age.

Regression Analysis

To analyze the relationship between age and creatinine clearance rate, I ran a simple linear regression using R. The regression predicts clearance rate (in mL/min) from age (in years) with the following estimated equation:

\[ \text{creatclear} = \text{intercept} + \text{slope} \times \text{age} \]

Using ordinary least squares (OLS), the estimated intercept is 147.8 mL/min, and the slope is -0.62 mL/min per year. This means for each additional year of age, the expected creatinine clearance rate decreases by about 0.62 mL/min. The regression equation is:

\[ \text{creatclear} = 147.8 - 0.62 \times \text{age} \]

These estimates were obtained using the lm() function in R, regressing creatclear on age using the creatinine dataset.
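As a quick arithmetic check of the Part (A) prediction, the fitted equation can be evaluated at age 55. This is a Python sketch rather than the R code used for the analysis; the function name `predict_creatclear` is hypothetical, and the coefficients are the rounded estimates reported above.

```python
# Rounded OLS estimates reported above (the original fit used lm() in R).
INTERCEPT = 147.8  # mL/min
SLOPE = -0.62      # mL/min per year of age


def predict_creatclear(age):
    """Predicted creatinine clearance (mL/min) from the fitted line."""
    return INTERCEPT + SLOPE * age


prediction_55 = predict_creatclear(55)
print(f"Expected clearance at age 55: {prediction_55:.1f} mL/min")
```

With the rounded coefficients this reproduces the 113.7 mL/min reported for a 55-year-old.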

Problem 3: Part (B)

Creatinine clearance changes by -0.62 mL/min per year of age (R coefficient for age: -0.6198159).

Regression Slope Interpretation:

The regression slope is -0.62 mL/min per year, which means that, on average, each additional year of age is associated with a decrease of 0.62 mL/min in expected creatinine clearance rate.

Problem 3: Part (C)

For each person, the residual is the difference between their actual creatinine clearance and the value predicted by the regression equation (residual = actual – predicted).

- 40-year-old: actual = 135 mL/min, predicted = 123 mL/min → residual = 12 mL/min

- 60-year-old: actual = 112 mL/min, predicted = 110.6 mL/min → residual = 1.4 mL/min

Conclusion: The 40-year-old is healthier relative to age than the 60-year-old, since their actual kidney function is further above the age-predicted value.
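The residual comparison can be reproduced with a few lines of Python. This is only a sketch: the helper names are hypothetical, and the coefficients are the rounded estimates from Part (A).

```python
def predicted(age, intercept=147.8, slope=-0.62):
    """Age-predicted clearance (mL/min) using the rounded Part (A) estimates."""
    return intercept + slope * age


def residual(age, actual):
    """Residual = actual - predicted."""
    return actual - predicted(age)


# (age in years, actual clearance in mL/min) for the two people compared above
patients = {"40-year-old": (40, 135), "60-year-old": (60, 112)}
residuals = {name: residual(age, actual) for name, (age, actual) in patients.items()}

for name, value in residuals.items():
    print(f"{name}: residual = {value:.1f} mL/min")

# The larger positive residual marks the person who is healthier for their age.
healthier = max(residuals, key=residuals.get)
print("Healthier relative to age:", healthier)
```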

Problem 4: Part (A)

What is the probability that at least one of the three cars you buy is a lemon?

\[ Pr(\text{Lemon}) = 1 - Pr(\text{No Lemon}) \]

\(Pr(\text{No Lemon})\) is the probability that all three selected cars are not lemons. This is:

\[ Pr(\text{No Lemon}) = \frac{\binom{20}{3}}{\binom{30}{3}} = \frac{1140}{4060} = 0.2808 \]

So,

\[ Pr(\text{Lemon}) = 1 - 0.2808 = 0.7192 \]

The probability of getting at least one lemon is 0.719.
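The same number can be checked in Python with `math.comb`, and cross-checked by multiplying the chances of drawing a non-lemon three times in a row without replacement. The constant names are illustrative; the counts (30 cars, of which 20 are not lemons) are the ones used in the formula above.

```python
from math import comb

TOTAL_CARS = 30  # cars to choose from
GOOD_CARS = 20   # cars that are not lemons
BOUGHT = 3       # cars purchased

# Counting argument: ways to pick 3 good cars / ways to pick any 3 cars.
p_no_lemon = comb(GOOD_CARS, BOUGHT) / comb(TOTAL_CARS, BOUGHT)

# Sequential argument: each purchase removes one good car from the lot.
p_no_lemon_sequential = (20 / 30) * (19 / 29) * (18 / 28)

p_at_least_one = 1 - p_no_lemon
print(f"P(no lemon) = {p_no_lemon:.4f}, P(at least one lemon) = {p_at_least_one:.4f}")
```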

Problem 4: Part (B)

(i) What is the probability that the sum is odd?

There are \(6 \times 6 = 36\) possible outcomes.

A sum is odd if one die shows even and the other odd.
- Odd numbers: 1, 3, 5 (3 possibilities per die)
- Even numbers: 2, 4, 6 (3 possibilities per die)

So:
- First die odd & second die even: \(3 \times 3 = 9\) outcomes
- First die even & second die odd: \(3 \times 3 = 9\) outcomes
- Total: \(9 + 9 = 18\) outcomes

So,

\[ Pr(\text{sum odd}) = \frac{18}{36} = 0.5 \]

Conclusion: The probability that the sum is odd is 0.5.

(ii) What is the probability that the sum is less than 7?

Let’s count all combinations with sum \(< 7\):

Sum = 2: (1,1)
Sum = 3: (1,2), (2,1)
Sum = 4: (1,3), (2,2), (3,1)
Sum = 5: (1,4), (2,3), (3,2), (4,1)
Sum = 6: (1,5), (2,4), (3,3), (4,2), (5,1)

Count: \(1 + 2 + 3 + 4 + 5 = 15\) outcomes

\[ Pr(\text{sum} < 7) = \frac{15}{36} \approx 0.417 \]

Conclusion: The probability that the sum is less than 7 is 0.417.

(iii) What is the probability that the sum is less than 7, given that it is odd?

We want: \(Pr(\text{sum} < 7 \mid \text{sum odd})\)

Odd sums less than 7: 3, 5
- Sum = 3: (1,2), (2,1) → 2 outcomes
- Sum = 5: (1,4), (2,3), (3,2), (4,1) → 4 outcomes
Sum = 7 is odd, but not less than 7, so not included.

Total “sum < 7 and odd”: \(2 + 4 = 6\) outcomes

From (i), total “sum odd” outcomes: 18

So,

\[ Pr(\text{sum} < 7 \mid \text{sum odd}) = \frac{6}{18} = 0.333 \]

Conclusion: The probability that the sum is less than 7, given it is odd, is 0.333.

(iv) Are these two events independent?

Event A: sum is odd.
Event B: sum is less than 7.

\(Pr(A) = 0.5\)
\(Pr(B) = 0.417\)
\(Pr(B \mid A) = 0.333\)

If independent, \(Pr(B \mid A) = Pr(B)\). But \(0.333 \neq 0.417\).

Conclusion:
These events are not independent because \(Pr(B \mid A) \neq Pr(B)\).
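All four answers in Part (B) can be verified by brute-force enumeration of the 36 outcomes. The sketch below uses exact fractions so no rounding is involved; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Exact probability of an event given as a predicate on an outcome."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))


def is_odd(outcome):
    return sum(outcome) % 2 == 1


def is_less_than_7(outcome):
    return sum(outcome) < 7


p_a = prob(is_odd)                                          # (i)
p_b = prob(is_less_than_7)                                  # (ii)
p_a_and_b = prob(lambda o: is_odd(o) and is_less_than_7(o))
p_b_given_a = p_a_and_b / p_a                               # (iii)
independent = p_a_and_b == p_a * p_b                        # (iv)

print(p_a, p_b, p_b_given_a, independent)
```

The independence check here uses the product rule \(Pr(A \cap B) = Pr(A) \cdot Pr(B)\), which is equivalent to comparing \(Pr(B \mid A)\) with \(Pr(B)\).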

Problem 4: Part (C)

Let \(RC\) = Random Clicker, \(TC\) = Truthful Clicker.
Let \(Y\) = “answered Yes”. Let \(p\) = fraction of Truthful Clickers who answered Yes.

By the rule of total probability,

\[ Pr(Y) = Pr(Y \mid RC) \cdot Pr(RC) + Pr(Y \mid TC) \cdot Pr(TC) \]

Plug in the known values:

\[ 0.65 = 0.5 \times 0.3 + p \times 0.7 \]
\[ 0.65 = 0.15 + 0.7p \]
\[ 0.5 = 0.7p \]
\[ p = \frac{0.5}{0.7} \approx 0.714 \]

Final answer: The fraction of Truthful Clickers who answered Yes is 0.714.
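The algebra amounts to rearranging the rule of total probability for \(p\). A small Python sketch of that rearrangement, with a hypothetical function name and the numbers from this problem:

```python
def truthful_yes_rate(p_yes, p_random, p_yes_given_random):
    """Solve Pr(Y) = Pr(Y|RC) Pr(RC) + p Pr(TC) for p = Pr(Y|TC)."""
    p_truthful = 1 - p_random
    return (p_yes - p_yes_given_random * p_random) / p_truthful


p = truthful_yes_rate(p_yes=0.65, p_random=0.3, p_yes_given_random=0.5)
print(f"Fraction of truthful clickers answering Yes: {p:.3f}")
```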

Problem 4: Part (D)

Given:
- Sensitivity (true positive rate): \(Pr(\text{Test}^+ \mid \text{Disease}) = 0.993\)
- Specificity (true negative rate): \(Pr(\text{Test}^- \mid \text{No Disease}) = 0.9999\)
- Prevalence (prior): \(Pr(\text{Disease}) = 0.000025\)
- \(Pr(\text{No Disease}) = 1 - 0.000025 = 0.999975\)

We want: \(Pr(\text{Disease} \mid \text{Test}^+)\)

By Bayes’ theorem: \[ Pr(\text{Disease} \mid \text{Test}^+) = \frac{Pr(\text{Test}^+ \mid \text{Disease}) \cdot Pr(\text{Disease})} {Pr(\text{Test}^+)} \]

Where \[ Pr(\text{Test}^+) = Pr(\text{Test}^+ \mid \text{Disease}) \cdot Pr(\text{Disease}) + Pr(\text{Test}^+ \mid \text{No Disease}) \cdot Pr(\text{No Disease}) \]

And \[ Pr(\text{Test}^+ \mid \text{No Disease}) = 1 - \text{Specificity} = 1 - 0.9999 = 0.0001 \]

Now plug in the numbers:

\[ Pr(\text{Test}^+) = 0.993 \times 0.000025 + 0.0001 \times 0.999975 \\ = 0.000024825 + 0.0000999975 \\ = 0.0001248225 \]

So,

\[ Pr(\text{Disease} \mid \text{Test}^+) = \frac{0.993 \times 0.000025}{0.0001248225} = \frac{0.000024825}{0.0001248225} = 0.199 \]

Final answer: If someone tests positive, the probability that they actually have the disease is 0.199 (about 19.9%).
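The calculation above follows a pattern that is easy to wrap in a small function. The sketch below is illustrative (the function name `posterior` is hypothetical) and simply re-does the arithmetic with the given sensitivity, specificity, and prevalence.

```python
def posterior(prior, sensitivity, specificity):
    """Pr(Disease | Test+) by Bayes' theorem."""
    true_positive = sensitivity * prior                # Pr(Test+ and Disease)
    false_positive = (1 - specificity) * (1 - prior)   # Pr(Test+ and No Disease)
    return true_positive / (true_positive + false_positive)


ppv = posterior(prior=0.000025, sensitivity=0.993, specificity=0.9999)
print(f"Pr(Disease | Test+) = {ppv:.3f}")
```

Even with a very accurate test, the low prevalence means most positives are false positives, which is why the posterior is only about 0.2.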

Problem 4: Part (E)

Given:

- \(Pr(A) = 0.05\): aircraft is present
- \(Pr(\bar{A}) = 1 - 0.05 = 0.95\): aircraft is not present
- \(Pr(R \mid A) = 0.99\): true positive (radar detects when aircraft present)
- \(Pr(R \mid \bar{A}) = 0.10\): false positive (radar registers aircraft when absent)

We want \(Pr(A \mid R)\): probability aircraft is present, given radar registers presence.

Bayes’ theorem:

\[ Pr(A \mid R) = \frac{Pr(R \mid A) \cdot Pr(A)}{Pr(R)} \]

where

\[ Pr(R) = Pr(R \mid A) \cdot Pr(A) + Pr(R \mid \bar{A}) \cdot Pr(\bar{A}) \]

Calculate each part:

\[ Pr(R \mid A) \cdot Pr(A) = 0.99 \times 0.05 = 0.0495 \\ Pr(R \mid \bar{A}) \cdot Pr(\bar{A}) = 0.10 \times 0.95 = 0.095 \\ Pr(R) = 0.0495 + 0.095 = 0.1445 \]

Plug into Bayes’ theorem:

\[ Pr(A \mid R) = \frac{0.0495}{0.1445} \approx 0.3426 \]

Final answer:
If the radar registers an aircraft presence, the probability that an aircraft is actually present is 0.343 (about 34.3%).
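Another way to see this result is with natural frequencies: imagine a fixed number of radar scans and count how many alarms are real. The sketch below assumes a hypothetical batch of 10,000 scans; only the three probabilities come from the problem.

```python
SCANS = 10_000  # hypothetical number of radar scans, chosen so counts are whole numbers

with_aircraft = SCANS * 5 // 100              # 5% of scans have an aircraft present
without_aircraft = SCANS - with_aircraft      # remaining scans have none

true_alarms = with_aircraft * 99 // 100       # radar registers 99% of real aircraft
false_alarms = without_aircraft * 10 // 100   # radar registers 10% of empty scans

# Among all alarms, the share that correspond to a real aircraft.
p_aircraft_given_alarm = true_alarms / (true_alarms + false_alarms)
print(f"{true_alarms} real alarms out of {true_alarms + false_alarms} total "
      f"-> {p_aircraft_given_alarm:.3f}")
```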

Problem 5: Part 1

Question

What are the probabilities of win, draw, and loss, using the Poisson model described by Spiegelhalter and Ng, for:
- Liverpool (home) vs Tottenham (away)
- Manchester City (home) vs Arsenal (away)

Approach

As in the Spiegelhalter and Ng model, we model the number of goals scored by each team as independent Poisson random variables. The expected goals for each team in a matchup, \(\lambda_{\text{Home}}\) and \(\lambda_{\text{Away}}\), are calculated from the teams’ attack strengths and defense weaknesses.

The probability of any final score (e.g., Home scores 2, Away scores 1) is: \[ P(\text{Home}=2, \text{Away}=1) = \frac{(\lambda_{\text{Home}})^2 e^{-\lambda_{\text{Home}}}}{2!} \cdot \frac{(\lambda_{\text{Away}})^1 e^{-\lambda_{\text{Away}}}}{1!} \] We then sum these probabilities over all scores to find the overall probabilities of a home win, draw, or away win.
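The summation over scorelines can be sketched in Python as follows. This is not the code used for the homework: the function names are hypothetical, and truncating the sum at 10 goals per team is an assumption (the omitted tail is negligible for these values of \(\lambda\)). The example call uses the rounded expected goals reported in this homework for Liverpool vs Tottenham.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return lam**k * exp(-lam) / factorial(k)


def match_probabilities(lam_home, lam_away, max_goals=10):
    """Sum joint probabilities of independent Poisson scores into win/draw/loss.

    max_goals truncates the (infinite) sum; it is an assumption of this sketch.
    """
    home_win = draw = away_win = 0.0
    for home_goals in range(max_goals + 1):
        for away_goals in range(max_goals + 1):
            p = poisson_pmf(home_goals, lam_home) * poisson_pmf(away_goals, lam_away)
            if home_goals > away_goals:
                home_win += p
            elif home_goals == away_goals:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


# Rounded expected goals for Liverpool (home) vs Tottenham (away).
home, draw, away = match_probabilities(2.23, 0.73)
print(f"home win {home:.3f}, draw {draw:.3f}, away win {away:.3f}")
```

Because the inputs are rounded and the sum is truncated, the output should agree with the reported probabilities only to about three decimal places.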

Table 2. 2018–19 EPL attack strengths and defense weaknesses, home (H) and away (A), for the first 10 teams in alphabetical order.
Team	Attack (H)	Defense (H)	Attack (A)	Defense (A)
Arsenal	1.409	0.672	1.303	1.174
Bournemouth	1.007	1.050	1.092	1.510
Brighton	0.638	1.176	0.672	1.074
Burnley	0.805	1.345	0.882	1.208
Cardiff City	0.705	1.597	0.546	1.040
Chelsea	1.309	0.504	1.008	0.906
Crystal Palace	0.638	0.966	1.345	1.007
Everton	1.007	0.882	1.008	0.839
Fulham	0.738	1.513	0.504	1.510
Huddersfield	0.336	1.303	0.504	1.510

Problem 5: Part 2

Table 3. Expected goals (λ) for selected EPL fixtures (2018–19).
Home Team	Away Team	λ (Home)	λ (Away)
Liverpool	Tottenham	2.23	0.73
Manchester City	Arsenal	3.52	0.82

Table 4. Win/draw/loss probabilities under the independent Poisson model.
Home Team	Away Team	P(Home Win)	P(Draw)	P(Away Win)
Liverpool	Tottenham	0.715	0.177	0.107
Manchester City	Arsenal	0.862	0.088	0.049

Conclusion

The Poisson model predicts Liverpool has a 71.5% chance to beat Tottenham at home, with a 17.7% chance of a draw and a 10.7% chance of a Tottenham win. Manchester City is even more heavily favored, with an 86.2% chance to beat Arsenal at home.
These results show how season performance data can be used to model match outcomes, though real-world outcomes may vary and bookmaker models are more sophisticated.

Poisson PMFs for Home and Away expected goals

Joint distribution of home vs. away goals (Liverpool vs Tottenham)

Poisson PMFs (0–6 goals) for every Premier League team’s home‑vs‑away attack strength