Problem 1: Playlists Revisited

Part A: Daft Punk Listeners by David Bowie Status

Table 1.1: P(Daft Punk | David Bowie)
Daft Punk    David Bowie = 0    David Bowie = 1
0 0.925 0.912
1 0.075 0.088

Table 1.1 shows the conditional probability of streaming Daft Punk given whether the user also streams David Bowie. Users who play David Bowie have an 8.8% chance of also streaming Daft Punk, while those who do not play Bowie have a 7.5% chance. This suggests a slight positive association between streaming Bowie and streaming Daft Punk.
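
A conditional probability like the ones in Table 1.1 can be estimated by restricting to one David Bowie group and taking the share of that group that streams Daft Punk. The sketch below is only an illustration: it uses a tiny made-up list of users (not the playlist data), and the function name `conditional_probability` and the field names are hypothetical.

```python
def conditional_probability(rows, target, given, given_value):
    """Estimate P(target = 1 | given = given_value) from 0/1 indicator rows."""
    # Keep only the users in the conditioning group.
    group = [r for r in rows if r[given] == given_value]
    if not group:
        raise ValueError("no rows match the conditioning value")
    # Share of that group that also streams the target artist.
    return sum(r[target] for r in group) / len(group)


# Made-up illustration data, NOT the assignment's playlist data.
users = [
    {"bowie": 1, "daft_punk": 1},
    {"bowie": 1, "daft_punk": 0},
    {"bowie": 1, "daft_punk": 0},
    {"bowie": 1, "daft_punk": 0},
    {"bowie": 0, "daft_punk": 1},
    {"bowie": 0, "daft_punk": 0},
    {"bowie": 0, "daft_punk": 0},
    {"bowie": 0, "daft_punk": 0},
    {"bowie": 0, "daft_punk": 0},
]

p_dp_given_bowie = conditional_probability(users, "daft_punk", "bowie", 1)
p_dp_given_no_bowie = conditional_probability(users, "daft_punk", "bowie", 0)
print(f"P(Daft Punk | Bowie)    = {p_dp_given_bowie:.3f}")
print(f"P(Daft Punk | no Bowie) = {p_dp_given_no_bowie:.3f}")
```

Comparing the two printed values is the same comparison made between the two columns of the table.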

Part B: Johnny Cash Listeners by Pink Floyd Status

Table 1.2: P(Johnny Cash | Pink Floyd) vs. P(Not Johnny Cash | Pink Floyd)
Johnny Cash    Pink Floyd = 0    Pink Floyd = 1
0 0.945 0.895
1 0.055 0.105

Interpretation:

Among users who do not stream Pink Floyd, only 5.5% stream Johnny Cash, compared with 10.5% among users who do stream Pink Floyd. The increase from 5.5% to 10.5% is small in absolute terms (5 percentage points), but in relative terms it is about a 91% increase in the likelihood of streaming Johnny Cash. The two events appear to be dependent.
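
The absolute and relative changes can be checked directly from the two conditional probabilities in Table 1.2. This is a small arithmetic sketch; the variable names are chosen here for illustration.

```python
# Conditional probabilities read from Table 1.2.
p_cash_given_no_floyd = 0.055
p_cash_given_floyd = 0.105

# Absolute change is a difference of probabilities (percentage points).
absolute_change = p_cash_given_floyd - p_cash_given_no_floyd
# Relative change divides that difference by the baseline probability.
relative_change = absolute_change / p_cash_given_no_floyd

print(f"absolute change: {absolute_change * 100:.1f} percentage points")
print(f"relative change: {relative_change * 100:.0f}%")
```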

Problem 2: Super Bowl ads

Part A: Danger in Ads by Humor

Table 2.A: Probability of Danger in Super Bowl Ads, Overall and by Humor
Condition Probability
Overall 0.30
Funny = TRUE 0.39
Funny = FALSE 0.12

Interpretation:

The overall probability that an ad contains danger is about 30%. Among funny ads the probability is 39%, while among ads that are not funny it is 12%. Funny ads are therefore more likely to contain danger, so the two attributes appear dependent.

Part B: Animals in Ads by Sexual Appeal

Table 2.B: Probability of Animal Content, Overall and by Sexual Appeal
Condition Probability
Overall 0.37
Use_Sex = TRUE 0.38
Use_Sex = FALSE 0.37

Interpretation:

Overall, 37% of ads feature animals. The probability that an ad using sex appeal contains animals is 38%, while for ads that do not use sex appeal it is 37%. Because the conditional probabilities are nearly identical to the overall rate, the two events appear approximately independent.

Part C: Celebrity in Ads by Patriotism

Table 2.C: Probability of Celebrity Appearances, Overall and by Patriotism
Condition Probability
Overall 0.29
Patriotic = TRUE 0.29
Patriotic = FALSE 0.29

Interpretation:

About 29% of ads feature a celebrity. The rate is 29% among patriotic ads and 29% among non-patriotic ads, so these events appear independent.
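
The comparisons in Parts A through C all follow the same pattern: if two events are independent, the conditional probability should match in both groups (and equal the overall rate). The sketch below applies that eyeball check to the rounded values in Tables 2.A, 2.B, and 2.C. The 0.02 tolerance is an arbitrary cutoff assumed for this illustration, not a formal statistical test, and the helper name `looks_independent` is hypothetical.

```python
# Rounded probabilities copied from Tables 2.A, 2.B, and 2.C:
# (overall, given condition TRUE, given condition FALSE).
tables = {
    "danger by funny": (0.30, 0.39, 0.12),
    "animals by use_sex": (0.37, 0.38, 0.37),
    "celebrity by patriotic": (0.29, 0.29, 0.29),
}

TOLERANCE = 0.02  # arbitrary cutoff assumed for this illustration


def looks_independent(p_true, p_false, tol=TOLERANCE):
    """Return True when the two conditional probabilities are within tol."""
    return abs(p_true - p_false) <= tol


for name, (overall, p_true, p_false) in tables.items():
    gap = p_true - p_false
    verdict = "roughly independent" if looks_independent(p_true, p_false) else "dependent"
    print(f"{name}: overall = {overall:.2f}, gap = {gap:+.2f} -> {verdict}")
```

A formal conclusion would need the underlying counts (for example, for a chi-squared test), which the tables above do not report.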

Problem 3:

Part A: Distribution of Evaluation Scores

Histogram of course evaluation scores (binwidth = 0.25).

Interpretation:

Evaluation scores mostly fall between 3.0 and 5.0, with a peak around 4 and a skew to the right. Most ratings in the sample are high.

Part B: Evaluation by Native English Speaker

Boxplots of evaluation scores for native vs. non‑native English speakers.

Interpretation:

These boxplots compare evaluation scores for native and non-native English-speaking instructors. Native speakers have a median above 4.0, versus about 3.7 for non-native speakers. The interquartile ranges overlap, and the native group shows low outliers.

Part C: Distribution by Gender

Histograms of evaluation scores by gender, stacked vertically.

Interpretation:

The two histograms have almost identical shapes: scores for female instructors peak around 4, and scores for male instructors peak between 4 and 4.5.

Part D: Beauty vs. Evaluation Scatter

Scatterplot of beauty (centered) vs. evaluation score with linear fit.

Interpretation:

The linear fit has a slope of roughly 0.1, which means a one-point increase in beauty is associated with a 0.1-point rise in evaluation score. This is a very small positive relationship.
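
To make the slope interpretation concrete, the sketch below fits an ordinary least-squares line by hand. The five points are made up for illustration and constructed to lie on a line with slope 0.1; they are not the course-evaluation data, and the function name `least_squares_line` is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points on the line y = 4.0 + 0.1 * x (NOT the real data).
beauty = [-1.0, -0.5, 0.0, 0.5, 1.0]
evaluation = [3.90, 3.95, 4.00, 4.05, 4.10]

slope, intercept = least_squares_line(beauty, evaluation)
# Predicted change in evaluation for a one-point increase in beauty.
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

With a slope this small, even a two-point difference in beauty corresponds to only about 0.2 points of predicted evaluation score.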

Problem 4: Summary Statistics

Table 4.1: Summary statistics for SAT Verbal, SAT Quantitative, and GPA.
Variable Mean SD IQR P5 P25 Median P75 P95
SAT Verbal 595.049 83.768 110.000 460.000 540.000 590.000 650.000 730.000
SAT Quantitative 619.979 83.082 120.000 480.000 560.000 620.000 680.000 760.000
GPA 3.212 0.480 0.723 2.361 2.872 3.252 3.595 3.921

Interpretation:

SAT Quantitative has a mean of 619.98 and a median of 620, so its distribution is almost perfectly symmetric. SAT Verbal has a mean of 595.05 and a median of 590, which suggests a slight right skew. Both SAT sections have a standard deviation of approximately 83, but Quantitative has a larger interquartile range (120 vs. 110), showing a somewhat wider mid-spread. For both sections the 5th and 95th percentiles lie roughly equally far from the median (130 below and 140 above for Verbal, 140 on each side for Quantitative), so neither distribution is heavily skewed. GPA clusters around 3.2 with a standard deviation of 0.48. Its mean (3.212) is slightly below its median (3.252), and its 5th percentile lies farther from the median than its 95th percentile does, indicating a longer left tail of lower GPAs.
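
The skew statements above rest on two simple comparisons: mean versus median, and the distance from the median to the 5th and 95th percentiles. The sketch below recomputes both from the values in Table 4.1; the helper name `tail_summary` is hypothetical.

```python
# Values copied from Table 4.1.
summary = {
    "SAT Verbal": {"mean": 595.049, "p5": 460.0, "median": 590.0, "p95": 730.0},
    "SAT Quantitative": {"mean": 619.979, "p5": 480.0, "median": 620.0, "p95": 760.0},
    "GPA": {"mean": 3.212, "p5": 2.361, "median": 3.252, "p95": 3.921},
}


def tail_summary(stats):
    """Compare mean to median and measure both tails from the median."""
    return {
        "mean_minus_median": stats["mean"] - stats["median"],
        "lower_tail": stats["median"] - stats["p5"],
        "upper_tail": stats["p95"] - stats["median"],
    }


for name, stats in summary.items():
    t = tail_summary(stats)
    print(
        f"{name}: mean - median = {t['mean_minus_median']:+.3f}, "
        f"lower tail = {t['lower_tail']:.3f}, upper tail = {t['upper_tail']:.3f}"
    )
```

A positive mean-minus-median gap or a longer upper tail hints at right skew, and the reverse hints at left skew. These are rough indicators only, since percentiles alone do not determine the full shape.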

Problem 5: Bike Share Data Analysis

Part A: Average Hourly Rentals (All Days)

Average hourly bike rentals across all days.

Interpretation:

Bike usage drops off around midnight and peaks at the beginning of working hours, around 8 AM. A second peak occurs around 5 PM, at the end of the workday. This pattern reflects work-commute behavior.
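
The quantity plotted here is the mean number of rentals for each hour of the day, averaged over all days. A minimal sketch of that grouping step is shown below, using a handful of made-up (hour, rentals) records rather than the bike-share data; the function name `mean_rentals_by_hour` is hypothetical.

```python
from collections import defaultdict


def mean_rentals_by_hour(records):
    """Average rentals per hour of day from (hour, rentals) pairs."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for hour, rentals in records:
        totals[hour] += rentals
        counts[hour] += 1
    # One mean per hour, ordered by hour of day.
    return {hour: totals[hour] / counts[hour] for hour in sorted(totals)}


# Made-up records for illustration only (NOT the bike-share data).
records = [(0, 10), (0, 14), (8, 300), (8, 340), (17, 400), (17, 420)]

hourly = mean_rentals_by_hour(records)
peak_hour = max(hourly, key=hourly.get)
print(hourly)
print(f"peak hour in this toy example: {peak_hour}")
```

The same grouping, with working-day status added as a second key, would give the split shown in Part B.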

Part B: Rentals by Working Day Status

Hourly rentals on working vs. non‑working days.

Interpretation:

Working days have higher overall rentals, with the same two peaks seen in the previous graph around 8 AM and 5 PM. Non-working days are generally flatter, with a slight peak around noon that suggests leisure riding.

Part C: 9AM Rentals by Weather and Day Type

9 AM rentals by weather situation, split by working vs. non‑working days.

Interpretation:

On working days, 9 AM rentals are almost the same in clear and misty weather but drop substantially in light drizzle. Non-working days are much more responsive to weather: rentals are highest on clear days, lower on misty days, and lower still on light-drizzle days.