If 50% of families subscribe to Disney+, 65% of families subscribe to Netflix, and 85% of families subscribe to at least one of the two, what percentage of the families subscribe to both Disney+ and Netflix?
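Let A = subscribes to Disney+ and B = subscribes to Netflix, so P(A) = 0.5, P(B) = 0.65, and P(A ∪ B) = 0.85. Since P(A ∪ B) = P(A) + P(B) - P(A ∩ B), we get P(A ∩ B) = 0.5 + 0.65 - 0.85 = 0.3.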
p_disney <- 0.50
p_netflix <- 0.65
p_union <- 0.85
# Final answer, expressed as a percentage
p_both <- p_disney + p_netflix - p_union
p_both * 100
## [1] 30
If two dice are rolled, what is the probability that the sum of the two numbers that appear will be even? Will be odd? What is the probability that the difference between the two numbers on the dice will be less than 3? Show your work.
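Total number of outcomes = 6 × 6 = 36. The sum is even when both dice are even or both are odd, so sum_even = (3 × 3) × 2 = 18 and sum_odd = 36 - 18 = 18. P(sum_even) = 18/36 = 1/2 and P(sum_odd) = 18/36 = 1/2. For the difference, count the outcomes where |d1 - d2| is 0, 1, or 2: P(diff < 3) = (6 + 5 × 2 + 4 × 2) / 36 = 24/36 = 2/3.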
dice <- expand.grid(d1 = 1:6, d2 = 1:6)
p_even <- mean((dice$d1 + dice$d2) %% 2 == 0)
p_odd <- mean((dice$d1 + dice$d2) %% 2 == 1)
p_diff <- mean(abs(dice$d1 - dice$d2) < 3)
c(p_even = p_even, p_odd = p_odd, p_diff_lt3 = p_diff)
## p_even p_odd p_diff_lt3
## 0.5000000 0.5000000 0.6666667
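The count 6 + 5 × 2 + 4 × 2 used for the difference can be checked by tabulating the absolute differences over all 36 outcomes. This is a small sketch that rebuilds the same grid; the name `diff_counts` is introduced here for illustration only.

```r
# All 36 equally likely outcomes of two dice
dice <- expand.grid(d1 = 1:6, d2 = 1:6)

# Number of outcomes for each absolute difference (0 through 5)
diff_counts <- table(abs(dice$d1 - dice$d2))
diff_counts

# Outcomes with a difference of 0, 1, or 2, as a share of all outcomes
p_diff_check <- sum(diff_counts[c("0", "1", "2")]) / nrow(dice)
p_diff_check
```

The table should show 6, 10, and 8 outcomes for differences of 0, 1, and 2, which matches the hand count.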
Three classes contain 20, 18, and 25 students, respectively, and no student is a member of more than one class. If a team is to be composed of one student from each of the three classes, in how many different ways can the members of the team be chosen?
(20!/(1! × 19!)) × (18!/(1! × 17!)) × (25!/(1! × 24!)) = 20 × 18 × 25 = 9000
ways <- choose(20, 1) * choose(18, 1) * choose(25, 1)
ways
## [1] 9000
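As a sanity check on the multiplication rule, the teams can also be listed explicitly and counted. This is only a sketch: students are represented by index numbers, and the names `teams`, `class1`, `class2`, and `class3` are made up for the example.

```r
# One row per possible team: one student index from each class
teams <- expand.grid(
  class1 = seq_len(20),
  class2 = seq_len(18),
  class3 = seq_len(25)
)

# Number of distinct teams
n_teams <- nrow(teams)
n_teams
```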
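Suppose that three runners from Team A and three runners from Team B participate in a race. If all six runners have equal ability and there are no ties, what is the probability that the three runners from Team A will finish first, second, and third and the three runners from Team B will finish fourth, fifth, and sixth?
There are 3! ways to order Team A in the first three places and 3! ways to order Team B in the last three, out of 6! equally likely finishing orders: 3! × 3! / 6! = 36/720 = 1/20.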
factorial(3) * factorial(3) / factorial(6)
## [1] 0.05
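An equivalent way to see the same result is to look only at which three finishing positions Team A occupies. Under the equal-ability assumption each of the choose(6, 3) position sets is equally likely, and exactly one of them is {1, 2, 3}. The sketch below enumerates them; `position_sets` and `favourable` are illustrative names.

```r
# Every set of three finishing positions Team A could occupy (one per column)
position_sets <- combn(6, 3)
n_sets <- ncol(position_sets)

# Count the sets equal to positions 1, 2, 3 (combn returns each set in increasing order)
favourable <- sum(apply(position_sets, 2, function(s) all(s == 1:3)))

p_sweep <- favourable / n_sets
p_sweep
```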
Suppose that a box contains one blue card and four red cards, which are labeled A,B,C, and D. Suppose also that two of these five cards are selected at random, without replacement.
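(a) If it is known that card A has been selected, what is the probability that both cards are red? The other card is equally likely to be any of the four remaining cards, three of which are red: P(both_red | A) = 3/4
(b) If it is known that at least one red card has been selected, what is the probability that both cards are red? Because there is only one blue card, every pair of cards contains at least one red card, so the condition excludes no outcomes: P(both_red | at_least_one_red) = (4!/(2! × 2!)) / (5!/(2! × 3!)) = 6/10 = 3/5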
# (a)
p_a <- 3/4
# (b)
total <- choose(5, 2)
both_red <- choose(4, 2)
at_least1_red <- total - choose(1, 2) # choose(1, 2) is 0: no pair can consist of two blue cards
p_b <- both_red / at_least1_red
p_a
## [1] 0.75
p_b
## [1] 0.6
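Both conditional probabilities can be verified by listing all ten unordered pairs of cards and counting directly. This is a sketch; the labels in `cards` and the helper vectors `both_red`, `has_A`, and `has_any_red` are introduced here for illustration.

```r
# The five cards: one blue card and four red cards labeled A to D
cards <- c("blue", "A", "B", "C", "D")

# All unordered pairs drawn without replacement (one pair per column)
pairs <- combn(cards, 2)

both_red    <- apply(pairs, 2, function(p) !("blue" %in% p))
has_A       <- apply(pairs, 2, function(p) "A" %in% p)
has_any_red <- apply(pairs, 2, function(p) any(p != "blue"))

# (a) P(both red | card A selected)
p_a_enum <- sum(both_red & has_A) / sum(has_A)

# (b) P(both red | at least one red card selected)
p_b_enum <- sum(both_red & has_any_red) / sum(has_any_red)

c(p_a = p_a_enum, p_b = p_b_enum)
```

Every one of the ten pairs satisfies the condition in (b), which is why that probability reduces to 6/10.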
In a certain city, 30% of the people are Conservatives, 50% are Liberals, and 20% are Independents. Records show that in a particular election, 65% of the Conservatives voted, 82% of the Liberals voted, and 50% of the Independents voted. If a person in the city is selected at random and it is learned that she did not vote in the last election, what is the probability that she is a Liberal?
Let C, L, and I denote the three groups and NV the event that the person did not vote.
$$ P(C) = 0.30, \quad P(L) = 0.50, \quad P(I) = 0.20 $$
$$ P(\text{NV} \mid C) = 1 - 0.65 = 0.35, \quad P(\text{NV} \mid L) = 1 - 0.82 = 0.18, \quad P(\text{NV} \mid I) = 1 - 0.50 = 0.50 $$
$$ P(\text{NV}) = 0.30(0.35) + 0.50(0.18) + 0.20(0.50) = 0.295 $$
$$ P(L \mid \text{NV}) = \frac{P(L)\,P(\text{NV}\mid L)}{P(\text{NV})} = \frac{0.50 \times 0.18}{0.295} = \frac{0.09}{0.295} \approx 0.305 $$
### Question 7
Attached with the assignment is the reproduction dataset from Elkjær
and Iversen (2023), named data_analysis_final.dta. Here are
some variables of interest:
| Variable name | Description |
|---|---|
| year | Year |
| country | Country |
| red_t1ls | Relative transfer rate to the top 1% |
| red_m20ls | Relative transfer rate to the middle 20% |
| red_b20ls | Relative transfer rate to the bottom 20% |
| pre_p99p100 | Pretax income share of the top 1% |
| pre_p40p60 | Pretax income share of the middle 20% |
| pre_p0p20 | Pretax income share of the bottom 20% |
| cum_left | Cumulative share of government-controlled parliamentary seats held by left parties since 1980 |
| educatt_minimal | Share of population attending no more than secondary education |
Create a new dataframe such that:
- it includes a variable named inequality_top;
- it includes a variable named inequality_bottom;
- it has no missing values on inequality_top and educatt_minimal.

library(haven)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
data <- read_dta("data_analysis_final.dta")
df <- data %>%
  select(
    year, country,
    red_t1ls, red_m20ls, red_b20ls,
    pre_p99p100, pre_p40p60, pre_p0p20,
    cum_left, educatt_minimal
  ) %>%
  rename(
    transfer_top1 = red_t1ls,
    transfer_mid20 = red_m20ls,
    transfer_bot20 = red_b20ls,
    pretax_top1 = pre_p99p100,
    pretax_mid20 = pre_p40p60,
    pretax_bot20 = pre_p0p20
  ) %>%
  mutate(
    inequality_top = pretax_top1 / pretax_mid20,
    inequality_bottom = pretax_mid20 / pretax_bot20
  ) %>%
  drop_na(inequality_top, educatt_minimal)
head(df)
## # A tibble: 6 × 12
## year country transfer_top1 transfer_mid20 transfer_bot20 pretax_top1
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 2004 Austria -36.5 12.7 47.6 0.112
## 2 2005 Austria -38.8 13.5 50.0 0.110
## 3 2006 Austria -35.5 13.9 50.6 0.130
## 4 2007 Austria -40.9 12.5 49.5 0.112
## 5 2008 Austria -38.0 12.9 50.5 0.120
## 6 2009 Austria -45.9 13.6 52.1 0.106
## # ℹ 6 more variables: pretax_mid20 <dbl>, pretax_bot20 <dbl>, cum_left <dbl>,
## # educatt_minimal <dbl>, inequality_top <dbl>, inequality_bottom <dbl>
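From the dataset you created in the previous question, compute the following:
- the average share of the population attending no more than secondary education (educatt_minimal) for France;
- the average share of the population attending no more than secondary education (educatt_minimal) for Finland;
- summary statistics of inequality_top for each country included in the dataset.

Make sure to save these results in an object named gdat.
(Hint: use the byvar argument in the
DAMisc::sumStats() function).

library(DAMisc)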
## Registered S3 method overwritten by 'broom':
## method from
## nobs.multinom clarkeTest
library(tidyr)
library(dplyr) # already attached above; reloading is harmless
# Mean of educatt_minimal for Finland and France
df_means <- df %>%
  dplyr::filter(country %in% c("Finland", "France")) %>%
  group_by(country) %>%
  summarize(mean_educatt = mean(educatt_minimal, na.rm = TRUE))
print(df_means)
## # A tibble: 2 × 2
## country mean_educatt
## <chr> <dbl>
## 1 Finland 23.3
## 2 France 30.8
gdat <- DAMisc::sumStats(data = df, vars = "inequality_top", byvar = "country")
print(gdat)
## # A tibble: 17 × 12
## variable country mean sd iqr min q25 q50 q75 max n
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 inequality_t… Austria 0.719 0.102 0.0399 0.520 0.713 0.718 0.753 0.918 13
## 2 inequality_t… Belgium 0.538 0.0385 0.0283 0.457 0.521 0.535 0.549 0.627 13
## 3 inequality_t… Denmark 0.687 0.0800 0.116 0.543 0.641 0.687 0.757 0.812 13
## 4 inequality_t… Finland 0.621 0.0755 0.141 0.518 0.559 0.602 0.700 0.755 13
## 5 inequality_t… France 0.623 0.0470 0.0601 0.541 0.594 0.620 0.654 0.704 14
## 6 inequality_t… Germany 0.758 0.145 0.259 0.526 0.639 0.703 0.898 0.943 25
## 7 inequality_t… Greece 0.696 0.152 0.259 0.459 0.575 0.740 0.834 0.889 13
## 8 inequality_t… Ireland 0.741 0.0984 0.159 0.583 0.657 0.745 0.816 0.886 14
## 9 inequality_t… Italy 0.490 0.0216 0.0227 0.459 0.480 0.484 0.502 0.545 13
## 10 inequality_t… Nether… 0.409 0.0275 0.0221 0.375 0.391 0.410 0.413 0.483 13
## 11 inequality_t… Norway 0.728 0.0898 0.108 0.571 0.689 0.749 0.797 0.844 13
## 12 inequality_t… Portug… 0.788 0.0488 0.0671 0.700 0.756 0.796 0.823 0.851 13
## 13 inequality_t… Spain 0.744 0.0328 0.0269 0.692 0.732 0.739 0.759 0.807 13
## 14 inequality_t… Sweden 0.611 0.0814 0.132 0.514 0.537 0.587 0.670 0.754 13
## 15 inequality_t… Switze… 0.735 0.0730 0.115 0.612 0.683 0.711 0.798 0.867 13
## 16 inequality_t… United… 0.968 0.0786 0.139 0.860 0.905 0.950 1.04 1.10 14
## 17 inequality_t… United… 1.22 0.295 0.483 0.701 1.01 1.25 1.49 1.64 38
## # ℹ 1 more variable: nNA <int>
Produce a bar plot representing the average top-end
inequality rate per country. Use the gdat object you
produced in the previous question.
Since the gdat object was created with DAMisc::sumStats(), it already has a mean column holding the average of inequality_top for each country, so it can be plotted directly.
library(ggplot2)
ggplot(gdat, aes(x = mean, y = country)) +
  geom_col() +
  labs(
    x = "Average inequality rate (top 1% / middle 20% pretax income share)",
    y = "Country",
    title = "Average Top-End Inequality Rate by Country"
  ) +
  theme_light()
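A bar plot like this is often easier to read when the bars are sorted by value rather than alphabetically. The sketch below shows one way to do that with reorder(), using a small toy data frame as a stand-in for gdat; the country labels and means in `toy` are hypothetical and not taken from the dataset.

```r
library(ggplot2)

# Toy stand-in for gdat: hypothetical values for illustration only
toy <- data.frame(
  country = c("A", "B", "C"),
  mean    = c(0.72, 0.41, 0.97)
)

# reorder() turns country into a factor whose levels are sorted by mean
toy$country <- reorder(toy$country, toy$mean)
levels(toy$country)

# Same plot structure as above, now with bars in order of their mean
p <- ggplot(toy, aes(x = mean, y = country)) +
  geom_col() +
  theme_light()
p
```

The same idea should carry over to the real plot by mapping y to reorder(country, mean) inside aes().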