The dataset teengamb concerns a study of teenage gambling in Britain. Make a numerical and graphical summary of the data, commenting on any features that you find interesting. Limit the output you present to a quantity that a busy reader would find sufficient to get a basic understanding of the data.
library(ggplot2)
data("teengamb", package = "faraway")
teengamb$sex <- factor(teengamb$sex)
levels(teengamb$sex) <- c("male","female")
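The question also asks for a numerical summary. A minimal sketch of one is below, assuming the `faraway` package is installed; the choice of `summary()` and the mean/median comparison is an illustration, not necessarily the summary the exercise intends.

```r
# Assumes the 'faraway' package is installed.
data("teengamb", package = "faraway")

# Minimum, quartiles, median, mean and maximum for every variable
summary(teengamb)

# Compare mean and median of the response to see whether it is skewed
gamble_centre <- c(mean = mean(teengamb$gamble), median = median(teengamb$gamble))
gamble_centre
```

A mean well above the median would indicate a right-skewed response, which is worth checking against the histogram of `gamble`.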
# distribution of gambling
ggplot(teengamb, aes(x = gamble)) +
geom_histogram(binwidth = 10, fill = "blue", color = "black") +
labs(x = "Gamble Amount", y = "Frequency")
# sex
table(teengamb$sex)
##
## male female
## 28 19
ggplot(teengamb, aes(x = sex, y = gamble, fill = sex)) +
geom_boxplot() +
labs(title = "Gambling Amount by Sex", x = "Sex", y = "Gamble Amount")
# socioeconomic status (score based on parents' occupation)
ggplot(teengamb, aes(x = status, y = gamble)) +
geom_point() +
geom_smooth(method = "lm", col = "blue") +
labs(title = "Gambling Amount by Socioeconomic Status", x = "Status", y = "Gamble Amount")
## `geom_smooth()` using formula = 'y ~ x'
# individual income
ggplot(teengamb, aes(x = income, y = gamble)) +
geom_point() +
geom_smooth(method = "lm", col = "blue") +
labs(title = "Gambling Amount by Income", x = "Income", y = "Gamble Amount")
## `geom_smooth()` using formula = 'y ~ x'
# verbal ability
ggplot(teengamb, aes(x = verbal, y = gamble)) +
geom_point() +
geom_smooth(method = "lm", col = "blue") +
labs(title = "Gambling Amount by Verbal Ability", x = "Verbal Score", y = "Gamble Amount")
## `geom_smooth()` using formula = 'y ~ x'
Based on the analysis of the teengamb dataset, the following key insights were observed:
In conclusion, the plots suggest that gender and income are the variables most clearly associated with gambling expenditure, while verbal ability shows little visible relationship with it.
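The conclusion above rests on plots alone, so it may help to put rough numbers next to it. The following is a sketch only, assuming the `faraway` package is installed; the names `num_vars` and `gamble_cor` are introduced here for illustration, and Pearson correlation is just one possible descriptive measure.

```r
# Assumes the 'faraway' package is installed.
data("teengamb", package = "faraway")
teengamb$sex <- factor(teengamb$sex, levels = c(0, 1),
                       labels = c("male", "female"))

# Median and mean gambling expenditure within each sex
aggregate(gamble ~ sex, data = teengamb,
          FUN = function(x) c(median = median(x), mean = mean(x)))

# Pearson correlation of each numeric variable with gamble
num_vars <- c("status", "income", "verbal")
gamble_cor <- sapply(teengamb[num_vars], cor, y = teengamb$gamble)
round(gamble_cor, 2)
```

These are descriptive figures only. They do not test significance, and correlations can be sensitive to the few very large values of `gamble`.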
ggplot(teengamb, aes(x = income, y = gamble)) +
geom_point(aes(color = sex), size = 3, alpha = 0.6, position = position_jitter(width = 0.3)) +
geom_smooth(aes(group = 1), method = "lm", col = "blue", se = FALSE, linetype ="solid") +
labs(
title = "Gambling Amount by Income and Gender",
x = "Income",
y = "Gambling Amount",
color = "Gender"
) +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
The dataset prostate is from a study on 97 men with prostate cancer who were due to receive a radical prostatectomy. Make a numerical and graphical summary of the data as in the first question. Note: similarly, in the future we will perform regression by treating lpsa as the response, and various subsets of other variables as predictors.
data("prostate", package = "faraway")
# log(cancer volume)
ggplot(prostate, aes(x = lcavol, y = lpsa)) +
geom_point() +
geom_smooth(method = "lm", col = "blue") +
labs(x = "lcavol", y = "lpsa")
## `geom_smooth()` using formula = 'y ~ x'
# log(prostate weight)
ggplot(prostate, aes(x = lweight, y = lpsa)) +
geom_point() +
geom_smooth(method = "lm", col = "blue") +
labs(x = "lweight", y = "lpsa")
## `geom_smooth()` using formula = 'y ~ x'
# age
ggplot(prostate, aes(x = age, y = lpsa)) +
geom_point() +
geom_smooth(method = "lm", col = "blue") +
labs(x = "Age", y = "lpsa")
## `geom_smooth()` using formula = 'y ~ x'
# log(benign prostatic hyperplasia amount)
ggplot(prostate, aes(x = lbph, y = lpsa)) +
geom_point() +
geom_smooth(method = "lm", col = "blue") +
labs(x = "lbph", y = "lpsa")
## `geom_smooth()` using formula = 'y ~ x'
# seminal vesicle invasion
ggplot(prostate, aes(x = svi, y = lpsa)) +
geom_point() +
geom_smooth(method = "lm", col = "blue") +
labs(x = "svi", y = "lpsa")
## `geom_smooth()` using formula = 'y ~ x'
# log(capsular penetration)
ggplot(prostate, aes(x = lcp, y = lpsa)) +
geom_point() +
geom_smooth(method = "lm", col = "blue") +
labs(x = "lcp", y = "lpsa")
## `geom_smooth()` using formula = 'y ~ x'
# Gleason score
ggplot(prostate, aes(x = gleason, y = lpsa)) +
geom_point() +
geom_smooth(method = "lm", col = "blue") +
labs(x = "Gleason Score", y = "lpsa")
## `geom_smooth()` using formula = 'y ~ x'
# percentage Gleason scores 4 or 5
ggplot(prostate, aes(x = pgg45, y = lpsa)) +
geom_point() +
geom_smooth(method = "lm", col = "blue") +
labs(x = "pgg45", y = "lpsa")
## `geom_smooth()` using formula = 'y ~ x'
Based on the analysis of the prostate dataset, the following key insights were observed:
In conclusion, cancer volume (lcavol), prostate weight (lweight), seminal vesicle invasion (svi), and Gleason score all appear related to lpsa in the plots and are worth considering as predictors.
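As a numerical companion to the scatterplots, the correlations of each variable with lpsa can be ranked. This is a minimal sketch, assuming the `faraway` package is installed; `predictors` and `lpsa_cor` are names introduced here for illustration.

```r
# Assumes the 'faraway' package is installed.
data("prostate", package = "faraway")

# Every column except the response is treated as a candidate predictor
predictors <- setdiff(names(prostate), "lpsa")

# Pearson correlation of each candidate predictor with lpsa, largest first
lpsa_cor <- sapply(prostate[predictors], cor, y = prostate$lpsa)
round(sort(lpsa_cor, decreasing = TRUE), 2)

# svi and gleason take only a few distinct values, so counts are more
# informative for them than a fitted line through a scatterplot
table(prostate$svi)
table(prostate$gleason)
```

For the discrete variables, a boxplot of lpsa per level may be a clearer display than a scatterplot with a linear smooth.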
(Proposition 6.2 of Review of Matrix Algebra) Let \(A \in \mathbb{R}^{n \times n}\) be a real symmetric and idempotent matrix, and \(\{\lambda_1, \dots, \lambda_n\}\) its eigenvalues. Prove that: (a) \(\lambda_i\) is either 0 or 1 for all \(1 \le i \le n\); (b) \(\operatorname{tr}(A) = \operatorname{rank}(A)\); (c) \(\operatorname{rank}(A) + \operatorname{rank}(I - A) = n\).
knitr::include_graphics("HW1_3.jpeg")
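The three claims of the proposition can be sanity-checked numerically. The sketch below is not a proof and is not taken from the handwritten solution. It builds one symmetric idempotent matrix as an orthogonal projection onto the column space of a random matrix `X` (an illustrative choice) and checks (a) to (c) up to a floating-point tolerance `tol`.

```r
set.seed(1)
n <- 6
p <- 3

# Projection onto the column space of a random n x p matrix:
# A = X (X'X)^{-1} X' is symmetric and idempotent
X <- matrix(rnorm(n * p), nrow = n)
A <- X %*% solve(crossprod(X)) %*% t(X)

# Confirm the hypotheses of the proposition hold numerically
stopifnot(isTRUE(all.equal(A, t(A))), isTRUE(all.equal(A %*% A, A)))

tol <- 1e-8
ev <- eigen(A, symmetric = TRUE)$values
rank_A <- qr(A)$rank
rank_IA <- qr(diag(n) - A)$rank

# (a) every eigenvalue is numerically 0 or 1
all(abs(ev) < tol | abs(ev - 1) < tol)
# (b) trace equals rank
isTRUE(all.equal(sum(diag(A)), rank_A))
# (c) rank(A) + rank(I - A) = n
rank_A + rank_IA == n
```

Each of the last three lines should evaluate to `TRUE` for this example; a single example only illustrates the statement and does not replace the argument.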
knitr::include_graphics("HW1_4.jpeg")