New names:
• `` -> `...1`
• `...1` -> `...2`
Rows: 142 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (4): ...1, ...2, Exercise, BMI
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
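The messages above come from `readr::read_csv()`. The file's first columns get repaired names (`...1`, `...2`), and readr prints the column types it guessed because none were supplied. Below is a minimal sketch that keeps only the two variables of interest and declares their types, which also stops the column-specification message. It assumes a CSV with an unnamed leading index column. The file written here is a small made-up stand-in saved to a temporary path, not the real data, whose path is not shown in this document.

```r
library(readr)

# Made-up stand-in for the real file: an unnamed index column plus the two
# variables of interest, written to a temporary path.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(",Exercise,BMI",
             "1,15.2,24.1",
             "2,30.5,22.7",
             "3,5.0,27.3"), csv_path)

# cols_only() keeps just the listed columns and skips all others; supplying
# col_types also means readr does not print the guessed column specification.
exercise_data <- read_csv(
  csv_path,
  col_types = cols_only(Exercise = col_double(), BMI = col_double())
)

exercise_data
```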
ANS: I predict that exercise time and BMI are negatively correlated. At relatively low BMI, however, I expect the regression line to flatten: people who exercise a lot do not keep getting thinner, because they tend to keep their weight within a healthy range.
cor(exercise_data$Exercise, exercise_data$BMI)
[1] -0.06447185
What does the output indicate?
ANS: A very weak negative correlation (r ≈ -0.06), which is almost no linear relationship at all.
ggplot(exercise_data, aes(x = Exercise, y = BMI)) +
  geom_point(alpha = 0.7) +
  labs(x = "Exercise", y = "BMI")
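ANS: The scatterplot shows no linear trend; instead, the points trace out a picture that looks like a dinosaur. So even though the correlation is close to zero, the data clearly have structure, which is why the data should be plotted rather than summarized by the correlation coefficient alone.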
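Why can a near-zero correlation sit next to an obvious pattern? Pearson's r only measures linear association. A small made-up example makes this concrete: here y is an exact function of x, yet r is essentially zero because the relationship is U-shaped rather than a straight line.

```r
# Made-up data: y is completely determined by x, but the shape is a U, not a line.
x <- seq(-3, 3, by = 0.1)
y <- x^2

# Pearson's r measures only *linear* association, so it is ~0 here.
r_linear <- cor(x, y)

# Correlating y with the right (curved) transformation of x gives a perfect fit.
r_squared_term <- cor(x^2, y)

c(pearson_r = round(r_linear, 3), r_with_x_squared = round(r_squared_term, 3))
```

The exercise/BMI data teach the same lesson: r ≈ -0.06, yet the plot has obvious structure.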
Question 2
library(causact)
WARNING: The 'r-causact' Conda environment does not exist. To use the 'dag_numpyro()' function, you need to set up the 'r-causact' environment. Run install_causact_deps() when ready to set up the 'r-causact' environment.
Attaching package: 'causact'
The following objects are masked from 'package:stats':
binomial, poisson
The following objects are masked from 'package:base':
beta, gamma
CPI2017: an integer on a scale of 0 to 100; the lower the value, the more corrupt a country's public sector is perceived to be.
HDI2017: a composite measure of a nation's level of development that combines several dimensions: health (life expectancy), education, and standard of living (income).
Question 3
ggplot(corruptDF, aes(x = HDI2017, y = CPI2017)) +
  geom_point(alpha = 0.7) +
  labs(title = "HDI vs CPI (2017)",
       x = "Human Development Index (2017)",
       y = "Corruption Perceptions Index (2017)")
Describe the relationship that you see.
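ANS: There is a strong positive relationship between HDI2017 and CPI2017: countries with higher human development tend to have higher CPI scores, i.e., less perceived corruption.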
Question 4
ggplot(corruptDF, aes(x = HDI2017, y = CPI2017)) +
  geom_point(alpha = 0.7) +
  # straight line from linear regression, with a confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  # smooth GAM curve; `linewidth` replaces the deprecated `size` for lines
  geom_smooth(method = "gam", formula = y ~ s(x, k = 5), se = FALSE, linewidth = 1) +
  labs(title = "HDI vs CPI (2017)",
       x = "Human Development Index (2017)",
       y = "Corruption Perceptions Index (2017)")
What are the differences?
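ANS: method = "lm" fits a single straight line (linear regression), while method = "gam" fits a smooth curve (a generalized additive model) that can bend to follow changes in the trend.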
Which one do you prefer?
ANS: I prefer the GAM, because its flatter sections capture the range where the relationship is relatively weak, which a single straight line cannot show.
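Under the hood, `geom_smooth(method = "lm")` fits `stats::lm()`, and `method = "gam"` fits `mgcv::gam()` with the supplied formula. Below is a minimal sketch that fits both models directly and compares them by AIC. It uses simulated data (made up here, flat at low x and rising at high x, not the real CPI/HDI values) and assumes the `mgcv` package, which ships with standard R installations. REML is used for the smoothing parameter; as far as I know, ggplot2 also uses REML by default for `gam`.

```r
library(mgcv)

# Simulated data (not the real CPI/HDI data): flat at low x, rising at high x.
set.seed(42)
x <- runif(150, 0.35, 0.95)
y <- 30 + 120 * pmax(x - 0.65, 0) + rnorm(150, sd = 6)
sim <- data.frame(x = x, y = y)

# What geom_smooth(method = "lm") draws: one straight line.
fit_lm <- lm(y ~ x, data = sim)

# What geom_smooth(method = "gam", formula = y ~ s(x, k = 5)) draws:
# a penalized regression spline with basis dimension k = 5.
fit_gam <- gam(y ~ s(x, k = 5), data = sim, method = "REML")

# Lower AIC indicates a better trade-off between fit and complexity.
AIC(fit_lm, fit_gam)
```

When the trend changes slope, as in this simulated example, the GAM should achieve the lower AIC. With a genuinely linear trend, the two fits would be close.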
Question 5
ggplot(corruptDF, aes(x = HDI2017, y = CPI2017, color = region)) +
  geom_point(alpha = 0.6) +
  geom_smooth(aes(fill = region), method = "gam", formula = y ~ s(x, k = 5),
              se = TRUE, alpha = 0.2, linewidth = 1) +
  labs(title = "HDI vs CPI (2017) by Region", x = "HDI2017", y = "CPI2017")
What do you see?
ANS: Points and GAM curves colored by region, with all the curves and their shaded confidence bands overlapping one another.
Are patterns clear or is the graph too cluttered?
ANS: Too cluttered: the GAM curves and bands overlap, so the pattern for any single region is hard to make out.
What would be another way to get these trends by region but in a way that would be more legible?
ANS: Faceting. Each region gets its own panel, and because the panels share the same axes, it is easy to compare the shape and strength of the trends across regions.
ggplot(corruptDF, aes(x = HDI2017, y = CPI2017)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "gam", formula = y ~ s(x, k = 5), se = FALSE, linewidth = 1) +
  # one panel per region, all sharing the same axes
  facet_wrap(~ region, ncol = 3, scales = "fixed") +
  labs(title = "HDI vs CPI (2017) by Region", x = "HDI2017", y = "CPI2017")
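Alongside the faceted plot, a small table of per-region correlations gives a compact numeric check of how strong each region's trend is. This is a minimal sketch using a hypothetical helper `cor_by_group()`. The data frame below is made up for illustration, and its region labels and values are invented.

```r
# Hypothetical helper: Pearson correlation between columns x and y within each group.
cor_by_group <- function(df, x, y, group) {
  pieces <- split(df, df[[group]])
  sapply(pieces, function(d) cor(d[[x]], d[[y]], use = "complete.obs"))
}

# Made-up example data: two groups with opposite linear trends.
toy <- data.frame(
  region = rep(c("A", "B"), each = 4),
  hdi    = c(0.5, 0.6, 0.7, 0.8, 0.5, 0.6, 0.7, 0.8),
  cpi    = c(20, 30, 40, 50, 50, 40, 30, 20)
)

round(cor_by_group(toy, "hdi", "cpi", "region"), 2)
```

With the real data, the call would be `cor_by_group(corruptDF, "HDI2017", "CPI2017", "region")`. Keep in mind that r only summarizes the linear part of each trend, so it complements the faceted GAM curves rather than replacing them.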
Question 6
ggplot(corruptDF, aes(x = HDI2017, y = CPI2017)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "gam", formula = y ~ s(x, k = 5), se = FALSE, linewidth = 1) +
  scale_x_reverse() +  # flip the x-axis so higher HDI appears on the left
  facet_wrap(~ region, ncol = 3) +
  labs(title = "HDI vs CPI (2017): Faceted, X-axis Reversed",
       x = "HDI2017 (reversed)", y = "CPI2017")
Question 7
final_plot <- ggplot(corruptDF, aes(x = HDI2017, y = CPI2017)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "gam", formula = y ~ s(x, k = 5), se = FALSE, linewidth = 1) +
  facet_wrap(~ region, ncol = 3, scales = "fixed") +
  labs(title = "Human Development and Corruption Perception (2017)",
       subtitle = "Trends by Region",
       x = "Human Development Index",
       y = "Corruption Perceptions Index",
       caption = "Sources:\nTransparency International CPI 2017 (CC BY-ND 4.0)\nUNDP HDI (accessed Oct 1, 2018)\nWorld Bank population data (accessed Oct 1, 2018).") +
  # left-align the caption and place it relative to the whole plot, not just the panels
  theme(plot.caption = element_text(hjust = 0),
        plot.caption.position = "plot")

final_plot