Basic mathematical operations:
#This is a chunk. This is where R code goes.
An "object" is like a jar or container. An object can be anything: a data set, a number, a function, a subset of data, and so on.
We can manipulate objects and put them to work for us.
Code: `four` is an "object" containing the number 4.
#This is a chunk
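The code for this chunk is not reproduced here. Below is a minimal sketch of what it might look like. It assumes the object is created with R's assignment operator `<-`, which stores the value on the right in the name on the left:

```r
# Create an object named `four` and store the number 4 in it
four <- 4

# Typing the object's name prints its contents
four

# Objects can be reused in later calculations
four + 1
```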
The object `four` holds the number 4.
#This is a chunk
We can, in turn, store this result in an object of its own. We can also apply functions to numerical objects.
#This is a chunk
The logic of functions: \(function\_name(argument)\)
\(\sqrt4\)
\(4^2\)
\(\sqrt{century}\)
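The three expressions above translate directly into R. This section never shows how `century` was created, so the sketch below assumes, purely for illustration, that it holds 100:

```r
# Hypothetical: assume `century` holds the number 100
century <- 100

sqrt(4)        # square root of 4: 2
4^2            # 4 squared: 16
sqrt(century)  # square root of the object `century`: 10
```

Here `sqrt` is the function name and `4` (or `century`) is its argument, following the `function_name(argument)` pattern.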
Connecting migrant deaths to temperature:
Daily temperatures recorded at a weather station in Tucson, AZ
Source: https://www.wunderground.com/history/monthly/us/az/tucson/KTUS/date/2022-4
Let’s look at the data
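library(tidyverse)  # loads readr (read_csv) and ggplot2 (ggplot)
# Keep the URL in its own object so `mdy` can hold the data itself
mdy_url <- "https://raw.githubusercontent.com/mightyjoemoon/POL51/main/monthlydeathsbyyear_wtemps.csv"
mdy <- read_csv(mdy_url)
summary(mdy)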
##      year           month            obs           yearmonth      remainsmonth
## Min.   :2020    Min.   : 1.00   Min.   :1       Min.   :202001  Min.   : 3.00
## 1st Qu.:2021    1st Qu.: 3.75   1st Qu.:1       1st Qu.:202104  1st Qu.:11.00
## Median :2022    Median : 6.50   Median :1       Median :202206  Median :14.00
## Mean   :2022    Mean   : 6.50   Mean   :1       Mean   :202206  Mean   :15.85
## 3rd Qu.:2023    3rd Qu.: 9.25   3rd Qu.:1       3rd Qu.:202309  3rd Qu.:18.25
## Max.   :2024    Max.   :12.00   Max.   :1       Max.   :202412  Max.   :44.00
## NA's   :239     NA's   :239     NA's   :239     NA's   :239     NA's   :239
##    avg_temp       number_dead       ...8            ...9            ...10
## Min.   :50.74   Min.   : 0.000  Mode:logical    Mode:logical    Mode:logical
## 1st Qu.:57.95   1st Qu.: 2.000  NA's:299        NA's:299        NA's:299
## Median :72.04   Median : 4.000
## Mean   :71.79   Mean   : 5.717
## 3rd Qu.:84.84   3rd Qu.: 8.000
## Max.   :93.93   Max.   :29.000
## NA's   :239     NA's   :239
##     ...11
## Mode:logical
## NA's:299
Describe the central tendency of average temperature using the mean and the median. Note the use of `$`, which picks out a single column (here `avg_temp`) from the data frame `mdy`.
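For the Tucson data, the calls would be `mean(mdy$avg_temp, na.rm = TRUE)` and `median(mdy$avg_temp, na.rm = TRUE)`. The `na.rm = TRUE` argument matters because `avg_temp` has 239 missing values, and a single `NA` makes `mean()` return `NA`. The self-contained sketch below uses a small, made-up data frame (hypothetical values, not the course data) to show both `$` and `na.rm` at work:

```r
# Hypothetical toy data frame (made-up values, not the Tucson data)
toy <- data.frame(
  month    = 1:5,
  avg_temp = c(55, 60, NA, 80, 90)
)

# `$` selects a single column from a data frame
toy$avg_temp

# Without na.rm = TRUE, one missing value makes the result NA
mean(toy$avg_temp)

# Drop missing values before computing central tendency
mean(toy$avg_temp, na.rm = TRUE)    # (55 + 60 + 80 + 90) / 4 = 71.25
median(toy$avg_temp, na.rm = TRUE)  # midpoint of 60 and 80 = 70
```

With the real data, these two numbers should match the `Mean` and `Median` rows for `avg_temp` in `summary(mdy)`.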
Plots often reveal patterns much more clearly than tables do. Measurement alert: how do we record migrant deaths? How should we handle this?
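# Scatterplot of monthly deaths against average monthly temperature
splot <- ggplot(mdy, aes(x = avg_temp, y = number_dead)) +
  geom_point(size = 2, shape = 19, color = "coral2") +
  # Flexible loess smoother (ggplot2's default for small data sets)
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, color = "black", linewidth = .5) +
  # Straight-line (linear regression) fit, dashed so it can be told apart
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black", linetype = "dashed", linewidth = .5) +
  labs(title = "Number of contemporaneous deaths by month and average monthly temperature, \n2020-2024",
       y = "Number of deaths", x = "Average temperature") +
  theme_classic()
splot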
A scatterplot connects an \(x\) variable to a \(y\) variable and gets us thinking about correlation.
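R's `cor()` puts a single number on the relationship a scatterplot shows. For the course data, the call would be `cor(mdy$avg_temp, mdy$number_dead, use = "complete.obs")`, where `use = "complete.obs"` drops rows with missing values. The sketch below uses made-up vectors (hypothetical values) so it runs on its own:

```r
# Hypothetical toy vectors (made-up values, not the Tucson data)
temp   <- c(50, 60, 70, 80, 90, NA)
deaths <- c(1, 3, 4, 8, 9, NA)

# Pearson correlation, using only rows where both values are present
r <- cor(temp, deaths, use = "complete.obs")
r
```

Pearson's correlation runs from -1 to 1. In a regression with a single predictor and an intercept, the "Multiple R-squared" that `summary()` reports equals the square of this correlation. Mean-centering the predictor does not change it.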
Another way to show the same relationship: a bubble plot
# Bubble plot: point size also encodes the number of deaths
splot <- ggplot(mdy, aes(x = avg_temp, y = number_dead)) +
  geom_point(aes(size = number_dead), color = "coral2") +
  labs(title = "Number of contemporaneous deaths by month and average monthly temperature, \n2020-2024",
       y = "Number of deaths", x = "Average temperature") +
  theme_classic() +
  theme(legend.position = "none")
splot
#A little trick: mean-centering
mean(mdy$avg_temp, na.rm=TRUE)
## [1] 71.79149
# Subtract the mean so that 0 marks an average-temperature month
mdy$mc_temp <- mdy$avg_temp - mean(mdy$avg_temp, na.rm = TRUE)
table(mdy$mc_temp)
##
## -21.0495485 -20.1656785 -19.5664885 -19.1611685
## 1 1 1 1
## -18.8664885 -18.6043885 -18.5237485 -18.1785885
## 1 1 1 1
## -17.6986285 -17.2052785 -16.4656785 -15.8785885
## 1 1 1 1
## -15.4570085 -14.5734885 -14.0579185 -13.7727785
## 1 1 1 1
## -13.3974885 -12.6882585 -12.2334285 -11.0495485
## 1 1 1 1
## -10.7301985 -9.7602385 -8.1052785 -7.4048185
## 1 1 1 1
## -6.4430985 -3.7248185 -1.4081585 -1.0366485
## 1 1 1 1
## -0.830198499999995 -0.585038499999996 1.0917415 1.1225115
## 1 1 1 1
## 4.29288150000001 4.5375415 6.35238150000001 6.5633515
## 1 1 1 1
## 6.68270149999999 8.0652815 8.9304515 10.0375415
## 1 1 1 1
## 10.5116315 12.0245115 12.8933615 12.9385115
## 1 1 1 1
## 12.9485115 13.3620815 14.1052815 14.5236615
## 1 1 1 1
## 15.2525115 16.1051815 16.5539615 17.0903315
## 1 1 1 1
## 17.5328815 17.6439415 17.9647615 18.9485115
## 1 1 1 1
## 19.0428015 19.6526315 19.7085115 22.1427215
## 1 1 1 1
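Mean-centering subtracts the mean from every value. A centered value of 0 means "exactly average," negative values are cooler-than-average months, and positive values are warmer-than-average ones. Because `mc_temp` is continuous, `table()` lists every distinct value once (every count is 1). Long decimals such as -0.830198499999995 are ordinary floating-point rounding residue. The sketch below runs on made-up temperatures (hypothetical values) and shows a more compact check: the centered variable should average to zero. For the course data, the same check would be `mean(mdy$mc_temp, na.rm = TRUE)`.

```r
# Hypothetical toy temperatures (made-up values, not the Tucson data)
temp <- c(52.3, 61.8, 74.0, 85.5, 91.2, NA)

# Subtract the mean (ignoring the missing value) from every observation;
# the NA stays NA
mc_temp <- temp - mean(temp, na.rm = TRUE)

# A compact look at a continuous variable
summary(mc_temp)

# The centered values average to zero, up to floating-point rounding
mean(mc_temp, na.rm = TRUE)

# Centering shifts the values but leaves their spread unchanged
all.equal(sd(mc_temp, na.rm = TRUE), sd(temp, na.rm = TRUE))
```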
#My first regression (maybe not)
myreg <- lm(number_dead ~ mc_temp, data=mdy)
summary(myreg)
##
## Call:
## lm(formula = number_dead ~ mc_temp, data = mdy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.4355 -2.6748 0.0672 1.7763 18.1509
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.71667 0.55609 10.28 0.000000000000011 ***
## mc_temp 0.27086 0.03983 6.80 0.000000006335887 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.307 on 58 degrees of freedom
## (239 observations deleted due to missingness)
## Multiple R-squared: 0.4436, Adjusted R-squared: 0.434
## F-statistic: 46.25 on 1 and 58 DF, p-value: 0.000000006336
Some of you probably just "ran" your first linear regression model.
\(y=mx + b\)
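In the regression output, the `(Intercept)` estimate (5.71667) plays the role of \(b\) and the `mc_temp` estimate (0.27086) plays the role of \(m\). Each additional degree above the average temperature is associated with about 0.27 more deaths per month. A month 10 degrees above average has a predicted \(5.71667 + 0.27086 \times 10 \approx 8.43\) deaths. Because the predictor is mean-centered, the intercept is the predicted count at an average-temperature month. That is why it matches the mean of `number_dead` (5.717) in `summary(mdy)`.

The sketch below fits a model to made-up data (hypothetical values, not the course data) and uses `coef()` and `predict()` to pull out \(b\) and \(m\). With the course model, the same calls would be `coef(myreg)` and `predict(myreg, newdata = data.frame(mc_temp = 10))`.

```r
# Hypothetical toy data (made-up values, not the Tucson data)
toy <- data.frame(
  temp   = c(50, 60, 70, 80, 90),
  deaths = c(1, 3, 4, 8, 9)
)

# Mean-center the predictor, as in the lecture
toy$mc_temp <- toy$temp - mean(toy$temp)

fit <- lm(deaths ~ mc_temp, data = toy)

b <- coef(fit)[["(Intercept)"]]  # intercept: predicted deaths at average temperature
m <- coef(fit)[["mc_temp"]]      # slope: change in deaths per one-degree increase

# With a centered predictor, the intercept equals the mean of the outcome
all.equal(b, mean(toy$deaths))

# Prediction 10 degrees above average: by hand (y = m*x + b) and via predict()
m * 10 + b
predict(fit, newdata = data.frame(mc_temp = 10))
```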
```