library(haven)
setwd("/Users/isaiahmireles/Desktop")
dat <- read_dta("SamplingPrac/usfacts.dta")
dat2 <- read_dta("SamplingPrac/usfacts.dta")
# class(dat) # understand obj type
length(dat$state)
## [1] 51
Now I may be dumb, but last I checked there are 50…
dat
print("found it : ")
## [1] "found it : "
dat[dat$state=="District of Columbia",]
# which(dat$state=="District of Columbia")
Alright I found it.
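# drop the District of Columbia row (row 9) by name rather than by position
dat <- dat[dat$state != "District of Columbia", ]
dat2 <- dat2[dat2$state != "District of Columbia", ]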
set.seed(22)
s2 <- sample(dat2$state, 5, replace = TRUE)
s2
## [1] "Pennsylvania" "Florida" "Mississippi" "Georgia" "New Jersey"
# %in% keeps each matched state once, so a repeated draw leaves fewer than 5 rows
dat2[dat2$state%in%s2,]
Sampling with replacement means the same state can be drawn twice, which is not good:
when that happens, the subset holds fewer than 5 distinct states despite sampling 5.
Each state has a \(\frac{1}{N}\) chance of being chosen on each draw.
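A minimal sketch of how often sampling with replacement produces a repeat, assuming \(N=50\) states and \(n=5\) draws. It uses base R's built-in `state.name` vector as a stand-in for `dat2$state` (an assumption for illustration, so its draws are not expected to match the output above).

```r
# Stand-in for dat2$state: base R's built-in vector of the 50 state names
states <- state.name
N <- length(states)   # 50
n <- 5

# Probability that n draws with replacement are all distinct:
# (N/N) * ((N-1)/N) * ... * ((N-n+1)/N)
p_distinct <- prod((N - 0:(n - 1)) / N)
p_repeat <- 1 - p_distinct
round(c(p_distinct = p_distinct, p_repeat = p_repeat), 3)
# p_distinct is about 0.814, p_repeat about 0.186

# Check a single draw for repeats
set.seed(1)  # arbitrary seed, only for reproducibility
draws <- sample(states, n, replace = TRUE)
n_distinct_states <- length(unique(draws))
n_distinct_states  # below n whenever a state was drawn more than once
```

So under these assumptions, roughly one in five samples of size 5 would contain at least one repeated state.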
Here we use sample() without replacement, where each state has an equal
probability of being included; in the scope
of the class:
\[ \pi_i=\frac{n}{N} \]
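As a quick check of the inclusion probability, here is a small simulation sketch. It assumes \(N=50\) and \(n=5\), so \(\pi_i = 5/50 = 0.1\); the seed, the number of replicates, and the choice of unit 1 are arbitrary choices for illustration.

```r
set.seed(1)     # arbitrary seed, only for reproducibility
N <- 50         # population size (states after dropping DC)
n <- 5          # sample size
reps <- 20000   # number of simulated samples

# For each replicate, draw n of the N indices without replacement
# and record whether unit 1 ended up in the sample
hits <- replicate(reps, 1 %in% sample(N, n))

pi_hat <- mean(hits)
pi_hat   # should land close to n / N = 0.1
```

Any other unit index could be used in place of 1, since every unit has the same inclusion probability under this design.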
set.seed(123)
s <- sample(dat$state, 5)
s
## [1] "New Mexico" "Iowa" "Indiana" "Arizona" "Tennessee"
# subsetted df
dat <- dat[dat$state %in% s,]
Nothing makes sense without pictures :
library(tidyverse)
us_map <- map_data("state")
dat <-
  dat |>
  select(state, home, area, density) |>
  mutate(state = tolower(state))  # map_data() region names are lowercase
map_df <-
  us_map |>
  right_join(dat, by = c("region" = "state"))
map_df <- map_df |> select(-subregion)  # subregion is not needed
library(tidyr)
library(ggplot2)
map_long <- map_df %>%
  pivot_longer(
    cols = c(density, area, home),
    names_to = "variable",
    values_to = "value"
  )
library(patchwork) # might combine plots
p1 <- ggplot(map_df, aes(long, lat, group = group, fill = density)) +
  geom_polygon(color = "white") +
  coord_fixed(1.3) +
  scale_fill_viridis_c(option = "C", name = "Density") +
  ggtitle("Population Density")
p2 <- ggplot(map_df, aes(long, lat, group = group, fill = area)) +
  geom_polygon(color = "white") +
  coord_fixed(1.3) +
  scale_fill_distiller(palette = "Blues", name = "Land Area") +
  ggtitle("Land Area")
p3 <- ggplot(map_df, aes(long, lat, group = group, fill = home)) +
  geom_polygon(color = "white") +
  coord_fixed(1.3) +
  scale_fill_distiller(palette = "Reds", name = "Homes") +
  ggtitle("Number of Homes")
p1
p2
p3
Recall that the general form for a confidence interval is :
\[ \hat{\theta}\pm\text{C}_{\text{val}}*\text{S.E.} \]
And under the same confidence level,
the critical value remains constant!
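To see what a fixed critical value implies, here is a rough sketch with made-up numbers (the estimate and the two standard errors are hypothetical, not computed from usfacts.dta). It assumes a normal critical value at 95% confidence via `qnorm()`; a t critical value would work the same way.

```r
theta_hat <- 100          # hypothetical point estimate
crit <- qnorm(0.975)      # two-sided 95% critical value, about 1.96

# Two hypothetical standard errors for the same estimate
se_values <- c(small = 2, large = 4)

lower <- theta_hat - crit * se_values
upper <- theta_hat + crit * se_values
width <- upper - lower

cbind(lower, upper, width)
```

Because `crit` is shared, doubling the standard error doubles the interval width.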
So for the same sample statistic, the interval varies only through the \(\text{S.E.}\) The estimate of \(\mu\) is \(\bar{x}\), meaning the standard errors are:
\[ \text{S.E.}_{\text{Inf}} = \sqrt{\hat{V}_{\bar{x}}} \]
\[ \text{S.E.}_{\text{Fin}} = \sqrt{\hat{V}_{\bar{x}}*\text{FPC}} \]
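A minimal sketch of both standard errors side by side. It assumes \(\hat{V}_{\bar{x}} = s^2/n\) and \(\text{FPC} = \frac{N-n}{N-1}\); the helper `se_mean()` and the data vector are hypothetical and only there for illustration.

```r
# Hypothetical helper (not from the original analysis).
# Assumes V-hat of the sample mean is s^2 / n and FPC = (N - n) / (N - 1).
se_mean <- function(x, N) {
  n <- length(x)
  v_hat <- var(x) / n            # estimated variance of the sample mean
  fpc <- (N - n) / (N - 1)       # finite population correction
  c(inf = sqrt(v_hat), fin = sqrt(v_hat * fpc))
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up values, not from usfacts.dta
se <- se_mean(x, N = 50)
se
```

With \(n=8\) of \(N=50\), the FPC is below 1, so the finite-population standard error comes out smaller than the infinite-population one.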
Meaning :
Approximation Investigation ( \(\frac{N-n}{N-1}\approx 1-\frac{n}{N}\) )
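One way to look at this is to tabulate both forms of the correction for a few sample sizes. This is only a sketch, assuming \(N=50\) (the states after dropping DC) and an arbitrary grid of \(n\) values.

```r
N <- 50
n <- c(1, 5, 10, 25, 50)   # arbitrary sample sizes to compare

fpc_exact <- (N - n) / (N - 1)
fpc_approx <- 1 - n / N

fpc_table <- data.frame(
  n = n,
  fpc_exact = fpc_exact,
  fpc_approx = fpc_approx,
  gap = fpc_exact - fpc_approx
)
fpc_table
```

The exact correction \(\frac{N-n}{N-1}\) equals 1 only in the \(n=1\) row, which is where the finite and infinite standard errors coincide.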
Therefore, with a constant \(N\), the only time the two standard errors are the same is when \(n=1\), since then \(\frac{N-n}{N-1}=1\). Meaning when the finite population is sampled, its variance equals the infinite-population variance only when you take 1 value from the finite population; otherwise,