The dataset contains the contaacts which is devices within 5m within a 5min window. So 5m per 5min will be count as one obs, and if the same two devices are within 5m over a 10min window continuously, they appear twice and therefore would be have two records in the dataset.
Our response here:
unique: the count of unique device pairings at a polygon in a day.
intense: reduces the window of what counts as a contact to 2m in 90s
intense_unique: first reduce the intense contact then unique
Not_household takes the unique contacts and removes device pairings that are likely from the same household. (We have distances to common evening location for most devices.)
franklin_recent_df <- read_csv("data/contact/oh_franklin_20210206_20210221.csv")
missing_info()
## # A tibble: 19 x 3
## attribute missing_number missing_pct
## <chr> <dbl> <dbl>
## 1 sqft 247084 1
## 2 numstories 247084 1
## 3 intense 115788 0.469
## 4 intense_unique 115788 0.469
## 5 not_household 20829 0.0843
## 6 usedesc 8031 0.0325
## 7 zoning_description 8031 0.0325
## 8 X1 0 0
## 9 address 0 0
## 10 date 0 0
## 11 all 0 0
## 12 unique 0 0
## 13 city 0 0
## 14 szip 0 0
## 15 lbcs_activity_desc 0 0
## 16 lbcs_function_desc 0 0
## 17 census_blockgroup 0 0
## 18 lat 0 0
## 19 lon 0 0
sqft and numstories are useless.
franklin_recent_df <- franklin_recent_df %>%
select(-sqft, -numstories)
Fill in the missing value of intense contact count as 0. Then filter out the obs with missing data.
franklin_recent_df<-
franklin_recent_df %>%
replace_na(
list(intense = 0,
intense_unique = 0)
) %>%
na.omit()
franklin_recent_df %>%
group_by(lbcs_function_desc, lbcs_activity_desc) %>%
summarise_at(vars(unique, intense_unique), .funs = list(sum)) %>%
mutate(intense_pct = scales::percent(intense_unique/unique)) %>%
arrange(-unique, -intense_unique)
## # A tibble: 7 x 5
## # Groups: lbcs_function_desc [7]
## lbcs_function_desc lbcs_activity_desc unique intense_unique intense_pct
## <chr> <chr> <dbl> <dbl> <chr>
## 1 Residence or accommoda… Residential 6.42e5 340610. 53%
## 2 unknown unknown 5.53e5 192951. 35%
## 3 Education, public admi… Social, institution… 1.07e5 40322. 38%
## 4 General sales or servi… Shopping, business,… 5.51e4 20951. 38%
## 5 Private household Residential 3.74e4 14280. 38%
## 6 Arts, entertainment, a… Leisure 1.24e4 4998. 40%
## 7 Transportation, commun… Travel or movement 4.70e1 18 38%
Contact are mostly Residential, either from residence or private household (65%). Apart from the unkowns, assuming the unknowns are uniformly distributed to the rest of the sectors, we should be paying attention to:
General sales or services (Shopping, business, or trade)
Education, public admin., health care (Social, institutional, or infrastructure-related)
Arts, entertainment, and recreation locations have low number of contact.
location_sum <- franklin_recent_df %>%
group_by(lat, lon, address, szip, city, lbcs_function_desc) %>%
summarise_at(vars(unique, intense_unique), .funs = list(sum))
location_sum %>%
#filter(unique > 50) %>%
ggplot(aes(log(unique))) +
geom_density()
Most are below 150, some extreme are above that. Let’s check it out.
location_sum %>%
filter(unique >= 150) %>%
ggplot(aes(log(unique))) +
geom_density()
get_pop_content <- function(address, city, szip,
unique, intense_unique, lbcs_function_desc) {
content <-
paste(
tags$h4(tags$strong(city)),
paste(address, szip), br(),
tags$strong("Location Type: "), tags$u(lbcs_function_desc), br(),
tags$strong("Unique Contact: "), tags$u(unique), " ",
tags$strong("Intense Contact: "), tags$u(intense_unique)
)
return(content)
}
location_info <-
location_sum %>%
filter(unique >= 150) %>%
ungroup()
popup_content <-
location_info %>%
select(address, city, szip, unique, intense_unique, lbcs_function_desc) %>%
pmap_chr(get_pop_content)
location_info <-
location_info %>%
mutate(popup_content = popup_content)
location_info %>%
leaflet() %>%
addTiles() %>%
addCircleMarkers(
lat = ~lat,
lng = ~lon,
opacity = 0,#~formattable::normalize(unique, min = 0, max = 1),
#clusterOptions = markerClusterOptions(),
radius = ~formattable::normalize(intense_unique, min = 10, max = 30),
fillOpacity = ~formattable::normalize(unique, min = 0, max = 1),
popup = ~popup_content)
get rid of the extreme value: The Orchard apartment
Now clearly we have the most number of contact on campus.
# location_info <- location_info %>%
# filter(address != "5353 WILCOX RD") %>%
# mutate(group = cut_interval(unique, 4))
# locatin_groups <- location_info %>% distinct(group)
leaflet(location_info %>% filter(address != "5353 WILCOX RD")) %>%
addTiles() %>%
addCircleMarkers(
lat = ~lat,
lng = ~lon,
opacity = 0,#~formattable::normalize(unique, min = 0, max = 1),
#clusterOptions = markerClusterOptions(),
radius = ~formattable::normalize(intense_unique, min = 10, max = 30),
fillOpacity = ~formattable::normalize(unique, min = 0, max = 1),
popup = ~popup_content)
Another extreme point: 1932 COLLEGE RD 43210, OSU campus
leaflet(location_info %>% filter(address != "5353 WILCOX RD" &
address != "1932 COLLEGE RD")
) %>%
addTiles() %>%
addProviderTiles(providers$Stamen.Toner) %>%
addCircleMarkers(
lat = ~lat,
lng = ~lon,
opacity = 0,#~formattable::normalize(unique, min = 0, max = 1),
#clusterOptions = markerClusterOptions(),
radius = ~formattable::normalize(intense_unique, min = 10, max = 30),
fillOpacity = ~formattable::normalize(unique, min = 0, max = 1),
popup = ~popup_content)
location_info %>%
filter(address != "5353 WILCOX RD",
address != "1932 COLLEGE RD") %>%
slice_max(unique, n = 100) %>%
select(address, city, type = lbcs_function_desc,
unique_contact = unique, unique_intense_contact = intense_unique) %>%
DT::datatable()