The purpose of this Exploratory Data Analysis is to provide an overview of what data are available from the Meetup API and to start thinking about how to use these data to answer questions we have about R-Ladies Meetups with a focus on “Health” metrics.
In addition, this notebook seeks to explore different Meetup API workflows in order to improve the MeetupR library. R-Ladies has developed a package called MeetupR to streamline Meetup analysis. This notebook calls the Meetup API directly in order to document API calls and parameters, so the results should be compared to the MeetupR implementation. If there are possible improvements, patches will be proposed!
Additionally, this work can be integrated into the R-Ladies meetup dashboard that shows the latest Meetup stats (useful for conference presentations and other status reports).
This section contains the API calls made to the Meetup API along with their parameters. Some of the API calls are throttled at high volumes and need to be improved, which makes the workflow a bit counterintuitive.
dir.create(file.path(params$output_folder))
dir.create(file.path(params$rds_folder))
dir.create(file.path(params$archive_folder))
get_meetups <- function (url, query) {
req <- GET(url, query=query)
print(paste(req$url))
json <- content(req, as = "text")
things <- fromJSON(json, flatten=TRUE)
return(things)
}
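Since several of the calls below get throttled, a retry wrapper might help. The following is a minimal sketch and not part of the original workflow: `with_backoff` is a hypothetical helper that retries any function that signals an error, waiting longer after each failure. It assumes the wrapped call raises an R error when it is throttled; `get_meetups` as written returns the parsed response instead, so it would need a check that calls `stop()` when the response contains errors.

```r
# Hypothetical helper (an assumption, not part of the original notebook):
# retry a function with exponential backoff whenever it signals an error.
with_backoff <- function(fn, max_tries = 5, base_wait = 1) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt < max_tries) {
      # wait base_wait, 2 * base_wait, 4 * base_wait, ... seconds
      Sys.sleep(base_wait * 2^(attempt - 1))
    }
  }
  stop("All ", max_tries, " attempts failed: ", conditionMessage(result))
}
```

A call would then look like `with_backoff(function() get_meetups(url, query))`, with the wait times tuned to whatever rate limit the API reports.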
Identifying Meetup groups should be straightforward because R-Ladies exists as a Meetup topic. Unfortunately not every group is using this topic. This uses the “find_groups” function from the Meetup API.
get_rladies_groups <- function (folder) {
groups_url <- "https://api.meetup.com/find/groups"
groups_query_params <- list(
key=params$meetup_api_key,
sign=TRUE,
page=200,
radius="global")
# by topic
meetup_groups_topic <- get_meetups(groups_url, append(groups_query_params, c(topic_id=1513883, order="members")))
# by text + category
meetup_groups_text <- get_meetups(groups_url, append(groups_query_params, c(text="r-ladies", category=34)))
meetup_groups_aggr <- bind_rows(meetup_groups_topic, meetup_groups_text %>% anti_join(meetup_groups_topic, by="id"))
meetup_groups <- meetup_groups_aggr %>% filter(str_detect(name, "[Rr]([ -]?)[Ll]adies"))
meetup_groups <- meetup_groups %>%
mutate(events_url = paste("https://api.meetup.com/", urlname, "/events", sep="")) %>%
select(-meta_category.category_ids) # remove lists
write_csv(meetup_groups, paste(folder, "rladies_meetup_groups.csv", sep="/"),
na = "")
write_csv(meetup_groups_topic %>% select(-meta_category.category_ids),
paste(folder, "rladies_meetup_groups_topic.csv", sep="/"),
na = "")
write_csv(meetup_groups_text %>% select(-meta_category.category_ids),
paste(folder, "rladies_meetup_groups_text.csv", sep="/"),
na = "")
return(meetup_groups)
}
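The name filter is worth checking against a few sample names, because this notebook uses two slightly different patterns: the one above for groups and a stricter one in the later filter on upcoming events. A small sketch with example names (the names are illustrative; the patterns are copied from the notebook):

```r
# Example names only; the two patterns are the ones used in this notebook.
group_names <- c("R-Ladies Lima", "RLadiesCDMX",
                 "R Ladies - Twin Cities", "R User Group")

pattern_groups   <- "[Rr]([ -]?)[Ll]adies"  # allows "R-Ladies", "R Ladies", "RLadies"
pattern_upcoming <- "[Rr](-?)[Ll]adies"     # allows "R-Ladies" and "RLadies" only

matches_groups   <- grepl(pattern_groups, group_names)
matches_upcoming <- grepl(pattern_upcoming, group_names)

# Side-by-side view of which names each pattern keeps
data.frame(group_names, matches_groups, matches_upcoming)
```

In this sample the stricter pattern drops "R Ladies - Twin Cities", so it may be worth using one pattern for both filters.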
I created an account and subscribed it to the found groups. This checks that it is subscribed to all current groups. A future version of this should attempt to join via the Meetup API (depends on join questions).
get_self_groups <- function (folder) {
groups_url <- "https://api.meetup.com/self/groups"
groups_query_params <- list(
key=params$meetup_api_key,
sign=TRUE,
page=200,
fields="approved")
self_groups <- get_meetups(groups_url, groups_query_params)
self_groups <- self_groups %>%
mutate(events_url = paste("https://api.meetup.com/", urlname, "/events", sep="")) %>%
select(-meta_category.category_ids) # remove lists
write_csv(self_groups, paste(folder, "rladies_self_groups.csv", sep="/"),
na = "")
return(self_groups)
}
Names and other common identifying information about members have been removed. We are only interested in identifying unique members across groups and events, not who they are.
get_rladies_members <- function (meetup_groups, folder) {
members_query_params <- list(
key=params$meetup_api_key,
sign=TRUE,
page=200,
fields=paste("memberships", "topics", "gender", sep=",") # these can only go with individual profile calls
)
dir.create(file.path(paste(folder, "rladies_members", sep="/")))
meetup_members <- data_frame()
for (n in 1:nrow(meetup_groups)) {
members_url <- str_replace(meetup_groups[n,]["events_url"], "events", "members")
num_members <- as.numeric(meetup_groups[n,]["members"])
for (i in seq_len(ceiling(num_members/200)) - 1) { # offset counts pages from 0, so start at the first page
print(paste("Trying url:", members_url))
members <- get_meetups(members_url, append(members_query_params, c(offset=i)))
if(length(members)== 0) {
next()
}
# remove identifying info
members <- members %>%
select(-name)
members$group_profile.answers <- NULL
meetup_members <- bind_rows(meetup_members, members)
write_csv(meetup_members,
paste0(folder, "/rladies_members/_meetup_members_", n, ".csv"),
na = "")
}
}
# remove identifying info
meetup_members <- meetup_members %>%
select(-group_profile.intro, -bio, -email, -group_profile.title)
write_csv(meetup_members, paste(folder, "rladies_meetup_members.csv", sep="/"),
na = "")
}
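Paging through a group's member list comes down to turning a member count into a sequence of page offsets. A minimal sketch, assuming the API's `offset` parameter counts pages from zero (worth confirming against the Meetup API documentation); `page_offsets` is a hypothetical helper, not part of the notebook:

```r
# Hypothetical helper: page offsets needed to fetch `num_members` records
# at `page_size` records per request, assuming offsets start at 0.
page_offsets <- function(num_members, page_size = 200) {
  num_pages <- ceiling(num_members / page_size)
  seq_len(num_pages) - 1
}

page_offsets(450)  # three pages: 0, 1, 2
```

Using `seq_len()` also means a group with zero members yields no requests at all, whereas `1:0` would iterate twice.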
Depends on the querying account being a member of the groups.
get_rladies_calendar <- function (folder) {
calendar_url <- "https://api.meetup.com/self/calendar"
events_query_params <- list(
key=params$meetup_api_key,
sign=TRUE,
page=200,
order="time",
radius="global")
get_events <- function (url, query) {
req <- GET(url, query=query)
print(paste(req$url))
json <- content(req, as = "text")
events <- fromJSON(json, flatten=TRUE)
return(events)
}
meetup_events_calendar <- get_events(calendar_url, events_query_params)
write_csv(meetup_events_calendar, paste(folder, "rladies_meetup_calendar.csv", sep="/"),
na = "")
return(meetup_events_calendar)
}
Per-group query. Note that this gets throttled, so here it is limited to groups that the querying account is not a member of.
get_rladies_future <- function (meetup_groups, folder) {
future_events_query_params <- list(
key=params$meetup_api_key,
sign=TRUE,
page=200,
status="upcoming",
fields=paste("comment_count", "photo_album", sep=", ")
)
dir.create(file.path(paste(folder, "rladies_meetup_events", sep="/")))
meetup_events_future <- data_frame()
for (n in 1:nrow(meetup_groups)) {
events_url <- meetup_groups[n,]["events_url"]
if(is.na(events_url)) {
print(paste("No events url for", n, meetup_groups[n,]["name"]))
next()
}
print(paste("Trying url:", events_url$events_url))
meetups <- get_meetups(events_url$events_url, future_events_query_params)
if(length(meetups)== 0) {
next()
}
meetups <- meetups %>% mutate(photo_album.photo_sample=NA)
meetup_events_future <- bind_rows(meetup_events_future, meetups)
write_csv(meetup_events_future,
paste0(folder, "/rladies_meetup_events/_meetup_events_future_", n, ".csv"),
na = "")
}
write_csv(meetup_events_future, paste(folder, "rladies_meetup_events_future.csv", sep="/"),
na = "")
return(meetup_events_future)
}
Meetup provides an upcoming_events endpoint that is convenient but may not be comprehensive. Let’s use it and compare our results to the group by group approach above.
get_rladies_upcoming <- function (folder) {
find_upcoming_url <- "https://api.meetup.com/find/upcoming_events"
# change to get upcoming meetups for all meetups in the list
events_query_params <- list(
key=params$meetup_api_key,
sign=TRUE,
page=200,
order="time",
radius="global")
get_group_events <- function (url, query) {
req <- GET(url, query=query)
print(paste(req$url))
json <- content(req, as = "text")
groups <- fromJSON(json, flatten=TRUE)
return(groups$events)
}
# I tried "topic" but no R-Ladies results were returned, so this field doesn't apply
meetup_events_text <- get_group_events(find_upcoming_url, append(events_query_params, c(text="rladies")))
# just in case anything slipped in there
meetup_events_upcoming <- meetup_events_text %>% filter(str_detect(group.name, "[Rr](-?)[Ll]adies"))
write_csv(meetup_events_upcoming, paste(folder, "rladies_meetup_upcoming_events.csv", sep="/"),
na = "")
return(meetup_events_upcoming)
}
Get all past events for all groups. This call is expensive and gets throttled.
# Run this 2nd when updating archives
# TODO this currently throttles out and has to be manipulated manually to run
get_rladies_past_all <- function (folder) {}
folder=params$archive_folder
meetup_groups <- read_csv(paste(folder, "rladies_meetup_groups.csv", sep="/"))
past_events_query_params <- list(
key=params$meetup_api_key,
sign=TRUE,
page=200,
status="past",
fields=paste("comment_count", "photo_album", sep=", ")
)
dir.create(file.path(paste(folder, "rladies_meetup_events", sep="/")))
meetup_events_past <- data_frame()
for (n in 73:nrow(meetup_groups)) {
events_url <- meetup_groups[n,]["events_url"]
print(paste("Trying url:", events_url$events_url))
meetups <- get_meetups(events_url$events_url, past_events_query_params)
if(length(meetups)== 0) {
next()
}
meetups <- meetups %>% mutate(photo_album.photo_sample=NA)
meetup_events_past <- bind_rows(meetup_events_past, meetups)
write_csv(meetup_events_past, paste0(folder, "/rladies_meetup_events/_meetup_events_past_", n, ".csv"),
na = "")
}
write_csv(meetup_events_past, paste(folder,"rladies_meetup_events_past.csv", sep="/"),
na = "")
# copy entire archive folder to latest, don't try to manage individual file updates
system2("rm", args = c("-r", params$output_folder))
system2("cp", args = c("-R", paste(params$archive_folder), paste(params$output_folder)))
Combine the previous past events pull for a group with the previous upcoming events pull where the event has occurred in the past. If all previous upcoming events occurred in the past, then we’ll need to pull past events for the group depending on the time interval.
Check this against the updated group list to find any groups that need all events pulled. Only pull all events for those groups.
get_rladies_past <- function() {
# TODO
}
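The incremental update described above could start from something like the following. This is only a sketch under stated assumptions: `merge_occurred_events` is a hypothetical name, both inputs are assumed to share the same columns, and `id` and `local_date` are the event columns used elsewhere in this notebook.

```r
# Hypothetical sketch: add previously "upcoming" events that have since
# taken place to the archive of past events, keeping one row per event id.
merge_occurred_events <- function(past, upcoming, today = Sys.Date()) {
  occurred <- upcoming[as.Date(upcoming$local_date) < today, , drop = FALSE]
  combined <- rbind(past, occurred)  # requires identical columns in both inputs
  combined[!duplicated(combined$id), , drop = FALSE]
}

past <- data.frame(id = c("a", "b"),
                   local_date = as.Date(c("2017-01-10", "2017-02-10")),
                   stringsAsFactors = FALSE)
upcoming <- data.frame(id = c("b", "c", "d"),
                       local_date = as.Date(c("2017-02-10", "2017-03-01", "2017-12-01")),
                       stringsAsFactors = FALSE)

merged <- merge_occurred_events(past, upcoming, today = as.Date("2017-06-01"))
```

Groups whose previous upcoming events have all occurred would still need a fresh past-events pull, which this sketch does not attempt.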
Get a list of member IDs that RSVPed to past meetups.
# Run this 3rd when updating archives
base_url <- "https://api.meetup.com"
folder=params$archive_folder
meetup_events_past <- read_csv(paste(params$output_folder,"rladies_meetup_events_past.csv", sep="/"))
member_rsvps_query_params <- list(
key=params$meetup_api_key,
sign=TRUE,
response="yes"
)
dir.create(file.path(paste(folder, "rladies_member_rsvps", sep="/")))
meetup_member_rsvps <- data_frame()
for (n in 458:nrow(meetup_events_past)) {
rsvps_url <- paste(base_url, meetup_events_past[n,]["group.urlname"], "events", meetup_events_past[n,]["id"], "rsvps",
sep="/")
print(paste("Trying url:", rsvps_url))
rsvps <- get_meetups(rsvps_url, member_rsvps_query_params)
if(length(rsvps)== 0) {
next()
}
if("errors" %in% names(rsvps)) {
print(paste(n, ":", rsvps$errors))
break()
}
rsvps <- rsvps %>%
select(member.id, created, updated, member.event_context.host, event.yes_rsvp_count, event.id, group.name) %>%
mutate(member_rsvps = n())
meetup_member_rsvps <- bind_rows(meetup_member_rsvps, rsvps)
write_csv(meetup_member_rsvps, paste0(folder, "/rladies_member_rsvps/_meetup_member_rsvps_", n, ".csv"),
na = "")
}
write_csv(meetup_member_rsvps, paste(folder,"rladies_member_rsvps.csv", sep="/"),
na = "")
# copy entire archive folder to latest, don't try to manage individual file updates
system2("rm", args = c("-r", params$output_folder))
system2("cp", args = c("-R", paste(params$archive_folder), paste(params$output_folder)))
Create a new archive folder and get the latest data.
# Run this first when updating archives
dir.create(file.path(params$archive_folder))
# TODO add try/catch for throttling
meetup_groups <- get_rladies_groups(params$archive_folder)
self_groups <- get_self_groups(params$archive_folder)
pending_groups <- meetup_groups %>%
anti_join(self_groups, by="id")
get_rladies_upcoming(params$archive_folder)
meetup_events_calendar <- get_rladies_calendar(params$archive_folder)
if (nrow(pending_groups) > 0) {
meetup_events_future <- get_rladies_future(pending_groups, params$archive_folder)
} else {
meetup_events_future <- data_frame()
}
meetup_events_current <- bind_rows(meetup_events_future, meetup_events_calendar)
write_csv(meetup_events_current, paste(params$archive_folder,"rladies_meetup_events_current.csv", sep="/"),
na = "")
# Adding this for reference, need to check that it still works
meetup_members <- get_rladies_members(meetup_groups, params$archive_folder)
system2("rm", args = c("-r", params$output_folder))
system2("cp", args = c("-R", paste(params$archive_folder), paste(params$output_folder)))
meetup_groups <- read_csv(paste(params$output_folder, "rladies_meetup_groups.csv", sep="/"))
Check group membership for the retrieval account.
self_groups <- read_csv(paste(params$output_folder, "rladies_self_groups.csv", sep="/"))
pending_groups <- meetup_groups %>%
anti_join(self_groups, by="id") %>%
select(id, name, join_mode, link, events_url) %>%
arrange(join_mode)
All Pending Groups:
Groups the retrieval account does not belong to. Data from these groups will not be in this report. If this report has any groups listed here, have the retrieval account join them and then rerun the analysis above.
pending_groups
## # A tibble: 5 x 5
## id name join_mode
## <int> <chr> <chr>
## 1 27269070 R-Ladies Lima open
## 2 27443387 R-Ladies Amsterdam open
## 3 27283931 R-Ladies Santa Barbara open
## 4 27443569 R-Ladies Loughborough open
## 5 27456719 R-Ladies São Paulo open
## # ... with 2 more variables: link <chr>, events_url <chr>
Pending Groups Not Requiring Approval:
These are groups the retrieval account can join and have immediate access to. The ones requiring approval take longer because a person has to approve the membership.
pending_groups %>% filter(join_mode != "approval")
## # A tibble: 5 x 5
## id name join_mode
## <int> <chr> <chr>
## 1 27269070 R-Ladies Lima open
## 2 27443387 R-Ladies Amsterdam open
## 3 27283931 R-Ladies Santa Barbara open
## 4 27443569 R-Ladies Loughborough open
## 5 27456719 R-Ladies São Paulo open
## # ... with 2 more variables: link <chr>, events_url <chr>
Groups that use the official “r-ladies” Meetup topic. These will show up on this page: https://www.meetup.com/topics/r-ladies/
meetup_groups_topic <- read_csv(paste(params$output_folder, "rladies_meetup_groups_topic.csv", sep="/"))
A text search for “rladies” did not return any results. Including the tech category id (34) excludes extra results.
meetup_groups_text <- read_csv(paste(params$output_folder, "rladies_meetup_groups_text.csv", sep="/"))
Groups that matched a text search for R-Ladies but are not using the Meetup topic.
# group + organizer to contact
meetups_missing_topic <- meetup_groups_text %>% anti_join(meetup_groups_topic, by="id")
meetups_missing_topic %>% filter(str_detect(name, "[Rr]([ -]?)[Ll]adies")) %>%
arrange(-members) %>%
select(name, localized_location, organizer.name, members, created)
## # A tibble: 14 x 5
## name
## <chr>
## 1 Spotkania Entuzjastów R-Warsaw RUG Meetup & R-Ladies Warsaw
## 2 R-Ladies Taipei
## 3 R-Ladies Budapest
## 4 R-Ladies Lisboa
## 5 R-Ladies İzmiR
## 6 RLadiesCDMX
## 7 R Ladies - Twin Cities
## 8 R-Ladies Adelaide
## 9 R-Ladies St. Louis
## 10 R-Ladies Bogotá
## 11 R-Ladies Connecticut
## 12 R-Ladies São Paulo
## 13 R-Ladies Buffalo
## 14 R-Ladies Americana
## # ... with 4 more variables: localized_location <chr>,
## # organizer.name <chr>, members <int>, created <dbl>
Membership numbers only show the current totals. To track membership over time we need to see when members joined the group. Additionally, we are interested in discovering repeat RSVPs as well as common group membership.
Proposed Improvement: Determine number of members over time based on when members joined the group. This will be done in a Membership focused version of this analysis.
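A rough sketch of how that could work, assuming each member record carries a join timestamp in milliseconds since the epoch (the same encoding as the `created` fields converted later in this notebook). The function name and its `joined_ms` argument are hypothetical:

```r
# Hypothetical sketch: cumulative membership by month from join timestamps.
members_over_time <- function(joined_ms) {
  joined <- as.POSIXct(joined_ms / 1000, origin = "1970-01-01", tz = "UTC")
  month <- format(joined, "%Y-%m")
  counts <- table(month)  # sorted by month label, which is chronological
  data.frame(month = names(counts),
             new_members = as.integer(counts),
             total_members = cumsum(as.integer(counts)),
             stringsAsFactors = FALSE)
}
```

Months with no new members are simply absent from the result, and members who left the group are not accounted for, so this would only approximate historical membership.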
Identifying which groups are regularly scheduling Meetups will help us to see who is the most active. Groups with gaps in Meetup activity or with no events should be highlighted so we can reach out to their organizers.
meetup_events_past <- read_csv(paste(params$output_folder,"rladies_meetup_events_past.csv", sep="/"))
meetup_groups <- read_csv(paste(params$output_folder, "rladies_meetup_groups.csv", sep="/"))
meetup_events_past_merge <- meetup_events_past %>%
inner_join(meetup_groups %>%
select(urlname, city, country, state, members, timezone),
by=c("group.urlname"="urlname")) %>%
separate(timezone, c("region"), extra="drop", fill="left")
meetup_events_past_merge <- meetup_events_past_merge %>%
mutate(
group.created=as.POSIXct(group.created/1000, tz="UTC", origin="1970-01-01"),
created=as.POSIXct(created/1000, tz="UTC", origin="1970-01-01"))
saveRDS(meetup_events_past_merge, paste(params$rds_folder, "meetup_events_past_merge.Rds", sep="/"))
How many events has each meetup had?
past_events_freq <- meetup_events_past_merge %>%
group_by(group.name) %>%
summarise(num_events=n(),
lat=first(group.lat), lon=first(group.lon),
location=first(group.localized_location),
country=first(country),
state=first(state),
city=first(city),
members=first(members),
region=first(region)) %>%
mutate(num_events_log = round(log(num_events))) %>%
group_by(num_events_log) %>%
mutate(events_min_max = ifelse(min(num_events) == max(num_events),
paste(min(num_events)),
paste(min(num_events), "-", max(num_events))))
ggplot(past_events_freq, aes(x=reorder(location, num_events), y=num_events)) +
geom_bar(stat="identity", aes(fill=region)) +
theme_few() +
coord_flip() +
labs(x="Chapter", y="Events", title="Number of Events")
How many meetups are having how many events? What is the most common event frequency amongst Meetup groups? What’s “normal”? This is only looking at number of events and does not consider other factors like Meetup age. The majority of groups have had between 1 and 4 events.
This uses a log of the total number of events to group into buckets, then the min/max of the values found in each bucket are reported on the X-axis. I’m sure there’s a magical R function to do this instead of the way I’m doing it.
ggplot(past_events_freq, aes(x=reorder(events_min_max, num_events_log))) +
geom_density() +
theme_few() +
xlab("Events per Meetup (Log)")
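One candidate for the bucketing step is base R's `cut()`, which assigns each value to an interval and labels it with the interval bounds. A minimal sketch with made-up counts and illustrative break points (neither comes from the data above):

```r
num_events <- c(1, 2, 4, 5, 12, 40)   # made-up event counts
breaks <- c(0, 1, 4, 12, 33, Inf)     # illustrative bucket edges

# Each count lands in a right-closed interval such as (1,4]
event_bucket <- cut(num_events, breaks = breaks)

# Number of groups per bucket, including empty buckets
table(event_bucket)
```

Unlike the min/max labels used above, these labels show the bucket edges rather than the observed range in each bucket.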
What is the “age” of meetups based on their first event? Rather than considering the date the Meetup was created, looking at when the first event happened may be a more realistic indicator of age.
past_event_age <- meetup_events_past_merge %>%
group_by(group.name) %>%
summarise(first_meetup = as_datetime(min(local_date, na.rm=FALSE)),
location=first(group.localized_location),
region=first(region),
group.created=first(group.created)) %>%
mutate(first_meetup_age=difftime(Sys.Date(), first_meetup, units="days"),
location=reorder(location, first_meetup_age))
ggplot(past_event_age, aes(x=location, y=first_meetup_age)) +
geom_bar(stat="identity", aes(fill=region)) +
theme_few() +
coord_flip() +
scale_y_continuous(breaks = pretty(past_event_age$first_meetup_age, n = 10)) +
labs(x="Chapter", y="Days Since First Meetup", title="Meetup Age by First Event")
How big a difference do we see between the date of the first event and the group's creation date? If the creation date was used to determine age, how might the ranking of meetup groups by age be affected?
created_vs_first <- past_event_age %>%
mutate(
group_age = difftime(Sys.Date(), group.created, units="days"),
created_interval=difftime(first_meetup, group.created, units="days"),
location=reorder(location, -created_interval))
ggplot(created_vs_first, aes(x=location, y=created_interval)) +
geom_bar(stat="identity", aes(fill=region)) +
theme_few() +
coord_flip() +
labs(x="Chapter", y="Days", title="Days Between Creation and First Event")
created_vs_first_melt <- created_vs_first %>%
mutate(location=reorder(location,group_age)) %>%
select(location, group_age, created_interval) %>%
melt()
ggplot(created_vs_first_melt, aes(x=location, y=value)) +
geom_bar(stat="identity", position="dodge", aes(fill=variable)) +
theme_few() +
coord_flip() +
labs(x="Chapter", y="Days", fill= "Days Type", title="Meetup Age vs Days Between Creation and First Meetup")
A proposed metric would be to identify what groups have yet to have a first meetup. This could be an anti-join between the group list and the event summary.
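A minimal sketch of that anti-join in base R, assuming the groups table has a `urlname` column and the events table a `group.urlname` column, as in the merge earlier in this notebook. The function name and the toy data are hypothetical:

```r
# Hypothetical sketch: groups that do not appear in the events table.
groups_without_events <- function(groups, events) {
  groups[!groups$urlname %in% events$group.urlname, , drop = FALSE]
}

groups <- data.frame(urlname = c("rladies-a", "rladies-b", "rladies-c"),
                     stringsAsFactors = FALSE)
events <- data.frame(group.urlname = c("rladies-a", "rladies-a", "rladies-c"),
                     stringsAsFactors = FALSE)

no_events_yet <- groups_without_events(groups, events)
```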
How much time passes between Meetup events? Are some groups regularly meeting every month? What is the typical time between a group's events? This metric needs to be explored further.
This plot is hard to read and I don’t think this is the best metric to use as is. The idea here was just to show the overall distribution for all of the groups. A different visualization might be better here. Something to explore further might be looking for consistency in time between events and determining what’s “normal” for each individual meetup.
event_intervals <- meetup_events_past_merge %>%
group_by(group.name) %>%
arrange(local_date) %>%
mutate(
prev_event = lag(local_date),
days_between = difftime(local_date, prev_event, units="days")
) %>%
select(group.name, group.localized_location, region, days_between, local_date, prev_event, id)
event_intervals_curr <- event_intervals %>% filter(local_date > as_datetime("2016-01-01"))
ggplot(event_intervals_curr, aes(x=local_date, y=days_between)) +
geom_bar(stat="identity", position="dodge", aes(fill=group.localized_location), show.legend = FALSE)+
theme_few() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_y_continuous() +
scale_x_date(date_breaks = "1 month") +
facet_wrap(~ region, scales="free") +
labs(x="Event Month", y="Days Since Last Meetup", fill= "Region", title="Days Between Meetups")
ggplot(event_intervals_curr %>% filter(region %in% c("US", "Canada")), aes(x=local_date, y=days_between)) +
geom_bar(stat="identity", position="dodge", aes(fill=group.localized_location)) +
theme_few() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
guides(fill=guide_legend(ncol = 1)) +
scale_y_continuous() +
scale_x_date(date_breaks = "1 month") +
labs(x="Month", y="Days Since Last Meetup", fill= "Chapter", title="Days Between Meetups (North America)")
ggplot(event_intervals_curr %>% filter(region=="Europe"), aes(x=local_date, y=days_between)) +
geom_bar(stat="identity", position="dodge", aes(fill=group.localized_location)) +
theme_few() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
guides(fill=guide_legend(ncol = 1)) +
scale_y_continuous() +
scale_x_date(date_breaks = "1 month") +
labs(x="Month", y="Days Since Last Meetup", fill= "Chapter", title="Days Between Meetups (Europe)")
ggplot(event_intervals_curr %>% filter(region %in% c("Australia", "Asia", "Africa")), aes(x=local_date, y=days_between)) +
geom_bar(stat="identity", position="dodge", aes(fill=group.localized_location)) +
theme_few() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_y_continuous() +
scale_x_date(date_breaks = "1 month") +
labs(x="Month", y="Days Since Last Meetup", fill= "Chapter", title="Days Between Meetups (APAC)")
ggplot(event_intervals_curr %>% filter(region=="America"), aes(x=local_date, y=days_between)) +
geom_bar(stat="identity", position="dodge", aes(fill=group.localized_location)) +
theme_few() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_y_continuous() +
scale_x_date(date_breaks = "1 month") +
labs(x="Month", y="Days Since Last Meetup", fill= "Chapter", title="Days Between Meetups (Latin America)")
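As one way to look at consistency per meetup rather than the overall distribution, the gaps between a group's events could be summarized with a median and a spread. This is only a sketch: `interval_summary` is a hypothetical helper, and the choice of median and standard deviation is an assumption about what "normal" should mean here.

```r
# Hypothetical sketch: per-group summary of days between consecutive events.
interval_summary <- function(group, event_date) {
  by_group <- split(as.Date(event_date), group)
  rows <- lapply(names(by_group), function(g) {
    # gaps in days between consecutive events, in date order
    gaps <- as.numeric(diff(sort(by_group[[g]])), units = "days")
    data.frame(group = g,
               n_intervals = length(gaps),
               median_days = if (length(gaps) > 0) median(gaps) else NA_real_,
               sd_days = if (length(gaps) > 1) sd(gaps) else NA_real_,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}
```

A low `sd_days` relative to `median_days` would suggest a regular cadence; groups with a single event get `NA` for both.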
This looks for the most common event intervals to compare across groups. A density plot might also be useful here to see what intervals are the most common.
event_intervals_summary <- event_intervals_curr %>%
group_by(group.name) %>%
mutate(num_meetups = n()) %>%
filter(!is.na(days_between)) %>%
mutate(days_between_log = round(log(as.numeric(days_between + 1)))) %>%
group_by(days_between_log) %>%
mutate(days_between_min_max = ifelse(min(days_between) == max(days_between),
paste(min(days_between)),
paste(min(days_between), "-", max(days_between)))) %>%
group_by(group.name, days_between_log) %>%
summarise(
days_between_freq = n(),
days_between_min = min(days_between),
days_between_max = max(days_between),
num_meetups = first(num_meetups),
days_between_min_max = first(days_between_min_max),
location = first(group.localized_location),
region = first(region)
)
# TODO freq % -> # meetups in days between bucket / total # meetups
# which.max just returns one row, we want all max rows
event_intervals_summary_top <- event_intervals_summary %>%
group_by(group.name) %>%
slice(which(days_between_freq == max(days_between_freq)))
# days_between_min_max is ordered by days_between_log in the plot call below,
# so the incomplete mutate() that was here (it had an empty argument and would not run) is dropped
ggplot(event_intervals_summary_top, aes(x=reorder(location, num_meetups), y=reorder(days_between_min_max, days_between_log))) +
geom_bar(stat="identity", position="dodge", aes(fill=region)) +
theme_few() +
coord_flip() +
labs(x="Chapter", y="Days", fill= "Region", title="Days Between Meetups - Most Frequent")
Related to intervals above. In a given time period, how many meetups does each group have? This is mentioned as a proposed metric below and will be part of a future version of this analysis focused on Events.
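A minimal sketch of such a count, using a calendar month as the time period. The function name and the toy inputs are hypothetical; in this notebook the inputs would be the group name and `local_date` columns of the past events.

```r
# Hypothetical sketch: number of events per group per calendar month.
events_per_month <- function(group, event_date) {
  month <- format(as.Date(event_date), "%Y-%m")
  counts <- as.data.frame(table(group = group, month = month),
                          stringsAsFactors = FALSE)
  counts[counts$Freq > 0, ]  # drop group/month combinations with no events
}

monthly <- events_per_month(
  group = c("A", "A", "A", "B"),
  event_date = c("2017-01-05", "2017-01-20", "2017-02-10", "2017-02-15")
)
```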
The Upcoming Events endpoint appears to be pretty limited in what it returns. These are current events that are missing from the upcoming events data that was retrieved.
meetup_events_upcoming <- read_csv(paste(params$output_folder, "rladies_meetup_upcoming_events.csv", sep="/"))
meetup_events_upcoming <- meetup_events_upcoming %>% mutate(id=id)
meetup_events_current <- read_csv(paste(params$output_folder, "rladies_meetup_events_current.csv", sep="/"))
meetup_events_missing <- meetup_events_current %>% anti_join(meetup_events_upcoming, by="id") %>% unique()
meetup_events_missing %>% arrange(local_date) %>% select(group.name, name, local_date)
## # A tibble: 30 x 3
## group.name
## <chr>
## 1 R-Ladies San Francisco
## 2 R-Ladies Melbourne
## 3 R-Ladies Ames
## 4 R-Ladies Belgrade
## 5 R-Ladies Chicago
## 6 R-Ladies Tucson AZ
## 7 R-Ladies Lisboa
## 8 R-Ladies Sarasota
## 9 R-Ladies Melbourne
## 10 R Ladies - Twin Cities
## # ... with 20 more rows, and 2 more variables: name <chr>,
## # local_date <date>
The upcoming_events endpoint could be useful as a check for any groups that were missed via the find groups endpoint. This should be zero, but if anything shows up, it could be worth double checking the group!
meetup_events_current <- read_csv(paste(params$output_folder, "rladies_meetup_events_current.csv", sep="/"))
meetup_events_upcoming <- read_csv(paste(params$output_folder, "rladies_meetup_upcoming_events.csv", sep="/"))
meetup_events_current <- meetup_events_current %>% mutate(id=as.character(id))
meetup_events_upcoming <- meetup_events_upcoming %>% mutate(id=as.character(id))
not_in_groups <- meetup_events_upcoming %>% anti_join(meetup_events_current, by="id")
# check group id
missing_groups <- meetup_events_upcoming %>% anti_join(meetup_groups, by=c("group.name"="name"))
missing_groups
## # A tibble: 0 x 47
## # ... with 47 variables: created <dbl>, duration <int>, id <chr>,
## # name <chr>, status <chr>, time <dbl>, local_date <date>,
## # local_time <time>, updated <dbl>, utc_offset <int>,
## # waitlist_count <int>, yes_rsvp_count <int>, link <chr>,
## # description <chr>, visibility <chr>, how_to_find_us <chr>,
## # rsvp_limit <int>, rsvp_close_offset <chr>, rsvp_open_offset <chr>,
## # venue.id <int>, venue.name <chr>, venue.lat <dbl>, venue.lon <dbl>,
## # venue.repinned <lgl>, venue.address_1 <chr>, venue.address_2 <chr>,
## # venue.city <chr>, venue.country <chr>,
## # venue.localized_country_name <chr>, venue.zip <int>,
## # venue.state <chr>, group.created <dbl>, group.name <chr>,
## # group.id <int>, group.join_mode <chr>, group.lat <dbl>,
## # group.lon <dbl>, group.urlname <chr>, group.who <chr>,
## # group.localized_location <chr>, group.region <chr>, fee.accepts <chr>,
## # fee.amount <int>, fee.currency <chr>, fee.description <chr>,
## # fee.label <chr>, fee.required <lgl>
How can we best determine Meetup membership? While counting members may not be the best estimate, considering RSVPs may shed more light on this. Not everyone who RSVPs actually shows up, so we may want to look at RSVP trends to determine a better way to adjust the count.
How many people RSVP to the meetups?
past_rsvps <- meetup_events_past_merge %>%
group_by(group.name) %>%
mutate(
yes_rsvp_pct = round(yes_rsvp_count/members, 2), # Problem: based on current membership, not membership at the time
yes_rsvp_log = round(log(yes_rsvp_count)),
event_month=month(created, label=TRUE),
event_ym=format(created, "%Y-%m-01") # TODO use lubridate
) %>%
group_by(group.name, yes_rsvp_log) %>%
mutate(yes_rsvp_log_freq = n()) %>%
ungroup() %>%
group_by(group.name, yes_rsvp_pct) %>%
mutate(yes_rsvp_pct_freq = n())
past_rsvp_summary <- past_rsvps %>%
group_by(event_ym, group.name) %>%
summarise(yes_rsvp_sum_grp=sum(yes_rsvp_count), region=first(region))
ggplot(past_rsvp_summary, aes(x=event_ym, y=yes_rsvp_sum_grp)) +
geom_bar(stat="identity", aes(fill=region)) +
theme_few() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(x="Event Month", y="Yes RSVPs", fill="Region", title="Past Yes RSVPs by Region")
ggsave("png/rsvp_by_region.png")
## Saving 12 x 6 in image
How many yes RSVPs does each group typically get? This plot is ordered by the number of members reported by Meetup to show how misleading the member count can be when considering who actually engaged with the group.
past_rsvp_freq <- past_rsvps %>%
group_by(group.name) %>%
summarize(num_events=n(),
rsvps_sum = sum(yes_rsvp_count),
rsvps_med = round(median(yes_rsvp_count)),
rsvps_min = min(yes_rsvp_count),
rsvps_max = max(yes_rsvp_count),
rsvps_mode = yes_rsvp_log[which.max(yes_rsvp_log_freq)],
rsvps_pct_mode = yes_rsvp_pct[which.max(yes_rsvp_pct_freq)],
lat=first(group.lat), lon=first(group.lon),
location=first(group.localized_location),
country=first(country),
state=first(state),
city=first(city),
members=first(members),
region=first(region)) %>%
mutate(location=reorder(location, -members))
ggplot(past_rsvp_freq, aes(x=location, y=rsvps_pct_mode)) +
geom_bar(stat="identity", aes(fill=region)) +
theme_few() +
scale_y_continuous(labels=percent) +
coord_flip() +
labs(x="Chapter", y="% of Unique Members who RSVPed at least once", fill="Region",
title="Yes RSVPs - Most Frequent, Ordered by # Members")
Overall RSVPs per Meetup per group (over time)
past_rsvps_month_curr <- past_rsvps %>%
filter(str_detect(local_date, "2017")) # TODO can we use lubridate here?
ggplot(past_rsvps_month_curr, aes(x=event_month, y=yes_rsvp_count)) +
geom_bar(stat="identity", position="dodge", aes(fill=group.localized_location), show.legend = FALSE) +
theme_few() +
facet_wrap(~ region, scales="free") +
labs(x="Event Month", y="# Yes RSVPs", fill="Region", title="Yes RSVPs by Chapter")
ggsave("png/rsvp_by_chapter.png")
## Saving 12 x 6 in image
ggplot(past_rsvps_month_curr %>% filter(region %in% c("US", "Canada")), aes(x=event_month, y=yes_rsvp_count)) +
geom_bar(stat="identity", position="dodge", aes(fill=group.localized_location)) +
theme_few() +
labs(x="Event Month", y="# Yes RSVPs", fill="Chapter", title="Yes RSVPs by Chapter (North America)") +
guides(fill=guide_legend(ncol = 1))
ggplot(past_rsvps_month_curr %>% filter(region=="Europe"), aes(x=event_month, y=yes_rsvp_count)) +
geom_bar(stat="identity", position="dodge", aes(fill=group.localized_location)) +
theme_few() +
labs(x="Event Month", y="# Yes RSVPs", fill="Chapter", title="Yes RSVPs by Chapter (Europe)")
ggplot(past_rsvps_month_curr %>% filter(region %in% c("Australia", "Asia", "Africa")), aes(x=event_month, y=yes_rsvp_count)) +
geom_bar(stat="identity", position="dodge", aes(fill=group.localized_location)) +
theme_few() +
labs(x="Event Month", y="# Yes RSVPs", fill="Chapter", title="Yes RSVPs by Chapter (APAC)")
ggplot(past_rsvps_month_curr %>% filter(region=="America"), aes(x=event_month, y=yes_rsvp_count)) +
geom_bar(stat="identity", position="dodge", aes(fill=group.localized_location)) +
theme_few() +
labs(x="Event Month", y="# Yes RSVPs", fill="Chapter", title="Yes RSVPs by Chapter (Latin America)")
Percent RSVPs - depends on total membership which only reports an overall total, not the total at the time of the event.
ggplot(past_rsvps_month_curr, aes(x=event_month, y=yes_rsvp_pct)) +
geom_bar(stat="identity", position="dodge", aes(fill=group.localized_location), show.legend = FALSE) +
facet_wrap(~ region) +
theme_few() +
labs(x="Event Month", y="% Members RSVPed", fill="Chapter", title="% Members RSVPs by Chapter")
ggsave("png/rsvp_pct_members.png")
## Saving 12 x 6 in image
ggplot(past_rsvps_month_curr %>% filter(region %in% c("US", "Canada")), aes(x=event_month, y=yes_rsvp_pct)) +
geom_bar(stat="identity", position="dodge", aes(fill=group.localized_location)) +
scale_y_continuous(labels=percent) +
theme_few() +
guides(fill=guide_legend(ncol=1)) +
labs(x="Event Month", y="% of Current Members RSVPed Yes", fill="Chapter", title="Yes RSVPs by Chapter (North America)")
ggplot(past_rsvps_month_curr %>% filter(region=="Europe"), aes(x=event_month, y=yes_rsvp_pct)) +
geom_bar(stat="identity", position="dodge", aes(fill=group.localized_location)) +
scale_y_continuous(labels=percent) +
theme_few() +
labs(x="Event Month", y="% of Current Members RSVPed Yes", fill="Chapter", title="Yes RSVPs by Chapter (Europe)")
ggplot(past_rsvps_month_curr %>% filter(region %in% c("Australia", "Asia", "Africa")), aes(x=event_month, y=yes_rsvp_pct)) +
geom_bar(stat="identity", position="dodge", aes(fill=group.localized_location)) +
scale_y_continuous(labels=percent) +
theme_few() +
labs(x="Event Month", y="% of Current Members RSVPed Yes", fill="Chapter", title="Yes RSVPs by Chapter (APAC)")
ggplot(past_rsvps_month_curr %>% filter(region=="America"), aes(x=event_month, y=yes_rsvp_pct)) +
geom_bar(stat="identity", position="dodge", aes(fill=group.localized_location)) +
scale_y_continuous(labels=percent) +
theme_few() +
labs(x="Event Month", y="% of Current Members RSVPed Yes", fill="Chapter", title="Yes RSVPs by Chapter (Latin America)")
What proportion of events has each member RSVP’ed to? How many members RSVP to more than half of the events?
We should also adjust this for event cadence: for example, how many events a meetup has per month, and how many members RSVP to at least one meetup per month.
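One possible shape for the cadence adjustment is sketched below. It is not part of the analysis that follows: the event_month column and the toy rows are hypothetical, and the real member_rsvps data would first need an event date to derive the month from.

```r
library(dplyr)

# Hypothetical RSVP rows: one row per member per event.
rsvps_demo <- data.frame(
  chapter     = c("A", "A", "A", "A", "A", "B", "B"),
  member.id   = c(1, 2, 1, 1, 3, 7, 8),
  event.id    = c("e1", "e1", "e2", "e3", "e3", "e9", "e9"),
  event_month = c("2017-01", "2017-01", "2017-01", "2017-02", "2017-02",
                  "2017-01", "2017-01"),
  stringsAsFactors = FALSE
)

# Per chapter and month: how many events were held, and how many distinct
# members RSVP'ed to at least one of them.
monthly_cadence <- rsvps_demo %>%
  group_by(chapter, event_month) %>%
  summarise(
    events_in_month = n_distinct(event.id),
    members_rsvped  = n_distinct(member.id)
  ) %>%
  ungroup()
```

Counting distinct members per month keeps a member who RSVPs to two events in the same month from being counted twice.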
meetup_events_past_merge <- readRDS(paste(params$rds_folder, "meetup_events_past_merge.Rds", sep="/"))
member_rsvps <- read.csv(paste(params$output_folder,"rladies_member_rsvps.csv", sep="/"))
meetup_events_past_merge <- meetup_events_past_merge %>%
group_by(group.name) %>%
mutate(num_events = n_distinct(id))
member_rsvps <- member_rsvps %>%
mutate(
chapter = ifelse(str_detect(group.name, "Warsaw"), "Warsaw",
str_replace(group.name, "R[ -]?[Ll]adies( )?(- )?", "")),
created=as.POSIXct(created/1000, tz="UTC", origin="1970-01-01"),
updated=as.POSIXct(updated/1000, tz="UTC", origin="1970-01-01"),
rsvp_guests=event.yes_rsvp_count-member_rsvps # current theory to explain discrepancy
) %>%
group_by(chapter) %>%
mutate(
num_events = n_distinct(event.id),
rsvps_unique = n_distinct(member.id),
total_rsvps = sum(member_rsvps)
)
# check events totals
# look for missing events from member_rsvps
num_events_check <- meetup_events_past_merge %>%
select(id, num_events, yes_rsvp_count) %>%
anti_join(member_rsvps, by=c("id" = "event.id")) %>%
filter(yes_rsvp_count > 0)
# This should be zero rows
num_events_check
## # A tibble: 0 x 4
## # Groups: group.name [0]
## # ... with 4 variables: group.name <chr>, id <chr>, num_events <int>,
## # yes_rsvp_count <int>
# proportion of events rsvp'ed to by members in list
member_rsvps_pct <- member_rsvps %>%
group_by(chapter, member.id) %>%
summarise(
num_rsvps=n(),
num_events=first(num_events),
rsvps_unique=first(rsvps_unique),
rsvps_all_pct=round(num_rsvps/first(total_rsvps),4),
rsvps_events_pct=round(num_rsvps/num_events, 2)
)
member_rsvps_pct_summary <- member_rsvps_pct %>%
group_by(chapter, num_rsvps) %>%
summarise(
num_members = n(),
rsvps_events_pct = first(rsvps_events_pct),
pct_rnd = round(rsvps_events_pct, 1),
rsvps_all_pct = first(rsvps_all_pct),
num_events = first(num_events),
num_events_log = round(log(num_events)),
rsvps_unique = first(rsvps_unique),
members_pct=round(num_members/rsvps_unique, 2)
)
How many events do members RSVP to? Below we use the proportion of each chapter's events, rather than raw counts, to make chapters comparable.
ggplot(member_rsvps_pct %>%
filter(num_events > 1) %>%
select(member.id, chapter, rsvps_events_pct), aes(x=rsvps_events_pct)) +
geom_density() +
xlab("Member RSVP Events %") +
theme_few() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_x_continuous(breaks = c(0, .25, .50, .75, 1), labels=percent) +
facet_wrap(~chapter, scales = "free")
When we think of members, we think of people who regularly attend R-Ladies events. One metric we could consider in addition to total members (people who have signed up) is “active members”.
The plot below uses the inverse of the rounded natural log of a chapter's event count as a scaled proportion to check RSVPs against. A group with a smaller number of events should expect its active members to have attended a higher proportion of those events (round(log(2)) = 1 -> 1/1 = 100%, round(log(42)) = 4 -> 1/4 = 25%). This method also filters out groups with only one event (log(1) = 0 -> 1/0 = Inf). This is more of a “throwing spaghetti at a wall” method to see if anything sticks. A better method to determine a true “active members” RSVP proportion should be explored.
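As a quick check of that scaling, the threshold can be tabulated for a few event counts. This is only a sketch; the event counts are arbitrary examples rather than values from the data.

```r
# Arbitrary example event counts, not taken from the Meetup data.
num_events <- c(1, 2, 5, 10, 42)

# Same transformation as num_events_log in the summary above:
# natural log, rounded to the nearest integer.
num_events_log <- round(log(num_events))

# Minimum share of events a member must have RSVP'ed to in order to count
# as "active". A single-event group gets 1/0 = Inf, so no member can pass.
threshold <- 1 / num_events_log

data.frame(num_events, num_events_log, threshold)
```

Note that 5 and 10 events land on the same 50% threshold because both logs round to 2, which is one reason the method is coarse.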
ggplot(member_rsvps_pct_summary %>%
filter(rsvps_events_pct >= 1/num_events_log),
aes(x=reorder(chapter, num_events), y=num_members)) +
geom_bar(aes(fill=reorder(paste0(round(rsvps_events_pct,1)*100, "%"), -rsvps_events_pct)),
stat="identity", position="stack") +
theme_few() +
labs(x="Chapter", y="# Members", fill="% Events RSVPed", title="Most Active Members by % Events RSVPed To") +
coord_flip()
ggsave("png/most_active_rsvp.png")
## Saving 12 x 6 in image
To better understand the significance of the “active” membership numbers computed above, we convert them into a proportion using the total number of members who have RSVPed. What proportion of all members who RSVPed to at least one meetup do “active members” represent?
ggplot(member_rsvps_pct_summary %>%
filter(rsvps_events_pct >= 1/num_events_log),
aes(x=reorder(chapter, num_events), y=members_pct)) +
geom_bar(aes(fill=reorder(paste0(round(rsvps_events_pct,1)*100, "%"), -rsvps_events_pct)),
stat="identity", position="stack") +
theme_few() +
scale_y_continuous(labels=percent) +
labs(x="Chapter", y="% RSVPed Members", fill="% Events", title="% Most Active Members by % Events RSVPed To") +
coord_flip()
What percent of total members have ever RSVP’ed to an event and how does that compare to the total membership of the group? In most cases, the majority of group members do not RSVP to events.
The goal here is to understand how total membership may not be representative of actual member engagement. RSVPs are also not necessarily indicative of who actually shows up to the events, but they are the best measure Meetup offers that is consistent across groups.
meetup_groups <- read_csv(paste(params$output_folder, "rladies_meetup_groups.csv", sep="/"))
meetup_groups <- meetup_groups %>%
mutate(
chapter = ifelse(str_detect(name, "Warsaw"), "Warsaw",
str_replace(name, "R[ -]?[Ll]adies( )?(- )?", ""))
)
meetup_rsvp_group_merge <- member_rsvps_pct_summary %>%
group_by(chapter) %>%
summarize(
rsvp_members = first(rsvps_unique),
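# each summary row covers num_members members with num_rsvps RSVPs apiece,
# so total RSVPs are weighted by num_members
rsvps = sum(num_rsvps * num_members),
num_events = first(num_events),
num_events_log = first(num_events_log),
active_members = sum(num_members[rsvps_events_pct >= 1/num_events_log]),
active_rsvps = sum((num_rsvps * num_members)[rsvps_events_pct >= 1/num_events_log]),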
single_members = sum(num_members[num_rsvps == 1])
) %>%
left_join(meetup_groups %>% select(chapter, members))
# percentage of members that RSVP'ed
# percentage of single rsvp members
# percentage of active members
# percentage of members who have never rsvped
meetup_rsvp_group_merge <- meetup_rsvp_group_merge %>%
mutate(
rsvp_members_pct = round(rsvp_members/members,2),
single_members_pct = round(single_members/members,2),
active_members_pct = round(active_members/members,2),
no_rsvp_members = members - rsvp_members,
no_rsvp_members_pct = round(no_rsvp_members/members,2),
rsvp_members_diff = no_rsvp_members - rsvp_members,
rsvp_members_diff_pct = round(rsvp_members_diff/members, 2)
)
rsvps_vs_members <- meetup_rsvp_group_merge %>%
select(chapter, rsvp_members_pct, single_members_pct, active_members_pct, no_rsvp_members_pct) %>%
melt() %>%
inner_join(meetup_rsvp_group_merge %>% select(chapter, members))
Do more members RSVP or not?
The plot below offers a side-by-side comparison of members who have RSVP’ed and members who have not, each as a proportion of total membership.
ggplot(rsvps_vs_members %>% filter(variable %in% c("rsvp_members_pct", "no_rsvp_members_pct")),
aes(x=reorder(chapter, members), y=value)) +
geom_bar(aes(fill=variable), stat="identity", position="dodge") +
theme_few() +
scale_y_continuous(labels = percent) +
labs(x="Chapter", y="% Members", fill="Member Type", title="% Members RSVP vs No RSVPs, side by side") +
coord_flip()
This plot creates a single value for comparison by taking the difference between the number of members who have never RSVP’ed to a Meetup event and the number who have RSVP’ed to at least one. A large positive difference indicates that the majority of members have not RSVP’ed to any event, so a high membership number may not reflect actual Meetup participation. A negative difference indicates that the majority of members have RSVP’ed, which suggests a high level of engagement; the reported membership number should then be a good indicator of Meetup participation. The difference is expressed as a percentage of total members so that it can be compared across Meetup groups of very different sizes.
ggplot(meetup_rsvp_group_merge, aes(x=reorder(chapter, members), y=rsvp_members_diff_pct)) +
geom_bar(aes(fill=chapter), stat="identity", position="dodge", show.legend = FALSE) +
theme_few() +
scale_y_continuous(labels=percent) +
labs(x="Chapter", y="No RSVP Members - RSVPed Members (Scaled via %)",
title="Members w/o RSVP vs RSVP Members - Ordered by # Members") +
coord_flip()
ggplot(meetup_rsvp_group_merge, aes(x=round(rsvp_members_diff_pct, 1))) +
geom_histogram(bins=12) +
theme_few() +
labs(x="No RSVP Members - RSVPed Members", title="Members w/o RSVP vs RSVP Members")
Because the majority of groups had a difference of around 40% or less, groups at or above that threshold are called out. If stratifying the groups based on this difference is useful, a threshold weighted by total membership should probably be considered.
ggplot(meetup_rsvp_group_merge %>% filter(rsvp_members_diff_pct >= .4),
aes(x=reorder(chapter, rsvp_members_diff_pct), y=rsvp_members_diff_pct)) +
geom_bar(aes(fill=chapter), stat="identity", position="dodge", show.legend = FALSE) +
theme_few() +
scale_y_continuous(labels = percent) +
labs(x="Chapter", y="No RSVP Members - RSVPed Members (Scaled)", title="RSVP vs No RSVP - Groups w/ Big Diffs") +
coord_flip()
# should we adjust to an acceptable threshold based on total members?
ggplot(meetup_rsvp_group_merge %>% filter(rsvp_members_diff_pct >= .4),
aes(x=reorder(chapter, rsvp_members_diff_pct), y=rsvp_members_diff)) +
geom_bar(aes(fill=chapter), stat="identity", position="dodge", show.legend = FALSE) +
theme_few() +
labs(x="Chapter", y="No RSVP Members - RSVPed Members (Actual)", title="RSVP vs No RSVP - Groups w/ Big Diffs") +
coord_flip()
The following plot compares members who have only RSVP’ed to a single event and those who have RSVP’ed to a significant number (determined above using total number of events the group has had). This comparison is probably more significant for groups that have had a large number of events. It could potentially be a Meetup health measure. If a significant number of members only RSVP once and then don’t RSVP again, it could be worth reaching out to the Meetup group to see if they need support.
ggplot(rsvps_vs_members %>% filter(variable %in% c("active_members_pct", "single_members_pct")),
aes(x=reorder(chapter, members), y=value)) +
geom_bar(aes(fill=variable), stat="identity", position="dodge") +
theme_few() +
scale_y_continuous(labels = percent) +
labs(x="Chapter", y="% Members", fill="Member Type", title="High RSVP Members vs Single Event RSVP Members") +
coord_flip()
This exploration went in a lot of different directions and touched on some interesting possibilities. Ultimately this work identified the following larger areas of “health”:
Identify R-Ladies Meetups that are not using the R-Ladies topics so we can ask them to add it to their Meetup.
Report of changes to group organization, specifically if an organizer or co-organizer steps down. When the last organizer steps down, Meetup offers the opportunity for anyone to step up as the new organizer.
Compare the “latest” meetup group list to last month’s to see if any groups disappeared. This also indicates how many new groups were created since the last report was run. This will be more relevant when these metrics are integrated into the dashboard and run at regular intervals.
How much time groups tend to have between meetup events.
How many events in a given time interval do groups tend to have? Month seems like a good starting point here.
A list/report of upcoming meetups.
R-Ladies Meetups that haven’t had a first event yet, ordered by time since creation from high to low. Groups at the top of the list should be contacted to see if they need help. This could also be helpful for identifying possibly abandoned meetups.
Groups that haven’t had a Meetup in a while. Also useful for identifying chapters that may be struggling and need support.
The membership at past points in time can be reconstructed using member profile info. Specifically, a cumsum of when members joined the group (the “joined” column in the meetup_members data frame).
Trends for total meetup group membership reported by binning meetup groups into appropriate ranges. This will give a better view of trends. Currently this uses a rounded natural log to create the bins but other methods should be explored further.
Members that RSVP to a significant proportion of events. This can be both historical and current. “Current” would focus on members that are “active” within a more recent time frame, and “historical” would be RSVP trends since the group began.
Proportion of members that are RSVP’ing to events. This should give a better idea of “true” membership by showing how many people are engaging with the group. Proportion of active members at the time of RSVP should be used for comparing across Meetup groups. The historical membership values would come from the “Membership Change Over Time” metric mentioned above.
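The membership reconstruction mentioned in the list above (a cumsum over join dates) could look roughly like the sketch below. The toy data frame is hypothetical, and “joined” is assumed to already be a Date; any timestamp from the API would need converting first.

```r
library(dplyr)

# Hypothetical stand-in for meetup_members: one row per member per chapter.
members_demo <- data.frame(
  chapter = c("A", "A", "A", "B", "B"),
  joined  = as.Date(c("2017-01-05", "2017-01-20", "2017-03-02",
                      "2017-02-11", "2017-02-12")),
  stringsAsFactors = FALSE
)

membership_over_time <- members_demo %>%
  mutate(joined_month = format(joined, "%Y-%m")) %>%
  count(chapter, joined_month) %>%          # new members per chapter per month
  arrange(chapter, joined_month) %>%        # cumsum needs chronological order
  group_by(chapter) %>%
  mutate(members_to_date = cumsum(n)) %>%   # running total within each chapter
  ungroup()
```

Months with no new members simply do not appear in the result. A running total built this way only counts people who are still on the member list, so it may understate past membership if members who left are no longer returned.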
Ultimately, the logic for these metrics should be supported via the MeetupR package. Rather than taking a whole-hog approach, it makes more sense to break the work into the sub-topic areas proposed in this report (there will be overlap). For isolated EDA per topic area, to further refine the proposed metrics and to integrate with MeetupR, the following order is proposed:
To track issues and milestones, see the following GitHub link: https://github.com/rladies-pdx/meetup-analysis