I would like to begin by giving a shout-out to Jeff Gentry (twitteR package author), Hadley Wickham & David Kahle (ggmap authors) & last but not least Lucas Puente, who inspired me to write this paper…thanks guys!
So what is Digital Marketing? I am sure most people have heard of this term one way or another, especially with the boom of social media. A quick Google search yields the following Wikipedia definition:
“Digital marketing is an umbrella term for the marketing of products or services using digital technologies, mainly on the Internet, but also including mobile phones, display advertising, and any other digital medium…”
Is your business or company's marketing department taking advantage of this inexpensive opportunity that could potentially change and ramp up your market share? Or is your strategy still operating on a dinosaur platform in an IoT age? If so, it's time to convince your leadership team to adopt dynamic, inexpensive & robust digital methods of marketing. After all, your business has everything to gain & nothing to lose, unless terms like revenue growth or expansion do not interest you or your company - and I am certain your leadership spends days or perhaps weeks on end devising plans to expand the company's business portfolio.

Social media platforms like Twitter, Facebook, LinkedIn, Google+ etc. have provided inexpensive ways for businesses to easily & efficiently promote their product brand or services, all at the click of a button! I can make an inference here that sign spinners, sign rockers or human directionals (I don't know what titles the advertising industry has given them) will all one day disappear, as consumers would rather see ads on their electronic gadgets than hope to catch the right sign off the road. OK, maybe that was a wild inference, but you get the point. My point is, digital display advertising through social media platforms has become a rapidly growing marketing culture due to its various benefits, some of which I mentioned in my previous publication.
So then how can your business tap into social media marketing? In this blog post I will demonstrate how to do so using Twitter in R. As always, I like to do this in a reproducible fashion so you are able to use the code shown here, with minor modifications, to suit your needs.
I randomly picked a Missouri-based wine company (St James Winery) for the demonstration. Let's say the marketing department would like to allocate marketing resources more efficiently to their domestic (US) market base, & in order to do that, they would like to identify which locations their existing & potential customers are in. Customers or potential customers here literally means the company's Twitter followers. Even though this assumption may not be the best, it has previously proved to be highly correlated with customer data captured internally by some companies. Even if there were not much correlation, I think you'd agree that people who follow your Twitter account represent a subset of the population with a higher likelihood of converting into your customers. So why not focus your marketing campaign on potential customers, right?
As shown in my previous paper Web Scraping, Text Mining & Sentiment Analysis, you need to establish a connection to the Twitter API in order to access Twitter data. In case you have not done so, start by creating a Twitter account, and under the developer's console create your application & obtain your access info.
library(twitteR)
consumer_key <- "Enter your consumer key here"
consumer_secret <- "Enter your consumer secret here"
access_token <- "Enter your access token here"
access_secret <- "Enter your access secret here"
#Cache the OAuth token so you are not prompted to re-authorize every session
options(httr_oauth_cache=T)
setup_twitter_oauth(consumer_key,consumer_secret,access_token,access_secret)
## [1] "Using direct authentication"
After you've established the connection, you're ready to extract your business/company's followers data! In this step you need to tell Twitter which specific user you want, which is St James Winery in this case. After that's done, you can proceed to extract the user's followers in a few lines of code.
Stjameswinery <- getUser("stjameswinery")
#See this user's location:
location(Stjameswinery)
## [1] "540 State Rte B, St James, MO "
#Download data passing retryOnRateLimit arg to avoid twitter API imposing data pull limit
Stjameswinery_follower_IDs <- Stjameswinery$getFollowers(retryOnRateLimit=1500)
length(Stjameswinery_follower_IDs)
## [1] 2326
At the time of data extraction, St James Winery had 2349 followers, so the output above is fairly close, with a few misses.
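As a quick sanity check, you can compare the follower count Twitter itself reports for the user against the number of follower objects actually pulled. A minimal sketch, assuming the twitteR user object exposes its followersCount field:

#Compare Twitter's reported follower count to the number of follower objects pulled
Stjameswinery$followersCount
length(Stjameswinery_follower_IDs)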
We have just extracted a bunch of text data; we need to manipulate, cleanse and transform it into a usable form. In this step I will inspect my data &, as in most of my analysis processes, I like to check for missing data at this point. I am particularly interested in identifying missing locations, as these would not help in geocoding those followers. As a preliminary step I will transform the data to a data frame, a form that's more intuitive to me.
library(data.table)
#Transform to DF
Stjameswinery_followers_df = rbindlist(lapply(Stjameswinery_follower_IDs,as.data.frame))
#Perform a quick check
head(Stjameswinery_followers_df$location, 10)
## [1] "Pacific, MO" "Brisbane, Queensland" "Somewhere in Europe"
## [4] "" "Los Angeles, CA" "USA"
## [7] "Ogden, UT" "Hannibal, MO" "USA"
## [10] "St. Louis MO"
#Remove rows with empty location
Stjameswinery_followers_df<-subset(Stjameswinery_followers_df, location!="")
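Since I mentioned checking for missing data, here is a quick way to quantify how many followers were dropped for having a blank location, using the objects created above:

#Number of followers dropped because they left the location field blank
length(Stjameswinery_follower_IDs) - nrow(Stjameswinery_followers_df)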
Geocoding is the process of converting addresses into geographic coordinates, which can then be projected onto a map. I geocode the bio-reported addresses into coordinates which I will project onto the map.
Note: I will admit at this point that part of the reason I chose to go with St James Winery is that the company's Twitter account had fewer than 2,500 followers! If you will be extracting a user's followers exceeding 2,500, you will need to register with Google. Presently, Google charges $0.50 per 1,000 records extracted after you've exceeded the free 2,500 records. And so for the case exceeding the threshold, Lucas Puente has modified the original geocode() function to include an api_key = argument to associate the Google API user (you) with your account. However, in my case here the original function will suffice.
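For completeness, if you do exceed the free quota, one option (besides Lucas Puente's modified function) is to register your Google API key with ggmap itself. This is only a minimal sketch and assumes a newer ggmap release that provides register_google(); the key string is a placeholder:

library(ggmap)
#Assumes ggmap >= 2.7, which provides register_google(); replace the placeholder with your key
register_google(key = "Enter your Google API key here")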
library(ggmap)
geocode_apply <- function(x){
geocode(x, source = "google", output = "all")
}
geocode_results <- sapply(Stjameswinery_followers_df$location, geocode_apply, simplify = F)
length(geocode_results)
## [1] 1726
Not every address is accurately geocoded. If you take a look at some Twitter bios, some users populate the location section with entries that are not really locations! For example, you may find "Lou" for St. Louis, MO & so forth. The Google API attempts to estimate the longitude and latitude for such addresses. To our rescue, however, the API outputs a success message for each successfully geocoded address and gives it an "OK" status. So I will filter out the geocoded addresses whose status is not "OK"…thanks Lucas!
condition_1 <- sapply(geocode_results, function(x) x["status"]=="OK")
geocode_results<-geocode_results[condition_1]
#Only keep locations with one match
condition_2 <- lapply(geocode_results, lapply, length)
condition_3<-sapply(condition_2, function(x) x["results"]=="1")
geocode_results<-geocode_results[condition_3]
#Check the number of successfully geocoded locations
length(geocode_results)
## [1] 1541
There has been a fairly significant loss of data in the cleansing & transformation processes above. This means not every data point or geocoded address will be projected onto the map in the visualization. But the cleansing is necessary if we want to project accurate locations and generally perform analysis on accurate representations of the data. Now we are ready to project the locations onto the map. Remember, at this point our primary objective is focused on the domestic market, so I will filter out non-US locations. To start with, I will turn the data back into a data frame, then filter out non-US locations & finally project the remaining locations onto the US map.
results_1 <-lapply(geocode_results, as.data.frame)
results_2 <-lapply(results_1,function(x) subset(x, select=c("results.formatted_address",
"results.geometry.location.lng",
"results.geometry.location.lat")))
#Format these new data frames:
results_3 <-lapply(results_2,function(x) data.frame(Location=x[1,"results.formatted_address"],
lat=x[1,"results.geometry.location.lat"],
lng=x[1,"results.geometry.location.lng"]))
#Bind these data frames together:
results_4 <-rbindlist(results_3)
#Add info on the original (i.e. user-provided) location string:
results_5 <-results_4[,Original_Location := names(results_3)]
#Only keep American results:
american_results<-subset(results_5,
grepl(", USA", results_5$Location)==TRUE)
head(american_results,10)
## Location lat lng Original_Location
## 1: Pacific, MO 63069, USA 38.48200 -90.74152 Pacific, MO
## 2: Los Angeles, CA, USA 34.05223 -118.24368 Los Angeles, CA
## 3: Ogden, UT, USA 41.22300 -111.97383 Ogden, UT
## 4: Hannibal, MO 63401, USA 39.70838 -91.35848 Hannibal, MO
## 5: St. Louis, MO, USA 38.62700 -90.19940 St. Louis MO
## 6: Norman, OK, USA 35.22257 -97.43948 Norman, OK
## 7: Maryland, USA 39.04575 -76.64127 Maryland
## 8: St. Louis, MO, USA 38.62700 -90.19940 St Louis, MO
## 9: Licking, MO 65542, USA 37.49949 -91.85710 Licking, Missouri
## 10: St Robert, MO, USA 37.82810 -92.17767 St Robert, MO
#Remove entries that are too vague:
american_results$commas<-sapply(american_results$Location, function(x)
length(as.numeric(gregexpr(",", as.character(x))[[1]])))
american_results<-subset(american_results, commas==2)
#Drop the "commas" column:
american_results<-subset(american_results, select=-commas)
#Check number of successes:
nrow(american_results)
## [1] 1155
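For context, you can also count how many successfully geocoded followers fall outside the US, i.e. the complement of the filter applied above:

#Number of geocoded followers whose formatted address is not in the US
sum(!grepl(", USA", as.character(results_5$Location)))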
There are numerous projection methods you can utilize; after assessing different ones, I found that the Albers projection yielded the best-looking result. I will start by loading some necessary packages in order to perform the projection.
#Helper function: install any packages that are missing, then load them all
ipak <- function(pkg){
new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
if (length(new.pkg))
install.packages(new.pkg, dependencies = TRUE)
sapply(pkg, require, character.only = TRUE)
}
packages <- c("maps", "mapproj", "splancs")
ipak(packages)
## maps mapproj splancs
## TRUE TRUE TRUE
#Draw the US state map using an Albers projection
albers_proj<-map("state", proj="albers", param=c(39, 45), col="#999999", fill=FALSE, bg=NA, lwd=0.2, add=FALSE, resolution=1)
#Overlay the followers' geocoded locations as semi-transparent points
points(mapproject(american_results$lng, american_results$lat), col=NA, bg="#00000030",pch=21, cex=0.5)
#Add a title
mtext("Map of Stjameswinery's Followers", side = 3, line = -3.5, outer = T, cex=1.5, font=3)
As I expected, the map shows the highest concentration of subscribers around Missouri. The concentration decreases the further away you move from Missouri, which makes sense considering St James Winery is located in Missouri. The company also seems to have a stronger customer base in the Northeast of the US compared to the West & South regions, and you will also notice some states with zero subscribers. This visualization would be very helpful in prioritizing marketing campaigns as the company seeks to expand its market horizons & establish its wine brands across the US. Of course the outcome here is somewhat underestimated due to the loss of data encountered during the cleansing process, but it is a great & informative visualization. It would also be very beneficial for someone from St James Winery to validate these results using their internal customer data.
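If you would like a quick tabular complement to the map, here is a minimal sketch that tallies the US followers by state abbreviation. It assumes the two-comma address formats kept above ("City, ST, USA" or "City, ST zip, USA"):

#Split each formatted address on commas; the second piece starts with the state abbreviation
loc_parts <- strsplit(as.character(american_results$Location), ",\\s*")
states <- sapply(loc_parts, function(x) substr(x[2], 1, 2))
#Top ten states by follower count
head(sort(table(states), decreasing = TRUE), 10)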
I have demonstrated how your business or company can use social media to gain actionable customer insights. If you found this blog post helpful, or have any questions or comments, please don't hesitate to contact me via Twitter [@PunkAnalytics24](https://twitter.com/PunkAnalytics24).
Thanks & Good Luck!