This dataset contains information on Starbucks and subsidiary store locations worldwide. It is provided and updated by Starbucks Co. and is available for free on Kaggle. Our goals are to count the number of countries in which the company operates and to create an interactive map of its store locations.
To import the dataset into R, call the ‘read.csv’ function. Other file types (e.g., .xlsx, .xls, .sav, .sas7bdat) can also be read into R, but they need a different reader function, typically from an add-on package such as readxl or haven, rather than ‘read.csv’ with a different extension. It is good practice to assign newly imported data to a variable. For this example, we will read the CSV file and assign it to the variable ‘starbucks’.
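The import call itself is not shown here, so the following is only a minimal sketch of what it might look like. The file name `directory.csv` is an assumption about how the Kaggle download was saved. So that the sketch runs without the real file, it falls back to a tiny made-up sample. `stringsAsFactors = TRUE` is used because, from R 4.0.0 onward, text columns are no longer converted to factors by default, and the per-level counts in the summary output depend on factor columns.

```r
# Hypothetical file name; adjust to wherever the Kaggle CSV was saved.
csv_path <- "directory.csv"

# Stand-in so the sketch runs without the real download: write a two-row
# sample (made-up values) with a few of the dataset's columns to a temp file.
if (!file.exists(csv_path)) {
  csv_path <- tempfile(fileext = ".csv")
  writeLines(c(
    "Brand,Store Name,Country,Longitude,Latitude",
    "Starbucks,Example Store A,US,-122.33,47.61",
    "Teavana,Example Store B,CA,-79.38,43.65"
  ), csv_path)
}

# stringsAsFactors = TRUE turns text columns into factors, which is what
# makes summary() print a count per level. read.csv also converts header
# names such as "Store Name" into syntactic names such as Store.Name.
starbucks <- read.csv(csv_path, stringsAsFactors = TRUE)

# Quick look at the column names and types.
str(starbucks)
```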
Below, the ‘summary’ function is called to display summary statistics. First, call it on the entire ‘starbucks’ data set and try to count how many countries the company operates in.
summary(starbucks)
##                    Brand          Store.Number
##  Coffee House Holdings:    1   19773-160973:    2
##  Evolution Fresh      :    2   10001-99525 :    1
##  Starbucks            :25249   10003-98585 :    1
##  Teavana              :  348   10004-98481 :    1
##                                10005-97691 :    1
##                                10008-98261 :    1
##                                (Other)     :25593
##                  Store.Name          Ownership.Type
##  Starbucks            :  224   Company Owned:11932
##  SPA                  :    6   Franchise    :  317
##  CentrO Ground Floor  :    2   Joint Venture: 3976
##  Division del Norte   :    2   Licensed     : 9375
##  Lemessos Enaerios    :    2
##  Mabohai Shopping Mall:    2
##  (Other)              :25362
##                             Street.Address         City
##  Circular Building #6, Guard Post 8:   11   上海市  :  542
##  2580 S 156th St                   :    7   Seoul   :  243
##  201 World Way                     :    5   北京市  :  234
##  3301 S 22nd Ave                   :    5   New York:  232
##  4100 George J Bean Pkwy           :    5   London  :  216
##  5757 Paradise Rd.                 :    5   (Other) :24132
##  (Other)                           :25562   NA's    :    1
##  State.Province    Country         Postcode         Phone.Number
##  CA     : 2821   US     :13608          : 1521                  : 6861
##  TX     : 1042   CN     : 2734   0      :  101   773-686-6180   :   17
##  ENG    :  787   CA     : 1468   310000 :   88   0              :    7
##  WA     :  757   JP     : 1237   518000 :   70   4167763100     :    7
##  11     :  706   KR     :  993   215000 :   64   704-359-4512   :    6
##  FL     :  694   GB     :  901   (Other):23755   00351 211 147 6:    4
##  (Other):18793   (Other): 4659   NA's   :    1   (Other)        :18698
##                           Timezone      Longitude          Latitude
##  GMT-05:00 America/New_York   :4889   Min.   :-159.46   Min.   :-46.41
##  GMT-08:00 America/Los_Angeles:4194   1st Qu.:-104.67   1st Qu.: 31.24
##  GMT-06:00 America/Chicago    :2901   Median : -79.35   Median : 36.75
##  GMT+08:00 Asia/Beijing       :2731   Mean   : -27.87   Mean   : 34.79
##  GMT+09:00 Asia/Tokyo         :1237   3rd Qu.: 100.63   3rd Qu.: 41.57
##  GMT+09:00 Asia/Seoul         : 993   Max.   : 176.92   Max.   : 64.85
##  (Other)                      :8655   NA's   :1         NA's   :1
Notice the problem? For the Country column, the summary reports only the six most frequent countries and groups the rest under ‘(Other)’. To overcome this, we must be more specific about the data we want to analyze. So, how many countries is the brand in?
summary(starbucks$Country)
##    AD    AE    AR    AT    AU    AW    AZ    BE    BG    BH    BN    BO
##     1   144   108    18    22     3     4    19     5    21     5     4
##    BR    BS    CA    CH    CL    CN    CO    CR    CW    CY    CZ    DE
##   102    10  1468    61    96  2734    11    11     3    10    28   160
##    DK    EG    ES    FI    FR    GB    GR    GT    HU    ID    IE    IN
##    21    31   101     8   132   901    28     7    16   268    73    88
##    JO    JP    KH    KR    KW    KZ    LB    LU    MA    MC    MX    MY
##    17  1237     4   993   106     8    29     2     9     2   579   234
##    NL    NO    NZ    OM    PA    PE    PH    PL    PR    PT    QA    RO
##    59    17    24    12     5    89   298    53    24    11    18    27
##    RU    SA    SE    SG    SK    SV    TH    TR    TT    TW    US    VN
##   109   102    18   130     3    11   289   326     3   394 13608    25
##    ZA
##     3
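The difference between the two summaries comes from the `maxsum` argument of `summary`. The data-frame method defaults to 7 entries per factor column (six levels plus ‘(Other)’), while the factor method defaults to 100. The sketch below illustrates this on a small made-up factor standing in for `starbucks$Country`; the object names `toy` and `full_counts` are hypothetical.

```r
# Toy factor with 10 levels, standing in for starbucks$Country (made-up data).
country <- factor(rep(LETTERS[1:10], times = 10:1))
toy <- data.frame(Country = country)

# Data-frame method: default maxsum = 7, so only the 6 most frequent levels
# are listed and the remaining ones are lumped into "(Other)".
summary(toy)

# Raising maxsum lists every level inside the data-frame summary.
summary(toy, maxsum = 20)

# Factor method: default maxsum = 100, so all 10 levels appear.
full_counts <- summary(toy$Country)
full_counts
```

On the real data, something like `summary(starbucks, maxsum = 100)` would be expected to list every country, at the cost of a much longer printout for the other factor columns.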
The following code reduces the clutter by listing each distinct country code once. The last line of the output reports 73 levels, which is the number of distinct countries.
unique(starbucks$Country)
## [1] AD AE AR AT AU AW AZ BE BG BH BN BO BR BS CA CH CL CN CO CR CW CY CZ
## [24] DE DK EG ES FI FR GB GR GT HU ID IE IN JO JP KH KR KW KZ LB LU MA MC
## [47] MX MY NL NO NZ OM PA PE PH PL PR PT QA RO RU SA SE SG SK SV TH TR TT
## [70] TW US VN ZA
## 73 Levels: AD AE AR AT AU AW AZ BE BG BH BN BO BR BS CA CH CL CN CO ... ZA
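`unique` lists the values rather than counting them, so the count has to be read off the printout. To get the number directly, the result can be wrapped in `length`, or `nlevels` can be used on the factor. A minimal sketch on made-up values (the variable names are hypothetical):

```r
# Toy stand-in for starbucks$Country (made-up values).
country <- factor(c("US", "CN", "US", "CA", "JP", "CN", "US"))

# Number of distinct values actually present in the column.
n_distinct <- length(unique(country))

# Number of factor levels; matches n_distinct unless some levels are unused
# or the column contains NA values.
n_levels <- nlevels(country)

n_distinct
n_levels
```

Applied to the real data, `length(unique(starbucks$Country))` should agree with the 73 levels reported above.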
To achieve our goal of creating a visual map of the locations, we need to load the leaflet package.
library(leaflet)
Next, let's assemble the text that will appear in each marker's ‘popup’ on our map.
# Build one HTML popup string per store: name, address, and phone number
df <- paste("Name:", starbucks$Store.Name, "<br>",
            "Address:", paste(starbucks$Street.Address, paste0(starbucks$City, ","),
                              starbucks$Country, starbucks$Postcode), "<br>",
            "Phone:", starbucks$Phone.Number)
Now, let's create our interactive map with leaflet. Note that our newly created df is passed as the popup argument.
If done correctly, your count of distinct countries should be 73. Additionally, the following code should generate the interactive map.
starbucks %>%
  leaflet() %>%
  addTiles() %>%
  addMarkers(popup = df, clusterOptions = markerClusterOptions())
## Assuming "Longitude" and "Latitude" are longitude and latitude, respectively
## Warning in validateCoords(lng, lat, funcName): Data contains 1 rows with
## either missing or invalid lat/lon values and will be ignored
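The message and warning above arise because leaflet guesses the coordinate columns by name and because one row has no coordinates. One possible way to silence both, sketched here on made-up data, is to drop rows with missing coordinates first and to name the columns explicitly. The names `toy`, `has_coords`, and `mappable` are hypothetical, and the map is only drawn if the leaflet package is installed.

```r
# Hypothetical stand-in: three stores, one with missing coordinates.
toy <- data.frame(
  Store.Name = c("Example Store A", "Example Store B", "Example Store C"),
  Longitude  = c(-122.33, NA, -79.38),
  Latitude   = c(47.61, NA, 43.65)
)

# Keep only rows where both coordinates are present.
has_coords <- !is.na(toy$Longitude) & !is.na(toy$Latitude)
mappable <- toy[has_coords, ]
nrow(mappable)

# Draw the map only when leaflet is available. Passing lng and lat
# explicitly means leaflet does not have to guess the coordinate columns.
if (requireNamespace("leaflet", quietly = TRUE)) {
  library(leaflet)
  m <- leaflet(mappable) %>%
    addTiles() %>%
    addMarkers(lng = ~Longitude, lat = ~Latitude,
               popup = ~Store.Name,
               clusterOptions = markerClusterOptions())
  m
}
```

On the real data, a separately built popup vector such as df would need the same filter (for example `df[has_coords]`) so that each popup still lines up with its marker.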