Introduction

This dataset contains information regarding Starbucks and subsidiary store locations worldwide. This dataset is provided and updated by Starbucks Co. and is available for free on Kaggle. Our goals are to complete the following:

  1. Answer the question ‘How many distinct (unique) countries is Starbucks in?’
  2. Create a visual representation (map) of the locations worldwide using R.

Data Preparation

To import your dataset into R, call the ‘read.csv’ function. Other file types (e.g., .xlsx, .xls, .sav, .sas7bdat) can also be read into R, but not by changing the extension in ‘read.csv’: each needs its own reader function, typically from an add-on package such as readxl (‘read_excel’) or haven (‘read_sav’, ‘read_sas’). It is good practice to assign your newly imported data to a variable. For this example, we will read the CSV file in and assign it to the variable ‘starbucks.’
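The import step can be sketched as follows. This is a minimal illustration, not the original chunk: the file here is a tiny hypothetical CSV written to a temporary path so the code runs on its own, and the store names and rows are made up. With the real data you would pass the path of the file downloaded from Kaggle instead.

```r
# Write a tiny hypothetical CSV so the sketch is self-contained;
# with the real data, point read.csv at the downloaded Kaggle file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Brand,Store.Name,Country,Longitude,Latitude",
  "Starbucks,Example Store A,US,-122.33,47.61",
  "Starbucks,Example Store B,CA,-79.38,43.65",
  "Teavana,Example Store C,US,-73.99,40.75"
), csv_path)

# stringsAsFactors = TRUE turns text columns into factors, which is what the
# factor-style summaries in this walkthrough rely on. Since R 4.0.0 the
# default is FALSE, so it is set explicitly here.
starbucks <- read.csv(csv_path, stringsAsFactors = TRUE)

# Inspect the column types of the imported data frame
str(starbucks)
```

Setting stringsAsFactors explicitly is a design choice made here so the later calls to summary and unique behave the same way on any R version.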

Analyzing the Data

Below, the summary function is called to display summary statistics. First, call summary on the entire ‘starbucks’ data frame and try to count how many countries the company is in.

summary(starbucks)
##                    Brand             Store.Number  
##  Coffee House Holdings:    1   19773-160973:    2  
##  Evolution Fresh      :    2   10001-99525 :    1  
##  Starbucks            :25249   10003-98585 :    1  
##  Teavana              :  348   10004-98481 :    1  
##                                10005-97691 :    1  
##                                10008-98261 :    1  
##                                (Other)     :25593  
##                  Store.Name          Ownership.Type 
##  Starbucks            :  224   Company Owned:11932  
##  SPA                  :    6   Franchise    :  317  
##  CentrO Ground Floor  :    2   Joint Venture: 3976  
##  Division del Norte   :    2   Licensed     : 9375  
##  Lemessos Enaerios    :    2                        
##  Mabohai Shopping Mall:    2                        
##  (Other)              :25362                        
##                             Street.Address         City      
##  Circular Building #6, Guard Post 8:   11   上海市:  542  
##  2580 S 156th St                   :    7   Seoul    :  243  
##  201 World Way                     :    5   北京市:  234  
##  3301 S 22nd Ave                   :    5   New York :  232  
##  4100 George J Bean Pkwy           :    5   London   :  216  
##  5757 Paradise Rd.                 :    5   (Other)  :24132  
##  (Other)                           :25562   NA's     :    1  
##  State.Province     Country         Postcode              Phone.Number  
##  CA     : 2821   US     :13608          : 1521                  : 6861  
##  TX     : 1042   CN     : 2734   0      :  101   773-686-6180   :   17  
##  ENG    :  787   CA     : 1468   310000 :   88   0              :    7  
##  WA     :  757   JP     : 1237   518000 :   70   4167763100     :    7  
##  11     :  706   KR     :  993   215000 :   64   704-359-4512   :    6  
##  FL     :  694   GB     :  901   (Other):23755   00351 211 147 6:    4  
##  (Other):18793   (Other): 4659   NA's   :    1   (Other)        :18698  
##                           Timezone      Longitude          Latitude     
##  GMT-05:00 America/New_York   :4889   Min.   :-159.46   Min.   :-46.41  
##  GMT-08:00 America/Los_Angeles:4194   1st Qu.:-104.67   1st Qu.: 31.24  
##  GMT-06:00 America/Chicago    :2901   Median : -79.35   Median : 36.75  
##  GMT+08:00 Asia/Beijing       :2731   Mean   : -27.87   Mean   : 34.79  
##  GMT+09:00 Asia/Tokyo         :1237   3rd Qu.: 100.63   3rd Qu.: 41.57  
##  GMT+09:00 Asia/Seoul         : 993   Max.   : 176.92   Max.   : 64.85  
##  (Other)                      :8655   NA's   :1         NA's   :1

Notice the problem? The summary output only reports the top 6 countries and groups the rest of the data under ‘(Other).’ To overcome this, we must be more specific about the data we want to analyze. Now, how many countries is the brand in?

summary(starbucks$Country)
##    AD    AE    AR    AT    AU    AW    AZ    BE    BG    BH    BN    BO 
##     1   144   108    18    22     3     4    19     5    21     5     4 
##    BR    BS    CA    CH    CL    CN    CO    CR    CW    CY    CZ    DE 
##   102    10  1468    61    96  2734    11    11     3    10    28   160 
##    DK    EG    ES    FI    FR    GB    GR    GT    HU    ID    IE    IN 
##    21    31   101     8   132   901    28     7    16   268    73    88 
##    JO    JP    KH    KR    KW    KZ    LB    LU    MA    MC    MX    MY 
##    17  1237     4   993   106     8    29     2     9     2   579   234 
##    NL    NO    NZ    OM    PA    PE    PH    PL    PR    PT    QA    RO 
##    59    17    24    12     5    89   298    53    24    11    18    27 
##    RU    SA    SE    SG    SK    SV    TH    TR    TT    TW    US    VN 
##   109   102    18   130     3    11   289   326     3   394 13608    25 
##    ZA 
##     3
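The difference between the two outputs comes from the maxsum argument of summary: the data frame method defaults to 7 (six levels plus ‘(Other)’), while the factor method defaults to 100. The sketch below shows the effect on a small hypothetical factor standing in for starbucks$Country; on the real data, something like summary(starbucks, maxsum = 80) should list every country in the data frame summary as well.

```r
# Hypothetical toy factor standing in for starbucks$Country
country <- factor(c("US", "US", "US", "CN", "CN", "CA", "JP", "KR"))

# With a small cap, only the most frequent levels are kept and the
# remainder is pooled into "(Other)"
capped <- summary(country, maxsum = 3)
print(capped)

# With a cap larger than the number of levels, every level is listed
full <- summary(country, maxsum = 100)
print(full)
```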

Shortcut:

The following code reduces the clutter and reports the number of unique or distinct countries on the last line of its output (‘73 Levels’).

unique(starbucks$Country)
##  [1] AD AE AR AT AU AW AZ BE BG BH BN BO BR BS CA CH CL CN CO CR CW CY CZ
## [24] DE DK EG ES FI FR GB GR GT HU ID IE IN JO JP KH KR KW KZ LB LU MA MC
## [47] MX MY NL NO NZ OM PA PE PH PL PR PT QA RO RU SA SE SG SK SV TH TR TT
## [70] TW US VN ZA
## 73 Levels: AD AE AR AT AU AW AZ BE BG BH BN BO BR BS CA CH CL CN CO ... ZA
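To get the count as a single number rather than reading it off the ‘Levels’ line, one option is to wrap unique in length. The sketch below uses a small hypothetical factor in place of starbucks$Country. It also shows a caveat: the ‘Levels’ count reflects the factor's defined levels, which can exceed the number of values actually present after subsetting.

```r
# Hypothetical toy factor standing in for starbucks$Country
country <- factor(c("US", "US", "CN", "CA", "CN", "JP"))

# Number of distinct values actually present
n_unique <- length(unique(country))

# Number of defined factor levels (matches n_unique for freshly read data)
n_levels <- nlevels(country)

# After subsetting, unused levels are kept unless explicitly dropped
no_jp <- country[country != "JP"]
n_levels_stale <- nlevels(no_jp)              # still counts "JP"
n_unique_subset <- length(unique(no_jp))      # counts only values present
n_levels_clean <- nlevels(droplevels(no_jp))  # unused level removed

print(c(n_unique, n_levels, n_levels_stale, n_unique_subset, n_levels_clean))
```

On the real data, length(unique(starbucks$Country)) would be the equivalent call.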

Creating the Map

To achieve our goal of creating a visual map of the locations, we will need to load the leaflet package.

library(leaflet)

Next, let's organize the data that will appear in the ‘popup’ of our map.

# Build one HTML popup string per store (df is a character vector, not a data frame)
df <- paste("Name:", starbucks$Store.Name, "<br>",
            "Address:", paste(starbucks$Street.Address, paste0(starbucks$City, ","),
                              starbucks$Country, starbucks$Postcode), "<br>",
            "Phone:", starbucks$Phone.Number)

Now, let's create our interactive map, passing our newly created df as the popup argument. The code is listed under Results; running it prints the following message and warning.

## Assuming "Longitude" and "Latitude" are longitude and latitude, respectively
## Warning in validateCoords(lng, lat, funcName): Data contains 1 rows with
## either missing or invalid lat/lon values and will be ignored

Results

If done correctly, counting the number of countries should give 73. Additionally, you should have generated the following map.

starbucks %>%
  leaflet() %>%
  addTiles() %>%
  addMarkers(popup=df, clusterOptions=markerClusterOptions())
## Assuming "Longitude" and "Latitude" are longitude and latitude, respectively
## Warning in validateCoords(lng, lat, funcName): Data contains 1 rows with
## either missing or invalid lat/lon values and will be ignored
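The message and warning above appear because leaflet guesses the coordinate columns from their names and skips the one row with missing coordinates. A possible variation, sketched below on a small hypothetical data frame (the store names and rows are made up), names the columns explicitly and filters out incomplete rows first, which should avoid both. The popup text is stored as a column so it stays aligned with the filtered rows.

```r
library(leaflet)

# Hypothetical toy rows standing in for the starbucks data frame
stores <- data.frame(
  Store.Name = c("Example A", "Example B", "Example C"),
  Longitude  = c(-122.33, -79.38, NA),
  Latitude   = c(47.61, 43.65, NA)
)

# Keep only rows where both coordinates are present
has_coords <- !is.na(stores$Longitude) & !is.na(stores$Latitude)
mapped <- stores[has_coords, ]

# Build the popup text from the filtered rows so each marker gets its own label
mapped$popup <- paste("Name:", mapped$Store.Name)

# Name the coordinate columns explicitly instead of letting leaflet guess them
store_map <- leaflet(mapped) %>%
  addTiles() %>%
  addMarkers(lng = ~Longitude, lat = ~Latitude, popup = ~popup,
             clusterOptions = markerClusterOptions())

store_map
```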