Available from https://rpubs.com/staszkiewicz/EX_7_NA_EN

Introduction

We will try to find out where banks are physically located in the US.

For this we need the addresses of the banks (mainly headquarters) and information about their geographical location in the form of longitude and latitude (geocoding).

Spatial imaging involves assigning an object a place in space. Of the many packages available for geocoding, we will use tmaptools.

#rm(list = ls()) # in case the environment needs to be cleared
#install.packages("tmaptools") # if installing the library for the first time

library(tmaptools) # load the package
## Warning: package 'tmaptools' was built under R version 4.1.3

As usual, we will use primary data from the article “Audit Fee and Banks Communication Sentiment”, Economic Research-Ekonomska Istraživanja, https://doi.org/10.1080/1331677X.2021.1985567 (Staszkiewicz and Karkowska 2021).

# using base functions we read an ordinary csv file, but in such a way
#   that we pick the location of the file on our own computer from a dialog window,
# which is why we nest the call "file.choose()"

bank <-  read.csv(file.choose())

We restrict the data to observations for the year ended 2016.

bank2016<- bank[bank$Year_Ended == 2016,]

Note that we need a comma to indicate that we are selecting all columns.
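
The role of the comma can be checked on a small made-up data frame. This is only a sketch: the column name Year_Ended mirrors the source, while the object names toy and toy2016 and all values are invented for illustration.

```r
# Toy data frame; values are made up, only the column name follows the source
toy <- data.frame(
  Name = c("A", "B", "C", "D"),
  Year_Ended = c(2015, 2016, 2016, 2017),
  stringsAsFactors = FALSE
)

# Condition before the comma selects rows; nothing after the comma means all columns
toy2016 <- toy[toy$Year_Ended == 2016, ]

dim(toy)      # rows and columns before filtering
dim(toy2016)  # fewer rows, same number of columns

# If Year_Ended may contain NA, wrapping the condition in which() avoids NA rows
toy2016_safe <- toy[which(toy$Year_Ended == 2016), ]
```

Comparing dim() before and after filtering is a quick way to confirm that only the number of rows changed.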

Let’s compare the two objects, bank and bank2016: we have 197 variables in both data frames, but we have reduced the number of observations (in this case banks) from 359.

To fix ideas, let’s try to geocode one address: that of the first bank in our new data frame, i.e. bank2016.

View(bank2016[1,])

What we can do is to combine the individual elements of the address into one object and assign longitude and latitude to such an address.

  1. Combine the address.
ASB<-bank2016[1,] # define the data on the bank

  2. In this single-row object the address data are in columns 7 through 14, but for our purposes the relevant elements are the state name, region and postal code, i.e. items 12 through 14.

So we’ll use the paste() function with the argument collapse=' ' to collapse the separate cells into one string, and we’ll assign the result to the addressASB object like this:

addressASB<-paste(ASB[c(12:14)],collapse=' ') 
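
To see what collapse does, here is a small sketch on a plain character vector. The three strings are taken from the query shown in the output further down; the object names parts, one_string and still_three are made up for this illustration.

```r
# Address parts as a character vector (state, region, postal code)
parts <- c("WISCONSIN", "US Midwest", "54301")

# collapse joins the elements of ONE vector into a single string
one_string <- paste(parts, collapse = " ")
one_string

# sep only separates different ARGUMENTS, so a single vector stays three strings
still_three <- paste(parts, sep = " ")
length(still_three)

# sep at work: two arguments joined element by element
paste("US Midwest", "54301", sep = " ")
```

The distinction matters later: collapse is used for one bank's cells, sep for pasting whole columns together.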

Having the address, we need to assign longitude and latitude to it. Since we have already loaded the tmaptools library, we can geocode the ASB address directly.

library(tmaptools)
gk<-geocode_OSM(addressASB)
gk
## $query
## [1] "WISCONSIN US Midwest 54301"
## 
## $coords
##         x         y 
## -88.49582  44.24636 
## 
## $bbox
##      xmin      ymin      xmax      ymax 
## -88.49677  44.24636 -88.49532  44.24637

Please note that geocoding returns an object in the form of a list:

class(gk)
## [1] "list"

If we look at the structure of this list, we find the following elements:

str(gk)
## List of 3
##  $ query : chr "WISCONSIN US Midwest 54301"
##  $ coords: Named num [1:2] -88.5 44.2
##   ..- attr(*, "names")= chr [1:2] "x" "y"
##  $ bbox  : 'bbox' Named num [1:4] -88.5 44.2 -88.5 44.2
##   ..- attr(*, "names")= chr [1:4] "xmin" "ymin" "xmax" "ymax"
##   ..- attr(*, "crs")=List of 2
##   .. ..$ input: chr "EPSG:4326"
##   .. ..$ wkt  : chr "GEOGCRS[\"WGS 84\",\n    DATUM[\"World Geodetic System 1984\",\n        ELLIPSOID[\"WGS 84\",6378137,298.257223"| __truncated__
##   .. ..- attr(*, "class")= chr "crs"
  1. the query: $query,
  2. longitude and latitude: $coords,
  3. and the area (bounding box): $bbox.

In this example, we won’t use area, we only need longitude (x) and latitude (y).

So let’s add x and y as variables to our data. Here, experiment with how we address the elements in the list, e.g. gk, gk[1], gk$query, gk$query[1].
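
The following sketch shows the difference between these ways of addressing a list without calling the geocoding service. The list gk_demo is a hand-built stand-in that copies only the query and coords elements from the output above; it is not a real geocode_OSM result.

```r
# Hand-built stand-in for the geocoding result (values copied from the output above)
gk_demo <- list(
  query  = "WISCONSIN US Midwest 54301",
  coords = c(x = -88.49582, y = 44.24636)
)

gk_demo[1]             # single brackets: still a list, of length 1
gk_demo[[1]]           # double brackets: the element itself (a character vector)
gk_demo$query          # by name, equivalent to gk_demo[["query"]]
gk_demo$coords["x"]    # named numeric vector of length 1
gk_demo$coords[["x"]]  # the bare number, name dropped
```

Single brackets keep the container, double brackets and $ extract the content; this is why gk$coords can be used directly as a numeric vector.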

And we attach our x and y to ASB:

ASB<-c(ASB,gk$coords[1:2])
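
One thing worth checking when combining with c(): the result is a plain list rather than a data frame. The sketch below uses a made-up one-row data frame ASB_demo (its column names and values are illustrative, not the real bank record) and shows an alternative that keeps the data frame structure.

```r
# Illustrative one-row data frame standing in for ASB
ASB_demo <- data.frame(Name = "Demo Bank", Zipx = "54301", stringsAsFactors = FALSE)
coords <- c(x = -88.49582, y = 44.24636)  # coordinates from the output above

# c() treats the data frame as a list and appends the two numbers as new elements
as_list <- c(ASB_demo, coords)
class(as_list)
names(as_list)

# Adding the coordinates as columns keeps a data frame
ASB_df <- ASB_demo
ASB_df$x <- coords[["x"]]
ASB_df$y <- coords[["y"]]
class(ASB_df)
```

Either form works for a single bank; the column-based form is the one that generalizes to the whole data frame.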

Operations on the entire data frame

We can repeat the same reasoning on the whole data frame and assign the results to the data.

The full address could be built as bank2016$adr <- paste(bank2016$Countyx, bank2016$Region, bank2016$Zipx, sep=" "); the version run here uses only the region and the postal code:

bank2016$adr <- paste(bank2016$Region, bank2016$Zipx, sep=" ")

Now we’re going to geocode our addresses. This process will take some time and will still return some errors. Let’s save the results to an XY object, e.g. XY <- geocode_OSM(bank2016$adr).

## No results found for "US Southwest 75201".
## No results found for "US Mid Atlantic 14203".
## No results found for "US Mid Atlantic 15212".
## No results found for "US Mid Atlantic 18964".
## No results found for "US Mid Atlantic 26003".
## No results found for "US Mid Atlantic 41502-2947".
## No results found for "US Mid Atlantic 12302".
## No results found for "US Mid Atlantic 17604".
## No results found for "US Southwest 75701".
## No results found for "US Mid Atlantic 15907".
## No results found for "US Mid Atlantic 15701".
## No results found for "US Mid Atlantic 7632".
## No results found for "US Mid Atlantic 19335".
## No results found for "US Mid Atlantic 15222-2401".
## No results found for "US Mid Atlantic 7470".
## No results found for "US Mid Atlantic 17059-0066".
## No results found for "US Mid Atlantic 17325".
## No results found for "US Mid Atlantic 17740".
## No results found for "US Mid Atlantic 12801".
## No results found for "US Mid Atlantic 15701".
## No results found for "US Mid Atlantic 22657".
## No results found for "US Mid Atlantic 13214".
## No results found for "US Mid Atlantic 17201-0819".
## No results found for "US Mid Atlantic 25313".
## No results found for "US Mid Atlantic 25301".
## No results found for "US Mid Atlantic 16830".
## No results found for "US Mid Atlantic 18603".
## No results found for "US Mid Atlantic 16933".
## No results found for "US Mid Atlantic 11545".
## No results found for "US Mid Atlantic 22853".
## No results found for "US Mid Atlantic 23663".
## No results found for "US Mid Atlantic 24541".
## No results found for "US Mid Atlantic 18951-9005".
## No results found for "US Mid Atlantic 14902".
## No results found for "US Mid Atlantic 21550".
## No results found for "US Mid Atlantic 13815".
## No results found for "US Midwest 43502".
## No results found for "US Mid Atlantic 24060".
## No results found for "US Mid Atlantic 19010".
## No results found for "US Mid Atlantic 17403".
## No results found for "US Mid Atlantic 16901".
## No results found for "US Mid Atlantic 26836".
## No results found for "US Mid Atlantic 20832".
## No results found for "US Mid Atlantic 17257".
## No results found for "US Mid Atlantic 19801".
## No results found for "US Mid Atlantic 10013".
## No results found for "US Mid Atlantic 19102".
## No results found for "US Mid Atlantic 40206".
## No results found for "US Mid Atlantic 14006".
## No results found for "US Mid Atlantic 11932".
## No results found for "US Mid Atlantic 7438".
## No results found for "US Mid Atlantic 20601".
## No results found for "US Mid Atlantic 16373".
## No results found for "US Mid Atlantic 24605".
## No results found for "US Mid Atlantic 14569".
## No results found for "US Mid Atlantic 21404".
## No results found for "US Mid Atlantic 17061".
## No results found for "US Mid Atlantic 22611".
## No results found for "US Mid Atlantic 23219".
## No results found for "US Mid Atlantic 25702".
## No results found for "US Mid Atlantic 21227".
## No results found for "US Midwest 45631".
## No results found for "US Mid Atlantic 11590".
## No results found for "US Mid Atlantic 15237".
## No results found for "US Mid Atlantic 23181".
## No results found for "US Mid Atlantic 8809".
## No results found for "US Mid Atlantic 40202".
## No results found for "US Mid Atlantic 11556".
## No results found for "US Mid Atlantic 40362-0157".
## No results found for "US Mid Atlantic 08753-8396".
## No results found for "US Mid Atlantic 11201".
## No results found for "US Mid Atlantic 14850".
## No results found for "US Mid Atlantic 24210".
## No results found for "US Mid Atlantic 18431".
## No results found for "US Mid Atlantic 10027-4512".
## No results found for "US Mid Atlantic 7432".
## No results found for "US Mid Atlantic 7416".
## No results found for "US Mid Atlantic 22482".
## No results found for "US Mid Atlantic 21601-3013".
## No results found for "US Mid Atlantic 18512".
## No results found for "US Mid Atlantic 42440".
## No results found for "US Mid Atlantic 7921".
## No results found for "US Mid Atlantic 18503".
## No results found for "US Mid Atlantic 10901".
## No results found for "US Mid Atlantic 12414".
## No results found for "US Mid Atlantic 42103".
## No results found for "US Southwest 75201".
## No results found for "US Mid Atlantic 20186".
## No results found for "US Midwest 45830".
## No results found for "US Midwest 50010".
## No results found for "US Mid Atlantic 8512".
## No results found for "US Mid Atlantic 24260".
## No results found for "US Mid Atlantic 20191".
## No results found for "US Mid Atlantic 7306".
## No results found for "US Mid Atlantic 7002".
## No results found for "US Mid Atlantic 20716".
## No results found for "US Mid Atlantic 26554-2777".
## No results found for "US Mid Atlantic 23113".
## No results found for "US Mid Atlantic 41702".
## No results found for "US Mid Atlantic 8080".
## No results found for "US Mid Atlantic 23233".
## No results found for "US Mid Atlantic 22911".
## No results found for "US Mid Atlantic 8901".
## No results found for "US Mid Atlantic 14048".
## No results found for "US Mid Atlantic 7724".
## No results found for "US Mid Atlantic 40223".
## No results found for "US Midwest 54701".
## No results found for "US Mid Atlantic 15219".
## No results found for "US Mid Atlantic 18360".
## No results found for "US Mid Atlantic 21043".
## No results found for "US Mid Atlantic 7024".
## No results found for "US Mid Atlantic 18966".
## No results found for "US Mid Atlantic 17522-0457".
## No results found for "US Mid Atlantic 18017".
## No results found for "US Mid Atlantic 24011".
## No results found for "US Mid Atlantic 16365".
## No results found for "US Mid Atlantic 10016".
## No results found for "US Mid Atlantic 19610".
## No results found for "US Mid Atlantic 7095".
## No results found for "US Southwest 75225".
## No results found for "US Mid Atlantic 41101".
## No results found for "US Mid Atlantic 11753".
## No results found for "US Mid Atlantic 19301".
## No results found for "US Mid Atlantic 21286".
## No results found for "US Midwest 46037".
## No results found for "US Southwest 75069".
## No results found for "US Mid Atlantic 22911".
## No results found for "US Mid Atlantic 19145".
## No results found for "US Mid Atlantic 17110".
## No results found for "US Mid Atlantic 7078".
## No results found for "US Mid Atlantic 15320".
## No results found for "US Mid Atlantic 13126".
## No results found for "US Mid Atlantic 19103".
## No results found for "US Mid Atlantic 7004".
## No results found for "US Mid Atlantic 21050".
## No results found for "US Mid Atlantic 24091".
## No results found for "US Mid Atlantic 10462".
## No results found for "US Mid Atlantic 15237".

Geocoding is a rather tedious process and sometimes requires an iterative approach. For our purposes, we will simply skip those objects for which we have not determined X and Y.
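
Before skipping the failures it can help to list them. The sketch below assumes that XY is a data frame with a query column holding the addresses that were found (the merge that follows relies on such a column). The objects bank_demo and XY_demo are small stand-ins: the addresses are in the style of the messages above, the bank names and coordinates are invented.

```r
# Stand-in for bank2016 with the pasted address column
bank_demo <- data.frame(
  Name = c("Bank 1", "Bank 2", "Bank 3"),
  adr  = c("US Midwest 54301", "US Southwest 75201", "US Mid Atlantic 14203"),
  stringsAsFactors = FALSE
)

# Stand-in for the geocoding result: only one address was found (coordinates invented)
XY_demo <- data.frame(
  query = "US Midwest 54301",
  lat = 44.2,
  lon = -88.5,
  stringsAsFactors = FALSE
)

# Addresses that were sent to the geocoder but did not come back
failed <- setdiff(bank_demo$adr, XY_demo$query)
failed

# Share of banks whose address was geocoded
found_share <- mean(bank_demo$adr %in% XY_demo$query)
found_share
```

The vector failed is a natural starting point for an iterative approach, for example retrying with a differently formatted address.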

nowa<-merge(bank2016,XY, by.x = "adr",by.y="query")
# to join the data frames, we have to indicate the column(s) in the first and in the second frame by which they will be joined

Please note that the sizes of the objects differ after merging: bank2016 had 259 observations and XY had 221, while the merged object nowa has 243 observations. Why?
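
One plausible part of the answer is that merge() drops rows without a match and multiplies rows whose key is repeated on both sides. The sketch below demonstrates this on made-up data; the object names left, right, m and m_unique and the keys "A", "B", "C" are hypothetical, and whether this fully explains the 243 rows in the real data would need to be checked there.

```r
# Two banks share the address key "A"; "B" has no geocoding result
left <- data.frame(
  Name = c("Bank 1", "Bank 2", "Bank 3"),
  adr  = c("A", "A", "B"),
  stringsAsFactors = FALSE
)

# The key "A" was geocoded twice (identical rows); "C" matches no bank
right <- data.frame(
  query = c("A", "A", "C"),
  lat   = c(10, 10, 30),
  lon   = c(20, 20, 40),
  stringsAsFactors = FALSE
)

# Inner join: "B" and "C" are dropped, "A" yields 2 x 2 = 4 rows
m <- merge(left, right, by.x = "adr", by.y = "query")
nrow(m)

# Removing duplicated geocoding rows first gives one row per bank with a match
m_unique <- merge(left, unique(right), by.x = "adr", by.y = "query")
nrow(m_unique)
```

Checking anyDuplicated() on the key columns before merging shows in advance whether rows will be multiplied.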

Visualize the data on a map

We will use the maps package and ggplot to visualize where the banks listed on the US public market in 2016 come from (where they are headquartered).

# load the libraries
library(maps)
## Warning: package 'maps' was built under R version 4.1.3
library(ggmap)
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.1.3
## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it! See citation("ggmap") for details.
# world outline
mapWorld <- borders("world", colour="gray50", fill="white")

# state outlines - not needed
#states<- borders("state",colour="gray50") 

# generate the outline
mpw<-ggplot()+mapWorld
# add the points from our data

mpw <- mpw+ geom_point(data=nowa, aes(x=lon, y=lat) ,color="blue",alpha=0.5, size=2)

# display the resulting graphic

mpw

Task

Show only banks based in the US: in red if there was an integrated audit and in blue if there was not. Mark banks audited by KPMG (the code below uses a larger point size for them) and use different shapes for the different types of issuers (Accelerated, Non-Accelerated, etc.). The result should be similar to the one below.

# mpw <- ggplot() + mapWorld
# mpw <- mpw + geom_point(data = nowa, aes(x = lon, y = lat, shape = factor(Filer_Status)), color = ifelse(nowa$Is_Integrated_Audit == "Yes", "red", "blue"), alpha = 0.5, size = 2, show.legend = TRUE)
# mpw
# 
# # this is the version where the mapping is defined at the level of the axes, i.e. in ggplot()
# 
# mpw <- ggplot(data = nowa, aes(x = lon, y = lat, color = ifelse(Is_Integrated_Audit == "Yes", "red", "blue"), shape = factor(Filer_Status))) + mapWorld
# mpw <- mpw + geom_point(alpha = 0.5, size = 2) + scale_color_identity()
# mpw
# 
# # to this we add the restriction to the US
# mpw <- mpw + xlim(-130, -60) + ylim(20, 50)
# mpw
# 
# # and we also vary `size` depending on whether the bank was audited by KPMG or not
# mpw <- ggplot(data = nowa, aes(x = lon, y = lat, color = ifelse(Is_Integrated_Audit == "Yes", "red", "blue"), size = ifelse(Auditorx == "KPMG LLP", 6, 2), shape = factor(Filer_Status))) + mapWorld
# mpw <- mpw + geom_point(alpha = 0.5) + scale_color_identity() + scale_size_identity()
# mpw <- mpw + xlim(-130, -60) + ylim(20, 50)
# mpw
# 
# # additionally, we want to know which type of issuer each bank was: `factor(nowa$Filer_Status)`
# 
# # now we focus only on the US by giving the range of longitude and latitude that we want to zoom in on
# mpw + xlim(-130,-60) + ylim(20, 50)
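
The choices in the task boil down to computing one value per bank for colour and size and keeping only points inside the US window. The sketch below does this in base R on a made-up data frame, so it runs without ggplot2. The column names Is_Integrated_Audit and Auditorx, the value "KPMG LLP" and the longitude/latitude window come from the code above; the object names demo and us, the auditor "Other LLP" and the coordinates are invented.

```r
# Made-up points: two inside the US window, one in Europe
demo <- data.frame(
  lon = c(-88.5, -74.0, 2.35),
  lat = c(44.2, 40.7, 48.86),
  Is_Integrated_Audit = c("Yes", "No", "Yes"),
  Auditorx = c("KPMG LLP", "Other LLP", "KPMG LLP"),
  stringsAsFactors = FALSE
)

# Keep only points inside the window used by xlim(-130, -60) and ylim(20, 50)
in_us <- demo$lon >= -130 & demo$lon <= -60 & demo$lat >= 20 & demo$lat <= 50
us <- demo[in_us, ]

# One colour and one size per bank; sizes are numbers, not strings
us$col <- ifelse(us$Is_Integrated_Audit == "Yes", "red", "blue")
us$cex <- ifelse(us$Auditorx == "KPMG LLP", 6, 2)

us
```

Precomputed columns like these can then be passed to a plotting function as literal colours and sizes instead of being recomputed inside the plotting call.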

Task 2

Let’s now try to connect the auditors’ headquarters with the banks’ headquarters. For this we need to geocode the auditors. Since we don’t have their addresses, we will try to do this based on the name and store the result in the audGC object, i.e. audGC<-geocode_OSM(nowa$Auditorx), and then merge it into the main data frame.

We can’t use the previous method, e.g. nowaAU<-merge(nowa,audGC, by.x = "Auditorx",by.y="query"), because the audGC object contains variable names that are already present in nowa. Hence we will generate a new data frame, which we will call audLL, and rename the variables lat and lon to Alat and Alon, namely audLL<-data.frame(aud =audGC$query,Alat= audGC$lat,Alon =audGC$lon), and only then merge. Well, it didn’t work. Why? Because there are repeated records in audGC.

The check nrow(audGC)==length(unique(audGC$query)) confirms this: the number of rows nrow(audGC) differs from the count of unique values length(unique(audGC$query)), so audGC has to be reduced to unique records. Therefore, we will merge only unique values: nowaAU<-merge(nowa,unique(audLL), by.x = "Auditorx",by.y="aud"). In this case, we geocoded just 88 observations for 15 audit firms, see levels(factor(nowaAU$Auditorx)). With the data prepared, we want to link the banks graphically to their audit firms.
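
The uniqueness check and the merge on deduplicated records can be tried on made-up data. In this sketch all objects carry a _demo suffix and are hypothetical stand-ins; the column names query, lat, lon, aud, Alat, Alon and Auditorx follow the text, while the firm "Other LLP" and all coordinates are invented.

```r
# Stand-in for the auditor geocoding result: "KPMG LLP" appears twice
audGC_demo <- data.frame(
  query = c("KPMG LLP", "KPMG LLP", "Other LLP"),
  lat   = c(40.7, 40.7, 41.9),
  lon   = c(-74.0, -74.0, -87.6),
  stringsAsFactors = FALSE
)

# FALSE here signals repeated records
all_unique <- nrow(audGC_demo) == length(unique(audGC_demo$query))
all_unique

# Rename the coordinates so they do not clash with the banks' lat and lon
audLL_demo <- data.frame(
  aud  = audGC_demo$query,
  Alat = audGC_demo$lat,
  Alon = audGC_demo$lon,
  stringsAsFactors = FALSE
)

# Stand-in for the banks with their own coordinates
banks_demo <- data.frame(
  Name     = c("Bank 1", "Bank 2", "Bank 3"),
  Auditorx = c("KPMG LLP", "Other LLP", "KPMG LLP"),
  lat      = c(44.2, 32.8, 39.9),
  lon      = c(-88.5, -96.8, -75.2),
  stringsAsFactors = FALSE
)

# Merging on unique auditor records keeps exactly one row per bank
merged_demo <- merge(banks_demo, unique(audLL_demo), by.x = "Auditorx", by.y = "aud")
nrow(merged_demo)
names(merged_demo)
```

Note that unique() on a data frame removes only rows that are identical in every column, so it helps here because repeated queries return identical coordinates.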

# 
# # This solution is based on the gallery and code from this page: 
# # https://www.data-to-viz.com/story/MapConnection.html
# 
# # assign the data to the object data
# 
# data <-nowaAU
# 
# # Download NASA night lights image
# download.file("https://www.nasa.gov/specials/blackmarble/2016/globalmaps/BlackMarble_2016_01deg.jpg", 
# destfile = "IMG/BlackMarble_2016_01deg.jpg", mode = "wb")
# 
# # Load picture and render
# library(jpeg) # readJPEG
# library(grid)# rasterGrob
# earth <- readJPEG("IMG/BlackMarble_2016_01deg.jpg", native = TRUE)
# earth <- rasterGrob(earth, interpolate = TRUE)
# 
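# # Count how many times we have each unique connection + order by importance
# library(dplyr) # the library has to be loaded explicitly, because for some reason it does not work otherwise
# # Alat and Alon are kept in the count because the loop below reads them from summary
# summary=data %>% 
#   dplyr::count(lat,lon,Alat,Alon,Auditorx) %>%
#   arrange(n)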
# 
# # A function that makes a data frame per connection (we will use these connections to plot each line)
# 
# library(geosphere) # for gcIntermediate
# 
# data_for_connection=function( lon,lat, Alon, Alat, group){
#   inter <- gcIntermediate(c(lon, lat), c(Alon, Alat), n=50, addStartEnd=TRUE, breakAtDateLine=F)             
#   inter=data.frame(inter)
#   inter$group=NA
#   diff_of_lon=abs(lon) + abs(Alon)
#   if(diff_of_lon > 180){
#     inter$group[ which(inter$lon>=0)]=paste(group, "A",sep="")
#     inter$group[ which(inter$lon<0)]=paste(group, "B",sep="")
#   }else{
#     inter$group=group
#   }
#   return(inter)
# }
# 
# # Create a complete data frame with the points of all the lines to be drawn.
#  
# data_ready_plot=data.frame()
# for(i in c(1:nrow(summary))){
#   tmp=data_for_connection(summary$lon[i], summary$lat[i], summary$Alon[i], summary$Alat[i] , i)
#   tmp$homecontinent=summary$Auditorx[i]
#   tmp$n=summary$n[i]
#   data_ready_plot=rbind(data_ready_plot, tmp)
# }
# # the auditor name was stored in homecontinent inside the loop above
# data_ready_plot$homecontinent=factor(data_ready_plot$homecontinent)
# 
# # Plot
# library(ggplot2)
# p <- ggplot() +
#   annotation_custom(earth, xmin = -180, xmax = 180, ymin = -90, ymax = 90) +
#   geom_line(data=data_ready_plot, aes(x=lon, y=lat, group=group, colour=homecontinent), alpha=1, size=0.6) +
#   scale_color_brewer(palette="Set3") +
#   theme_void() +
#   theme(
#         legend.position="none",
#         panel.background = element_rect(fill = "black", colour = "black"), 
#         panel.spacing=unit(c(0,0,0,0), "null"),
#         plot.margin=grid::unit(c(0,0,0,0), "cm"),
#   ) +
#   ggplot2::annotate("text", x = -150, y = -45, hjust = 0, size = 11, label = paste("Where surfers travel."), color = "white") +
#   ggplot2::annotate("text", x = -150, y = -51, hjust = 0, size = 8, label = paste("data-to-viz.com | NASA.gov | 10,000 #surf tweets recovered"), color = "white", alpha = 0.5) +
#   #ggplot2::annotate("text", x = 160, y = -51, hjust = 1, size = 7, label = paste("Cacedédi Air-Guimzu 2018"), color = "white", alpha = 0.5) +
#   xlim(-180,180) +
#   ylim(-60,80) +
#   scale_x_continuous(expand = c(0.006, 0.006)) +
#   coord_equal() 
# 
# # Save as PNG
# ggsave("IMG/Surfer_travel.png", width = 36, height = 15.22, units = "in", dpi = 90)
# 
Staszkiewicz, Piotr, and Renata Karkowska. 2021. “Audit Fee and Banks Communication Sentiment.” Economic Research-Ekonomska Istraživanja, October, 1–21. https://doi.org/10.1080/1331677x.2021.1985567.