# ref: [SCR]
urlfile <- getURL("https://en.wikipedia.org/wiki/List_of_Allied_convoys_during_World_War_II_by_region",.opts = list(ssl.verifypeer = FALSE))
#parsed_html <- htmlParse(urlfile, encoding = "UTF-8")
#summary(parsed_html)
tables <- readHTMLTable(urlfile)
tables <- list.clean(tables, fun = is.null, recursive = FALSE)
ECAConvoys <- tables[[1]]
NAConvoys <-tables[[2]]
NACConvoys <- tables[[3]]
I used readHTMLTable function to read and parse the file. The 3 tables I joined in this exercise was the first three tables.
I saved the CSV file after I have combined all of the convoy tables into one data frame. The logic for this is found in sub-section 2a.
ECAConvoys <- mutate(ECAConvoys, Type = "European Coastal Atlantic")
NAConvoys <- mutate(NAConvoys, Type = "North Atlantic")
NACConvoys <- mutate(NACConvoys, Type = "North American")
ECAConvoys <- ECAConvoys[-1,]
NAConvoys <- NAConvoys[-1,]
NACConvoys <- NACConvoys[-1,]
For the requirements of this question, I used the mutate function from dplyr to add a column called type that will house the region name for each region. I also removed the header row from each of those tables.
## Warning: Column `V1` joining factors with different levels, coercing to
## character vector
## Warning: Column `V2` joining factors with different levels, coercing to
## character vector
## Warning: Column `V3` joining factors with different levels, coercing to
## character vector
## Warning: Column `V4` joining factors with different levels, coercing to
## character vector
## Warning: Column `V5` joining factors with different levels, coercing to
## character vector
## Warning: Column `V6` joining factors with different levels, coercing to
## character vector
## Warning: Column `V1` joining character vector and factor, coercing into
## character vector
## Warning: Column `V2` joining character vector and factor, coercing into
## character vector
## Warning: Column `V3` joining character vector and factor, coercing into
## character vector
## Warning: Column `V4` joining character vector and factor, coercing into
## character vector
## Warning: Column `V5` joining character vector and factor, coercing into
## character vector
## Warning: Column `V6` joining character vector and factor, coercing into
## character vector
I used the union function from dplyr to combine the data tables together. I had to call the first one to unite table A and table B and call the second union function to combin table C to the result of table A and table B union. I fixed the header titles for ease of use.
# ref: [REA]
write.csv(convoyDF,"Elizes_Project2a.csv")
convoyDF <- read.csv(file="Elizes_Project2a.csv",header=TRUE, sep=",")
convoyDF$X <- NULL
Per project requirements, I save the file to csv and loaded the file from csv. Upon creation of the csv file, a placeholder columns X was added. I had to eliminate it after successful creation of the data frame.
convoyDF <- separate(convoyDF,Route, c("RTStart","RTEnd"), sep = " to ", remove = TRUE)
convoyDF <- convoyDF[order(convoyDF$Type, convoyDF$RTStart),]
convoyDF %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
scroll_box(width="100%",height="300px")
CodePrefix | RTStart | RTEnd | FirstSailing | LastSailing | Convoy | Notes | Type | |
---|---|---|---|---|---|---|---|---|
87 | BB | Belfast or River Clyde | Bristol Channel | 1940 from Belfast - 1945 from Clyde | 1943 from Belfast - 1945 from Clyde | European Coastal Atlantic | ||
127 | HN | Bergen | Methil, Fife | 1939 | 1940 | European Coastal Atlantic | ||
47 | BTC | Bristol Channel | River Thames | 1944 | 1945 | European Coastal Atlantic | ||
132 | BC | Bristol Channel | Bay of Biscay | outward and return convoys used same number | European Coastal Atlantic | |||
29 | NP | British Isles | Norway | April 1940 | May 1940 | troop convoys | European Coastal Atlantic | |
31 | TM | British Isles | Norway | April 1940 | May 1940 | troop convoys | European Coastal Atlantic | |
93 | FP | British Isles | Norway | April 1940 | May 1940 | troop convoys | European Coastal Atlantic | |
124 | GREYBACK | Dieppe, Seine-Maritime | Newhaven, East Sussex | ferry service | European Coastal Atlantic | |||
16 | DB | Dikson Island | White Sea | 1942 | European Coastal Atlantic | |||
26 | FD | Faroe Islands | River Clyde | military ferry | European Coastal Atlantic | |||
136 | FS | Firth of Forth | River Thames | 1939 | 1945 | European Coastal Atlantic | ||
65 | GS | Grimsby | Southend-on-Sea | 1940 | 1940 | European Coastal Atlantic | ||
14 | HM | Holyhead | Milford Haven | European Coastal Atlantic | ||||
73 | SD | Iceland | River Clyde | military ferry service | European Coastal Atlantic | |||
62 | KP | Kola Inlet | Pechengsky District | European Coastal Atlantic | ||||
69 | KB | Kola Inlet | White Sea | European Coastal Atlantic | ||||
6 | OB | Liverpool | the Atlantic Ocean | 7 September 1939 | 21 July 1941 | 345 | merged with OA convoy in the southwest approaches - ON and OS convoys replaced OB convoys | European Coastal Atlantic |
109 | UR | Loch Ewe (later Belfast) | ReykjavĂk | European Coastal Atlantic | ||||
45 | ON | Methil, Fife | Bergen | 1939 | 1940 | European Coastal Atlantic | ||
76 | EN | Methil, Fife | Oban via Loch Ewe | temporarily replaced by EC convoys during 1941 | European Coastal Atlantic | |||
96 | MT | Methil, Fife | River Tyne | European Coastal Atlantic | ||||
48 | MH | Milford Haven | Holyhead | European Coastal Atlantic | ||||
137 | SILVERTIP | Newhaven, East Sussex | Dieppe, Seine-Maritime | ferry service | European Coastal Atlantic | |||
123 | TP | Norway | British Isles | May 1940 | May 1940 | troop convoys | European Coastal Atlantic | |
8 | PW | Portsmouth | Wales | European Coastal Atlantic | ||||
79 | DS | River Clyde | ReykjavĂk | military ferry service | European Coastal Atlantic | |||
82 | DF | River Clyde | Faroe Islands | military ferry service | European Coastal Atlantic | |||
114 | WN | River Clyde, Oban and Loch Ewe | Firth of Forth | European Coastal Atlantic | ||||
1 | TBC | River Thames | Bristol Channel | 1944 | 1945 | European Coastal Atlantic | ||
78 | FN | River Thames | Firth of Forth | European Coastal Atlantic | ||||
53 | OA | River Thames (or Methil, Fife after July 1940) | Liverpool | 7 September 1939 | 24 October 1940 | 234 | merged with OB convoy in the southwest approaches | European Coastal Atlantic |
34 | CW | Southend-on-Sea | St. Helens Roads | European Coastal Atlantic | ||||
44 | EC | Southend-on-Sea | Oban via Firth of Forth | 1941 | 1941 | temporary substitution for EN convoys | European Coastal Atlantic | |
102 | SG | Southend-on-Sea | Grimsby | 1940 | 1940 | European Coastal Atlantic | ||
92 | CE | St. Helens Roads | Southend-on-Sea | European Coastal Atlantic | ||||
49 | WP | Wales | Portsmouth | European Coastal Atlantic | ||||
37 | HXA | Western Approaches | English Channel | 1939 | 1945 | English channel section of same numbered HX convoys - no sailings from 1940 to 1944 | European Coastal Atlantic | |
97 | BD | White Sea | Dikson Island | September 1943 | European Coastal Atlantic | |||
98 | BK | White Sea | Kola Inlet | Summer 1941 | European Coastal Atlantic | |||
56 | AH | Aruba | Halifax Harbour | July 1942 | September 1942 | a brief tanker series | North American | |
81 | AW | Aruba | Curaçao | local tanker traffic | North American | |||
117 | ARG | Boston | Argentia, Newfoundland | USN convoys | North American | |||
130 | BX | Boston | Halifax Harbour | 21 March 1942 | 22 May 1945 | North American | ||
38 | BS | Corner Brook, Newfoundland | Sydney, Nova Scotia | a brief series | North American | |||
2 | CP | Curaçao | Panama Canal | 1942 | 1942 | North American | ||
43 | WA | Curaçao | Aruba | North American | ||||
120 | CZ | Curaçao | Panama Canal | North American | ||||
99 | GM | Galveston, Texas | Mississippi River | North American | ||||
25 | GS | Greenland | Sydney, Nova Scotia or St. John’s, Newfoundland | USN escorted convoys | North American | |||
20 | GP | Guantanamo Bay Naval Base | Panama Canal | North American | ||||
28 | GN | Guantanamo Bay Naval Base | New York City | North American | ||||
54 | GAT | Guantanamo Bay Naval Base | Trinidad via Aruba | North American | ||||
57 | GK | Guantanamo Bay Naval Base | Key West, Florida | North American | ||||
85 | GZ | Guantanamo Bay Naval Base | Panama Canal | 1942 | North American | |||
103 | G | Guantanamo Bay Naval Base | Puerto Rico | North American | ||||
107 | GJ | Guantanamo Bay Naval Base | Kingston, Jamaica | North American | ||||
5 | HA | Halifax Harbour | Curaçao | 1942 | September 1942 | replaced HT convoys | North American | |
22 | HS | Halifax Harbour | Sydney, Nova Scotia | North American | ||||
41 | HT | Halifax Harbour | Trinidad | May 1942 | 1942 | replaced by HA convoys | North American | |
50 | HHX | Halifax Harbour | meet HX convoys originating in New York City at the Halifax Ocean Meeting Point (HOMP) | North American | ||||
89 | HON | Halifax Harbour | ON convoys at the Halifax Ocean Meeting Point (HOMP) | North American | ||||
104 | HF | Halifax Harbour | Saint John, New Brunswick | North American | ||||
131 | XB | Halifax Harbour | Boston | 18 March 1942 | North American | |||
139 | HJ | Halifax Harbour | St. John’s, Newfoundland | North American | ||||
116 | CK | Havana, Cuba | Key West, Florida | North American | ||||
110 | HK | Houston, Texas | Key West, Florida | North American | ||||
33 | WAT | Key West | Trinidad via Curaçao | North American | ||||
13 | KW | Key West, Florida | Havana, Cuba | North American | ||||
21 | KG | Key West, Florida | Guantanamo Bay Naval Base | North American | ||||
61 | KP | Key West, Florida | Mississippi River | North American | ||||
67 | KH | Key West, Florida | Houston, Texas | North American | ||||
134 | KN | Key West, Florida | New York City | North American | ||||
122 | JG | Kingston, Jamaica | Guantanamo Bay Naval Base | North American | ||||
10 | NL | Labrador | St. Lawrence River | North American | ||||
63 | PK | Mississippi River | Key West, Florida | North American | ||||
95 | MG | Mississippi River | Galveston, Texas | North American | ||||
11 | KS | New York City | Key West, Florida | North American | ||||
27 | NG | New York City | Guantanamo Bay Naval Base | North American | ||||
101 | NK | New York City | Key West, Florida | North American | ||||
115 | NJ | Newfoundland (island) coast | St. John’s, Newfoundland | North American | ||||
23 | ZC | Panama Canal | Curaçao | North American | ||||
77 | CW | Panama Canal | Key West, Florida | North American | ||||
94 | PG | Panama Canal | Guantanamo Bay Naval Base | 1942 | 1942 | NA | North American | |
140 | ZG | Panama Canal | Guantanamo Bay Naval Base | North American | ||||
121 | QS | Quebec | Sydney, Nova Scotia | North American | ||||
24 | FH | Saint John, New Brunswick | Halifax Harbour | late 1942 | North American | |||
39 | SJ | San Juan, Puerto Rico | Guantanamo Bay Naval Base | North American | ||||
83 | JH | St. John’s, Newfoundland | Halifax Harbour | North American | ||||
113 | JN | St. John’s, Newfoundland | Labrador | North American | ||||
126 | CL | St. John’s, Newfoundland | Sydney, Nova Scotia | North American | ||||
55 | LN | St. Lawrence River | Labrador | North American | ||||
17 | SB | Sydney, Nova Scotia | Corner Brook, Newfoundland | 1942 | North American | |||
84 | SQ | Sydney, Nova Scotia | Quebec | North American | ||||
86 | SHX | Sydney, Nova Scotia | HX convoys | North American | ||||
90 | LC | Sydney, Nova Scotia | St. John’s, Newfoundland | North American | ||||
91 | BW | Sydney, Nova Scotia | St. John’s, Newfoundland | 1942 | North American | |||
133 | SH | Sydney, Nova Scotia | Halifax Harbour | North American | ||||
7 | SG | Sydney, Nova Scotia (later St. John’s, Newfoundland) | Greenland | 1942 | 1942 | USN escorted convoys | North American | |
35 | TO | Trinidad | Curaçao | NA | North American | |||
74 | TH | Trinidad | Halifax Harbour | 1942 | 1942 | North American | ||
119 | TAW | Trinidad | Key West via Curaçao | 1942 | 1942 | North American | ||
129 | TG | Trinidad | Guantanamo Bay Naval Base | North American | ||||
135 | TAG | Trinidad | Guantanamo Bay Naval Base via Aruba | 1942 | North American | |||
128 | WS | Wabana, Newfoundland and Labrador | Sydney, Nova Scotia | North American | ||||
111 | BHX | Bermuda | Liverpool | May 1940 | March 1941 | 97 (# 41-137) | sailed from Bermuda and merged with same number HX convoy at sea | North Atlantic |
4 | TU | British Isles | United States | 1943 | 1944 | troopships | North Atlantic | |
58 | TA | British Isles | United States | large troopships | North Atlantic | |||
70 | CT | British Isles | Canada | 1941 | 1941 | troopships | North Atlantic | |
51 | TC | Canada | British Isles | 1939 | 1941 | troopships carrying Canadian troops | North Atlantic | |
138 | NA | Canada | British Isles | troopships | North Atlantic | |||
59 | TCU | Caribbean (later New York City) | Liverpool | 14-knot CU convoys of tankers and fast cargo ships with troopships | North Atlantic | |||
80 | CU | Caribbean (later New York City) | Liverpool | 20 March 1943 | 30 May 1945 | 73 | 14-knot convoys of tankers with some fast cargo ships | North Atlantic |
40 | CK | Charleston, South Carolina | British Isles | 1944 | 1944 | rarely used | North Atlantic | |
12 | UGS | Chesapeake Bay | Mediterranean | 13 November 1942 | 28 May 1945 | 95 | slower ships | North Atlantic |
60 | UGF | Chesapeake Bay | Mediterranean | 24 October 1942 | 8 April 1945 | 22 | faster ships - (UGF-1) was the invasion force for Operation Torch | North Atlantic |
72 | KMF | Firth of Clyde | Mediterranean | 26 October 1942 | 23 May 1945 | 45 | faster ships to the Mediterranean | North Atlantic |
100 | HG | Gibraltar | Liverpool | 26 September 1939 | 19 September 1942 | 89 | replaced by MKS convoys after Operation Torch | North Atlantic |
64 | HXF | Halifax Harbour | Liverpool | 29 Sept 1939 | 12 February 1940 | 17 | fast sections of HX convoys | North Atlantic |
125 | HX | Halifax Harbour (later New York City) | Liverpool | 16 Sept 1939 | 23 May 1945 | 377 | 9-knot convoys for ships of sustained speeds less than 15 knots | North Atlantic |
18 | JW | Iceland | White Sea | 25 December 1942 | 12 May 1945 | 17 (# 51-67) | replaced PQ convoys | North Atlantic |
19 | PQ | Iceland | White Sea | 29 September 1941 | 2 September 1942 | 18 | replaced by JW convoys | North Atlantic |
3 | KJ | Kingston, Jamaica | United Kingdom | North Atlantic | ||||
30 | ON | Liverpool | Halifax Harbour | 26 July 1941 | 27 May 1945 | 307 | replaced OB convoys for North American destinations - alternate convoys included slower ships until the ONS convoys started | North Atlantic |
32 | OS | Liverpool | Sierra Leone | 24 July 1941 | 27 May 1945 | 131 | replaced OB convoys for non-North American destinations - included KMS convoys detached west of Gibraltar | North Atlantic |
42 | TUC | Liverpool | Caribbean (later New York City) | 14-knot UC convoys of tankers and fast cargo ships with some troopships | North Atlantic | |||
75 | KMS | Liverpool | Mediterranean | 22 October 1942 | 27 April 1945 | 98 | slower ships to the Mediterranean - 1st 12 sailed independently - remainder sailed with OS convoys and detached west of Gibraltar | North Atlantic |
88 | OG | Liverpool | Gibraltar | 2 October 1939 | 17 October 1943 | 95 | early sailings every 5th merged OA/OB convoy became an OG convoy at sea - later OG convoys sailed from Liverpool | North Atlantic |
108 | UC | Liverpool | Caribbean (later New York City) | 15 February 1943 | 3 June 1945 | 71 | 14-knot convoys of tankers with some fast cargo ships | North Atlantic |
118 | ONS | Liverpool | Halifax Harbour | 15 March 1943 | 21 May 1945 | 51 | slower ships westbound on the ON convoy route | North Atlantic |
15 | GUF | Mediterranean | Chesapeake Bay | 29 November 1942 | 16 April 1945 | 22 | faster ships | North Atlantic |
46 | MKF | Mediterranean | Firth of Clyde or Liverpool | 12 November 1942 | 4 June 1945 | 45 | faster ships from the Mediterranean | North Atlantic |
71 | MKS | Mediterranean | Liverpool | 12 November 1942 | 25 May 1945 | 103 | slower ships from the Mediterranean- 1st 11 sailed independently - remainder merged with SL convoys west of Gibraltar | North Atlantic |
106 | GUS | Mediterranean | Chesapeake Bay | 21 December 1942 | 27 May 1945 | 92 | slower ships | North Atlantic |
112 | SL | Sierra Leone | Liverpool | 14 September 1939 | 25 November 1944 | 178 | merged with MKS convoys west of Gibraltar | North Atlantic |
36 | SC | Sydney, Nova Scotia (or Halifax Harbour or New York City) | Liverpool | 15 August 1940 | 26 May 1945 | 177 | 7-knot convoys of eastbound ships too slow for the 9-knot HX convoys | North Atlantic |
9 | AT | United States | British Isles | March 1942 | 1945 | troopships | North Atlantic | |
66 | UT | United States | British Isles | 1943 | 1944 | troopships | North Atlantic | |
68 | RB | United States | British Isles | September 1942 | September 1942 | 1 | small passenger steamers | North Atlantic |
52 | QP | White Sea | Iceland | 28 September 1941 | 17 November 1942 | 15 | replaced by RA convoys | North Atlantic |
105 | RA | White Sea | Scotland | 30 December 1942 | 23 May 1945 | 17 (# 51-67) | replaced QP convoys | North Atlantic |
I did not answer this question. There were too many Origin and Destination ports. A country could have multiple ports with the same name. Since I do not know the port names and the associated country they belong to it would not be feasible for this project. Moreover, guessing would not improve data accuracy.
defineNumConvoy <- function(convoyStr) {
numConvoy <- 1
if (is.na(convoyStr) || convoyStr == "")
numConvoy <- 1
else
if (str_detect(convoyStr,"\\(")== TRUE) {
numConvoy <- str_extract(convoyStr,"\\d*")
numConvoy <- str_trim(numConvoy)
numConvoy <- as.numeric(gsub("[\\$,]","", numConvoy))
}
else
numConvoy <- as.numeric(gsub("[\\$,]","", convoyStr))
return(numConvoy)
}
convoyDF$NumConvoy <- mapply(defineNumConvoy,convoyDF$Convoy)
convoyDFsub2 <- aggregate(convoyDF$NumConvoy, by=list(Region = convoyDF$RTStart), FUN=sum)
names(convoyDFsub2) <- c("RTStart","NumConvoy")
convoyDFsub2 <- convoyDFsub2[order(-convoyDFsub2$NumConvoy),]
convoyDFsub2 <- convoyDFsub2[1:10,]
convoyDFsub2 %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
scroll_box(width="100%",height="150px")
RTStart | NumConvoy | |
---|---|---|
35 | Liverpool | 1099 |
25 | Halifax Harbour (later New York City) | 377 |
37 | Mediterranean | 262 |
51 | River Thames (or Methil, Fife after July 1940) | 234 |
54 | Sierra Leone | 178 |
61 | Sydney, Nova Scotia (or Halifax Harbour or New York City) | 177 |
11 | Chesapeake Bay | 117 |
4 | Bermuda | 97 |
20 | Gibraltar | 89 |
9 | Caribbean (later New York City) | 74 |
ggplot(data = convoyDFsub2, aes(x=reorder(RTStart,-NumConvoy), y=NumConvoy)) +
geom_bar(stat = "identity", position = "dodge", fill = "red") +
geom_text(aes(label=NumConvoy), vjust=1.6, color="blue", position = position_dodge(0.9), size=3.5) +
scale_fill_brewer(palette="Paired") +
ggtitle("Total Number of Convoys by Origin Port during WW2") +
theme(axis.text.x=element_text(angle = 45, vjust = 0.5)) +
labs(x = "Origin Port", y = "Total Number of Convoys")
convoyDFsub3 <- aggregate(convoyDF$NumConvoy, by=list(Region = convoyDF$RTEnd), FUN=sum)
names(convoyDFsub3) <- c("RTEnd","NumConvoy")
convoyDFsub3 <- convoyDFsub3[order(-convoyDFsub3$NumConvoy),]
convoyDFsub3 <- convoyDFsub3[1:10,]
convoyDFsub3 %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
scroll_box(width="100%",height="150px")
RTEnd | NumConvoy | |
---|---|---|
36 | Liverpool | 1346 |
25 | Halifax Harbour | 364 |
66 | the Atlantic Ocean | 345 |
37 | Mediterranean | 260 |
59 | Sierra Leone | 131 |
10 | Chesapeake Bay | 114 |
20 | Gibraltar | 95 |
9 | Caribbean (later New York City) | 72 |
17 | Firth of Clyde or Liverpool | 45 |
73 | White Sea | 37 |
ggplot(data = convoyDFsub3, aes(x=reorder(RTEnd,-NumConvoy), y=NumConvoy)) +
geom_bar(stat = "identity", position = "dodge", fill = "red") +
geom_text(aes(label=NumConvoy), vjust=1.6, color="blue", position = position_dodge(0.9), size=3.5) +
scale_fill_brewer(palette="Paired") +
ggtitle("Total Number of Convoys by Destination Port during WW2") +
theme(axis.text.x=element_text(angle = 45, vjust = 0.5)) +
labs(x = "Destination Port", y = "Total Number of Convoys")
To answer this question, the field containing the Number of Convoys was not in a state to be used properly. Some field values were blank or NA. Other field values contained a parentheses with the number. I wrote a function that would default the number of convoys to 1, extract the numeric value from the field especially if it has parentheses, or ensure the value is numeric. After this I was able to retrieve the total number of convoys by Origin Port and by Destination Port using the aggregate function. Finally, I created a Bar Chart that showed the top 10 convoys by Origin Port and by Destination Port.
Based on the Bar Chart display, the city of Liverpool has the most number of convoys for Departures and Arrivals at 1099 and 1346 respectively. A distanct second is Halifax Harbour (which eventually becomes New York City). The very fact that Liverpool is the Number 1 Destination Port may indicate that the Allied U-Boat screens leading into England are very effective in protecting shipping going into and out of Liverpool.
convoyDFsub1 <- aggregate(convoyDF$NumConvoy, by=list(Region = convoyDF$Type), FUN=sum)
names(convoyDFsub1) <- c("Region","NumConvoy")
convoyDFsub1 %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
scroll_box(width="100%",height="150px")
Region | NumConvoy |
---|---|
European Coastal Atlantic | 616 |
North American | 65 |
North Atlantic | 2264 |
ggplot(data = convoyDFsub1, aes(x=Region, y=NumConvoy)) +
geom_bar(stat = "identity", position = "dodge", fill = "red") +
geom_text(aes(label=NumConvoy), vjust=1.6, color="blue", position = position_dodge(0.9), size=3.5) +
scale_fill_brewer(palette="Paired") +
ggtitle("Total Number of Convoys by Region during WW2") +
theme(axis.text.x=element_text(angle = 45, vjust = 0.5)) +
labs(x = "Region", y = "Total Number of Convoys")
Dividing the total number of convoys by Region was pretty straightforward because I followed the same logic as that when I determined the total number of convoys by Origin Ports and Destination Ports. The North Altanic region contains a large number of convoys at 2264. The second highest number of convoys is the European Coastal Atlantic at 616.
# ref: [SCR]
urlfile <- getURL("https://www.jma.go.jp/en/quake/quake_local_index.html",.opts = list(ssl.verifypeer = FALSE))
#parsed_html <- htmlParse(urlfile, encoding = "UTF-8")
#summary(parsed_html)
tables <- readHTMLTable(urlfile)
tables <- list.clean(tables, fun = is.null, recursive = FALSE)
jpEQDF <- tables[[2]]
names(jpEQDF) <- c("ObservedDate", "Prefecture", "Magnitude", "MaxSeismic","ReportedDate")
I used readHTMLTable function to read and parse the file.
# ref: [REA]
write.csv(jpEQDF,"Elizes_Project2b.csv")
jpEQDF <- read.csv(file="Elizes_Project2b.csv",header=TRUE, sep=",")
jpEQDF$X <- NULL
jpEQDF %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
scroll_box(width="100%",height="300px")
ObservedDate | Prefecture | Magnitude | MaxSeismic | ReportedDate |
---|---|---|---|---|
02:13 JST 04 Oct 2018 | Ibaraki-ken Oki | M4.1 | 2 | 02:17 JST 04 Oct 2018 |
00:15 JST 04 Oct 2018 | Chiba-ken Toho-oki | M4.6 | 4 | 00:19 JST 04 Oct 2018 |
22:57 JST 03 Oct 2018 | Kumamoto-ken Kumamoto-chiho | M2.5 | 2 | 23:00 JST 03 Oct 2018 |
21:04 JST 03 Oct 2018 | Iburi-chiho Chutobu | M3.0 | 1 | 21:06 JST 03 Oct 2018 |
18:11 JST 03 Oct 2018 | Iburi-chiho Chutobu | M2.8 | 1 | 18:14 JST 03 Oct 2018 |
10:38 JST 03 Oct 2018 | Wakayama-ken Hokubu | M2.4 | 1 | 10:42 JST 03 Oct 2018 |
00:32 JST 03 Oct 2018 | Gifu-ken Mino-chuseibu | M2.9 | 1 | 00:35 JST 03 Oct 2018 |
19:25 JST 02 Oct 2018 | Fukushima-ken Oki | M4.1 | 1 | 19:28 JST 02 Oct 2018 |
16:12 JST 02 Oct 2018 | Nagano-ken Hokubu | M2.4 | 1 | 16:14 JST 02 Oct 2018 |
12:29 JST 02 Oct 2018 | Ibaraki-ken Oki | M3.9 | 1 | 12:34 JST 02 Oct 2018 |
12:28 JST 02 Oct 2018 | Ibaraki-ken Oki | M4.7 | 3 | 12:31 JST 02 Oct 2018 |
05:03 JST 02 Oct 2018 | Iburi-chiho Chutobu | M3.0 | 1 | 05:06 JST 02 Oct 2018 |
04:29 JST 02 Oct 2018 | Iwate-ken Oki | M4.3 | 1 | 04:32 JST 02 Oct 2018 |
23:21 JST 01 Oct 2018 | Iburi-chiho Chutobu | M3.3 | 1 | 23:24 JST 01 Oct 2018 |
21:11 JST 01 Oct 2018 | Iwate-ken Oki | M3.6 | 2 | 21:14 JST 01 Oct 2018 |
17:25 JST 01 Oct 2018 | Iburi-chiho Chutobu | M2.8 | 1 | 17:28 JST 01 Oct 2018 |
16:37 JST 01 Oct 2018 | Iburi-chiho Chutobu | M3.2 | 1 | 16:39 JST 01 Oct 2018 |
11:22 JST 01 Oct 2018 | Iburi-chiho Chutobu | M4.9 | 4 | 11:25 JST 01 Oct 2018 |
03:42 JST 01 Oct 2018 | Kyoto-fu Nambu | M3.2 | 1 | 03:45 JST 01 Oct 2018 |
03:32 JST 01 Oct 2018 | Iburi-chiho Chutobu | M3.2 | 1 | 03:35 JST 01 Oct 2018 |
01:50 JST 01 Oct 2018 | Iburi-chiho Chutobu | M2.7 | 1 | 01:53 JST 01 Oct 2018 |
19:20 JST 30 Sep 2018 | Iburi-chiho Chutobu | M3.0 | 1 | 19:23 JST 30 Sep 2018 |
17:58 JST 30 Sep 2018 | Iburi-chiho Chutobu | M3.0 | 1 | 18:01 JST 30 Sep 2018 |
17:54 JST 30 Sep 2018 | Iburi-chiho Chutobu | M5.0 | 4 | 17:57 JST 30 Sep 2018 |
14:04 JST 30 Sep 2018 | Iburi-chiho Chutobu | M2.7 | 1 | 14:06 JST 30 Sep 2018 |
20:56 JST 29 Sep 2018 | Chiba-ken Toho-oki | M3.7 | 2 | 20:59 JST 29 Sep 2018 |
18:26 JST 29 Sep 2018 | Iburi-chiho Chutobu | M4.2 | 3 | 18:29 JST 29 Sep 2018 |
23:03 JST 28 Sep 2018 | Kumamoto-ken Kumamoto-chiho | M2.6 | 2 | 23:05 JST 28 Sep 2018 |
22:50 JST 28 Sep 2018 | Ibaraki-ken Oki | M3.5 | 1 | 22:53 JST 28 Sep 2018 |
18:57 JST 28 Sep 2018 | Izu-ohshima Kinkai | M3.1 | 1 | 19:00 JST 28 Sep 2018 |
11:50 JST 28 Sep 2018 | Kii-suido | M2.4 | 1 | 11:53 JST 28 Sep 2018 |
06:05 JST 28 Sep 2018 | Iburi-chiho Chutobu | M2.4 | 1 | 06:09 JST 28 Sep 2018 |
04:32 JST 28 Sep 2018 | Fukushima-ken Oki | M4.0 | 3 | 04:36 JST 28 Sep 2018 |
16:24 JST 27 Sep 2018 | Iburi-chiho Chutobu | M2.7 | 1 | 16:27 JST 27 Sep 2018 |
07:54 JST 27 Sep 2018 | Nagano-ken Hokubu | M2.3 | 2 | 07:57 JST 27 Sep 2018 |
06:14 JST 27 Sep 2018 | Iburi-chiho Chutobu | M3.4 | 2 | 06:17 JST 27 Sep 2018 |
03:43 JST 27 Sep 2018 | Kumamoto-ken Kumamoto-chiho | M3.0 | 2 | 03:46 JST 27 Sep 2018 |
Per project requirements, I save the file to csv and loaded the file from csv. Upon creation of the csv file, a placeholder columns X was added. I had to eliminate it after successful creation of the data frame.
jpEQDF$Magnitude <- substring(jpEQDF$Magnitude,2)
jpEQDF$Magnitude <- as.numeric(gsub("[\\$,]","", jpEQDF$Magnitude))
jpEQDF$MaxSeismic <- as.numeric(gsub("[\\$,]","", jpEQDF$MaxSeismic))
# ref: [SCA]
plot(jpEQDF$Magnitude, jpEQDF$MaxSeismic, main="Japanese Earthquakes during Last Week", xlab="Magnitude", ylab="Maximimum Seismic Intensity", pch=19)
abline(lm(jpEQDF$MaxSeismic~jpEQDF$Magnitude), col="red") # regression line
After creating the data frame to house all recent Japanese earthquakes, I tidied up Magnitude and MaxSeismic fields by making sure that they were numeric. In order to define the relationship between Magnitude and Maximum Seisimic Intensity, I developed a Scatter Diagram that shows this relationship.
jpEQDFsub1 <- jpEQDF %>%
count(Prefecture, sort = TRUE)
names(jpEQDFsub1) <- c("Prefecture", "Total Earthquakes")
jpEQDFsub1 %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
scroll_box(width="100%",height="300px")
Prefecture | Total Earthquakes |
---|---|
Iburi-chiho Chutobu | 17 |
Ibaraki-ken Oki | 4 |
Kumamoto-ken Kumamoto-chiho | 3 |
Chiba-ken Toho-oki | 2 |
Fukushima-ken Oki | 2 |
Iwate-ken Oki | 2 |
Nagano-ken Hokubu | 2 |
Gifu-ken Mino-chuseibu | 1 |
Izu-ohshima Kinkai | 1 |
Kii-suido | 1 |
Kyoto-fu Nambu | 1 |
Wakayama-ken Hokubu | 1 |
I did not create a graph for this, but instead used dplyr wrangling to tidy up the data so that I could have only a list of Prefectures and Total Earthquakes for the week. I used kable package to display the values in a nice HTML format. Thanks Teammate Soumya!
# ref: [SCR]
urlfile <- getURL("https://www.boxofficemojo.com/alltime/world/",.opts = list(ssl.verifypeer = FALSE))
#parsed_html <- htmlParse(urlfile, encoding = "UTF-8")
#summary(parsed_html)
tables <- readHTMLTable(urlfile)
tables <- list.clean(tables, fun = is.null, recursive = FALSE)
moviesDF <- as.data.frame(tables[[2]])
names(moviesDF) <- c("Rank", "Title","Studio","Worldwide","Domestic","DomesticPct","Overseas","OverseasPct","Year")
I used readHTMLTable function to read and parse the file.
# ref: [REA]
write.csv(moviesDF,"Elizes_Project2c.csv")
moviesDF <- read.csv(file="Elizes_Project2c.csv",header=TRUE, sep=",")
moviesDF$X <- NULL
moviesDF %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
scroll_box(width="100%",height="300px")
Rank | Title | Studio | Worldwide | Domestic | DomesticPct | Overseas | OverseasPct | Year |
---|---|---|---|---|---|---|---|---|
1 | Avatar | Fox | $2,788.0 | $760.5 | 27.3% | $2,027.5 | 72.7% | 2009^ |
2 | Titanic | Par. | $2,187.5 | $659.4 | 30.1% | $1,528.1 | 69.9% | 1997^ |
3 | Star Wars: The Force Awakens | BV | $2,068.2 | $936.7 | 45.3% | $1,131.6 | 54.7% | 2015 |
4 | Avengers: Infinity War | BV | $2,046.5 | $678.8 | 33.2% | $1,367.7 | 66.8% | 2018 |
5 | Jurassic World | Uni. | $1,671.7 | $652.3 | 39.0% | $1,019.4 | 61.0% | 2015 |
6 | Marvel’s The Avengers | BV | $1,518.8 | $623.4 | 41.0% | $895.5 | 59.0% | 2012 |
7 | Furious 7 | Uni. | $1,516.0 | $353.0 | 23.3% | $1,163.0 | 76.7% | 2015 |
8 | Avengers: Age of Ultron | BV | $1,405.4 | $459.0 | 32.7% | $946.4 | 67.3% | 2015 |
9 | Black Panther | BV | $1,346.9 | $700.1 | 52.0% | $646.8 | 48.0% | 2018 |
10 | Harry Potter and the Deathly Hallows Part 2 | WB | $1,341.5 | $381.0 | 28.4% | $960.5 | 71.6% | 2011 |
11 | Star Wars: The Last Jedi | BV | $1,332.5 | $620.2 | 46.5% | $712.4 | 53.5% | 2017 |
12 | Jurassic World: Fallen Kingdom | Uni. | $1,303.8 | $416.7 | 32.0% | $887.1 | 68.0% | 2018 |
13 | Frozen | BV | $1,276.5 | $400.7 | 31.4% | $875.7 | 68.6% | 2013 |
14 | Beauty and the Beast (2017) | BV | $1,263.5 | $504.0 | 39.9% | $759.5 | 60.1% | 2017 |
15 | The Fate of the Furious | Uni. | $1,236.0 | $226.0 | 18.3% | $1,010.0 | 81.7% | 2017 |
16 | Iron Man 3 | BV | $1,214.8 | $409.0 | 33.7% | $805.8 | 66.3% | 2013 |
17 | Incredibles 2 | BV | $1,209.2 | $606.8 | 50.2% | $602.4 | 49.8% | 2018 |
18 | Minions | Uni. | $1,159.4 | $336.0 | 29.0% | $823.4 | 71.0% | 2015 |
19 | Captain America: Civil War | BV | $1,153.3 | $408.1 | 35.4% | $745.2 | 64.6% | 2016 |
20 | Transformers: Dark of the Moon | P/DW | $1,123.8 | $352.4 | 31.4% | $771.4 | 68.6% | 2011 |
21 | The Lord of the Rings: The Return of the King | NL | $1,119.9 | $377.8 | 33.7% | $742.1 | 66.3% | 2003^ |
22 | Skyfall | Sony | $1,108.6 | $304.4 | 27.5% | $804.2 | 72.5% | 2012 |
23 | Transformers: Age of Extinction | Par. | $1,104.1 | $245.4 | 22.2% | $858.6 | 77.8% | 2014 |
24 | The Dark Knight Rises | WB | $1,084.9 | $448.1 | 41.3% | $636.8 | 58.7% | 2012 |
25 | Toy Story 3 | BV | $1,067.0 | $415.0 | 38.9% | $652.0 | 61.1% | 2010 |
26 | Pirates of the Caribbean: Dead Man’s Chest | BV | $1,066.2 | $423.3 | 39.7% | $642.9 | 60.3% | 2006 |
27 | Rogue One: A Star Wars Story | BV | $1,056.1 | $532.2 | 50.4% | $523.9 | 49.6% | 2016 |
28 | Pirates of the Caribbean: On Stranger Tides | BV | $1,045.7 | $241.1 | 23.1% | $804.6 | 76.9% | 2011 |
29 | Despicable Me 3 | Uni. | $1,034.8 | $264.6 | 25.6% | $770.2 | 74.4% | 2017 |
30 | Jurassic Park | Uni. | $1,029.2 | $402.5 | 39.1% | $626.7 | 60.9% | 1993^ |
31 | Finding Dory | BV | $1,028.6 | $486.3 | 47.3% | $542.3 | 52.7% | 2016 |
32 | Star Wars: Episode I - The Phantom Menace | Fox | $1,027.0 | $474.5 | 46.2% | $552.5 | 53.8% | 1999^ |
33 | Alice in Wonderland (2010) | BV | $1,025.5 | $334.2 | 32.6% | $691.3 | 67.4% | 2010 |
34 | Zootopia | BV | $1,023.8 | $341.3 | 33.3% | $682.5 | 66.7% | 2016 |
35 | The Hobbit: An Unexpected Journey | WB (NL) | $1,021.1 | $303.0 | 29.7% | $718.1 | 70.3% | 2012 |
36 | The Dark Knight | WB | $1,004.6 | $534.9 | 53.2% | $469.7 | 46.8% | 2008^ |
37 | Harry Potter and the Sorcerer’s Stone | WB | $974.8 | $317.6 | 32.6% | $657.2 | 67.4% | 2001 |
38 | Despicable Me 2 | Uni. | $970.8 | $368.1 | 37.9% | $602.7 | 62.1% | 2013 |
39 | The Lion King | BV | $968.5 | $422.8 | 43.7% | $545.7 | 56.3% | 1994^ |
40 | The Jungle Book (2016) | BV | $966.6 | $364.0 | 37.7% | $602.5 | 62.3% | 2016 |
41 | Pirates of the Caribbean: At World’s End | BV | $963.4 | $309.4 | 32.1% | $654.0 | 67.9% | 2007 |
42 | Jumanji: Welcome to the Jungle | Sony | $962.1 | $404.5 | 42.0% | $557.6 | 58.0% | 2017 |
43 | Harry Potter and the Deathly Hallows Part 1 | WB | $960.3 | $296.0 | 30.8% | $664.3 | 69.2% | 2010 |
44 | The Hobbit: The Desolation of Smaug | WB (NL) | $958.4 | $258.4 | 27.0% | $700.0 | 73.0% | 2013 |
45 | The Hobbit: The Battle of the Five Armies | WB (NL) | $956.0 | $255.1 | 26.7% | $700.9 | 73.3% | 2014 |
46 | Finding Nemo | BV | $940.3 | $380.8 | 40.5% | $559.5 | 59.5% | 2003^ |
47 | Harry Potter and the Order of the Phoenix | WB | $939.9 | $292.0 | 31.1% | $647.9 | 68.9% | 2007 |
48 | Harry Potter and the Half-Blood Prince | WB | $934.4 | $302.0 | 32.3% | $632.5 | 67.7% | 2009 |
49 | The Lord of the Rings: The Two Towers | NL | $926.0 | $342.6 | 37.0% | $583.5 | 63.0% | 2002^ |
50 | Shrek 2 | DW | $919.8 | $441.2 | 48.0% | $478.6 | 52.0% | 2004 |
51 | Harry Potter and the Goblet of Fire | WB | $896.9 | $290.0 | 32.3% | $606.9 | 67.7% | 2005 |
52 | Spider-Man 3 | Sony | $890.9 | $336.5 | 37.8% | $554.3 | 62.2% | 2007 |
53 | Ice Age: Dawn of the Dinosaurs | Fox | $886.7 | $196.6 | 22.2% | $690.1 | 77.8% | 2009 |
54 | Spectre | Sony | $880.7 | $200.1 | 22.7% | $680.6 | 77.3% | 2015 |
55 | Spider-Man: Homecoming | Sony | $880.2 | $334.2 | 38.0% | $546.0 | 62.0% | 2017 |
56 | Harry Potter and the Chamber of Secrets | WB | $879.0 | $262.0 | 29.8% | $617.0 | 70.2% | 2002 |
57 | Ice Age: Continental Drift | Fox | $877.2 | $161.3 | 18.4% | $715.9 | 81.6% | 2012 |
58 | The Secret Life of Pets | Uni. | $875.5 | $368.4 | 42.1% | $507.1 | 57.9% | 2016 |
59 | Batman v Superman: Dawn of Justice | WB | $873.6 | $330.4 | 37.8% | $543.3 | 62.2% | 2016 |
60 | The Lord of the Rings: The Fellowship of the Ring | NL | $871.5 | $315.5 | 36.2% | $556.0 | 63.8% | 2001^ |
61 | Wolf Warrior 2 | HC | $870.3 | $2.7 | 0.3% | $867.6 | 99.7% | 2017 |
62 | The Hunger Games: Catching Fire | LGF | $865.0 | $424.7 | 49.1% | $440.3 | 50.9% | 2013 |
63 | Guardians of the Galaxy Vol. 2 | BV | $863.8 | $389.8 | 45.1% | $473.9 | 54.9% | 2017 |
64 | Inside Out | BV | $857.6 | $356.5 | 41.6% | $501.1 | 58.4% | 2015 |
65 | Thor: Ragnarok | BV | $854.0 | $315.1 | 36.9% | $538.9 | 63.1% | 2017 |
66 | Star Wars: Episode III - Revenge of the Sith | Fox | $848.8 | $380.3 | 44.8% | $468.5 | 55.2% | 2005^ |
67 | Transformers: Revenge of the Fallen | P/DW | $836.3 | $402.1 | 48.1% | $434.2 | 51.9% | 2009 |
68 | The Twilight Saga: Breaking Dawn Part 2 | LG/S | $829.7 | $292.3 | 35.2% | $537.4 | 64.8% | 2012 |
69 | Inception | WB | $828.3 | $292.6 | 35.3% | $535.7 | 64.7% | 2010 |
70 | Wonder Woman | WB | $821.8 | $412.6 | 50.2% | $409.3 | 49.8% | 2017 |
71 | Spider-Man | Sony | $821.7 | $403.7 | 49.1% | $418.0 | 50.9% | 2002 |
72 | Independence Day | Fox | $817.4 | $306.2 | 37.5% | $511.2 | 62.5% | 1996^ |
73 | Fantastic Beasts and Where To Find Them | WB | $814.0 | $234.0 | 28.8% | $580.0 | 71.2% | 2016 |
74 | Coco | BV | $807.1 | $209.7 | 26.0% | $597.4 | 74.0% | 2017 |
75 | Shrek the Third | P/DW | $799.0 | $322.7 | 40.4% | $476.2 | 59.6% | 2007 |
76 | Harry Potter and the Prisoner of Azkaban | WB | $796.7 | $249.5 | 31.3% | $547.1 | 68.7% | 2004 |
77 | Pirates of the Caribbean: Dead Men Tell No Tales | BV | $794.9 | $172.6 | 21.7% | $622.3 | 78.3% | 2017 |
78 | E.T.: The Extra-Terrestrial | Uni. | $792.9 | $435.1 | 54.9% | $357.8 | 45.1% | 1982^ |
79 | Mission: Impossible - Fallout | Par. | $789.8 | $219.2 | 27.8% | $570.5 | 72.2% | 2018 |
80 | Fast & Furious 6 | Uni. | $788.7 | $238.7 | 30.3% | $550.0 | 69.7% | 2013 |
81 | Indiana Jones and the Kingdom of the Crystal Skull | Par. | $786.6 | $317.1 | 40.3% | $469.5 | 59.7% | 2008 |
82 | Spider-Man 2 | Sony | $783.8 | $373.6 | 47.7% | $410.2 | 52.3% | 2004 |
83 | Deadpool | Fox | $783.1 | $363.1 | 46.4% | $420.0 | 53.6% | 2016 |
84 | Star Wars | Fox | $775.4 | $461.0 | 59.5% | $314.4 | 40.5% | 1977^ |
85 | Guardians of the Galaxy | BV | $773.3 | $333.2 | 43.1% | $440.2 | 56.9% | 2014 |
86 | 2012 | Sony | $769.7 | $166.1 | 21.6% | $603.6 | 78.4% | 2009 |
87 | Maleficent | BV | $758.5 | $241.4 | 31.8% | $517.1 | 68.2% | 2014 |
88 | The Da Vinci Code | Sony | $758.2 | $217.5 | 28.7% | $540.7 | 71.3% | 2006 |
89 | The Amazing Spider-Man | Sony | $757.9 | $262.0 | 34.6% | $495.9 | 65.4% | 2012 |
90 | The Hunger Games: Mockingjay - Part 1 | LGF | $755.4 | $337.1 | 44.6% | $418.2 | 55.4% | 2014 |
91 | Shrek Forever After | P/DW | $752.6 | $238.7 | 31.7% | $513.9 | 68.3% | 2010 |
92 | X-Men: Days of Future Past | Fox | $747.9 | $233.9 | 31.3% | $513.9 | 68.7% | 2014 |
93 | Madagascar 3: Europe’s Most Wanted | P/DW | $746.9 | $216.4 | 29.0% | $530.5 | 71.0% | 2012 |
94 | Suicide Squad | WB | $746.8 | $325.1 | 43.5% | $421.7 | 56.5% | 2016 |
95 | The Chronicles of Narnia: The Lion, the Witch and the Wardrobe | BV | $745.0 | $291.7 | 39.2% | $453.3 | 60.8% | 2005 |
96 | Monsters University | BV | $744.2 | $268.5 | 36.1% | $475.7 | 63.9% | 2013 |
97 | The Matrix Reloaded | WB | $742.1 | $281.6 | 37.9% | $460.6 | 62.1% | 2003 |
98 | Up | BV | $735.1 | $293.0 | 39.9% | $442.1 | 60.1% | 2009 |
99 | Deadpool 2 | Fox | $734.2 | $318.5 | 43.4% | $415.8 | 56.6% | 2018 |
100 | Gravity | WB | $723.2 | $274.1 | 37.9% | $449.1 | 62.1% | 2013 |
Per project requirements, I save the file to csv and loaded the file from csv. Upon creation of the csv file, a placeholder columns X was added. I had to eliminate it after successful creation of the data frame.
#Remove dollar signs
# ref: [HOW]
moviesDF$Worldwide = as.numeric(gsub("[\\$,]","", moviesDF$Worldwide))
moviesDF$Domestic = as.numeric(gsub("\\$","", moviesDF$Domestic))
moviesDF$Overseas = as.numeric(gsub("[\\$,]","", moviesDF$Overseas))
studioDF <- aggregate(moviesDF$Domestic, by=list(Studio = moviesDF$Studio), FUN=sum)
names(studioDF) <- c("Studio","Domestic")
studioDF <- studioDF[order(-studioDF$Domestic),]
# ref: [PIE]
slices <- studioDF$Domestic
Studio <- studioDF$Studio
pct <- round(slices/sum(slices)*100)
Studio <- paste(Studio, pct)
Studio <- paste(Studio, "%", sep = " ")
pie(slices, labels = Studio, col = rainbow(length(Studio)), main = "Pie Chart of Domestic Revenue by Studio")
The movie data frame needed some tidying up before proceeding to answer the question. The fields Domestic, Overseas, and Worldwide were interpreted as strings by the data frame because they contained commas and dollar signs. I used gsub function to strip away the dollar signs and commas and I used as.numeric function to transform the string into a numeric field. From there I was able to use aggregate function to sum all the Domestic revenue by movie studios. I used that new data frame to create a very useful Pie Chart that shows certainly which movie studios have higher percentage of revenues. Based on the Pie Chart, Buena Vista, Warner Brothers, Univision, and Fox have the highest percentage of domestics revenues at 37%, 15%, 11%, and 10% respectivley. The final answer is YES: certain movie studios have a higher percentage of domestic revenue.
# Remove dollar signs from Pct columns
moviesDF$DomesticPct = as.numeric(gsub("\\%","", moviesDF$DomesticPct))
moviesDF$OverseasPct = as.numeric(gsub("\\%","", moviesDF$OverseasPct))
# Remove ^ (caret) symbols from Year
moviesDF$Year = as.numeric(gsub("\\^","", moviesDF$Year))
yearDF <- aggregate(moviesDF$OverseasPct, by=list(Year = moviesDF$Year), FUN=sum)
names(yearDF) <- c("Year","OverseasPct")
yearDF <- yearDF[order(-yearDF$Year),]
yearDF <- yearDF[yearDF$Year > 2008,]
yearDF %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
scroll_box(width="100%",height="150px")
Year | OverseasPct | |
---|---|---|
25 | 2018 | 361.4 |
24 | 2017 | 809.5 |
23 | 2016 | 597.3 |
22 | 2015 | 466.4 |
21 | 2014 | 400.3 |
20 | 2013 | 516.6 |
19 | 2012 | 543.3 |
18 | 2011 | 217.1 |
17 | 2010 | 330.7 |
16 | 2009 | 408.6 |
Again, I used gsub and as.numeric functions to remove the percent symbols and ensure that Domestic Percent and Overseas Percent fields are numeric. Some carets exists on field values for Year, so I removed them using gsub. For tidying up the data I used aggregrate function to return to me the sum of all Overseas Percents by Year. I ordered the years decrementally and only used the last 10 years worth of data. From there, I developed a line chart that shows the total Overseas Percent in the last 10 years. According to the chart, the international revenue peeked last year (2017) but declined recently.
movies2DF <- aggregate(moviesDF$Worldwide, by=list(Year = moviesDF$Year), FUN=sum)
names(movies2DF) <- c("Year","Revenue")
movies2DF <- movies2DF[order(-movies2DF$Year),]
movies2DF <- movies2DF[movies2DF$Year > 2008,]
movies2DF %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
scroll_box(width="100%",height="150px")
Year | Revenue | |
---|---|---|
25 | 2018 | 7430.4 |
24 | 2017 | 11721.0 |
23 | 2016 | 9321.4 |
22 | 2015 | 9559.0 |
21 | 2014 | 5095.2 |
20 | 2013 | 7541.6 |
19 | 2012 | 7945.1 |
18 | 2011 | 3511.0 |
17 | 2010 | 4633.7 |
16 | 2009 | 6950.2 |
ggplot(data = movies2DF, aes(x=Year, y=Revenue)) +
geom_bar(stat = "identity", position = "dodge", fill = "green") +
geom_text(aes(label=Revenue), vjust=1.6, color="black", position = position_dodge(0.9), size=3.5) +
scale_fill_brewer(palette="Paired") +
ggtitle("Worldwide Movie Revenue by Years") +
theme(axis.text.x=element_text(angle = 90, vjust = 0.5)) +
labs(x = "Year", y = "Revenue (millions $)")
To answer this question, I only focused on the last 10 years worth of data and since the question did not indicated domestic or international revenue, I used Worldwide revenue exclusively. I used the aggregate function to give me total revenue by the last 10 years. I developed a bar chart and it shows that we can’t tell if a movies in general are getting more popular. However, The revenue for this year ($7.42 Billion) is less than the less three years. However, in general, movie revenue has gone up for the last 10 years.
[DAT] Data Wrangling with dplyr and tidyr Cheat Sheet. Retrieved from website: https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
[HOW] How to Remove Dollar Sign in R (and other Currency Symbols). Retrieved from website: https://www.rforexcelusers.com/remove-currency-dollar-sign-r/
[PIE] Pie Charts. Retrieved from website: https://www.statmethods.net/graphs/pie.html
[REA] Read CSV in R. Retrieved from website: http://rprogramming.net/read-csv-in-r/
[SCA] Scatterplots. Retrieved from website: https://www.statmethods.net/graphs/scatterplot.html
[SCR] Scraping html tables into R data frames using the XML package. Retrieved from website: https://stackoverflow.com/questions/1395528/scraping-html-tables-into-r-data-frames-using-the-xml-package