Author: Romerl Elizes

Preface

Summary

For this project, I answered all questions that I could answer for each of the Data Sets. For this project, I used the data sets from the following Discussions:

  • List of Allied Convoys during World War 2 by Romerl Elizes
  • Japense Earthquake Data for the Past Week by Jason Givens-Doyle
  • Worldwide Movie Grosses by Robert Mercier

Caveats and Limitations

Caveats to this Project:

  • The deliverables for this Project is 1 RMD file containing all three treatments of the acquired data sets.
  • For each analysis, were applicable, I included a graph from ggplot and a discussion on how and why I approached the data questions.
  • For “List of Allied Convoys during World War 2,” I did not answer my own questions 4-6 and will give explanations why I did not do them in the Discussion and Analysis section.
  • A large number of the field “Number of Convoys”" in the “List of Allied Convoys during World War 2” data set are blank. By default, I listed them as 1. The caveat with this is that the resulting data maybe skewed compared to actual data if they exist.
  • I felt the storing of the data frame onto a csv file, reading it again, and cleaning that file was redundant. I believed it minimized the purpose of tidying up the data. Nevertheless, I followed the project requirements accordingly: I did save the csv file and read from the csv file at the expense of elminating an unnecessary column added when creating the CSV file.

Load Appropriate Libraries

## Loading required package: bitops
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:RCurl':
## 
##     complete
## -- Attaching packages -------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 2.2.1     v purrr   0.2.4
## v tibble  1.4.1     v dplyr   0.7.4
## v readr   1.1.1     v forcats 0.2.0
## -- Conflicts ----------------------------------------------------------------------------------- tidyverse_conflicts() --
## x tidyr::complete() masks RCurl::complete()
## x dplyr::filter()   masks stats::filter()
## x dplyr::lag()      masks stats::lag()

To display kable tables in the output, it was important to set the default knitr.table.format option to html, otherwise there would be a pandoc error.

I. Allied Convoys during World War 2

B. Save Data Frame to CSV file and load back to R

I saved the CSV file after I have combined all of the convoy tables into one data frame. The logic for this is found in sub-section 2a.

C. Discussion Questions and Tidying Analysis

1. Add an additional field that represents that Region.

For the requirements of this question, I used the mutate function from dplyr to add a column called type that will house the region name for each region. I also removed the header row from each of those tables.

2. Combine all tables into one table and place it in a data frame.

## Warning: Column `V1` joining factors with different levels, coercing to
## character vector
## Warning: Column `V2` joining factors with different levels, coercing to
## character vector
## Warning: Column `V3` joining factors with different levels, coercing to
## character vector
## Warning: Column `V4` joining factors with different levels, coercing to
## character vector
## Warning: Column `V5` joining factors with different levels, coercing to
## character vector
## Warning: Column `V6` joining factors with different levels, coercing to
## character vector
## Warning: Column `V1` joining character vector and factor, coercing into
## character vector
## Warning: Column `V2` joining character vector and factor, coercing into
## character vector
## Warning: Column `V3` joining character vector and factor, coercing into
## character vector
## Warning: Column `V4` joining character vector and factor, coercing into
## character vector
## Warning: Column `V5` joining character vector and factor, coercing into
## character vector
## Warning: Column `V6` joining character vector and factor, coercing into
## character vector

I used the union function from dplyr to combine the data tables together. I had to call the first one to unite table A and table B and call the second union function to combin table C to the result of table A and table B union. I fixed the header titles for ease of use.

2a. Store file into CSV and open it up in CSV

Per project requirements, I save the file to csv and loaded the file from csv. Upon creation of the csv file, a placeholder columns X was added. I had to eliminate it after successful creation of the data frame.

3. The Routes field contains the Origin and Destination separated by “to”. For example: White Sea to Dikson Island. We could create two fields that would contain the Origin and Destination. This would be quite useful in data analysis. For example, we could find how many sailings originated from an Origin or Destination port.

CodePrefix RTStart RTEnd FirstSailing LastSailing Convoy Notes Type
87 BB Belfast or River Clyde Bristol Channel 1940 from Belfast - 1945 from Clyde 1943 from Belfast - 1945 from Clyde European Coastal Atlantic
127 HN Bergen Methil, Fife 1939 1940 European Coastal Atlantic
47 BTC Bristol Channel River Thames 1944 1945 European Coastal Atlantic
132 BC Bristol Channel Bay of Biscay outward and return convoys used same number European Coastal Atlantic
29 NP British Isles Norway April 1940 May 1940 troop convoys European Coastal Atlantic
31 TM British Isles Norway April 1940 May 1940 troop convoys European Coastal Atlantic
93 FP British Isles Norway April 1940 May 1940 troop convoys European Coastal Atlantic
124 GREYBACK Dieppe, Seine-Maritime Newhaven, East Sussex ferry service European Coastal Atlantic
16 DB Dikson Island White Sea 1942 European Coastal Atlantic
26 FD Faroe Islands River Clyde military ferry European Coastal Atlantic
136 FS Firth of Forth River Thames 1939 1945 European Coastal Atlantic
65 GS Grimsby Southend-on-Sea 1940 1940 European Coastal Atlantic
14 HM Holyhead Milford Haven European Coastal Atlantic
73 SD Iceland River Clyde military ferry service European Coastal Atlantic
62 KP Kola Inlet Pechengsky District European Coastal Atlantic
69 KB Kola Inlet White Sea European Coastal Atlantic
6 OB Liverpool the Atlantic Ocean 7 September 1939 21 July 1941 345 merged with OA convoy in the southwest approaches - ON and OS convoys replaced OB convoys European Coastal Atlantic
109 UR Loch Ewe (later Belfast) ReykjavĂ­k European Coastal Atlantic
45 ON Methil, Fife Bergen 1939 1940 European Coastal Atlantic
76 EN Methil, Fife Oban via Loch Ewe temporarily replaced by EC convoys during 1941 European Coastal Atlantic
96 MT Methil, Fife River Tyne European Coastal Atlantic
48 MH Milford Haven Holyhead European Coastal Atlantic
137 SILVERTIP Newhaven, East Sussex Dieppe, Seine-Maritime ferry service European Coastal Atlantic
123 TP Norway British Isles May 1940 May 1940 troop convoys European Coastal Atlantic
8 PW Portsmouth Wales European Coastal Atlantic
79 DS River Clyde ReykjavĂ­k military ferry service European Coastal Atlantic
82 DF River Clyde Faroe Islands military ferry service European Coastal Atlantic
114 WN River Clyde, Oban and Loch Ewe Firth of Forth European Coastal Atlantic
1 TBC River Thames Bristol Channel 1944 1945 European Coastal Atlantic
78 FN River Thames Firth of Forth European Coastal Atlantic
53 OA River Thames (or Methil, Fife after July 1940) Liverpool 7 September 1939 24 October 1940 234 merged with OB convoy in the southwest approaches European Coastal Atlantic
34 CW Southend-on-Sea St. Helens Roads European Coastal Atlantic
44 EC Southend-on-Sea Oban via Firth of Forth 1941 1941 temporary substitution for EN convoys European Coastal Atlantic
102 SG Southend-on-Sea Grimsby 1940 1940 European Coastal Atlantic
92 CE St. Helens Roads Southend-on-Sea European Coastal Atlantic
49 WP Wales Portsmouth European Coastal Atlantic
37 HXA Western Approaches English Channel 1939 1945 English channel section of same numbered HX convoys - no sailings from 1940 to 1944 European Coastal Atlantic
97 BD White Sea Dikson Island September 1943 European Coastal Atlantic
98 BK White Sea Kola Inlet Summer 1941 European Coastal Atlantic
56 AH Aruba Halifax Harbour July 1942 September 1942 a brief tanker series North American
81 AW Aruba Curaçao local tanker traffic North American
117 ARG Boston Argentia, Newfoundland USN convoys North American
130 BX Boston Halifax Harbour 21 March 1942 22 May 1945 North American
38 BS Corner Brook, Newfoundland Sydney, Nova Scotia a brief series North American
2 CP Curaçao Panama Canal 1942 1942 North American
43 WA Curaçao Aruba North American
120 CZ Curaçao Panama Canal North American
99 GM Galveston, Texas Mississippi River North American
25 GS Greenland Sydney, Nova Scotia or St. John’s, Newfoundland USN escorted convoys North American
20 GP Guantanamo Bay Naval Base Panama Canal North American
28 GN Guantanamo Bay Naval Base New York City North American
54 GAT Guantanamo Bay Naval Base Trinidad via Aruba North American
57 GK Guantanamo Bay Naval Base Key West, Florida North American
85 GZ Guantanamo Bay Naval Base Panama Canal 1942 North American
103 G Guantanamo Bay Naval Base Puerto Rico North American
107 GJ Guantanamo Bay Naval Base Kingston, Jamaica North American
5 HA Halifax Harbour Curaçao 1942 September 1942 replaced HT convoys North American
22 HS Halifax Harbour Sydney, Nova Scotia North American
41 HT Halifax Harbour Trinidad May 1942 1942 replaced by HA convoys North American
50 HHX Halifax Harbour meet HX convoys originating in New York City at the Halifax Ocean Meeting Point (HOMP) North American
89 HON Halifax Harbour ON convoys at the Halifax Ocean Meeting Point (HOMP) North American
104 HF Halifax Harbour Saint John, New Brunswick North American
131 XB Halifax Harbour Boston 18 March 1942 North American
139 HJ Halifax Harbour St. John’s, Newfoundland North American
116 CK Havana, Cuba Key West, Florida North American
110 HK Houston, Texas Key West, Florida North American
33 WAT Key West Trinidad via Curaçao North American
13 KW Key West, Florida Havana, Cuba North American
21 KG Key West, Florida Guantanamo Bay Naval Base North American
61 KP Key West, Florida Mississippi River North American
67 KH Key West, Florida Houston, Texas North American
134 KN Key West, Florida New York City North American
122 JG Kingston, Jamaica Guantanamo Bay Naval Base North American
10 NL Labrador St. Lawrence River North American
63 PK Mississippi River Key West, Florida North American
95 MG Mississippi River Galveston, Texas North American
11 KS New York City Key West, Florida North American
27 NG New York City Guantanamo Bay Naval Base North American
101 NK New York City Key West, Florida North American
115 NJ Newfoundland (island) coast St. John’s, Newfoundland North American
23 ZC Panama Canal Curaçao North American
77 CW Panama Canal Key West, Florida North American
94 PG Panama Canal Guantanamo Bay Naval Base 1942 1942 NA North American
140 ZG Panama Canal Guantanamo Bay Naval Base North American
121 QS Quebec Sydney, Nova Scotia North American
24 FH Saint John, New Brunswick Halifax Harbour late 1942 North American
39 SJ San Juan, Puerto Rico Guantanamo Bay Naval Base North American
83 JH St. John’s, Newfoundland Halifax Harbour North American
113 JN St. John’s, Newfoundland Labrador North American
126 CL St. John’s, Newfoundland Sydney, Nova Scotia North American
55 LN St. Lawrence River Labrador North American
17 SB Sydney, Nova Scotia Corner Brook, Newfoundland 1942 North American
84 SQ Sydney, Nova Scotia Quebec North American
86 SHX Sydney, Nova Scotia HX convoys North American
90 LC Sydney, Nova Scotia St. John’s, Newfoundland North American
91 BW Sydney, Nova Scotia St. John’s, Newfoundland 1942 North American
133 SH Sydney, Nova Scotia Halifax Harbour North American
7 SG Sydney, Nova Scotia (later St. John’s, Newfoundland) Greenland 1942 1942 USN escorted convoys North American
35 TO Trinidad Curaçao NA North American
74 TH Trinidad Halifax Harbour 1942 1942 North American
119 TAW Trinidad Key West via Curaçao 1942 1942 North American
129 TG Trinidad Guantanamo Bay Naval Base North American
135 TAG Trinidad Guantanamo Bay Naval Base via Aruba 1942 North American
128 WS Wabana, Newfoundland and Labrador Sydney, Nova Scotia North American
111 BHX Bermuda Liverpool May 1940 March 1941 97 (# 41-137) sailed from Bermuda and merged with same number HX convoy at sea North Atlantic
4 TU British Isles United States 1943 1944 troopships North Atlantic
58 TA British Isles United States large troopships North Atlantic
70 CT British Isles Canada 1941 1941 troopships North Atlantic
51 TC Canada British Isles 1939 1941 troopships carrying Canadian troops North Atlantic
138 NA Canada British Isles troopships North Atlantic
59 TCU Caribbean (later New York City) Liverpool 14-knot CU convoys of tankers and fast cargo ships with troopships North Atlantic
80 CU Caribbean (later New York City) Liverpool 20 March 1943 30 May 1945 73 14-knot convoys of tankers with some fast cargo ships North Atlantic
40 CK Charleston, South Carolina British Isles 1944 1944 rarely used North Atlantic
12 UGS Chesapeake Bay Mediterranean 13 November 1942 28 May 1945 95 slower ships North Atlantic
60 UGF Chesapeake Bay Mediterranean 24 October 1942 8 April 1945 22 faster ships - (UGF-1) was the invasion force for Operation Torch North Atlantic
72 KMF Firth of Clyde Mediterranean 26 October 1942 23 May 1945 45 faster ships to the Mediterranean North Atlantic
100 HG Gibraltar Liverpool 26 September 1939 19 September 1942 89 replaced by MKS convoys after Operation Torch North Atlantic
64 HXF Halifax Harbour Liverpool 29 Sept 1939 12 February 1940 17 fast sections of HX convoys North Atlantic
125 HX Halifax Harbour (later New York City) Liverpool 16 Sept 1939 23 May 1945 377 9-knot convoys for ships of sustained speeds less than 15 knots North Atlantic
18 JW Iceland White Sea 25 December 1942 12 May 1945 17 (# 51-67) replaced PQ convoys North Atlantic
19 PQ Iceland White Sea 29 September 1941 2 September 1942 18 replaced by JW convoys North Atlantic
3 KJ Kingston, Jamaica United Kingdom North Atlantic
30 ON Liverpool Halifax Harbour 26 July 1941 27 May 1945 307 replaced OB convoys for North American destinations - alternate convoys included slower ships until the ONS convoys started North Atlantic
32 OS Liverpool Sierra Leone 24 July 1941 27 May 1945 131 replaced OB convoys for non-North American destinations - included KMS convoys detached west of Gibraltar North Atlantic
42 TUC Liverpool Caribbean (later New York City) 14-knot UC convoys of tankers and fast cargo ships with some troopships North Atlantic
75 KMS Liverpool Mediterranean 22 October 1942 27 April 1945 98 slower ships to the Mediterranean - 1st 12 sailed independently - remainder sailed with OS convoys and detached west of Gibraltar North Atlantic
88 OG Liverpool Gibraltar 2 October 1939 17 October 1943 95 early sailings every 5th merged OA/OB convoy became an OG convoy at sea - later OG convoys sailed from Liverpool North Atlantic
108 UC Liverpool Caribbean (later New York City) 15 February 1943 3 June 1945 71 14-knot convoys of tankers with some fast cargo ships North Atlantic
118 ONS Liverpool Halifax Harbour 15 March 1943 21 May 1945 51 slower ships westbound on the ON convoy route North Atlantic
15 GUF Mediterranean Chesapeake Bay 29 November 1942 16 April 1945 22 faster ships North Atlantic
46 MKF Mediterranean Firth of Clyde or Liverpool 12 November 1942 4 June 1945 45 faster ships from the Mediterranean North Atlantic
71 MKS Mediterranean Liverpool 12 November 1942 25 May 1945 103 slower ships from the Mediterranean- 1st 11 sailed independently - remainder merged with SL convoys west of Gibraltar North Atlantic
106 GUS Mediterranean Chesapeake Bay 21 December 1942 27 May 1945 92 slower ships North Atlantic
112 SL Sierra Leone Liverpool 14 September 1939 25 November 1944 178 merged with MKS convoys west of Gibraltar North Atlantic
36 SC Sydney, Nova Scotia (or Halifax Harbour or New York City) Liverpool 15 August 1940 26 May 1945 177 7-knot convoys of eastbound ships too slow for the 9-knot HX convoys North Atlantic
9 AT United States British Isles March 1942 1945 troopships North Atlantic
66 UT United States British Isles 1943 1944 troopships North Atlantic
68 RB United States British Isles September 1942 September 1942 1 small passenger steamers North Atlantic
52 QP White Sea Iceland 28 September 1941 17 November 1942 15 replaced by RA convoys North Atlantic
105 RA White Sea Scotland 30 December 1942 23 May 1945 17 (# 51-67) replaced QP convoys North Atlantic

4. Add two additional fields to represent Origin Country and Destination Country. Again, this could be useful to find out from a more global perspective, the number of sailings from Origin and Destination countries. If I were the enemy, I would like to know this information so that I can concentrate my forces there.

I did not answer this question. There were too many Origin and Destination ports. A country could have multiple ports with the same name. Since I do not know the port names and the associated country they belong to it would not be feasible for this project. Moreover, guessing would not improve data accuracy.

5. The Origin and Destination ports also identify where the major ports in the world. Allied convoys usually consist of multiple ships which means that the Origin and Destination ports should have the facilities to handle these ships.

RTStart NumConvoy
35 Liverpool 1099
25 Halifax Harbour (later New York City) 377
37 Mediterranean 262
51 River Thames (or Methil, Fife after July 1940) 234
54 Sierra Leone 178
61 Sydney, Nova Scotia (or Halifax Harbour or New York City) 177
11 Chesapeake Bay 117
4 Bermuda 97
20 Gibraltar 89
9 Caribbean (later New York City) 74

RTEnd NumConvoy
36 Liverpool 1346
25 Halifax Harbour 364
66 the Atlantic Ocean 345
37 Mediterranean 260
59 Sierra Leone 131
10 Chesapeake Bay 114
20 Gibraltar 95
9 Caribbean (later New York City) 72
17 Firth of Clyde or Liverpool 45
73 White Sea 37

To answer this question, the field containing the Number of Convoys was not in a state to be used properly. Some field values were blank or NA. Other field values contained a parentheses with the number. I wrote a function that would default the number of convoys to 1, extract the numeric value from the field especially if it has parentheses, or ensure the value is numeric. After this I was able to retrieve the total number of convoys by Origin Port and by Destination Port using the aggregate function. Finally, I created a Bar Chart that showed the top 10 convoys by Origin Port and by Destination Port.

Based on the Bar Chart display, the city of Liverpool has the most number of convoys for Departures and Arrivals at 1099 and 1346 respectively. A distanct second is Halifax Harbour (which eventually becomes New York City). The very fact that Liverpool is the Number 1 Destination Port may indicate that the Allied U-Boat screens leading into England are very effective in protecting shipping going into and out of Liverpool.

7. Determine total number of convoys by Region.

Region NumConvoy
European Coastal Atlantic 616
North American 65
North Atlantic 2264

Dividing the total number of convoys by Region was pretty straightforward because I followed the same logic as that when I determined the total number of convoys by Origin Ports and Destination Ports. The North Altanic region contains a large number of convoys at 2264. The second highest number of convoys is the European Coastal Atlantic at 616.

II. Japanese Earthquakes for the Past Week

B. Store file into CSV and open it up in CSV

ObservedDate Prefecture Magnitude MaxSeismic ReportedDate
02:13 JST 04 Oct 2018 Ibaraki-ken Oki M4.1 2 02:17 JST 04 Oct 2018
00:15 JST 04 Oct 2018 Chiba-ken Toho-oki M4.6 4 00:19 JST 04 Oct 2018
22:57 JST 03 Oct 2018 Kumamoto-ken Kumamoto-chiho M2.5 2 23:00 JST 03 Oct 2018
21:04 JST 03 Oct 2018 Iburi-chiho Chutobu M3.0 1 21:06 JST 03 Oct 2018
18:11 JST 03 Oct 2018 Iburi-chiho Chutobu M2.8 1 18:14 JST 03 Oct 2018
10:38 JST 03 Oct 2018 Wakayama-ken Hokubu M2.4 1 10:42 JST 03 Oct 2018
00:32 JST 03 Oct 2018 Gifu-ken Mino-chuseibu M2.9 1 00:35 JST 03 Oct 2018
19:25 JST 02 Oct 2018 Fukushima-ken Oki M4.1 1 19:28 JST 02 Oct 2018
16:12 JST 02 Oct 2018 Nagano-ken Hokubu M2.4 1 16:14 JST 02 Oct 2018
12:29 JST 02 Oct 2018 Ibaraki-ken Oki M3.9 1 12:34 JST 02 Oct 2018
12:28 JST 02 Oct 2018 Ibaraki-ken Oki M4.7 3 12:31 JST 02 Oct 2018
05:03 JST 02 Oct 2018 Iburi-chiho Chutobu M3.0 1 05:06 JST 02 Oct 2018
04:29 JST 02 Oct 2018 Iwate-ken Oki M4.3 1 04:32 JST 02 Oct 2018
23:21 JST 01 Oct 2018 Iburi-chiho Chutobu M3.3 1 23:24 JST 01 Oct 2018
21:11 JST 01 Oct 2018 Iwate-ken Oki M3.6 2 21:14 JST 01 Oct 2018
17:25 JST 01 Oct 2018 Iburi-chiho Chutobu M2.8 1 17:28 JST 01 Oct 2018
16:37 JST 01 Oct 2018 Iburi-chiho Chutobu M3.2 1 16:39 JST 01 Oct 2018
11:22 JST 01 Oct 2018 Iburi-chiho Chutobu M4.9 4 11:25 JST 01 Oct 2018
03:42 JST 01 Oct 2018 Kyoto-fu Nambu M3.2 1 03:45 JST 01 Oct 2018
03:32 JST 01 Oct 2018 Iburi-chiho Chutobu M3.2 1 03:35 JST 01 Oct 2018
01:50 JST 01 Oct 2018 Iburi-chiho Chutobu M2.7 1 01:53 JST 01 Oct 2018
19:20 JST 30 Sep 2018 Iburi-chiho Chutobu M3.0 1 19:23 JST 30 Sep 2018
17:58 JST 30 Sep 2018 Iburi-chiho Chutobu M3.0 1 18:01 JST 30 Sep 2018
17:54 JST 30 Sep 2018 Iburi-chiho Chutobu M5.0 4 17:57 JST 30 Sep 2018
14:04 JST 30 Sep 2018 Iburi-chiho Chutobu M2.7 1 14:06 JST 30 Sep 2018
20:56 JST 29 Sep 2018 Chiba-ken Toho-oki M3.7 2 20:59 JST 29 Sep 2018
18:26 JST 29 Sep 2018 Iburi-chiho Chutobu M4.2 3 18:29 JST 29 Sep 2018
23:03 JST 28 Sep 2018 Kumamoto-ken Kumamoto-chiho M2.6 2 23:05 JST 28 Sep 2018
22:50 JST 28 Sep 2018 Ibaraki-ken Oki M3.5 1 22:53 JST 28 Sep 2018
18:57 JST 28 Sep 2018 Izu-ohshima Kinkai M3.1 1 19:00 JST 28 Sep 2018
11:50 JST 28 Sep 2018 Kii-suido M2.4 1 11:53 JST 28 Sep 2018
06:05 JST 28 Sep 2018 Iburi-chiho Chutobu M2.4 1 06:09 JST 28 Sep 2018
04:32 JST 28 Sep 2018 Fukushima-ken Oki M4.0 3 04:36 JST 28 Sep 2018
16:24 JST 27 Sep 2018 Iburi-chiho Chutobu M2.7 1 16:27 JST 27 Sep 2018
07:54 JST 27 Sep 2018 Nagano-ken Hokubu M2.3 2 07:57 JST 27 Sep 2018
06:14 JST 27 Sep 2018 Iburi-chiho Chutobu M3.4 2 06:17 JST 27 Sep 2018
03:43 JST 27 Sep 2018 Kumamoto-ken Kumamoto-chiho M3.0 2 03:46 JST 27 Sep 2018

Per project requirements, I save the file to csv and loaded the file from csv. Upon creation of the csv file, a placeholder columns X was added. I had to eliminate it after successful creation of the data frame.

C. Discussion Questions and Tidying Analysis

1. The most obvious analysis would be between magnitude (which shows the energy released on a logarithmic scale) and maximum seismic intensity (which describes how much the quake was felt on the surface - a full description is here).

After creating the data frame to house all recent Japanese earthquakes, I tidied up Magnitude and MaxSeismic fields by making sure that they were numeric. In order to define the relationship between Magnitude and Maximum Seisimic Intensity, I developed a Scatter Diagram that shows this relationship.

2. Other ideas would be tracking which prefectures had the most earthquakes in the last week, which ones were affected the most or if there was any correlation between the time between the earthquake occurred and when it was reported and any other data (location, magnitude, seismic intensity, time of day).

Prefecture Total Earthquakes
Iburi-chiho Chutobu 17
Ibaraki-ken Oki 4
Kumamoto-ken Kumamoto-chiho 3
Chiba-ken Toho-oki 2
Fukushima-ken Oki 2
Iwate-ken Oki 2
Nagano-ken Hokubu 2
Gifu-ken Mino-chuseibu 1
Izu-ohshima Kinkai 1
Kii-suido 1
Kyoto-fu Nambu 1
Wakayama-ken Hokubu 1

I did not create a graph for this, but instead used dplyr wrangling to tidy up the data so that I could have only a list of Prefectures and Total Earthquakes for the week. I used kable package to display the values in a nice HTML format. Thanks Teammate Soumya!

III. All Time Box Office Worldwide Grosses

B. Store file into CSV and open it up in CSV

Rank Title Studio Worldwide Domestic DomesticPct Overseas OverseasPct Year
1 Avatar Fox $2,788.0 $760.5 27.3% $2,027.5 72.7% 2009^
2 Titanic Par. $2,187.5 $659.4 30.1% $1,528.1 69.9% 1997^
3 Star Wars: The Force Awakens BV $2,068.2 $936.7 45.3% $1,131.6 54.7% 2015
4 Avengers: Infinity War BV $2,046.5 $678.8 33.2% $1,367.7 66.8% 2018
5 Jurassic World Uni. $1,671.7 $652.3 39.0% $1,019.4 61.0% 2015
6 Marvel’s The Avengers BV $1,518.8 $623.4 41.0% $895.5 59.0% 2012
7 Furious 7 Uni. $1,516.0 $353.0 23.3% $1,163.0 76.7% 2015
8 Avengers: Age of Ultron BV $1,405.4 $459.0 32.7% $946.4 67.3% 2015
9 Black Panther BV $1,346.9 $700.1 52.0% $646.8 48.0% 2018
10 Harry Potter and the Deathly Hallows Part 2 WB $1,341.5 $381.0 28.4% $960.5 71.6% 2011
11 Star Wars: The Last Jedi BV $1,332.5 $620.2 46.5% $712.4 53.5% 2017
12 Jurassic World: Fallen Kingdom Uni. $1,303.8 $416.7 32.0% $887.1 68.0% 2018
13 Frozen BV $1,276.5 $400.7 31.4% $875.7 68.6% 2013
14 Beauty and the Beast (2017) BV $1,263.5 $504.0 39.9% $759.5 60.1% 2017
15 The Fate of the Furious Uni. $1,236.0 $226.0 18.3% $1,010.0 81.7% 2017
16 Iron Man 3 BV $1,214.8 $409.0 33.7% $805.8 66.3% 2013
17 Incredibles 2 BV $1,209.2 $606.8 50.2% $602.4 49.8% 2018
18 Minions Uni. $1,159.4 $336.0 29.0% $823.4 71.0% 2015
19 Captain America: Civil War BV $1,153.3 $408.1 35.4% $745.2 64.6% 2016
20 Transformers: Dark of the Moon P/DW $1,123.8 $352.4 31.4% $771.4 68.6% 2011
21 The Lord of the Rings: The Return of the King NL $1,119.9 $377.8 33.7% $742.1 66.3% 2003^
22 Skyfall Sony $1,108.6 $304.4 27.5% $804.2 72.5% 2012
23 Transformers: Age of Extinction Par. $1,104.1 $245.4 22.2% $858.6 77.8% 2014
24 The Dark Knight Rises WB $1,084.9 $448.1 41.3% $636.8 58.7% 2012
25 Toy Story 3 BV $1,067.0 $415.0 38.9% $652.0 61.1% 2010
26 Pirates of the Caribbean: Dead Man’s Chest BV $1,066.2 $423.3 39.7% $642.9 60.3% 2006
27 Rogue One: A Star Wars Story BV $1,056.1 $532.2 50.4% $523.9 49.6% 2016
28 Pirates of the Caribbean: On Stranger Tides BV $1,045.7 $241.1 23.1% $804.6 76.9% 2011
29 Despicable Me 3 Uni. $1,034.8 $264.6 25.6% $770.2 74.4% 2017
30 Jurassic Park Uni. $1,029.2 $402.5 39.1% $626.7 60.9% 1993^
31 Finding Dory BV $1,028.6 $486.3 47.3% $542.3 52.7% 2016
32 Star Wars: Episode I - The Phantom Menace Fox $1,027.0 $474.5 46.2% $552.5 53.8% 1999^
33 Alice in Wonderland (2010) BV $1,025.5 $334.2 32.6% $691.3 67.4% 2010
34 Zootopia BV $1,023.8 $341.3 33.3% $682.5 66.7% 2016
35 The Hobbit: An Unexpected Journey WB (NL) $1,021.1 $303.0 29.7% $718.1 70.3% 2012
36 The Dark Knight WB $1,004.6 $534.9 53.2% $469.7 46.8% 2008^
37 Harry Potter and the Sorcerer’s Stone WB $974.8 $317.6 32.6% $657.2 67.4% 2001
38 Despicable Me 2 Uni. $970.8 $368.1 37.9% $602.7 62.1% 2013
39 The Lion King BV $968.5 $422.8 43.7% $545.7 56.3% 1994^
40 The Jungle Book (2016) BV $966.6 $364.0 37.7% $602.5 62.3% 2016
41 Pirates of the Caribbean: At World’s End BV $963.4 $309.4 32.1% $654.0 67.9% 2007
42 Jumanji: Welcome to the Jungle Sony $962.1 $404.5 42.0% $557.6 58.0% 2017
43 Harry Potter and the Deathly Hallows Part 1 WB $960.3 $296.0 30.8% $664.3 69.2% 2010
44 The Hobbit: The Desolation of Smaug WB (NL) $958.4 $258.4 27.0% $700.0 73.0% 2013
45 The Hobbit: The Battle of the Five Armies WB (NL) $956.0 $255.1 26.7% $700.9 73.3% 2014
46 Finding Nemo BV $940.3 $380.8 40.5% $559.5 59.5% 2003^
47 Harry Potter and the Order of the Phoenix WB $939.9 $292.0 31.1% $647.9 68.9% 2007
48 Harry Potter and the Half-Blood Prince WB $934.4 $302.0 32.3% $632.5 67.7% 2009
49 The Lord of the Rings: The Two Towers NL $926.0 $342.6 37.0% $583.5 63.0% 2002^
50 Shrek 2 DW $919.8 $441.2 48.0% $478.6 52.0% 2004
51 Harry Potter and the Goblet of Fire WB $896.9 $290.0 32.3% $606.9 67.7% 2005
52 Spider-Man 3 Sony $890.9 $336.5 37.8% $554.3 62.2% 2007
53 Ice Age: Dawn of the Dinosaurs Fox $886.7 $196.6 22.2% $690.1 77.8% 2009
54 Spectre Sony $880.7 $200.1 22.7% $680.6 77.3% 2015
55 Spider-Man: Homecoming Sony $880.2 $334.2 38.0% $546.0 62.0% 2017
56 Harry Potter and the Chamber of Secrets WB $879.0 $262.0 29.8% $617.0 70.2% 2002
57 Ice Age: Continental Drift Fox $877.2 $161.3 18.4% $715.9 81.6% 2012
58 The Secret Life of Pets Uni. $875.5 $368.4 42.1% $507.1 57.9% 2016
59 Batman v Superman: Dawn of Justice WB $873.6 $330.4 37.8% $543.3 62.2% 2016
60 The Lord of the Rings: The Fellowship of the Ring NL $871.5 $315.5 36.2% $556.0 63.8% 2001^
61 Wolf Warrior 2 HC $870.3 $2.7 0.3% $867.6 99.7% 2017
62 The Hunger Games: Catching Fire LGF $865.0 $424.7 49.1% $440.3 50.9% 2013
63 Guardians of the Galaxy Vol. 2 BV $863.8 $389.8 45.1% $473.9 54.9% 2017
64 Inside Out BV $857.6 $356.5 41.6% $501.1 58.4% 2015
65 Thor: Ragnarok BV $854.0 $315.1 36.9% $538.9 63.1% 2017
66 Star Wars: Episode III - Revenge of the Sith Fox $848.8 $380.3 44.8% $468.5 55.2% 2005^
67 Transformers: Revenge of the Fallen P/DW $836.3 $402.1 48.1% $434.2 51.9% 2009
68 The Twilight Saga: Breaking Dawn Part 2 LG/S $829.7 $292.3 35.2% $537.4 64.8% 2012
69 Inception WB $828.3 $292.6 35.3% $535.7 64.7% 2010
70 Wonder Woman WB $821.8 $412.6 50.2% $409.3 49.8% 2017
71 Spider-Man Sony $821.7 $403.7 49.1% $418.0 50.9% 2002
72 Independence Day Fox $817.4 $306.2 37.5% $511.2 62.5% 1996^
73 Fantastic Beasts and Where To Find Them WB $814.0 $234.0 28.8% $580.0 71.2% 2016
74 Coco BV $807.1 $209.7 26.0% $597.4 74.0% 2017
75 Shrek the Third P/DW $799.0 $322.7 40.4% $476.2 59.6% 2007
76 Harry Potter and the Prisoner of Azkaban WB $796.7 $249.5 31.3% $547.1 68.7% 2004
77 Pirates of the Caribbean: Dead Men Tell No Tales BV $794.9 $172.6 21.7% $622.3 78.3% 2017
78 E.T.: The Extra-Terrestrial Uni. $792.9 $435.1 54.9% $357.8 45.1% 1982^
79 Mission: Impossible - Fallout Par. $789.8 $219.2 27.8% $570.5 72.2% 2018
80 Fast & Furious 6 Uni. $788.7 $238.7 30.3% $550.0 69.7% 2013
81 Indiana Jones and the Kingdom of the Crystal Skull Par. $786.6 $317.1 40.3% $469.5 59.7% 2008
82 Spider-Man 2 Sony $783.8 $373.6 47.7% $410.2 52.3% 2004
83 Deadpool Fox $783.1 $363.1 46.4% $420.0 53.6% 2016
84 Star Wars Fox $775.4 $461.0 59.5% $314.4 40.5% 1977^
85 Guardians of the Galaxy BV $773.3 $333.2 43.1% $440.2 56.9% 2014
86 2012 Sony $769.7 $166.1 21.6% $603.6 78.4% 2009
87 Maleficent BV $758.5 $241.4 31.8% $517.1 68.2% 2014
88 The Da Vinci Code Sony $758.2 $217.5 28.7% $540.7 71.3% 2006
89 The Amazing Spider-Man Sony $757.9 $262.0 34.6% $495.9 65.4% 2012
90 The Hunger Games: Mockingjay - Part 1 LGF $755.4 $337.1 44.6% $418.2 55.4% 2014
91 Shrek Forever After P/DW $752.6 $238.7 31.7% $513.9 68.3% 2010
92 X-Men: Days of Future Past Fox $747.9 $233.9 31.3% $513.9 68.7% 2014
93 Madagascar 3: Europe’s Most Wanted P/DW $746.9 $216.4 29.0% $530.5 71.0% 2012
94 Suicide Squad WB $746.8 $325.1 43.5% $421.7 56.5% 2016
95 The Chronicles of Narnia: The Lion, the Witch and the Wardrobe BV $745.0 $291.7 39.2% $453.3 60.8% 2005
96 Monsters University BV $744.2 $268.5 36.1% $475.7 63.9% 2013
97 The Matrix Reloaded WB $742.1 $281.6 37.9% $460.6 62.1% 2003
98 Up BV $735.1 $293.0 39.9% $442.1 60.1% 2009
99 Deadpool 2 Fox $734.2 $318.5 43.4% $415.8 56.6% 2018
100 Gravity WB $723.2 $274.1 37.9% $449.1 62.1% 2013

Per project requirements, I save the file to csv and loaded the file from csv. Upon creation of the csv file, a placeholder columns X was added. I had to eliminate it after successful creation of the data frame.

C. Discussion Questions and Tidying Analysis

1. Do certain movie studios have a higher percentage of domestic grosses?

The movie data frame needed some tidying up before proceeding to answer the question. The fields Domestic, Overseas, and Worldwide were interpreted as strings by the data frame because they contained commas and dollar signs. I used gsub function to strip away the dollar signs and commas and I used as.numeric function to transform the string into a numeric field. From there I was able to use aggregate function to sum all the Domestic revenue by movie studios. I used that new data frame to create a very useful Pie Chart that shows certainly which movie studios have higher percentage of revenues. Based on the Pie Chart, Buena Vista, Warner Brothers, Univision, and Fox have the highest percentage of domestics revenues at 37%, 15%, 11%, and 10% respectivley. The final answer is YES: certain movie studios have a higher percentage of domestic revenue.

2. Has the percentage of international grosses gone up recently (I’m defining “recent” this as the last decade or so)?

Year OverseasPct
25 2018 361.4
24 2017 809.5
23 2016 597.3
22 2015 466.4
21 2014 400.3
20 2013 516.6
19 2012 543.3
18 2011 217.1
17 2010 330.7
16 2009 408.6

Again, I used gsub and as.numeric functions to remove the percent symbols and ensure that Domestic Percent and Overseas Percent fields are numeric. Some carets exists on field values for Year, so I removed them using gsub. For tidying up the data I used aggregrate function to return to me the sum of all Overseas Percents by Year. I ordered the years decrementally and only used the last 10 years worth of data. From there, I developed a line chart that shows the total Overseas Percent in the last 10 years. According to the chart, the international revenue peeked last year (2017) but declined recently.

References

[DAT] Data Wrangling with dplyr and tidyr Cheat Sheet. Retrieved from website: https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf

[HOW] How to Remove Dollar Sign in R (and other Currency Symbols). Retrieved from website: https://www.rforexcelusers.com/remove-currency-dollar-sign-r/

[PIE] Pie Charts. Retrieved from website: https://www.statmethods.net/graphs/pie.html

[REA] Read CSV in R. Retrieved from website: http://rprogramming.net/read-csv-in-r/

[SCA] Scatterplots. Retrieved from website: https://www.statmethods.net/graphs/scatterplot.html

[SCR] Scraping html tables into R data frames using the XML package. Retrieved from website: https://stackoverflow.com/questions/1395528/scraping-html-tables-into-r-data-frames-using-the-xml-package