The U.S. Geological Survey (USGS) is a scientific agency of the U.S. Department of the Interior that, among other things, provides data and information about the occurrence of earthquakes. Its Earthquake Hazards Program is part of the National Earthquake Hazards Reduction Program (NEHRP), for which the National Institute of Standards and Technology (NIST) is the lead agency. The data is provided in a variety of formats and at several update frequencies. For this analysis, data on all recorded earthquakes from the 30 days ending November 18, 2019 at 1:08 P.M. PST is analyzed. This data was obtained from https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv at the USGS web site, placed into a file named earthquake.csv, and is provided along with this report. The table below provides a brief overview of the variables contained in the dataset.
| Variable | Description |
|---|---|
| time | Time of Earthquake Occurrence |
| latitude | Latitude Location of Earthquake |
| longitude | Longitude location of Earthquake |
| depth | Depth of the Event |
| mag | Magnitude of Event |
| magType | Algorithm or Method Used to Calculate the Magnitude of the Earthquake |
| nst | Number of Seismic Stations used to evaluate Earthquake Location |
| gap | The Largest Azimuthal Gap between Azimuthally Adjacent Stations (in degrees) |
| dmin | Smallest Observed Distance to Event Epicenter from the Closest Seismic Station (in degrees) |
| rms | Root Mean Square of the Travel-Time Residuals for the Event Location (in seconds) |
| net | ID of Data Contributor |
| id | Unique Identification of Earthquake |
| updated | Time the Event was Most Recently Updated |
| place | Nearby Named Geographical Region |
| horizontalError | Uncertainty of Earthquake Location (in KM) |
| depthError | Uncertainty of Earthquake Depth (in KM) |
| magNst | Total number of Seismic Stations used to Calculate Earthquake’s Magnitude |
| status | Indicates Whether Event has been Reviewed by a Person |
| locationSource | Network that Authored location of Event |
| magSource | Network that Authored Preferred Magnitude |
A small sample of the dataset is provided below. During the 30 days ending November 18, 2019, there were 11,886 observed earthquakes.
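As a minimal sketch of how a feed file of this kind can be loaded and counted in R, the block below parses a two-row stand-in for earthquake.csv. The rows are invented placeholders, not actual USGS records, and only a subset of the columns is shown; with the real file, `read.csv("earthquake.csv")` would take the place of the inline text.

```r
# Two made-up rows standing in for earthquake.csv (placeholder values, not USGS data)
csv_text <- "time,latitude,longitude,depth,mag,magType,nst,magNst
2019-11-18T21:00:00.000Z,10.0,20.0,5.0,1.5,ml,12,8
2019-11-18T20:30:00.000Z,-30.0,-70.0,50.0,4.8,mb,,40"

quakes <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# The feed timestamps are ISO 8601 strings in UTC
quakes$time <- as.POSIXct(quakes$time, format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")

nrow(quakes)        # number of events in the file
summary(quakes$mag) # quick look at the magnitude range
```

Empty fields, such as the missing nst value in the second row, are read as NA, which matters for any later summary or plot that uses those columns.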
The USGS collects data on earthquakes that occur around the world, not just in the US. Figure 4 shows the locations of the observed earthquakes.
Globally Observed Earthquakes
A cursory view of the map shows earthquakes clustering in well-known seismic regions such as Alaska, the western United States, the eastern Mediterranean, and the Asian portion of the “Ring of Fire” around Japan, Malaysia, and the Philippines. A closer inspection reveals that while the United States does have frequent earthquakes, they tend to be lower in strength (i.e., magnitude) than those in other parts of the world. For example, both the western coast of South America and the “Ring of Fire” show highly concentrated zones of very strong earthquakes.
Magnitude of Globally Observed Earthquakes
Figure 5 shows the approximate magnitudes of the observed earthquakes. The size of each circle shows the approximate magnitude of each quake.
Figure 6 shows the approximate magnitudes of observed earthquakes in California and nearby areas.
Observed Earthquakes Around California
Magnitudes Detected by Sensors
Analysis: For the bubble chart in Figure 7, we used the variables nst, magNst, and mag. The x-axis represents the number of seismic stations used to determine the location, and the y-axis represents the number of seismic stations used to calculate the magnitude. The diameter of each circle is based on the magnitude of each individual observation.
As the bubble chart shows, it appears that more seismic stations are generally used to locate an earthquake than to calculate its magnitude. In addition, the distribution of the larger circles suggests that the number of seismic stations used has little bearing on the magnitude assigned to a quake.
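A bubble chart of this kind could be sketched with ggplot2 roughly as follows. This is an illustration under stated assumptions rather than the code behind the figure: the data frame holds invented placeholder values, and the dashed reference line and the final summary are additions meant to make the comparison between nst and magNst easier to read.

```r
library(ggplot2)

# Placeholder rows (invented values) with the three columns used in the chart
quakes <- data.frame(
  nst    = c(10, 25, 40, 60),
  magNst = c(5, 12, 30, 20),
  mag    = c(1.2, 2.5, 4.6, 3.1)
)

bubble <- ggplot(quakes, aes(x = nst, y = magNst, size = mag)) +
  geom_point(alpha = 0.4) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") + # line where nst == magNst
  labs(x = "Stations used for location (nst)",
       y = "Stations used for magnitude (magNst)",
       size = "Magnitude")

# Share of events located with more stations than were used for the magnitude
share_more_location <- mean(quakes$nst > quakes$magNst, na.rm = TRUE)
```

Points below the dashed line are events where more stations contributed to the location than to the magnitude. Note that ggplot2's default size scale maps values to circle area; adding `scale_radius()` would map them to the radius instead.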
The World Health Organization (WHO) was created shortly after World War II as an international agency whose mission is to improve overall world health. The WHO works within the United Nations system to help prevent and fight diseases around the world. It maintains information related to this mission in an online database called the Global Health Observatory (GHO), which can be accessed at https://www.who.int/data/gho.
The R package WHO provides an interface to the GHO database. This API can obtain various datasets directly from the database. This analysis focuses on data related to cholera. Cholera is an infection caused by eating or drinking food or water that is contaminated with the bacterium Vibrio cholerae. While it is preventable and treatable, it can cause death. The WHO estimates that there are as many as 4 million cases of the infection each year, with as many as 143,000 of these resulting in death. See https://www.who.int/health-topics/cholera#tab=tab_1 for further details.
The R API was used to collect data related to cholera. Observations for the following indicators were collected and analyzed.
| Indicator | Description | Further Details |
|---|---|---|
| CHOLERA_0000000001 | Number of reported cases of cholera | https://www.who.int/data/gho/indicator-metadata-registry/imr-details/42 |
| WSH_10 | Number of diarrhoea deaths from inadequate water, sanitation and hygiene | https://www.who.int/data/gho/indicator-metadata-registry/imr-details/2260 |
Cases of Cholera by Deaths from Improper Water
The smoothed scatterplot in Figure 8 shows that as the number of cholera cases increases, the number of deaths from inadequate water, sanitation and hygiene remains relatively constant. This may seem counterintuitive, since one might expect a strong, positive linear correlation between cholera cases and related deaths. However, the number of such deaths in a country appears to stay roughly level regardless of its number of reported cholera cases.
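One way to put a number on the visual impression from a scatterplot like this is a correlation coefficient. The sketch below uses invented placeholder values, not GHO data, and the data frame name viz is hypothetical; it only shows how the comparison could be made.

```r
# Placeholder country-level values (invented, not GHO data)
viz <- data.frame(
  cases  = c(10, 200, 1500, 40000),
  deaths = c(300, 120, 900, 450)
)

# Pearson measures linear association; Spearman works on ranks and is
# less sensitive to a few countries with very large case counts
r_pearson  <- cor(viz$cases, viz$deaths, use = "complete.obs", method = "pearson")
r_spearman <- cor(viz$cases, viz$deaths, use = "complete.obs", method = "spearman")
```

A coefficient near zero would be consistent with a flat trend, while a value near one would indicate the strong positive relationship one might have expected.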
Reported Cholera Cases by Country
The lollipop chart in Figure 9 shows that the country with the most reported cholera cases is Haiti. Other countries with significant numbers of reported cases include the Democratic Republic of the Congo, Yemen, Somalia, the United Republic of Tanzania, Kenya, and South Sudan. For the most part, cholera does not appear to be very prevalent in other countries.
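A lollipop chart is essentially a line segment from zero to each value with a point at the end. The following ggplot2 sketch shows that construction; the country names and counts are invented placeholders, and the object names are hypothetical.

```r
library(ggplot2)

# Placeholder names and counts (invented for illustration)
cases <- data.frame(
  country = c("Country A", "Country B", "Country C"),
  cases   = c(500, 12000, 3000)
)

lollipop <- ggplot(cases, aes(x = reorder(country, cases), y = cases)) +
  geom_segment(aes(xend = reorder(country, cases), y = 0, yend = cases)) + # the stick
  geom_point(size = 3) +                                                   # the head
  coord_flip() +                                                           # countries on the vertical axis
  labs(x = NULL, y = "Reported cholera cases")
```

Ordering the countries by their case counts with `reorder()` makes the largest values easy to spot at one end of the chart.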
Although one would think the data maintained by the WHO in the GHO database is clean and possibly even tidy, it is not. This section describes how this data was retrieved, transformed and prepared for the preceding analysis and visualizations.
First, the CHOLERA_0000000001 dataset containing the observed number of reported cases of cholera is read from the GHO database.
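library(WHO)    # provides get_data() for the GHO database
library(dplyr)

# Download the reported cholera cases and keep only the columns needed
tb_cholera <- get_data("CHOLERA_0000000001")
tb_cholera <- tb_cholera %>%
  select(country, year, value, region) %>%
  arrange(country, year) %>%
  rename(cases = value)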
Next, the WSH_10 dataset containing the observed deaths due to inadequate water, sanitation and hygiene is downloaded and joined to the reported cholera cases data. Note, however, that the WSH_10 dataset only contains observations for 2016.
# Download the WSH_10 deaths and join them to the cholera cases for 2016
d_table <- get_data("WSH_10")
viz_data <- tb_cholera %>%
  left_join(d_table,
            by = c("country", "year", "region")) %>%
  filter(year == 2016) %>%
  rename(deaths = value)
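Because the join is a left join, every cholera row is kept even when the deaths table has no matching record. The toy example below illustrates that behaviour with invented rows that only mimic the shape of the two tables; the names cases_tbl and deaths_tbl are hypothetical.

```r
library(dplyr)

# Invented rows that mimic the shape of the two tables
cases_tbl <- tibble(country = c("A", "A", "B"),
                    year    = c(2015, 2016, 2016),
                    region  = c("R1", "R1", "R2"),
                    cases   = c(10, 20, 30))
deaths_tbl <- tibble(country = "A", year = 2016, region = "R1",
                     value = "120 [80-150]")

joined <- cases_tbl %>%
  left_join(deaths_tbl, by = c("country", "year", "region")) %>%
  filter(year == 2016) %>%
  rename(deaths = value)

joined  # country B is kept, with NA in the deaths column
```

Countries that report cholera cases but have no death record for 2016 therefore remain in the result with a missing deaths value, which is worth keeping in mind when reading any plot built from the joined data.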
Although the data appears tidy and is close to being usable, the deaths variable contains extra text besides the death counts that needs to be ignored. The death counts are parsed out of this variable to bring the data a step closer to being ready to analyze.
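library(stringr) # str_remove()
library(readr)   # parse_integer()
# Strip the bracketed range that follows each count, then convert to integer
viz_data$deaths <- viz_data$deaths %>%
  str_remove(pattern = "[:space:]\\[[:digit:]+-[:digit:]+\\]") %>%
  parse_integer()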
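To see what this cleaning step does, the sketch below applies an equivalent base-R pattern to a few invented strings in the "count [low-high]" shape that the pattern targets. It is an illustration only: the sample strings are assumptions, and `sub()` with `as.integer()` stands in for the stringr and readr calls.

```r
# Invented example strings in the "count [low-high]" shape the pattern targets
raw <- c("120 [80-150]", "7 [3-12]", "45")

# Drop one whitespace character plus the bracketed range, then convert to integer
counts <- as.integer(sub("[[:space:]]\\[[[:digit:]]+-[[:digit:]]+\\]", "", raw))
counts  # 120 7 45
```

A value that does not fit this shape, for example one whose range contains a space or a decimal point, would not be stripped by the pattern and would fail the integer conversion, so checking for unexpected NA values after this step is a reasonable precaution.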
There is one variable, gho, that exists in the original dataset but is not needed. This variable is removed, leaving the viz_data dataset tidy and ready for the above analysis.
viz_data <- viz_data %>% select(-gho)