1) Earthquakes

Data Source

The U.S. Geological Survey (USGS), a science bureau of the U.S. Department of the Interior, provides data and information about the occurrence of earthquakes. The data is provided in a variety of formats and covers several time periods. This analysis uses data on all recorded earthquakes from the 30 days ending November 18, 2019 at 1:08 P.M. PST, obtained from https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv on the USGS web site. This data was saved to a file named earthquake.csv, which is provided along with this report. The table below provides a brief overview of the variables contained within the dataset.

Variable	Description
time	Time of the earthquake occurrence (UTC)
latitude	Latitude of the earthquake location
longitude	Longitude of the earthquake location
depth	Depth of the event (in km)
mag	Magnitude of the event
magType	Algorithm or method used to calculate the magnitude of the earthquake
nst	Number of seismic stations used to determine the earthquake location
gap	Largest azimuthal gap between azimuthally adjacent stations (in degrees)
dmin	Horizontal distance from the epicenter to the nearest seismic station (in degrees)
rms	Root-mean-square (RMS) travel-time residual of the location solution (in seconds)
net	ID of the data contributor
id	Unique identifier of the earthquake
updated	Time the event was most recently updated
place	Nearby named geographic region
horizontalError	Uncertainty of the earthquake location (in km)
depthError	Uncertainty of the earthquake depth (in km)
magNst	Total number of seismic stations used to calculate the earthquake's magnitude
status	Indicates whether the event has been reviewed by a person
locationSource	Network that authored the location of the event
magSource	Network that authored the preferred magnitude

A small sample of the dataset is provided below. During the 30 days ending November 18, 2019, there were 11,886 observed earthquakes.
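The event count and the 30-day window can be checked directly against earthquake.csv. The following is a minimal base-R sketch, assuming the file keeps the USGS column names and that time holds ISO 8601 UTC timestamps such as 2019-11-18T21:08:00.000Z. The helper summarise_window and the two toy rows are hypothetical and are not taken from the feed.

```r
# Hypothetical helper: count events and report the time span they cover.
# Assumes `time` is an ISO 8601 UTC string, e.g. "2019-11-18T21:08:00.000Z".
summarise_window <- function(quakes) {
  t <- as.POSIXct(quakes$time, format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")
  list(
    n_events = nrow(quakes),          # one row per recorded event
    first    = min(t, na.rm = TRUE),  # earliest event time
    last     = max(t, na.rm = TRUE)   # latest event time
  )
}

# Usage with the file provided alongside this report:
# quakes <- read.csv("earthquake.csv", stringsAsFactors = FALSE)
# summarise_window(quakes)

# Made-up rows, for illustration only
toy <- data.frame(
  time = c("2019-10-19T22:10:05.120Z", "2019-11-18T21:05:47.000Z"),
  mag  = c(1.2, 4.5),
  stringsAsFactors = FALSE
)
summarise_window(toy)$n_events
```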

Visualizations

Earthquake Locations

The USGS collects data on earthquakes that occur around the world, not just in the US. Figure 4 shows the locations of the 11,866 observed earthquakes.

Globally Observed Earthquakes

A cursory view of the map shows earthquakes clustering in well-known seismic zones such as Alaska, the western United States, the eastern Mediterranean, and the western Pacific “Ring of Fire” around Japan, Malaysia and the Philippines. A closer inspection reveals that while the United States experiences frequent earthquakes, they tend to be lower in strength (i.e. magnitude) than those in other parts of the world. For example, both the western coast of South America and the “Ring of Fire” show highly concentrated zones of very strong earthquakes.

Earthquake Magnitudes

Magnitude of Globally Observed Earthquakes

Figure 5 shows the approximate magnitudes of the observed earthquakes; the size of each circle indicates the approximate magnitude of each quake.
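One practical detail when sizing circles by magnitude is that the USGS feed includes very small events with negative magnitudes, which cannot be used directly as a circle size. A minimal sketch of one way to map magnitudes to radii, assuming events at or below a chosen floor (0 by default) are drawn at the smallest size and that square-root scaling is acceptable; bubble_radius and its parameters are hypothetical, not the method used for figure 5.

```r
# Hypothetical helper: map magnitudes to circle radii for a bubble map.
# Magnitudes at or below `floor_mag` (including negative ones) get `min_r`.
bubble_radius <- function(mag, floor_mag = 0, min_r = 0.5, max_r = 5) {
  m <- pmax(mag, floor_mag) - floor_mag  # clamp at the floor, then shift so the floor maps to 0
  top <- max(m, na.rm = TRUE)
  if (top == 0) top <- 1                 # avoid 0/0 when every event sits at the floor
  # Square-root scaling damps the visual dominance of the largest events.
  min_r + (max_r - min_r) * sqrt(m / top)
}

bubble_radius(c(-0.8, 0, 2.5, 5.0))
```

Missing magnitudes stay NA, so the plotting step can decide whether to skip those events.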

Figure 6 shows the approximate magnitudes of observed earthquakes in California and nearby areas.
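A regional view like figure 6 only needs the events that fall inside a latitude/longitude box. A minimal base-R sketch; the box used here (roughly 32 to 42 degrees N and 125 to 114 degrees W) is an illustrative assumption, not the extent actually used for figure 6, and in_box and the toy rows are hypothetical.

```r
# Hypothetical helper: keep events inside a latitude/longitude bounding box.
# Longitudes west of Greenwich are negative in the USGS feed.
in_box <- function(quakes, lat_range, lon_range) {
  keep <- quakes$latitude  >= lat_range[1] & quakes$latitude  <= lat_range[2] &
          quakes$longitude >= lon_range[1] & quakes$longitude <= lon_range[2]
  quakes[which(keep), , drop = FALSE]  # which() also drops rows with missing coordinates
}

# Approximate box around California and nearby areas (assumed, for illustration)
ca_lat <- c(32, 42)
ca_lon <- c(-125, -114)

# Made-up rows: two inside the box, one in Alaska
toy <- data.frame(
  latitude  = c(35.7, 61.2, 33.9),
  longitude = c(-117.6, -150.0, -116.4),
  mag       = c(2.1, 3.4, 1.0)
)
in_box(toy, ca_lat, ca_lon)
```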

Observed Earthquakes Around California

Earthquake Magnitudes Detected by Sensors

Magnitudes Detected by Sensors

Analysis: For the bubble chart in figure 7, we used the variables nst, magNst and mag. The x-axis represents the number of seismic stations used to determine the location, and the y-axis represents the number of seismic stations used to calculate the magnitude. The diameter of each circle is based on the magnitude of each individual observation.
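A chart of this kind can be approximated in base R graphics with symbols(), which takes circle radii directly. This is a sketch under assumptions: the original figure may have been drawn with a different plotting package, rows missing any of the three variables are dropped, and negative magnitudes are clamped to zero since a negative radius is meaningless. plot_station_bubbles and the toy rows are hypothetical.

```r
# Hypothetical sketch of a figure-7-style bubble chart in base R graphics.
plot_station_bubbles <- function(quakes) {
  cols <- c("nst", "magNst", "mag")
  # Drop rows where any of the three variables is missing
  d <- quakes[complete.cases(quakes[, cols]), cols, drop = FALSE]
  symbols(d$nst, d$magNst,
          circles = pmax(d$mag, 0),  # radius from magnitude; negatives clamped to 0
          inches = 0.15,             # largest symbol scaled to 0.15 inch
          fg = "grey30", bg = adjustcolor("steelblue", alpha.f = 0.4),
          xlab = "nst (stations used for location)",
          ylab = "magNst (stations used for magnitude)")
  invisible(nrow(d))                 # number of events actually plotted
}

# Made-up rows; the third has no nst value and is skipped
toy <- data.frame(
  nst    = c(20, 35, NA, 50),
  magNst = c(8, 10, 15, 22),
  mag    = c(1.1, 2.3, 0.9, -0.2)
)
# plot_station_bubbles(toy)  # draws on the active graphics device
```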

As the bubble chart shows, more seismic stations are generally used to determine an event's location than to calculate its magnitude. In addition, the larger circles are spread across the full range of station counts, which suggests that the number of stations used is not related to the magnitude of a quake.
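Both statements are readings of the chart, and each can be backed by a single number. A minimal sketch, assuming the nst, magNst and mag columns from earthquake.csv: the first value is the share of events located with more stations than were used for the magnitude, and the second is a Spearman rank correlation between station count and magnitude, where a value near zero would support the "no relationship" reading. The helper station_checks and the toy rows are hypothetical.

```r
# Hypothetical helper: numeric checks behind the two visual impressions.
station_checks <- function(quakes) {
  d <- quakes[complete.cases(quakes[, c("nst", "magNst", "mag")]), , drop = FALSE]
  list(
    # share of events located with more stations than were used for magnitude
    share_nst_gt_magnst = mean(d$nst > d$magNst),
    # rank correlation between station count and magnitude (near 0 = no monotone link)
    spearman_nst_mag = cor(d$nst, d$mag, method = "spearman")
  )
}

# Made-up rows, for illustration only
toy <- data.frame(
  nst    = c(20, 35, 12, 50),
  magNst = c(8, 10, 15, 22),
  mag    = c(1.1, 2.3, 0.9, 1.8)
)
station_checks(toy)
```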

2) Disease / Illness

Data Source

The World Health Organization (WHO) was created shortly after World War II as an international agency whose mission is to improve overall world health. The WHO works within the United Nations system to help prevent and fight diseases around the world. It maintains information related to this mission in an online database called the Global Health Observatory (GHO), which can be accessed at https://www.who.int/data/gho.

The R package WHO provides an interface to the GHO database and can retrieve various datasets directly from it. This analysis focuses on data related to cholera, an infection caused by eating food or drinking water contaminated with the bacterium Vibrio cholerae. While cholera is preventable and treatable, it can cause death. The WHO estimates that there are as many as 4 million cases of the infection each year, with as many as 143,000 of these resulting in death. See https://www.who.int/health-topics/cholera#tab=tab_1 for further details.

The WHO package was used to collect data related to cholera. Observations for the following indicators were collected and analyzed.

Indicator	Description	Further Details
CHOLERA_0000000001	Number of reported cases of cholera	https://www.who.int/data/gho/indicator-metadata-registry/imr-details/42
WSH_10	Number of diarrhoea deaths from inadequate water, sanitation and hygiene	https://www.who.int/data/gho/indicator-metadata-registry/imr-details/2260

Visualizations

Cases of Cholera by Deaths from Improper Water

The smoothed scatterplot in figure 8 suggests that as the number of reported cholera cases increases, the number of diarrhoea deaths attributed to inadequate water, sanitation and hygiene remains relatively constant. This may seem counter-intuitive, since one might expect a strong, positive linear relationship between cholera cases and water-related deaths. However, in this data the number of reported cholera cases in a country does not appear to track the number of such deaths.
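The smoother in figure 8 is a visual summary; a rank correlation gives a single number for the same question and is not dominated by the few countries with very large counts. A minimal sketch, assuming the cases and deaths columns of viz_data built in the Data Wrangling section; cases_deaths_summary is a hypothetical helper and the toy values are made up.

```r
# Hypothetical helper: summarise the cases-vs-deaths relationship numerically.
# Uses only rows where both values are present.
cases_deaths_summary <- function(viz_data) {
  ok <- !is.na(viz_data$cases) & !is.na(viz_data$deaths)
  list(
    n        = sum(ok),  # countries with both values
    spearman = cor(viz_data$cases[ok], viz_data$deaths[ok], method = "spearman")
  )
}

# Made-up values; the last row has no cases value and is ignored
toy <- data.frame(
  cases  = c(5, 120, 3000, 40, NA),
  deaths = c(900, 15000, 2000, 300, 50)
)
cases_deaths_summary(toy)
```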

Reported Cholera Cases by Country

The lollipop chart in figure 9 shows that the country with the most reported cholera cases is Haiti. Other countries with large numbers of reported cases include the Democratic Republic of the Congo, Yemen, Somalia, the United Republic of Tanzania, Kenya and South Sudan. Reported cholera cases appear to be comparatively rare in the remaining countries.
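A ranking like the one in figure 9 can be produced by totalling cases per country and sorting. A minimal base-R sketch, assuming a data frame with country and cases columns such as tb_cholera; summing over all available years is an assumption of this sketch (the figure may use a different period), and top_countries and the placeholder countries A, B and C are hypothetical.

```r
# Hypothetical helper: total reported cases per country, largest first.
top_countries <- function(tb, n = 7) {
  # aggregate() with a formula drops rows whose cases value is NA
  totals <- aggregate(cases ~ country, data = tb, FUN = sum)
  totals <- totals[order(totals$cases, decreasing = TRUE), ]
  head(totals, n)
}

# Made-up counts for placeholder countries A, B and C
toy <- data.frame(
  country = c("A", "A", "B", "C", "B"),
  cases   = c(100, 50, 300, 20, NA),
  stringsAsFactors = FALSE
)
top_countries(toy, n = 2)
```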

Data Wrangling

Although one might expect the data maintained by the WHO in the GHO database to be clean and possibly even tidy, it is not. This section describes how the data was retrieved, transformed and prepared for the preceding analysis and visualizations.

First, the CHOLERA_0000000001 dataset, containing the number of reported cholera cases, is read from the GHO database.

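library(WHO)    # interface to the WHO Global Health Observatory
library(dplyr)

# Reported cholera cases by country and year
tb_cholera <- get_data("CHOLERA_0000000001")

tb_cholera <- tb_cholera %>%
  group_by(country) %>%
  arrange(country, year) %>%
  select(country, year, value, region) %>%
  rename("cases" = value)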

Next, the WSH_10 dataset containing deaths attributed to poor water quality was downloaded and joined to the reported cholera cases data. Note that the WSH_10 dataset only contains observations for 2016, so the joined data is restricted to that year.

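# Diarrhoea deaths attributed to inadequate water, sanitation and hygiene (2016 only)
d_table <- get_data("WSH_10")

# Keep every cholera row, attach matching death counts, then restrict to 2016
viz_data <- tb_cholera %>%
  left_join(d_table,
            by = c("country", "year", "region")) %>%
  filter(year == 2016) %>%
  rename("deaths" = value)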
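Because the step above is a left join, countries that reported cholera cases in 2016 but have no matching WSH_10 row keep a missing deaths value rather than being dropped. The base-R sketch below reproduces the join-then-filter logic on made-up rows for hypothetical countries A, B and C, with merge(all.x = TRUE) standing in for dplyr's left_join(). It is an illustration only, not the code used for this report.

```r
# Made-up cholera rows (one country, A, also has a 2015 row)
cholera <- data.frame(
  country = c("A", "A", "B", "C"),
  year    = c(2015, 2016, 2016, 2016),
  region  = c("R1", "R1", "R2", "R2"),
  cases   = c(10, 25, 400, 7)
)
# Made-up WSH_10-style rows: 2016 only, no row for country C
wsh <- data.frame(
  country = c("A", "B"),
  year    = c(2016, 2016),
  region  = c("R1", "R2"),
  value   = c("120 [90-150]", "3000 [2500-3600]")
)

# Left join on all three keys: every cholera row is kept, unmatched rows get NA
joined <- merge(cholera, wsh, by = c("country", "year", "region"), all.x = TRUE)
joined <- joined[joined$year == 2016, ]                # keep 2016 only
names(joined)[names(joined) == "value"] <- "deaths"   # same rename as above
joined
```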

Although the data now appears tidy and is close to being usable, the deaths variable is stored as text that contains more than the death count: each count is followed by a bracketed range of the form [lower-upper]. This range is removed and the remaining count is parsed as an integer.

# Drop the " [lower-upper]" range, then convert the remaining count to integer
viz_data$deaths <- viz_data$deaths %>%
  stringr::str_remove(pattern = "[:space:]\\[[:digit:]+-[:digit:]+\\]") %>%
  readr::parse_integer()
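As a quick check of the pattern, the same clean-up can be written in base R with sub() and as.integer(). This is a sketch only: the helper parse_deaths is hypothetical, and the input strings are made up to match the "count [lower-upper]" shape that the regular expression targets.

```r
# Hypothetical base-R equivalent: strip a trailing " [lower-upper]" range, keep the count
parse_deaths <- function(x) {
  as.integer(sub("[[:space:]]\\[[[:digit:]]+-[[:digit:]]+\\]", "", x))
}

parse_deaths(c("120 [90-150]", "3000 [2500-3600]", NA))
```

Values without a bracketed range pass through unchanged, and missing values stay NA.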

The variable gho exists in the original dataset but is not needed. It is removed, leaving viz_data tidy and ready for the analysis above.

viz_data <- viz_data %>% select(-gho)

Data Visualization Techniques

STAT 451-01

Jedidiah Harwood, Kurt Wydrinski

December 11, 2019
