# Loading packages and data
library(plotly)

medals <- read.csv("summer2016.csv")
View(medals)
state <- read.csv("state_economic_data.csv")
View(state)

Problem 1

library(plotly)
library(dplyr)


# Store the plot under its own name so the medals data frame is not overwritten
medals_plot <- medals %>%
  filter(Sex == "F") %>%
  # Convert kilograms to pounds and centimeters to inches
  mutate(Weightlbs = Weight*2.2046,
         HeightInches = Height*0.39370) %>%
  plot_ly(x = ~HeightInches, y = ~Weightlbs, color = ~Medal,
          hoverinfo = "text",
          text = ~paste("Athlete:", Name, "<br>",
                        "Height:", round(HeightInches, 1), "<br>",
                        "Weight:", round(Weightlbs, 1), "<br>",
                        "Medal:", Medal)) %>%
  # Map each medal level to a fixed color
  add_markers(colors = c("Bronze" = "#804A00", "Silver" = "grey", "Gold" = "gold"),
              marker = list(size = 7, opacity = 0.75)) %>%
  layout(title = "2016 Female Olympic Medalists: Height vs Weight",
         xaxis = list(title = "Height (in)"),
         yaxis = list(title = "Weight (lbs)"))

medals_plot

This is a scatterplot comparing the heights and weights of female medalists in the 2016 Olympics. I found this dataset on DataCamp. It includes each athlete's name, their demographics (age, sex, height, weight), and Olympic information such as their country, sport, event, and medal received. I was interested in learning whether a female athlete's height and/or weight had an influence on the medal she earned. For instance, do taller women generally have more gold medals than shorter women?

After graphing this in plotly, there does appear to be some pattern relating a woman's height and weight to the medal she received. Comparing the right side of the graph to the left, there are noticeably more gold medals than silver or bronze on the right. This suggests that taller athletes were more likely to win a gold medal than shorter competitors. Along the same lines, there seem to be more gold medals among athletes who weigh more than 185 lbs. It is also interesting that a large number of silver medals are clustered in the middle of the graph, which suggests that these athletes are “average” in height and weight among medalists.
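The pattern described above is read off the plot by eye, so one way to check it is to tabulate the share of each medal within height bands. The following is only a minimal sketch in base R: `medalists_toy` is a made-up stand-in for the filtered data, and `medal_share_by_height()` and the band cut-offs are hypothetical names and values chosen for illustration, not part of the original analysis.

```r
# Toy stand-in for the filtered medalist data (made-up rows, illustration only)
medalists_toy <- data.frame(
  HeightInches = c(60, 62, 64, 66, 68, 70, 72, 74),
  Medal = c("Bronze", "Silver", "Bronze", "Silver",
            "Gold", "Silver", "Gold", "Gold")
)

# Hypothetical helper: share of each medal within each height band
medal_share_by_height <- function(df, breaks) {
  # Assign every athlete to a height band
  band <- cut(df$HeightInches, breaks = breaks, include.lowest = TRUE)
  # Fix the medal order so the columns always read Bronze, Silver, Gold
  medal <- factor(df$Medal, levels = c("Bronze", "Silver", "Gold"))
  # margin = 1 makes each row (height band) sum to 1
  prop.table(table(band, medal), margin = 1)
}

shares <- medal_share_by_height(medalists_toy, breaks = c(55, 65, 75))
print(round(shares, 2))
```

With the real data, the same helper could be applied to the filtered and mutated medals data frame. If gold's share rises from the shorter bands to the taller ones, that would support what the scatterplot suggests.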

While I did not run into many issues creating this graph, there were a lot of moving parts. It took me longer to produce because I had to work step by step through several different sections. After each addition, I ran my code to make sure it still worked. None of this was extremely challenging, but it took more effort and thought.

Problem 2

library(dplyr)
library(plotly)


state_econ <- state %>%
  filter(year >= 2003) %>%
  plot_ly(x = ~population, y = ~employment, color = ~region, frame = ~year,
          hoverinfo = "text",
          text = ~paste("State:", state, "<br>",
                        "Population:", population, "<br>",
                        "Employment:", employment)) %>%
  add_markers(ids = ~state,
              colors = c("pink", "orange", "lightblue", "lightgreen"),
              showlegend = TRUE,
              marker = list(size = 10, opacity = .75)) %>%
  add_text(x = 30000, y = 3000000, text = ~year, frame = ~year,
           textfont = list(size = 60, color = toRGB("black"), family = "Arial Black"),
           showlegend = FALSE) %>%
  animation_slider(hide = TRUE) %>%
  layout(title = "Population vs Employment Across U.S. States by Year",
         xaxis = list(title = "Population (Thousands)"),
         yaxis = list(title = "Employment (Millions)"))

state_econ

This graph is a scatterplot of population versus employment in the United States. Each state is represented by a point on the graph, and the points are colored by region. There is also an animation that shows how the states change between 2003 and 2017. Looking at the entire plot over the years, there appears to be a very strong linear relationship between population and employment: as population increases, so does employment. The animation also lets us see trends in the economy. For example, we can see a decreasing trend from 2008 to 2010, likely a result of the 2008 recession and stock market crash. Other than this, population and employment rise as the years continue. This graph also allows us to look at individual states and regions. Southern states such as Florida and Texas have had a big increase in population and employment over the 14-year period, while many western states such as Montana and Idaho have had fairly steady population and employment. These are just a few of the features to look at, and there is much more to analyze in this graph.
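The strength of the linear relationship described above can be checked numerically by computing the correlation between population and employment within each year. This is a minimal base R sketch: `state_toy` is a made-up stand-in for the state data, and `cor_by_year` is a hypothetical name used only for illustration.

```r
# Toy stand-in for the state data (made-up values, illustration only)
state_toy <- data.frame(
  year       = rep(c(2003, 2004), each = 4),
  population = c(1.0, 2.0, 3.0, 4.0, 1.1, 2.1, 3.2, 4.3),
  employment = c(0.5, 1.0, 1.5, 2.0, 0.5, 1.1, 1.4, 2.2)
)

# Pearson correlation between population and employment, one value per year
cor_by_year <- sapply(
  split(state_toy, state_toy$year),
  function(d) cor(d$population, d$employment)
)

print(round(cor_by_year, 3))
```

With the real data, values close to 1 in every year would back up the claim that the relationship is strongly linear across the whole animation.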

The hardest part of this graph was adding the year to the plot. This feature was created using the add_text() function. When I first added it, the animation ran from 1997 to 2017. The year label showed from 1997 to 2002, and once 2003 came along, it disappeared. On top of this, all of the points (representing U.S. states) were one color, and the legend disappeared as well. To fix the disappearing years, I decided to keep only the years greater than or equal to 2003. Since there was no employment data for 1997-2002, there was no reason to include them. Next, I fixed the colors of the points by setting the “colors” argument directly in the add_markers() function, which resolved the single-color issue. Finally, for the legend, I set showlegend to TRUE in add_markers() and to FALSE in add_text(). Stating both explicitly kept the legend for the markers while leaving the year label out of it. The only thing I still cannot explain is why the year 2004 “jumps” into the graph. I tried changing the position and size of the text, and nothing changed. It is only that single year that gives me issues.
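Two of the steps above can be sketched differently: finding the first year with employment data instead of hard-coding 2003, and drawing the year label from its own one-row-per-year data frame. By default, add_text() inherits the plot's data and the color = ~region mapping, so the label is drawn once per state and split by region. Whether that is what makes 2004 jump is an assumption and has not been verified against this data. The names `state_toy`, `first_year`, `state_recent`, and `year_labels` are hypothetical, and the toy values are made up.

```r
# Toy stand-in for the state data (made-up values, illustration only)
state_toy <- data.frame(
  state      = rep(c("A", "B"), each = 4),
  region     = rep(c("South", "West"), each = 4),
  year       = rep(2001:2004, times = 2),
  population = c(100, 110, 120, 130, 50, 52, 54, 56),
  employment = c(NA, NA, 60, 65, NA, NA, 25, 26)
)

# First year with any employment data, rather than a hard-coded cut-off
first_year <- min(state_toy$year[!is.na(state_toy$employment)])
state_recent <- state_toy[state_toy$year >= first_year, ]

# One label row per year, so each animation frame draws a single text mark
year_labels <- data.frame(
  year = sort(unique(state_recent$year)),
  x    = max(state_recent$population) * 0.9,
  y    = min(state_recent$employment)
)

# The plotting step only runs if plotly is installed
if (requireNamespace("plotly", quietly = TRUE)) {
  library(plotly)
  p <- plot_ly(state_recent, x = ~population, y = ~employment) %>%
    add_markers(color = ~region, frame = ~year, ids = ~state) %>%
    # inherit = FALSE keeps the label from picking up plot-level mappings
    add_text(data = year_labels, x = ~x, y = ~y, text = ~year, frame = ~year,
             inherit = FALSE, showlegend = FALSE,
             textfont = list(size = 60, color = toRGB("black")))
}
```

Because the color mapping sits in add_markers() here and the label uses inherit = FALSE, the year text is a single trace with one value per frame, which may make the animation between years more predictable.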