Project 2

Loading the Data – NYC Street Trees

I will load the data and isolate the variables of interest: the species name, latitude, and longitude. The file is too large to upload to GitHub, so it is read from a local copy.

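# Local copy of the full street-tree census
treesraw <- read.csv('/Users/shanehylton/Desktop/fulltrees.csv')
# Keep only the species name and the coordinates
trees <- treesraw[c("spc_common", "latitude", "longitude")]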

Plotting a Map of the Data

The discussion post suggested plotting each tree by its longitude and latitude. The challenging part of the suggestion was coloring the points by species. The resulting plot traces a striking outline of New York City.

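library(ggplot2)

# One point per tree, colored by species; the legend is hidden because
# there are far too many species to list
ggplot(trees, aes(x = longitude, y = latitude, color = spc_common)) +
  geom_point() +
  labs(title = "Plot of NYC Street Trees", x = "Longitude", y = "Latitude") +
  theme(legend.position = "none",
        plot.title = element_text(hjust = 0.5))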
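Hiding the legend keeps the plot readable, but it also hides which color belongs to which species. A minimal sketch of one workaround, assuming it is acceptable to group every species outside the most common few into an "Other" category, is shown here; the helper `lump_species` and the toy species vector are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: keep the n most frequent species, relabel the rest "Other"
lump_species <- function(species, n = 5) {
  counts <- sort(table(species), decreasing = TRUE)
  keep <- names(counts)[seq_len(min(n, length(counts)))]
  ifelse(species %in% keep, as.character(species), "Other")
}

# Toy example (made-up values, not the census data)
toy <- c("London planetree", "honeylocust", "honeylocust", "Callery pear",
         "London planetree", "honeylocust", "ginkgo")
lump_species(toy, n = 2)
```

To use it on the census data, one could add a column such as `trees$spc_group <- lump_species(trees$spc_common, n = 10)`, map `color = spc_group`, and leave the legend on. The choice of `n` trades detail against a legend that still fits.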

Analyzing the Lakers’ Stats Per Game

The discussion post only recommended analyzing the data, so I refined the dataframe with some manipulation. I focused on the players I considered relevant in terms of scoring: I filtered out every player who averaged 7 points per game or fewer, then sorted the remaining players from highest to lowest scorer. Because the points-per-game column (PTS.G) sat at the end of the dataframe, I moved it directly after the player's name for easier viewing.

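library(dplyr)

link <- url("https://raw.githubusercontent.com/tylerbaker01/DATA-607-Project-2/main/Laker's%20Per%20Game%20Stats")
lakersraw <- read.csv(link)
agepts <- lakersraw[c(3, 28)]  # Age and PTS.G columns (not used below)
# Keep players averaging more than 7 points per game, highest scorers first
relevantlakers <- filter(lakersraw, PTS.G > 7)
relsort <- relevantlakers[order(-relevantlakers$PTS.G), ]
# Move PTS.G directly after the player's name (column X)
relsort %>%
  relocate(PTS.G, .after = X)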

##   Rk                        X PTS.G Age  G GS   MP  FG  FGA   FG. X3P X3PA
## 2  2            Anthony Davis  26.1  26 62 62 34.4 8.9 17.7 0.503 1.2  3.5
## 1  1             LeBron James  25.3  35 67 67 34.6 9.6 19.4 0.493 2.2  6.3
## 4  4               Kyle Kuzma  12.8  24 61  9 25.0 4.8 11.0 0.436 1.4  4.5
## 7  7             Dion Waiters  11.9  28  7  0 23.6 4.4 10.4 0.425 1.0  4.3
## 3  3 Kentavious Caldwell-Pope   9.3  26 69 26 25.5 3.4  7.3 0.467 1.3  3.5
## 6  6            Avery Bradley   8.6  29 49 44 24.2 3.5  7.8 0.444 1.3  3.5
## 5  5              Danny Green   8.0  32 68 68 24.8 2.9  7.0 0.416 1.8  4.8
## 9  9            Dwight Howard   7.5  34 69  2 18.9 2.9  4.0 0.729 0.0  0.1
## 8  8              Rajon Rondo   7.1  33 48  3 20.5 2.9  6.8 0.418 0.9  2.6
##    X3P. X2P X2PA  X2P.  eFG.  FT FTA   FT. ORB DRB TRB  AST STL BLK TOV  PF
## 2 0.330 7.7 14.2 0.546 0.536 7.2 8.5 0.846 2.3 7.0 9.3  3.2 1.5 2.3 2.5 2.5
## 1 0.348 7.4 13.1 0.564 0.550 3.9 5.7 0.693 1.0 6.9 7.8 10.2 1.2 0.5 3.9 1.8
## 4 0.316 3.4  6.5 0.518 0.500 1.9 2.5 0.735 0.9 3.6 4.5  1.3 0.5 0.4 1.5 2.1
## 7 0.233 3.4  6.1 0.558 0.473 2.0 2.3 0.875 0.3 1.6 1.9  2.4 0.6 0.6 1.9 2.3
## 3 0.385 2.1  3.9 0.541 0.558 1.1 1.5 0.775 0.6 1.5 2.1  1.6 0.8 0.2 0.9 1.9
## 6 0.364 2.2  4.3 0.510 0.526 0.4 0.5 0.833 0.4 2.0 2.3  1.3 0.9 0.1 1.0 2.2
## 5 0.367 1.1  2.2 0.524 0.542 0.5 0.7 0.688 0.8 2.6 3.3  1.3 1.3 0.5 0.9 2.0
## 9 0.600 2.9  3.9 0.732 0.735 1.6 3.1 0.514 2.5 4.9 7.3  0.7 0.4 1.1 1.2 3.2
## 8 0.328 2.0  4.2 0.473 0.480 0.6 0.9 0.659 0.5 2.5 3.0  5.0 0.8 0.0 1.9 1.2
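The same three steps (filter, sort, move a column) can also be done without dplyr. A minimal base-R sketch on a small made-up table (the player names and numbers are invented) also shows a detail of the threshold: `PTS.G > 7` drops a player averaging exactly 7.0, so the filter keeps only players scoring more than 7 points per game.

```r
# Made-up per-game table (not the real Lakers data); X holds player names
toy <- data.frame(
  X = c("Player A", "Player B", "Player C", "Player D"),
  Age = c(25, 30, 22, 28),
  PTS.G = c(12.4, 7.0, 20.1, 3.5),
  stringsAsFactors = FALSE
)

# Keep players averaging MORE than 7 points; exactly 7.0 is dropped
kept <- toy[toy$PTS.G > 7, ]
# Sort from highest to lowest scorer
kept <- kept[order(-kept$PTS.G), ]
# Move PTS.G so it sits immediately after the name column X
others <- setdiff(names(kept), "PTS.G")
pos <- match("X", others)
kept <- kept[c(others[seq_len(pos)], "PTS.G", others[-seq_len(pos)])]
kept
```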

Thread: World Government Indicators

For this step, I examined the data Thomas linked, which pointed to several very wide, untidy Excel sheets. I downloaded one of the sheets, converted it to .csv, uploaded it to GitHub, and loaded it into R for tidying. After removing the unnecessary rows at the top of the sheet, I aggregated the data so that each category is represented by its mean across all years. I am only interested in countries that existed through the entire data collection period, so I removed every row containing a missing (NA) entry.

My final dataframe is sorted in descending order of the mean percentile rank for government effectiveness.
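The header handling can be hard to follow on the full sheet. A minimal sketch on a made-up miniature of the layout (one note row, then the label row; the country names and values are invented) walks through the same steps: drop the note rows, promote the label row to column names, turn Excel's "#N/A" into a real `NA`, and drop incomplete rows with `na.omit()`.

```r
# Made-up miniature of the sheet: a note row, the label row, then data rows
raw <- data.frame(
  V1 = c("Source: made-up note", "Country/Territory", "Aland", "Bland"),
  V2 = c("", "Code", "ALA", "BLA"),
  V3 = c("", "Estimate", "1.2", "#N/A"),
  V4 = c("", "Estimate", "1.4", "0.3"),
  stringsAsFactors = FALSE
)

tidy <- raw[-1, ]                                   # drop the note row
colnames(tidy) <- as.character(unlist(tidy[1, ]))   # promote labels to header
tidy <- tidy[-1, ]                                  # remove the label row
tidy[tidy == "#N/A"] <- NA                          # Excel marker -> real NA
tidy <- na.omit(tidy)                               # keep complete rows only
tidy
```

Note that the repeated "Estimate" label survives as two identically named columns, one per year, which is what lets the later averaging step pick up every year of a measure by name.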

link <- 'https://raw.githubusercontent.com/st3vejobs/Project-2-files/main/government_eff.csv'
govraw <- read.csv(url(link))

# Drop the descriptive rows at the top of the sheet, then promote the
# first remaining row (the real column labels) to the header
gov <- govraw[-c(1:14), ]
colnames(gov) <- as.character(unlist(gov[1, ]))
gov <- gov[-1, ]

# "#N/A" marks a missing year; keep only countries with data for every year
gov[gov == "#N/A"] <- NA
gov <- na.omit(gov)

# Each measure has one column per year: convert those columns to numbers
# and average them row by row
finaldf <- data.frame(gov[, 1:2])
for (m in c("Estimate", "StdErr", "NumSrc", "Rank", "Upper", "Lower")) {
  vals <- gov[names(gov) == m]
  vals[] <- lapply(vals, function(x) as.numeric(as.character(x)))
  finaldf[[m]] <- rowMeans(vals)
}

# Sort by mean percentile rank, highest first
finalsort <- finaldf[order(-finaldf$Rank), ]
head(finalsort)

##     Country.Territory Code Estimate    StdErr   NumSrc     Rank    Upper
## 185         Singapore  SGP 2.167273 0.2068182 7.181818 99.22727 99.95364
## 79            Finland  FIN 2.077273 0.2100000 6.545455 98.94364 99.89000
## 70            Denmark  DNK 2.024545 0.2086364 6.681818 98.31364 99.86909
## 52        Switzerland  CHE 1.972273 0.2154545 6.181818 98.00409 99.88864
## 160            Norway  NOR 1.905000 0.2172727 6.227273 97.31818 99.80136
## 159       Netherlands  NLD 1.876818 0.2086364 6.681818 96.93091 99.54136
##        Lower
## 185 95.40545
## 79  93.86955
## 70  93.01500
## 52  91.68773
## 160 90.37545
## 159 90.01818
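The conversion step calls `as.character()` before `as.numeric()`. That order matters if the columns were read as factors, which `read.csv()` did by default before R 4.0.0 (`stringsAsFactors = TRUE`): converting a factor directly returns its internal level codes instead of the numbers it displays. A small illustration with made-up values:

```r
# Made-up factor of numeric-looking strings; its levels sort as "10", "2.5"
f <- factor(c("2.5", "10", "2.5"))

as.numeric(f)                # level codes: 2 1 2
as.numeric(as.character(f))  # actual values: 2.5 10 2.5
```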

Shane Hylton

10/3/2021