I will load the data and isolate the variables of interest. Because the file is so large, I was unable to upload it to GitHub, so it is read from a local copy.
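library(ggplot2)
treesraw <- read.csv('/Users/shanehylton/Desktop/fulltrees.csv')
# Keep only the common species name, latitude, and longitude columns
trees <- treesraw[c("spc_common", "latitude", "longitude")]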
The suggestion for this dataset was to plot each tree by its longitude and latitude. The challenging part of the recommendation was coloring the points by species. The resulting plot traces a very clear outline of New York City.
# Color each tree by species; the legend is hidden because one entry
# per species would crowd out the map
ggplot(trees, aes(x = longitude, y = latitude, color = spc_common)) +
  geom_point() +
  ggtitle('Plot of NYC Street Trees') +
  xlab("Longitude") +
  ylab("Latitude") +
  theme(legend.position = "none",
        plot.title = element_text(hjust = 0.5))
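Hiding the legend makes the species colors impossible to read. One possible alternative is to keep only the most common species and group the rest as "Other" before plotting. The sketch below uses a small invented data frame in place of the census file. The helper `lump_species()` and the cutoff `n` are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper (not from the original analysis): keep the n most
# frequent species and relabel everything else as "Other"
lump_species <- function(species, n = 5) {
  counts <- sort(table(species), decreasing = TRUE)
  keep <- names(counts)[seq_len(min(n, length(counts)))]
  ifelse(species %in% keep, species, "Other")
}

# Invented rows standing in for the trees dataframe
toy <- data.frame(
  spc_common = c("London planetree", "London planetree", "honeylocust",
                 "honeylocust", "pin oak", "ginkgo"),
  latitude   = c(40.70, 40.72, 40.75, 40.78, 40.80, 40.68),
  longitude  = c(-73.95, -73.90, -73.98, -73.85, -73.93, -74.00),
  stringsAsFactors = FALSE
)

# The two most frequent species keep their names; the rest become "Other"
toy$species_group <- lump_species(toy$spc_common, n = 2)
toy$species_group
```

On the real data, the equivalent would be `trees$species_group <- lump_species(trees$spc_common, n = 10)` followed by `color = species_group` in the `aes()` call. The legend could then stay visible.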
The discussion post for this dataset only recommended analyzing the data, so I will refine the Lakers per-game statistics dataframe and perform some manipulation. I focused on the players I considered relevant in terms of points per game. I filtered the dataframe to omit any player averaging 7 or fewer points per game, then sorted the remaining players by points per game in descending order. The points per game column was at the end of the dataframe, so I relocated it to sit directly after the player's name for easier viewing.
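library(dplyr)
link <- url('https://raw.githubusercontent.com/tylerbaker01/DATA-607-Project-2/main/Laker\'s%20Per%20Game%20Stats')
lakersraw <- read.csv(link)
# Age and points-per-game columns (not used in the steps below)
agepts <- lakersraw[c(3, 28)]
# Keep players averaging more than 7 points per game
relevantlakers <- filter(lakersraw, PTS.G > 7)
# Sort by points per game, highest first
relsort <- relevantlakers[order(-relevantlakers$PTS.G), ]
# Move PTS.G so it sits right after the player-name column (X)
relsort %>%
  relocate(PTS.G, .after = X)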
## Rk X PTS.G Age G GS MP FG FGA FG. X3P X3PA
## 2 2 Anthony Davis 26.1 26 62 62 34.4 8.9 17.7 0.503 1.2 3.5
## 1 1 LeBron James 25.3 35 67 67 34.6 9.6 19.4 0.493 2.2 6.3
## 4 4 Kyle Kuzma 12.8 24 61 9 25.0 4.8 11.0 0.436 1.4 4.5
## 7 7 Dion Waiters 11.9 28 7 0 23.6 4.4 10.4 0.425 1.0 4.3
## 3 3 Kentavious Caldwell-Pope 9.3 26 69 26 25.5 3.4 7.3 0.467 1.3 3.5
## 6 6 Avery Bradley 8.6 29 49 44 24.2 3.5 7.8 0.444 1.3 3.5
## 5 5 Danny Green 8.0 32 68 68 24.8 2.9 7.0 0.416 1.8 4.8
## 9 9 Dwight Howard 7.5 34 69 2 18.9 2.9 4.0 0.729 0.0 0.1
## 8 8 Rajon Rondo 7.1 33 48 3 20.5 2.9 6.8 0.418 0.9 2.6
## X3P. X2P X2PA X2P. eFG. FT FTA FT. ORB DRB TRB AST STL BLK TOV PF
## 2 0.330 7.7 14.2 0.546 0.536 7.2 8.5 0.846 2.3 7.0 9.3 3.2 1.5 2.3 2.5 2.5
## 1 0.348 7.4 13.1 0.564 0.550 3.9 5.7 0.693 1.0 6.9 7.8 10.2 1.2 0.5 3.9 1.8
## 4 0.316 3.4 6.5 0.518 0.500 1.9 2.5 0.735 0.9 3.6 4.5 1.3 0.5 0.4 1.5 2.1
## 7 0.233 3.4 6.1 0.558 0.473 2.0 2.3 0.875 0.3 1.6 1.9 2.4 0.6 0.6 1.9 2.3
## 3 0.385 2.1 3.9 0.541 0.558 1.1 1.5 0.775 0.6 1.5 2.1 1.6 0.8 0.2 0.9 1.9
## 6 0.364 2.2 4.3 0.510 0.526 0.4 0.5 0.833 0.4 2.0 2.3 1.3 0.9 0.1 1.0 2.2
## 5 0.367 1.1 2.2 0.524 0.542 0.5 0.7 0.688 0.8 2.6 3.3 1.3 1.3 0.5 0.9 2.0
## 9 0.600 2.9 3.9 0.732 0.735 1.6 3.1 0.514 2.5 4.9 7.3 0.7 0.4 1.1 1.2 3.2
## 8 0.328 2.0 4.2 0.473 0.480 0.6 0.9 0.659 0.5 2.5 3.0 5.0 0.8 0.0 1.9 1.2
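Below is a base-R sketch of the same three steps: filter, sort descending, and move one column after another. It runs on a small invented table whose column names mirror the Lakers file. The player names and numbers are made up. Base R is used here only so that the mechanics of `relocate()` are visible. It is an alternative illustration, not the method used above.

```r
# Invented per-game numbers; X holds the player name, as in the Lakers file
stats <- data.frame(
  Rk    = 1:4,
  X     = c("Player A", "Player B", "Player C", "Player D"),
  Age   = c(30, 25, 22, 28),
  PTS.G = c(20.5, 6.1, 12.0, 7.0),
  stringsAsFactors = FALSE
)

# Keep players averaging more than 7 points per game (exactly 7.0 is dropped)
relevant <- stats[stats$PTS.G > 7, ]

# Sort by points per game, highest first
relevant <- relevant[order(-relevant$PTS.G), ]

# Move PTS.G so it sits directly after the player-name column X
others <- setdiff(names(relevant), "PTS.G")
pos <- match("X", others)
relevant <- relevant[, c(others[1:pos], "PTS.G", others[-(1:pos)])]

relevant
```

The strict `>` comparison is why a player averaging exactly 7.0 points is excluded. This matches the `PTS.G > 7` filter above.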
For this step in the project, I looked at the data Thomas linked and found a number of very wide, untidy Excel sheets. I downloaded one of the sheets, converted it to a .csv, uploaded it to GitHub, and loaded it into R for tidying. After removing the unnecessary rows and promoting the real header row, I aggregated the data so that each category (Estimate, StdErr, NumSrc, Rank, Upper, and Lower) is represented by its mean over all years. I am only interested in countries with data for the entire collection period, so I removed every row containing an NA entry.
My final dataframe is sorted by the average percentile rank for government effectiveness, from highest to lowest.
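The two cleanup steps described above are promoting a buried header row and dropping rows with `#N/A`. They can be seen on a tiny invented sheet. The layout below is an assumption meant to resemble a spreadsheet with note rows above the header. It is not the actual file, and the values are made up.

```r
# Invented stand-in for the raw sheet: two note rows, a header row, then data
raw <- data.frame(
  V1 = c("Source note", "", "Country/Territory", "Aland", "Finland"),
  V2 = c("", "", "Code", "ALA", "FIN"),
  V3 = c("", "", "Estimate", "#N/A", "2.1"),
  V4 = c("", "", "Estimate", "1.0", "2.0"),
  stringsAsFactors = FALSE
)

# Drop the note rows, promote the next row to column names, then remove it
sheet <- raw[-(1:2), ]
colnames(sheet) <- as.character(unlist(sheet[1, ]))
sheet <- sheet[-1, ]

# Convert the spreadsheet's "#N/A" marker to NA and keep complete rows only
sheet[sheet == "#N/A"] <- NA
sheet <- na.omit(sheet)

sheet
```

Only the Finland row survives, because the other row has a missing yearly value. This is the same reason countries without data for every year drop out of the real table.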
link <- 'https://raw.githubusercontent.com/st3vejobs/Project-2-files/main/government_eff.csv'
govraw <- read.csv(url(link))

# Drop the preamble rows, promote the next row to column names, then remove it
gov <- govraw[-c(1:14), ]
colnames(gov) <- unlist(gov[1, ])
gov <- gov[-1, ]

# Treat the spreadsheet's "#N/A" marker as missing and keep only complete rows
gov[gov == "#N/A"] <- NA
gov <- na.omit(gov)

# For each measure, average its yearly columns into one value per country
finaldf <- data.frame(gov[, 1:2])
for (measure in c('Estimate', 'StdErr', 'NumSrc', 'Rank', 'Upper', 'Lower')) {
  yearly <- gov[, names(gov) == measure, drop = FALSE]
  yearly[] <- lapply(yearly, function(x) as.numeric(as.character(x)))
  finaldf[[measure]] <- rowMeans(yearly)
}

# Sort by average percentile rank, highest first
finalsort <- finaldf[order(-finaldf$Rank), ]
head(finalsort)
## Country.Territory Code Estimate StdErr NumSrc Rank Upper
## 185 Singapore SGP 2.167273 0.2068182 7.181818 99.22727 99.95364
## 79 Finland FIN 2.077273 0.2100000 6.545455 98.94364 99.89000
## 70 Denmark DNK 2.024545 0.2086364 6.681818 98.31364 99.86909
## 52 Switzerland CHE 1.972273 0.2154545 6.181818 98.00409 99.88864
## 160 Norway NOR 1.905000 0.2172727 6.227273 97.31818 99.80136
## 159 Netherlands NLD 1.876818 0.2086364 6.681818 96.93091 99.54136
## Lower
## 185 95.40545
## 79 93.86955
## 70 93.01500
## 52 91.68773
## 160 90.37545
## 159 90.01818
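The aggregation step depends on one point: every yearly column for a measure shares the same name (for example, "Estimate" repeated once per year). A logical match on `names()` therefore collects all of them at once. Below is a minimal sketch on an invented two-country table with three years per measure. The codes and values are made up for illustration.

```r
# Invented example: yearly "Estimate" and "Rank" columns stored as text,
# with duplicated column names as in the raw sheet
gov_toy <- data.frame(
  Code = c("AAA", "BBB"),
  Estimate = c("1.0", "2.0"), Rank = c("90", "50"),
  Estimate = c("2.0", "2.5"), Rank = c("80", "55"),
  Estimate = c("3.0", "3.0"), Rank = c("70", "60"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Average every column sharing a measure's name into one column per measure
summary_df <- gov_toy["Code"]
for (measure in c("Estimate", "Rank")) {
  yearly <- gov_toy[, names(gov_toy) == measure, drop = FALSE]
  yearly[] <- lapply(yearly, as.numeric)
  summary_df[[measure]] <- rowMeans(yearly)
}

# Highest average percentile rank first
ranked <- summary_df[order(-summary_df$Rank), ]
ranked
```

`check.names = FALSE` is what lets `data.frame()` keep the duplicated names. Without it, R would rename the copies to `Estimate.1`, `Estimate.2`, and so on, and the name match would find only the first column.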