Introduction: The source I chose to analyze is not strictly a news article. "Club Soccer Predictions" (https://projects.fivethirtyeight.com/soccer-predictions/) is a listing of predictions and current rankings. I chose this dataset because the World Cup is around the corner and I am a huge soccer fan.

soccer_data<-read.csv('https://raw.githubusercontent.com/Sangeetha-007/R-Practice/master/607/Assignments/Assignment1/spi_global_rankings_intl.csv')
head(soccer_data)
##   rank      name   confed  off  def   spi
## 1    1    Brazil CONMEBOL 3.21 0.23 94.26
## 2    2     Spain     UEFA 2.90 0.37 90.58
## 3    3   Germany     UEFA 3.32 0.61 90.05
## 4    4  Portugal     UEFA 2.89 0.44 89.34
## 5    5    France     UEFA 2.81 0.42 89.03
## 6    6 Argentina CONMEBOL 2.68 0.39 88.43

Summary of the International Soccer Power Index data

summary(soccer_data)
##       rank            name              confed               off       
##  Min.   :  1.00   Length:220         Length:220         Min.   :0.200  
##  1st Qu.: 55.75   Class :character   Class :character   1st Qu.:0.690  
##  Median :110.50   Mode  :character   Mode  :character   Median :1.070  
##  Mean   :110.50                                         Mean   :1.163  
##  3rd Qu.:165.25                                         3rd Qu.:1.560  
##  Max.   :220.00                                         Max.   :3.320  
##       def            spi       
##  Min.   :0.23   Min.   : 0.26  
##  1st Qu.:0.91   1st Qu.:17.74  
##  Median :1.34   Median :37.83  
##  Mean   :1.71   Mean   :39.38  
##  3rd Qu.:2.17   3rd Qu.:58.85  
##  Max.   :6.27   Max.   :94.26

Created a subset with the top 10 international teams

soccer_subset<-head(soccer_data, 10)
soccer_subset
##    rank        name   confed  off  def   spi
## 1     1      Brazil CONMEBOL 3.21 0.23 94.26
## 2     2       Spain     UEFA 2.90 0.37 90.58
## 3     3     Germany     UEFA 3.32 0.61 90.05
## 4     4    Portugal     UEFA 2.89 0.44 89.34
## 5     5      France     UEFA 2.81 0.42 89.03
## 6     6   Argentina CONMEBOL 2.68 0.39 88.43
## 7     7 Netherlands     UEFA 2.94 0.62 86.89
## 8     8     England     UEFA 2.44 0.52 83.67
## 9     9     Belgium     UEFA 2.76 0.71 83.53
## 10   10     Uruguay CONMEBOL 2.32 0.50 82.64
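
`head()` simply returns the first ten rows, so the subset is the top ten only as long as the file is already sorted by `rank`, which the output above suggests it is. A minimal sketch of a version that does not depend on file order follows. The data frame `demo` and the helper `top_n_teams` are hypothetical names introduced for illustration; `demo` holds four rows from the output above, deliberately shuffled.

```r
# Hypothetical mini data frame: four rows from the output above, shuffled on purpose.
demo <- data.frame(
  rank = c(3, 1, 4, 2),
  name = c("Germany", "Brazil", "Portugal", "Spain"),
  spi  = c(90.05, 94.26, 89.34, 90.58),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sort by rank first, then keep the first n rows.
top_n_teams <- function(df, n) {
  head(df[order(df$rank), ], n)
}

top2 <- top_n_teams(demo, 2)
print(top2)
```

Applied to the full data, the same idea would be `top_n_teams(soccer_data, 10)`.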

Renamed the columns of the subset

colnames(soccer_subset) <- c("Ranking", "Country", "Association", "Offense", "Defense", "Soccer Power Index")
head(soccer_subset)
##   Ranking   Country Association Offense Defense Soccer Power Index
## 1       1    Brazil    CONMEBOL    3.21    0.23              94.26
## 2       2     Spain        UEFA    2.90    0.37              90.58
## 3       3   Germany        UEFA    3.32    0.61              90.05
## 4       4  Portugal        UEFA    2.89    0.44              89.34
## 5       5    France        UEFA    2.81    0.42              89.03
## 6       6 Argentina    CONMEBOL    2.68    0.39              88.43

Looked at the summary of the new subset

summary(soccer_subset)
##     Ranking        Country          Association           Offense     
##  Min.   : 1.00   Length:10          Length:10          Min.   :2.320  
##  1st Qu.: 3.25   Class :character   Class :character   1st Qu.:2.700  
##  Median : 5.50   Mode  :character   Mode  :character   Median :2.850  
##  Mean   : 5.50                                         Mean   :2.827  
##  3rd Qu.: 7.75                                         3rd Qu.:2.930  
##  Max.   :10.00                                         Max.   :3.320  
##     Defense       Soccer Power Index
##  Min.   :0.2300   Min.   :82.64     
##  1st Qu.:0.3975   1st Qu.:84.47     
##  Median :0.4700   Median :88.73     
##  Mean   :0.4810   Mean   :87.84     
##  3rd Qu.:0.5875   3rd Qu.:89.87     
##  Max.   :0.7100   Max.   :94.26

Created a new CSV of the subset so it can be uploaded to GitHub

write.csv(soccer_subset, file="SoccerSubset.csv", row.names=FALSE)
getwd()
## [1] "/Users/Sangeetha"
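
One thing to watch when the file is read back: the renamed column "Soccer Power Index" contains spaces, and `read.csv()` by default converts column names into syntactic R names. The sketch below shows the difference on a two-row example written to a temporary file. The `demo` data frame and the temporary path are assumptions for illustration, not part of the original analysis.

```r
# Two rows from the renamed subset above, enough to show the header behaviour.
demo <- data.frame(
  Country = c("Brazil", "Spain"),
  spi = c(94.26, 90.58),
  stringsAsFactors = FALSE
)
colnames(demo) <- c("Country", "Soccer Power Index")

# Write to a temporary file rather than the working directory.
path <- tempfile(fileext = ".csv")
write.csv(demo, file = path, row.names = FALSE)

# Default read.csv() makes names syntactic, so spaces become dots.
default_names <- colnames(read.csv(path))

# check.names = FALSE keeps the header exactly as written.
kept_names <- colnames(read.csv(path, check.names = FALSE))

print(default_names)
print(kept_names)
```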

Bar graph on Defense per Country

library(ggplot2)
ggplot(soccer_subset, aes(x=Country, y=Defense)) + geom_bar(stat="identity") +
  labs(x="Country", y="Defense") + ggtitle("International Soccer Team Defense Values")

Bar graph on Offense per Country

ggplot(soccer_subset, aes(x=Country, y=Offense)) + geom_bar(stat="identity") + 
  labs(x="Country", y="Offense") +ggtitle("International Soccer Team Offense Values")

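Belgium has the highest defense value among the top 10 (0.71), while Germany has the highest offense value (3.32). This is interesting to me because I did not think of Belgium as a strong soccer team at all; I would have assumed Argentina or Brazil would have the highest values. One caution about reading the defense column: Brazil, ranked first, has the lowest defense value (0.23), and the full dataset goes up to 6.27, which suggests that a lower defense value indicates a stronger defense. Read that way, Belgium has the weakest defense of these ten teams, not the strongest.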
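
The bar charts order the countries alphabetically, so the extremes are easier to confirm directly from the numbers. A small sketch using `which.max()` and `which.min()` follows. The data frame `top10` is a hypothetical name, with values copied from the renamed subset printed earlier; the lowest defense value is included for comparison.

```r
# Values copied from the renamed top-10 subset shown earlier.
top10 <- data.frame(
  Country = c("Brazil", "Spain", "Germany", "Portugal", "France",
              "Argentina", "Netherlands", "England", "Belgium", "Uruguay"),
  Offense = c(3.21, 2.90, 3.32, 2.89, 2.81, 2.68, 2.94, 2.44, 2.76, 2.32),
  Defense = c(0.23, 0.37, 0.61, 0.44, 0.42, 0.39, 0.62, 0.52, 0.71, 0.50),
  stringsAsFactors = FALSE
)

# Country holding the largest value in each column.
highest_offense <- top10$Country[which.max(top10$Offense)]
highest_defense <- top10$Country[which.max(top10$Defense)]

# Country holding the smallest defense value.
lowest_defense <- top10$Country[which.min(top10$Defense)]

print(c(highest_offense, highest_defense, lowest_defense))
```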

Conclusion/Findings and Recommendations:

From my analysis, it was interesting to see that Belgium has the highest defense value and Germany the highest offense value among the top 10 teams. I cannot take the analysis further with only the data presented here, but supporting data from elsewhere could help. For example, with the number of fans or the population for each national team, I could check whether either is correlated with the offense and defense ratings.
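
A correlation check of the kind described could be sketched with `cor()`. No fan or population data is available here, so this example uses the SPI column of the top-10 subset as a stand-in; a population column would take its place once such data is found. The data frame name `top10` and the column name `SPI` are assumptions for illustration.

```r
# Values copied from the top-10 subset shown earlier; SPI stands in for
# the fan or population column that is not available yet.
top10 <- data.frame(
  Offense = c(3.21, 2.90, 3.32, 2.89, 2.81, 2.68, 2.94, 2.44, 2.76, 2.32),
  Defense = c(0.23, 0.37, 0.61, 0.44, 0.42, 0.39, 0.62, 0.52, 0.71, 0.50),
  SPI     = c(94.26, 90.58, 90.05, 89.34, 89.03, 88.43, 86.89, 83.67, 83.53, 82.64)
)

# Pearson correlation (the cor() default) of each rating with the stand-in column.
offense_vs_spi <- cor(top10$Offense, top10$SPI)
defense_vs_spi <- cor(top10$Defense, top10$SPI)

print(offense_vs_spi)
print(defense_vs_spi)
```

With only ten rows, any coefficient is a rough indication at best, so a real check would be better run on all 220 teams.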