setwd("C:/Users/Elianna Nahandast/Desktop/R Project Data/Project_1")
Warning: The working directory was changed to C:/Users/Elianna Nahandast/Desktop/R Project Data/Project_1 inside a notebook chunk. The working directory will be reset when the chunk is finished running. Use the knitr root.dir option in the setup chunk to change the working directory for notebook chunks.
female <- read.csv("female.csv")

# Determines if R considers female to be a data frame
is.data.frame(female)
[1] TRUE
# Can now use head to see what female looks like
head(female)

# Keeps only the rows for the three countries of interest
library(tidyverse)
female <- female %>%
  filter(Country.Name == "Brazil" | Country.Name == "India" | Country.Name == "United States")

head(female)

#Eliminate unwanted columns
colnames(female)
 [1] "Series.Name"    "Series.Code"    "Country.Name"   "Country.Code"   "X1990..YR1990." "X2000..YR2000." "X2012..YR2012." "X2013..YR2013."
 [9] "X2014..YR2014." "X2015..YR2015." "X2016..YR2016." "X2017..YR2017." "X2018..YR2018." "X2019..YR2019." "X2020..YR2020." "X2021..YR2021."
rel_col <- which(colnames(female)== "Country.Name" | colnames(female)== "X2014..YR2014." | colnames(female)== "X2017..YR2017." | colnames(female)== "X2021..YR2021.")
rel_col
[1]  3  9 12 16
female <- female[rel_col] %>% rename("2014" = 2, "2017" = 3, "2021" = 4)
head(female)

#Flip rows and columns
transpose_f <- data.frame(t(female[-1]))
colnames(transpose_f) <- female[, 1]
head(transpose_f)

# Check how each column is stored. Here they are character vectors, so computations with them will fail,
# so we must turn them into numeric values
print(sapply(transpose_f, class))
       Brazil         India United States 
  "character"   "character"   "character" 
transpose_f$Brazil = as.numeric(transpose_f$Brazil)
transpose_f$India = as.numeric(transpose_f$India)
transpose_f$"United States" = as.numeric(transpose_f$"United States")
head(transpose_f)

#Can now be summarized (compute the mean)
#summarise_each(transpose_f, list(mean))

transpose_f %>% summarise(across(c(Brazil, India, "United States"), list(mean=mean, sd=sd)))

#Graph data
year <- c(2014, 2017, 2021)
ggplot(data=transpose_f, aes(x=year, y=India, group=1)) + geom_line() + geom_point()


################################ Condensed Version for Male Data ##################################################
setwd("C:/Users/Elianna Nahandast/Desktop/R Project Data/Project_1")
male <- read.csv("male.csv")
male <- male %>% 
  filter(Country.Name == "Brazil" | Country.Name == "India" | Country.Name == "United States")

male <- male[rel_col] %>% 
  rename("2014" = 2, "2017" = 3, "2021" = 4)

transpose_m <- data.frame(t(male[-1]))
colnames(transpose_m) <- male[, 1]

transpose_m$Brazil = as.numeric(as.character(transpose_m$Brazil))
transpose_m$India = as.numeric(as.character(transpose_m$India))
transpose_m$"United States" = as.numeric(as.character((transpose_m$"United States")))

head(transpose_m)

#Distinguish between male and female data
transpose_m <- rename(transpose_m, "Brazil_m" = 1, "India_m" = 2, "United_States_m" = 3)

head(transpose_m)

#Merge male and female data sets
transpose_m <- rownames_to_column(transpose_m, var="Year")
transpose_f <- rownames_to_column(transpose_f, var="Year")

acct_owner_by_gender <- merge(x = transpose_m, y = transpose_f, by = "Year", all.x = TRUE)
acct_owner_by_gender <- rename(acct_owner_by_gender, "United_States" = 7)

head(acct_owner_by_gender)

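#Plot data
# Note: x uses the numeric vector `year` defined above, not the character `Year` column,
# so it relies on the merged rows being in 2014, 2017, 2021 order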
gfg_plot <- ggplot(acct_owner_by_gender, aes(x=year)) +
  geom_line(aes(y = India), color = "black") +
  geom_line(aes(y = India_m), color = "red") + 
  geom_line(aes(y = Brazil), color = "green") +
  geom_line(aes(y = Brazil_m), color = "blue") +
  geom_line(aes(y = United_States), color = "purple") +
  geom_line(aes(y = United_States_m), color = "violet") 

gfg_plot


###################### Project 1 ############################

# (1) Change y-axis from "India" to "Percentage Ownership"
gfg_plot + labs(y = "Percentage Ownership")

  (2) Discuss percentage change over time

Across the three countries, account ownership is more often higher among men, especially in Brazil and India. In the United States, however, neither gender clearly leads: in some years ownership is higher among females, and in others it is higher among males by a similar margin. In India, the gap between the two genders has slowly closed in recent years, whereas in Brazil it has stayed roughly the same.

Overall, account ownership has risen in most of these countries, especially among females. The only exceptions to this trend are the male populations of India and the United States, which show a gradual decrease in account ownership. Even with this decline, the 2021 percentages remain, for the most part, higher than those of 2014 in every country.

  (3) Is it possible to say that account percentages have been increasing if you disregard a certain year? Does this change if you focus on particular country and gender combinations?

Across all countries, there is a sharp turning point around 2017, where each country either spiked in account ownership or declined heavily. Disregarding the years after 2017 could therefore make it seem as though account ownership in a country was rising when in reality it fell at some point, and vice versa.

There are also cases in each country where omitting certain years could suggest a higher account ownership percentage than the full data shows. For example:

- In India, disregarding 2016 - 2018
- In Brazil, disregarding 2016 - 2018
- In the United States, disregarding 2015 - 2018 (although this would look more like a constant rate)

These results could also be replicated with particular gender and country combinations, for example by using only the higher-ownership gender in each country: the females of the United States, the females of India (who do not show a sharp decline like the men do, although their percentage is lower across the years), and the males of Brazil. Such combinations could create the illusion of a higher likelihood of account ownership among certain genders.

#################### Project 1 (Continued) #############################
# (4) Analysis of csv files

# Get and Plot primary or less percentage rates
setwd("C:/Users/Elianna Nahandast/Desktop/R Project Data/Project_1")
Warning: The working directory was changed to C:/Users/Elianna Nahandast/Desktop/R Project Data/Project_1 inside a notebook chunk. The working directory will be reset when the chunk is finished running. Use the knitr root.dir option in the setup chunk to change the working directory for notebook chunks.
primary <- read.csv("primary.csv")

library(tidyverse)
primary <- primary %>% 
  filter(Country.Name == "Albania" | Country.Name == "Australia" | Country.Name == "Armenia" | Country.Name == "Algeria" | Country.Name == "Argentina")

head(primary)

#Eliminate unwanted columns
rel_col <- which(colnames(primary)== "Country.Name" | colnames(primary)== "X2014..YR2014." | colnames(primary)== "X2017..YR2017." | colnames(primary)== "X2021..YR2021.")

primary <- primary[rel_col] %>% 
  rename("2014" = 2, "2017" = 3, "2021" = 4)

head (primary)
transpose_p <- data.frame(t(primary[-1]))
colnames(transpose_p) <- primary[, 1]

transpose_p$Albania = as.numeric(as.character(transpose_p$Albania))
transpose_p$Australia = as.numeric(as.character(transpose_p$Australia))
transpose_p$Armenia = as.numeric(as.character((transpose_p$Armenia)))
transpose_p$Algeria = as.numeric(as.character((transpose_p$Algeria)))
transpose_p$Argentina = as.numeric(as.character((transpose_p$Argentina)))

head(transpose_p)

year <- c(2014, 2017, 2021)
ggplot(data=transpose_p, aes(x=year, y=Albania, group=1)) + geom_line() + geom_point()


#Distinguish primary data
transpose_p <- rename(transpose_p, "Albania_p" = 1, "Australia_p" = 2, "Armenia_p" = 3, "Algeria_p" = 4, "Argentia_p" = 5)
head(transpose_p)

# Get and plot secondary or more data
setwd("C:/Users/Elianna Nahandast/Desktop/R Project Data/Project_1")
secondary <- read.csv("secondary.csv")

library(tidyverse)
secondary <- secondary %>% 
  filter(Country.Name == "Albania" | Country.Name == "Australia" | Country.Name == "Armenia" | Country.Name == "Algeria" | Country.Name == "Argentina")

head(secondary)

#Eliminate unwanted columns
rel_col <- which(colnames(secondary)== "Country.Name" | colnames(secondary)== "X2014..YR2014." | colnames(secondary)== "X2017..YR2017." | colnames(secondary)== "X2021..YR2021.")

secondary <- secondary[rel_col] %>% 
  rename("2014" = 2, "2017" = 3, "2021" = 4)

head (secondary)
transpose_s <- data.frame(t(secondary[-1]))
colnames(transpose_s) <- secondary[, 1]

transpose_s$Albania = as.numeric(as.character(transpose_s$Albania))
transpose_s$Australia = as.numeric(as.character(transpose_s$Australia))
transpose_s$Armenia = as.numeric(as.character((transpose_s$Armenia)))
transpose_s$Algeria = as.numeric(as.character((transpose_s$Algeria)))
transpose_s$Argentina = as.numeric(as.character((transpose_s$Argentina)))

head(transpose_s)

# Merge the data sets together

transpose_p <- rownames_to_column(transpose_p, var="Year")
transpose_s <- rownames_to_column(transpose_s, var="Year")

acct_owner_by_education <- merge(x = transpose_p, y = transpose_s, by = "Year", all.x = TRUE)

head(acct_owner_by_education)

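#Plot data
# Note: x uses the numeric vector `year` defined above, not the character `Year` column,
# so it relies on the merged rows being in 2014, 2017, 2021 order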

gfg_plot <- ggplot(acct_owner_by_education, aes(x=year)) +
  geom_line(aes(y = Albania_p), color = "black") +
  geom_line(aes(y = Albania), color = "gray") + 
  geom_line(aes(y = Australia_p), color = "lightblue") +
  geom_line(aes(y = Australia), color = "blue") +
  geom_line(aes(y = Armenia_p), color = "purple") +
  geom_line(aes(y = Armenia), color = "pink") +
  geom_line(aes(y = Algeria), color = "orange") +
  geom_line(aes(y = Algeria_p), color = "red") +
  geom_line(aes(y = Argentina), color = "darkgreen") +
  geom_line(aes(y = Argentia_p), color = "green")

gfg_plot


# Change y - axis
gfg_plot + labs(y = "Percentage Ownership")

Summary of Data:

The previous graph shows data for five countries (Albania, Australia, Armenia, Algeria, and Argentina): the account ownership percentages of those with a primary education or less and of those with a secondary education or higher. At a glance, the pattern is not as clear as in the earlier gender graph, but there are some notable observations.

In most countries, account ownership in these two groups sat at distinctly different percentages in earlier years but has converged to around the same percentage in recent years. This is seen in Algeria, Argentina, and Armenia. The two countries that depart from this trend are Australia and Albania, both of which show a large difference in ownership percentages between the two groups.

In both Albania and Australia, higher account ownership percentages were present among those having secondary education or higher.

As mentioned previously, this data can also be cherry-picked in various ways, especially by disregarding certain years. Over time, each education group's account ownership both increased and decreased, so certain years could easily be omitted to show a desired result.

For example, looking at Armenia’s population with a primary education or less, if the graph only included the years 2014 - 2017, it could be said that this group was less likely to have an account at a financial institution. However, if we looked only at the years 2017 - 2021, it could be said that this group had a higher likelihood of having an account.

Another notable observation is that there is no strong indication that one education group is more likely to have an account than the other. In some countries, those with a primary education or less had higher account ownership, and in others those with a secondary education or higher did. Even across time, both education groups increased and decreased in percentage. More background information would likely be needed to draw any real conclusions from the data or to explain it.

---
title: "Project_1"
output: html_notebook
---

```{r}
setwd("C:/Users/Elianna Nahandast/Desktop/R Project Data/Project_1")
female <- read.csv("female.csv")

# Determines if R considers female to be a data frame
is.data.frame(female)

# Can now use head to see what female looks like
head(female)

# Keeps only the rows for the three countries of interest
library(tidyverse)
female <- female %>%
  filter(Country.Name == "Brazil" | Country.Name == "India" | Country.Name == "United States")

head(female)

#Eliminate unwanted columns
colnames(female)
rel_col <- which(colnames(female)== "Country.Name" | colnames(female)== "X2014..YR2014." | colnames(female)== "X2017..YR2017." | colnames(female)== "X2021..YR2021.")
rel_col

female <- female[rel_col] %>% rename("2014" = 2, "2017" = 3, "2021" = 4)
head(female)

#Flip rows and columns
transpose_f <- data.frame(t(female[-1]))
colnames(transpose_f) <- female[, 1]
head(transpose_f)

# Check how each column is stored. Here they are character vectors, so computations with them will fail,
# so we must turn them into numeric values
print(sapply(transpose_f, class))

transpose_f$Brazil = as.numeric(transpose_f$Brazil)
transpose_f$India = as.numeric(transpose_f$India)
transpose_f$"United States" = as.numeric(transpose_f$"United States")
head(transpose_f)

#Can now be summarized (compute the mean)
#summarise_each(transpose_f, list(mean))

transpose_f %>% summarise(across(c(Brazil, India, "United States"), list(mean=mean, sd=sd)))

#Graph data
year <- c(2014, 2017, 2021)
ggplot(data=transpose_f, aes(x=year, y=India, group=1)) + geom_line() + geom_point()

################################ Condensed Version for Male Data ##################################################
setwd("C:/Users/Elianna Nahandast/Desktop/R Project Data/Project_1")
male <- read.csv("male.csv")
male <- male %>% 
  filter(Country.Name == "Brazil" | Country.Name == "India" | Country.Name == "United States")

male <- male[rel_col] %>% 
  rename("2014" = 2, "2017" = 3, "2021" = 4)

transpose_m <- data.frame(t(male[-1]))
colnames(transpose_m) <- male[, 1]

transpose_m$Brazil = as.numeric(as.character(transpose_m$Brazil))
transpose_m$India = as.numeric(as.character(transpose_m$India))
transpose_m$"United States" = as.numeric(as.character((transpose_m$"United States")))

head(transpose_m)

#Distinguish between male and female data
transpose_m <- rename(transpose_m, "Brazil_m" = 1, "India_m" = 2, "United_States_m" = 3)

head(transpose_m)

#Merge male and female data sets
transpose_m <- rownames_to_column(transpose_m, var="Year")
transpose_f <- rownames_to_column(transpose_f, var="Year")

acct_owner_by_gender <- merge(x = transpose_m, y = transpose_f, by = "Year", all.x = TRUE)
acct_owner_by_gender <- rename(acct_owner_by_gender, "United_States" = 7)

head(acct_owner_by_gender)

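#Plot data
# Note: x uses the numeric vector `year` defined above, not the character `Year` column,
# so it relies on the merged rows being in 2014, 2017, 2021 order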
gfg_plot <- ggplot(acct_owner_by_gender, aes(x=year)) +
  geom_line(aes(y = India), color = "black") +
  geom_line(aes(y = India_m), color = "red") + 
  geom_line(aes(y = Brazil), color = "green") +
  geom_line(aes(y = Brazil_m), color = "blue") +
  geom_line(aes(y = United_States), color = "purple") +
  geom_line(aes(y = United_States_m), color = "violet") 

gfg_plot

###################### Project 1 ############################

# (1) Change y-axis from "India" to "Percentage Ownership"
gfg_plot + labs(y = "Percentage Ownership")
```
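
The chunk above applies the same filter, column-selection, and transpose steps to each CSV file. As a sketch only (not the method used above), those steps could be wrapped in one function that returns the data in long format. The function name `tidy_ownership` and the toy data frame are hypothetical; the column names follow the `colnames(female)` pattern, and the values are made up.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: keep the chosen countries and years, then reshape so that
# each row is one (country, year) observation with a numeric percentage.
tidy_ownership <- function(df, countries, years = c(2014, 2017, 2021)) {
  # Build column names such as "X2014..YR2014."
  year_cols <- paste0("X", years, "..YR", years, ".")
  df %>%
    filter(Country.Name %in% countries) %>%
    select(Country.Name, all_of(year_cols)) %>%
    mutate(across(all_of(year_cols), ~ as.numeric(as.character(.x)))) %>%
    pivot_longer(all_of(year_cols), names_to = "Year", values_to = "Percent") %>%
    mutate(Year = as.numeric(substr(Year, 2, 5)))
}

# Toy stand-in for read.csv("female.csv"); the values are invented for illustration.
toy <- data.frame(
  Country.Name   = c("Brazil", "India", "Chile"),
  X2014..YR2014. = c("60", "40", "55"),
  X2017..YR2017. = c("65", "75", "68"),
  X2021..YR2021. = c("80", "78", "70"),
  stringsAsFactors = FALSE
)

toy_long <- tidy_ownership(toy, c("Brazil", "India"))
toy_long
```

With a helper like this, the male, primary, and secondary files could go through the same call with a different `countries` argument, which avoids reusing `rel_col` across files.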
 (2) Discuss percentage change over time
 
Across the three countries, account ownership is more often higher among men, especially
in Brazil and India. In the United States, however, neither gender clearly leads:
in some years ownership is higher among females, and in others it is higher among
males by a similar margin. In India, the gap between the two genders has slowly
closed in recent years, whereas in Brazil it has stayed roughly the same.

Overall, account ownership has risen in most of these countries, especially
among females. The only exceptions to this trend are the male populations
of India and the United States, which show a gradual decrease in account
ownership. Even with this decline, the 2021 percentages remain, for the most
part, higher than those of 2014 in every country.
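
One way to make the gap comparison concrete is to subtract the female percentage from the male percentage for each year. The sketch below uses invented numbers in a data frame shaped like `acct_owner_by_gender`; the names `toy_gender` and `gaps` are hypothetical, and the same subtraction would apply to the real merged data.

```r
# Toy data shaped like acct_owner_by_gender; the percentages are invented
# for illustration and are not the project's values.
toy_gender <- data.frame(
  Year     = c(2014, 2017, 2021),
  India    = c(40, 75, 78),  # female
  India_m  = c(60, 83, 77),  # male
  Brazil   = c(65, 68, 81),  # female
  Brazil_m = c(72, 73, 87)   # male
)

# Gap in percentage points: positive means ownership is higher among men.
gaps <- data.frame(
  Year       = toy_gender$Year,
  India_gap  = toy_gender$India_m  - toy_gender$India,
  Brazil_gap = toy_gender$Brazil_m - toy_gender$Brazil
)
gaps
```

In this toy example the India gap shrinks toward zero, which is what a closing gap looks like, while the Brazil gap stays within a couple of points, which is what a roughly constant gap looks like.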

 (3) Is it possible to say that account percentages have been increasing if you disregard 
     a certain year? Does this change if you focus on particular
     country and gender combinations?

Across all countries, there is a sharp turning point around 2017, where
each country either spiked in account ownership or declined heavily. Disregarding
the years after 2017 could therefore make it seem as though account ownership
in a country was rising when in reality it fell at some point, and vice versa.

There are also cases in each country where omitting certain years could
suggest a higher account ownership percentage than the full data shows.
For example:

- In India, disregarding 2016 - 2018
- In Brazil, disregarding 2016 - 2018
- In the United States, disregarding 2015 - 2018 (although this would look more like a constant rate)

These results could also be replicated with particular gender and country
combinations, for example by using only the higher-ownership gender in each country:
the females of the United States, the females of India (who do not show a sharp decline
like the men do, although their percentage is lower across the years),
and the males of Brazil. Such combinations could create the illusion of a higher
likelihood of account ownership among certain genders.
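
The effect of dropping a year can be checked numerically by comparing the change between different pairs of years. The following is a minimal sketch with an invented series that peaks in 2017; it is not the project's data, and `pp_change` is a hypothetical helper name.

```r
# Invented series that rises to 2017 and then falls; not the project's data.
toy_series <- c("2014" = 50, "2017" = 80, "2021" = 70)

# Change in percentage points between two years of a named vector.
pp_change <- function(x, from, to) {
  unname(x[as.character(to)] - x[as.character(from)])
}

full_window  <- pp_change(toy_series, 2014, 2021)  # whole period
early_window <- pp_change(toy_series, 2014, 2017)  # ignores 2021
late_window  <- pp_change(toy_series, 2017, 2021)  # ignores 2014

c(full = full_window, early = early_window, late = late_window)
```

Here the same series reads as a 30-point rise, a 20-point rise, or a 10-point fall depending on which years are kept.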

```{r}
#################### Project 1 (Continued) #############################
# (4) Analysis of csv files

# Get and Plot primary or less percentage rates
setwd("C:/Users/Elianna Nahandast/Desktop/R Project Data/Project_1")
primary <- read.csv("primary.csv")

library(tidyverse)
primary <- primary %>% 
  filter(Country.Name == "Albania" | Country.Name == "Australia" | Country.Name == "Armenia" | Country.Name == "Algeria" | Country.Name == "Argentina")

head(primary)

#Eliminate unwanted columns
rel_col <- which(colnames(primary)== "Country.Name" | colnames(primary)== "X2014..YR2014." | colnames(primary)== "X2017..YR2017." | colnames(primary)== "X2021..YR2021.")

primary <- primary[rel_col] %>% 
  rename("2014" = 2, "2017" = 3, "2021" = 4)

head (primary)
transpose_p <- data.frame(t(primary[-1]))
colnames(transpose_p) <- primary[, 1]

transpose_p$Albania = as.numeric(as.character(transpose_p$Albania))
transpose_p$Australia = as.numeric(as.character(transpose_p$Australia))
transpose_p$Armenia = as.numeric(as.character((transpose_p$Armenia)))
transpose_p$Algeria = as.numeric(as.character((transpose_p$Algeria)))
transpose_p$Argentina = as.numeric(as.character((transpose_p$Argentina)))

head(transpose_p)

year <- c(2014, 2017, 2021)
ggplot(data=transpose_p, aes(x=year, y=Albania, group=1)) + geom_line() + geom_point()

#Distinguish primary data
transpose_p <- rename(transpose_p, "Albania_p" = 1, "Australia_p" = 2, "Armenia_p" = 3, "Algeria_p" = 4, "Argentia_p" = 5)
head(transpose_p)

# Get and plot secondary or more data
setwd("C:/Users/Elianna Nahandast/Desktop/R Project Data/Project_1")
secondary <- read.csv("secondary.csv")

library(tidyverse)
secondary <- secondary %>% 
  filter(Country.Name == "Albania" | Country.Name == "Australia" | Country.Name == "Armenia" | Country.Name == "Algeria" | Country.Name == "Argentina")

head(secondary)

#Eliminate unwanted columns
rel_col <- which(colnames(secondary)== "Country.Name" | colnames(secondary)== "X2014..YR2014." | colnames(secondary)== "X2017..YR2017." | colnames(secondary)== "X2021..YR2021.")

secondary <- secondary[rel_col] %>% 
  rename("2014" = 2, "2017" = 3, "2021" = 4)

head (secondary)
transpose_s <- data.frame(t(secondary[-1]))
colnames(transpose_s) <- secondary[, 1]

transpose_s$Albania = as.numeric(as.character(transpose_s$Albania))
transpose_s$Australia = as.numeric(as.character(transpose_s$Australia))
transpose_s$Armenia = as.numeric(as.character((transpose_s$Armenia)))
transpose_s$Algeria = as.numeric(as.character((transpose_s$Algeria)))
transpose_s$Argentina = as.numeric(as.character((transpose_s$Argentina)))

head(transpose_s)

# Merge the data sets together

transpose_p <- rownames_to_column(transpose_p, var="Year")
transpose_s <- rownames_to_column(transpose_s, var="Year")

acct_owner_by_education <- merge(x = transpose_p, y = transpose_s, by = "Year", all.x = TRUE)

head(acct_owner_by_education)

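#Plot data
# Note: x uses the numeric vector `year` defined above, not the character `Year` column,
# so it relies on the merged rows being in 2014, 2017, 2021 order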

gfg_plot <- ggplot(acct_owner_by_education, aes(x=year)) +
  geom_line(aes(y = Albania_p), color = "black") +
  geom_line(aes(y = Albania), color = "gray") + 
  geom_line(aes(y = Australia_p), color = "lightblue") +
  geom_line(aes(y = Australia), color = "blue") +
  geom_line(aes(y = Armenia_p), color = "purple") +
  geom_line(aes(y = Armenia), color = "pink") +
  geom_line(aes(y = Algeria), color = "orange") +
  geom_line(aes(y = Algeria_p), color = "red") +
  geom_line(aes(y = Argentina), color = "darkgreen") +
  geom_line(aes(y = Argentia_p), color = "green")

gfg_plot

# Change y - axis
gfg_plot + labs(y = "Percentage Ownership")

```
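
Because each line in the plot above is coloured by hand, the plot has no legend, so a reader cannot tell which line belongs to which country and education group. As a sketch of one alternative (not the approach used above), the merged table can be reshaped to long format so that ggplot2 builds the legend itself. The toy data frame and the names `toy_edu`, `edu_long`, and `p` are hypothetical, and the values are invented.

```r
library(tidyr)
library(ggplot2)

# Toy data shaped like acct_owner_by_education; values are invented.
toy_edu <- data.frame(
  Year      = c("2014", "2017", "2021"),
  Albania_p = c(30, 32, 35),  # primary education or less
  Albania   = c(45, 48, 52)   # secondary education or more
)

# One row per (Year, Group) pair; Year becomes numeric for the x-axis.
edu_long <- pivot_longer(toy_edu, -Year, names_to = "Group", values_to = "Percent")
edu_long$Year <- as.numeric(edu_long$Year)

# Mapping colour to Group draws one line per column and adds a legend automatically.
p <- ggplot(edu_long, aes(x = Year, y = Percent, colour = Group)) +
  geom_line() +
  geom_point() +
  labs(y = "Percentage Ownership")
p
```

This form also takes the x-axis from the `Year` column of the data, so the plot does not depend on a separate `year` vector matching the row order.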

Summary of Data:

The previous graph shows data for five countries (Albania, Australia, Armenia,
Algeria, and Argentina): the account ownership percentages of those with
a primary education or less and of those with a secondary education or higher.
At a glance, the pattern is not as clear as in the earlier gender graph,
but there are some notable observations.

In most countries, account ownership in these two groups sat at distinctly
different percentages in earlier years but has converged to around the same
percentage in recent years. This is seen in Algeria, Argentina, and Armenia.
The two countries that depart from this trend are Australia and Albania, both of
which show a large difference in ownership percentages between the two groups.

In both Albania and Australia, higher account ownership percentages were present among those having
secondary education or higher.

As mentioned previously, this data can also be cherry-picked in various ways, especially
by disregarding certain years. Over time, each education group's account ownership
both increased and decreased, so certain years could easily be omitted to show
a desired result.

For example, looking at Armenia's population with a primary education or less, if the graph
only included the years 2014 - 2017, it could be said that this group was less likely to have
an account at a financial institution. However, if we looked only at the years 2017 - 2021,
it could be said that this group had a higher likelihood of having an account.

Another notable observation is that there is no strong indication that one education
group is more likely to have an account than the other. In some countries, those with a
primary education or less had higher account ownership, and in others those with a secondary
education or higher did. Even across time, both education groups increased and decreased
in percentage. More background information would likely be needed to draw any real
conclusions from the data or to explain it.
