Assignment - 2nd session

QUESTION 1

nettle %>%  
  filter(Country %in% c('Uganda', 'Yemen'))

This code takes “nettle”, the object that holds the data from the csv file, and filters its rows using the “Country” column. The %in% operator checks each value of “Country” against the character vector on its right, so only the rows for “Uganda” and “Yemen” are kept.
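A minimal sketch of how %in% behaves inside filter(). The `toy` data frame and its values are made up for illustration and are not the real nettle data:

```r
library(dplyr)

# Hypothetical toy data standing in for nettle
toy <- data.frame(
  Country = c("Uganda", "Yemen", "Nepal", "Ghana"),
  Langs   = c(43, 6, 102, 72),
  stringsAsFactors = FALSE
)

# %in% returns one TRUE/FALSE per Country: TRUE if it appears in the vector on the right
toy$Country %in% c("Uganda", "Yemen")

# filter() keeps only the rows where that test is TRUE
kept <- toy %>%
  filter(Country %in% c("Uganda", "Yemen"))
kept
```

Using == instead of %in% here would compare the column to the two-element vector by recycling, which is usually not what is wanted when matching against several values.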

nettle %>% 
  filter(Langs > 100, Population < median(Population))

This code filters “nettle” with two conditions that must both be true. The comma between them acts as a logical “and”. The first condition keeps rows where the “Langs” column is greater than 100. The second keeps rows where the “Population” column is lower than the median of the whole “Population” column.
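As a small sketch of the comma acting as “and”, the two filter() calls below should give the same rows. The `toy` data frame is hypothetical and only stands in for nettle:

```r
library(dplyr)

# Hypothetical toy data standing in for nettle
toy <- data.frame(
  Langs      = c(120, 150, 80, 200),
  Population = c(5, 50, 3, 10)
)

# median(Population) is computed over the whole column before any rows are dropped
with_comma <- toy %>% filter(Langs > 100, Population < median(Population))
with_and   <- toy %>% filter(Langs > 100 & Population < median(Population))

# Compare the two results
all.equal(with_comma, with_and)
```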

nettle %>% 
  mutate(MGS_cat = ifelse(MGS < 6, 'non-fertile', 'fertile')) %>% 
  group_by(MGS_cat) %>% 
  summarize(value = mean(Langs))

This code creates a new variable with the mutate command: a new column named “MGS_cat”.
The ifelse function takes three arguments (test, yes, no). The first argument, ‘test’, is the condition we want to check. The second argument, ‘yes’, is the value returned where the test is TRUE. The third argument, ‘no’, is the value returned where the test is FALSE. Hence in this code, if “MGS” is less than 6 the row is labelled “non-fertile”; otherwise it is labelled “fertile”.
The group_by function groups the table it receives, so here it groups the result of the previous mutate step by “MGS_cat”.
The last function, “summarize”, reduces each group to one row and creates a variable named “value”, which is the mean of the “Langs” column within that group.
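The steps above can be sketched on a tiny made-up table. The `toy` data frame and its numbers are hypothetical, chosen so the group means are easy to check by hand:

```r
library(dplyr)

# Hypothetical toy data standing in for nettle
toy <- data.frame(
  MGS   = c(2, 5, 7, 10),
  Langs = c(10, 20, 60, 100)
)

# ifelse() tests every element of MGS and returns one label per element
ifelse(toy$MGS < 6, "non-fertile", "fertile")

# mutate() adds the labels, group_by() groups by them,
# and summarize() returns one row per group with the mean of Langs
result <- toy %>%
  mutate(MGS_cat = ifelse(MGS < 6, "non-fertile", "fertile")) %>%
  group_by(MGS_cat) %>%
  summarize(value = mean(Langs))
result
```

Here the “non-fertile” rows have Langs of 10 and 20, and the “fertile” rows have 60 and 100, so each group mean is the average of two numbers.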

QUESTION 2 - NOTE: The comments on each line of the code explain the functions

glottolog %>% 
  filter(!is.na(macroarea)) %>%  #removes the rows where "macroarea" is missing (NA)
  group_by(level, macroarea) %>% #groups the table by level and macroarea
  count(level, macroarea) %>% #counts the rows for each combination of level and macroarea and stores the count in a column named "n"
  ggplot(aes(x = macroarea, y = n, fill = level)) + #creates a ggplot where "macroarea" is on the x-axis, the counts "n" are on the y-axis, and the fill colour of the bars is set by "level"
  geom_bar(stat = "identity", position = "fill") + #draws a barplot; stat = "identity" uses the values of "n" as they are, and position = "fill" stacks the bars and scales each one to the same height so that they show proportions
  labs(x = "Macroarea", y = "Proportion") + #labs creates labels, where x-axis is named "Macroarea" and y-axis is "Proportion"
  scale_fill_discrete(name = "Type") + #sets the title of the fill legend to "Type"
  scale_y_continuous(labels = scales::percent) + #shows the y-axis labels as percentages
  coord_flip() #swaps the x-axis and y-axis so that the bars are horizontal
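
A rough sketch of what count() returns and of the proportions that position = "fill" displays. The `toy` data frame is hypothetical and only imitates the two glottolog columns used above; the `prop` column is computed here by hand for illustration and is not part of the original code:

```r
library(dplyr)

# Hypothetical toy data standing in for glottolog
toy <- data.frame(
  macroarea = c("Africa", "Africa", "Africa", "Eurasia", "Eurasia", NA),
  level     = c("language", "language", "dialect", "language", "dialect", "language")
)

counts <- toy %>%
  filter(!is.na(macroarea)) %>%  # drop the row with a missing macroarea
  count(level, macroarea)        # one row per combination, count stored in n
counts

# The share shown by position = "fill": n divided by the total for that macroarea
shares <- counts %>%
  group_by(macroarea) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()
shares
```

Because count() already takes the grouping columns as arguments, it gives the same counts with or without a preceding group_by() on the same columns.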

QUESTION 3:

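library(ggplot2) #loads the ggplot2 package
emo_valence <- read.csv("warriner_2013_emotional_valence.csv") #reads the csv file

val_mean <- mean(emo_valence$Val) #mean of the valence ratings
val_sd <- sd(emo_valence$Val) #standard deviation of the valence ratings

ggplot(data = emo_valence, aes(x = Val)) +
  geom_histogram() + #histogram of the valence ratings
  geom_vline(xintercept = val_mean) + #solid line at the mean
  geom_vline(xintercept = val_mean + c(-1, 1) * val_sd, linetype = 2) + #dashed lines at mean +/- 1 SD (about 68% of values if roughly normal)
  geom_vline(xintercept = val_mean + c(-2, 2) * val_sd, linetype = 2) #dashed lines at mean +/- 2 SD (about 95% of values if roughly normal)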

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
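
A short sketch of where the 0.68 and 0.95 figures come from, assuming the dashed lines are meant to mark the ranges that hold roughly 68% and 95% of the values. The `val` vector is simulated (the mean of 5 and SD of 1.3 are arbitrary choices) and is not the real Val column:

```r
# Simulated stand-in for the Val column (hypothetical values, not the real data)
set.seed(1)
val <- rnorm(10000, mean = 5, sd = 1.3)

m <- mean(val)
s <- sd(val)

# Positions where the dashed vertical lines would go
one_sd <- m + c(-1, 1) * s
two_sd <- m + c(-2, 2) * s

# Share of values that fall inside each range
within_one <- mean(val > one_sd[1] & val < one_sd[2])
within_two <- mean(val > two_sd[1] & val < two_sd[2])
c(within_one, within_two)
```

For data that are only approximately normal, such as rating data, the shares will be close to these figures but not exactly 0.68 and 0.95.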

Glyd Jun Aranes

2025-01-28