---
title: "Languages at Community Colleges"
output: html_notebook
editor_options: 
  chunk_output_type: inline
---

```{r, include = F}
library(readxl)
library(magrittr)
library(tidyr)
library(ggplot2)
library(data.table)
library(dplyr)
library(stringr)
library(plotly)


# Reading the different datasets
data <- read_excel("2years_Winter_2022.xls")
data_2021 <- read.csv("data_2021.csv")
data_2022_fall <- read.csv("data_2022_fall.csv")
# File name assumed from the naming pattern of the other terms.
data_2022_winter <- read.csv("data_2022_winter.csv")
data_2022_spring <- read.csv("data_2022_spring.csv")
data_2022_summer <- read.csv("data_2022_summer.csv")
data_2023_winter <- read.csv("data_2023_winter.csv")
data_2023_spring <- read.csv("data_2023_spring.csv")
dim(data_2023_winter)

# Creating the Year (year and session) column and attaching it.
# A length-one value is recycled to every row, so no row counts are hard-coded.

year_2021 <- data_2021 %>%
    mutate(Year = "2021")

year_2022_fall <- data_2022_fall %>%
    mutate(Year = "2022 Fall")

year_2022_winter <- data_2022_winter %>%
    mutate(Year = "2022 Winter")

year_2022_spring <- data_2022_spring %>%
    mutate(Year = "2022 Spring")

year_2022_summer <- data_2022_summer %>%
    mutate(Year = "2022 Summer")

year_2023_spring <- data_2023_spring %>%
    mutate(Year = "2023 Spring")

year_2023_winter <- data_2023_winter %>%
    mutate(Year = "2023 Winter")


# Merging the data frames
overall <- rbind(year_2021, year_2022_fall, year_2022_spring, year_2022_summer, year_2022_winter, year_2023_winter, year_2023_spring)
# Base R's rbind() works well here because all the data frames have the same columns.
overall
colnames(overall)

```
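
Tagging each term by hand gets repetitive as more terms are added. A minimal sketch of a more compact alternative, assuming every term is available as its own data frame: keep the data frames in a named list and let `dplyr::bind_rows()` write the list names into the `Year` column. The toy data frames and their values are made up for illustration and are not taken from the real files.

```r
library(dplyr)

# Toy stand-ins for the per-term files; the real ones would come from read.csv().
terms <- list(
  "2021"      = data.frame(Course.ID = c("VIET 101", "SPAN 101"),
                           Sections.Count = c(2L, 5L)),
  "2022 Fall" = data.frame(Course.ID = "VIET 101",
                           Sections.Count = 3L)
)

# .id stores each list name in a new column, so the label always
# matches the number of rows in its data frame.
overall_toy <- bind_rows(terms, .id = "Year")

overall_toy
```

Adding a term then means adding one entry to the list.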


# How many sections does each language have, grouped by year?

```{r, warning = F, echo = F, message = F}

# Exploring the data: how many colleges offer each language per year?

overall %>%
    filter(str_detect(Course.ID, "VIET")) %>%
    group_by(Year) %>%
    summarize(n = n_distinct(College)) # around 9

overall %>%
    filter(str_detect(Course.ID, "TAGA")) %>%
    group_by(Year) %>%
    summarize(n = n_distinct(College)) # just 1

overall %>%
    filter(str_detect(Course.ID, "PR|PER")) %>%
    group_by(Year) %>%
    summarize(n = n_distinct(College)) # around 4

# How do these languages compare to the others? Counting the sections.

sections_plot <- overall %>%
    mutate(language = case_when(
        str_detect(TOP.Code, "Vietnames") ~ "Vietnamese",
        str_detect(TOP.Code, "Fren") ~ "French",
        str_detect(TOP.Code, "Ger") ~ "German",
        str_detect(TOP.Code, "Span") ~ "Spanish",
        str_detect(TOP.Code, "Chin") ~ "Chinese",
        str_detect(TOP.Code, "Jap") ~ "Japanese",
        str_detect(TOP.Code, "Arab") ~ "Arabic",
        str_detect(TOP.Code, "Ital") ~ "Italian",
        str_detect(Course.Title, "PERS") ~ "Persian",
        str_detect(TOP.Code, "Rus") ~ "Russian",
        str_detect(TOP.Code, "Kor") ~ "Korean",
        str_detect(Course.ID, "TAGA") ~ "Tagalog"
    )) %>%
    filter(!is.na(language)) %>%
    group_by(language, Year) %>%
    # na.rm is an argument of sum(), not of as.integer()
    summarize(sum = sum(as.integer(Sections.Count), na.rm = TRUE), .groups = "drop") %>%
    # Plot size is set in plot_ly(); width/height in layout() is deprecated
    plot_ly(x = ~Year, y = ~sum, color = ~language, colors = "Paired",
            type = "scatter", mode = "lines",
            height = 600, width = 800) %>%
    layout(autosize = F,
           xaxis = list(title = "Year"),
           yaxis = list(title = "Number of Sections"),
           title = "Distribution of sections by language")

sections_plot

```
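
The section totals rely on `sum()` skipping missing counts, and that depends on where `na.rm` is passed. A small sketch of the difference, using a made-up character vector in which `"n/a"` stands in for any non-numeric entry (whether the real `Sections.Count` column contains such entries is an assumption):

```r
# Made-up counts; "n/a" becomes NA when converted to integer.
counts <- c("3", "2", "n/a")

# as.integer() ignores na.rm, so the NA reaches sum() and the total is NA.
misplaced <- suppressWarnings(sum(as.integer(counts, na.rm = TRUE)))

# Passing na.rm to sum() drops the NA before adding.
correct <- suppressWarnings(sum(as.integer(counts), na.rm = TRUE))

misplaced
correct
```

`suppressWarnings()` only hides the coercion warning here; it does not change either result.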

# Is there a difference in transferability between languages?

It seems that there isn't: the split by `Transfer.Status` looks similar across the five languages plotted below.

```{r, echo = F, warning = F, message=FALSE}

overall_languages <- overall %>%
    mutate(language = case_when(
        str_detect(TOP.Code, "Vietnames") ~ "Vietnamese",
        str_detect(TOP.Code, "Fren") ~ "French",
        str_detect(TOP.Code, "Ger") ~ "German",
        str_detect(TOP.Code, "Span") ~ "Spanish",
        str_detect(TOP.Code, "Chin") ~ "Chinese",
        str_detect(TOP.Code, "Jap") ~ "Japanese",
        str_detect(TOP.Code, "Arab") ~ "Arabic",
        str_detect(TOP.Code, "Ital") ~ "Italian",
        str_detect(Course.Title, "PERS") ~ "Persian",
        str_detect(TOP.Code, "Rus") ~ "Russian",
        str_detect(TOP.Code, "Kor") ~ "Korean",
        str_detect(Course.ID, "TAGA") ~ "Tagalog"
    ))


overall_languages %>%
    filter(language %in% c("Spanish", "Tagalog", "Vietnamese", "French", "Japanese")) %>%
    group_by(language, Year, Transfer.Status) %>%
    summarize(n = sum(as.integer(Sections.Count), na.rm = TRUE), .groups = "drop") %>%
    # Explicit bar trace; plot size is set in plot_ly() rather than layout()
    plot_ly(x = ~Year, y = ~n, color = ~language, split = ~Transfer.Status,
            type = "bar", height = 800, width = 1000) %>%
    layout(autosize = F,
           xaxis = list(title = "Year"),
           yaxis = list(title = "Number of Sections"),
           title = "Transferability Status by Language")
    

```
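
The language labels are derived with the same `case_when()` rules in more than one chunk. A minimal sketch of how such rules could live in one helper instead; `label_language()` is a hypothetical name, only three of the rules are shown, and the `TOP.Code` values in the toy data are invented, since the real format of that column is not shown here.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: maps a TOP code description to a language label.
# Anything that matches no rule becomes NA.
label_language <- function(top_code) {
  case_when(
    str_detect(top_code, "Vietnames") ~ "Vietnamese",
    str_detect(top_code, "Fren") ~ "French",
    str_detect(top_code, "Span") ~ "Spanish",
    TRUE ~ NA_character_
  )
}

# Invented TOP.Code values, for illustration only.
toy <- data.frame(TOP.Code = c("Vietnamese", "French", "Accounting"))

labelled <- toy %>%
  mutate(language = label_language(TOP.Code)) %>%
  filter(!is.na(language))

labelled
```

With a helper like this, a new language or a corrected pattern only has to be changed in one place.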

# How are the Vietnamese sections distributed? Does one college account for the bulk of them, or are they more or less evenly distributed among the 9 colleges?

```{r, echo = F, warning = F, message = F}

overall_languages %>%
    filter(language == "Vietnamese") %>%
    group_by(Year, College) %>%
    # na.rm is an argument of sum(), not of as.integer()
    summarize(sum = sum(as.integer(Sections.Count), na.rm = TRUE), .groups = "drop") %>%
    # Explicit bar trace; plot size is set in plot_ly() rather than layout()
    plot_ly(x = ~Year, y = ~sum, color = ~College, colors = "Paired",
            type = "bar", height = 800, width = 800) %>%
    layout(autosize = F,
           xaxis = list(title = "Year"),
           yaxis = list(title = "Number of Sections"),
           title = "Distribution of sections by College")
    

```
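
Whether one college takes up the bulk of the sections is easier to judge from shares than from raw counts. A minimal sketch of that calculation, assuming one row per college and term; the college names and counts below are placeholders and do not come from the real data.

```r
library(dplyr)

# Made-up counts; "College A" etc. are placeholder names.
viet_toy <- data.frame(
  Year = c("2021", "2021", "2021", "2022 Fall", "2022 Fall"),
  College = c("College A", "College B", "College C", "College A", "College B"),
  Sections.Count = c(6L, 2L, 2L, 3L, 1L)
)

shares <- viet_toy %>%
  group_by(Year, College) %>%
  # drop_last keeps the grouping by Year for the next step
  summarize(sections = sum(Sections.Count), .groups = "drop_last") %>%
  # Share of each college within its Year
  mutate(share = sections / sum(sections)) %>%
  ungroup()

shares
```

A share well above one ninth for a single college would point to a concentrated distribution rather than an even one.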


# Status of the courses

## General Education Status

The General Education Status column takes three values: A, C, and Y. What these codes mean is still an open question.

## Credit Status

A very small minority of classes are not degree applicable. Languages like Tagalog and Vietnamese are always degree applicable.

```{r, echo = F, include = F}

# How many levels are there in the General Education Status column?
# Ask Ivy what A, C, and Y mean.
levels(as.factor(overall_languages$General.Education.Status))
levels(as.factor(overall_languages$Credit.Status))

overall_languages %>%
    filter(language %in% c("Spanish", "Tagalog", "Vietnamese", "French", "Japanese")) %>%
    group_by(language, Year, Credit.Status) %>%
    summarize(n = sum(as.integer(Sections.Count), na.rm = TRUE), .groups = "drop") %>%
    # Explicit bar trace; plot size is set in plot_ly() rather than layout()
    plot_ly(x = ~Year, y = ~n, color = ~language, split = ~Credit.Status,
            type = "bar", height = 800, width = 1000) %>%
    layout(autosize = F)

```
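
Until the status codes are documented, a cross-tabulation at least shows which languages use which code. A minimal sketch with `dplyr::count()`; the rows below are invented, and only the codes A, C, and Y themselves come from the column described above.

```r
library(dplyr)

# Invented rows; the pairing of languages and codes is for illustration only.
status_toy <- data.frame(
  language = c("Spanish", "Spanish", "Vietnamese", "Tagalog"),
  General.Education.Status = c("A", "Y", "Y", "C")
)

# One row per language and status code, with the number of matching rows.
status_counts <- status_toy %>%
  count(language, General.Education.Status, name = "n_rows")

status_counts
```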


