R Lab 3 in Introductory Statistics for the Behavioral Sciences.

Student: Kristín Ósk Ingvarsdóttir

Home assignment for the R course.

Table 1. Summary of the PhD student data for three Swedish universities.

library(ggplot2)
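# stringsAsFactors = TRUE keeps the factor summaries shown below in R >= 4.0
phds <- read.table("C:/Users/kristin/Documents/Rverkefni/phds2.txt", header = TRUE, 
    sep = ",", stringsAsFactors = TRUE)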
summary(phds)
##    university             subject     sex         age           year     
##  Lund   :1140   humanities    :1140   f:1710   -24  :684   Min.   :1973  
##  Malmö  :1140   science       :1140   m:1710   25-29:684   1st Qu.:1982  
##  Uppsala:1140   social_science:1140            30-34:684   Median :1992  
##                                                35-39:684   Mean   :1992  
##                                                40+  :684   3rd Qu.:2001  
##                                                            Max.   :2010  
##   nbr_of_phds   
##  Min.   :  0.0  
##  1st Qu.:  0.0  
##  Median :  8.0  
##  Mean   : 18.5  
##  3rd Qu.: 27.0  
##  Max.   :202.0
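The counts in Table 1 are consistent with a complete grid: each university appears in 1140 rows, and 1140 = 3 subjects × 2 sexes × 5 age groups × 38 years (1973 to 2010). So each row appears to be one cell of that grid, with nbr_of_phds holding the count for that cell, which would explain why so many values are 0. Below is a minimal sketch, assuming the column names shown in Table 1, for checking that every combination occurs exactly once. The helper `is_full_grid` is a hypothetical name, and `toy` is made-up data with the same layout, not the lab data.

```r
# Hypothetical helper: TRUE if the data frame has exactly one row for every
# combination of the values in the given columns.
is_full_grid <- function(df, cols) {
  n_levels <- vapply(df[cols], function(x) length(unique(x)), integer(1))
  no_dups  <- !anyDuplicated(df[cols])
  nrow(df) == prod(n_levels) && no_dups
}

# Made-up toy data with the same column layout as phds (not the real data).
toy <- expand.grid(university = c("Lund", "Malmö", "Uppsala"),
                   subject    = c("humanities", "science", "social_science"),
                   sex        = c("f", "m"),
                   age        = c("-24", "25-29", "30-34", "35-39", "40+"),
                   year       = 1973:2010)
toy$nbr_of_phds <- 0L

is_full_grid(toy, c("university", "subject", "sex", "age", "year"))
```

On the lab data the same call would be `is_full_grid(phds, c("university", "subject", "sex", "age", "year"))`.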

Image 1. Boxplot of the number of PhD students at each university (Lund, Malmö and Uppsala) over the years 1973-2010.

# phds was already read in the first chunk, so it is not re-read here
ggplot(phds, aes(x = university, y = nbr_of_phds)) +
    geom_boxplot()


The boxplot shows the distribution of the variable 'number of PhD students' (nbr_of_phds) for each of the three universities. It shows that Malmö University had hardly any PhD students during the years 1973-2010, whereas the spread is rather similar for the other two universities, Lund and Uppsala.

The boxplot only summarises the individual values of nbr_of_phds, one value per row of the data. For example, Malmö University has far more rows with zero PhD students than the other two universities. The boxplot is therefore not very informative, and a bar graph in which the height of the bars represents the values of nbr_of_phds is a better choice.
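The pattern in the boxplot can also be checked numerically. A minimal sketch, assuming a data frame shaped like phds, that computes the share of rows with zero PhD students and the total number of PhD students per university with base R's `tapply()`. The values in `toy` are invented only to make the example runnable.

```r
# Invented toy data: 4 rows per university (not the lab data).
toy <- data.frame(
  university  = rep(c("Lund", "Malmö", "Uppsala"), each = 4),
  nbr_of_phds = c(10, 0, 25, 3,   0, 0, 0, 5,   12, 0, 30, 4)
)

# Proportion of rows with zero PhD students, per university
zero_share <- tapply(toy$nbr_of_phds == 0, toy$university, mean)

# Total number of PhD students per university (sum over all rows)
totals <- tapply(toy$nbr_of_phds, toy$university, sum)

zero_share
totals
```

With the lab data, replace `toy` with `phds`.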

Image 2. Bar graph of the number of PhD students at each university, divided by gender.

ggplot(data = phds, aes(x = university, y = nbr_of_phds, fill = sex)) +
    geom_bar(stat = "identity", position = position_dodge()) +
    scale_fill_manual(values = c("#D2691E", "#008B8B")) +
    xlab("Universities") + ylab("Number of PhD students") +
    ggtitle("Number of PhD students for each university, divided by gender.")


The bar graph shows that over the years 1973-2010 there have generally been more male than female PhD students at the three universities. Uppsala University had slightly more male and female students than Lund University (and Malmö University), while Malmö University has a very low number of PhD students.
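One caveat about this bar graph: with `stat = "identity"` and `position_dodge()`, all rows that share the same university and sex are drawn at the same position on top of each other, so the visible height of each bar appears to be the largest single value rather than a total. A minimal sketch that sums the counts first with base R's `aggregate()`, shown on invented toy data:

```r
# Invented toy data: two rows per university and sex (not the lab data).
toy <- data.frame(
  university  = rep(c("Lund", "Uppsala"), each = 4),
  sex         = rep(c("f", "m"), times = 4),
  nbr_of_phds = c(5, 7, 20, 9,   8, 2, 1, 15)
)

# Sum the counts within each university x sex combination
totals <- aggregate(nbr_of_phds ~ university + sex, data = toy, FUN = sum)
totals
```

Here the tallest single value for Lund females is 20, while their total is 25. With the lab data, `aggregate(nbr_of_phds ~ university + sex, data = phds, FUN = sum)` gives a table that could be plotted with `geom_col(position = position_dodge())` instead of `geom_bar(stat = "identity", ...)`.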

Let's look into the data from Malmö University.

Table 2. Subset for Malmö University created.

phds_Malmö <- phds[phds$university == "Malmö", ]

Image 3. Line graph showing the total number of PhD students at Malmö University per year.

# Sum nbr_of_phds over subjects and age groups for each year and sex
# (fun = sum needs ggplot2 >= 3.3.0; older versions used fun.y)
ggplot(phds_Malmö, aes(x = year, y = nbr_of_phds, colour = sex)) +
    stat_summary(fun = sum, geom = "line") +
    stat_summary(fun = sum, geom = "point") +
    scale_colour_manual(values = c("#D2691E", "#008B8B")) +
    xlab("Year") + ylab("Total number of PhD students") +
    ggtitle("Total number of PhD students for Malmö University.")


The graph shows that Malmö University only had PhD students in 2009 and 2010, 5 in total. When the number of PhD students at Malmö University is plotted against subject (see Image 4), we can see that all 5 of them studied social science: 2 females and 3 males.

Note: the graph shows only one dot (colour) for the year 2010 instead of two (one for female and one for male students).
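One way to investigate the missing dot is to print the totals behind the graph. If the female and male totals happen to be equal in a year, their points are drawn at the same spot and only the colour drawn last stays visible. This is one possible explanation, not something confirmed from the data. A minimal sketch using `xtabs()`; `toy_malmo` is invented toy data, not the real Malmö figures.

```r
# Invented toy data for two years (not the real Malmö figures).
toy_malmo <- data.frame(
  year        = c(2009, 2009, 2010, 2010),
  sex         = c("f", "m", "f", "m"),
  nbr_of_phds = c(1, 2, 1, 1)
)

# Total PhD students per year (rows) and sex (columns)
tab <- xtabs(nbr_of_phds ~ year + sex, data = toy_malmo)
tab

# Years in which the female and male totals coincide (overlapping points)
rownames(tab)[tab[, "f"] == tab[, "m"]]
```

If that turns out to be the case, adding `position = position_dodge(width = 0.3)` to the point layer would place the two points side by side.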

Image 4. Graph showing the number of PhD students at Malmö University, divided by subject and gender

# subject is categorical, so group = sex lets geom "line" connect the points
# for each sex (this removes the geom_path warning)
ggplot(phds_Malmö, aes(x = subject, y = nbr_of_phds, colour = sex, group = sex)) +
    stat_summary(fun = sum, geom = "line") +
    stat_summary(fun = sum, geom = "point") +
    scale_colour_manual(values = c("#D2691E", "#008B8B")) +
    xlab("Subject") + ylab("Total number of PhD students") +
    ggtitle("The total number of PhD students for Malmö University.")


The number of PhD students at each university also differs between the three subjects: humanities, science and social science.

Image 5. Graph showing the number of PhD students within each subject: humanities, science and social science.

ggplot(data = phds, aes(x = university, y = nbr_of_phds, fill = subject)) +
    geom_bar(stat = "identity", position = position_dodge()) +
    scale_fill_manual(values = c("#D2691E", "#008B8B", "#FFA500")) +
    xlab("Universities") + ylab("Number of PhD students") +
    ggtitle("Number of PhD students for each university, divided by subject.")


The bar graph shows that science has been the most popular subject throughout the period (it has more registered students every year). Lund and Uppsala seem to have about the same number of PhD students in science per year, but Lund has more students in social science, while Uppsala has more students in humanities.
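Comparing subjects within each university is easier when the totals are turned into shares. A minimal sketch, assuming a data frame shaped like phds: `xtabs()` sums the counts per university and subject, and `prop.table(..., margin = 1)` rescales each university's row to sum to 1. The numbers in `toy` are invented.

```r
# Invented toy data (not the lab data).
toy <- data.frame(
  university  = rep(c("Lund", "Uppsala"), each = 3),
  subject     = rep(c("humanities", "science", "social_science"), times = 2),
  nbr_of_phds = c(20, 50, 30,   30, 50, 20)
)

# Totals per university x subject, then each row rescaled to sum to 1
tab    <- xtabs(nbr_of_phds ~ university + subject, data = toy)
shares <- prop.table(tab, margin = 1)
round(shares, 2)
```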

Image 6. Graph showing the total number of PhD students in each subject, divided by gender.

# subject is categorical, so group = sex lets geom "line" connect the points
# for each sex (this removes the geom_path warning)
ggplot(phds, aes(x = subject, y = nbr_of_phds, colour = sex, group = sex)) +
    stat_summary(fun = sum, geom = "line") +
    stat_summary(fun = sum, geom = "point") +
    scale_colour_manual(values = c("#D2691E", "#008B8B")) +
    xlab("Subject") + ylab("Total number of PhD students") +
    ggtitle("The total number of PhD students for the three Universities.")


The graph shows that among male academics science is the most popular subject, followed by social science and then humanities. Among female academics humanities is the most popular, followed by science and then social science.
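The ranking of subjects within each gender can also be read off a table of totals. A minimal sketch, assuming a data frame shaped like phds: `xtabs()` sums per subject and sex, and `apply()` sorts the subjects within each sex column. The toy numbers are invented and only illustrate the mechanics.

```r
# Invented toy data (not the lab data).
toy <- data.frame(
  subject     = rep(c("humanities", "science", "social_science"), each = 2),
  sex         = rep(c("f", "m"), times = 3),
  nbr_of_phds = c(40, 20,   35, 60,   25, 30)
)

# Total per subject x sex, then subjects ordered from most to least popular
tab     <- xtabs(nbr_of_phds ~ subject + sex, data = toy)
ranking <- apply(tab, 2, function(x) names(sort(x, decreasing = TRUE)))
ranking
```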

Let's see how the number of PhDs is spread through the years for each university.

Image 7. Bar graph of the number of PhD students at the three universities during the years 1973-2010.

ggplot(data = phds, aes(x = year, y = nbr_of_phds, fill = university)) +
    geom_bar(stat = "identity", position = position_dodge()) +
    scale_fill_manual(values = c("#D2691E", "#008B8B", "#FFA500")) +
    xlab("Year") + ylab("Number of PhD students") +
    ggtitle("Number of PhD students for each university through the years 1973-2010")


Lund and Uppsala show a similar year-to-year fluctuation in the number of PhD students. The numbers are highest at the start of the period (from around 1973 onwards). There is a sudden decrease around 1995 at both Lund and Uppsala, especially at Uppsala. The number of PhD students has also been decreasing over the last ten years (roughly 2000-2010), while Malmö University received its first PhD students in the last two years (2009 and 2010).
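A sudden decrease like this can be located numerically by computing the yearly total per university and finding the largest year-to-year drop. A minimal sketch with base R's `tapply()`, `diff()` and `which.min()`; the yearly values in `toy` are invented. Each difference is labelled with the later of the two years, so the result is the year in which the drop arrives.

```r
# Invented yearly counts for two universities (not the lab data).
toy <- data.frame(
  university  = rep(c("Lund", "Uppsala"), each = 5),
  year        = rep(1993:1997, times = 2),
  nbr_of_phds = c(300, 310, 250, 240, 245,   400, 390, 380, 230, 235)
)

# Matrix of yearly totals: one row per year, one column per university
yearly <- tapply(toy$nbr_of_phds, list(toy$year, toy$university), sum)

# Year-over-year change; each difference is labelled with the later year
changes <- apply(yearly, 2, diff)

# Year in which the largest single drop arrives, per university
worst_drop <- apply(changes, 2, function(x) names(which.min(x)))
worst_drop
```

With the lab data, `toy` would be replaced by `phds`; Malmö's column is mostly zeros, so its result is not meaningful.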

A clearer picture comes from plotting the total number of PhD students per year for Lund and Uppsala separately, but first we create a subset for each of the two universities.

Table 3. Subset for Lund University created.

phds_Lund <- phds[phds$university == "Lund", ]

Image 8. Total number of PhD students at Lund University, 1973-2010.

# Sum nbr_of_phds over subjects and age groups for each year and sex
ggplot(phds_Lund, aes(x = year, y = nbr_of_phds, colour = sex)) +
    stat_summary(fun = sum, geom = "line") +
    stat_summary(fun = sum, geom = "point") +
    scale_colour_manual(values = c("#D2691E", "#008B8B")) +
    xlab("Year") + ylab("Total number of PhD students") +
    ggtitle("The total number of PhD students for Lund University.")


The graph shows that the number of male academics was fairly stable from 1985 to 1995, whereas the number of female academics increased. Around 1997 the numbers started to decrease for both genders, but they have been slowly increasing again in recent years. More importantly, the graph shows that the numbers of male and female academics have been converging, so that there is now an almost equal number of academics of each gender.
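The convergence can be summarised in one number per year: the female share of all PhD students, where 0.5 would mean an equal split. A minimal sketch, assuming a data frame shaped like phds_Lund; the values in `toy_lund` are invented.

```r
# Invented yearly counts for one university (not the lab data).
toy_lund <- data.frame(
  year        = rep(c(1985, 1995, 2010), each = 2),
  sex         = rep(c("f", "m"), times = 3),
  nbr_of_phds = c(40, 120,   80, 120,   95, 105)
)

# Totals per year (rows) and sex (columns)
tab <- xtabs(nbr_of_phds ~ year + sex, data = toy_lund)

# Female share of all PhD students in each year; 0.5 means an equal split
female_share <- tab[, "f"] / rowSums(tab)
round(female_share, 2)
```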

Table 4. Subset for Uppsala University created.

phds_Uppsala <- phds[phds$university == "Uppsala", ]
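The three subsets (phds_Malmö, phds_Lund and phds_Uppsala) could also be created in one step with base R's `split()`, which returns a named list with one data frame per university. A minimal sketch on invented toy data:

```r
# Invented toy data (not the lab data).
toy <- data.frame(
  university  = c("Lund", "Malmö", "Uppsala", "Lund", "Uppsala"),
  nbr_of_phds = c(10, 0, 12, 8, 15)
)

# One data frame per university, stored in a named list
by_uni <- split(toy, toy$university)
names(by_uni)
by_uni[["Uppsala"]]
```

With the lab data, `split(phds, phds$university)[["Lund"]]` holds the same rows as phds_Lund.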

Image 9. Total number of PhD students at Uppsala University, 1973-2010.

# Sum nbr_of_phds over subjects and age groups for each year and sex
ggplot(phds_Uppsala, aes(x = year, y = nbr_of_phds, colour = sex)) +
    stat_summary(fun = sum, geom = "line") +
    stat_summary(fun = sum, geom = "point") +
    scale_colour_manual(values = c("#D2691E", "#008B8B")) +
    xlab("Year") + ylab("Total number of PhD students") +
    ggtitle("The total number of PhD students for Uppsala University.")


The graph shows, as for Lund University, that the number of male academics was fairly stable from 1985 to 1995, whereas the number of female academics increased. It also shows a sudden drop in the number of both male and female academics around 1993, especially for male academics (a drop of almost 200). However, the numbers increase again, and, as at Lund University, the numbers of male and female academics are slowly becoming equal.

Question: How can these two graphs be merged into one, or placed side by side (as a panel matrix) for easier visual comparison?
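One possible answer, sketched under the assumption that ggplot2's `facet_wrap()` is acceptable here: keep Lund and Uppsala in one data frame and let ggplot2 draw one panel per university, side by side with a shared y axis. The toy data frame is invented so the block runs on its own; with the lab data, the same code would use `phds[phds$university %in% c("Lund", "Uppsala"), ]`.

```r
library(ggplot2)

# Invented toy data (not the lab data): yearly counts by university and sex.
toy <- expand.grid(year = 1973:1976, sex = c("f", "m"),
                   university = c("Lund", "Uppsala"))
toy$nbr_of_phds <- seq_len(nrow(toy))

# One panel per university in a single row, sharing the y axis
p <- ggplot(toy, aes(x = year, y = nbr_of_phds, colour = sex)) +
  stat_summary(fun = sum, geom = "line") +
  stat_summary(fun = sum, geom = "point") +
  facet_wrap(~ university, nrow = 1) +
  scale_colour_manual(values = c("#D2691E", "#008B8B")) +
  xlab("Year") + ylab("Total number of PhD students")
p
```

To merge them into a single panel instead, drop `facet_wrap()` and map the university to a second aesthetic, for example `aes(x = year, y = nbr_of_phds, colour = sex, linetype = university)`; ggplot2 then draws one line per sex and university combination.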