Home assignment for the R-course.
Table 1. Summary for PhD students in three Swedish universities.
library(ggplot2)
phds <- read.table("C:/Users/kristin/Documents/Rverkefni/phds2.txt",
                   header = TRUE, sep = ",")
summary(phds)
##    university          subject          sex        age          year
##  Lund   :1140   humanities    :1140   f:1710   -24  :684   Min.   :1973
##  Malmö  :1140   science       :1140   m:1710   25-29:684   1st Qu.:1982
##  Uppsala:1140   social_science:1140            30-34:684   Median :1992
##                                                35-39:684   Mean   :1992
##                                                40+  :684   3rd Qu.:2001
##                                                            Max.   :2010
##   nbr_of_phds
##  Min.   :  0.0
##  1st Qu.:  0.0
##  Median :  8.0
##  Mean   : 18.5
##  3rd Qu.: 27.0
##  Max.   :202.0
Image 1. Boxplot for the number of PhD students for each university (Lund, Malmö and Uppsala) through the years 1973-2010.
# ggplot() replaces qplot(), which is deprecated in recent ggplot2 versions
ggplot(phds, aes(x = factor(university), y = nbr_of_phds)) + geom_boxplot()
The boxplot shows the spread of the variable 'number of PhD students' for each of the three universities. The graph shows that there were hardly any PhD students at Malmö University during the years 1973-2010, whereas the spread is rather similar for the other two universities, Lund and Uppsala.
Note that the boxplot only summarises how often each value of 'nbr_of_phds' occurs, e.g. rows with zero PhD students are far more common for Malmö University than for the other two universities. The boxplot is therefore not very informative on its own; a bar graph, in which the height of the bars represents the values of 'nbr_of_phds', is a better choice.
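The claim about zero counts can be checked directly. The sketch below uses a small made-up data frame (`toy`, not the real phds2.txt data) with the same column names and computes the share of rows with zero PhD students per university; with the real data, `phds` would take the place of `toy`.

```r
# Made-up stand-in for phds; the values are illustrative only.
toy <- data.frame(
  university  = rep(c("Lund", "Malmö", "Uppsala"), each = 4),
  nbr_of_phds = c(12, 0, 30, 8,   0, 0, 0, 2,   15, 0, 25, 9)
)

# Share of rows with zero PhD students, per university.
zero_share <- tapply(toy$nbr_of_phds == 0, toy$university, mean)
print(zero_share)
```

A high share of zeros pulls the median and quartiles towards zero, which is why the box for such a university collapses onto the x axis.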
Image 2. Bar graph showing the number of PhD students for each university, divided by gender.
ggplot(data = phds, aes(x = university, y = nbr_of_phds, fill = sex)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  scale_fill_manual(values = c("#D2691E", "#008B8B")) +
  xlab("Universities") + ylab("Number of PhD students") +
  ggtitle("Number of PhD students for each university, divided by gender.")
The bar graph shows that during the years 1973-2010 there have generally been more male than female PhD students at the three universities. Uppsala Uni had slightly more male and female students than Lund Uni (and Malmö Uni). Additionally, Malmö Uni has a very low number of PhD students.
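One caveat about this kind of plot: with `stat = "identity"` and `position_dodge()`, rows that share the same university and sex are drawn on top of each other rather than added up, so the visible bar height corresponds to the largest single row, not to a total. If totals are wanted, the counts can be summed first. A minimal sketch, using a made-up data frame `toy` in place of `phds`:

```r
library(ggplot2)

# Made-up stand-in for phds; the values are illustrative only.
toy <- data.frame(
  university  = rep(c("Lund", "Uppsala"), each = 4),
  sex         = rep(c("f", "m"), times = 4),
  nbr_of_phds = c(10, 20, 5, 15, 12, 25, 8, 10)
)

# Sum the counts per university and sex before plotting.
totals <- aggregate(nbr_of_phds ~ university + sex, data = toy, FUN = sum)
print(totals)

# One bar per university and sex; the bar height is now the summed count.
p <- ggplot(totals, aes(x = university, y = nbr_of_phds, fill = sex)) +
  geom_col(position = position_dodge()) +
  scale_fill_manual(values = c("#D2691E", "#008B8B")) +
  xlab("Universities") + ylab("Total number of PhD students")
print(p)
```

The same pre-aggregation step would apply to the other dodged bar graphs in this report if totals rather than maxima are of interest.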
Let's look into the data from Malmö University.
Table 2. Subset for Malmö University created.
phds_Malmö <- phds[phds$university == "Malmö", ]
Image 3. Line graph showing the total number of PhD students at Malmö University per year.
# qplot()'s stat argument and fun.y are deprecated; stat_summary(fun = sum) needs ggplot2 >= 3.3.0
ggplot(phds_Malmö, aes(x = year, y = nbr_of_phds, colour = sex)) +
  stat_summary(fun = sum, geom = "line") + stat_summary(fun = sum, geom = "point") +
  scale_colour_manual(values = c("#D2691E", "#008B8B")) +
  xlab("Year") + ylab("Total number of PhD students") +
  ggtitle("Total number of PhD students for Malmö University.")
The graph shows that only 5 PhD students were enrolled at the university during 2009-2010, and when the number of PhD students at Malmö Uni is plotted against 'subject' (see next image), we can see that all 5 of them were studying social science: 2 females and 3 males.
Note: The graph shows only one dot (colour) for the year 2010 instead of two (one each for female and male students).
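One possible explanation, which is an assumption and has not been checked against the data here, is overplotting: if the female and male totals are identical in a given year, the two points are drawn at the same position and only the one drawn last is visible. The sketch below uses made-up values to show how `position_dodge()` can separate such points.

```r
library(ggplot2)

# Made-up totals; both sexes have the same value in 2010.
toy <- data.frame(
  year        = c(2009, 2009, 2010, 2010),
  sex         = c("f", "m", "f", "m"),
  nbr_of_phds = c(1, 2, 1, 1)
)

# Rows that repeat an earlier (year, value) pair land on the same plot position.
overlap <- duplicated(toy[, c("year", "nbr_of_phds")])
print(toy[overlap, ])

# Dodging shifts the points of each sex slightly along the x axis.
p <- ggplot(toy, aes(x = year, y = nbr_of_phds, colour = sex)) +
  geom_point(position = position_dodge(width = 0.3)) +
  scale_colour_manual(values = c("#D2691E", "#008B8B"))
print(p)
```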
Image 4. Graph showing the number of PhD students at Malmö University, divided by subject and gender
# qplot()'s stat argument and fun.y are deprecated; stat_summary(fun = sum) needs ggplot2 >= 3.3.0
ggplot(phds_Malmö, aes(x = subject, y = nbr_of_phds, colour = sex)) +
  stat_summary(fun = sum, geom = "line") + stat_summary(fun = sum, geom = "point") +
  scale_colour_manual(values = c("#D2691E", "#008B8B")) +
  xlab("Subject") + ylab("Total number of PhD students") +
  ggtitle("The total number of PhD students for Malmö University.")
## geom_path: Each group consist of only one observation. Do you need to
## adjust the group aesthetic?
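The message above appears because 'subject' is a discrete variable: each combination of subject and sex forms its own group containing a single summed value, so there is nothing for a line to connect. If connecting lines are wanted, setting `group = sex` is one way to get them. The sketch below uses a made-up data frame `toy` standing in for the Malmö subset (its values mirror the totals described in the text) and assumes ggplot2 3.3.0 or later for the `fun` argument.

```r
library(ggplot2)

# Made-up stand-in for the Malmö subset.
toy <- data.frame(
  subject     = rep(c("humanities", "science", "social_science"), each = 2),
  sex         = rep(c("f", "m"), times = 3),
  nbr_of_phds = c(0, 0, 0, 0, 2, 3)
)

# group = sex makes one line per sex across the three subjects.
p <- ggplot(toy, aes(x = subject, y = nbr_of_phds, colour = sex, group = sex)) +
  stat_summary(fun = sum, geom = "line") +
  stat_summary(fun = sum, geom = "point") +
  scale_colour_manual(values = c("#D2691E", "#008B8B")) +
  xlab("Subject") + ylab("Total number of PhD students")
print(p)
```

Since the subjects have no natural order, dodged bars may be easier to read than lines for this comparison.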
The number of PhD students at each university also differs between the subjects: humanities, science and social science.
Image 5. Graph showing the number of PhD students within each subject: humanities, science and social science.
ggplot(data = phds, aes(x = university, y = nbr_of_phds, fill = subject)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  scale_fill_manual(values = c("#D2691E", "#008B8B", "#FFA500")) +
  xlab("Universities") + ylab("Number of PhD students") +
  ggtitle("Number of PhD students for each university, divided by subject.")
The bar graph shows that science has been the most popular subject through the years (it has the most students registered every year). Lund and Uppsala seem to have an equal number of PhD students within science per year, but Lund has more students within social science, while Uppsala has more students within humanities.
Image 6. Graph showing the TOTAL number of PhD students in each subject.
# qplot()'s stat argument and fun.y are deprecated; stat_summary(fun = sum) needs ggplot2 >= 3.3.0
ggplot(phds, aes(x = subject, y = nbr_of_phds, colour = sex)) +
  stat_summary(fun = sum, geom = "line") + stat_summary(fun = sum, geom = "point") +
  scale_colour_manual(values = c("#D2691E", "#008B8B")) +
  xlab("Subject") + ylab("Total number of PhD students") +
  ggtitle("The total number of PhD students for the three Universities.")
## geom_path: Each group consist of only one observation. Do you need to
## adjust the group aesthetic?
The graph shows that science is the most popular subject among male academics, followed by social science and then humanities. Among female academics the order is humanities, then science and then social science.
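A ranking read off a graph can also be checked numerically with a cross-tabulation of totals. The sketch below uses a made-up data frame `toy` whose values were chosen only to reproduce the ordering described above; with the real data, `phds` would take its place.

```r
# Made-up stand-in for phds; the values are illustrative only.
toy <- data.frame(
  subject     = rep(c("humanities", "science", "social_science"), each = 4),
  sex         = rep(c("f", "m"), times = 6),
  nbr_of_phds = c(9, 4, 7, 3,   6, 12, 5, 10,   3, 6, 2, 7)
)

# Total number of PhD students per subject (rows) and sex (columns).
totals <- xtabs(nbr_of_phds ~ subject + sex, data = toy)
print(totals)

# Subjects ordered from most to least popular, separately for each sex.
ranking <- apply(totals, 2, function(x) names(sort(x, decreasing = TRUE)))
print(ranking)
```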
Let's see how the number of PhDs is spread through the years for each university.
Image 7. Bar graph of the number of PhD students at the three universities during the years 1973-2010.
ggplot(data = phds, aes(x = year, y = nbr_of_phds, fill = university)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  scale_fill_manual(values = c("#D2691E", "#008B8B", "#FFA500")) +
  xlab("1973-2010") + ylab("Number of PhD students") +
  ggtitle("Number of PhD students for each university through the years 1973-2010")
Lund and Uppsala show similar fluctuations in the number of students per year. The number is highest at the beginning (from around 1973). There is a sudden decrease in the number of PhD students around 1995 at Lund Uni and Uppsala Uni, especially at Uppsala Uni. It is also visible that the number of PhD students has been decreasing over the last 10 years of the data (up to 2010); Malmö University, however, received its first PhD students in the last two years.
Here is a better graph showing the total number of PhD students (for Lund and Uppsala), but first let's create subsets for the Lund and Uppsala students.
Table 3. Subset for Lund University created.
phds_Lund <- phds[phds$university == "Lund", ]
Image 8. Total number of students for Lund University from 1973-2010.
# qplot()'s stat argument and fun.y are deprecated; stat_summary(fun = sum) needs ggplot2 >= 3.3.0
ggplot(phds_Lund, aes(x = year, y = nbr_of_phds, colour = sex)) +
  stat_summary(fun = sum, geom = "line") + stat_summary(fun = sum, geom = "point") +
  scale_colour_manual(values = c("#D2691E", "#008B8B")) +
  xlab("Year") + ylab("Total number of PhD students") +
  ggtitle("The total number of PhD students for Lund University.")
The graph shows that the number of male academics was quite stable during the years '85-'95, whereas the number of female academics increased. Then, around '97, the number started to decrease for both genders, but it has been slowly increasing again in recent years. More importantly, the graph shows that the numbers of male and female academics have been converging, so that there is now an almost equal number of academics of both genders.
Table 4. Subset for Uppsala University created.
phds_Uppsala <- phds[phds$university == "Uppsala", ]
Image 9. Total number of students for Uppsala University from 1973-2010.
# qplot()'s stat argument and fun.y are deprecated; stat_summary(fun = sum) needs ggplot2 >= 3.3.0
ggplot(phds_Uppsala, aes(x = year, y = nbr_of_phds, colour = sex)) +
  stat_summary(fun = sum, geom = "line") + stat_summary(fun = sum, geom = "point") +
  scale_colour_manual(values = c("#D2691E", "#008B8B")) +
  xlab("Year") + ylab("Total number of PhD students") +
  ggtitle("The total number of PhD students for Uppsala University.")
The graph shows, just as for Lund University, that the number of male academics was quite stable during the years '85-'95, whereas the number of female academics increased. It also shows a sudden drop in the number of male and female academics around '93, especially for male academics (a drop of almost 200 males). However, the number increases again and, just as for Lund University, the numbers of male and female academics are slowly becoming equal.
Question: How can I merge these two graphs into one, or place them side by side (as a matrix) for a better visual comparison?
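One common way to do this is faceting: keep both universities in a single data frame and let `facet_wrap()` draw one panel per university, side by side and on a shared y axis. The sketch below uses a made-up data frame `toy` with arbitrary counts; with the real data, the subset `phds[phds$university %in% c("Lund", "Uppsala"), ]` would be used instead. The `fun` argument assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Made-up stand-in for the Lund and Uppsala rows of phds.
toy <- expand.grid(
  university = c("Lund", "Uppsala"),
  sex        = c("f", "m"),
  year       = 2006:2010,
  stringsAsFactors = FALSE
)
toy$nbr_of_phds <- seq_len(nrow(toy))  # arbitrary illustrative counts

# One panel per university; totals per year and sex are summed by stat_summary().
p <- ggplot(toy, aes(x = year, y = nbr_of_phds, colour = sex)) +
  stat_summary(fun = sum, geom = "line") +
  stat_summary(fun = sum, geom = "point") +
  facet_wrap(~ university) +
  scale_colour_manual(values = c("#D2691E", "#008B8B")) +
  xlab("Year") + ylab("Total number of PhD students")
print(p)
```

To merge both universities into a single panel instead, `facet_wrap()` can be dropped and the university mapped to another aesthetic, for example `linetype = university` inside `aes()`.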