Previously on STAT 412:

  • Descriptive Statistics
  • Visualization Techniques
    • Bar-plot and Stacked-bar-plot
    • Histogram and Density-plot
    • Box-plot
    • Scatter-plot
    • Spine-plot
    • Mosaic-plot
    • Correlation-plot
  • Additional exercise on the difference between box-plots and histograms/density-plots
  • Requirements for plots
  • Color Customization Info

Introduction to the Dataset

  • Loading the data

This week, we will use the “m111survey” data set from the “tigerstats” package. First, install the package (only once), then load it with library().

# install.packages("tigerstats")   # run once if the package is not installed yet
library(tigerstats)
data("m111survey")

Since we prefer to work with data frames, we first check the class and the dimensions of the data.

class(m111survey)
## [1] "data.frame"
dim(m111survey)
## [1] 71 12

It is a data frame with 71 observations and 12 variables. Now, let’s examine the class of each variable: there are 5 numeric variables, 1 integer variable and 6 factors with 2 or 3 levels.

str(m111survey)
## 'data.frame':    71 obs. of  12 variables:
##  $ height         : num  76 74 64 62 72 70.8 70 79 59 67 ...
##  $ ideal_ht       : num  78 76 NA 65 72 NA 72 76 61 67 ...
##  $ sleep          : num  9.5 7 9 7 8 10 4 6 7 7 ...
##  $ fastest        : int  119 110 85 100 95 100 85 160 90 90 ...
##  $ weight_feel    : Factor w/ 3 levels "1_underweight",..: 1 2 2 1 1 3 2 2 2 3 ...
##  $ love_first     : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ extra_life     : Factor w/ 2 levels "no","yes": 2 2 1 1 2 1 2 2 2 1 ...
##  $ seat           : Factor w/ 3 levels "1_front","2_middle",..: 1 2 2 1 3 1 1 3 3 2 ...
##  $ GPA            : num  3.56 2.5 3.8 3.5 3.2 3.1 3.68 2.7 2.8 NA ...
##  $ enough_Sleep   : Factor w/ 2 levels "no","yes": 1 1 1 1 1 2 1 2 1 2 ...
##  $ sex            : Factor w/ 2 levels "female","male": 2 2 1 1 2 2 2 2 1 1 ...
##  $ diff.ideal.act.: num  2 2 NA 3 0 NA 2 -3 2 0 ...
head(m111survey,10)
height ideal_ht sleep fastest weight_feel love_first extra_life seat GPA enough_Sleep sex diff.ideal.act.
76.0 78 9.5 119 1_underweight no yes 1_front 3.56 no male 2
74.0 76 7.0 110 2_about_right no yes 2_middle 2.50 no male 2
64.0 NA 9.0 85 2_about_right no no 2_middle 3.80 no female NA
62.0 65 7.0 100 1_underweight no no 1_front 3.50 no female 3
72.0 72 8.0 95 1_underweight no yes 3_back 3.20 no male 0
70.8 NA 10.0 100 3_overweight no no 1_front 3.10 yes male NA
70.0 72 4.0 85 2_about_right no yes 1_front 3.68 no male 2
79.0 76 6.0 160 2_about_right no yes 3_back 2.70 yes male -3
59.0 61 7.0 90 2_about_right no yes 3_back 2.80 no female 2
67.0 67 7.0 90 3_overweight no no 2_middle NA yes female 0
sum(is.na(m111survey))          # total number of missing (NA) cells
## [1] 5
m111survey=na.omit(m111survey)  # drop every row that contains at least one NA
sum(is.na(m111survey))
## [1] 0
dim(m111survey)
## [1] 68 12

The data contain 5 missing values (NA cells), spread over 3 rows. To keep the visualizations in this recitation simple, we delete the rows with missing values (for illustration only), which leaves 68 fully observed rows.
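The claim that 5 missing values cost only 3 rows can be checked directly. A minimal base-R sketch: all five NAs appear among the first ten rows printed by head() above, so a small data frame holding three of those columns (values copied from that output) is enough to illustrate is.na(), complete.cases() and na.omit(). The name first10 is hypothetical.

```r
# First ten rows of three m111survey columns, copied from the head() output above
first10 <- data.frame(
  ideal_ht        = c(78, 76, NA, 65, 72, NA, 72, 76, 61, 67),
  GPA             = c(3.56, 2.50, 3.80, 3.50, 3.20, 3.10, 3.68, 2.70, 2.80, NA),
  diff.ideal.act. = c(2, 2, NA, 3, 0, NA, 2, -3, 2, 0)
)

na_per_column <- colSums(is.na(first10))        # NA count in each column
n_missing     <- sum(is.na(first10))            # total number of NA cells
rows_with_na  <- which(!complete.cases(first10))  # rows that na.omit() will drop
cleaned       <- na.omit(first10)               # keep only fully observed rows

print(na_per_column)
print(n_missing)
print(rows_with_na)
print(nrow(cleaned))
```

Five NA cells fall in only three rows (3, 6 and 10), which is why na.omit() turns 71 rows into 68 rather than 66.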

The data dictionary is given below:

Data Info: Results of a survey of MAT 111 students at Georgetown College.

  • height: How tall are you, in inches?
  • ideal_ht: How tall would you LIKE to be, in inches?
  • sleep: How much sleep did you get last night?
  • fastest: What is the highest speed at which you have ever driven a car?
  • weight_feel: How do you feel about your weight?
  • love_first: Do you believe in love at first sight?
  • extra_life: Do you believe in extraterrestrial life?
  • seat: When you have a choice, where do you prefer to sit in a classroom?
  • GPA: What is your college GPA?
  • enough_Sleep: Do you think you get enough sleep?
  • sex: What sex are you?
  • diff.ideal.act.: Your ideal height minus your actual height.

Descriptive Stats

Before the visualization part, it is worthwhile to examine the descriptive stats.

summary(m111survey)
##      height         ideal_ht         sleep          fastest     
##  Min.   :51.00   Min.   :54.00   Min.   :2.000   Min.   : 60.0  
##  1st Qu.:65.00   1st Qu.:67.00   1st Qu.:5.000   1st Qu.: 92.5  
##  Median :68.00   Median :68.50   Median :6.500   Median :104.5  
##  Mean   :68.02   Mean   :69.99   Mean   :6.294   Mean   :106.5  
##  3rd Qu.:72.00   3rd Qu.:75.00   3rd Qu.:7.000   3rd Qu.:120.0  
##  Max.   :79.00   Max.   :90.00   Max.   :9.500   Max.   :190.0  
##         weight_feel love_first extra_life       seat         GPA       
##  1_underweight: 9   no :42     no :38     1_front :26   Min.   :1.900  
##  2_about_right:24   yes:26     yes:30     2_middle:30   1st Qu.:2.875  
##  3_overweight :35                         3_back  :12   Median :3.225  
##                                                         Mean   :3.188  
##                                                         3rd Qu.:3.553  
##                                                         Max.   :4.000  
##  enough_Sleep     sex     diff.ideal.act. 
##  no :45       female:38   Min.   :-4.000  
##  yes:23       male  :30   1st Qu.: 0.000  
##                           Median : 2.000  
##                           Mean   : 1.974  
##                           3rd Qu.: 3.000  
##                           Max.   :18.000

Let’s continue with the “Lattice Graphs”!

Visualization Techniques

Lattice Plots

So far we have mostly covered plots for univariate and bivariate data. What about multivariate data? Lattice (trellis) plots show the relation among several variables by drawing one panel for each level of a conditioning variable, written after | in the formula (e.g., ~height | sex). The lattice package provides functions for many plot types, such as histogram(), densityplot(), bwplot() and xyplot().

To draw lattice plots, we need the lattice package. It ships with R as a recommended package, so loading it with library(lattice) is usually enough.

Histogram and Density-plot

RQ: Is there a noticeable difference between the height distributions of male and female students?

library(lattice)
library(mosaic)
favstats(~height, data = m111survey)  # min and max help to choose the breaks; favstats(~height | sex, data = m111survey) gives the stats for each gender
min Q1 median Q3 max mean sd n missing
51 65 68 72 79 68.01838 5.379327 68 0
histogram(~height | factor(sex), data = m111survey, type = "count",
          xlab = "Height of Students in inches", ylab = "Frequency",
          main = "Histogram of Height",
          breaks = c(50, 55, 60, 65, 70, 75, 80))
# type = "percent", "count" or "density" (default)
# breaks = seq(from = 50, to = 80, by = 5) gives the same break points

The highest frequency falls in the 60 to 65 inch bin for females and in the 70 to 75 inch bin for males. Male heights are concentrated mainly between 65 and 75 inches, whereas female heights are more spread out.

Info: 65 inches ≈ 165 cm and 72 inches ≈ 183 cm (1 inch = 2.54 cm).
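The conversion uses 1 inch = 2.54 cm. A small sketch with a hypothetical helper inches_to_cm(), applied to the height quartiles reported by summary() (Q1 = 65, median = 68, Q3 = 72 inches):

```r
# Hypothetical helper: convert inches to centimetres (1 inch = 2.54 cm)
inches_to_cm <- function(inches) {
  inches * 2.54
}

# Height quartiles from summary(m111survey): Q1, median, Q3 (in inches)
heights_in <- c(65, 68, 72)
heights_cm <- inches_to_cm(heights_in)
print(heights_cm)
```

Rounded to whole centimetres, these are 165, 173 and 183 cm. Because the helper is vectorized, it can also be applied to a whole column at once.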

The same histograms, with the two panels stacked vertically:

histogram(~height | factor(sex), data = m111survey, type = "count",
          xlab = "Height of Students in inches", ylab = "Frequency",
          main = "Histogram of Height",
          breaks = c(50, 55, 60, 65, 70, 75, 80),
          layout = c(1, 2))  # layout = c(1, 2): 1 column, 2 rows
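Both histograms use break points chosen by hand from the favstats() range (min 51, max 79). A minimal sketch of one way to derive such breaks, assuming a bin width of 5 inches and rounding the range outward to multiples of that width; the min and max are typed in from the favstats() output and the variable names are illustrative:

```r
# Min and max height reported by favstats() above
height_min <- 51
height_max <- 79
bin_width  <- 5   # assumed bin width in inches

# Round the range outward to multiples of the bin width so every value falls in a bin
breaks <- seq(from = floor(height_min / bin_width) * bin_width,
              to   = ceiling(height_max / bin_width) * bin_width,
              by   = bin_width)
n_bins <- length(breaks) - 1

print(breaks)
print(n_bins)
```

This reproduces c(50, 55, 60, 65, 70, 75, 80), i.e. 6 bins of equal width; equal widths keep "count" and "percent" histograms easy to read.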

RQ1: How does the distribution of students’ GPA vary when considering both gender and love at first sight?

RQ2: Are there noticeable patterns or differences in GPA frequencies when accounting for both gender and love at first sight?

histogram(~GPA|factor(sex)*love_first,data=m111survey,type="count",xlab="GPA of Students", ylab="Frequency",main="Histogram of GPA")

Students who do not believe in love at first sight show a wide range of GPAs, for both males and females. Female students who believe in love at first sight mostly have GPAs above 2.5, whereas only a few male students believe in love at first sight and their GPAs are scattered.

Density-plot

RQ1: How does the density of students’ heights vary based on their seating preferences, and is there a noticeable difference when considering gender?

RQ2: Are there specific patterns in height distribution within different seating arrangements, and do these patterns differ between male and female students?

densityplot(~ height | seat, data = m111survey, groups = sex,
            col = c("yellowgreen", "darkblue"),
            par.settings = list(superpose.line = list(col = c("yellowgreen", "darkblue"))),
            plot.points = FALSE,   # do not draw the raw data points along the bottom of each panel
            auto.key = list(col = c("yellowgreen", "darkblue")),
            lwd = 2,
            main = "Density-plot of Height by seat preferences and gender")

In the front and middle rows, the height distributions of male and female students look similar, whereas the male students who sit at the back are mostly tall.

Box-plot

RQ1: How is students’ actual sleep duration (sleep) related to their feelings about their weight (weight_feel) and to whether they think they get enough sleep (enough_Sleep)?

RQ2: Are there any noticeable trends or differences in sleep duration based on individuals’ perception of their weight and their actual sleep habits?

bwplot(sleep ~ factor(enough_Sleep) | weight_feel, data=m111survey)

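The cross-table below gives the number of students behind each box. None of the 9 underweight students reports getting enough sleep, so the 1_underweight panel has no box for "yes".

table(m111survey$weight_feel,m111survey$enough_Sleep)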
##                
##                 no yes
##   1_underweight  9   0
##   2_about_right 15   9
##   3_overweight  21  14
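Because the three weight_feel groups differ in size (9, 24 and 35 students), row proportions compare them more fairly than raw counts. A minimal base-R sketch with prop.table(), re-entering the counts from the table above by hand (sleep_tab and row_props are illustrative names):

```r
# Counts copied from the table above (rows: weight_feel, columns: enough_Sleep);
# matrix() fills column by column, so the first three numbers are the "no" column
sleep_tab <- matrix(c(9, 15, 21,
                      0,  9, 14),
                    nrow = 3,
                    dimnames = list(weight_feel  = c("1_underweight", "2_about_right", "3_overweight"),
                                    enough_Sleep = c("no", "yes")))
sleep_tab <- as.table(sleep_tab)

# Row proportions: share of "no"/"yes" answers within each weight_feel group
row_props <- prop.table(sleep_tab, margin = 1)
print(round(row_props, 3))
```

About 37.5% of the "about right" group and 40% of the "overweight" group say they get enough sleep, against 0% of the underweight group.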
Scatter-plot

RQ: How does the relationship between actual height and ideal height vary between male and female students?

xyplot(height ~ ideal_ht | sex  , data=m111survey , pch=20 , cex=1.5 )

# Indexing a trellis object keeps only some of its panels; e.g. [1, ] draws only the first (female) panel, enlarged:
# xyplot(height ~ ideal_ht | sex, data = m111survey, pch = 20, cex = 1.5)[1, ]

RQ: How does the relationship between actual height and ideal height vary across different GPA levels, taking into account seating preferences?

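equal.count() turns the numeric GPA into a "shingle": number = 3 intervals that contain roughly the same number of students and that overlap, where overlap = 0.5 means adjacent intervals share about half of their observations. The printout confirms 34 students per interval, 17 of whom are shared with the neighbouring interval.

GPA_levels=equal.count(m111survey$GPA, number=3, overlap=0.5)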
GPA_levels
## 
## Data:
##  [1] 3.560 2.500 3.500 3.200 3.680 2.700 2.800 2.100 2.500 3.890 3.200 3.200
## [13] 2.200 3.500 3.550 3.750 3.500 3.400 2.770 3.000 3.167 3.200 3.413 3.700
## [25] 3.500 3.750 2.800 2.200 3.600 2.800 3.100 3.000 3.500 3.900 3.787 3.200
## [37] 3.700 2.000 3.100 3.500 3.700 2.550 3.730 3.500 3.200 3.100 3.900 3.250
## [49] 3.200 3.294 2.000 3.000 3.700 3.300 3.200 3.600 3.300 4.000 3.400 2.700
## [61] 2.900 1.900 2.800 3.500 3.300 3.700 2.914 2.700
## 
## Intervals:
##     min   max count
## 1 1.897 3.203    34
## 2 2.897 3.553    34
## 3 3.247 4.003    34
## 
## Overlap between adjacent intervals:
## [1] 17 17
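The interval limits printed above (1.897, 3.203, ...) match those of base R's graphics::co.intervals(), which lattice appears to use inside equal.count(). Assuming that, this sketch rebuilds the three overlapping intervals from the 68 GPA values listed in the printout and counts the students inside each interval and in each overlap:

```r
# GPA values of the 68 complete cases, copied from the equal.count() printout above
gpa <- c(3.560, 2.500, 3.500, 3.200, 3.680, 2.700, 2.800, 2.100, 2.500, 3.890, 3.200, 3.200,
         2.200, 3.500, 3.550, 3.750, 3.500, 3.400, 2.770, 3.000, 3.167, 3.200, 3.413, 3.700,
         3.500, 3.750, 2.800, 2.200, 3.600, 2.800, 3.100, 3.000, 3.500, 3.900, 3.787, 3.200,
         3.700, 2.000, 3.100, 3.500, 3.700, 2.550, 3.730, 3.500, 3.200, 3.100, 3.900, 3.250,
         3.200, 3.294, 2.000, 3.000, 3.700, 3.300, 3.200, 3.600, 3.300, 4.000, 3.400, 2.700,
         2.900, 1.900, 2.800, 3.500, 3.300, 3.700, 2.914, 2.700)

# Three overlapping intervals with (roughly) equal counts; each row is c(lower, upper)
intervals <- graphics::co.intervals(gpa, number = 3, overlap = 0.5)

# Number of students inside each interval (bounds inclusive)
counts <- apply(intervals, 1, function(b) sum(gpa >= b[1] & gpa <= b[2]))

# Number of students shared by interval i and interval i + 1
shared <- sapply(1:2, function(i) {
  sum(gpa >= intervals[i + 1, 1] & gpa <= intervals[i, 2])
})

print(round(intervals, 3))
print(counts)
print(shared)
```

The limits are shifted slightly outward (here by 0.003, half the smallest gap between distinct GPA values) so that no observation sits exactly on a boundary.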
xyplot(height ~ ideal_ht | GPA_levels*seat  , data=m111survey , pch=20 , cex=1.5)

ggplot Graphics

ggplot() allows you to make complex plots with just a few lines of code because it’s based on a rich underlying theory, the grammar of graphics.

First, we load the ggplot2 package.

Scatter-plot

RQ: Is there a linear trend in the relationship between students’ actual height and their perceived ideal height?

library(ggplot2)
ggplot(m111survey, aes(x = height, y = ideal_ht)) + 
  geom_point()+
  geom_smooth(method="lm")

Bubble-plot

RQ: What is the relationship between actual height and ideal height, and is there any observable pattern when sleep duration (point size) and seating preference (color) are also taken into account?

ggplot(m111survey, aes(x = height, y = ideal_ht,size=sleep,col=seat)) + 
  geom_point()

Scatter-plot

RQ: How does the fastest driving speed relate to students’ GPA, and does this differ across seating preferences? Is there any discernible pattern or correlation between top driving speed and academic performance among the surveyed students?

ggplot(m111survey, aes(fastest, GPA)) + 
  geom_point() + 
  facet_wrap(~seat)

Box-plot

RQ: How does the distribution of the fastest driving speed differ between students who do and do not believe in love at first sight (love_first)?

ggplot(m111survey, aes(love_first,fastest)) + geom_boxplot()

Violin-plot

RQ: Is there a noticeable variation in students’ GPA based on their seating preferences?

ggplot(m111survey, aes(x=seat, y=GPA, fill=seat)) + geom_violin(trim=FALSE)  + geom_boxplot(width=0.2,fill="beige")

Bar-plot

RQ: How do the self-perceived weight feelings vary among the surveyed students, as indicated by the bar plot? Are there any notable differences in the frequencies of feeling underweight, about right, and overweight?

f=table(m111survey$weight_feel)
f_data=data.frame(f) # converts the table to a data frame: Var1 = level name, Freq = count
ggplot(f_data,aes(x=reorder(Var1,Freq),y=Freq,fill=Var1))+geom_bar(stat="identity")+
  labs(title="Bar Plot of Weight_feel",y="Frequencies",x="Levels")+geom_text(aes(label=c("underweight","about right", "overweight")),vjust=-0.25,fontface="bold")
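Both the bar plot and the lollipop chart rely on data.frame(table(...)) producing the columns Var1 and Freq. A minimal sketch of that conversion: the factor below is rebuilt from the weight_feel counts in summary() (9, 24, 35); only the counts matter here, not the order of the students, and survey is a hypothetical name.

```r
# Rebuild a factor with the same counts as weight_feel in summary(m111survey)
survey <- data.frame(
  weight_feel = factor(rep(c("1_underweight", "2_about_right", "3_overweight"),
                           times = c(9, 24, 35)))
)

f <- table(survey$weight_feel)  # survey$... is not a bare name, so the dimension is unnamed
f_data <- data.frame(f)         # one row per level: Var1 (level) and Freq (count)
print(f_data)

# reorder() sorts the levels of Var1 by Freq; ggplot uses this order for the x-axis
bar_order <- levels(reorder(f_data$Var1, f_data$Freq))
print(bar_order)
```

The generic name Var1 appears because the table's dimension has no name; writing table(weight_feel = survey$weight_feel) would name the column weight_feel instead.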

Lollipop chart

f_seat=data.frame(table(m111survey$seat))
ggplot(f_seat, aes(x = Var1, y = Freq)) +
  geom_point(size = 5, color = "darkblue", fill = "greenyellow", alpha = 0.8, shape = 21, stroke = 1.5) +
  geom_segment(aes(x = Var1, xend = Var1, y = 0, yend = Freq))

Data Manipulation

“tidyverse” is a collection of packages (e.g., dplyr, ggplot2) designed for data science. The packages share a common design and syntax, which makes it easy to import, clean, transform, summarize and visualize data with short, readable code.

library(tidyverse)

Quick reference guide:

  • mutate(): adds new columns or modifies existing variables in the dataset

  • summarize(): collapses the rows into a single summary row (one row per group after group_by())

  • group_by(): groups the rows by one or more variables so that subsequent operations are applied within each group

  • filter(): keeps only the rows that meet the specified condition(s)

  • select(): keeps only the columns (variables) that you want to see

  • arrange(): sorts the rows by one or more variables, in ascending (default) or descending (desc()) order

  • Display the data exclusively for female students.

head(filter(m111survey,sex=="female"))
height ideal_ht sleep fastest weight_feel love_first extra_life seat GPA enough_Sleep sex diff.ideal.act.
62 65 7 100 1_underweight no no 1_front 3.50 no female 3
59 61 7 90 2_about_right no yes 3_back 2.80 no female 2
65 69 6 100 2_about_right no no 1_front 2.10 yes female 4
62 62 7 60 3_overweight no no 1_front 2.50 yes female 0
59 62 5 80 2_about_right yes yes 1_front 3.89 no female 3
78 75 7 80 3_overweight no no 2_middle 3.20 yes female -3
  • Display the data exclusively for female students who believe in love at first sight.
head(filter(m111survey,sex=="female" & love_first=="yes"))
height ideal_ht sleep fastest weight_feel love_first extra_life seat GPA enough_Sleep sex diff.ideal.act.
59 62 5 80 2_about_right yes yes 1_front 3.890 no female 3
65 70 8 120 3_overweight yes yes 1_front 3.750 yes female 5
65 68 7 125 3_overweight yes no 3_back 3.500 no female 3
66 66 7 120 3_overweight yes no 1_front 3.167 yes female 0
68 68 4 90 3_overweight yes no 2_middle 3.200 no female 0
54 54 4 130 3_overweight yes yes 1_front 3.413 no female 0
  • Display the data for GPA in descending order.
head(arrange(m111survey, desc(GPA))) # for ascending order, drop desc()
height ideal_ht sleep fastest weight_feel love_first extra_life seat GPA enough_Sleep sex diff.ideal.act.
63 68 7.5 75 3_overweight no no 1_front 4.000 yes female 5
74 76 5.0 115 3_overweight yes no 1_front 3.900 no male 2
65 67 6.5 90 2_about_right no yes 2_middle 3.900 yes female 2
59 62 5.0 80 2_about_right yes yes 1_front 3.890 no female 3
63 67 7.5 105 3_overweight yes no 1_front 3.787 yes female 4
65 70 8.0 120 3_overweight yes yes 1_front 3.750 yes female 5
  • Display only the “height” and “ideal_ht” variables.
head(select(m111survey,height,ideal_ht))
height ideal_ht
1 76 78
2 74 76
4 62 65
5 72 72
7 70 72
8 79 76
  • Display the data with a new variable ‘ratio’, the ratio of height to ideal height.
head(mutate(m111survey, ratio = height/ideal_ht))
height ideal_ht sleep fastest weight_feel love_first extra_life seat GPA enough_Sleep sex diff.ideal.act. ratio
1 76 78 9.5 119 1_underweight no yes 1_front 3.56 no male 2 0.9743590
2 74 76 7.0 110 2_about_right no yes 2_middle 2.50 no male 2 0.9736842
4 62 65 7.0 100 1_underweight no no 1_front 3.50 no female 3 0.9538462
5 72 72 8.0 95 1_underweight no yes 3_back 3.20 no male 0 1.0000000
7 70 72 4.0 85 2_about_right no yes 1_front 3.68 no male 2 0.9722222
8 79 76 6.0 160 2_about_right no yes 3_back 2.70 yes male -3 1.0394737
  • Display the mean and median values of GPA.
summarise(m111survey,avg_gpa=mean(GPA,na.rm=TRUE),med_gpa=median(GPA,na.rm=TRUE))
avg_gpa med_gpa
3.187574 3.225
  • Display the average height for each seat level.
summarise(group_by(m111survey,seat),avg_height_byseat=mean(height,na.rm=TRUE))
seat avg_height_byseat
1_front 65.22115
2_middle 69.73333
3_back 69.79167

Using the %>% operator:

The pipe operator %>% passes the result on its left as the first argument of the function on its right, so x %>% f(y) is the same as f(x, y). Chaining several steps this way avoids deeply nested calls, which makes the code shorter and easier to read from left to right.
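The equivalence between nested and piped calls can be checked on a tiny example. A minimal sketch, assuming dplyr is installed (it is part of the tidyverse loaded earlier); mini is a hypothetical data frame holding three columns of the first six complete cases, copied from the head(mutate(...)) output above:

```r
library(dplyr)

# Three columns of the first six complete cases (copied from the output above)
mini <- data.frame(
  height = c(76, 74, 62, 72, 70, 79),
  GPA    = c(3.56, 2.50, 3.50, 3.20, 3.68, 2.70),
  sex    = c("male", "male", "female", "male", "male", "male")
)

# Nested calls: read from the inside out
nested <- select(filter(mini, sex == "male"), height, GPA)

# Piped calls: each result becomes the first argument of the next function
piped <- mini %>%
  filter(sex == "male") %>%
  select(height, GPA)

print(piped)
print(identical(nested, piped))
```

Both versions return the same 5 male students; the piped form simply lists the steps in the order in which they are applied.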

  • Display the mean, standard deviation and median for GPA.
m111survey %>% 
  summarise(m = mean(GPA),      # summarise() returns a single summary row;
            sd = sd(GPA),       # mutate() would instead repeat these values on all 68 rows
            med = median(GPA))
m sd med
3.187574 0.5086646 3.225
  • Display “sleep”, “GPA” and “sex” only for students who believe both in love at first sight and in extraterrestrial life.
m111survey %>% filter(love_first == "yes", extra_life=="yes") %>% select(sleep, GPA, sex)
sleep GPA sex
5.0 3.890 female
4.5 2.200 male
6.0 3.500 male
8.0 3.750 female
4.0 3.413 female
5.0 3.700 male
6.0 3.750 female
7.5 2.800 male
4.5 3.200 female
8.0 3.294 female
4.0 2.900 male
  • Display the frequency distribution of students based on their thoughts about weight.
m111survey %>% group_by(weight_feel) %>% tally()
weight_feel n
1_underweight 9
2_about_right 24
3_overweight 35
  • Display the dataset with a new variable “gender” derived from ‘sex’, where female students are labeled ‘girl’ and male students ‘boy’.
m111survey %>% 
  mutate(gender = recode(sex,
                          "female" = "girl",
                          "male" = "boy"))
height ideal_ht sleep fastest weight_feel love_first extra_life seat GPA enough_Sleep sex diff.ideal.act. gender
1 76.00 78.0 9.5 119 1_underweight no yes 1_front 3.560 no male 2.00 boy
2 74.00 76.0 7.0 110 2_about_right no yes 2_middle 2.500 no male 2.00 boy
4 62.00 65.0 7.0 100 1_underweight no no 1_front 3.500 no female 3.00 girl
5 72.00 72.0 8.0 95 1_underweight no yes 3_back 3.200 no male 0.00 boy
7 70.00 72.0 4.0 85 2_about_right no yes 1_front 3.680 no male 2.00 boy
8 79.00 76.0 6.0 160 2_about_right no yes 3_back 2.700 yes male -3.00 boy
9 59.00 61.0 7.0 90 2_about_right no yes 3_back 2.800 no female 2.00 girl
11 65.00 69.0 6.0 100 2_about_right no no 1_front 2.100 yes female 4.00 girl
12 62.00 62.0 7.0 60 3_overweight no no 1_front 2.500 yes female 0.00 girl
13 59.00 62.0 5.0 80 2_about_right yes yes 1_front 3.890 no female 3.00 girl
14 78.00 75.0 7.0 80 3_overweight no no 2_middle 3.200 yes female -3.00 girl
15 69.00 72.0 7.0 125 1_underweight no no 2_middle 3.200 no male 3.00 boy
16 68.00 68.0 4.5 100 1_underweight yes yes 2_middle 2.200 no male 0.00 boy
17 73.00 77.0 6.0 110 2_about_right yes yes 3_back 3.500 no male 4.00 boy
18 73.00 75.0 8.0 120 2_about_right no yes 2_middle 3.550 yes male 2.00 boy
19 65.00 70.0 8.0 120 3_overweight yes yes 1_front 3.750 yes female 5.00 girl
20 65.00 68.0 7.0 125 3_overweight yes no 3_back 3.500 no female 3.00 girl
21 66.00 68.0 7.0 75 3_overweight no no 1_front 3.400 yes female 2.00 girl
22 67.75 70.0 7.0 90 3_overweight no no 1_front 2.770 no male 2.25 boy
23 63.00 67.0 8.5 90 3_overweight no yes 1_front 3.000 yes female 4.00 girl
24 66.00 66.0 7.0 120 3_overweight yes no 1_front 3.167 yes female 0.00 girl
25 68.00 68.0 4.0 90 3_overweight yes no 2_middle 3.200 no female 0.00 girl
26 54.00 54.0 4.0 130 3_overweight yes yes 1_front 3.413 no female 0.00 girl
27 74.00 75.0 5.0 119 2_about_right yes yes 1_front 3.700 no male 1.00 boy
28 68.00 66.0 4.5 112 2_about_right yes no 1_front 3.500 no female -2.00 girl
29 68.00 68.0 6.0 93 2_about_right yes yes 1_front 3.750 no female 0.00 girl
30 69.00 67.0 6.0 145 3_overweight no no 3_back 2.800 no female -2.00 girl
31 72.00 90.0 9.0 125 3_overweight no yes 3_back 2.200 no male 18.00 boy
32 70.50 73.0 7.0 190 2_about_right no yes 3_back 3.600 no male 2.50 boy
33 70.00 75.0 7.5 90 2_about_right yes yes 2_middle 2.800 yes male 5.00 boy
34 75.00 78.0 7.0 143 3_overweight yes no 2_middle 3.100 yes male 3.00 boy
35 72.00 75.0 7.0 120 2_about_right yes no 2_middle 3.000 no male 3.00 boy
36 62.00 62.0 4.5 95 3_overweight no no 2_middle 3.500 no female 0.00 girl
37 74.00 76.0 5.0 115 3_overweight yes no 1_front 3.900 no male 2.00 boy
38 63.00 67.0 7.5 105 3_overweight yes no 1_front 3.787 yes female 4.00 girl
39 69.00 66.0 6.0 100 3_overweight no no 2_middle 3.200 no female -3.00 girl
40 60.00 66.0 5.0 95 3_overweight yes no 2_middle 3.700 no female 6.00 girl
41 68.00 69.5 9.0 95 3_overweight no no 2_middle 2.000 yes female 1.50 girl
42 73.00 76.0 8.0 110 3_overweight no yes 2_middle 3.100 yes male 3.00 boy
43 66.00 68.0 9.0 91 3_overweight yes no 1_front 3.500 no female 2.00 girl
44 70.00 67.0 5.5 85 3_overweight yes no 2_middle 3.700 no female -3.00 girl
45 51.00 54.0 7.0 130 2_about_right no no 1_front 2.550 no female 3.00 girl
46 67.00 68.0 7.0 104 3_overweight yes no 2_middle 3.730 no female 1.00 girl
47 69.00 70.0 5.0 95 3_overweight no no 2_middle 3.500 no female 1.00 girl
48 71.00 67.0 3.0 105 3_overweight no no 2_middle 3.200 no female -4.00 girl
49 74.00 74.0 5.0 90 3_overweight no no 2_middle 3.100 no male 0.00 boy
50 65.00 67.0 6.5 90 2_about_right no yes 2_middle 3.900 yes female 2.00 girl
51 63.50 65.0 6.5 105 2_about_right no no 2_middle 3.250 no female 1.50 girl
52 66.00 68.0 4.5 95 3_overweight yes yes 2_middle 3.200 no female 2.00 girl
53 69.00 65.0 8.0 110 3_overweight yes yes 3_back 3.294 no female -4.00 girl
54 75.00 77.0 7.0 105 3_overweight no no 2_middle 2.000 yes male 2.00 boy
55 65.00 75.0 6.0 130 2_about_right no yes 1_front 3.000 no male 10.00 boy
56 74.00 76.0 7.0 95 2_about_right no yes 3_back 3.700 yes male 2.00 boy
57 64.00 66.0 6.0 95 3_overweight yes no 1_front 3.300 no female 2.00 girl
58 76.00 77.0 7.0 100 2_about_right no yes 2_middle 3.200 yes male 1.00 boy
59 64.00 68.0 6.0 110 3_overweight no yes 2_middle 3.600 yes female 4.00 girl
60 71.50 74.0 6.0 108 2_about_right no no 2_middle 3.300 no male 2.50 boy
61 63.00 68.0 7.5 75 3_overweight no no 1_front 4.000 yes female 5.00 girl
62 64.00 68.0 7.5 102 3_overweight no no 1_front 3.400 no female 4.00 girl
63 68.00 72.0 6.5 105 3_overweight no yes 2_middle 2.700 yes male 4.00 boy
64 70.00 72.0 4.0 98 1_underweight yes yes 3_back 2.900 no male 2.00 boy
65 68.00 72.0 4.0 135 1_underweight no yes 2_middle 1.900 no male 4.00 boy
66 75.00 75.0 6.0 130 2_about_right no yes 2_middle 2.800 yes male 0.00 boy
67 69.00 67.0 2.0 85 3_overweight yes no 1_front 3.500 no female -2.00 girl
68 70.00 72.0 5.0 85 1_underweight no no 2_middle 3.300 no male 2.00 boy
69 61.00 68.0 5.0 130 2_about_right no no 1_front 3.700 no female 7.00 girl
70 65.00 66.0 8.0 120 2_about_right yes no 3_back 2.914 yes female 1.00 girl
71 70.00 73.0 5.0 110 1_underweight no no 1_front 2.700 no male 3.00 boy

References:

https://ggplot2-book.org/getting-started

https://ggplot2.tidyverse.org/reference/geom_point.html

Cohen, Y., & Cohen, J. Y. (2008). Statistics and Data with R: An applied approach through examples. John Wiley & Sons.

Mount, J., & Zumel, N. (2019). Practical data science with R. Simon and Schuster.

Yozgatlıgil, C. (2024). STAT 412 Lecture Notes.