DA LAB

knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)      #
                                                                          #
library(tidyverse)           #The tidyverse has all of our data tools     #

## ── Attaching packages ─────────────────────────────────────────────────────────── tidyverse 1.2.1 ──

## ✔ ggplot2 3.2.0     ✔ purrr   0.3.2
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   0.8.3     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0

## ── Conflicts ────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(ggthemes)            #ggthemes makes our graphs look nicer        #
                                                                          # 
theme_set(                                #this sets the theme for all    #
  theme_tufte() +                         #plots that we make             #
  theme(text=element_text(family="sans"))+                                #
  theme(axis.text.x = element_text(angle=-90))                            #
  )                                                                       #
                                                                          #
                                                                          #

Research Questions: (1) Which race tends to be the most friendly at Denison?

(2) Which gender tends to hold a conversation for longer?

Dadata <- read_csv('DA Data Lab2 - Sheet1 (2).csv')
view(Dadata)
head(Dadata)

## # A tibble: 6 x 9
##   Subject TimeDay Interactions InteractionTime… Place Gender Affection
##     <dbl> <time>         <dbl>            <dbl> <chr> <chr>      <dbl>
## 1       1 02:19              1                7 Slay… F              2
## 2       2 02:21             NA               NA Slay… M             NA
## 3       3 02:22              1               10 Slay… M              3
## 4       4 02:25              2               14 Slay… M              3
## 5       5 02:26             NA               NA Slay… M             NA
## 6       6 02:26             NA               NA Slay… M             NA
## # … with 2 more variables: AmntPeopleInteracting <dbl>, Race <chr>

ggplot(Dadata, aes(x=Affection, fill=Race)) + geom_histogram() + labs(title = "Affection of Races")

This histogram shows the frequency of interactions at each degree of affection. The level of affection in an interaction was rated on a scale from 1 to 5, based on observed body language, conversation, and overall demeanor. The bars are filled to show which races are represented at each level. The most common levels of affection were 1, 2, and 3, so most interactions show low to mid levels of affection. Interactions rated at the maximum level are fairly rare and make up only a small portion of the data.

The histogram also shows which races are present at each level of affection. The only three races present in the top three levels are White, Asian, and Latino. It is difficult to determine which race is the most friendly because 71% of our observations are of White students. This is roughly representative of the Denison community, where 65% of the student body is White. White, Asian, and Latino students are also heavily present in the lower levels of affection. Because no race stands out as the most friendly in this graph, we cannot draw a definite conclusion from it.
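The share of each race in a sample can be checked directly rather than read off the histogram. The following is a minimal sketch on a small made-up data frame; `toy` and its values are hypothetical and are not our observations, and `race_share` is a name chosen only for this illustration.

```r
library(dplyr)

# Hypothetical toy data, NOT our observations: one row per observed subject.
toy <- data.frame(
  Race = c("White", "White", "White", "Asian",
           "Latino", "Black", "White", "White"),
  stringsAsFactors = FALSE
)

# Count the subjects of each race, then convert counts to a share of the sample.
race_share <- toy %>%
  count(Race) %>%
  mutate(share = n / sum(n))

race_share
```

Running the same two steps on `Dadata` would presumably give the percentages quoted above, assuming the `Race` column is coded consistently.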

ggplot(Dadata, aes(x=Race,y=Affection)) + 
  geom_boxplot() + labs(title= 'Affection of Races')

AsianFilter <- filter(Dadata, Race=='Asian') 
mean(AsianFilter$Affection)

## [1] 2.5

The boxplot above compares the affection shown by each race observed in our study. First, Asian students at Denison have the highest average affection, with a mean of 2.5 on a scale from 1 to 5 (where 5 is high and 1 is low). By contrast, Black students show less affection than Asian students: their median is about 2, and their ratings range down to 1, which suggests little affection in their interactions with others. Latino students show a similar median of around 2, with ratings ranging up to 3 and the 75th percentile lying between 2 and 2.5. White students, the largest group in our sample, also have a median close to 2, with ratings ranging from 1 all the way to 5. Thus, in our sample, Asian students appear to be the most affectionate and Black students the least.
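Because a boxplot draws medians while `mean()` was computed for only one group, it can help to put both statistics for every race in a single table. This is a minimal sketch on made-up data; `toy` and its values are hypothetical, and `affection_by_race` is a name introduced only for this example.

```r
library(dplyr)

# Hypothetical toy data, NOT our observations.
toy <- data.frame(
  Race = c("Asian", "Asian", "Black", "Black", "Latino",
           "Latino", "White", "White", "White"),
  Affection = c(2, 3, 1, 2, 2, NA, 1, 2, 5),
  stringsAsFactors = FALSE
)

# Drop subjects with no rated interaction, then summarise each race.
affection_by_race <- toy %>%
  filter(!is.na(Affection)) %>%
  group_by(Race) %>%
  summarise(
    n = n(),                              # number of rated interactions
    mean_affection = mean(Affection),     # what mean() reports
    median_affection = median(Affection)  # the centre line of the boxplot
  )

affection_by_race
```

The `n` column matters here: a group with only a handful of rated interactions can have a high mean by chance.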

ggplot(data = Dadata, aes(x = Gender, y = InteractionTimeSeconds)) +
  geom_boxplot() +
  scale_y_log10() + labs(title = 'Interaction Time for Males and Females')

DadataMale <- filter(Dadata, Gender == "M")
head(DadataMale)

## # A tibble: 6 x 9
##   Subject TimeDay Interactions InteractionTime… Place Gender Affection
##     <dbl> <time>         <dbl>            <dbl> <chr> <chr>      <dbl>
## 1       2 02:21             NA               NA Slay… M             NA
## 2       3 02:22              1               10 Slay… M              3
## 3       4 02:25              2               14 Slay… M              3
## 4       5 02:26             NA               NA Slay… M             NA
## 5       6 02:26             NA               NA Slay… M             NA
## 6      10 02:34             NA               NA Slay… M             NA
## # … with 2 more variables: AmntPeopleInteracting <dbl>, Race <chr>

mean(DadataMale$InteractionTimeSeconds, na.rm = TRUE)

## [1] 66.35294

As the boxplot above shows, males tend to hold conversations slightly longer, though the difference is small: both genders have a median interaction time of about 10 seconds. At the 75th percentile, however, males' conversations last about 70 seconds, while females' last around 50 seconds. The male mean of 66.35 seconds is far above the median, which shows that a few long interactions pull the average up.
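The median, mean, and 75th percentile can be computed for both genders at once instead of filtering one group at a time. This is a minimal sketch on made-up data; `toy` and its values are hypothetical, and `time_by_gender` is a name chosen only for this illustration.

```r
library(dplyr)

# Hypothetical toy data, NOT our observations. NA marks a subject with no
# observed interaction, as in our sheet.
toy <- data.frame(
  Gender = c("M", "M", "M", "M", "F", "F", "F", "F"),
  InteractionTimeSeconds = c(5, 10, 15, 300, 8, 12, NA, 40),
  stringsAsFactors = FALSE
)

# Summarise interaction time per gender, ignoring missing values.
time_by_gender <- toy %>%
  group_by(Gender) %>%
  summarise(
    observed = sum(!is.na(InteractionTimeSeconds)),
    median_s = median(InteractionTimeSeconds, na.rm = TRUE),
    mean_s   = mean(InteractionTimeSeconds, na.rm = TRUE),
    q75_s    = unname(quantile(InteractionTimeSeconds, 0.75, na.rm = TRUE))
  )

time_by_gender
```

In the toy data a single 300-second interaction lifts the male mean far above the male median, the same pattern we saw in our own male data.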

Conclusion: Our analysis allowed us to make use of the data that we had collected. The data allowed us to answer both of our research questions, although the questions proved to be too broad and not as specific as we had hoped. Based on our data, we concluded that the race that shows the most affection when interacting with other people is Asian, with a mean of 2.5 on the scale we created from 1 to 5, compared with the other races, which center around 2. However, the differences are small, less than one point, and the highest 75th percentile reaches only 3. Additionally, we concluded that men tend to hold an interaction longer than women, and that their interaction times are more spread out, with some interactions lasting much longer than those of the women.

With that being said, our study would have greatly benefited from a much larger sample collected over a longer period of time. Our data lacked variety: few subjects belonged to races other than White, because White students make up the largest share of the Denison student body. Visualizing the data in graphs was essential to our understanding of the results and ultimately gave us the statistical answers to our questions. The graphs let us examine the means and ranges of our data in a more organized way.

In the future we would recommend taking more time to collect data over multiple days and at various times of day, since how long people can hold an interaction probably depends on how busy they are. For example, at noon people might be more able to hold an interaction because they are not in a hurry to get to class, whereas in the morning they might not be. It would also have been beneficial to create concrete definitions for some of the more abstract variables that we collected, for example, degree of affection. We could have sped up the data-tidying process if we had all collected the data and entered it into the computer in the same format.

If we had the opportunity to collect a wider variety of data, we could possibly answer other questions in the future, for example: What is the most popular place on campus for social interactions? Which races tend to interact at which locations on campus? Other variables that could be collected are class (freshman to senior) and type of interaction (casual, formal, etc.).