Summary

This data analysis report evaluates the responses to a survey that was administered to students (through Blackboard Learn) on the first day of the General Education Biology (BSC1005) course at Miami Dade College during the 2016-2 term (Fall 2016-2017 semester). After students completed the survey, which contained 8 questions (refer to Questions in the Survey below), the raw data were downloaded from Blackboard Learn into a local directory and saved as a Microsoft Excel^TM file. For questions with character (i.e. word) responses, namely questions 1, 7, and 8, the most frequent words as well as the total words per student response were determined. For the remaining questions, the proportion of each answer was determined.

Note: the code, raw data, and RMarkdown files associated with this report can be found in the accompanying GitHub repository.

Questions in the Survey
Load the Raw Dataset
Quick Look at the Dataset
Data Cleaning
Most Frequent Words
Students’ Technology Familiarity
Conclusion

Questions in the Survey

In class, students were briefly taught how to use Blackboard. Once they were familiar enough with the Blackboard platform, they were instructed to access the survey (see Figure 1 below). The students’ responses were anonymous to the professor.

Figure 1: Survey on Blackboard

Questions

What do you expect to learn in this course?
How do you consider your technology familiarity?
Which of the following technologies are you familiar with?
How you every used your smartphone, tablet, or Personal Computer/Mac computer to view the professor presentation on real-time (i.e. at the same time the professor is giving the lecture)?
How you every used your smartphone, tablet, or Personal Computer/Mac computer to submit a response, in real-time, to a question posted by the professor on the board or in a PowerPointTM presentation?
In years, how long ago was your last biology course in which you were enrolled, including high school? (Note: If less than one year ago, write 0)
Considering any course you may have taken, which way do you consider is the best way you learn in the classroom?
Considering any course you may have taken, how do you usually study?

Load the Raw Dataset

library(xlsx) # load xlsx package to handle Excel files

# set the working directory
setwd("C:/Users/Felix/Dropbox/Teaching/1_Miami-Dade College/Statistical assessments/BSC1005_2016-2_1852/survey1")

survey <- read.xlsx("BSC1005_1852_2016-2_survey1.xlsx", 1) # read the file with survey responses

Quick Look at the Dataset

The raw survey data, as extracted from Blackboard Learn, are shown below. The 35 observations in the output represent the 35 students who completed the survey. Although all variables were imported as factors, the class of each variable was adjusted as necessary in the subsequent data analysis steps.

library(plyr); library(dplyr) # load plyr and dplyr packages
glimpse(survey) # take a quick look at the variables classes

## Observations: 35
## Variables: 24
## $ Question.ID.1 <fctr> Question ID 1, Question ID 1, Question ID 1, Qu...
## $ Question.1    <fctr> What do you expect to learn in this course?, Wh...
## $ Answer.1      <fctr> I expect to learn biology,Â not for the profess...
## $ Question.ID.2 <fctr> Question ID 2, Question ID 2, Question ID 2, Qu...
## $ Question.2    <fctr> How do you consider your technology familiarity...
## $ Answer.2      <fctr> Advanced technologically, Mid-level technologic...
## $ Question.ID.3 <fctr> Question ID 3, Question ID 3, Question ID 3, Qu...
## $ Question.3    <fctr> Which of the following technologies are you fam...
## $ Answer.3      <fctr> Social media (e.g.&nbsp;Facebook, Twitter, Inst...
## $ Question.ID.4 <fctr> Question ID 4, Question ID 4, Question ID 4, Qu...
## $ Question.4    <fctr> How you every used your smartphone, tablet, or ...
## $ Answer.4      <fctr> No, Yes, Yes, No, Yes, Yes, Yes, No, Yes, No, <...
## $ Question.ID.5 <fctr> Question ID 5, Question ID 5, Question ID 5, Qu...
## $ Question.5    <fctr> <span style="color: #000000; font-family: 'Helv...
## $ Answer.5      <fctr> Yes, No, Yes, Yes, Yes, No, Yes, Yes, Yes, No, ...
## $ Question.ID.6 <fctr> Question ID 6, Question ID 6, Question ID 6, Qu...
## $ Question.6    <fctr> In years,&nbsp;how long ago was your last biolo...
## $ Answer.6      <fctr> 0, 4, 3, 2012, 3, 6, 3, 3 years ago, 3 years ag...
## $ Question.ID.7 <fctr> Question ID 7, Question ID 7, Question ID 7, Qu...
## $ Question.7    <fctr> Considering any course you may have taken,&nbsp...
## $ Answer.7      <fctr> I learn best when the professor is passionate a...
## $ Question.ID.8 <fctr> Question ID 8, Question ID 8, Question ID 8, Qu...
## $ Question.8    <fctr> Considering any course you may have taken,&nbsp...
## $ Answer.8      <fctr> Yes I do,Â if not I wont understandÂ , Memorizi...

Data Cleaning

The dataset was cleaned as follows: 1) create a separate one-column data frame for each column that holds the answers to a question, and 2) convert the text (i.e. word) answers from factor to character.

survey_answers <- select(survey, starts_with("Ans")) # select all columns starting with "Ans"

# assign a one-column data frame for each individual answer column
survey_ans1 <- select(survey_answers, Answer.1)
survey_ans2 <- select(survey_answers, Answer.2)
survey_ans3 <- select(survey_answers, Answer.3)
survey_ans4 <- select(survey_answers, Answer.4)
survey_ans5 <- select(survey_answers, Answer.5)
survey_ans6 <- select(survey_answers, Answer.6)
survey_ans7 <- select(survey_answers, Answer.7)
survey_ans8 <- select(survey_answers, Answer.8)

#---- most frequent words in question 1, 7 and 8--------
# make sure that the variables with written answers have the strings as characters 
survey_ans1 %>% mutate_if(is.factor, as.character) -> survey_ans1
survey_ans7 %>% mutate_if(is.factor, as.character) -> survey_ans7
survey_ans8 %>% mutate_if(is.factor, as.character) -> survey_ans8
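
The factor-to-character conversion above can be checked on a small example. The following is a minimal base-R sketch, not the report's own code: the data frame `toy` and its values are made up for illustration, and the conversion is written without dplyr so that it runs on its own.

```r
# Toy stand-in for two answer columns (values are made up for illustration)
toy <- data.frame(
  Answer.1 = c("I expect to learn biology", "The basics of life"),
  Answer.4 = c("Yes", "No"),
  stringsAsFactors = TRUE
)

# Base-R equivalent of mutate_if(is.factor, as.character)
is_fct <- vapply(toy, is.factor, logical(1))
toy_chr <- toy
toy_chr[is_fct] <- lapply(toy_chr[is_fct], as.character)

sapply(toy, class)      # classes before the conversion
sapply(toy_chr, class)  # classes after the conversion
```

Checking the classes before and after is a quick way to confirm that the text columns are ready for string functions such as tolower().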

Most Frequent Words

Although Blackboard Learn does provide summary statistics for assessment questions, including surveys, the objectivity of its analysis for open-ended questions is limited. For this reason, the most frequent words among questions with text (i.e. word) responses, namely questions 1, 7, and 8, were determined. Nevertheless, I did read each individual response from the students to get an initial look at what they had written in these three questions.
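
The idea behind a most-frequent-words table can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the responses and the short stop-word list are made up, whereas the report itself relies on the tm and qdap packages and the SMART stop-word list.

```r
# Made-up responses standing in for the students' answers
responses <- c("I expect to learn biology.",
               "Learn the basics of biology",
               "Biology and life")

# A tiny, hypothetical stop-word list (the report uses tm's SMART list instead)
stop_words <- c("i", "to", "the", "of", "and")

clean <- tolower(responses)                 # lower case
clean <- gsub("[[:punct:]]", " ", clean)    # punctuation -> space
words <- unlist(strsplit(clean, "\\s+"))    # split on runs of whitespace
words <- words[nzchar(words) & !words %in% stop_words]

# Tabulate the remaining words and sort from most to least frequent
freq <- sort(table(words), decreasing = TRUE)
head(freq, 3)
```

The steps mirror the pipeline used for each question below: normalize case, strip punctuation, drop common words, and count what is left.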

Question 1

Table 1: Most common words among students’ course expectations

# most freq words in answer 1----------------------------
library(qdap) # qualitative data analysis package (it masks %>%)
library(tm) # framework for text mining; it loads NLP package
library(Rgraphviz) # depict the terms within the tm package framework
library(SnowballC); library(RWeka); library(rJava); library(RWekajars)  # wordStem is masked from SnowballC
library(Rstem) 
library(stringr)

survey_a1 <- tolower(survey_ans1$Answer.1) # make all characters lower case
survey_a1 <- tm::removeNumbers(survey_a1) # remove numbers from the strings
survey_a1 <- str_replace_all(survey_a1, "  ", " ") # replace double spaces with a single space
survey_a1 <- str_replace_all(survey_a1, pattern = "[[:punct:]]", " ") # replace punctuation with spaces

survey_a1 <- tm::removeWords(x = survey_a1, stopwords(kind = "SMART")) # remove common words

corpus <- Corpus(VectorSource(survey_a1)) # turn into corpus

tdm <- TermDocumentMatrix(corpus) # create tdm from the corpus

answer_1 <- freq_terms(text.var = survey_a1, top = 20) # find the 20 most frequent words

library(pander) # load package to output a nice table
pander(head(answer_1, 15)) # print the top 15 most common words

	WORD	FREQ
19	biology	21
63	learn	17
46	expect	15
1	â	10
20	biologyâ	5
65	life	5
13	basic	4
15	basics	4
53	high	3
62	knowledge	3
98	things	3
8	animals	2
10	apply	2
30	concepts	2
64	learned	2

From Table 1 above, biology, learn, and expect were the three most common words. This was expected, as learn and expect are two keywords in question 1, and biology is a keyword in the course’s name (i.e. General Education Biology). Other important words to note were life, basic, knowledge, animals, apply, and concepts. This suggests that students expect to learn basic knowledge about life, including animals, and to learn to apply concepts.

Question 7

Table 2: Most common words among students’ responses on how they learn in the classroom

# most frequent words in answer 7---------------------------
survey_a7 <- tolower(survey_ans7$Answer.7) # all characters to lower case
survey_a7 <- tm::removeNumbers(survey_a7) # remove numbers
survey_a7 <- str_replace_all(survey_a7, "  ", " ") # replace double spaces with a single space
survey_a7 <- str_replace_all(survey_a7, pattern = "[[:punct:]]", " ") # replace punctuation with spaces

survey_a7 <- tm::removeWords(x = survey_a7, stopwords(kind = "SMART"))

corpus1 <- Corpus(VectorSource(survey_a7)) # turn into corpus

tdm1 <- TermDocumentMatrix(corpus1) # create tdm from the corpus

answer_7 <- freq_terms(text.var = survey_a7, top = 20) # find the 20 most frequent words

pander(head(answer_7, 15)) # print top 15 common words

	WORD	FREQ
1	â	17
49	learn	10
39	hands	5
75	professor	5
96	teacher	5
108	visual	5
113	writing	5
51	learning	4
52	lecture	4
61	notes	4
78	questions	4
28	examples	3
53	lectures	3
70	powerpoints	3
105	understand	3

From Table 2 above, the most common word was learn. As with question 1, this was expected because learn was among the keywords in the question. The letter a with the special character (â) was not considered a word because it may have represented an accented character, which could have resulted from auto-correction on the students’ smartphones. Nevertheless, the remaining words show the wide variety of ways in which students consider they learn best in the classroom.
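
Another possible source of the stray â token, offered here as an assumption rather than something the report confirms, is an encoding mismatch: the raw output shows "Â" next to non-breaking spaces (`&nbsp;`), a pattern that typically appears when UTF-8 text is read with a single-byte encoding. If that is the cause, the two-character artifact could be stripped before counting words. The sketch below uses a made-up answer and base R only.

```r
# Made-up answer containing the "Â" + non-breaking-space pair seen in the raw data
raw <- "I expect to learn biology,\u00c2\u00a0not just facts"

# Replace the two-character artifact with a plain space
repaired <- gsub("\u00c2\u00a0", " ", raw, fixed = TRUE)

# Collapse any repeated spaces left behind
repaired <- gsub(" +", " ", repaired)

repaired
```

Running a step like this before tolower() would keep the artifact from being lower-cased into â and counted as a word.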

Question 8

Table 3: Most common words among students’ study habits

# most frequent words in answer 8--------------------------
survey_a8 <- tolower(survey_ans8$Answer.8) # all characters to lower case
survey_a8 <- tm::removeNumbers(survey_a8) # remove numbers
survey_a8 <- str_replace_all(survey_a8, "  ", " ") # replace double spaces with a single space
survey_a8 <- str_replace_all(survey_a8, pattern = "[[:punct:]]", " ") # replace punctuation with spaces

survey_a8 <- tm::removeWords(x = survey_a8, stopwords(kind = "SMART")) # remove common words

corpus2 <- Corpus(VectorSource(survey_a8)) # turn into corpus

tdm2 <- TermDocumentMatrix(corpus2) # create tdm from the corpus

answer_8 <- freq_terms(text.var = survey_a8, top = 15) # find the 15 most frequent words

pander(head(answer_8, 15))

	WORD	FREQ
67	study	16
41	notes	9
1	â	6
77	unanswered	5
73	textbook	3
9	class	2
10	classâ	2
12	computer	2
27	important	2
32	learn	2
34	lecture	2
46	points	2
48	power	2
49	powerpoint	2
56	read	2

As in Tables 1 and 2, a keyword of the question, study, was the most common word among the students’ responses to question 8. Notes and textbook were the next two most common meaningful words, followed by numerous words with the same frequency. This suggests that students prefer to take notes, refer to the textbook, and revisit lectures (lecture, points, power, powerpoint, and read were among the common words related to lectures). Nevertheless, the many words with similarly low frequencies also suggest that study habits differ among the students.

Distribution of Words in Questions 1, 7, and 8 per Student

Having identified frequent words related to students’ expectations of the course, how they associate learning in the classroom, and the approaches they reported employing when studying, I wanted to address how students differ with regard to the number of words they wrote in their responses.

#--what is the distribution of words----------
# For each question with text answers: 1) count the words, 2) make a data frame,
# 3) rename the variable, and 4) remove NA values

# question 1
library(stringr)
total_words_ans1 <- as.data.frame(str_count(survey_ans1$Answer.1, '\\s+')+1)
names(total_words_ans1) <- "words1"
total_words_ans1 <- na.omit(total_words_ans1) # drop rows with NA

# question 7
total_words_ans7 <- as.data.frame(str_count(survey_ans7$Answer.7, '\\s+')+1)
names(total_words_ans7) <- "words7"
total_words_ans7 <- na.omit(total_words_ans7) # drop rows with NA

# question 8
total_words_ans8 <- as.data.frame(str_count(survey_ans8$Answer.8, '\\s+')+1)
names(total_words_ans8) <- "words8"
total_words_ans8 <- na.omit(total_words_ans8) # drop rows with NA

# histograms for total words in answer #1, #7, and #8
opar <- par(no.readonly=TRUE) # save the original settings

par(mfrow=c(1,3))
hist(total_words_ans1$words1, main="Question 1", xlab="Total # of Words", col="blue")
hist(total_words_ans7$words7, main="Question 7", xlab="Total # of Words", col="orange")
hist(total_words_ans8$words8, main="Question 8", xlab="Total # of Words", col="green")

par(opar) # reset to the original settings

Figure 2: Distribution of total words per student response for questions 1 (blue histogram), 7 (orange histogram), and 8 (green histogram).

A similar pattern is observed in the distribution of total words per student response in all three questions. For question 1, most of the answers contain 15 words or fewer, while for questions 7 and 8 most contain 10 words or fewer. If all three questions are combined, only 19 of 105 (18%) responses contained more than 20 words.
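
The word counts behind Figure 2 come from counting runs of whitespace and adding one. The following base-R sketch illustrates the same idea on made-up answers (the vector `answers` is hypothetical); it also reproduces the 19-of-105 proportion quoted above.

```r
# Made-up responses; NA stands for a missing answer
answers <- c("I study my notes", "Flash cards", NA, "reading")

# Split on runs of whitespace and count the pieces, keeping NA as NA.
# For answers without leading or trailing spaces this equals
# (number of whitespace runs) + 1.
n_words <- ifelse(is.na(answers),
                  NA_integer_,
                  lengths(strsplit(trimws(answers), "\\s+")))
n_words

# Share of responses longer than 20 words, using the counts reported in the text
pct_long <- round(100 * 19 / 105)
pct_long
```

Note that this approach treats any whitespace-separated token as a word, so stray symbols or encoding artifacts are counted as words too.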

Table 4: Summary statistics for the total number of words per students’ responses to questions 1, 7, and 8

# 1) create a list of the answer data frames, 2) rbind data frames of different lengths
# and 3) rename the variables
total_words_list <- list(total_words_ans1, total_words_ans7, total_words_ans8)
total_words <- do.call(rbind.fill, total_words_list)
names(total_words) <- c("q1", "q7", "q8")


pander(summary(total_words)) # generate summary statistics

q1	q7	q8
Min. : 2.0	Min. : 1.00	Min. : 1.000
1st Qu.: 6.5	1st Qu.: 5.00	1st Qu.: 4.000
Median :12.0	Median :10.00	Median : 7.000
Mean :14.2	Mean :12.71	Mean : 8.171
3rd Qu.:19.0	3rd Qu.:16.50	3rd Qu.:11.500
Max. :43.0	Max. :42.00	Max. :22.000
NA’s :70	NA’s :70	NA’s :70

All descriptive statistics were lower for question 8 than for question 1. The means for questions 1, 7, and 8 were 14.2, 12.7, and 8.2 words, respectively. A similar pattern was observed with the median (12, 10, and 7 words for questions 1, 7, and 8, respectively). It is worthwhile evaluating whether this is an effect of the questions being online or an artifact of the questions themselves. It is important to note that all questions required students’ opinions and self-reflections, but questions 7 and 8 were related.
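
The 70 NA’s per column in Table 4 appear to follow from how the three one-column data frames are stacked: each has a different column name, so each of the 105 stacked rows holds a value in one column and NA in the other two (35 values and 70 NA’s per column). The sketch below reproduces this layout in base R with made-up counts for three students per question; the padding is written out by hand rather than with rbind.fill so that the example has no package dependencies.

```r
# Made-up word counts for three questions, three students each
w1 <- c(12, 6, 20)
w7 <- c(10, 5, 16)
w8 <- c(7, 4, 11)

# Stack the columns the way a fill-style row bind would: one value per row,
# NA in the columns that belong to the other questions
stacked <- data.frame(
  q1 = c(w1, rep(NA, 6)),
  q7 = c(rep(NA, 3), w7, rep(NA, 3)),
  q8 = c(rep(NA, 6), w8)
)

colSums(is.na(stacked))                # NA count per column (2 x 3 students)
sapply(stacked, median, na.rm = TRUE)  # medians use non-missing values only
```

Because summary statistics drop the missing values, the padding does not distort the reported minimum, quartiles, mean, or maximum.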

library(tidyr) # load tidyr to generate a tidy dataset

# gather the variables and generate a tidy dataset
total_words_long <- gather(total_words, question, words, q1:q8) 

# compare the distribution of total words per student per answer
boxplot(words ~ question, total_words_long, main="Comparison of Words Distributions",
        xlab="Questions Number", ylab="Frequency of Total Words", col=c("blue", "orange", "green"))

Figure 3: Boxplot of total words per student response for questions 1 (blue boxplot), 7 (orange boxplot), and 8 (green boxplot). This graph further shows the decrease in total words per student response from question 1 to question 8.

Students’ Technology Familiarity

Question 2

Table 5: Percentage of students by self-reported technology familiarity

#---question 2--------------------------------
# rename the levels in answers from question #2
levels(survey_ans2$Answer.2) <- c("Advanced Tech", "Low-Level Tech", "Mid-Level Tech")
options(digits=3)
pander(prop.table(table(survey_ans2))*100)

Advanced Tech	Low-Level Tech	Mid-Level Tech
57.14	5.714	37.14

As reported in Table 5 above, most students (about 94%) reported being familiar with technology at a mid or advanced level, with more than half of the class (57%) reporting advanced technology familiarity. It is important to mention that, although there were only three choices, students were not provided with details of the categories.
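
The percentages in Table 5 can be reproduced from counts. The counts used below (20, 2, and 13 out of 35) are inferred from the reported percentages and the number of respondents; they are an assumption and were not read from the raw data.

```r
# Counts inferred from the reported percentages and n = 35 (an assumption,
# not taken from the raw data)
answers <- factor(rep(c("Advanced Tech", "Low-Level Tech", "Mid-Level Tech"),
                      times = c(20, 2, 13)))

# Convert counts to percentages of all respondents
pct <- prop.table(table(answers)) * 100
round(pct, 2)

# Advanced and mid-level familiarity combined
sum(pct[c("Advanced Tech", "Mid-Level Tech")])
```

The combined share of advanced and mid-level answers is 33 of 35 students, which is the basis for the "about 94%" figure.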

Question 3

Table 6: Percentage of students with familiarity to specific technology platforms

# rename the levels in answers from question #3
levels(survey_ans3$Answer.3) <- c("Unanswered","LMS", "Social Media and Email", "Social Media and Email", 
                    "Social Media and Email", "Social Media and Email")
options(digits=3)
pander(prop.table(table(survey_ans3))*100)

Unanswered	LMS	Social Media and Email
2.857	2.857	94.29

Most of the students (94%) reported being familiar with commonly used platforms, such as social media (e.g. Facebook, Twitter, Instagram, Snapchat) and email. In contrast, very few reported familiarity with Learning Management Systems, such as Blackboard. In this question, students could select more than one choice from the following: a) social media (e.g. Facebook, Twitter, Instagram, Snapchat, etc.), b) cloud storage (e.g. Dropbox^TM, GoogleDrive^TM, OneDrive^TM, iCloud^TM, etc.), c) Learning Management Systems (e.g. Blackboard^TM, Schoology^TM), and d) email (e.g. Gmail, mymdc.net, or any other email service).

Question 4

Table 7: Percentage of students who have viewed a professor’s presentation in real time on their own device

#---question 4--------------------------------
# rename the levels in answers from question #4
levels(survey_ans4$Answer.4) <- c("Unanswered", "No", "Yes")
options(digits=3)
pander(prop.table(table(survey_ans4))*100)

Unanswered	No	Yes
2.857	54.29	42.86

The majority of the class (54%) reported not having experience with a platform on which they could view the lecture remotely on their own devices (smartphones, tablets, PCs, or Macs) while the professor is lecturing. Based on this finding, a platform with these capabilities was launched in the classroom, but it was shortly discontinued because it did not perform as expected.

Question 5

Table 8: Percentage of students who have submitted real-time responses to a professor’s question from their own device

#---question 5--------------------------------
# rename the levels in answers from question #5
levels(survey_ans5$Answer.5) <- c("Unanswered", "No", "Yes")
options(digits=3)
pander(prop.table(table(survey_ans5))*100)

Unanswered	No	Yes
2.857	40	57.14

The proportion of students with experience using remote response applications, such as iClicker^TM (https://www1.iclicker.com/) or Reef-Polling^TM (https://www1.iclicker.com/products/reef-polling/), was the opposite of that in Table 7: more than half of the class (57%) reported having experience with these or similar types of applications. As a result, an application of this type was implemented as the substitute for the discontinued platform discussed under Table 7.

Question 6

#---question 6--------------------------------
# rename the levels in answers from question #6
levels(survey_ans6$Answer.6) <- c("Unanswered","0", "1", "2", "2-3", "2", "5", "3", "3",
                                "39", "3", "4", "4", "4", "funny number", 
                                "6", "0")
survey_ans6$Answer.6 <- factor(survey_ans6$Answer.6, order=TRUE)

options(digits=3)
pander(prop.table(table(survey_ans6))*100)

Unanswered	0	1	2	2-3	5	3	39	4	funny number	6
2.857	11.43	2.857	22.86	2.857	2.857	25.71	2.857	17.14	2.857	5.714

Recent exposure to biology education varied within the classroom, with most students reporting 0 to 5 years since their last biology or science course.
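
Because question 6 was answered as free text, the same quantity arrives in several forms (for example "3", "3 years ago", or a calendar year such as "2012"). The sketch below shows one hedged way such answers could be normalized to a number of years. It is not the approach used in the report, which recodes the factor levels by hand; the input values are made up, and both the rule that a four-digit value is a calendar year counted back from 2016 and the choice to keep the lower bound of a range such as "2-3" are assumptions.

```r
# Made-up free-text answers similar in form to those in Answer.6
raw <- c("0", "4", "2012", "3 years ago", "2-3", "a while", NA)

# Pull out the first run of digits; answers without digits become NA
first_num <- ifelse(grepl("[0-9]", raw),
                    sub("^[^0-9]*([0-9]+).*$", "\\1", raw),
                    NA)
first_num <- as.numeric(first_num)

# Assumption: a four-digit value is a calendar year, counted back from 2016
years_ago <- ifelse(!is.na(first_num) & first_num > 1900,
                    2016 - first_num, first_num)
years_ago
```

A numeric column of this kind would allow a summary or histogram of years since the last biology course instead of a table of many small categories.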

Conclusion

The data analysis report presented here evaluated the students’ responses to a survey administered on the first day of class. This survey addressed 1) the students’ expectations of the course, 2) their familiarity with technology, 3) their previous biology or science exposure, 4) how they learn in the classroom, and 5) their study habits. Questions 1, 7, and 8, through an open-ended format, assessed students’ expectations of the course, how they learn in the classroom, and their study habits.

Gaining basic biological knowledge and being able to apply concepts seem to be among the students’ expectations for the course. Although biology, learn, and expect were the most common words in question 1 (they are keywords), life, basic, knowledge, animals, apply, and concepts were also frequent words extracted through text mining.

As expected, students reported different preferred learning methods in and outside the classroom. As professors, we must take into consideration the variety of learning approaches students bring into the classroom and, for this reason, present the topics to be discussed in different formats. The benefit is not only that students become comfortable with a specific format of information but also that they are encouraged to step outside their learning comfort zones.

Before implementing a technological approach in the classroom, it is worthwhile to evaluate students’ prior experience with the given or a similar technology. For example, just over half of the students (54%) reported not having experience with applications that would let them view the lecture, in Microsoft PowerPoint^TM, on their mobile device in real time. In contrast, most students (57%) did report having experience with remote-response systems. Although the former option was initially attempted in the classroom, the latter option was implemented successfully because the majority of the students had seen or used similar applications before. If the current application fails to perform, social media and email can be implemented as an alternative, given that the vast majority (94%) reported being familiar with these platforms.

Lastly, as with learning approaches and study habits, the class reported a considerably wide range of years since the last biology or science course. As with learning approaches in the classroom, the students’ prior knowledge must be considered in prospective course plans and any needed adjustments.

In conclusion, carefully designed questions in brief surveys may serve as a tool to get to know the students: what they look to gain from the course, which technologies they are familiar with and which we, as professors, seek to implement, and where their biology experience stands.

Data Analysis on a Survey Administered to Students on the 1st Day of Class in a General Education Biology Course

Felix E. Rivera-Mariani, PhD

January 14, 2017