This data analysis report evaluates the responses to a survey that was administered to students (through Blackboard Learn) on the first day of the General Education Biology (BSC1005) course at Miami Dade College during the 2016-2 term (Fall 2016-2017 semester). After students completed the survey, which contains 8 questions (refer to Questions in the Survey below), the raw data were downloaded from Blackboard Learn into a local directory. The file with the raw data was saved as a Microsoft Excel™ file. For questions with character (i.e. word) responses, namely questions 1, 7, and 8, the most frequent words as well as the total words per student's response were determined. For the remaining questions, the proportion of each answer was determined.
Note: the code, raw data, and RMarkdown files associated with this report can be found in the GitHub repository located here.
In class, students were briefly taught how to use Blackboard. Once they were familiar enough with the Blackboard platform, students were instructed to access the survey (see Figure 1 below). The students' responses were anonymous to the professor.
Figure 1: Survey on Blackboard
Questions
library(xlsx) # load xlsx package to handle Excel files
# set the working directory
setwd("C:/Users/Felix/Dropbox/Teaching/1_Miami-Dade College/Statistical assessments/BSC1005_2016-2_1852/survey1")
survey <- read.xlsx("BSC1005_1852_2016-2_survey1.xlsx", 1) # read the file with survey responses
The raw survey data, as extracted from Blackboard Learn, are shown below. The 35 observations shown in the output below represent the 35 students who completed the survey. Although all variables were loaded as factors, the class of each variable was adjusted as necessary in the subsequent data analysis steps.
library(plyr); library(dplyr) # load plyr and dplyr packages
glimpse(survey) # take a quick look at the variables classes
## Observations: 35
## Variables: 24
## $ Question.ID.1 <fctr> Question ID 1, Question ID 1, Question ID 1, Qu...
## $ Question.1 <fctr> What do you expect to learn in this course?, Wh...
## $ Answer.1 <fctr> I expect to learn biology, not for the profess...
## $ Question.ID.2 <fctr> Question ID 2, Question ID 2, Question ID 2, Qu...
## $ Question.2 <fctr> How do you consider your technology familiarity...
## $ Answer.2 <fctr> Advanced technologically, Mid-level technologic...
## $ Question.ID.3 <fctr> Question ID 3, Question ID 3, Question ID 3, Qu...
## $ Question.3 <fctr> Which of the following technologies are you fam...
## $ Answer.3 <fctr> Social media (e.g. Facebook, Twitter, Inst...
## $ Question.ID.4 <fctr> Question ID 4, Question ID 4, Question ID 4, Qu...
## $ Question.4 <fctr> How you every used your smartphone, tablet, or ...
## $ Answer.4 <fctr> No, Yes, Yes, No, Yes, Yes, Yes, No, Yes, No, <...
## $ Question.ID.5 <fctr> Question ID 5, Question ID 5, Question ID 5, Qu...
## $ Question.5 <fctr> <span style="color: #000000; font-family: 'Helv...
## $ Answer.5 <fctr> Yes, No, Yes, Yes, Yes, No, Yes, Yes, Yes, No, ...
## $ Question.ID.6 <fctr> Question ID 6, Question ID 6, Question ID 6, Qu...
## $ Question.6 <fctr> In years, how long ago was your last biolo...
## $ Answer.6 <fctr> 0, 4, 3, 2012, 3, 6, 3, 3 years ago, 3 years ag...
## $ Question.ID.7 <fctr> Question ID 7, Question ID 7, Question ID 7, Qu...
## $ Question.7 <fctr> Considering any course you may have taken, ...
## $ Answer.7 <fctr> I learn best when the professor is passionate a...
## $ Question.ID.8 <fctr> Question ID 8, Question ID 8, Question ID 8, Qu...
## $ Question.8 <fctr> Considering any course you may have taken, ...
## $ Answer.8 <fctr> Yes I do, if not I wont understand , Memorizi...
The dataset was cleaned as follows: 1) create a separate object for each column that holds the answers to a question, and 2) convert the columns with text (i.e. word) answers from factor to character.
survey_answers <- select(survey, starts_with("Ans")) # select all columns starting with "Ans"
# create a one-column data frame for each individual answer column
survey_ans1 <- select(survey_answers, Answer.1)
survey_ans2 <- select(survey_answers, Answer.2)
survey_ans3 <- select(survey_answers, Answer.3)
survey_ans4 <- select(survey_answers, Answer.4)
survey_ans5 <- select(survey_answers, Answer.5)
survey_ans6 <- select(survey_answers, Answer.6)
survey_ans7 <- select(survey_answers, Answer.7)
survey_ans8 <- select(survey_answers, Answer.8)
#---- most frequent words in question 1, 7 and 8--------
# make sure that the variables with written answers have the strings as characters
survey_ans1 %>% mutate_if(is.factor, as.character) -> survey_ans1
survey_ans7 %>% mutate_if(is.factor, as.character) -> survey_ans7
survey_ans8 %>% mutate_if(is.factor, as.character) -> survey_ans8
Although Blackboard Learn does provide summary statistics for assessment questions, including surveys, the objectivity of its analysis for open-ended questions is limited. For this reason, the most frequent words among the questions with text (i.e. word) responses, namely questions 1, 7, and 8, were determined. Nevertheless, I did read each individual response from the students to get an initial look at what they had written for these three questions.
Table 1: Most common words among students’ course expectations
# most freq words in answer 1----------------------------
library(qdap) # qualitative data analysis package (it masks %>%)
library(tm) # framework for text mining; it loads NLP package
library(Rgraphviz) # depict the terms within the tm package framework
library(SnowballC); library(RWeka); library(rJava); library(RWekajars) # wordStem is masked from SnowballC
library(Rstem)
library(stringr)
survey_a1 <- tolower(survey_ans1) # make all characters lower case
survey_a1 <- tm::removeNumbers(survey_a1) # remove numbers from the string
survey_a1 <- str_replace_all(survey_a1, "\\s{2,}", " ") # collapse runs of whitespace into a single space
survey_a1 <- str_replace_all(survey_a1, pattern = "[[:punct:]]", " ") # replace punctuation with a space
survey_a1 <- tm::removeWords(x = survey_a1, stopwords(kind = "SMART")) # remove common words
corpus <- Corpus(VectorSource(survey_a1)) # turn into corpus
tdm <- TermDocumentMatrix(corpus) # create tdm from the corpus
answer_1 <- freq_terms(text.var = survey_a1, top = 20) # find the 20 most frequent words
library(pander) # load package to output a nice table
pander(head(answer_1, 15)) # print the top 15 most common words
| | WORD | FREQ |
|---|---|---|
| 19 | biology | 21 |
| 63 | learn | 17 |
| 46 | expect | 15 |
| 1 | â | 10 |
| 20 | biologyâ | 5 |
| 65 | life | 5 |
| 13 | basic | 4 |
| 15 | basics | 4 |
| 53 | high | 3 |
| 62 | knowledge | 3 |
| 98 | things | 3 |
| 8 | animals | 2 |
| 10 | apply | 2 |
| 30 | concepts | 2 |
| 64 | learned | 2 |
From Table 1 above, biology, learn, and expect were the three most common words. This is to be expected, as learn and expect are two keywords in question 1, and biology is a keyword in the course's name (i.e. General Education Biology). Other words worth noting were life, basic, knowledge, animals, apply, and concepts. This suggests that students expect to learn basic knowledge about life, including animals, and to learn to apply concepts.
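The counting idea behind Table 1 can be illustrated without any text-mining packages. The sketch below is not how `qdap::freq_terms` is implemented; it is a minimal base-R approximation of the same steps (lower-casing, stripping punctuation and digits, dropping stop words, tabulating). The function name `word_freq`, the short stop-word list, and the toy responses are all assumptions made for illustration and are not part of the survey data.

```r
# Minimal base-R sketch of a word-frequency count (hypothetical helper).
word_freq <- function(responses,
                      stop_words = c("i", "to", "the", "a", "and", "of", "in")) {
  text  <- tolower(responses)                        # normalize case
  text  <- gsub("[[:punct:][:digit:]]+", " ", text)  # drop punctuation and digits
  words <- unlist(strsplit(text, "\\s+"))            # split on whitespace
  words <- words[nzchar(words) & !(words %in% stop_words)]
  counts <- sort(table(words), decreasing = TRUE)    # most frequent first
  data.frame(WORD = names(counts),
             FREQ = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Toy responses (invented, not taken from the survey)
toy <- c("I expect to learn biology.",
         "Learn the basics of biology",
         "Biology and life")
freq <- word_freq(toy)
print(head(freq, 3))
```

Because the stop-word list here is tiny, a real analysis would still rely on a fuller list such as the SMART list used in the report.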
Table 2: Most common words among students' responses on how they learn in the classroom
# most frequent words in answer 7---------------------------
survey_a7 <- tolower(survey_ans7) # all characters to lower case
survey_a7 <- tm::removeNumbers(survey_a7) # remove numbers
survey_a7 <- str_replace_all(survey_a7, "\\s{2,}", " ") # collapse runs of whitespace into a single space
survey_a7 <- str_replace_all(survey_a7, pattern = "[[:punct:]]", " ") # replace punctuation with a space
survey_a7 <- tm::removeWords(x = survey_a7, stopwords(kind = "SMART")) # remove common words
corpus1 <- Corpus(VectorSource(survey_a7)) # turn into corpus
tdm1 <- TermDocumentMatrix(corpus1) # create tdm from the corpus
answer_7 <- freq_terms(text.var = survey_a7, top = 20) # find the 20 most frequent words
pander(head(answer_7, 15)) # print top 15 common words
| | WORD | FREQ |
|---|---|---|
| 1 | â | 17 |
| 49 | learn | 10 |
| 39 | hands | 5 |
| 75 | professor | 5 |
| 96 | teacher | 5 |
| 108 | visual | 5 |
| 113 | writing | 5 |
| 51 | learning | 4 |
| 52 | lecture | 4 |
| 61 | notes | 4 |
| 78 | questions | 4 |
| 28 | examples | 3 |
| 53 | lectures | 3 |
| 70 | powerpoints | 3 |
| 105 | understand | 3 |
From Table 2 above, the most common word was learn. As with question 1, this was expected because learn is among the keywords in the question. The letter a with the special character (â) was not counted as a word because it may have represented an accented character, which could have resulted from auto-correction on the students' smartphones. Nevertheless, the remaining words show the wide variety of methods that students consider the best way to learn in the classroom.
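If the stray â does come from non-ASCII characters (for example curly apostrophes) in the exported responses, which the report does not confirm, one option is to strip such characters before counting words. The sketch below assumes the text is UTF-8 encoded; the helper name `strip_non_ascii` and the toy strings are hypothetical.

```r
# Hedged sketch: replace non-ASCII bytes with spaces, then tidy whitespace.
# Assumes the input strings are UTF-8 encoded.
strip_non_ascii <- function(x) {
  x <- iconv(x, from = "UTF-8", to = "ASCII", sub = " ")  # non-convertible bytes -> space
  trimws(gsub("\\s+", " ", x))                            # collapse and trim whitespace
}

# Toy strings (invented); \u2019 is a right single quotation mark
toy <- c("I don\u2019t memorize", "hands-on learning")
cleaned <- strip_non_ascii(toy)
print(cleaned)
```

Applied before the stop-word step, this would keep tokens such as biologyâ from being counted separately from biology, at the cost of also dropping any legitimate accented letters.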
Table 3: Most common words among students’ study habits
# most frequent words in answer 8--------------------------
survey_a8 <- tolower(survey_ans8) # all characters to lower case
survey_a8 <- tm::removeNumbers(survey_a8) # remove numbers
survey_a8 <- str_replace_all(survey_a8, "\\s{2,}", " ") # collapse runs of whitespace into a single space
survey_a8 <- str_replace_all(survey_a8, pattern = "[[:punct:]]", " ") # replace punctuation with a space
survey_a8 <- tm::removeWords(x = survey_a8, stopwords(kind = "SMART")) # remove common words
corpus2 <- Corpus(VectorSource(survey_a8)) # turn into corpus
tdm2 <- TermDocumentMatrix(corpus2) # create tdm from the corpus
answer_8 <- freq_terms(text.var = survey_a8, top = 15) # find the 15 most frequent words
pander(head(answer_8, 15))
| | WORD | FREQ |
|---|---|---|
| 67 | study | 16 |
| 41 | notes | 9 |
| 1 | â | 6 |
| 77 | unanswered | 5 |
| 73 | textbook | 3 |
| 9 | class | 2 |
| 10 | classâ | 2 |
| 12 | computer | 2 |
| 27 | important | 2 |
| 32 | learn | 2 |
| 34 | lecture | 2 |
| 46 | points | 2 |
| 48 | power | 2 |
| 49 | powerpoint | 2 |
| 56 | read | 2 |
As seen in Tables 1 and 2, a keyword of the question, study, was the most common word among the students' responses to question 8. Notes and textbook were the next most common content words, followed by numerous words with the same frequency. This suggests that students prefer to take notes, refer to the textbook, and revisit lectures (lecture, points, power, powerpoint, and read were among the common words related to lectures). Nevertheless, the many words with similarly low frequencies also suggest that study habits differ among the students.
Having identified frequent words related to students' expectations of the course, how they associate learning in the classroom, and the approaches they reported employing when studying, I wanted to address how students differ with regard to the number of words they wrote in their responses.
#--what is the distribution of words----------
# For each question with word answers: 1) count the words, 2) make a data frame,
# 3) change the variable name, and 4) remove NA values
# question 1
library(stringr)
total_words_ans1 <- as.data.frame(str_count(survey_ans1$Answer.1, '\\s+')+1)
names(total_words_ans1) <- "words1"
total_words_ans1$words1 <- na.omit(total_words_ans1$words1)
# question 7
total_words_ans7 <- as.data.frame(str_count(survey_ans7$Answer.7, '\\s+')+1)
names(total_words_ans7) <- "words7"
total_words_ans7$words7 <- na.omit(total_words_ans7$words7)
# question 8
total_words_ans8 <- as.data.frame(str_count(survey_ans8$Answer.8, '\\s+')+1)
names(total_words_ans8) <- "words8"
total_words_ans8$words8 <- na.omit(total_words_ans8$words8)
# histograms for total words in answer #1, #7, and #8
opar <- par(no.readonly=TRUE) # save the original settings
par(mfrow=c(1,3))
hist(total_words_ans1$words1, main="Question 1", xlab="Total # of Words", col="blue")
hist(total_words_ans7$words7, main="Question 7", xlab="Total # of Words", col="orange")
hist(total_words_ans8$words8, main="Question 8", xlab="Total # of Words", col="green")
par(opar) # reset to the original settings
Figure 2: Distribution of total words per student's response for questions 1 (blue histogram), 7 (orange histogram), and 8 (green histogram).
A similar pattern is observed in the distribution of total words per student's response in all three questions. For question 1, most of the answers contain 15 words or fewer, while most answers to questions 7 and 8 contain 10 words or fewer. If all three questions are combined, only 19 of 105 (18%) responses contained more than 20 words.
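The share of long responses quoted above is a simple proportion (19 of 105 is about 18.1%). A minimal sketch of how such a figure could be computed from a vector of word counts follows; the helper `share_above` and the toy counts are assumptions for illustration, not the survey values.

```r
# Hedged sketch: count and percentage of responses longer than a threshold.
share_above <- function(word_counts, threshold = 20) {
  word_counts <- word_counts[!is.na(word_counts)]  # ignore missing responses
  n_above <- sum(word_counts > threshold)
  c(n_above = n_above,
    n_total = length(word_counts),
    percent = round(100 * n_above / length(word_counts), 1))
}

# Toy word counts (invented)
toy_counts <- c(2, 7, 12, 21, 43, NA, 5, 25)
res <- share_above(toy_counts)
print(res)

# The proportion reported in the text, recomputed from its two counts
reported_pct <- round(100 * 19 / 105, 1)
print(reported_pct)
```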
Table 4: Summary statistics for the total number of words per students’ responses to questions 1, 7, and 8
# 1) create a list of the answers' data frames, 2) rbind data frames with different column names
# and 3) rename the variables names
total_words_list <- list(total_words_ans1, total_words_ans7, total_words_ans8)
total_words <- do.call(rbind.fill, total_words_list)
names(total_words) <- c("q1", "q7", "q8")
pander(summary(total_words)) # generate summary statistics
| q1 | q7 | q8 |
|---|---|---|
| Min. : 2.0 | Min. : 1.00 | Min. : 1.000 |
| 1st Qu.: 6.5 | 1st Qu.: 5.00 | 1st Qu.: 4.000 |
| Median :12.0 | Median :10.00 | Median : 7.000 |
| Mean :14.2 | Mean :12.71 | Mean : 8.171 |
| 3rd Qu.:19.0 | 3rd Qu.:16.50 | 3rd Qu.:11.500 |
| Max. :43.0 | Max. :42.00 | Max. :22.000 |
| NA’s :70 | NA’s :70 | NA’s :70 |
All descriptive statistics decreased from question 1 to question 8. The means for questions 1, 7, and 8 were 14.2, 12.7, and 8.2 words, respectively. A similar pattern was observed with the medians (12, 10, and 7 words for questions 1, 7, and 8, respectively). It would be worthwhile to evaluate whether this is an effect of the questions being online or an artifact of the questions themselves. It is important to note that all questions required students' opinions and self-reflections, but questions 7 and 8 were related.
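The 70 NA values per column in Table 4 follow from how the data frames were combined: `rbind.fill` stacks three 35-row data frames whose column names differ, giving 105 rows in which each column holds 35 values and 70 NAs. One alternative, sketched below under the assumption that a long layout is acceptable, is to build a question/words table directly so that no padding is needed. The toy vectors reuse only the minimum, median, and maximum from Table 4; they are not the full survey data.

```r
# Hedged sketch: build a long (tidy) table directly instead of padding with NA.
# Toy vectors: min, median, and max per question as reported in Table 4.
q1 <- c(2, 12, 43)
q7 <- c(1, 10, 42)
q8 <- c(1, 7, 22)

words_long <- data.frame(
  question = rep(c("q1", "q7", "q8"),
                 times = c(length(q1), length(q7), length(q8))),
  words    = c(q1, q7, q8)
)

# Per-question summaries without any NA padding
medians <- tapply(words_long$words, words_long$question, median)
print(medians)
```

A table in this shape can be passed straight to `boxplot(words ~ question, ...)` without a separate reshaping step.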
library(tidyr) # load tidyr to generate a tidy dataset
# gather the variables and generate a tidy dataset
total_words_long <- gather(total_words, question, words, q1:q8)
# compare the distribution of total words per student per answer
boxplot(words ~ question, total_words_long, main="Comparison of Words Distributions",
xlab="Questions Number", ylab="Frequency of Total Words", col=c("blue", "orange", "green"))
Figure 3: Boxplot of total words per student's response for questions 1 (blue boxplot), 7 (orange boxplot), and 8 (green boxplot). This graph further shows the decrease in total words per student response from question 1 to question 8.
Table 5: Percentage of students' self-reported technology familiarity
#---question 2--------------------------------
# rename the levels in answers from question #2
levels(survey_ans2$Answer.2) <- c("Advanced Tech", "Low-Level Tech", "Mid-Level Tech")
options(digits=3)
pander(prop.table(table(survey_ans2))*100)
| Advanced Tech | Low-Level Tech | Mid-Level Tech |
|---|---|---|
| 57.14 | 5.714 | 37.14 |
As reported in Table 5 above, most students (nearly 95%) reported being familiar with technology at a mid or advanced level, with more than half of the classroom (57%) reporting advanced technology familiarity. It is important to mention that, although there were only three choices, students were not provided with details of the categories.
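The percentages in Table 5 are consistent with counts of 20, 2, and 13 out of 35 respondents. Assigning labels with `levels(x) <- c(...)` depends on the alphabetical order of the existing levels, so a sketch of a more explicit mapping may be useful. The raw codes `advanced`, `low`, and `mid` below are hypothetical stand-ins for the Blackboard answer strings, which are truncated in the output above.

```r
# Hedged sketch: relabel answers with an explicit code-to-label mapping,
# then compute percentages. Raw codes are hypothetical placeholders.
labels_map <- c(advanced = "Advanced Tech",
                mid      = "Mid-Level Tech",
                low      = "Low-Level Tech")

# Counts implied by Table 5 and 35 respondents: 20, 2, and 13
raw <- rep(c("advanced", "low", "mid"), times = c(20, 2, 13))

familiarity <- factor(raw,
                      levels = names(labels_map),
                      labels = unname(labels_map))

pct <- round(prop.table(table(familiarity)) * 100, 2)
print(pct)
```

Naming both the level and its label keeps the mapping correct even if a new answer option changes the alphabetical order.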
Table 6: Percentage of students with familiarity to specific technology platforms
# rename the levels in answers from question #3
levels(survey_ans3$Answer.3) <- c("Unanswered","LMS", "Social Media and Email", "Social Media and Email",
"Social Media and Email", "Social Media and Email")
options(digits=3)
pander(prop.table(table(survey_ans3))*100)
| Unanswered | LMS | Social Media and Email |
|---|---|---|
| 2.857 | 2.857 | 94.29 |
Most of the students (94%) reported being familiar with commonly used platforms, such as social media (e.g. Facebook, Twitter, Instagram, Snapchat) and email. In contrast, very few reported familiarity with Learning Management Systems, such as Blackboard. In this question, students had the option to select more than one choice, and the choices were the following: a) Social media (e.g. Facebook, Twitter, Instagram, Snapchat, etc.), b) Cloud storage (e.g. Dropbox™, GoogleDrive™, OneDrive™, iCloud™, etc.), c) Learning Management Systems (e.g. Blackboard™, Schoology™), d) Email (e.g. Gmail, mymdc.net, or any other email service).
Table 7: Percentage of students with experience viewing a lecture remotely on a mobile device
#---question 4--------------------------------
# rename the levels in answers from question #4
levels(survey_ans4$Answer.4) <- c("Unanswered", "No", "Yes")
options(digits=3)
pander(prop.table(table(survey_ans4))*100)
| Unanswered | No | Yes |
|---|---|---|
| 2.857 | 54.29 | 42.86 |
The majority of the classroom reported not having experience with a platform on which they can view the lecture remotely on their own equipment (smartphones, tablets, PCs, or Macs) while the professor is lecturing. Based on this finding, a platform with these capabilities was launched in the classroom, but it was shortly discontinued because it did not perform as expected.
Table 8: Percentage of students with experience using remote-response applications
#---question 5--------------------------------
# rename the levels in answers from question #5
levels(survey_ans5$Answer.5) <- c("Unanswered", "No", "Yes")
options(digits=3)
pander(prop.table(table(survey_ans5))*100)
| Unanswered | No | Yes |
|---|---|---|
| 2.857 | 40 | 57.14 |
The proportion of students with experience using remote-response applications, such as iClicker™ (https://www1.iclicker.com/) or Reef-Polling™ (https://www1.iclicker.com/products/reef-polling/), was the opposite of that in Table 7: more than half of the classroom reported having experience with these or similar types of applications. As a result, an application of this type was implemented as the substitute for the discontinued platform discussed under Table 7.
#---question 6--------------------------------
# rename the levels in answers from question #6
levels(survey_ans6$Answer.6) <- c("Unanswered","0", "1", "2", "2-3", "2", "5", "3", "3",
"39", "3", "4", "4", "4", "funny number",
"6", "0")
survey_ans6$Answer.6 <- factor(survey_ans6$Answer.6, ordered=TRUE)
options(digits=3)
pander(prop.table(table(survey_ans6))*100)
| Unanswered | 0 | 1 | 2 | 2-3 | 5 | 3 | 39 | 4 | funny number | 6 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2.857 | 11.43 | 2.857 | 22.86 | 2.857 | 2.857 | 25.71 | 2.857 | 17.14 | 2.857 | 5.714 |
Recent exposure to biological education varied within the classroom, with most students reporting 0 to 5 years since their last biology or science course.
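Because question 6 accepted free text, answers such as "3 years ago" or "2012" had to be recoded by hand through the factor levels. A programmatic alternative is sketched below. It is not the recoding used in this report: the helper `years_since`, the rule that values of 1900 or more are calendar years, and the reference year 2016 (the survey term) are all assumptions, and ambiguous answers would still need manual review.

```r
# Hedged sketch: pull the first number out of a free-text answer and
# convert calendar years to "years ago" (assumed survey year: 2016).
years_since <- function(x, survey_year = 2016) {
  x <- as.character(x)
  x[is.na(x)] <- ""                        # treat missing answers as empty
  hit <- regexpr("[0-9]+", x)              # position of the first run of digits
  num <- rep(NA_real_, length(x))
  num[hit > 0] <- as.numeric(regmatches(x, hit))
  is_year <- !is.na(num) & num >= 1900     # assumption: these are calendar years
  num[is_year] <- survey_year - num[is_year]
  num
}

# Toy answers modeled on the formats visible in the raw output
toy <- c("0", "3 years ago", "2012", "", "2-3")
parsed <- years_since(toy)
print(parsed)
```

Ranges such as "2-3" are reduced to their first number here, which is a simplification rather than a recommendation.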
The data analysis report presented here evaluated the students' responses to a survey administered on the first day of class. This survey addressed 1) the students' expectations of the course, 2) their familiarity with technology, 3) their previous exposure to biology or science, 4) how they learn in the classroom, and 5) their study habits. Questions 1, 7, and 8, through open-ended approaches, assessed students' expectations of the course, how they learn in the classroom, and their study habits.
Gaining basic biological knowledge and being able to apply concepts seem to be among the students' expectations for the course. Although biology, learn, and expect were the most common words in question 1 (they are keywords), life, basic, knowledge, animals, apply, and concepts were also frequent words extracted through text mining.
As expected, students reported different preferred learning methods in and outside the classroom. As professors, we must take into consideration the variety of learning approaches students bring into the classroom and, for this reason, present the topics to be discussed in different formats. The benefit is not only for students to get comfortable with a specific format of information but also to encourage them to step outside their learning comfort zones.
Before implementing a technological approach in the classroom, it is worthwhile to evaluate students' prior experience with the given or similar technology. For example, most of the students reported not having experience with applications that would let them see the lecture, in Microsoft PowerPoint™, on their mobile device in real time. In contrast, most students did report having experience with remote-response systems. Although the former option was initially attempted in the classroom, the latter option was implemented successfully because the majority of the students had seen or used similar applications before. If the current application fails to perform, social media and email can be implemented as an alternative, given that the vast majority reported being familiar with these platforms.
Lastly, as with the different learning approaches and study habits, the classroom reported a considerably wide range of years since the last biology or science course. As with learning approaches in the classroom, the students' prior knowledge must be considered throughout prospective course plans and any needed adjustments.
In conclusion, carefully designed questions in brief surveys may serve as a tool to get to know the students: what they look to gain from the course, the technology they are familiar with and that we, as professors, seek to implement, and where their biology experience stands.