In this project, I explore the topics in a sample of tweets about the Next Generation Science Standards (NGSS) and the Common Core State Standards (CCSS). In particular, I am interested in how to choose the number of topics (the "k value") so that the topics can be interpreted meaningfully for a substantial amount of text data. I used both Latent Dirichlet Allocation (LDA) and the Structural Topic Model (STM) in this analysis.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.1     ✔ purrr   1.0.1
## ✔ tibble  3.1.8     ✔ dplyr   1.1.0
## ✔ tidyr   1.3.0     ✔ stringr 1.5.0
## ✔ readr   2.1.4     ✔ forcats 1.0.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(tidytext)
library(SnowballC)
library(topicmodels)
library(stm)
## stm v1.3.6 successfully loaded. See ?stm for help.
## Papers, resources, and other materials at structuraltopicmodel.com
library(ldatuning)
library(knitr)
library(LDAvis)
library(readxl)
In this analysis, I reused the NGSS and CCSS tweet samples from my previous analysis.
ngss_tweets <- read_xlsx("ngss_tweets.xlsx")
ccss_tweets <- read_xlsx("csss_tweets.xlsx")
# Keep English tweets only, with the user id, the text, and a label for the standard
ngss_aim <- ngss_tweets %>%
  filter(lang == "en") %>%
  select(user_id, text) %>%
  drop_na() %>%
  mutate(standard = "NGSS")
# dim(ngss_aim)  # 325 3
ccss_aim <- ccss_tweets %>%
  filter(lang == "en") %>%
  select(user_id, text) %>%
  drop_na() %>%
  mutate(standard = "CCSS")
# dim(ccss_aim)  # 1116 3
Before the topic modeling analysis, I first tokenized the text as shown below:
# One word per row, without stop words and common URL/HTML leftovers
ngss_tidy <- ngss_aim %>%
  unnest_tokens(output = word, input = text) %>%
  anti_join(stop_words, by = "word") %>%
  filter(!word %in% c("amp", "https", "t.co"))
ngss_tidy %>%
  count(word, sort = TRUE)
## # A tibble: 2,554 × 2
## word n
## <chr> <int>
## 1 ngss 217
## 2 science 125
## 3 ngsschat 70
## 4 students 56
## 5 standards 40
## 6 stem 38
## 7 teachers 38
## 8 learning 37
## 9 ngss_tweeps 28
## 10 school 24
## # … with 2,544 more rows
ccss_tidy <- ccss_aim %>%
  unnest_tokens(output = word, input = text) %>%
  anti_join(stop_words, by = "word") %>%
  filter(!word %in% c("amp", "https", "t.co"))
ccss_tidy %>%
  count(word, sort = TRUE)
## # A tibble: 5,422 × 2
## word n
## <chr> <int>
## 1 common 1104
## 2 core 1100
## 3 math 444
## 4 school 104
## 5 kids 103
## 6 grade 92
## 7 education 90
## 8 schools 85
## 9 students 85
## 10 teachers 84
## # … with 5,412 more rows
# Document-term matrices: one row per user_id, one column per word
ngss_dtm <- ngss_tidy %>%
  count(user_id, word) %>%
  cast_dtm(user_id, word, n)
ccss_dtm <- ccss_tidy %>%
  count(user_id, word) %>%
  cast_dtm(user_id, word, n)
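The two casts above treat each user_id as one document, so if a user appears more than once in the sample, their tweets are pooled into a single row of the matrix. As a minimal sketch on made-up data (the tweets and the `tweet_id` column below are hypothetical and not part of the original data set), the difference between user-level and tweet-level documents looks like this:

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data: user "u1" has two tweets, user "u2" has one
toy_tweets <- tibble(
  tweet_id = 1:3,                       # assumed per-tweet identifier
  user_id  = c("u1", "u1", "u2"),
  text     = c("Students model heat transfer",
               "Teachers share science phenomena",
               "Students love science workshops")
)

# Same tokenization steps as in the analysis
toy_tidy <- toy_tweets %>%
  unnest_tokens(output = word, input = text) %>%
  anti_join(stop_words, by = "word")

# One document per user: the two tweets from "u1" are merged
user_dtm <- toy_tidy %>%
  count(user_id, word) %>%
  cast_dtm(user_id, word, n)

# One document per tweet: every tweet keeps its own row
tweet_dtm <- toy_tidy %>%
  count(tweet_id, word) %>%
  cast_dtm(tweet_id, word, n)

nrow(user_dtm)   # number of user-level documents
nrow(tweet_dtm)  # number of tweet-level documents
```

Which unit is appropriate depends on the question; pooling by user gives longer documents, while tweet-level documents keep the matrix aligned with the individual tweets.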
To choose K, I used two methods: the FindTopicsNumber() function and the toLDAvis() function. Although FindTopicsNumber() offers suggestions for K, I used toLDAvis() to see how the clusters/topics overlap with each other. The toLDAvis() results are shown in the STM model section. In summary, finding K requires many rounds of iteration over different options. In the end, I set K to 10 for the NGSS sample and K to 8 for the CCSS sample.
k_metrics_ngss <- FindTopicsNumber(
  ngss_dtm,
  topics = seq(5, 50, by = 5),
  metrics = "Griffiths2004",
  method = "Gibbs",
  control = list(),
  mc.cores = NA,
  return_models = FALSE,
  verbose = FALSE,
  libpath = NULL
)
FindTopicsNumber_plot(k_metrics_ngss) # 15
## Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
## of ggplot2 3.3.4.
## ℹ The deprecated feature was likely used in the ldatuning package.
##   Please report the issue at <https://github.com/nikita-moor/ldatuning/issues>.
k_metrics_ccss <- FindTopicsNumber(
  ccss_dtm,
  topics = seq(5, 50, by = 5),
  metrics = "Griffiths2004",
  method = "Gibbs",
  control = list(),
  mc.cores = NA,
  return_models = FALSE,
  verbose = FALSE,
  libpath = NULL
)
FindTopicsNumber_plot(k_metrics_ccss) # 10
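FindTopicsNumber() can also compute several metrics in one call, which may make the suggested range for K easier to judge than a single curve. The sketch below is an illustration under assumptions: it runs on a small slice of the AssociatedPress data shipped with topicmodels rather than on the tweets, and the candidate values, the seed, and the object names `ap_small` and `k_metrics_demo` are arbitrary choices.

```r
library(topicmodels)
library(ldatuning)

# Small demo corpus shipped with topicmodels (not the tweet data)
data("AssociatedPress", package = "topicmodels")
ap_small <- AssociatedPress[1:50, ]

# Two metrics at once: Griffiths2004 is to be maximized,
# CaoJuan2009 is to be minimized
k_metrics_demo <- FindTopicsNumber(
  ap_small,
  topics   = 2:5,                      # assumed candidate values of K
  metrics  = c("Griffiths2004", "CaoJuan2009"),
  method   = "Gibbs",
  control  = list(seed = 588),         # fixed seed so the run is repeatable
  mc.cores = 1L,
  verbose  = FALSE
)

k_metrics_demo
# FindTopicsNumber_plot(k_metrics_demo)  # one panel per optimization direction
```

A value of K where the maximized and minimized metrics agree is a candidate worth inspecting by reading the topics, not a final answer.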
#### LDA model
ngss_lda <- LDA(ngss_dtm,
                k = 10,
                control = list(seed = 588)
)
ngss_lda
## A LDA_VEM topic model with 10 topics.
terms(ngss_lda, 5)
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5 Topic 6
## [1,] "ngss" "ngss" "science" "ngss" "ngss" "ngss"
## [2,] "science" "stem" "ngss" "ngsschat" "students" "science"
## [3,] "phenomena" "2" "standards" "heat" "teachers" "students"
## [4,] "tjscience" "workshop" "learning" "science" "ngss_tweeps" "ngss_tweeps"
## [5,] "students" "register" "students" "school" "science" "learning"
## Topic 7 Topic 8 Topic 9 Topic 10
## [1,] "ngss" "ngss" "ngss" "science"
## [2,] "ngsschat" "science" "teachers" "ngsschat"
## [3,] "science" "engineering" "openscied" "educhat"
## [4,] "stem" "core" "ngsschat" "apchem"
## [5,] "loved" "school" "1" "apchemistry"
# Per-topic word probabilities (beta), one row per topic-term pair
ngss_lda <- tidy(ngss_lda)
top_terms_ngss <- ngss_lda %>%
  group_by(topic) %>%
  slice_max(beta, n = 5, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(topic, -beta)
top_terms_ngss %>%
  mutate(term = reorder_within(term, beta, topic)) %>%
  group_by(topic, term) %>%
  arrange(desc(beta)) %>%
  ungroup() %>%
  ggplot(aes(beta, term, fill = as.factor(topic))) +
  geom_col(show.legend = FALSE) +
  scale_y_reordered() +
  labs(title = "Top 5 terms in each LDA topic",
       x = expression(beta), y = NULL) +
  facet_wrap(~ topic, ncol = 4, scales = "free")
ccss_lda <- LDA(ccss_dtm,
                k = 8,
                control = list(seed = 588)
)
ccss_lda
## A LDA_VEM topic model with 8 topics.
terms(ccss_lda, 5)
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5 Topic 6
## [1,] "common" "core" "common" "grade" "core" "common"
## [2,] "core" "common" "core" "students" "common" "core"
## [3,] "schools" "math" "math" "commoncore" "math" "math"
## [4,] "school" "children" "kids" "level" "learning" "teachers"
## [5,] "children" "education" "lol" "technology" "curriculum" "education"
## Topic 7 Topic 8
## [1,] "core" "core"
## [2,] "common" "common"
## [3,] "math" "math"
## [4,] "education" "grade"
## [5,] "kids" "people"
# Per-topic word probabilities (beta), one row per topic-term pair
ccss_lda <- tidy(ccss_lda)
top_terms_ccss <- ccss_lda %>%
  group_by(topic) %>%
  slice_max(beta, n = 5, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(topic, -beta)
top_terms_ccss %>%
  mutate(term = reorder_within(term, beta, topic)) %>%
  group_by(topic, term) %>%
  arrange(desc(beta)) %>%
  ungroup() %>%
  ggplot(aes(beta, term, fill = as.factor(topic))) +
  geom_col(show.legend = FALSE) +
  scale_y_reordered() +
  labs(title = "Top 5 terms in each LDA topic",
       x = expression(beta), y = NULL) +
  facet_wrap(~ topic, ncol = 4, scales = "free")
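The plots above summarize each LDA topic by its top terms (beta). To read example documents for an LDA topic, one option is the per-document topic proportions (gamma). The sketch below is only an illustration under assumptions: it uses a small slice of the AssociatedPress data shipped with topicmodels instead of the tweets, and k = 3 and the object names are arbitrary.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Small demo corpus shipped with topicmodels (not the tweet data)
data("AssociatedPress", package = "topicmodels")
ap_small <- AssociatedPress[1:50, ]

ap_lda <- LDA(ap_small, k = 3, control = list(seed = 588))

# gamma: estimated share of each document that belongs to each topic
ap_gamma <- tidy(ap_lda, matrix = "gamma")

# The three documents most strongly associated with each topic
top_docs <- ap_gamma %>%
  group_by(topic) %>%
  slice_max(gamma, n = 3, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(topic, desc(gamma))

top_docs
```

In the analysis above, the fitted models are overwritten by their tidy() output, so applying this idea there would require keeping the model in a separate object before tidying it.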
# Processing and stemming for STM
ngss_temp <- textProcessor(ngss_aim$text,
                           metadata = ngss_aim,
                           lowercase = TRUE,
                           removestopwords = TRUE,
                           removenumbers = TRUE,
                           removepunctuation = TRUE,
                           wordLengths = c(3, Inf),
                           stem = TRUE,
                           onlycharacter = FALSE,
                           striphtml = TRUE,
                           customstopwords = NULL)
## Building corpus...
## Converting to Lower Case...
## Removing punctuation...
## Removing stopwords...
## Removing numbers...
## Stemming...
## Creating Output...
ngss_meta <- ngss_temp$meta
ngss_vocab <- ngss_temp$vocab
ngss_docs <- ngss_temp$documents
ccss_temp <- textProcessor(ccss_aim$text,
                           metadata = ccss_aim,
                           lowercase = TRUE,
                           removestopwords = TRUE,
                           removenumbers = TRUE,
                           removepunctuation = TRUE,
                           wordLengths = c(3, Inf),
                           stem = TRUE,
                           onlycharacter = FALSE,
                           striphtml = TRUE,
                           customstopwords = NULL)
## Building corpus...
## Converting to Lower Case...
## Removing punctuation...
## Removing stopwords...
## Removing numbers...
## Stemming...
## Creating Output...
ccss_meta <- ccss_temp$meta
ccss_vocab <- ccss_temp$vocab
ccss_docs <- ccss_temp$documents
ngss_stm <- stm(documents = ngss_docs,
                data = ngss_meta,
                vocab = ngss_vocab,
                prevalence = ~ user_id,
                K = 10,
                max.em.its = 25,
                verbose = FALSE)
ngss_stm
## A topic model with 10 topics, 325 documents and a 2321 word dictionary.
plot.STM(ngss_stm, n = 5)
toLDAvis(mod = ngss_stm, docs = ngss_docs) # also tried K = 20 and other values; the topics overlapped too much
## Loading required namespace: servr
ccss_stm <- stm(documents = ccss_docs,
                data = ccss_meta,
                vocab = ccss_vocab,
                # prevalence = ~ user_id,
                K = 8,
                max.em.its = 25,
                verbose = FALSE)
ccss_stm
## A topic model with 8 topics, 1116 documents and a 4757 word dictionary.
plot.STM(ccss_stm, n = 5)
toLDAvis(mod = ccss_stm, docs = ccss_docs) # also tried K = 15, 10, and other values; the topics overlapped too much
## Loading required namespace: servr
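Besides inspecting the overlap in toLDAvis() by hand, the stm package provides searchK(), which fits the model for several candidate values of K and reports diagnostics such as semantic coherence and held-out likelihood. The following is a minimal sketch under assumptions: it uses the gadarian survey data shipped with stm rather than the tweets, and the candidate values, the seed, and the object names are arbitrary.

```r
library(stm)

# Demo data shipped with stm (not the tweet data)
data(gadarian)

processed <- textProcessor(gadarian$open.ended.response,
                           metadata = gadarian,
                           verbose = FALSE)

# Drop very rare terms and any documents left empty, keeping metadata aligned
prepped <- prepDocuments(processed$documents,
                         processed$vocab,
                         processed$meta,
                         verbose = FALSE)

# Fit one model per candidate K and collect diagnostics
k_search <- searchK(prepped$documents,
                    prepped$vocab,
                    K = c(3, 5),          # assumed candidate values
                    data = prepped$meta,
                    heldout.seed = 588,
                    verbose = FALSE)

k_search$results
# plot(k_search)  # diagnostic curves across the candidate values of K
```

These diagnostics can narrow the range of K, but reading the topics and example documents is still needed to judge whether a given K is interpretable.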
Based on the topics produced by the LDA and STM models in the previous two sections, I looked at the top words in each topic and used the findThoughts() function to read example tweets and understand what each topic means. I report my findings in the next section.
# ngss_meta stays aligned with the documents kept by textProcessor()
findThoughts(ngss_stm,
             texts = ngss_meta$text,
             topics = 7,
             n = 20,
             thresh = 0.5)
##
## Topic 7:
## How does increasing the mass and speed affect the damage result in a collision? 8th grade sacrificed crackers to find out. @AELCondors #NGSS #BackInTheLab https://t.co/v2HW59G0vI
## ATTN: Teachers! Inspire your students and spark their curiosity with #VirtualFieldTrips! NGSS-aligned content for K-5. FREE activity guides available in English and Spanish. Access them at https://t.co/LXSCiAxSXy #CaliforniaScienceCenter https://t.co/hysA8tJ3y1
## @ngss_official Hi Santhosh! We understand and apologize as this is taking a while. We will share an update with you at the earliest. Thank you, Alima B https://t.co/2G23qpsqlP
## @TomSchimmer @xphils @dramaqueenbrc @kenoc7 @kenmattingly @josh_cormier Totally agree. We must consider the DOK we are assessing. Level 1 then MC is ok. Not needed or necessary, but valid. There are so many better ways and the CCSS, NGSS, and C3 are all conceptual, so I hope MC is the outlier assessment. Neither authentic or rigorous! #ATAssessment
## “Banquet table” style chemistry labs today. Who says partner science can’t happen with physical distancing protocols? Split materials touched, share the data and find conclusions with your partner! #elesci #fifthgrade #NGSS @UASLearns https://t.co/uPPeRLsFcN
## Why do some animals live together in groups? In our third grade informational text for 3-LS2-1, students learn about the ways animals stick together to stay safe. Read our latest blog post to access the text and accompanying activities for free! https://t.co/I1SWGQGnrK https://t.co/u0uWBkLGDE
## If only you could HEAR these kazoos! Sorry families... Best Friday science-sound-wave jam session ever! #ngss @KBurke4242 @MuziLearningLab https://t.co/Ef12npD2cU
## Chemistry in 5th grade... investigating solutions with particles. Can we tell if layers are removed or substances changed? #NGSS #elemsci @UASLearns https://t.co/aF0XpS1bhj
## I love my job! I partnered a student and scientist for an interview about STEM careers. Thank you @forgedonyx for sharing with us and showing us what a scientist is! https://t.co/VRxxnKXR6H
## Congratulations to Cohort 15 educators who completed the NGSS science and equity program! Thank you for taking the time reflect and to disrupt science education to reflect needs of the 21st century! https://t.co/BgDF6w1zJR
## tl;dr it's hard and expecting teacher candidates to bust out NGSS units before they've seen any examples is unrealistic. I've been working on this since 2014 and it's still hard. But doubly so when you're teaching a subject that is less-developed in general. 6/
## How should we discuss climate change and human impacts on the environment? Coping? Managing? What are ways in which we can move to a caring stance on being part of the global system? #ngss #ngsschat https://t.co/0vPefdlAZ4
## Thank you @Meliseymo for continuing our interview series, SRVUSD Meets #BlackInScience. This is what what a scientist looks like. Another great interview from one of our students! https://t.co/WCobTTiT0Y
## Explore #transmission, #absorption and #reflection of waves! Build a wave machine with our experiment and examine how different materials react to create these wave behaviours. Aligned to NGSS-PS4-2, great for all ages. https://t.co/j4ExH28NO2
## #ScienceAtHome #STEMeducation https://t.co/y1TSQ0RAbF
## Like a squid at sea, SquidBooks adjusts to the needs of diverse learners. #scienceteachers #NGSS https://t.co/ZJWIFaHyN8
## Did you know your #students can download @NASA footage?
##
## They can. And it makes great material for a WeVideo. Check it out...
##
## @ValeriaTeaches #Mars2020 @NGSS_tweeps @NSTA #STEM #science #edtech #learning https://t.co/90sv4mNGdm
## Great video linking climate change w/ extreme weather events. Thx @KHayhoe for clear explanation of what scientists know as well as what we are still figuring out. Useful resource for tchg climate change & nature of science @NSTA @NGSS_tweeps @NGSSchat https://t.co/nDLwcipbxe
## Kindergartners wrote their own weather report that included information about the weather, what you should wear and something you can do for fun in that kind of weather! They then recorded their weather forecasts on @Flipgrid! #STEMeducation #ngss https://t.co/2hemwCRAbx
## Using peer feedback to adjust our climate models for our first benchmark in Ms Scallon’s class, proud of our work! Building technology to support life in extreme climates.
##
## #NGSS #PBL #AODL
##
## @cmsadmins @MJDAmico_GPS @DrJones_GPS https://t.co/kJndYTnOVy
## I loved this video... thanks for sharing.#ngsschat #scienceed #STEM https://t.co/6NhRSHNrZR
# ccss_meta stays aligned with the documents kept by textProcessor()
findThoughts(ccss_stm,
             texts = ccss_meta$text,
             topics = 4,
             n = 30,
             thresh = 0.5)
##
## Topic 4:
## “If we’re not teaching graphic novels and comics, we’re actually not meeting common core!” Thank you @historycomics @4csla #graphicnovels #visualliteracy #4CSLA https://t.co/bqX63gb31t
## @Dale_Chu @FiftyCAN @jeanjeanielindz @collabteacher @afhyslop @alspur @slamsonite @epfp_iel @moh_choudhury @absofabucorn @sagenwok @mdjtooley @KenKt13 @matthewladner @MelineMD @weiss_edu @SmPotatoes @angrybklynmom @BarbarianCap @amyrberman @Education_44 @jenreesman It looks like this organization is in favor of standardized testing because of data? And it's a project of something called New Venture fund? And your main goal is nationalized Common Core? Your biggest donor is the Bill & Melinda Gates Foundation?
## Where do you teach, exactly?
## @amuse Is that all? Now what was it dear old Adolph said about his jewish ancestors? Now they have their own Nation! These kind of professors only annoy me when their captive audience is below puberty! Now a common core principle!
## @BullsBearsBTC @StanTradingMan @CRyanSchadel Simple math...this isn’t common core...total is 8...subtract his 2...6 is the float. This isn’t hard to comprehend...or is it?
## @deutsch29blog @Network4pubEd @DianeRavitch @carolburris @leoniehaimson @clanghoff1 @valeriestrauss @palan57 What did everyone expect? He was a part of Obama's administration that began the horrendous Common Core which "cut out" elder experienced educators and "demoralized" millions of children w/synthetic and unproven measures. @DianeRavitch @carolburris A team with weak links.
## @Victori35619148 @DenSow57 The gubment's money corrupts everything. Education, research, arts. Common Core, originally, a common curriculum agreement between states. Fed money moved in and hijacked it to push an agenda. Killed a good idea.
## Gubment money is not free. It comes with strings.
## @no1worries1 @geostylegeo I FUCKING KNEW IT! Common core didn’t work out the way they wanted, so now what they tried changing is automatically racist!
##
## And they’ll fund research to prove what they want “out there”.
## Common Core English Workbook: Grade 5 English by Ace Academic Publishing
## Last access : 65849 user
## Last server checked : 13 Minutes ago!
##
## Common Core English Workbook: Grade 5 English by Ace Academic Publishing PDF EBOOK EPUB MOBI Kindle
## Common Core English Workbook: Grade
## So, how does this make any kind of sense AT ALL.
## “Limited seating indoors, but full seating outdoors inside of a heated tent”. Is this common core common sense @GovMurphy ??
## @jordangerous It’s funny re:“Greek to you.” Yes me too. One of the common core ideas too (if I understand it) is the teacher can encourage students to assist other students. If a student can teach the concept to his neighbor, it helps HIM/HER understand it. You always sound like a great mom!!
In this week's analysis, I looked into the topics in a sample of tweets related to the NGSS and the CCSS. I used the LDA and STM methods to generate topics. With the chosen K values, I read tweet samples from the resulting topics, compared them, and found the following themes for the NGSS tweets:
- Resources for teaching NGSS: The topics about how to teach NGSS centered on the terms NGSS, standards, and teachers. For instance, some tweets from Topic 5 or Topic 2 show that teaching NGSS draws on different resources; these tweets advertise lesson ideas, virtual events, or teaching examples/questions.
- Student interactions with NGSS: The topics about how students approach NGSS learning opportunities focused on the terms students, models, learning, and science. For instance, some tweets from Topic 3, Topic 1, or Topic 7 could reflect how students learn about science at school.
Regarding the CCSS tweets, I found that the Twitter users' overall attitude toward the CCSS appeared negative. Because some of the tweets do not carry a precise meaning, I identified only one theme for the CCSS tweets:
- Questioning CCSS Math: The tweets about CCSS (Topics 6, 7, 8) tended to give negative feedback about the math standards. This suggests that the users in this sample questioned how the CCSS math standards would benefit students learning math at school.
In summary, when conducting topic modeling, it is essential to evaluate how interpretable the topics are under different k values. Moreover, across the several rounds of adjusting k, it is crucial to read some of the text samples in each topic to judge whether the topics were grouped meaningfully. However, I noticed that additional data cleaning is needed for the Twitter data: some user names and tags should be trimmed, since the stemming step does not remove them. I will address this issue if I use the Twitter data again in an upcoming analysis.
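The extra cleaning mentioned above could be sketched as follows. This is a minimal illustration and not part of the original analysis: the helper name `clean_tweet_text()` is hypothetical, and the regular expressions are simple assumptions about what a link and a user handle look like.

```r
library(stringr)

# Hypothetical helper: strip links and @handles before tokenization
clean_tweet_text <- function(text) {
  text <- str_remove_all(text, "https?://\\S+")  # drop URLs such as t.co links
  text <- str_remove_all(text, "@\\w+")          # drop user handles
  str_squish(text)                               # collapse leftover whitespace
}

clean_tweet_text("Great lab today! @NGSS_tweeps #NGSS https://t.co/abc123")
# returns "Great lab today! #NGSS"
```

Applied with mutate(text = clean_tweet_text(text)) before unnest_tokens() or textProcessor(), this would keep handles such as ngss_tweeps out of the top terms while leaving hashtags in place.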