Topic Modeling Badge

The final activity for each learning lab provides space to work with data and to reflect on how the concepts and techniques introduced in each lab might apply to your own research.

To earn a badge for each lab, you are required to respond to a set of prompts in two parts:

Part I: Reflect and Plan

Use your institutional library (e.g., NCSU Library), Google Scholar, or a search engine to locate a research article, presentation, or resource that applies text mining to an educational context or topic of interest. More specifically, locate a text mining study that visualizes text data.

Provide an APA citation for your selected study.
How does topic modeling address research questions?

Draft a research question about a population you may be interested in studying, or one that would interest educational researchers, that would require collecting text data. Then answer the following questions:

What text data would need to be collected?
For what reason would text data need to be collected in order to address this question?
Explain the analytical level at which these text data would need to be collected and analyzed.

Part II: Data Product

Use your case study file to try a small number of topics (e.g., 3) or a large number of topics (e.g., 30), and explain how changing the number of topics shapes the way you interpret the results.

Changing the number of topics led to an entirely different set of topics. With 3 topics, the interpretations differ from those listed in the case study; in the LDA model, for example, two of the three topics are dominated by HTML markup tokens (font, span, style, href) rather than discussion content. The resulting topics may or may not align with the corresponding theory, so it is worth exploring several values for the number of topics during the analysis.
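One way to make the comparison between solutions concrete is to measure how much the top terms of each topic overlap between two fits. The sketch below uses a hypothetical helper, `topic_overlap()` (not part of any package), that takes two character matrices shaped like the output of `terms(model, n)` (one column per topic) and returns the Jaccard similarity for every pair of topics. A row whose largest value is near 0 marks a topic with no counterpart in the other solution. The toy matrices are invented; in the lab they might be `terms(forums_lda, 7)` and `terms(LDA(forums_dtm, k = 30, control = list(seed = 588)), 7)`.

```r
# Hypothetical helper: Jaccard overlap of top terms between two topic solutions.
# Each argument is a character matrix like terms(model, n): rows = ranks, cols = topics.
topic_overlap <- function(terms_a, terms_b) {
  jaccard <- function(x, y) length(intersect(x, y)) / length(union(x, y))
  out <- matrix(0, nrow = ncol(terms_a), ncol = ncol(terms_b),
                dimnames = list(colnames(terms_a), colnames(terms_b)))
  for (i in seq_len(ncol(terms_a))) {
    for (j in seq_len(ncol(terms_b))) {
      out[i, j] <- jaccard(terms_a[, i], terms_b[, j])
    }
  }
  out
}

# Invented stand-ins for the top terms of a smaller and a larger solution
fit_small <- cbind(`Topic 1` = c("font", "span", "style"),
                   `Topic 2` = c("students", "data", "class"))
fit_large <- cbind(`Topic 1` = c("students", "class", "school"),
                   `Topic 2` = c("font", "span", "style"),
                   `Topic 3` = c("href", "https", "li"))

round(topic_overlap(fit_small, fit_large), 2)
```

For a more formal search over the number of topics, `stm::searchK()` and `ldatuning::FindTopicsNumber()` (the latter package is loaded in the code below) each fit several models and report diagnostics; this sketch only compares the resulting terms.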

I highly recommend creating a new R script in your lab-3 folder to complete this task. When your code is ready to share, use the code chunk below to share the final code for your model and answer the questions that follow.

# YOUR FINAL CODE HERE
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(tidytext)
library(SnowballC)
library(topicmodels)
library(stm)

## stm v1.3.6 successfully loaded. See ?stm for help. 
##  Papers, resources, and other materials at structuraltopicmodel.com

library(ldatuning)
library(knitr)
library(LDAvis)

#read data
ts_forum_data <- read_csv("/cloud/project/lab-3/data/ts_forum_data.csv", 
                          col_types = cols(course_id = col_character(),
                                           forum_id = col_character(), 
                                           discussion_id = col_character(), 
                                           post_id = col_character()
                          )
)

#tidying data
forums_tidy <- ts_forum_data %>%
  unnest_tokens(output = word, input = post_content) %>%
  anti_join(stop_words, by = "word")

forums_tidy

## # A tibble: 192,160 × 14
##    course_id course_name       forum_id forum_name discussion_id discussion_name
##    <chr>     <chr>             <chr>    <chr>      <chr>         <chr>          
##  1 9         Teaching Statist… 126      Investiga… 6822          Not much compa…
##  2 9         Teaching Statist… 126      Investiga… 6822          Not much compa…
##  3 9         Teaching Statist… 126      Investiga… 6822          Not much compa…
##  4 9         Teaching Statist… 126      Investiga… 6822          Not much compa…
##  5 9         Teaching Statist… 126      Investiga… 6822          Not much compa…
##  6 9         Teaching Statist… 126      Investiga… 6822          Not much compa…
##  7 9         Teaching Statist… 126      Investiga… 6822          Not much compa…
##  8 9         Teaching Statist… 126      Investiga… 6822          Not much compa…
##  9 9         Teaching Statist… 126      Investiga… 6822          Not much compa…
## 10 9         Teaching Statist… 126      Investiga… 6822          Not much compa…
## # ℹ 192,150 more rows
## # ℹ 8 more variables: discussion_creator <dbl>, discussion_poster <dbl>,
## #   discussion_reference <chr>, parent_id <dbl>, post_date <chr>,
## #   post_id <chr>, post_title <chr>, word <chr>
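`unnest_tokens()` splits each post into one row per word (lowercasing by default) and `anti_join(stop_words, by = "word")` drops common function words, which is why the tibble above keeps the other post columns plus a `word` column. A simplified base-R sketch of the same idea, on invented posts and a tiny invented stop-word list; tidytext's actual tokenizer handles punctuation and Unicode more carefully, so treat this as an illustration only:

```r
# Invented posts and a tiny invented stop-word list (tidytext's stop_words is far larger)
toy_posts <- data.frame(
  post_id      = c("p1", "p2"),
  post_content = c("The students analyzed the data.",
                   "Data and statistics for students!"),
  stringsAsFactors = FALSE
)
toy_stop <- c("the", "and", "for")

# Lowercase, then split on runs of characters that are not letters, digits, or apostrophes
toy_tokens <- do.call(rbind, lapply(seq_len(nrow(toy_posts)), function(i) {
  words <- strsplit(tolower(toy_posts$post_content[i]), "[^a-z0-9']+")[[1]]
  words <- words[words != ""]
  data.frame(post_id = toy_posts$post_id[i], word = words, stringsAsFactors = FALSE)
}))

# Drop stop words: the base-R analogue of anti_join(stop_words, by = "word")
toy_tokens <- toy_tokens[!toy_tokens$word %in% toy_stop, ]
toy_tokens
```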

forums_tidy %>%
  count(word, sort = TRUE)

## # A tibble: 13,620 × 2
##    word           n
##    <chr>      <int>
##  1 students    6841
##  2 data        4365
##  3 statistics  3103
##  4 school      1488
##  5 questions   1470
##  6 class       1426
##  7 font        1311
##  8 span        1267
##  9 time        1253
## 10 style       1150
## # ℹ 13,610 more rows
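Several of the most frequent "words" here (font, span, style) are HTML/CSS markup from the raw posts rather than discussion content. The STM pipeline below sets `striphtml = TRUE`, but the tidytext pipeline that feeds LDA tokenizes `post_content` as-is. A minimal base-R sketch of one fix, assuming a simple regex is good enough for this forum export (it is not a full HTML parser): remove tags and entities before tokenizing. tidytext's `unnest_tokens()` also has a `format` argument that accepts `"html"`, which may be an alternative worth checking in its documentation.

```r
# Minimal regex-based HTML cleaner (assumption: adequate for simple forum markup,
# not a general HTML parser). Removes tags, HTML entities, and extra whitespace.
strip_html <- function(x) {
  x <- gsub("<[^>]*>", " ", x)                 # drop tags such as <span style="...">
  x <- gsub("&[A-Za-z]+;|&#[0-9]+;", " ", x)   # drop entities such as &amp; or &#160;
  x <- gsub("\\s+", " ", x)                    # collapse runs of whitespace
  trimws(x)
}

strip_html('<span style="font: normal">Mean &amp; median</span>')
# returns "Mean median"
```

In the lab pipeline this would be applied before tokenizing, e.g. `ts_forum_data %>% mutate(post_content = strip_html(post_content)) %>% unnest_tokens(output = word, input = post_content)`, after which the word counts and the LDA model would need to be re-run.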

forums_dtm <- forums_tidy %>%
  count(post_id, word) %>%
  cast_dtm(post_id, word, n)
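`cast_dtm(post_id, word, n)` turns the tidy (post, word, count) rows into a document-term matrix with one row per post and one column per term. A base-R sketch of the same structure on invented tokens; here `table()` does the counting and casting in one step and builds a dense matrix, whereas the real `DocumentTermMatrix` is stored sparsely, which matters with 13,620 distinct terms:

```r
# Invented tidy tokens: one row per word occurrence in each post
toy_tokens <- data.frame(
  post_id = c("p1", "p1", "p1", "p2", "p2"),
  word    = c("data", "data", "mean", "mean", "median"),
  stringsAsFactors = FALSE
)

# Cross-tabulate into a dense document-term matrix of counts (rows = posts, cols = terms)
toy_dtm <- table(toy_tokens$post_id, toy_tokens$word)
toy_dtm
```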

#preprocessing for STM (lowercasing, stopword/number/punctuation removal, HTML stripping, stemming)
temp <- textProcessor(ts_forum_data$post_content, 
                      metadata = ts_forum_data,  
                      lowercase = TRUE, 
                      removestopwords = TRUE, 
                      removenumbers = TRUE,  
                      removepunctuation = TRUE, 
                      wordLengths = c(3, Inf),
                      stem = TRUE,
                      onlycharacter = FALSE, 
                      striphtml = TRUE, 
                      customstopwords = NULL)

## Building corpus... 
## Converting to Lower Case... 
## Removing punctuation... 
## Removing stopwords... 
## Removing numbers... 
## Stemming... 
## Creating Output...

meta <- temp$meta
vocab <- temp$vocab
docs <- temp$documents
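`textProcessor()` returns `temp$documents` in stm's own sparse format: a list with one 2-row integer matrix per document, where the first row indexes into `vocab` and the second row holds counts (see `?stm` for the exact specification). Decoding a document back to words is a quick way to check what preprocessing kept. A sketch on an invented vocabulary and document:

```r
# Invented stand-ins for temp$vocab and one element of temp$documents
toy_vocab <- c("coaster", "data", "height", "student")
toy_doc   <- rbind(c(1L, 3L, 4L),   # row 1: positions in toy_vocab (1-based)
                   c(2L, 1L, 5L))   # row 2: count of each of those terms

# Decode the document into a named vector of term counts
decoded <- setNames(toy_doc[2, ], toy_vocab[toy_doc[1, ]])
decoded
```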

stemmed_forums <- ts_forum_data %>%
  unnest_tokens(output = word, input = post_content) %>%
  anti_join(stop_words, by = "word") %>%
  mutate(stem = wordStem(word))

stemmed_forums

## # A tibble: 192,160 × 15
##    course_id course_name       forum_id forum_name discussion_id discussion_name
##    <chr>     <chr>             <chr>    <chr>      <chr>         <chr>          
##  1 9         Teaching Statist… 126      Investiga… 6822          Not much compa…
##  2 9         Teaching Statist… 126      Investiga… 6822          Not much compa…
##  3 9         Teaching Statist… 126      Investiga… 6822          Not much compa…
##  4 9         Teaching Statist… 126      Investiga… 6822          Not much compa…
##  5 9         Teaching Statist… 126      Investiga… 6822          Not much compa…
##  6 9         Teaching Statist… 126      Investiga… 6822          Not much compa…
##  7 9         Teaching Statist… 126      Investiga… 6822          Not much compa…
##  8 9         Teaching Statist… 126      Investiga… 6822          Not much compa…
##  9 9         Teaching Statist… 126      Investiga… 6822          Not much compa…
## 10 9         Teaching Statist… 126      Investiga… 6822          Not much compa…
## # ℹ 192,150 more rows
## # ℹ 9 more variables: discussion_creator <dbl>, discussion_poster <dbl>,
## #   discussion_reference <chr>, parent_id <dbl>, post_date <chr>,
## #   post_id <chr>, post_title <chr>, word <chr>, stem <chr>

#Model Fitting: LDA

n_distinct(ts_forum_data$forum_name)

## [1] 21

forums_lda <- LDA(forums_dtm, 
                  k = 3, 
                  control = list(seed = 588)
)

forums_lda

## A LDA_VEM topic model with 3 topics.

#Model Fitting: STM (docs, meta, and vocab were extracted from temp above)

forums_stm <- stm(documents=docs, 
                  data=meta,
                  vocab=vocab, 
                  prevalence =~ course_id + forum_id,
                  K=3,
                  max.em.its=25,
                  verbose = FALSE)

forums_stm

## A topic model with 3 topics, 5781 documents and a 7820 word dictionary.
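The `prevalence =~ course_id + forum_id` formula lets topic prevalence vary by course and forum. Because both IDs were read as character columns, standard R formula handling treats them as categorical and expands each into indicator columns, with the first level as the baseline; with 21 distinct forums, `forum_id` alone can contribute up to 20 such columns. `model.matrix()` shows the expansion on invented IDs:

```r
# Invented metadata with character IDs, mirroring course_id and forum_id
toy_meta <- data.frame(
  course_id = c("9", "9", "10", "10"),
  forum_id  = c("126", "127", "130", "130"),
  stringsAsFactors = FALSE
)

# Character columns become indicator (dummy) columns; the first sorted level is the baseline
toy_X <- model.matrix(~ course_id + forum_id, data = toy_meta)
colnames(toy_X)
```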

toLDAvis(mod = forums_stm, docs = docs)

## Loading required namespace: servr

#Exploring beta values
terms(forums_lda, 5)

##      Topic 1  Topic 2      Topic 3     
## [1,] "font"   "statistics" "students"  
## [2,] "span"   "href"       "data"      
## [3,] "style"  "li"         "statistics"
## [4,] "text"   "strong"     "questions" 
## [5,] "normal" "https"      "school"

tidy_lda <- tidy(forums_lda)
tidy_lda

## # A tibble: 40,860 × 3
##    topic term          beta
##    <int> <chr>        <dbl>
##  1     1 2015      1.98e- 4
##  2     2 2015      5.59e- 4
##  3     3 2015      5.69e- 5
##  4     1 21        1.44e-40
##  5     2 21        1.32e- 4
##  6     3 21        1.29e-17
##  7     1 beginning 5.02e- 5
##  8     2 beginning 1.34e- 4
##  9     3 beginning 8.14e- 4
## 10     1 content   5.24e- 4
## # ℹ 40,850 more rows
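`tidy()` reshapes the topic-by-term beta matrix into one row per (topic, term) pair, which is why the table has 40,860 rows: 13,620 terms × 3 topics. `terms()` and `slice_max()` then keep the highest-beta terms within each topic. A base-R sketch of both steps on an invented 2-topic beta matrix:

```r
# Invented topic-by-term probabilities (each row sums to 1), analogous to
# posterior(forums_lda)$terms in topicmodels
toy_beta <- matrix(c(0.50, 0.30, 0.15, 0.05,
                     0.05, 0.10, 0.25, 0.60),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(1:2, c("font", "span", "data", "students")))

# Long format: one row per (topic, term) pair, the shape tidy() returns
toy_long <- data.frame(
  topic = rep(seq_len(nrow(toy_beta)), times = ncol(toy_beta)),
  term  = rep(colnames(toy_beta), each = nrow(toy_beta)),
  beta  = as.vector(toy_beta),   # column-major order matches the rep() calls above
  stringsAsFactors = FALSE
)

# Top n terms per topic: sort each row in decreasing order and keep the first n names
top_n_terms <- function(beta_matrix, n) {
  apply(beta_matrix, 1, function(b) names(sort(b, decreasing = TRUE))[seq_len(n)])
}
top_n_terms(toy_beta, 2)
```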

top_terms <- tidy_lda %>%
  group_by(topic) %>%
  slice_max(beta, n = 5, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(topic, -beta)

top_terms %>%
  mutate(term = reorder_within(term, beta, topic)) %>%
  ggplot(aes(beta, term, fill = as.factor(topic))) +
  geom_col(show.legend = FALSE) +
  scale_y_reordered() +
  labs(title = "Top 5 terms in each LDA topic",
       x = expression(beta), y = NULL) +
  facet_wrap(~ topic, ncol = 4, scales = "free")

#exploring gamma values
td_beta <- tidy(forums_lda)  # same per-topic word (beta) table as tidy_lda above

td_gamma <- tidy(forums_lda, matrix = "gamma")
td_gamma

## # A tibble: 17,298 × 3
##    document topic    gamma
##    <chr>    <int>    <dbl>
##  1 11295        1 0.00335 
##  2 12711        1 0.000413
##  3 12725        1 0.0717  
##  4 12733        1 0.00393 
##  5 12743        1 0.0146  
##  6 12744        1 0.00688 
##  7 12756        1 0.0717  
##  8 12757        1 0.00500 
##  9 12775        1 0.110   
## 10 12816        1 0.00500 
## # ℹ 17,288 more rows
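The gamma table has one row per (document, topic) pair, so its 17,298 rows are 5,766 posts × 3 topics, and each post's three gamma values should sum to 1. A base-R sketch, on invented values, of checking that and of reading off each post's dominant topic:

```r
# Invented long-format gamma table, shaped like tidy(forums_lda, matrix = "gamma")
toy_gamma <- data.frame(
  document = rep(c("post_a", "post_b"), each = 3),
  topic    = rep(1:3, times = 2),
  gamma    = c(0.10, 0.20, 0.70,
               0.60, 0.30, 0.10),
  stringsAsFactors = FALSE
)

# Within each document the topic proportions should sum to 1
doc_totals <- tapply(toy_gamma$gamma, toy_gamma$document, sum)

# Dominant topic per document: the row with the largest gamma in each document
dominant <- do.call(rbind, lapply(split(toy_gamma, toy_gamma$document),
                                  function(d) d[which.max(d$gamma), ]))
dominant
```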

top_terms <- td_beta %>%
  arrange(beta) %>%
  group_by(topic) %>%
  top_n(7, beta) %>%
  arrange(-beta) %>%
  #select(topic, term) %>%
  summarise(terms = list(term)) %>%
  mutate(terms = map(terms, paste, collapse = ", ")) %>% 
  unnest(cols = c(terms))

top_terms

## # A tibble: 3 × 2
##   topic terms                                                     
##   <int> <chr>                                                     
## 1     1 font, span, style, text, normal, 0px, height              
## 2     2 statistics, href, li, strong, https, resources, target    
## 3     3 students, data, statistics, questions, school, class, time

gamma_terms <- td_gamma %>%
  group_by(topic) %>%
  summarise(gamma = mean(gamma)) %>%
  arrange(desc(gamma)) %>%
  left_join(top_terms, by = "topic") %>%
  mutate(topic = paste0("Topic ", topic),
         topic = reorder(topic, gamma))

gamma_terms %>%
  #select(topic, gamma, terms) %>%
  kable(digits = 3, 
        col.names = c("Topic", "Expected topic proportion", "Top 7 terms"))

|Topic   | Expected topic proportion|Top 7 terms                                                |
|:-------|-------------------------:|:----------------------------------------------------------|
|Topic 3 |                     0.780|students, data, statistics, questions, school, class, time |
|Topic 2 |                     0.176|statistics, href, li, strong, https, resources, target     |
|Topic 1 |                     0.044|font, span, style, text, normal, 0px, height               |
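The "Expected topic proportion" column is each topic's gamma averaged over all posts (the `summarise(gamma = mean(gamma))` step), so the three values sum to 1 (0.780 + 0.176 + 0.044 = 1.000). On a documents-by-topics matrix the same quantity is just `colMeans()`; a sketch with invented values, where the matrix stands in for `posterior(forums_lda)$topics`:

```r
# Invented document-by-topic proportions (each row sums to 1)
toy_theta <- rbind(c(0.70, 0.20, 0.10),
                   c(0.90, 0.05, 0.05),
                   c(0.50, 0.40, 0.10))
colnames(toy_theta) <- paste("Topic", 1:3)

# Expected topic proportion = mean proportion of each topic across documents
expected <- colMeans(toy_theta)
round(expected, 3)
```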

#reading tea leaves
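# keep only the posts retained by textProcessor() so texts align with forums_stm's documents;
# guarded because x[-integer(0)] would return an empty vector if no posts were removed
ts_forum_data_reduced <- if (length(temp$docs.removed) > 0) {
  ts_forum_data$post_content[-temp$docs.removed]
} else {
  ts_forum_data$post_content
}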

findThoughts(forums_stm,
             texts = ts_forum_data_reduced,
             topics = 2, 
             n = 10,
             thresh = 0.5) #topic 2

## 
##  Topic 2: 
##       <!--[if !supportLists]-->¬∑         <!--[endif]--><span dir=\LTR\"></span>Do wooden coasters tend to have the same maximum height as coasters made from steel? Does anything surprise you?      The height of steel coaster tends to be higher than the height of wooden coaster back in the populations.      What surprising me:      Although the box-plot of height of both types of coasters have outliers  the mean and median in both are almost same. In addition  sample observations looks like follow approximately normal distribution although outliers are exist.      In terms of spread  the observations of wooden type  which is the old version  is more homogeneous than the steel type which is the newest version. In addition  the standard deviation of steel group is almost double compared to the wooden group      <!--[if !supportLists]-->¬∑         <!--[endif]--><span dir=\"LTR\"></span>Do steel roller coasters tend to have longer drops than wooden roller coasters?      The drop of steel coaster tends to be longer than the drop of wooden coaster back in the populations because the median for the drops of steel coaster lies outside the box of wooden coaster (more than half of the steel coaster are above than three quarters of the wooden group).      I confused about using Age-14 or Age-15 guidelines in the guidelines for analysis file. The sample sizes of 54 and 100 are outside the range 20 and 40 as found in Age-14. So which one we have to use here is it Age-14 or Age-15 guidelines?      <!--[if !supportLists]-->¬∑         <!--[endif]--><span dir=\"LTR\"></span>Based on what you found  predict what might be reasonable to expect for the height of the new wooden or steel roller coasters opening soon.      I expect the height would be ft100 and ft150 for the wooden and steel coaster respectively   "
##      Ana  after reading your post  as well as the first post by Pat Engle  it got me thinking about your investigation.  You concluded that ultimately  steel coasters have a higher max height than wood tracks.  An unsurprising result is that steel coasters also have a longer track length than wood tracks.  Pat  however  was interested in whether or not the duration of these rides were different.  In spite of the higher max heights and longer tracks  he was unable to conclude that one type of track had a longer duration.  I wonder how much of this result is confounded by the notion that longer tracks with higher max heights also require a longer climb to get up to the peak.  In my roller coaster riding experience  this seems to be the longest portion of the entire ride.
##      I decided to explore the distribution of fuel type in CODAP. I thought that the majority of vehicles would use regular fuel. My experience with owning cars (and being not rich) has been that I've only ever driven a car that used regular gas. I was surprised by what I saw in the two samples. The first sample (#86) had three more cars that used premium gas than used regular gas  with only one car using diesel. The second sample (#11) had the same number of cars using premium and regular gas  with one car using mid grade and one car using diesel.     Once I looked at the brand name of the cars  the graphs made more sense. It seems like there are a lot of BMWs  Mercedes  Porsches  and other luxury cars  which are probably going to use premium gasoline.     This made me think about the difference between this data set  and what a sample of 100 cars taken from a highway or parking lot would look like. My guess is that a real-life sample of 100 cars would be much different (it would also depend a lot on where you were taking the sample).   <img src=\@@PLUGINFILE@@/Fuel%20Type%201.jpeg\" alt=\"\" width=\"468\" height=\"363\" role=\"presentation\" class=\"img-responsive atto_image_button_text-bottom\">   <img src=\"@@PLUGINFILE@@/Fuel%20Type%202.jpeg\" alt=\"\" width=\"557\" height=\"475\" role=\"presentation\" class=\"img-responsive atto_image_button_text-bottom\">  "
##      <!--[if !supportLists]-->¬∑         <!--[endif]--><span dir=\LTR\"></span>What proportion of vehicles manufactured in 2015 are classified as SUVs (Sport Utility Vehicle)?      Using the CarSample_00  the proportion of vehicles manufactured in 2015 which are classified as SUVs is 61% from a sample of size 100. So  we can say that the proportion of SUVs in the population of vehicles manufactured in 2015 in US is probably approximately two thirds.      <!--[if !supportLists]-->¬∑         <!--[endif]--><span dir=\"LTR\"></span>What was the typical estimated annual fuel cost for vehicles manufactured in 2015?      The estimated annual fuel cost for vehicles manufactured in 2015 is $2300 based on a sample size of 100. The median must be used better than the mean because of two outliers are exist. The data distribution does not follow a normal distribution  where there is a positive skewness         <!--[if !supportLists]-->¬∑         <!--[endif]--><span dir=\"LTR\"></span>What is the typical relationship between a vehicle's fuel economy in the City and Highway?      The relationship between a vehicle's fuel economy in the City and Highway is positive based on the graph of scatter diagram and the regression line. Although the coefficient of determination was found high  approximately 83% we cannot say the regression coefficient (slope) is significant unless we can test it using an appropriate statistic. Of course  we can suggest other attributes to be used in the regression equation  which is called multiple regression equation.   "
##      Hi Erin- There are so many online data portals available it is ironically sometimes hard to find a good one. We have used the following criteria when assisting teachers in assessing the quality of online data portals:  <table border=\1\" cellpadding=\"0\" cellspacing=\"0\">   <tbody><tr>    <td valign=\"top\" width=\"162\">     <b>Quality of Information</b>     </td>    <td valign=\"top\" width=\"162\">     <b>Use</b>     </td>    <td valign=\"top\" width=\"162\">     <b>Do Not Use</b>     </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">     <b>Credibility/Objectivity</b>     </td>    <td valign=\"top\" width=\"162\">     <b> </b>     </td>    <td valign=\"top\" width=\"162\">     <b> </b>     </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">     Authorship - Is the    author/organization clearly identified?     </td>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">     Does the URL/domain name    provide insight about the affiliation?     </td>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">     What is the primary    purpose  and scope of the site?     </td>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">     Is it a well regarded    author or organization?     </td>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">     Are there signs of bias    or data interpretation provided?     
</td>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">     <b>Accuracy/Verifiability</b>     </td>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">     Are the data gathering    methodologies explained?     </td>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">     Is the data time    stamped?     </td>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">     Is the site current?    When was the site last modified?     
</td>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">     <b>Metadata</b>     </td>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">     How was the data collected     </td>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">     When was the data    collected     </td>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">     What organization or    researcher collected the data     </td>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">     <b>Quality of Web Site</b>     </td>    <td valign=\"top\" width=\"162\">     <b>Use</b>     </td>    <td valign=\"top\" width=\"162\">     <b>Do Not Use</b>     </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">     Is the site accessible?    Will it load quickly with many students accessing the site at once? It is    viewable in different browsers? Will it meet the needs of students with    disabilities?     
</td>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">     Is the site easy to    navigate and read?     </td>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">     Usability ‚Äì can you find    the information you need quickly?      </td>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>   </tr>   <tr>    <td valign=\"top\" width=\"162\">     Grade-level appropriate    graphics  data display  language (too much jargon?)     </td>    <td valign=\"top\" width=\"162\">           </td>    <td valign=\"top\" width=\"162\">           </td>   </tr>  </tbody></table>            <b>References</b>      TRIO Training  University  of Washington      <a href=\"http://depts.washington.edu/trio/trioquest/resources/web/assess.php\">http://depts.washington.edu/trio/trioquest/resources/web/assess.php</a>     <b>* </b>Attached also is a list of online oceanographic and environmental science data portals.  "
##      Hi Tom.  I give my strongest recommendation to the Against All Odds series on Learner.org.  They're narrated by Pardis Sabeti - a Harvard Computational Biologist.  I mention them because those videos can serves as a framework for what you want your students to do.  \Here are some professionally done videos.  What are some common approaches used to convey the ideas powerfully and thoroughly?\"  I actually had a group this year do a video dressed as and impersonating Pardis Sabeti and she liked it so much she posted it on her twitter feed.  (Besides being brilliant  she's also an extremely cool and interesting person.)  In other words  it's a good way to convey to kids  \"The medium isn't the thing  it's how the medium is used.\"  That being said  I don't limit or constrain how they do their videos.  I've seen unbelievable rap videos made by these kids that I will forever treasure.     "

findThoughts(forums_stm,
             texts = ts_forum_data_reduced,
             topics = 3, 
             n = 10,
             thresh = 0.2) #topic 3

## 
##  Topic 3: 
##       NCTM and the American Statistical Association (ASA) have great resources for teachers. NCTM has two volumes in their Essential Understandings Series on teaching statistics  one for grades 6-8  and the other for 9-12 (full disclosure: I'm a co-author on the 9-12 volume). These books are for teachers  and focus on topics that are difficult to teach. Even if you teach only grades 9-12  I recommend the 6-8 volume as well. There is also a \Putting Essential Understandings into Practice\" volume that gives concrete suggestions  examples  and activities as to how to teach certain topics (there is a grades 9-12 volume  I'm not sure about a grades 6-8 volume). This link to the education section of the ASA website can direct you to resources that might be useful: http://www.amstat.org/education/index.cfm. Another great (free!) resource is the GAISE book (<em><a href=\"http://www.amstat.org/education/gaise/index.cfm\">Guidelines for the Assessment and Instruction             in Statistics Education</a></em> -- http://www.amstat.org/education/gaise/index.cfm). Best of luck  and feel free to reach out if you want more suggestions. There is lots of great stuff available.  "
##      <p style=\margin: 0in 0in 8pt;\">Through my experiences with middle school students  in  addition to myself being a middle school student  I do agree that students  would be able to answer some of the questions that we were given  but not all   as some involve some critical thinking skills that many students have not  learned yet. Solving problems that involve the mean  median  mode  and range  are all calculations that I remember doing at a middle school age.  Additionally  I do believe that there are many graphs that a middle school  student  and maybe even some elementary school students  could create. As a  high school student  I took statistics as one of my math courses  and many of  the questions that this investigation asked  I did not encounter until my high  school stats class.         "
##      One of our non-honors high school courses includes a 3-4 week unit on statistics. The students are already familiar with mean  mode and median from middle school. It is frustrating that textbook examples often give examples where the mean and median are close to one another; some students feel that either value may suffice  even if one is incrementally \better.\"   To help them appreciate the different measures of center  we do a simple activity which is an eye-opener for many. I ask each student for her age  this class typically has students in grades 10-12 so there is a bit of a range of results  and then we do the typical mean/mode/median analysis. Then  I remind them that there's one more person in the class - me! After having some fun guessing my age  we add my age to the data and recalculate mean/mode/median. The ensuing discussion helps to reinforce that  especially when one-sided outliers are involved  choosing an appropriate measure of center becomes more important. "
##      Hi Meg   Interesting to read your approach about the order of teaching big ideas and engaging students in statistical experiments. As a contrary example  for example  in Spain  teaching statistics and probability starts at very early ages (beginning of the primary school-age of 6). So in this case  how could we teach 6-year old students about big ideas? Should we start teaching stats in high school  instead?   There has been a debate about when to start teaching statistics and probability  and I do not know which camp is right.  Thanks   Kemal
##      Where the majority of my career has been spent in sixth grade and recently switched  to seventh grade  math  I can share that becauSe of the demands of curriculum directors to practice programs \with fidelity \" many middle school teachers do not find time in their year to actually teach the statistics and probability concepts- often times  these concepts appearing toward the end of more traditionally sequenced programs.  Furthermore  because of the isolated nature of many schools and curriculums  obvious chances to conduct quantitative science and other approaches that would integrate statistics in a real-world content.    The bottom line?  Yes  the standards have this content in them.  I would argue  however r that many students would be woefully unprepared to actually succeed due to insignificant or lack of coverage entirely. "
##      I am really interested in taking the Pepsi versus Coke experiment further for grade 12 students  so I found this link: <a href=\https://serc.carleton.edu/sp/library/datasim/examples/cokepepsi.html\">https://serc.carleton.edu/sp/library/datasim/examples/cokepepsi.html</a> <span style=\"color: rgb(119  119  119); font-family: 'Lucida Sans Unicode'  'Lucida Grande'  'Lucida Sans'  Verdana  Helvetica  Arial  sans-serif;\">This lesson plan and activity are based on material from the NSF-funded AIMS Project (Garfield  delMas and Zieffler  2007). For more information contact Joan Garfield at jbg@umn.edu</span>  <span style=\"color: rgb(119  119  119); font-family: 'Lucida Sans Unicode'  'Lucida Grande'  'Lucida Sans'  Verdana  Helvetica  Arial  sans-serif;\"> </span>  <span style=\"color: rgb(119  119  119); font-family: 'Lucida Sans Unicode'  'Lucida Grande'  'Lucida Sans'  Verdana  Helvetica  Arial  sans-serif;\">It discusses how to create an experiment out of this with control and treatment groups.</span>  <span style=\"color: rgb(119  119  119); font-family: 'Lucida Sans Unicode'  'Lucida Grande'  'Lucida Sans'  Verdana  Helvetica  Arial  sans-serif;\"> </span>    "
##      I have seen those IEP specs as well. Many IEP writers believe that they are doing a service to students by requiring step by step prescriptions from the teachers. Could you call for a meeting to modify the IEP writing? From my experience an IEP Addendum usually can be achieved without calling a full scale meeting with all plays participate. Hope this help. I totally agree with you about developing mathematical thinking rather than training for following direction.
##      <div class=\video-text-above\"><section class=\"course-section course-video\">      <div class=\"section-header\">          <h2 class=\"section-label\">             </h2>      </div>      <div class=\"section-content\">                     Chris Franklin's video on  developing the concept of the mean help me understand the SASI framework and the activities that can   be used at levels A  B  and C. The discussion between Chris and Hollylynne also shed light of <i>CCSS 5.MD.2. Make a line plot to display a data set of measurement... For example  given different measurements of liquid in identical bekers  find the amount of liquid each beaker would contain if the total amount in all the beakers were redistributed equally.  </i>I can see  how the concept of mean progresses through the grade level stating from grade 5.       </div>  </section></div>  "
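`findThoughts()` ranks posts by their estimated proportion of the requested topic and, with `thresh`, keeps only posts whose proportion meets the threshold. That is consistent with fewer than `n = 10` posts being printed above (6 for topic 2 at `thresh = 0.5`, 8 for topic 3 at `thresh = 0.2`); because the two calls use different thresholds, the two lists are not directly comparable. A base-R sketch of that selection logic, using a hypothetical helper (not stm's source code) on an invented theta matrix; in the lab the matrix would be `forums_stm$theta`:

```r
# Hypothetical helper mirroring findThoughts()'s selection: the top n documents by
# topic proportion, keeping only those at or above thresh
select_thoughts <- function(theta, topic, n, thresh = 0) {
  ranked <- order(theta[, topic], decreasing = TRUE)
  ranked <- ranked[seq_len(min(n, length(ranked)))]
  ranked[theta[ranked, topic] >= thresh]
}

# Invented document-by-topic proportions for 4 documents and 2 topics
theta_demo <- rbind(c(0.80, 0.20),
                    c(0.30, 0.70),
                    c(0.55, 0.45),
                    c(0.10, 0.90))

select_thoughts(theta_demo, topic = 1, n = 10, thresh = 0.5)
```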

Topic Modeling Badge

LASER Institute TM Learning Lab 3

Dr. Shiyan Jiang

July 21, 2023
