The final activity for each learning lab provides space to work with data and to reflect on how the concepts and techniques introduced in each lab might apply to your own research.
To earn a badge for each lab, you are required to respond to a set of prompts in two parts:
In Part I, you will reflect on your understanding of key concepts and begin to think about potential next steps for your own study.
In Part II, you will create a simple data product in R that demonstrates your ability to apply a data analysis technique introduced in this learning lab.
Use the institutional library (e.g., NCSU Library), Google Scholar, or a search engine to locate a research article, presentation, or resource that applies text mining to an educational context or topic of interest. More specifically, locate a text mining study that visualizes text data.
Provide an APA citation for your selected study.
How does topic modeling address research questions?
Draft a research question for a population you may be interested in studying, or that would be of interest to educational researchers, and that would require the collection of text data. Then answer the following questions:
What text data would need to be collected?
For what reason would text data need to be collected in order to address this question?
Explain the analytical level at which these text data would need to be collected and analyzed.
Use your case study file to try a small number of topics (e.g., 3) or a large number of topics (e.g., 30), and explain how changing the number of topics shapes the way you interpret results.
Changing the number of topics leads to a completely different set of topics. With 3 topics, the interpretations differ from those listed in the case study, and they may or may not align with the corresponding theory. It is therefore worth exploring different numbers of topics while conducting the analysis.
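The effect of the topic count can be checked by fitting the same document-term matrix with two values of `k` and comparing the top terms side by side. The following is a minimal sketch, not the lab's own code: `toy_posts`, `toy_dtm`, and `fit_lda()` are hypothetical names, and the six short posts are invented stand-ins for the forum data.

```r
library(dplyr)
library(tibble)
library(tidytext)
library(topicmodels)

# Hypothetical toy corpus standing in for the forum posts (invented data).
toy_posts <- tibble(
  post_id = as.character(1:6),
  post_content = c(
    "students collect data and ask questions about data",
    "students compare mean and median in class",
    "teachers share resources and links for statistics lessons",
    "resources for teachers include videos and lesson plans",
    "students graph data and discuss outliers in class",
    "statistics resources help teachers plan lessons"
  )
)

# Same tidy -> document-term matrix pipeline used for the forum data.
toy_dtm <- toy_posts %>%
  unnest_tokens(output = word, input = post_content) %>%
  anti_join(stop_words, by = "word") %>%
  count(post_id, word) %>%
  cast_dtm(post_id, word, n)

# Hypothetical helper: fit LDA for a given number of topics with a fixed seed.
fit_lda <- function(k) {
  LDA(toy_dtm, k = k, control = list(seed = 588))
}

lda_small <- fit_lda(2)
lda_large <- fit_lda(4)

# Top 3 terms per topic; the topic sets differ once k changes.
terms(lda_small, 3)
terms(lda_large, 3)
```

With the real data, `toy_dtm` would be replaced by `forums_dtm` and the two values of `k` by, for example, 3 and 30.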
I highly recommend creating a new R script in your lab-3 folder to complete this task. When your code is ready to share, use the code chunk below to share the final code for your model and answer the questions that follow.
# YOUR FINAL CODE HERE
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidytext)
library(SnowballC)
library(topicmodels)
library(stm)
## stm v1.3.6 successfully loaded. See ?stm for help.
## Papers, resources, and other materials at structuraltopicmodel.com
library(ldatuning)
library(knitr)
library(LDAvis)
# Read data
ts_forum_data <- read_csv("/cloud/project/lab-3/data/ts_forum_data.csv",
                          col_types = cols(course_id = col_character(),
                                           forum_id = col_character(),
                                           discussion_id = col_character(),
                                           post_id = col_character()
                                           )
                          )
# Tidying data
forums_tidy <- ts_forum_data %>%
  unnest_tokens(output = word, input = post_content) %>%
  anti_join(stop_words, by = "word")
forums_tidy
## # A tibble: 192,160 × 14
## course_id course_name forum_id forum_name discussion_id discussion_name
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 9 Teaching Statist… 126 Investiga… 6822 Not much compa…
## 2 9 Teaching Statist… 126 Investiga… 6822 Not much compa…
## 3 9 Teaching Statist… 126 Investiga… 6822 Not much compa…
## 4 9 Teaching Statist… 126 Investiga… 6822 Not much compa…
## 5 9 Teaching Statist… 126 Investiga… 6822 Not much compa…
## 6 9 Teaching Statist… 126 Investiga… 6822 Not much compa…
## 7 9 Teaching Statist… 126 Investiga… 6822 Not much compa…
## 8 9 Teaching Statist… 126 Investiga… 6822 Not much compa…
## 9 9 Teaching Statist… 126 Investiga… 6822 Not much compa…
## 10 9 Teaching Statist… 126 Investiga… 6822 Not much compa…
## # ℹ 192,150 more rows
## # ℹ 8 more variables: discussion_creator <dbl>, discussion_poster <dbl>,
## # discussion_reference <chr>, parent_id <dbl>, post_date <chr>,
## # post_id <chr>, post_title <chr>, word <chr>
forums_tidy %>%
  count(word, sort = TRUE)
## # A tibble: 13,620 × 2
## word n
## <chr> <int>
## 1 students 6841
## 2 data 4365
## 3 statistics 3103
## 4 school 1488
## 5 questions 1470
## 6 class 1426
## 7 font 1311
## 8 span 1267
## 9 time 1253
## 10 style 1150
## # ℹ 13,610 more rows
forums_dtm <- forums_tidy %>%
  count(post_id, word) %>%
  cast_dtm(post_id, word, n)
# Stemming for STM (stm is already loaded above)
temp <- textProcessor(ts_forum_data$post_content,
                      metadata = ts_forum_data,
                      lowercase = TRUE,
                      removestopwords = TRUE,
                      removenumbers = TRUE,
                      removepunctuation = TRUE,
                      wordLengths = c(3, Inf),
                      stem = TRUE,
                      onlycharacter = FALSE,
                      striphtml = TRUE,
                      customstopwords = NULL)
## Building corpus...
## Converting to Lower Case...
## Removing punctuation...
## Removing stopwords...
## Removing numbers...
## Stemming...
## Creating Output...
meta <- temp$meta
vocab <- temp$vocab
docs <- temp$documents
stemmed_forums <- ts_forum_data %>%
  unnest_tokens(output = word, input = post_content) %>%
  anti_join(stop_words, by = "word") %>%
  mutate(stem = wordStem(word))
stemmed_forums
## # A tibble: 192,160 × 15
## course_id course_name forum_id forum_name discussion_id discussion_name
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 9 Teaching Statist… 126 Investiga… 6822 Not much compa…
## 2 9 Teaching Statist… 126 Investiga… 6822 Not much compa…
## 3 9 Teaching Statist… 126 Investiga… 6822 Not much compa…
## 4 9 Teaching Statist… 126 Investiga… 6822 Not much compa…
## 5 9 Teaching Statist… 126 Investiga… 6822 Not much compa…
## 6 9 Teaching Statist… 126 Investiga… 6822 Not much compa…
## 7 9 Teaching Statist… 126 Investiga… 6822 Not much compa…
## 8 9 Teaching Statist… 126 Investiga… 6822 Not much compa…
## 9 9 Teaching Statist… 126 Investiga… 6822 Not much compa…
## 10 9 Teaching Statist… 126 Investiga… 6822 Not much compa…
## # ℹ 192,150 more rows
## # ℹ 9 more variables: discussion_creator <dbl>, discussion_poster <dbl>,
## # discussion_reference <chr>, parent_id <dbl>, post_date <chr>,
## # post_id <chr>, post_title <chr>, word <chr>, stem <chr>
#Model Fitting: LDA
n_distinct(ts_forum_data$forum_name)
## [1] 21
forums_lda <- LDA(forums_dtm,
                  k = 3,
                  control = list(seed = 588)
                  )
forums_lda
## A LDA_VEM topic model with 3 topics.
#Model Fitting: STM
docs <- temp$documents
meta <- temp$meta
vocab <- temp$vocab
forums_stm <- stm(documents = docs,
                  data = meta,
                  vocab = vocab,
                  prevalence = ~ course_id + forum_id,
                  K = 3,
                  max.em.its = 25,
                  verbose = FALSE)
forums_stm
## A topic model with 3 topics, 5781 documents and a 7820 word dictionary.
toLDAvis(mod = forums_stm, docs = docs)
## Loading required namespace: servr
#Exploring beta values
terms(forums_lda, 5)
## Topic 1 Topic 2 Topic 3
## [1,] "font" "statistics" "students"
## [2,] "span" "href" "data"
## [3,] "style" "li" "statistics"
## [4,] "text" "strong" "questions"
## [5,] "normal" "https" "school"
tidy_lda <- tidy(forums_lda)
tidy_lda
## # A tibble: 40,860 × 3
## topic term beta
## <int> <chr> <dbl>
## 1 1 2015 1.98e- 4
## 2 2 2015 5.59e- 4
## 3 3 2015 5.69e- 5
## 4 1 21 1.44e-40
## 5 2 21 1.32e- 4
## 6 3 21 1.29e-17
## 7 1 beginning 5.02e- 5
## 8 2 beginning 1.34e- 4
## 9 3 beginning 8.14e- 4
## 10 1 content 5.24e- 4
## # ℹ 40,850 more rows
top_terms <- tidy_lda %>%
  group_by(topic) %>%
  slice_max(beta, n = 5, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(topic, -beta)
top_terms %>%
  mutate(term = reorder_within(term, beta, topic)) %>%
  group_by(topic, term) %>%
  arrange(desc(beta)) %>%
  ungroup() %>%
  ggplot(aes(beta, term, fill = as.factor(topic))) +
  geom_col(show.legend = FALSE) +
  scale_y_reordered() +
  labs(title = "Top 5 terms in each LDA topic",
       x = expression(beta), y = NULL) +
  facet_wrap(~ topic, ncol = 4, scales = "free")
#exploring gamma value
td_beta <- tidy(forums_lda)
td_beta
## # A tibble: 40,860 × 3
## topic term beta
## <int> <chr> <dbl>
## 1 1 2015 1.98e- 4
## 2 2 2015 5.59e- 4
## 3 3 2015 5.69e- 5
## 4 1 21 1.44e-40
## 5 2 21 1.32e- 4
## 6 3 21 1.29e-17
## 7 1 beginning 5.02e- 5
## 8 2 beginning 1.34e- 4
## 9 3 beginning 8.14e- 4
## 10 1 content 5.24e- 4
## # ℹ 40,850 more rows
td_gamma <- tidy(forums_lda, matrix = "gamma")
td_gamma
## # A tibble: 17,298 × 3
## document topic gamma
## <chr> <int> <dbl>
## 1 11295 1 0.00335
## 2 12711 1 0.000413
## 3 12725 1 0.0717
## 4 12733 1 0.00393
## 5 12743 1 0.0146
## 6 12744 1 0.00688
## 7 12756 1 0.0717
## 8 12757 1 0.00500
## 9 12775 1 0.110
## 10 12816 1 0.00500
## # ℹ 17,288 more rows
top_terms <- td_beta %>%
  arrange(beta) %>%
  group_by(topic) %>%
  top_n(7, beta) %>%
  arrange(-beta) %>%
  summarise(terms = list(term)) %>%
  mutate(terms = map(terms, paste, collapse = ", ")) %>%
  # unnest() requires an explicit `cols` argument in current tidyr
  unnest(cols = c(terms))
top_terms
## # A tibble: 3 × 2
## topic terms
## <int> <chr>
## 1 1 font, span, style, text, normal, 0px, height
## 2 2 statistics, href, li, strong, https, resources, target
## 3 3 students, data, statistics, questions, school, class, time
gamma_terms <- td_gamma %>%
  group_by(topic) %>%
  summarise(gamma = mean(gamma)) %>%
  arrange(desc(gamma)) %>%
  left_join(top_terms, by = "topic") %>%
  mutate(topic = paste0("Topic ", topic),
         topic = reorder(topic, gamma))
gamma_terms %>%
  kable(digits = 3,
        col.names = c("Topic", "Expected topic proportion", "Top 7 terms"))
| Topic | Expected topic proportion | Top 7 terms |
|---|---|---|
| Topic 3 | 0.780 | students, data, statistics, questions, school, class, time |
| Topic 2 | 0.176 | statistics, href, li, strong, https, resources, target |
| Topic 1 | 0.044 | font, span, style, text, normal, 0px, height |
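Topics 1 and 2 in the table above consist largely of HTML tokens (`font`, `span`, `href`, `li`), which suggests the posts were tokenized with their markup still in place. The `textProcessor()` call handles this for the STM input through `striphtml = TRUE`, but the tidytext pipeline has no equivalent step. The following is a minimal sketch of one way to strip tags before `unnest_tokens()`, assuming a simple regular expression is good enough for this markup. The `toy_posts` data and the `clean_tokens` name are hypothetical.

```r
library(dplyr)
library(stringr)
library(tibble)
library(tidytext)

# Hypothetical posts that mimic the markup seen in the forum data.
toy_posts <- tibble(
  post_id = c("1", "2"),
  post_content = c(
    "<span style=\"font-family: Arial;\">Students explored the data</span>",
    "<p>See <a href=\"https://example.org\">these resources</a> for statistics</p>"
  )
)

clean_tokens <- toy_posts %>%
  mutate(
    # Replace every tag (including its attributes) with a space,
    # then collapse the leftover whitespace.
    post_content = str_squish(str_replace_all(post_content, "<[^>]+>", " "))
  ) %>%
  unnest_tokens(output = word, input = post_content) %>%
  anti_join(stop_words, by = "word")

clean_tokens %>%
  count(word, sort = TRUE)
```

A regular expression will not cover every case (for example, HTML entities are left untouched), so the top terms are worth checking again after cleaning.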
#reading tea leaves
ts_forum_data_reduced <- ts_forum_data$post_content[-temp$docs.removed]
findThoughts(forums_stm,
             texts = ts_forum_data_reduced,
             topics = 2,
             n = 10,
             thresh = 0.5) # topic 2
##
## Topic 2:
## <!--[if !supportLists]-->· <!--[endif]--><span dir=\LTR\"></span>Do wooden coasters tend to have the same maximum height as coasters made from steel? Does anything surprise you? The height of steel coaster tends to be higher than the height of wooden coaster back in the populations. What surprising me: Although the box-plot of height of both types of coasters have outliers the mean and median in both are almost same. In addition sample observations looks like follow approximately normal distribution although outliers are exist. In terms of spread the observations of wooden type which is the old version is more homogeneous than the steel type which is the newest version. In addition the standard deviation of steel group is almost double compared to the wooden group <!--[if !supportLists]-->· <!--[endif]--><span dir=\"LTR\"></span>Do steel roller coasters tend to have longer drops than wooden roller coasters? The drop of steel coaster tends to be longer than the drop of wooden coaster back in the populations because the median for the drops of steel coaster lies outside the box of wooden coaster (more than half of the steel coaster are above than three quarters of the wooden group). I confused about using Age-14 or Age-15 guidelines in the guidelines for analysis file. The sample sizes of 54 and 100 are outside the range 20 and 40 as found in Age-14. So which one we have to use here is it Age-14 or Age-15 guidelines? <!--[if !supportLists]-->· <!--[endif]--><span dir=\"LTR\"></span>Based on what you found predict what might be reasonable to expect for the height of the new wooden or steel roller coasters opening soon. I expect the height would be ft100 and ft150 for the wooden and steel coaster respectively "
## Ana after reading your post as well as the first post by Pat Engle it got me thinking about your investigation. You concluded that ultimately steel coasters have a higher max height than wood tracks. An unsurprising result is that steel coasters also have a longer track length than wood tracks. Pat however was interested in whether or not the duration of these rides were different. In spite of the higher max heights and longer tracks he was unable to conclude that one type of track had a longer duration. I wonder how much of this result is confounded by the notion that longer tracks with higher max heights also require a longer climb to get up to the peak. In my roller coaster riding experience this seems to be the longest portion of the entire ride.
## I decided to explore the distribution of fuel type in CODAP. I thought that the majority of vehicles would use regular fuel. My experience with owning cars (and being not rich) has been that I've only ever driven a car that used regular gas. I was surprised by what I saw in the two samples. The first sample (#86) had three more cars that used premium gas than used regular gas with only one car using diesel. The second sample (#11) had the same number of cars using premium and regular gas with one car using mid grade and one car using diesel. Once I looked at the brand name of the cars the graphs made more sense. It seems like there are a lot of BMWs Mercedes Porsches and other luxury cars which are probably going to use premium gasoline. This made me think about the difference between this data set and what a sample of 100 cars taken from a highway or parking lot would look like. My guess is that a real-life sample of 100 cars would be much different (it would also depend a lot on where you were taking the sample). <img src=\@@PLUGINFILE@@/Fuel%20Type%201.jpeg\" alt=\"\" width=\"468\" height=\"363\" role=\"presentation\" class=\"img-responsive atto_image_button_text-bottom\"> <img src=\"@@PLUGINFILE@@/Fuel%20Type%202.jpeg\" alt=\"\" width=\"557\" height=\"475\" role=\"presentation\" class=\"img-responsive atto_image_button_text-bottom\"> "
## <!--[if !supportLists]-->· <!--[endif]--><span dir=\LTR\"></span>What proportion of vehicles manufactured in 2015 are classified as SUVs (Sport Utility Vehicle)? Using the CarSample_00 the proportion of vehicles manufactured in 2015 which are classified as SUVs is 61% from a sample of size 100. So we can say that the proportion of SUVs in the population of vehicles manufactured in 2015 in US is probably approximately two thirds. <!--[if !supportLists]-->· <!--[endif]--><span dir=\"LTR\"></span>What was the typical estimated annual fuel cost for vehicles manufactured in 2015? The estimated annual fuel cost for vehicles manufactured in 2015 is $2300 based on a sample size of 100. The median must be used better than the mean because of two outliers are exist. The data distribution does not follow a normal distribution where there is a positive skewness <!--[if !supportLists]-->· <!--[endif]--><span dir=\"LTR\"></span>What is the typical relationship between a vehicle's fuel economy in the City and Highway? The relationship between a vehicle's fuel economy in the City and Highway is positive based on the graph of scatter diagram and the regression line. Although the coefficient of determination was found high approximately 83% we cannot say the regression coefficient (slope) is significant unless we can test it using an appropriate statistic. Of course we can suggest other attributes to be used in the regression equation which is called multiple regression equation. "
## Hi Erin- There are so many online data portals available it is ironically sometimes hard to find a good one. We have used the following criteria when assisting teachers in assessing the quality of online data portals: <table border=\1\" cellpadding=\"0\" cellspacing=\"0\"> <tbody><tr> <td valign=\"top\" width=\"162\"> <b>Quality of Information</b> </td> <td valign=\"top\" width=\"162\"> <b>Use</b> </td> <td valign=\"top\" width=\"162\"> <b>Do Not Use</b> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> <b>Credibility/Objectivity</b> </td> <td valign=\"top\" width=\"162\"> <b> </b> </td> <td valign=\"top\" width=\"162\"> <b> </b> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> Authorship - Is the author/organization clearly identified? </td> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> Does the URL/domain name provide insight about the affiliation? </td> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> What is the primary purpose and scope of the site? </td> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> Is it a well regarded author or organization? </td> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> Are there signs of bias or data interpretation provided? 
</td> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> <b>Accuracy/Verifiability</b> </td> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> Are the data gathering methodologies explained? </td> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> Is the data time stamped? </td> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> Is the site current? When was the site last modified? </td> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> <b>Metadata</b> </td> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> How was the data collected </td> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> When was the data collected </td> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> What organization or researcher collected the data </td> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> </td> <td 
valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> <b>Quality of Web Site</b> </td> <td valign=\"top\" width=\"162\"> <b>Use</b> </td> <td valign=\"top\" width=\"162\"> <b>Do Not Use</b> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> Is the site accessible? Will it load quickly with many students accessing the site at once? It is viewable in different browsers? Will it meet the needs of students with disabilities? </td> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> Is the site easy to navigate and read? </td> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> Usability – can you find the information you need quickly? </td> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> </tr> <tr> <td valign=\"top\" width=\"162\"> Grade-level appropriate graphics data display language (too much jargon?) </td> <td valign=\"top\" width=\"162\"> </td> <td valign=\"top\" width=\"162\"> </td> </tr> </tbody></table> <b>References</b> TRIO Training University of Washington <a href=\"http://depts.washington.edu/trio/trioquest/resources/web/assess.php\">http://depts.washington.edu/trio/trioquest/resources/web/assess.php</a> <b>* </b>Attached also is a list of online oceanographic and environmental science data portals. "
## Hi Tom. I give my strongest recommendation to the Against All Odds series on Learner.org. They're narrated by Pardis Sabeti - a Harvard Computational Biologist. I mention them because those videos can serves as a framework for what you want your students to do. \Here are some professionally done videos. What are some common approaches used to convey the ideas powerfully and thoroughly?\" I actually had a group this year do a video dressed as and impersonating Pardis Sabeti and she liked it so much she posted it on her twitter feed. (Besides being brilliant she's also an extremely cool and interesting person.) In other words it's a good way to convey to kids \"The medium isn't the thing it's how the medium is used.\" That being said I don't limit or constrain how they do their videos. I've seen unbelievable rap videos made by these kids that I will forever treasure. "
findThoughts(forums_stm,
             texts = ts_forum_data_reduced,
             topics = 3,
             n = 10,
             thresh = 0.2) # topic 3
##
## Topic 3:
## NCTM and the American Statistical Association (ASA) have great resources for teachers. NCTM has two volumes in their Essential Understandings Series on teaching statistics one for grades 6-8 and the other for 9-12 (full disclosure: I'm a co-author on the 9-12 volume). These books are for teachers and focus on topics that are difficult to teach. Even if you teach only grades 9-12 I recommend the 6-8 volume as well. There is also a \Putting Essential Understandings into Practice\" volume that gives concrete suggestions examples and activities as to how to teach certain topics (there is a grades 9-12 volume I'm not sure about a grades 6-8 volume). This link to the education section of the ASA website can direct you to resources that might be useful: http://www.amstat.org/education/index.cfm. Another great (free!) resource is the GAISE book (<em><a href=\"http://www.amstat.org/education/gaise/index.cfm\">Guidelines for the Assessment and Instruction in Statistics Education</a></em> -- http://www.amstat.org/education/gaise/index.cfm). Best of luck and feel free to reach out if you want more suggestions. There is lots of great stuff available. "
## <p style=\margin: 0in 0in 8pt;\">Through my experiences with middle school students in addition to myself being a middle school student I do agree that students would be able to answer some of the questions that we were given but not all as some involve some critical thinking skills that many students have not learned yet. Solving problems that involve the mean median mode and range are all calculations that I remember doing at a middle school age. Additionally I do believe that there are many graphs that a middle school student and maybe even some elementary school students could create. As a high school student I took statistics as one of my math courses and many of the questions that this investigation asked I did not encounter until my high school stats class. "
## One of our non-honors high school courses includes a 3-4 week unit on statistics. The students are already familiar with mean mode and median from middle school. It is frustrating that textbook examples often give examples where the mean and median are close to one another; some students feel that either value may suffice even if one is incrementally \better.\" To help them appreciate the different measures of center we do a simple activity which is an eye-opener for many. I ask each student for her age this class typically has students in grades 10-12 so there is a bit of a range of results and then we do the typical mean/mode/median analysis. Then I remind them that there's one more person in the class - me! After having some fun guessing my age we add my age to the data and recalculate mean/mode/median. The ensuing discussion helps to reinforce that especially when one-sided outliers are involved choosing an appropriate measure of center becomes more important. "
## Hi Meg Interesting to read your approach about the order of teaching big ideas and engaging students in statistical experiments. As a contrary example for example in Spain teaching statistics and probability starts at very early ages (beginning of the primary school-age of 6). So in this case how could we teach 6-year old students about big ideas? Should we start teaching stats in high school instead? There has been a debate about when to start teaching statistics and probability and I do not know which camp is right. Thanks Kemal
## Where the majority of my career has been spent in sixth grade and recently switched to seventh grade math I can share that becauSe of the demands of curriculum directors to practice programs \with fidelity \" many middle school teachers do not find time in their year to actually teach the statistics and probability concepts- often times these concepts appearing toward the end of more traditionally sequenced programs. Furthermore because of the isolated nature of many schools and curriculums obvious chances to conduct quantitative science and other approaches that would integrate statistics in a real-world content. The bottom line? Yes the standards have this content in them. I would argue however r that many students would be woefully unprepared to actually succeed due to insignificant or lack of coverage entirely. "
## I am really interested in taking the Pepsi versus Coke experiment further for grade 12 students so I found this link: <a href=\https://serc.carleton.edu/sp/library/datasim/examples/cokepepsi.html\">https://serc.carleton.edu/sp/library/datasim/examples/cokepepsi.html</a> <span style=\"color: rgb(119 119 119); font-family: 'Lucida Sans Unicode' 'Lucida Grande' 'Lucida Sans' Verdana Helvetica Arial sans-serif;\">This lesson plan and activity are based on material from the NSF-funded AIMS Project (Garfield delMas and Zieffler 2007). For more information contact Joan Garfield at jbg@umn.edu</span> <span style=\"color: rgb(119 119 119); font-family: 'Lucida Sans Unicode' 'Lucida Grande' 'Lucida Sans' Verdana Helvetica Arial sans-serif;\"> </span> <span style=\"color: rgb(119 119 119); font-family: 'Lucida Sans Unicode' 'Lucida Grande' 'Lucida Sans' Verdana Helvetica Arial sans-serif;\">It discusses how to create an experiment out of this with control and treatment groups.</span> <span style=\"color: rgb(119 119 119); font-family: 'Lucida Sans Unicode' 'Lucida Grande' 'Lucida Sans' Verdana Helvetica Arial sans-serif;\"> </span> "
## I have seen those IEP specs as well. Many IEP writers believe that they are doing a service to students by requiring step by step prescriptions from the teachers. Could you call for a meeting to modify the IEP writing? From my experience an IEP Addendum usually can be achieved without calling a full scale meeting with all plays participate. Hope this help. I totally agree with you about developing mathematical thinking rather than training for following direction.
## <div class=\video-text-above\"><section class=\"course-section course-video\"> <div class=\"section-header\"> <h2 class=\"section-label\"> </h2> </div> <div class=\"section-content\"> Chris Franklin's video on developing the concept of the mean help me understand the SASI framework and the activities that can be used at levels A B and C. The discussion between Chris and Hollylynne also shed light of <i>CCSS 5.MD.2. Make a line plot to display a data set of measurement... For example given different measurements of liquid in identical bekers find the amount of liquid each beaker would contain if the total amount in all the beakers were redistributed equally. </i>I can see how the concept of mean progresses through the grade level stating from grade 5. </div> </section></div> "
Congratulations, you’ve completed your Intro to text mining Badge! Complete the following steps in the orientation to submit your work for review.