DAM1_Assignment_Fall2024

Author

Steve Hoffman

Purpose of the Data-Analytic Memo

Here you will conduct analyses on behalf of a trio of 3rd-grade teachers at Lindquist Elementary School in Hometown, USA. The Principal of the school has asked you to help these teachers analyze some of their preliminary in-house assessments. Ms. Affolter, Mr. Miller-Lane, and Ms. Weston teach 3rd grade and often plan their instruction together. After one month of school, they compiled scores on a comprehensive 64-item spelling test covering the words they have agreed to teach (16 words each week). They also administered the first 40-item unit test from their mathematics textbook. Ms. Weston entered the scores for all students on the 3rd-grade team into a Google Sheet and shared it with the Principal. Your task, then, is to make sense of these data and report your suggestions back to the teaching team.

Remember that this assignment is all about collaborative learning. It is an opportunity to try out your skills as a coder and your understanding of educational testing. Please ensure that you engage in a full, fair, and mutually agreeable collaboration with your partner(s). Do not simply divide the work. Discuss and plan your analyses together, debate your findings with each other, and collaborate on the writing.

Data set

The data are contained in a Google Sheet:

https://docs.google.com/spreadsheets/d/1778B79za0VK7-OnSNGeybGRR0F_4AlSxd2n64Nh9LiE/edit?usp=sharing

Ms. Weston included the following details.

Boys were coded as 1 and girls as 2. However, she left the gender field blank for three students, Morgan, Paige, and River, who are to be categorized as nonbinary. The teachers ask that you work out how to represent this in the data.

The three classrooms are labeled 1 for Ms. Affolter’s class, 2 for Mr. Miller-Lane’s class, and 3 for Ms. Weston’s class.

Organizing your work

When you finish this assignment, you will submit a .qmd file with your names on it to Canvas. (Partners need to submit only one copy of their joint DAM.) Getting organized to produce the final product takes some work, though. Here's how I might approach it:

  • Start a New Project called DAM1_Ella_Mo_Felix or something similar.

  • Within your Project, place the starter script called DAM1_starter.R in the appropriate folder.

  • Add your new code and learnings to the DAM1_starter.R file. This file will NOT be turned in, but it can be a source of ideas and code for future work.

  • Start a Quarto document whose file name includes your names (e.g., DAM1_Ella_Mo_Felix.qmd).

  • Communicate with your partner(s) often!

Data Analysis

Task 1

Import the DAM1_fall2024 data into R and print the data set without alteration.

Task 2 - Clean the data set

  • Produce suitable labels for the assigned classrooms (Affolter, Miller-Lane, Weston).

  • Produce suitable labels for student gender.

  • Print the data set with labels for classroom and gender.

    NOTE: Identifying the nonbinary students is an issue that isn't easily solved by following the book. If you get help from colleagues, please note their contributions to your code.
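For illustration, one way to attach labels to numeric codes in base R is `factor()` with its `levels` and `labels` arguments. The sketch below uses a small made-up data frame; the column names `name`, `classroom`, and `gender` are assumptions and may not match the real sheet. It deliberately leaves blank gender cells as `NA`, so working out the nonbinary category remains part of your task.

```r
# Toy data frame (hypothetical students and codes, not the real sheet)
dam <- data.frame(
  name      = c("Student A", "Student B", "Student C"),
  classroom = c(1, 3, 2),
  gender    = c(2, NA, 1),
  stringsAsFactors = FALSE
)

# Classroom codes: 1 = Affolter, 2 = Miller-Lane, 3 = Weston
dam$classroom <- factor(dam$classroom,
                        levels = c(1, 2, 3),
                        labels = c("Affolter", "Miller-Lane", "Weston"))

# Gender codes: 1 = boy, 2 = girl; blank cells remain NA in this sketch
dam$gender <- factor(dam$gender,
                     levels = c(1, 2),
                     labels = c("Boy", "Girl"))

print(dam)
```

Because `labels` is matched position by position to `levels`, listing the codes explicitly (rather than relying on their sorted order) keeps each teacher's name attached to the right code.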

Task 3 - Describe the Spelling scores

  • Produce a histogram of all 75 scores on the 64-word Spelling test (similar to those on page 36 or 37 of the Thorndike text). Do not designate classrooms for this task.

  • Calculate the median score and state it.

  • Calculate the minimum and maximum Spelling scores and state them.

  • In one short paragraph, describe the shape, center, and spread of the Spelling scores for all 75 students.

  • Produce a separate density plot of all 75 scores on the Spelling test. (This is simply a different plot of the same data. Again, do not designate classrooms for this task.)

  • In one short paragraph, describe the shape, center, and spread of the Spelling scores for the 75 students.

  • In one paragraph, state the advantages of a histogram and of a density plot, and which one you prefer for this task.
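A minimal base-R sketch of the summary statistics and the two plots, using a short made-up vector in place of the real Spelling column. In the R4DS style, `ggplot2::geom_histogram()` and `ggplot2::geom_density()` produce the equivalent plots; base `hist()` and `density()` are used here only to keep the example self-contained. The same pattern applies to the Math scores in Task 4.

```r
# Hypothetical Spelling scores (out of 64), for illustration only
spelling <- c(40, 52, 47, 61, 35, 58, 47, 50)

spelling_median <- median(spelling)  # center
spelling_min <- min(spelling)        # lowest score
spelling_max <- max(spelling)        # highest score
cat("Median:", spelling_median, " Min:", spelling_min, " Max:", spelling_max, "\n")

# Histogram with 4-point-wide bins spanning the possible range 0 to 64
hist(spelling, breaks = seq(0, 64, by = 4),
     main = "Spelling scores, all students",
     xlab = "Score (out of 64)")

# Kernel density estimate of the same scores
plot(density(spelling),
     main = "Spelling scores, all students",
     xlab = "Score (out of 64)")
```

With these toy values the median is 48.5 (the mean of the two middle scores, 47 and 50), the minimum is 35, and the maximum is 61. The bin width in `hist()` is a choice worth discussing with your partners, since it changes how the shape looks.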

Task 4 - Describe the Math scores

  • Produce a histogram of all 75 scores on the 40-item Math test. (Do not designate classrooms for this task.)

  • Calculate the median score and state it.

  • Calculate the minimum and maximum Math scores and state them.

  • In one short paragraph, describe the shape, center, and spread of the Math scores for all 75 students.

  • Produce a separate density plot of all 75 scores on the Math test. (This is simply a different plot of the same data. Again, do not designate classrooms for this task.)

  • In one short paragraph, describe the shape, center, and spread of the Math scores for the 75 students.

  • In one paragraph, state the advantages of a histogram and of a density plot, and which one you prefer for this task.

Task 5 - A scatterplot

Display the relationship between Spelling (on the x-axis) and Math (on the y-axis) as a scatterplot. Make sure that the axes are labeled appropriately and that the plot has a suitable title. Note: Again, this plot should treat all 75 students as a whole, without designating classrooms. Plotting all points in black is appropriate.
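A minimal base-R sketch of a labeled, titled scatterplot, assuming hypothetical columns `spelling` and `math` in a made-up data frame. In the R4DS style, the equivalent is `ggplot(data, aes(x = spelling, y = math)) + geom_point()` with `labs()` supplying the axis titles and plot title.

```r
# Hypothetical paired scores (toy data, not the real sheet)
scores <- data.frame(
  spelling = c(35, 42, 47, 50, 58, 61),  # out of 64
  math     = c(20, 24, 22, 30, 33, 36)   # out of 40
)

# Spelling on the x-axis, Math on the y-axis, all points in solid black
plot(scores$spelling, scores$math,
     pch = 19, col = "black",
     xlab = "Spelling score (out of 64)",
     ylab = "Math score (out of 40)",
     main = "Math vs. Spelling scores, all students")
```

Stating the maximum possible score in each axis label helps readers compare a 64-item test with a 40-item test.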

Task 6 - Analyze the scatterplot

Briefly describe the nature of any relationship you observe on the scatterplot between Math scores and Spelling scores for the full sample of 75 children.

Task 7 - Scatterplots by Classroom

  • Produce a scatterplot showing the relationship between Math scores and Spelling scores AND the classroom to which each student is assigned.

  • Briefly describe any relationship you notice among Math scores, Spelling scores, and classroom assignment. (If classroom shows no notable relationship, say so.)

Task 8 - Boxplots of Spelling scores by Classroom

Produce boxplots of Spelling scores for the three classrooms, displayed together in a single graphical exhibit.

Task 9 - Describing the boxplots

Using R4DS section 1.5.1 as a guide, describe what this figure with three boxplots shows by completing the following:

  • State and interpret the value of the median, the 25th percentile, the 75th percentile, and the interquartile range of Spelling scores for Ms. Affolter’s classroom. If any outlying data points are apparent for her classroom, identify the student or students by name, and explain what you noted.

  • State and interpret the value of the median, the 25th percentile, the 75th percentile, and the interquartile range of Spelling scores for Mr. Miller-Lane’s classroom. If any outlying data points are apparent for his classroom, identify the student or students by name, and explain what you noted.

  • State and interpret the value of the median, the 25th percentile, the 75th percentile, and the interquartile range of Spelling scores for Ms. Weston’s classroom. If any outlying data points are apparent for her classroom, identify the student or students by name, and explain what you noted.
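To check the values you read off the boxplots, a sketch like the one below computes the quartiles and interquartile range (IQR) for each classroom and flags points lying more than 1.5 times the IQR beyond the edges of the box, the conventional boxplot rule (also the default in ggplot2's `geom_boxplot()`). The data frame, its column names, and the student labels are hypothetical.

```r
# Hypothetical Spelling scores for two classrooms (toy data, not the real sheet)
toy <- data.frame(
  name      = paste("Student", LETTERS[1:10]),
  classroom = rep(c("Affolter", "Weston"), each = 5),
  spelling  = c(50, 52, 54, 56, 20, 40, 44, 48, 52, 56),
  stringsAsFactors = FALSE
)

# For each classroom: 25th percentile, median, 75th percentile, and IQR
stats <- do.call(rbind, lapply(split(toy$spelling, toy$classroom), function(x) {
  q <- quantile(x, c(0.25, 0.50, 0.75))
  c(q25 = q[[1]], median = q[[2]], q75 = q[[3]], iqr = IQR(x))
}))
print(stats)

# TRUE for scores more than 1.5 * IQR below Q1 or above Q3
is_outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75))
  fence <- 1.5 * (q[[2]] - q[[1]])
  x < q[[1]] - fence | x > q[[2]] + fence
}

# ave() applies is_outlier within each classroom; convert 0/1 back to logical
toy$outlier <- as.logical(ave(toy$spelling, toy$classroom, FUN = is_outlier))
print(toy[toy$outlier, c("name", "classroom", "spelling")])
```

With these toy values, Student E (score 20) is flagged in the Affolter group: that classroom's quartiles are 50 and 54, so the IQR is 4 and the lower fence is 50 - 6 = 44. Base R's `boxplot()` computes its hinges with `fivenum()`, which can differ slightly from `quantile()` for some sample sizes.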

Task 10 - Density plots of Spelling by Classroom

Produce a density plot of Spelling scores disaggregated by Classroom (similar to the transparent density curves for the Penguins in R4DS section 1.5.1).

Task 11 - Opinion

In one paragraph, describe, in your opinion, the advantages and disadvantages of each type of univariate plot (boxplots vs. density plots). In a second paragraph, recommend which one you would use if you could show only one plot to the larger community of 3rd-grade parents, the principal, and the school board, and explain why.

Task 12 - Recommendations

Without employing any statistical inference or causal language, write (in 200 words or fewer) your general recommendations for the teachers on the 3rd-grade team and their Principal about what your descriptive analysis of the Spelling data for the three classrooms reveals. As consultants, keep your recommendations non-judgmental, offering guidance and support for these teachers and their students.

Turn in your work

Submit a joint .qmd file to Canvas by September 25, and be sure that all partners' names are listed. Only one partner needs to submit DAM1, because the grade will be identical for the whole team.

A Final Word

As this is a collaborative effort between partners, please abide by the Middlebury policies on plagiarism. If someone other than your instructor contributes to your work, acknowledge that contribution explicitly. This requirement covers help from class members in other groups, other members of the Middlebury community, AI tools such as ChatGPT, and code copied from websites such as StackOverflow.

(Because I assume that much of the code you write will closely resemble code from our R4DS textbook, I don't ask you to cite the book.)

If you do this consistently, you will not be accused of plagiarism. There is no penalty for seeking help, provided that the help is explicitly recognized in your memo.

Remember that this is not a competition. Everyone’s work will be graded on its quality.