The PISA 2009 dataset, which is part of the likert
package in R, contains responses from students across different
countries regarding their attitudes toward reading. The data is based on
the Programme for International Student Assessment (PISA), a worldwide
study by the OECD that evaluates educational systems by testing
15-year-old students in various subjects.
In this report, we will focus on analyzing selected questions from the PISA 2009 dataset to gain insights into students’ reading habits and preferences. The dataset consists of 81 variables and 66,690 observations. Apart from the country identifier (CNT), each variable represents a survey question, and each observation represents an individual student’s responses.
The dataset includes the following factors:
In our analysis, we will leverage R’s likert package to
visualize and summarize the data, providing a comprehensive
understanding of students’ reading attitudes in the participating
countries.
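As a starting point, the package and data can be loaded roughly as sketched below. This is a minimal sketch, assuming the likert package is installed. The way selected_countries is built here is an assumption, since the report does not show that step, and the labels "Canada" and "Mexico" are assumed to match the levels of CNT.

```r
# Minimal setup sketch; skipped quietly if the likert package is not installed
if (requireNamespace("likert", quietly = TRUE)) {
  library(likert)
  data(pisaitems)   # PISA 2009 items shipped with the likert package

  # Assumption: the filtered data frame used later is built like this
  keep <- c("Canada", "Mexico", "United States")
  selected_countries <- pisaitems[pisaitems$CNT %in% keep, ]

  print(dim(pisaitems))                 # rows and columns of the full dataset
  print(table(selected_countries$CNT))  # number of students per country
}
```

Filtering on CNT with %in% keeps the step explicit, so the same pattern could be reused if a different set of countries were available.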
I was born and raised in Poland and have been living in Iceland since 2006, which means I’ve been here for 18 years! As a mother of twins—a boy and a girl born in 2013 who have American and Icelandic citizenship—I am deeply interested in how different countries approach education, especially reading habits and attitudes. While I had initially hoped to compare data from all three countries that hold personal significance for my family, this report will focus on analyzing the reading attitudes of students from the USA compared to the overall average from all participating countries in the PISA 2009 dataset.
Understanding how American students’ attitudes align with or differ from the global average gives me valuable insight into the broader educational context that my children might encounter in their multicultural upbringing.
The purpose of this section is to examine the structure, summary statistics, and other properties of our filtered data frame containing data from the USA, Canada, and Mexico. Below are five R commands that provide insights into the data frame:
str(selected_countries): Shows the internal structure of the selected_countries data frame, including data types and a preview of the values.
dim(selected_countries): Displays the total number of rows and columns, helping us understand the size of the data.
summary(selected_countries): Provides a summary of each column, including minimum, maximum, mean, and quartile values for numerical variables, and the count of each level for factor (categorical) variables.
names(selected_countries): Lists all the column names (variables) in the data frame, helping identify the available data fields.
head(selected_countries, 5): Displays the first 5 rows of the data frame, offering a quick view of how the data is structured.
Next, we’ll explore specific variables within this data frame. These commands help us understand individual variables, such as the responses to a particular question.
length(selected_countries$ST24Q01): Shows the total number of observations for the question ST24Q01, including missing responses.
table(selected_countries$ST24Q01): Creates a table that shows the frequency of each response option.
unique(selected_countries$ST24Q01): Displays all distinct response options used in the dataset for this question.
is.factor(selected_countries$ST24Q01): Confirms whether the variable is stored as a categorical factor, which is essential for proper analysis.
summary(selected_countries$ST24Q01): Provides a concise summary of the variable, showing the count of each response category.
By using these commands, we’ve gained a better understanding of the structure of our data frame as well as specific variables within it. These insights guide our subsequent analysis, allowing us to explore students’ reading habits in the USA and compare them with the overall attitudes observed across all participating countries.
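To show what these commands return, here is a small sketch on a made-up data frame with the same kind of columns. The toy data frame and its values are hypothetical and only illustrate the behavior of the base R functions; they are not taken from the PISA data.

```r
# Hypothetical toy data with the same column types as the PISA items
answer_levels <- c("Strongly disagree", "Disagree", "Agree", "Strongly agree")
toy <- data.frame(
  CNT = factor(c("United States", "Canada", "Mexico", "United States", "Canada")),
  ST24Q01 = factor(c("Agree", "Disagree", NA, "Strongly agree", "Disagree"),
                   levels = answer_levels)
)

str(toy)                       # column types and a preview of the values
print(dim(toy))                # number of rows and columns
print(length(toy$ST24Q01))     # counts every row, missing answers included
print(table(toy$ST24Q01))      # frequency of each level, NA left out by default
print(unique(toy$ST24Q01))     # distinct values that occur, NA included
print(is.factor(toy$ST24Q01))  # TRUE when the column is stored as a factor
print(summary(toy$ST24Q01))    # level counts plus a separate count of NA's
```

Note the difference between length(), which counts missing answers, and table(), which leaves them out unless asked otherwise.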
In this section, we’ll focus on analyzing students’ reading attitudes using selected variables from the PISA 2009 dataset. The following variables capture different aspects of students’ reading habits and preferences:
ST24Q01: “I read only if I have to.”
ST24Q02: “Reading is one of my favorite hobbies.”
ST24Q03: “I like talking about books with other people.”
These questions reflect students’ attitudes toward reading and will help us compare how students in the USA perceive reading activities compared to the overall responses from all participating countries.
Next, we will filter our dataset to include only the selected
variables and use the likert() function from the
likert package to prepare the data for analysis. We will
analyze the reading attitudes of students from the USA and compare them
with the overall responses from all participating countries in the
dataset (Canada, Mexico, and the USA).
# Select variables related to students' reading attitudes
usa_reading_attitudes <- selected_countries[selected_countries$CNT == "United States", c("ST24Q01", "ST24Q02", "ST24Q03")]
# Rename columns for better readability
colnames(usa_reading_attitudes) <- c(
"Read only if I have to",
"Reading is a favorite hobby",
"Like talking about books"
)
# Display the first few rows of the selected variables for the USA
head(usa_reading_attitudes)
## Read only if I have to Reading is a favorite hobby
## 470228 Strongly agree Strongly disagree
## 470229 Disagree Agree
## 470230 Strongly agree Strongly disagree
## 470231 Agree Disagree
## 470232 Strongly agree Strongly disagree
## 470233 <NA> <NA>
## Like talking about books
## 470228 Strongly disagree
## 470229 Agree
## 470230 Strongly disagree
## 470231 Strongly disagree
## 470232 Strongly disagree
## 470233 <NA>
# Create a dataset for all participants (Canada, Mexico, and the USA)
global_data <- pisaitems[, c("ST24Q01", "ST24Q02", "ST24Q03")]
# Rename columns for better readability
colnames(global_data) <- c(
"Read only if I have to",
"Reading is a favorite hobby",
"Like talking about books"
)
# Display the first few rows of the global dataset
head(global_data)
## Read only if I have to Reading is a favorite hobby
## 68038 Disagree Strongly agree
## 68039 Agree Strongly disagree
## 68040 Strongly agree Strongly disagree
## 68041 Disagree Disagree
## 68042 Strongly disagree Disagree
## 68043 Agree Strongly disagree
## Like talking about books
## 68038 Strongly agree
## 68039 Strongly disagree
## 68040 Strongly disagree
## 68041 Agree
## 68042 Strongly disagree
## 68043 Strongly disagree
Now that we have selected the variables for the USA and the global
dataset, we’ll proceed with applying the likert() function
to analyze the data.
In this section, we use the likert() function to summarize students’ responses to the selected variables from the USA and the global dataset. This function provides an overview of how students from the USA responded to statements about their reading attitudes compared to the average responses from all participating countries.
# Use the likert function to analyze the selected variables for the USA
likert_analysis <- likert(usa_reading_attitudes)
# View the summary of the likert analysis for the USA
summary(likert_analysis)
## Item low neutral high mean sd
## 1 Read only if I have to 50.17422 0 49.82578 2.485095 0.9541627
## 3 Like talking about books 59.45317 0 40.54683 2.239868 0.9077166
## 2 Reading is a favorite hobby 69.60139 0 30.39861 2.106811 0.9298025
The table above summarizes how students from the USA responded to the selected questions. It gives us insight into the overall reading attitudes among students in the USA.
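To make the columns of this summary concrete, the sketch below recomputes them by hand for one made-up item. It assumes that low and high are the percentages of non-missing responses below and above the scale midpoint, and that mean and sd are taken over the integer codes of the levels. It uses base R only and is not the likert package's internal code; the responses are hypothetical.

```r
# Hypothetical responses to one four-level item
answer_levels <- c("Strongly disagree", "Disagree", "Agree", "Strongly agree")
responses <- factor(
  c("Agree", "Disagree", "Strongly agree", "Disagree", NA,
    "Strongly disagree", "Agree", "Disagree"),
  levels = answer_levels
)

codes  <- as.integer(responses)            # 1 to 4, NA stays NA
valid  <- codes[!is.na(codes)]             # drop missing answers
center <- (nlevels(responses) - 1) / 2 + 1 # midpoint of a 4-level scale: 2.5

low       <- 100 * mean(valid < center)    # share of disagreement
neutral   <- 100 * mean(valid == center)   # 0 when there is no middle level
high      <- 100 * mean(valid > center)    # share of agreement
item_mean <- mean(valid)
item_sd   <- sd(valid)

print(round(c(low = low, neutral = neutral, high = high,
              mean = item_mean, sd = item_sd), 2))
```

With an even number of levels there is no middle category, which is why the neutral column in the table is 0.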
Next, we compare the USA’s results with the overall average responses from all participating countries (Canada, Mexico, and the USA). This comparison will allow us to identify any differences or similarities in reading attitudes between the USA and the combined dataset.
# Apply the likert() function to analyze the global dataset
global_likert_analysis <- likert(global_data)
# Display a summary of the results for all participants
summary(global_likert_analysis)
## Item low neutral high mean sd
## 3 Like talking about books 54.99129 0 45.00871 2.328049 0.9090326
## 2 Reading is a favorite hobby 56.64470 0 43.35530 2.344530 0.9277495
## 1 Read only if I have to 58.72868 0 41.27132 2.291811 0.9369023
The table above provides the global average responses for the selected questions. By comparing this with the USA-specific data, we can observe where the USA stands in relation to the overall reading attitudes among all participating countries.
This analysis helps us understand the reading attitudes of students in the USA in comparison to the combined dataset. In the following section, we will interpret the results and highlight key observations from this comparison.
In this section, we will visualize the selected variables by plotting the objects created with the likert() function. This visualization helps us compare reading attitudes between students in the USA and the overall dataset.
# Plot the reading attitudes for the USA
# (likert bar plots are ggplot2 objects, so the title is added with ggtitle())
plot(likert_analysis, centered = TRUE) + ggtitle("Reading Attitudes in the USA")
The plot above visualizes how students in the USA responded to the statements about reading. The responses are presented in a stacked bar chart, showing the distribution of agreement and disagreement for each statement.
# Plot the reading attitudes for all participants
# (likert bar plots are ggplot2 objects, so the title is added with ggtitle())
plot(global_likert_analysis, centered = TRUE) + ggtitle("Global Reading Attitudes (Canada, Mexico, and USA)")
The plot above provides a visualization of reading attitudes across all participating countries, allowing us to compare these results with the USA-specific data.
Interpretation of Results: These visualizations show how students in the USA view reading compared to the overall dataset. The share of agreement differs for each statement, most clearly for “Reading is one of my favorite hobbies”, where about 30% of students in the USA agree compared with about 43% overall.
Summary Analysis (USA): The table showed that students in the USA had mixed attitudes towards reading. For example, about 70% of respondents disagreed with the statement “Reading is one of my favorite hobbies”, indicating that reading is not a favorite leisure activity for many students in the USA. Responses to “I like talking about books with other people” were somewhat more balanced (about 59% disagreeing and 41% agreeing), and “I read only if I have to” split almost evenly between agreement and disagreement.
Comparison with Global Data: When compared with the combined dataset, the USA exhibited some notable differences. Agreement with “Reading is one of my favorite hobbies” was higher overall (about 43%) than in the USA (about 30%), suggesting that students outside the USA might be more engaged with reading as a hobby. Agreement with “I read only if I have to” was also higher in the USA (about 50%) than overall (about 41%), so students in the USA appear somewhat more likely to see reading as a task rather than a pleasure.
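The size of these gaps can be read off the two summary tables. The short sketch below copies the “high” (agreement) percentages from the tables above and subtracts them; the variable names are only illustrative.

```r
# "high" percentages copied from the two summary tables above
items <- c("Read only if I have to",
           "Reading is a favorite hobby",
           "Like talking about books")
usa_high <- c(49.82578, 30.39861, 40.54683)  # USA only
all_high <- c(41.27132, 43.35530, 45.00871)  # Canada, Mexico, and USA combined

# Positive values mean more agreement in the USA than in the combined data
gap <- round(usa_high - all_high, 1)
names(gap) <- items
print(gap)
```

A positive gap for the first item and negative gaps for the other two mean that students in the USA agree more often that they read only when they have to, and less often that they enjoy reading or talking about books.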
In this section, we enhance our visualizations to highlight the reading attitudes among students in the USA and globally.
We’ll create more in-depth and varied visualizations using the functions from the likert package.
# Change colors for the plot
plot(likert_analysis, colors=c('orange', 'yellow', 'blue', 'darkblue'), center = 2)
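This visualization uses a different color scheme to make the contrast between
response categories more striking. Setting center = 2 treats the second response
level (“Disagree”) as the midpoint of the scale, so the bars are centered there
rather than between “Disagree” and “Agree”.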
# Include a histogram of responses
plot(likert_analysis, include.histogram=TRUE)
The histogram gives us a clearer view of the distribution of responses
for each item, showing how many students selected each response
category.
# Generate a density plot
plot(likert_analysis, type='density', facet=FALSE)
The density plot treats the Likert items as continuous variables and offers insights into how responses are distributed across the items. This can help us understand which attitudes were most common.
# Create a heatmap
plot(likert_analysis, type='heat', wrap=30, text.size=4)
The heatmap shades each response category by the percentage of students who
chose it and shows the mean and standard deviation for each item, making it
easier to spot trends and differences between the items.
To visualize differences between the USA and the other countries in the dataset, let’s group the responses by country:
# Group the responses for comparison
usa_vs_global <- likert(items = global_data[, 1:3], grouping = pisaitems$CNT)
plot(usa_vs_global)
This grouped bar plot will illustrate how students from the USA, Canada, and Mexico differ in their reading attitudes for each statement, enabling a comprehensive comparison.
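Behind a grouped comparison like this are per-country response shares. The sketch below computes such shares with base R on a made-up data frame. The toy values are hypothetical and the code is an illustration of the idea, not what the likert package does internally.

```r
# Hypothetical toy data: one item, three countries
answer_levels <- c("Strongly disagree", "Disagree", "Agree", "Strongly agree")
toy <- data.frame(
  CNT = factor(c("Canada", "Canada", "Mexico", "Mexico",
                 "United States", "United States", "United States", "Canada")),
  ST24Q02 = factor(c("Agree", "Disagree", "Agree", "Strongly agree",
                     "Disagree", "Strongly disagree", "Agree", "Disagree"),
                   levels = answer_levels)
)

# Cross-tabulate country by response, then convert each row to percentages
counts <- table(toy$CNT, toy$ST24Q02)
shares <- 100 * prop.table(counts, margin = 1)
print(round(shares, 1))

# Share of agreement per country (the two upper levels combined)
agree_share <- rowSums(shares[, c("Agree", "Strongly agree")])
print(round(agree_share, 1))
```

Using margin = 1 makes each country's row sum to 100, so countries with different sample sizes can be compared directly.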
With these additional visualizations, we’ve gained deeper insights into the attitudes towards reading. The combination of bar plots, density plots, heatmaps, and grouped comparisons allows us to see patterns and trends that may not be evident in a basic summary. This detailed analysis will help us understand students’ reading habits more comprehensively.
This project has been a rewarding experience, not just for exploring reading attitudes among students but also for the opportunity to dive deep into the practical applications of quantitative research methods using R. Working with the likert package offered me a hands-on way to transform raw data into meaningful insights, turning numbers into stories that reveal patterns and trends.
One of the most valuable lessons I learned was the power of data visualization. The likert package allowed me to present complex data in a way that’s both accessible and informative, and I can definitely see how these skills will be useful in future projects – whether for academic work or real-world applications.
It was also an eye-opener to see how different aspects of data analysis come together. From selecting the right variables, renaming columns for clarity, and understanding how to compare results, I could witness firsthand how the tools in R bring data to life. Although the focus was on students’ attitudes toward reading, the real achievement here was gaining confidence in using R to carry out a structured, thorough analysis.
Looking ahead, I’m excited to keep exploring these skills. In the future, I’d love to extend this analysis to compare other countries, especially Iceland and Poland, and apply these methods to more current datasets to see how reading habits have evolved. This project has just scratched the surface of what’s possible with R, and I’m eager to keep learning and growing in my quantitative analysis journey.
Overall, it’s been a fantastic experience that reinforced the idea that with the right tools, even a beginner like me can conduct meaningful and professional data analysis.
It’s not just about crunching numbers – it’s about discovering the stories hidden within them. This project has shown me that data isn’t just about figures and tables; it’s about understanding people, their behaviors, and the world they live in. I look forward to applying these skills in future projects and, who knows, maybe even using them to explore my children’s reading journey as they grow!
Thank you for joining me on this analytical adventure – I hope you enjoyed it as much as I did.