Part A: Introduction

The PISA 2009 dataset, which is part of the likert package in R, contains responses from students across different countries regarding their attitudes toward reading. The data is based on the Programme for International Student Assessment (PISA), a worldwide study by the OECD that evaluates educational systems by testing 15-year-old students in various subjects.

In this report, we will focus on analyzing selected questions from the PISA 2009 dataset to gain insights into students’ reading habits and preferences. The dataset consists of 81 variables and 66,690 observations, with each variable other than the country identifier representing a survey question and each observation representing an individual student’s responses.

The dataset includes the following factors:

Country codes: Identifying the participating countries.
Reading habits: How often students engage in reading activities.
Reading materials: The types of reading materials students prefer.
Students’ attitudes toward books: Their enjoyment and attitudes toward reading.

In our analysis, we will leverage R’s likert package to visualize and summarize the data, providing a comprehensive understanding of students’ reading attitudes in the participating countries.

I was born and raised in Poland and have been living in Iceland since 2006, which means I’ve been here for 18 years! As a mother of twins—a boy and a girl born in 2013 who have American and Icelandic citizenship—I am deeply interested in how different countries approach education, especially reading habits and attitudes. While I had initially hoped to compare data from all three countries that hold personal significance for my family, this report will focus on analyzing the reading attitudes of students from the USA compared to the overall average from all participating countries in the PISA 2009 dataset.

Understanding how American students’ attitudes align with or differ from the global average gives me valuable insight into the broader educational context that my children might encounter in their multicultural upbringing.

Part B: Exploring Data Frames and Variable Commands

Data Frame Commands

The purpose of this section is to examine the structure, summary statistics, and other properties of our filtered data frame containing data from the USA, Canada, and Mexico. Below are five R commands that provide insights into the data frame:

Commands for Data Frames

str(selected_countries): Shows the internal structure of the selected_countries data frame, including data types and a preview of the values.
dim(selected_countries): Displays the total number of rows and columns, helping us understand the size of the data.
summary(selected_countries): Provides a summary of each column, including details such as minimum, maximum, mean, and quartile values for numerical variables, and the count of factors for categorical variables.
names(selected_countries): Lists all the column names (variables) in the data frame, helping identify the available data fields.
head(selected_countries, 5): Displays the first 5 rows of the data frame, offering a quick view of how the data is structured.
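
The report does not show how selected_countries was created, so the following is only a minimal sketch of the idea on a small made-up data frame. The survey data frame, its values, and the extra "Brazil" row are hypothetical stand-ins for the real PISA data; the five commands at the end are the ones listed above.

```r
# Hypothetical toy data standing in for the PISA items (not real responses)
survey <- data.frame(
  CNT = factor(c("United States", "Canada", "Mexico",
                 "Brazil", "Canada", "United States")),
  ST24Q01 = factor(
    c("Agree", "Disagree", "Strongly agree", "Agree", NA, "Strongly disagree"),
    levels = c("Strongly disagree", "Disagree", "Agree", "Strongly agree")
  )
)

# Keep only the three countries of interest (assumed filtering step)
keep <- c("United States", "Canada", "Mexico")
selected_countries <- survey[survey$CNT %in% keep, ]

# Drop the now-unused country level, but leave the response levels untouched
selected_countries$CNT <- droplevels(selected_countries$CNT)

# The five data frame commands described above
str(selected_countries)
dim(selected_countries)
summary(selected_countries)
names(selected_countries)
head(selected_countries, 5)
```

Only the country factor is passed to droplevels() here, so a response option that nobody happened to choose would still be kept as a level of the item.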

Variable Commands

Next, we’ll explore specific variables within the selected_countries data frame. These commands help us understand individual variables, such as the responses to a particular question.

Commands for Variables

length(selected_countries$ST24Q01): Shows the total number of observations/responses for the question ST24Q01.
table(selected_countries$ST24Q01): Creates a table that shows the frequency of each response option.
unique(selected_countries$ST24Q01): Displays all distinct response options used in the dataset for this question.
is.factor(selected_countries$ST24Q01): Confirms whether the variable is stored as a categorical factor, which is essential for proper analysis.
summary(selected_countries$ST24Q01): Provides a concise summary of the variable, showing the count of each response category.
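
One detail worth seeing in practice is how these commands treat missing answers. The sketch below uses a made-up responses vector (hypothetical, not taken from the PISA data) to show that length() counts every row, while table() leaves out NA unless asked to include it.

```r
# Hypothetical responses to one four-point item, including one missing answer
responses <- factor(
  c("Agree", "Disagree", NA, "Strongly agree", "Agree"),
  levels = c("Strongly disagree", "Disagree", "Agree", "Strongly agree")
)

n_total    <- length(responses)                   # counts every row, NA included
counts     <- table(responses)                    # NA is left out by default
counts_na  <- table(responses, useNA = "ifany")   # adds a cell for NA
n_answered <- sum(!is.na(responses))              # rows with an actual answer

print(n_total)
print(counts)
print(counts_na)
print(unique(responses))      # distinct values, NA included
print(is.factor(responses))
print(summary(responses))     # for a factor, also reports the number of NA's
```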

By using these commands, we’ve gained a better understanding of the structure of the selected_countries data frame as well as specific variables within it. These insights guide our subsequent analysis, allowing us to explore students’ reading habits in the USA and compare them with the combined responses from all three countries.

Part C: Selecting and Describing Variables

In this section, we’ll focus on analyzing students’ reading attitudes using selected variables from the PISA 2009 dataset. The following variables capture different aspects of students’ reading habits and preferences:

ST24Q01: “I read only if I have to.”
ST24Q02: “Reading is one of my favorite hobbies.”
ST24Q03: “I like talking about books with other people.”

These questions reflect students’ attitudes toward reading and will help us compare how students in the USA perceive reading activities compared to the overall responses from all participating countries.

Data Preparation and Analysis

Next, we will filter our dataset to include only the selected variables and use the likert() function from the likert package to prepare the data for analysis. We will analyze the reading attitudes of students from the USA and compare them with the overall responses from all participating countries in the dataset (Canada, Mexico, and the USA).

# Load the likert package and its bundled PISA items data
library(likert)
data(pisaitems)

# Keep the USA rows and the three reading-attitude variables
usa_reading_attitudes <- selected_countries[selected_countries$CNT == "United States", c("ST24Q01", "ST24Q02", "ST24Q03")]

# Rename columns for better readability
colnames(usa_reading_attitudes) <- c(
  "Read only if I have to",
  "Reading is a favorite hobby",
  "Like talking about books"
)

# Display the first few rows of the selected variables for the USA
head(usa_reading_attitudes)

##        Read only if I have to Reading is a favorite hobby
## 470228         Strongly agree           Strongly disagree
## 470229               Disagree                       Agree
## 470230         Strongly agree           Strongly disagree
## 470231                  Agree                    Disagree
## 470232         Strongly agree           Strongly disagree
## 470233                   <NA>                        <NA>
##        Like talking about books
## 470228        Strongly disagree
## 470229                    Agree
## 470230        Strongly disagree
## 470231        Strongly disagree
## 470232        Strongly disagree
## 470233                     <NA>

# Create a dataset for all participants (Canada, Mexico, and the USA)
global_data <- pisaitems[, c("ST24Q01", "ST24Q02", "ST24Q03")]

# Rename columns for better readability
colnames(global_data) <- c(
  "Read only if I have to",
  "Reading is a favorite hobby",
  "Like talking about books"
)

# Display the first few rows of the global dataset
head(global_data)

##       Read only if I have to Reading is a favorite hobby
## 68038               Disagree              Strongly agree
## 68039                  Agree           Strongly disagree
## 68040         Strongly agree           Strongly disagree
## 68041               Disagree                    Disagree
## 68042      Strongly disagree                    Disagree
## 68043                  Agree           Strongly disagree
##       Like talking about books
## 68038           Strongly agree
## 68039        Strongly disagree
## 68040        Strongly disagree
## 68041                    Agree
## 68042        Strongly disagree
## 68043        Strongly disagree

Now that we have selected the variables for the USA and the global dataset, we’ll proceed with applying the likert() function to analyze the data.

Analyzing Reading Attitudes with the likert() Function

In this section, we use the likert() function to summarize students’ responses to the selected variables from the USA and the global dataset. This function provides an overview of how students from the USA responded to statements about their reading attitudes compared to the average responses from all participating countries.

# Use the likert function to analyze the selected variables for the USA
likert_analysis <- likert(usa_reading_attitudes)

# View the summary of the likert analysis for the USA
summary(likert_analysis)

##                          Item      low neutral     high     mean        sd
## 1      Read only if I have to 50.17422       0 49.82578 2.485095 0.9541627
## 3    Like talking about books 59.45317       0 40.54683 2.239868 0.9077166
## 2 Reading is a favorite hobby 69.60139       0 30.39861 2.106811 0.9298025

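The table above summarizes how students from the USA responded to the selected questions. The low column gives the percentage who disagreed or strongly disagreed, and the high column the percentage who agreed or strongly agreed; the scale has no neutral option, so neutral is 0. Together these give us insight into the overall reading attitudes among students in the USA.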
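
To make the columns of this summary concrete, here is a rough base-R sketch of how low, high, mean, and sd can be computed for one four-point item. It reflects my understanding of what summary() reports for a likert object (missing answers dropped, levels coded 1 to 4, the two disagree levels counted as low), not the package’s actual source code, and the item vector is made up.

```r
# Hypothetical answers to one four-point item (not real PISA responses)
lv <- c("Strongly disagree", "Disagree", "Agree", "Strongly agree")
item <- factor(
  c("Agree", "Disagree", "Strongly disagree", "Agree",
    "Strongly agree", "Disagree", NA, "Disagree"),
  levels = lv
)

# Drop missing answers and convert the levels to integer codes 1..4
codes <- as.integer(item[!is.na(item)])

low       <- mean(codes <= 2) * 100   # % Strongly disagree or Disagree
high      <- mean(codes >= 3) * 100   # % Agree or Strongly agree
item_mean <- mean(codes)              # mean of the 1..4 codes
item_sd   <- sd(codes)                # standard deviation of the codes

print(round(c(low = low, high = high, mean = item_mean, sd = item_sd), 2))
```

Because the mean is taken over the 1 to 4 codes, a value below 2.5 indicates that responses lean toward disagreement.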

Comparing with the Global Average

Next, we compare the USA’s results with the overall average responses from all participating countries (Canada, Mexico, and the USA). This comparison will allow us to identify any differences or similarities in reading attitudes between the USA and the combined dataset.

# Apply the likert() function to analyze the global dataset
global_likert_analysis <- likert(global_data)

# Display a summary of the results for all participants
summary(global_likert_analysis)

##                          Item      low neutral     high     mean        sd
## 3    Like talking about books 54.99129       0 45.00871 2.328049 0.9090326
## 2 Reading is a favorite hobby 56.64470       0 43.35530 2.344530 0.9277495
## 1      Read only if I have to 58.72868       0 41.27132 2.291811 0.9369023

The table above provides the global average responses for the selected questions. By comparing this with the USA-specific data, we can observe where the USA stands in relation to the overall reading attitudes among all participating countries.
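
The comparison can also be made explicit by subtracting the two sets of percentages. In this small sketch the high values are copied by hand from the two summary tables above; the comparison data frame and its column names are my own and are not produced by the likert package.

```r
# "high" percentages copied from the two summary tables above
comparison <- data.frame(
  item = c("Read only if I have to",
           "Reading is a favorite hobby",
           "Like talking about books"),
  usa_high      = c(49.82578, 30.39861, 40.54683),
  combined_high = c(41.27132, 43.35530, 45.00871)
)

# Positive values mean more agreement in the USA than in the combined data
comparison$difference <- comparison$usa_high - comparison$combined_high

# Sort from the most negative to the most positive difference
print(comparison[order(comparison$difference), ], digits = 4)
```

Note that the combined dataset includes the USA students themselves, so these differences understate the gap between the USA and the other two countries.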

Conclusion

This analysis helps us understand the reading attitudes of students in the USA in comparison to the combined dataset. In the following section, we will interpret the results and highlight key observations from this comparison.

Part D: Analyzing and Visualizing Data with the likert() Function

In this section, we will visualize the selected variables by plotting the objects returned by the likert() function. This visualization helps us compare reading attitudes between students in the USA and the overall dataset.

Visualizing the USA Reading Attitudes

# Plot the reading attitudes for the USA
plot(likert_analysis, centered = TRUE) + ggtitle("Reading Attitudes in the USA")

The plot above visualizes how students in the USA responded to the statements about reading. The responses are presented in a centered stacked bar chart, with disagreement extending to the left of the midline and agreement to the right for each statement.

Visualizing the Global Reading Attitudes

# Plot the reading attitudes for all participants
plot(global_likert_analysis, centered = TRUE) + ggtitle("Global Reading Attitudes (Canada, Mexico, and USA)")

The plot above provides a visualization of reading attitudes across all participating countries, allowing us to compare these results with the USA-specific data.

Interpretation of Results: These visualizations show how students in the USA view reading compared to the overall dataset. The levels of agreement differ for every statement, with the largest gap on reading as a favorite hobby.

Summary Analysis (USA): The table showed that students in the USA had diverse attitudes towards reading. For example, about 70% of students disagreed with the statement “Reading is one of my favorite hobbies”, indicating that reading may not be a favorite leisure activity for many students in the USA. In contrast, “I like talking about books with other people” had a more balanced response (about 59% disagreement and 41% agreement), indicating a mixed level of interest in discussing reading with peers, and “I read only if I have to” was split almost evenly (about 50% each way).

Comparison with Global Data: When compared with the combined dataset, the USA exhibited some notable differences. For instance, about 43% of students in the combined dataset agreed that reading is a favorite hobby, against about 30% in the USA, suggesting that students in Canada and Mexico might be more engaged with reading as a hobby. Agreement with the statement “I read only if I have to” was also higher in the USA (about 50% versus 41%), so although the sentiment that reading is a task rather than a pleasure is shared across countries, it appears somewhat stronger among students in the USA.

Part E: Advanced Visualization with the likert Package

In this section, we enhance our visualizations to highlight the reading attitudes among students in the USA and globally.

We’ll create more in-depth and varied visualizations using the functions from the likert package.

Bar Plot with Changed Colors

# Change colors for the plot
plot(likert_analysis, colors=c('orange', 'yellow', 'blue', 'darkblue'), center = 2)

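This visualization uses a different color scheme to make the comparison more striking. Setting center = 2 treats the second response level (Disagree) as the midpoint of the scale, rather than the default midpoint between Disagree and Agree, so the bars in this plot are not directly comparable with the centered plots in Part D.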

Response Histograms

# Include a histogram of responses
plot(likert_analysis, include.histogram=TRUE)

The histogram panel adds a count of responses for each item, showing how many students answered the question and how many left it blank.
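
Response counts per item can also be checked directly in base R. The following is a minimal sketch on made-up data (the toy_items data frame is hypothetical, not the real USA responses) that counts completed and missing answers for each item.

```r
# Hypothetical toy items (not real PISA responses)
lv <- c("Strongly disagree", "Disagree", "Agree", "Strongly agree")
toy_items <- data.frame(
  "Read only if I have to" =
    factor(c("Agree", NA, "Disagree", "Agree"), levels = lv),
  "Reading is a favorite hobby" =
    factor(c(NA, NA, "Disagree", "Strongly agree"), levels = lv),
  check.names = FALSE   # keep the readable column names as they are
)

# is.na() on a data frame returns a logical matrix, so colSums() counts per item
response_counts <- data.frame(
  completed = colSums(!is.na(toy_items)),
  missing   = colSums(is.na(toy_items))
)

print(response_counts)
```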

Density Plot

# Generate a density plot
plot(likert_analysis, type='density', facet=FALSE)

The density plot treats the Likert items as continuous variables and offers insights into how responses are distributed across the items. This can help us understand which attitudes were most common.
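
As a rough illustration of what “treating Likert items as continuous” means, the sketch below converts a made-up item to its integer codes and estimates a density with base R. This only mirrors the idea; it is not how the likert package draws its plot, and the item vector is hypothetical.

```r
# Hypothetical answers to one four-point item (not real PISA responses)
lv <- c("Strongly disagree", "Disagree", "Agree", "Strongly agree")
item <- factor(
  c("Disagree", "Agree", "Disagree", "Strongly disagree",
    "Agree", "Strongly agree", NA, "Disagree"),
  levels = lv
)

# Treat the ordered categories as the numbers 1..4
codes <- as.integer(item)

# Kernel density estimate of the codes, ignoring the missing answer
dens <- density(codes, na.rm = TRUE)

plot(dens,
     main = "Toy item treated as continuous",
     xlab = "Response code (1 = Strongly disagree, 4 = Strongly agree)")
```

The smooth curve is a convenience for spotting where responses cluster; the underlying data still take only four values.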

Heatmap Plot

# Create a heatmap
plot(likert_analysis, type='heat', wrap=30, text.size=4)

The heatmap shows, for each item, the percentage of students choosing each response level as shaded tiles, alongside the item’s mean and standard deviation, making it easier to spot trends and differences between the items.

Grouping Responses (Detailed Comparison USA vs. Global)

To visualize differences between the USA and the other countries in the dataset, let’s group the responses by country:

# Group the responses for comparison
usa_vs_global <- likert(items = global_data[, 1:3], grouping = pisaitems$CNT)
plot(usa_vs_global)

This grouped bar plot will illustrate how students from the USA, Canada, and Mexico differ in their reading attitudes for each statement, enabling a comprehensive comparison.
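
The same per-country breakdown can be approximated with a cross-tabulation in base R. This is a minimal sketch on made-up data (the toy data frame and its values are hypothetical) showing row percentages by country and the share of students agreeing with one statement.

```r
# Hypothetical toy data: country and one four-point item (not real PISA responses)
lv <- c("Strongly disagree", "Disagree", "Agree", "Strongly agree")
toy <- data.frame(
  CNT = c("Canada", "Canada", "Mexico", "Mexico",
          "United States", "United States", "United States", "Canada"),
  item = factor(
    c("Agree", "Disagree", "Agree", "Strongly agree",
      "Disagree", "Strongly disagree", "Agree", NA),
    levels = lv
  )
)

# Rows are countries, columns are response levels; NA answers are dropped
counts <- table(toy$CNT, toy$item)

# Convert counts to percentages within each country (each row sums to 100)
pct <- prop.table(counts, margin = 1) * 100

# Share of students per country who agreed or strongly agreed
agree_pct <- rowSums(pct[, c("Agree", "Strongly agree")])

print(round(pct, 1))
print(round(agree_pct, 1))
```

Computing percentages within each country matters because the countries contribute very different numbers of students, so raw counts would not be comparable.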

Closing Notes

With these additional visualizations, we’ve gained deeper insights into the attitudes towards reading. The combination of bar plots, density plots, heatmaps, and grouped comparisons allows us to see patterns and trends that may not be evident in a basic summary. This detailed analysis will help us understand students’ reading habits more comprehensively.

Conclusion

This project has been a rewarding experience, not just for exploring reading attitudes among students but also for the opportunity to dive deep into the practical applications of quantitative research methods using R. Working with the likert package offered me a hands-on way to transform raw data into meaningful insights, turning numbers into stories that reveal patterns and trends.

One of the most valuable lessons I learned was the power of data visualization. The likert package allowed me to present complex data in a way that’s both accessible and informative, and I can definitely see how these skills will be useful in future projects – whether for academic work or real-world applications.

It was also an eye-opener to see how different aspects of data analysis come together. From selecting the right variables, renaming columns for clarity, and understanding how to compare results, I could witness firsthand how the tools in R bring data to life. Although the focus was on students’ attitudes toward reading, the real achievement here was gaining confidence in using R to carry out a structured, thorough analysis.

Looking ahead, I’m excited to keep exploring these skills. In the future, I’d love to extend this analysis to compare other countries, especially Iceland and Poland, and apply these methods to more current datasets to see how reading habits have evolved. This project has just scratched the surface of what’s possible with R, and I’m eager to keep learning and growing in my quantitative analysis journey.

Overall, it’s been a fantastic experience that reinforced the idea that with the right tools, even a beginner like me can conduct meaningful and professional data analysis.

It’s not just about crunching numbers – it’s about discovering the stories hidden within them. This project has shown me that data isn’t just about figures and tables; it’s about understanding people, their behaviors, and the world they live in. I look forward to applying these skills in future projects and, who knows, maybe even using them to explore my children’s reading journey as they grow!

Thank you for joining me on this analytical adventure – I hope you enjoyed it as much as I did.

Assignment 3 (Verkefni 3) - PISA 2009 Analysis

Daniela Zbikowska

2024-09-28

Course: 104.6.0.QNRM Quantitative Research Methods (Megindlegar rannsóknaraðferðir), 6 ECTS

Instructor: Kári Joensen

Semester: Fall 2024

Assignment 3: Individual Assignment

https://rpubs.com/dkz/verkefni3
