Overall Data Description
1. Dimension
2. Data Types
Text/Categorical (Non-numerical): The dataset contains almost no free-text responses. The only character column is Country, which is represented by standardized 3-letter ISO country codes.
Numerical: The vast majority of the dataset is numerical (integer).
Year and Wave are stored as integers but serve as grouping variables: Year is temporal and Wave identifies the survey wave.
Age is a numerical variable that can be treated as continuous.
All other survey responses—such as Important in Life (ILFam, ILReligion), Active Memberships (ACTUnions), Self-positioning in Political Scale (PolScale), and Confidence in organizations (CPolice, CChurches)—are stored as integers representing categorical or ordinal scales.
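The type breakdown described above can be checked directly. The following is a minimal sketch on a toy data frame; the object name `wvs` and all of its values are made up for illustration and are not taken from the real dataset.

```r
# Toy stand-in for the survey data; `wvs` and its values are hypothetical.
wvs <- data.frame(
  Country = c("AUS", "BRA", "JPN"),
  Year    = c(2018L, 2018L, 2019L),
  Age     = c(34L, 51L, 27L),
  CPolice = c(2L, 4L, -1L),
  stringsAsFactors = FALSE
)

# Storage class of every column (first class only, in case of multiple classes).
col_classes <- vapply(wvs, function(x) class(x)[1], character(1))
print(col_classes)

# Names of the character columns; per the description, Country should be the only one.
char_cols <- names(wvs)[col_classes == "character"]
print(char_cols)
```

On the real data, any unexpected entry in `char_cols` would suggest a column was read in as text and needs a closer look.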
3. Missing Values
Critical Detail: In the World Values Survey dataset, missing values are encoded as negative numbers (e.g., -1, -2, -4, -5).
These typically represent “Don’t know”, “No answer”, “Not asked in this country”, or “Missing”.
Note of Relevance: Before computing means, running statistical analyses, or building predictive models, you must convert these negative values to NA in R. Otherwise, they are treated as valid responses and will distort means and distributions.
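One way to do this conversion is sketched below in base R. It is an illustration under assumptions, not the report's own code: the data frame `wvs` and its values are invented, and the rule "every negative number in a numeric column is a missing code" is taken from the description above.

```r
# Toy stand-in for the survey data; `wvs` and its values are hypothetical.
wvs <- data.frame(
  Country   = c("AUS", "BRA", "JPN", "USA"),
  LifeSatis = c(7L, -1L, 9L, -2L),
  CPolice   = c(2L, 3L, -4L, 1L),
  stringsAsFactors = FALSE
)

# Only numeric columns can hold the negative missing codes.
num_cols <- vapply(wvs, is.numeric, logical(1))

# Mean with the missing codes still counted as real answers.
mean_before <- mean(wvs$LifeSatis)

# Replace every negative value with NA, leaving existing NAs untouched.
wvs[num_cols] <- lapply(wvs[num_cols], function(x) {
  x[!is.na(x) & x < 0] <- NA
  x
})

# Mean over valid responses only.
mean_after <- mean(wvs$LifeSatis, na.rm = TRUE)

print(c(before = mean_before, after = mean_after))
```

In this toy example the mean of LifeSatis moves from 3.25 to 8 once the two negative codes are dropped, which shows how strongly they can pull a summary statistic. With dplyr loaded, a similar result could probably be obtained with `mutate(across(where(is.numeric), ~ replace(.x, .x < 0, NA)))`.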
4. Distribution of Numerical Responses
10-Point Scales: Variables like LifeSatis (Life Satisfaction), PolScale, and IncomeEquality are typically recorded on a 1 to 10 scale.
4-Point Likert Scales: The target variables for your assignment, Confidence in organizations (CChurches, CPolice, CGovernment, etc.), are usually recorded on a 1 to 4 scale (e.g., 1 = A great deal, 2 = Quite a lot, 3 = Not very much, 4 = None at all), so higher values indicate less confidence.
Binary/Dummy Variables: Variables like Qualities Children should learn (ICQIndependence, ICQHardWork) are coded as 0 (Not mentioned) and 1 (Important).
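The scale ranges listed above can be verified against the data. The sketch below assumes the negative missing codes have already been recoded to NA; the data frame `wvs`, its values, and the `expected` lookup list are hypothetical and exist only for illustration.

```r
# Toy stand-in for the survey data; `wvs` and its values are hypothetical.
wvs <- data.frame(
  LifeSatis       = c(1L, 10L, 6L, NA),
  CPolice         = c(1L, 4L, 2L, 3L),
  ICQIndependence = c(0L, 1L, 1L, 0L)
)

# Expected valid range per variable, following the scale descriptions above.
expected <- list(
  LifeSatis       = c(1, 10),
  CPolice         = c(1, 4),
  ICQIndependence = c(0, 1)
)

# TRUE when every non-missing value of a variable lies inside its expected range.
in_range <- vapply(names(expected), function(v) {
  x <- wvs[[v]]
  all(x >= expected[[v]][1] & x <= expected[[v]][2], na.rm = TRUE)
}, logical(1))
print(in_range)

# Frequency table for one ordinal item, including NA if any are present.
conf_counts <- table(wvs$CPolice, useNA = "ifany")
print(conf_counts)
```

A FALSE in `in_range` on the real data would point to leftover missing codes or to a variable measured on a different scale than expected.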