Survey data is the data collected from a sample of respondents who took a survey. It is detailed information gathered from a target audience about a specific topic for research purposes. Many methods exist for collecting survey data and analyzing it statistically.
Various channels are used to collect feedback and opinions from the desired sample of individuals. Researchers often combine several sources, such as online surveys, telephone surveys, and face-to-face surveys.
However, the collection medium determines which people can be reached, and therefore how the required number of survey responses is obtained.
Factors such as how the interviewer contacts the respondents (online or offline) and how information is communicated to them determine the quality of the gathered data.
SURVEY DATA COLLECTION METHODS
The methods used to collect survey data have evolved with technology. From face-to-face and telephone surveys to today's online and email surveys, survey data collection has changed over time. Each method has its pros and cons, and every researcher has a preferred way of gathering accurate information from the target sample.
Response rates differ between these methods because their reach and impact differ. The method is chosen according to the characteristics of the target population and the situation the researcher wants to study.
There are four main survey data collection methods:
Online Surveys
Face-to-Face Surveys
Telephone Surveys
Paper Surveys
ONLINE SURVEYS
Online surveys are the most cost-effective method and can reach more people than the other media, so they are used far more widely than the other data collection methods. When the target sample has to answer many questions, some researchers prefer online surveys over traditional face-to-face or telephone surveys.
Online surveys can use skip logic and branching to show each respondent only the relevant questions, which can make data collection more accurate than with traditional means of surveying. They are straightforward to implement and take little of the respondents' time.
The investment required for online surveys is also small in comparison to the other methods. The results are collected in real time, so researchers can analyze them and decide on corrective measures. A good example is a hotel chain using an online survey to collect guest satisfaction metrics after a stay or an event at the property.
FACE-TO-FACE SURVEYS
Face-to-face collection is often more effective than the other media because respondents tend to trust the surveyor and give honest, clear feedback on the subject at hand. Interviewers can easily notice when respondents are uncomfortable with a question, which is extremely useful when sensitive topics are involved.
This method costs more than the others. Interviewers must also be trained, in line with the geographic or psychographic segmentation of the sample, to gather accurate information.
For example, a job evaluation survey is conducted in person between an HR representative or a manager and the employee. This works best face-to-face because it lets the interviewer collect information that is as accurate as possible.
TELEPHONE SURVEYS
Telephone surveys require much less investment than face-to-face surveys. Depending on the required reach, they cost as much as or a little more than online surveys. Contacting respondents by telephone takes less effort and manpower than meeting them face-to-face. If interviewers work from the same location, they can cross-check their questions so that the target audience is asked error-free questions.
The main drawback of telephone surveys is that building rapport with the respondent is harder over the phone. Respondents are also more likely to remain anonymous in their feedback, because they may question how trustworthy the researcher is.
For example, a retail giant that wants to understand purchasing decisions can conduct a telephone survey on motivation and buying experience to collect data about the entire purchasing experience.
PAPER SURVEYS
The other commonly used survey method is the paper survey. Paper surveys can be used where laptops, computers, and tablets cannot go, relying on the age-old method of data collection: pen and paper. This method helps collect survey data in field research and can increase both the number of responses collected and their validity.
A popular use case for a paper survey is a fast-food restaurant that would like to collect feedback from its patrons on their dining experience.
TYPES OF SURVEY DATA BASED ON THE FREQUENCY AT WHICH THEY ARE ADMINISTERED
Surveys can be divided into three distinct types on the basis of how often they are administered. They are:
Cross-Sectional Surveys: Cross-sectional surveys are an observational research method that analyzes data on variables collected at one given point in time across a sample population or a pre-defined subset. The survey data from this method helps the researcher understand what the respondent is feeling at a certain point in time.
It helps measure opinions in a particular situation. For example, if the researcher would like to understand movie rental habits, a survey can be conducted across demographics and geographical locations.
Such a cross-sectional survey could show, for example, that males aged 21-28 tend to rent action movies and females aged 35-45 tend to rent romantic comedies.
Longitudinal Surveys: Longitudinal surveys help researchers observe respondents and collect data over an extended period of time. This survey data can be qualitative or quantitative in nature, and the survey creator does not interfere with the survey respondents.
For example, a longitudinal study can be carried out for years to help understand whether mine workers are more prone to lung diseases. Such a study runs over an extended period and discounts any pre-existing conditions.
Retrospective Surveys: In retrospective surveys, researchers ask respondents to report events from the past. This survey method offers in-depth survey data but does not take as long to complete as a longitudinal survey. By deploying this kind of survey, researchers can gather data based on people's past experiences and beliefs.
For example, if hikers are asked about a certain hike – the conditions of the hiking trail, ease of hike, weather conditions, trekking conditions, etc. after they have completed the trek, it is a retrospective study.
### SURVEY DATA ANALYSIS USING R LANGUAGE ###
After the survey data has been collected, this data has to be analyzed to ensure it aids towards the end research objective. There are different ways of conducting this research and some steps to follow. There are four main steps of survey data analysis:
No. 1: Understand the most popular survey research questions: The survey questions should align with the overall purpose of the survey. That is when the collected data will be effective in helping researchers. For example, if a seminar has been conducted, the researchers will send out a post-seminar feedback survey. The primary goal of this survey will be to understand whether the attendees are interested in attending future seminars. The question will be: “How likely are you to attend future seminars?” Data collected for this question will indicate the likelihood of success of future seminars.
No. 2: Filter obtained results using the cross-tabulation technique: Understand the various categories in the target audience and their thoughts using cross-tabulation format. For example, if there are business owners, administrators, students, etc. who attend the seminar, the data about whether they would prefer attending future seminars or not can be represented using cross-tabulation.
No. 3: Evaluate the derived numbers: Analyzing the gathered information is critical. How many of the attendees are of the opinion that they will be attending future seminars and how many will not – these facts need to be evaluated according to the results obtained from the sample.
No. 4: Draw conclusions: Weave a story with the collected and analyzed data. Ask what the intention of the survey research was and how well the survey data meets that objective, then develop accurate, conclusive results.
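Steps 2 and 3 can be sketched in base R. The attendee types and answers below are made-up illustration data, not results from a real seminar, and the names `feedback`, `attendee`, and `likely` are assumptions introduced only for this sketch.

```r
# Hypothetical post-seminar feedback (illustrative values only)
feedback <- data.frame(
  attendee = c("Business owner", "Student", "Administrator", "Student",
               "Business owner", "Administrator", "Student", "Business owner"),
  likely   = c("Yes", "Yes", "No", "No", "Yes", "Yes", "Yes", "No")
)

# Step 2: cross-tabulate attendee type against the answer
tab <- table(feedback$attendee, feedback$likely,
             dnn = c("Attendee", "Likely to return"))
print(tab)

# Step 3: evaluate the numbers as row proportions,
# i.e. the share of each attendee type giving each answer
row_share <- prop.table(tab, margin = 1)
print(round(row_share, 2))
```

Row proportions are used here because the question is asked within each attendee type; column proportions would instead describe who makes up the "Yes" and "No" groups.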
### SURVEY DATA ANALYSIS METHODS ###
Conducting a survey is pointless without access to the resulting data and the ability to draw conclusions from it. When you conduct a survey, it is imperative to have access to its analytics. Analysis is hard with traditional survey methods like pen and paper, which also require additional manpower. It becomes much easier when the data is collected with an online survey platform, such as market research survey software or customer survey software, and analyzed with statistical programming languages such as R and Python.
Statistical analysis can be conducted on the survey data to make sense of the data that has been collected. There are multiple data analysis methods of quantitative data. Some of the commonly used types are:
Cross-tabulation analysis
Trend analysis
MaxDiff analysis
Conjoint analysis
Total Unduplicated Reach and Frequency (TURF) analysis
Gap analysis
SWOT analysis: SWOT analysis, another widely used method, organizes survey data into the strengths, weaknesses, opportunities, and threats of an organization, product, or service, giving a holistic picture of the competition. This method helps to create effective business strategies.
Text analysis: Text analysis is an advanced method in which software tools quantify qualitative, open-ended responses and turn them into easily understandable data. This method is used when the survey data is unstructured.
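As a very small illustration of the idea behind text analysis, the base R sketch below turns open-ended answers into word counts. The comments and the short stop-word list are invented for this example, and real text analysis tools go well beyond simple counting.

```r
# Hypothetical open-ended answers (illustrative only)
comments <- c("Great room and friendly staff",
              "Staff were friendly but the room was small",
              "Small room, great breakfast")

# Lower-case the text, strip punctuation, and split it into words
words <- unlist(strsplit(gsub("[[:punct:]]", "", tolower(comments)), "\\s+"))

# Drop a few common filler words (a tiny ad hoc stop list)
words <- words[!words %in% c("and", "but", "the", "were", "was")]

# Count how often each remaining word occurs, most frequent first
word_counts <- sort(table(words), decreasing = TRUE)
print(word_counts)
```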
APPLICATION OF R PROGRAMMING LANGUAGE IN SURVEY DATA ANALYSIS
We want to describe and implement functions (commands) that aid the exploration of survey data via simple tabulations of respondent counts and proportions, including the ability to specify:
either a frequency count or a row/column/joint/total table proportion;
multiple row and column variables; and
all margins, grand margins only, or no margins, plus retention of the data in a format that is amenable to further analysis in R.
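Before turning to add-on packages, the requirements listed above can be sketched with base R alone. The data frame `d` and its columns are made up for illustration; only `table()`, `prop.table()`, `addmargins()`, `xtabs()`, `ftable()`, and `as.data.frame()` from base R are used.

```r
# Small made-up data set (illustrative only)
d <- data.frame(
  gender = c("F", "M", "F", "M", "F", "M"),
  region = c("North", "North", "South", "South", "South", "North"),
  status = c("Good", "Poor", "Good", "Good", "Poor", "Poor")
)

counts  <- table(d$gender, d$status, dnn = c("gender", "status")) # frequency counts
row_p   <- prop.table(counts, margin = 1) # row proportions
col_p   <- prop.table(counts, margin = 2) # column proportions
joint_p <- prop.table(counts)             # joint (table total) proportions

# Row, column, and grand totals added to the counts
with_margins <- addmargins(counts)

# Several row variables against one column variable
multi <- ftable(xtabs(~ gender + region + status, data = d))

# Long format that is easy to reuse in later analysis
long <- as.data.frame(counts)

print(with_margins)
print(multi)
print(long)
```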
NOW, LET’S START THE CALCULATIONS
PRACTICAL ILLUSTRATION
We will simulate a data set that stands in for real-life survey data. It will serve as our case study.
Install the following packages (remove the leading # to run a line):
# install.packages("tidyverse")
# install.packages("stringr")
# install.packages("gmodels")
# install.packages("ggplot2")
# install.packages("descr")
After installation, load the packages one at a time. The messages printed after each call show which packages were attached and which functions are masked:
library("tidyverse")
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 0.3.5
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.0
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library("stringr")
library("gmodels")
library("ggplot2")
library("descr")
##
## Attaching package: 'descr'
##
## The following object is masked from 'package:gmodels':
##
## CrossTable
EXAMPLE 1: CREATING A DEMOGRAPHIC DATA SET
Run the following R code and inspect the results:
ID = seq_len(3000) # respondent IDs 1 to 3000 (not added to the data frame below)
# Note: set.seed(234) is repeated before every sample() call, so each draw restarts
# the same random stream. Age and Country, which both have six categories,
# therefore receive identical draws.
set.seed(234)
Age = sample(c("0 - 5", "6 - 14", "15 - 24", "25 - 50", "51 - 64", "65+ "), 3000, replace = TRUE)
#View(Age)
set.seed(234)
Gender = sample(c("Male", "Female"), 3000, replace = TRUE)
#View(Gender)
set.seed(234)
Country = sample(c("Nigeria", "Ghana", "South Africa", "Botswana", "United Kingdom", "Austria"), 3000, replace = TRUE)
#View(Country)
set.seed(234)
Health_Status = sample(c("Poor", "Fair", "Okay"), 3000, replace = TRUE)
#View(Health_Status)
Survey = data.frame(Age, Gender, Country, Health_Status)
#View(Survey)
head(Survey) # head() shows the first 6 rows of the data set
head(Survey, 15) # the first 15 rows
tail(Survey) # tail() shows the last 6 rows
tail(Survey, 20) # the last 20 rows
result_1 = CrossTable(Survey$Age, Survey$Gender, expected = TRUE, chisq = TRUE, prop.chisq = TRUE, dnn = c("Age", "Gender"))
print(result_1)
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
## =================================
## Gender
## Age Female Male Total
## ---------------------------------
## 0 - 5 253 227 480
## 246.7 233.3
## 0.160 0.169
## 0.527 0.473 0.160
## 0.164 0.156
## 0.084 0.076
## ---------------------------------
## 15 - 24 246 259 505
## 259.6 245.4
## 0.709 0.750
## 0.487 0.513 0.168
## 0.160 0.178
## 0.082 0.086
## ---------------------------------
## 25 - 50 235 263 498
## 256.0 242.0
## 1.718 1.817
## 0.472 0.528 0.166
## 0.152 0.180
## 0.078 0.088
## ---------------------------------
## 51 - 64 269 228 497
## 255.5 241.5
## 0.718 0.759
## 0.541 0.459 0.166
## 0.174 0.156
## 0.090 0.076
## ---------------------------------
## 6 - 14 268 232 500
## 257.0 243.0
## 0.471 0.498
## 0.536 0.464 0.167
## 0.174 0.159
## 0.089 0.077
## ---------------------------------
## 65+ 271 249 520
## 267.3 252.7
## 0.052 0.055
## 0.521 0.479 0.173
## 0.176 0.171
## 0.090 0.083
## ---------------------------------
## Total 1542 1458 3000
## 0.514 0.486
## =================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 7.876522 d.f. = 5 p = 0.163
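To see where the numbers in each cell come from, the sketch below recomputes the expected count and chi-square contribution for the first cell of the table (females aged 0 - 5), using only counts copied from the output above. The variable names are assumptions for this example. The results should agree with the 246.7, 0.160, and p = 0.163 printed above.

```r
# Counts copied from the Age by Gender table ("0 - 5" row, Female column)
n_cell    <- 253  # females aged 0 - 5
row_total <- 480  # all respondents aged 0 - 5
col_total <- 1542 # all females
n_total   <- 3000 # table total

# Expected count under independence: row total * column total / table total
expected <- row_total * col_total / n_total

# Chi-square contribution of the cell: (observed - expected)^2 / expected
contribution <- (n_cell - expected)^2 / expected

# p-value for the reported statistic, with d.f. = (6 - 1) * (2 - 1) = 5
p_value <- pchisq(7.876522, df = 5, lower.tail = FALSE)

print(round(c(expected = expected, contribution = contribution, p_value = p_value), 3))
```

Since p is well above 0.05, the table gives no evidence of an association between Age and Gender in the simulated data.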
To generate the same results in SPSS format, use the command below:
result_1B = CrossTable(Survey$Age, Survey$Gender,prop.r=FALSE,prop.t=FALSE,prop.chisq=FALSE,format="SPSS")
print(result_1B)
## Cell Contents
## |-------------------------|
## | Count |
## | Column Percent |
## |-------------------------|
##
## ====================================
## Survey$Gender
## Survey$Age Female Male Total
## ------------------------------------
## 0 - 5 253 227 480
## 16.4% 15.6%
## ------------------------------------
## 15 - 24 246 259 505
## 16.0% 17.8%
## ------------------------------------
## 25 - 50 235 263 498
## 15.2% 18.0%
## ------------------------------------
## 51 - 64 269 228 497
## 17.4% 15.6%
## ------------------------------------
## 6 - 14 268 232 500
## 17.4% 15.9%
## ------------------------------------
## 65+ 271 249 520
## 17.6% 17.1%
## ------------------------------------
## Total 1542 1458 3000
## 51.4% 48.6%
## ====================================
Now, let’s consider Age and Health Status:
result_2 = CrossTable(Survey$Age, Survey$Health_Status, expected = TRUE, chisq = TRUE, prop.chisq = TRUE, dnn = c("Age", "Health_Status"))
print(result_2)
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
## ========================================
## Health_Status
## Age Fair Okay Poor Total
## ----------------------------------------
## 0 - 5 145 181 154 480
## 163.2 160.3 156.5
## 2.030 2.668 0.039
## 0.302 0.377 0.321 0.160
## 0.142 0.181 0.157
## 0.048 0.060 0.051
## ----------------------------------------
## 15 - 24 166 179 160 505
## 171.7 168.7 164.6
## 0.189 0.633 0.130
## 0.329 0.354 0.317 0.168
## 0.163 0.179 0.164
## 0.055 0.060 0.053
## ----------------------------------------
## 25 - 50 177 153 168 498
## 169.3 166.3 162.3
## 0.348 1.069 0.197
## 0.355 0.307 0.337 0.166
## 0.174 0.153 0.172
## 0.059 0.051 0.056
## ----------------------------------------
## 51 - 64 164 153 180 497
## 169.0 166.0 162.0
## 0.147 1.018 1.995
## 0.330 0.308 0.362 0.166
## 0.161 0.153 0.184
## 0.055 0.051 0.060
## ----------------------------------------
## 6 - 14 179 176 145 500
## 170.0 167.0 163.0
## 0.476 0.485 1.988
## 0.358 0.352 0.290 0.167
## 0.175 0.176 0.148
## 0.060 0.059 0.048
## ----------------------------------------
## 65+ 189 160 171 520
## 176.8 173.7 169.5
## 0.842 1.078 0.013
## 0.363 0.308 0.329 0.173
## 0.185 0.160 0.175
## 0.063 0.053 0.057
## ----------------------------------------
## Total 1020 1002 978 3000
## 0.340 0.334 0.326
## ========================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 15.34322 d.f. = 10 p = 0.12
The same results in SPSS format:
result_2B = CrossTable(Survey$Age, Survey$Health_Status,prop.r=FALSE,prop.t=FALSE,prop.chisq=FALSE,format="SPSS")
print(result_2B)
## Cell Contents
## |-------------------------|
## | Count |
## | Column Percent |
## |-------------------------|
##
## ===========================================
## Survey$Health_Status
## Survey$Age Fair Okay Poor Total
## -------------------------------------------
## 0 - 5 145 181 154 480
## 14.2% 18.1% 15.7%
## -------------------------------------------
## 15 - 24 166 179 160 505
## 16.3% 17.9% 16.4%
## -------------------------------------------
## 25 - 50 177 153 168 498
## 17.4% 15.3% 17.2%
## -------------------------------------------
## 51 - 64 164 153 180 497
## 16.1% 15.3% 18.4%
## -------------------------------------------
## 6 - 14 179 176 145 500
## 17.5% 17.6% 14.8%
## -------------------------------------------
## 65+ 189 160 171 520
## 18.5% 16.0% 17.5%
## -------------------------------------------
## Total 1020 1002 978 3000
## 34.0% 33.4% 32.6%
## ===========================================
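Next, cross-tabulate Age against Country. Both variables were sampled from six categories immediately after set.seed(234), so they received identical random draws and each age group maps to exactly one country. That is why every row below has a single non-zero cell and the chi-square statistic is so extreme: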
result_3 = CrossTable(Survey$Age, Survey$Country, expected = TRUE, chisq = TRUE, prop.chisq = TRUE, dnn = c("Age", "Country"))
print(result_3)
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
## =====================================================================================
## Country
## Age Austria Botswana Ghana Nigeria Sth Afrc Untd Kng Total
## -------------------------------------------------------------------------------------
## 0 - 5 0 0 0 480 0 0 480
## 83.2 79.7 80.0 76.8 80.8 79.5
## 83.200 79.680 80.000 2116.800 80.800 79.520
## 0.000 0.000 0.000 1.000 0.000 0.000 0.160
## 0 0 0 1 0 0
## 0.000 0.000 0.000 0.160 0.000 0.000
## -------------------------------------------------------------------------------------
## 15 - 24 0 0 0 0 505 0 505
## 87.5 83.8 84.2 80.8 85.0 83.7
## 87.533 83.830 84.167 80.800 2075.008 83.662
## 0.000 0.000 0.000 0.000 1.000 0.000 0.168
## 0 0 0 0 1 0
## 0.000 0.000 0.000 0.000 0.168 0.000
## -------------------------------------------------------------------------------------
## 25 - 50 0 498 0 0 0 0 498
## 86.3 82.7 83.0 79.7 83.8 82.5
## 86.320 2086.668 83.000 79.680 83.830 82.502
## 0.000 1.000 0.000 0.000 0.000 0.000 0.166
## 0 1 0 0 0 0
## 0.000 0.166 0.000 0.000 0.000 0.000
## -------------------------------------------------------------------------------------
## 51 - 64 0 0 0 0 0 497 497
## 86.1 82.5 82.8 79.5 83.7 82.3
## 86.147 82.502 82.833 79.520 83.662 2088.336
## 0.000 0.000 0.000 0.000 0.000 1.000 0.166
## 0 0 0 0 0 1
## 0.000 0.000 0.000 0.000 0.000 0.166
## -------------------------------------------------------------------------------------
## 6 - 14 0 0 500 0 0 0 500
## 86.7 83.0 83.3 80.0 84.2 82.8
## 86.667 83.000 2083.333 80.000 84.167 82.833
## 0.000 0.000 1.000 0.000 0.000 0.000 0.167
## 0 0 1 0 0 0
## 0.000 0.000 0.167 0.000 0.000 0.000
## -------------------------------------------------------------------------------------
## 65+ 520 0 0 0 0 0 520
## 90.1 86.3 86.7 83.2 87.5 86.1
## 2050.133 86.320 86.667 83.200 87.533 86.147
## 1.000 0.000 0.000 0.000 0.000 0.000 0.173
## 1 0 0 0 0 0
## 0.173 0.000 0.000 0.000 0.000 0.000
## -------------------------------------------------------------------------------------
## Total 520 498 500 480 505 497 3000
## 0.173 0.166 0.167 0.160 0.168 0.166
## =====================================================================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 15000 d.f. = 25 p <2e-16
The same table in SPSS format:
result_3B = CrossTable(Survey$Age, Survey$Country, prop.r=FALSE,prop.t=FALSE,prop.chisq=FALSE,format="SPSS")
print(result_3B)
## Cell Contents
## |-------------------------|
## | Count |
## | Column Percent |
## |-------------------------|
##
## ==============================================================================
## Survey$Country
## Survy$Ag Austria Botswana Ghana Nigeria Sth Afrc Untd Kng Total
## ------------------------------------------------------------------------------
## 0 - 5 0 0 0 480 0 0 480
## 0% 0% 0% 100% 0% 0%
## ------------------------------------------------------------------------------
## 15 - 24 0 0 0 0 505 0 505
## 0% 0% 0% 0% 100% 0%
## ------------------------------------------------------------------------------
## 25 - 50 0 498 0 0 0 0 498
## 0% 100% 0% 0% 0% 0%
## ------------------------------------------------------------------------------
## 51 - 64 0 0 0 0 0 497 497
## 0% 0% 0% 0% 0% 100%
## ------------------------------------------------------------------------------
## 6 - 14 0 0 500 0 0 0 500
## 0% 0% 100% 0% 0% 0%
## ------------------------------------------------------------------------------
## 65+ 520 0 0 0 0 0 520
## 100% 0% 0% 0% 0% 0%
## ------------------------------------------------------------------------------
## Total 520 498 500 480 505 497 3000
## 17.3% 16.6% 16.7% 16.0% 16.8% 16.6%
## ==============================================================================
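If the simulated variables are meant to be independent of one another, one option is to set the seed once and let the random number generator run on between the `sample()` calls. The sketch below is an alternative to the simulation used in this article, not a reproduction of it, so its tables will not match the output shown here. `Survey2` is a hypothetical name chosen to avoid overwriting `Survey`.

```r
set.seed(234) # seed once, then let the generator continue

n <- 3000
Survey2 <- data.frame(
  Age = sample(c("0 - 5", "6 - 14", "15 - 24", "25 - 50", "51 - 64", "65+"),
               n, replace = TRUE),
  Gender = sample(c("Male", "Female"), n, replace = TRUE),
  Country = sample(c("Nigeria", "Ghana", "South Africa", "Botswana",
                     "United Kingdom", "Austria"), n, replace = TRUE),
  Health_Status = sample(c("Poor", "Fair", "Okay"), n, replace = TRUE)
)

# Age and Country are now drawn separately, so the counts spread across cells
age_country <- table(Survey2$Age, Survey2$Country)
print(age_country)
```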
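Now cross-tabulate Gender against Country. Because Country mirrors Age in this simulation, the chi-square statistic is the same as in the Age by Gender table: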
result_4 = CrossTable(Survey$Gender, Survey$Country, expected = TRUE, chisq = TRUE, prop.chisq = TRUE, dnn = c("Gender", "Country"))
print(result_4)
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
## ================================================================================
## Country
## Gender Austria Botswana Ghana Nigeria South Afrc Untd Kngdm Total
## --------------------------------------------------------------------------------
## Female 271 235 268 253 246 269 1542
## 267.3 256.0 257.0 246.7 259.6 255.5
## 0.052 1.718 0.471 0.160 0.709 0.718
## 0.176 0.152 0.174 0.164 0.160 0.174 0.514
## 0.521 0.472 0.536 0.527 0.487 0.541
## 0.090 0.078 0.089 0.084 0.082 0.090
## --------------------------------------------------------------------------------
## Male 249 263 232 227 259 228 1458
## 252.7 242.0 243.0 233.3 245.4 241.5
## 0.055 1.817 0.498 0.169 0.750 0.759
## 0.171 0.180 0.159 0.156 0.178 0.156 0.486
## 0.479 0.528 0.464 0.473 0.513 0.459
## 0.083 0.088 0.077 0.076 0.086 0.076
## --------------------------------------------------------------------------------
## Total 520 498 500 480 505 497 3000
## 0.173 0.166 0.167 0.160 0.168 0.166
## ================================================================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 7.876522 d.f. = 5 p = 0.163
The same results in SPSS format can be produced with the following code:
result_4B = CrossTable(Survey$Gender, Survey$Country, prop.r=FALSE,prop.t=FALSE,prop.chisq=FALSE,format="SPSS")
print(result_4B)
## Cell Contents
## |-------------------------|
## | Count |
## | Column Percent |
## |-------------------------|
##
## ==============================================================================
## Survey$Country
## Srvy$Gnd Austria Botswana Ghana Nigeria Sth Afrc Untd Kng Total
## ------------------------------------------------------------------------------
## Female 271 235 268 253 246 269 1542
## 52.1% 47.2% 53.6% 52.7% 48.7% 54.1%
## ------------------------------------------------------------------------------
## Male 249 263 232 227 259 228 1458
## 47.9% 52.8% 46.4% 47.3% 51.3% 45.9%
## ------------------------------------------------------------------------------
## Total 520 498 500 480 505 497 3000
## 17.3% 16.6% 16.7% 16.0% 16.8% 16.6%
## ==============================================================================
Finally, cross-tabulate Gender against Health Status:
result_5 = CrossTable(Survey$Gender, Survey$Health_Status, expected = TRUE, chisq = TRUE, prop.chisq = TRUE, dnn = c("Gender", "Health_Status"))
print(result_5)
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
## =======================================
## Health_Status
## Gender Fair Okay Poor Total
## ---------------------------------------
## Female 538 493 511 1542
## 524.3 515.0 502.7
## 0.359 0.942 0.137
## 0.349 0.320 0.331 0.514
## 0.527 0.492 0.522
## 0.179 0.164 0.170
## ---------------------------------------
## Male 482 509 467 1458
## 495.7 487.0 475.3
## 0.380 0.996 0.145
## 0.331 0.349 0.320 0.486
## 0.473 0.508 0.478
## 0.161 0.170 0.156
## ---------------------------------------
## Total 1020 1002 978 3000
## 0.340 0.334 0.326
## =======================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 2.959869 d.f. = 2 p = 0.228
The same table in SPSS format:
result_5B = CrossTable(Survey$Gender, Survey$Health_Status,prop.r=FALSE,prop.t=FALSE,prop.chisq=FALSE,format="SPSS")
print(result_5B)
## Cell Contents
## |-------------------------|
## | Count |
## | Column Percent |
## |-------------------------|
##
## ==============================================
## Survey$Health_Status
## Survey$Gender Fair Okay Poor Total
## ----------------------------------------------
## Female 538 493 511 1542
## 52.7% 49.2% 52.2%
## ----------------------------------------------
## Male 482 509 467 1458
## 47.3% 50.8% 47.8%
## ----------------------------------------------
## Total 1020 1002 978 3000
## 34.0% 33.4% 32.6%
## ==============================================
#### EXAMPLE 2: ####
data(esoph, package = "datasets")
View(esoph)
names(esoph)
## [1] "agegp" "alcgp" "tobgp" "ncases" "ncontrols"
# The column variable is esoph$agegp (age group), so it is labelled accordingly
result_6 = CrossTable(esoph$alcgp, esoph$agegp, expected = TRUE, chisq = TRUE, prop.chisq = TRUE, dnn = c("Alcohol consumption", "Age group"))
## Warning in chisq.test(tab, correct = FALSE, ...): Chi-squared approximation may
## be incorrect
print(result_6)
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
## ============================================================================
## Age group
## Alcohol consumption 25-34 35-44 45-54 55-64 65-74 75+ Total
## ----------------------------------------------------------------------------
## 0-39g/day 4 4 4 4 4 3 23
## 3.9 3.9 4.2 4.2 3.9 2.9
## 0.002 0.002 0.008 0.008 0.002 0.005
## 0.174 0.174 0.174 0.174 0.174 0.130 0.261
## 0.267 0.267 0.250 0.250 0.267 0.273
## 0.045 0.045 0.045 0.045 0.045 0.034
## ----------------------------------------------------------------------------
## 40-79 4 4 4 4 3 4 23
## 3.9 3.9 4.2 4.2 3.9 2.9
## 0.002 0.002 0.008 0.008 0.216 0.440
## 0.174 0.174 0.174 0.174 0.130 0.174 0.261
## 0.267 0.267 0.250 0.250 0.200 0.364
## 0.045 0.045 0.045 0.045 0.034 0.045
## ----------------------------------------------------------------------------
## 80-119 3 4 4 4 4 2 21
## 3.6 3.6 3.8 3.8 3.6 2.6
## 0.094 0.049 0.009 0.009 0.049 0.149
## 0.143 0.190 0.190 0.190 0.190 0.095 0.239
## 0.200 0.267 0.250 0.250 0.267 0.182
## 0.034 0.045 0.045 0.045 0.045 0.023
## ----------------------------------------------------------------------------
## 120+ 4 3 4 4 4 2 21
## 3.6 3.6 3.8 3.8 3.6 2.6
## 0.049 0.094 0.009 0.009 0.049 0.149
## 0.190 0.143 0.190 0.190 0.190 0.095 0.239
## 0.267 0.200 0.250 0.250 0.267 0.182
## 0.045 0.034 0.045 0.045 0.045 0.023
## ----------------------------------------------------------------------------
## Total 15 15 16 16 15 11 88
## 0.170 0.170 0.182 0.182 0.170 0.125
## ============================================================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 1.41891 d.f. = 15 p = 1
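Each row of `esoph` describes a group of people rather than a single respondent, with the group sizes stored in `ncases` and `ncontrols`. The table above therefore counts groups, which is why the cells are nearly equal and p = 1. One way to count cancer cases instead is to weight the tabulation by `ncases`, sketched here with base R's `xtabs()`. This is an alternative view offered for comparison, not what the code above computes.

```r
data(esoph, package = "datasets")

# Sum the number of cases within each alcohol by age-group cell
cases_tab <- xtabs(ncases ~ alcgp + agegp, data = esoph)

# Show the weighted table with row and column totals
print(addmargins(cases_tab))
```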
#### EXAMPLE 3: ####
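set.seed(234)
sex = factor(c(rep("F", 900), rep("M", 900))) # 900 women followed by 900 men
income = 100 * (rnorm(1800) + 5) # simulated income with mean near 500 and sd near 100
weight = rep(1, 1800) # every respondent starts with weight 1
weight[sex == "F" & income > 500] = 3 # women with income above 500 get weight 3
View(weight)
attr(income, "label") = "Income" # labels that descr uses in its output
attr(sex, "label") = "Sex"
# Unweighted mean income by sex, with a plot
compmeans(income, sex, col = "lightgray", ylab = "income", xlab = "sex")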
## Mean value of "Income" according to "Sex"
## Mean N Std. Dev.
## F 497.6180 900 96.95414
## M 503.7893 900 98.07035
## Total 500.7036 1800 97.53558
comp = compmeans(income, sex, weight, plot = FALSE)
plot(comp, col = c("orange", "lightblue"), ylab = "income", xlab = "sex")
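The effect of the weights passed to `compmeans()` can be checked by hand. The sketch below regenerates the same simulated data and compares unweighted and weighted group means using base R's `weighted.mean()`. It is meant as a cross-check of the idea, under the assumption that the weighted group means are ordinary weighted averages.

```r
set.seed(234)
sex    <- factor(c(rep("F", 900), rep("M", 900)))
income <- 100 * (rnorm(1800) + 5)
weight <- rep(1, 1800)
weight[sex == "F" & income > 500] <- 3 # up-weight women with income above 500

# Unweighted mean income within each sex
unweighted <- tapply(income, sex, mean)

# Weighted mean income within each sex
weighted <- sapply(split(data.frame(income, weight), sex),
                   function(g) weighted.mean(g$income, g$weight))

print(round(rbind(unweighted, weighted), 2))
```

Because only high-income women are up-weighted, the weighted mean for F rises above its unweighted value, while the mean for M is unchanged.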
library(dplyr)
library(ggplot2)
library(reshape2)
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
library(DescTools)
## Registered S3 method overwritten by 'DescTools':
## method from
## reorder.factor gdata
NB: dplyr and ggplot2 were already attached by the tidyverse package loaded earlier, so loading them again is harmless.
Create a contingency table for sex distribution by LGA
sex = as.table(matrix(c(33, 28, 11, 2, 4, 3, 117, 101, 41, 4, 17, 8), ncol = 2, nrow = 6,
  dimnames = list(c("Guma", "Logo", "Barkin-Ladi", "Bokkos", "Mangu", "Riyom"), c("Male", "Female"))))
print(sex)
## Male Female
## Guma 33 117
## Logo 28 101
## Barkin-Ladi 11 41
## Bokkos 2 4
## Mangu 4 17
## Riyom 3 8
sex_bar = melt(sex, varnames = c("LGA", "Sex")) # long format: one row per LGA and sex
print(sex_bar)
## LGA Sex value
## 1 Guma Male 33
## 2 Logo Male 28
## 3 Barkin-Ladi Male 11
## 4 Bokkos Male 2
## 5 Mangu Male 4
## 6 Riyom Male 3
## 7 Guma Female 117
## 8 Logo Female 101
## 9 Barkin-Ladi Female 41
## 10 Bokkos Female 4
## 11 Mangu Female 17
## 12 Riyom Female 8
# Stacked bars showing the share of each sex within every LGA
ggplot(sex_bar %>% group_by(LGA) %>% mutate(Percentage = round(value / sum(value), 4)),
       aes(x = LGA, y = Percentage, fill = Sex)) +
  geom_col() +
  geom_text(aes(label = paste0(Percentage * 100, "%")),
            position = position_stack(vjust = 0.5))
Row Percentages
PercTable(sex, row.vars = NULL, col.vars = NULL, justify = "right",
freq = TRUE, rfrq = "010", expected = FALSE, residuals = FALSE,
stdres = FALSE, margins = c(1,2), digits = 2)
##
## Male Female Sum
##
## Guma freq 33 117 150
## p.row 22.00% 78.00% .
##
## Logo freq 28 101 129
## p.row 21.71% 78.29% .
##
## Barkin-Ladi freq 11 41 52
## p.row 21.15% 78.85% .
##
## Bokkos freq 2 4 6
## p.row 33.33% 66.67% .
##
## Mangu freq 4 17 21
## p.row 19.05% 80.95% .
##
## Riyom freq 3 8 11
## p.row 27.27% 72.73% .
##
## Sum freq 81 288 369
## p.row 21.95% 78.05% .
##
Column Percentages
PercTable(sex, row.vars = NULL, col.vars = NULL, justify = "right",
freq = TRUE, rfrq = "001", expected = FALSE, residuals = FALSE,
stdres = FALSE, margins = c(1,2), digits = 2)
##
## Male Female Sum
##
## Guma freq 33 117 150
## p.col 40.74% 40.62% 40.65%
##
## Logo freq 28 101 129
## p.col 34.57% 35.07% 34.96%
##
## Barkin-Ladi freq 11 41 52
## p.col 13.58% 14.24% 14.09%
##
## Bokkos freq 2 4 6
## p.col 2.47% 1.39% 1.63%
##
## Mangu freq 4 17 21
## p.col 4.94% 5.90% 5.69%
##
## Riyom freq 3 8 11
## p.col 3.70% 2.78% 2.98%
##
## Sum freq 81 288 369
## p.col . . .
##
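The row and column percentages reported by PercTable() can be cross-checked with base R. This is a small sketch, assuming the same sex table (rebuilt here so it runs standalone); `row_pct` and `col_pct` are illustrative names.

```r
# Rebuild the LGA-by-sex table (counts copied from the table above)
lga <- c("Guma", "Logo", "Barkin-Ladi", "Bokkos", "Mangu", "Riyom")
sex <- as.table(matrix(c(33, 28, 11, 2, 4, 3, 117, 101, 41, 4, 17, 8),
                       nrow = 6, ncol = 2,
                       dimnames = list(lga, c("Male", "Female"))))

# margin = 1 divides each cell by its row total (row percentages)
row_pct <- round(100 * prop.table(sex, margin = 1), 2)

# margin = 2 divides each cell by its column total (column percentages)
col_pct <- round(100 * prop.table(sex, margin = 2), 2)

print(row_pct)
print(col_pct)
```

Row percentages answer "within an LGA, what share is male or female?", while column percentages answer "of all males (or females), what share comes from each LGA?".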
Expected Frequencies and Residuals
PercTable(sex, row.vars = NULL, col.vars = NULL, justify = "right",
freq = TRUE, rfrq = "000", expected = TRUE, residuals = TRUE,
stdres = FALSE, margins = c(1,2), digits = NULL)
##
## Male Female Sum
##
## Guma freq 33 117 150
## exp 32.927 117.073 .
## res 0.013 -0.007 .
##
## Logo freq 28 101 129
## exp 28.317 100.683 .
## res -0.060 0.032 .
##
## Barkin-Ladi freq 11 41 52
## exp 11.415 40.585 .
## res -0.123 0.065 .
##
## Bokkos freq 2 4 6
## exp 1.317 4.683 .
## res 0.595 -0.316 .
##
## Mangu freq 4 17 21
## exp 4.610 16.390 .
## res -0.284 0.151 .
##
## Riyom freq 3 8 11
## exp 2.415 8.585 .
## res 0.377 -0.200 .
##
## Sum freq 81 288 369
## exp . . .
## res . . .
##
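The "exp" and "res" rows above can be reproduced by hand: under independence the expected count of a cell is (row total x column total) / grand total, and the values shown as "res" agree with the Pearson residual (observed - expected) / sqrt(expected). A minimal sketch, assuming the same sex table (rebuilt here); the object names are illustrative.

```r
# Rebuild the LGA-by-sex table (counts copied from the table above)
lga <- c("Guma", "Logo", "Barkin-Ladi", "Bokkos", "Mangu", "Riyom")
sex <- as.table(matrix(c(33, 28, 11, 2, 4, 3, 117, 101, 41, 4, 17, 8),
                       nrow = 6, ncol = 2,
                       dimnames = list(lga, c("Male", "Female"))))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(sex), colSums(sex)) / sum(sex)

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson_res <- (sex - expected) / sqrt(expected)

print(round(expected, 3))
print(round(pearson_res, 3))

# The Pearson chi-square statistic is the sum of squared residuals
print(sum(pearson_res^2))
```

For example, the Guma/Male cell has expected count 150 * 81 / 369, which rounds to 32.927 as in the table.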
Chi-square Test
chisq.test(sex, correct=FALSE)
## Warning in chisq.test(sex, correct = FALSE): Chi-squared approximation may be
## incorrect
##
## Pearson's Chi-squared test
##
## data: sex
## X-squared = 0.76292, df = 5, p-value = 0.9793
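The warning appears because some expected counts are below 5, so the chi-square approximation is unreliable for this table. One common workaround, sketched here as an option rather than as the original author's method, is to let chisq.test() estimate the p-value by Monte Carlo simulation. The seed and the number of replicates B are arbitrary choices made for this illustration.

```r
# Rebuild the LGA-by-sex table (counts copied from the table above)
lga <- c("Guma", "Logo", "Barkin-Ladi", "Bokkos", "Mangu", "Riyom")
sex <- as.table(matrix(c(33, 28, 11, 2, 4, 3, 117, 101, 41, 4, 17, 8),
                       nrow = 6, ncol = 2,
                       dimnames = list(lga, c("Male", "Female"))))

# Count the cells whose expected frequency is below 5
expected <- suppressWarnings(chisq.test(sex, correct = FALSE))$expected
n_small <- sum(expected < 5)
print(n_small)

# Monte Carlo p-value: simulate tables with the observed margins
set.seed(123)  # arbitrary seed, only for reproducibility
mc <- chisq.test(sex, simulate.p.value = TRUE, B = 10000)
print(mc)
```

With a simulated p-value the degrees of freedom are reported as NA, because the reference distribution is built from the simulated tables rather than from the chi-square distribution.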
Create contingency table for Age Classification
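# matrix() fills column by column: the first six values are the (10-17) Years counts
age = as.table(matrix(c(19, 11, 7, 1, 3, 1,
                        131, 118, 45, 5, 18, 10),
                      ncol = 2, nrow = 6,
                      dimnames = list(c("Guma", "Logo", "Barkin-Ladi", "Bokkos", "Mangu", "Riyom"),
                                      c("(10-17) Years", "(18-Above) Years"))))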
print(age)
## (10-17) Years (18-Above) Years
## Guma 19 131
## Logo 11 118
## Barkin-Ladi 7 45
## Bokkos 1 5
## Mangu 3 18
## Riyom 1 10
age_bar=melt(age, varnames=c("LGA","Age"))
print(age_bar)
## LGA Age value
## 1 Guma (10-17) Years 19
## 2 Logo (10-17) Years 11
## 3 Barkin-Ladi (10-17) Years 7
## 4 Bokkos (10-17) Years 1
## 5 Mangu (10-17) Years 3
## 6 Riyom (10-17) Years 1
## 7 Guma (18-Above) Years 131
## 8 Logo (18-Above) Years 118
## 9 Barkin-Ladi (18-Above) Years 45
## 10 Bokkos (18-Above) Years 5
## 11 Mangu (18-Above) Years 18
## 12 Riyom (18-Above) Years 10
ggplot(age_bar %>% group_by(LGA) %>% mutate(Percentage = round(value/sum(value), 4)),
       aes(x = LGA, y = Percentage, fill = Age)) +
  geom_col() +
  geom_text(aes(label = paste0(Percentage*100, "%")),
            position = position_stack(vjust = 0.5))
Row Percentages
PercTable(age, row.vars = NULL, col.vars = NULL, justify = "right",
freq = TRUE, rfrq = "010", expected = FALSE, residuals = FALSE,
stdres = FALSE, margins = c(1,2), digits = 2)
##
## (10-17) Years (18-Above) Years Sum
##
## Guma freq 19 131 150
## p.row 12.67% 87.33% .
##
## Logo freq 11 118 129
## p.row 8.53% 91.47% .
##
## Barkin-Ladi freq 7 45 52
## p.row 13.46% 86.54% .
##
## Bokkos freq 1 5 6
## p.row 16.67% 83.33% .
##
## Mangu freq 3 18 21
## p.row 14.29% 85.71% .
##
## Riyom freq 1 10 11
## p.row 9.09% 90.91% .
##
## Sum freq 42 327 369
## p.row 11.38% 88.62% .
##
Column Percentages
PercTable(age, row.vars = NULL, col.vars = NULL, justify = "right",
freq = TRUE, rfrq = "001", expected = FALSE, residuals = FALSE,
stdres = FALSE, margins = c(1,2), digits = 2)
##
## (10-17) Years (18-Above) Years Sum
##
## Guma freq 19 131 150
## p.col 45.24% 40.06% 40.65%
##
## Logo freq 11 118 129
## p.col 26.19% 36.09% 34.96%
##
## Barkin-Ladi freq 7 45 52
## p.col 16.67% 13.76% 14.09%
##
## Bokkos freq 1 5 6
## p.col 2.38% 1.53% 1.63%
##
## Mangu freq 3 18 21
## p.col 7.14% 5.50% 5.69%
##
## Riyom freq 1 10 11
## p.col 2.38% 3.06% 2.98%
##
## Sum freq 42 327 369
## p.col . . .
##
Expected Frequencies and Residuals
PercTable(age, row.vars = NULL, col.vars = NULL, justify = "right",
freq = TRUE, rfrq = "000", expected = TRUE, residuals = TRUE,
stdres = FALSE, margins = c(1,2), digits = NULL)
##
## (10-17) Years (18-Above) Years Sum
##
## Guma freq 19 131 150
## exp 17.073 132.927 .
## res 0.466 -0.167 .
##
## Logo freq 11 118 129
## exp 14.683 114.317 .
## res -0.961 0.344 .
##
## Barkin-Ladi freq 7 45 52
## exp 5.919 46.081 .
## res 0.444 -0.159 .
##
## Bokkos freq 1 5 6
## exp 0.683 5.317 .
## res 0.384 -0.138 .
##
## Mangu freq 3 18 21
## exp 2.390 18.610 .
## res 0.394 -0.141 .
##
## Riyom freq 1 10 11
## exp 1.252 9.748 .
## res -0.225 0.081 .
##
## Sum freq 42 327 369
## exp . . .
## res . . .
##
Chi-square Test
chisq.test(age, correct=FALSE)
## Warning in chisq.test(age, correct = FALSE): Chi-squared approximation may be
## incorrect
##
## Pearson's Chi-squared test
##
## data: age
## X-squared = 1.9096, df = 5, p-value = 0.8615
Create contingency table for Dialect Classification
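# matrix() fills column by column: each line below is one dialect across the six LGAs
dialect = as.table(matrix(c(136, 63, 0, 0, 0, 0,
                            12, 4, 0, 0, 0, 0,
                            2, 42, 0, 0, 0, 0,
                            0, 0, 48, 0, 0, 8,
                            0, 0, 0, 6, 0, 0,
                            0, 0, 0, 0, 19, 0,
                            0, 20, 4, 0, 2, 3),
                          ncol = 7, nrow = 6,
                          dimnames = list(c("Guma", "Logo", "Barkin-Ladi", "Bokkos", "Mangu", "Riyom"),
                                          c("Tiv", "Wapan", "Idoma", "Berom", "Ron", "Mwaghavul", "Others"))))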
print(dialect)
## Tiv Wapan Idoma Berom Ron Mwaghavul Others
## Guma 136 12 2 0 0 0 0
## Logo 63 4 42 0 0 0 20
## Barkin-Ladi 0 0 0 48 0 0 4
## Bokkos 0 0 0 0 6 0 0
## Mangu 0 0 0 0 0 19 2
## Riyom 0 0 0 8 0 0 3
dialect_bar=melt(dialect, varnames=c("LGA","Dialects"))
print(dialect_bar)
## LGA Dialects value
## 1 Guma Tiv 136
## 2 Logo Tiv 63
## 3 Barkin-Ladi Tiv 0
## 4 Bokkos Tiv 0
## 5 Mangu Tiv 0
## 6 Riyom Tiv 0
## 7 Guma Wapan 12
## 8 Logo Wapan 4
## 9 Barkin-Ladi Wapan 0
## 10 Bokkos Wapan 0
## 11 Mangu Wapan 0
## 12 Riyom Wapan 0
## 13 Guma Idoma 2
## 14 Logo Idoma 42
## 15 Barkin-Ladi Idoma 0
## 16 Bokkos Idoma 0
## 17 Mangu Idoma 0
## 18 Riyom Idoma 0
## 19 Guma Berom 0
## 20 Logo Berom 0
## 21 Barkin-Ladi Berom 48
## 22 Bokkos Berom 0
## 23 Mangu Berom 0
## 24 Riyom Berom 8
## 25 Guma Ron 0
## 26 Logo Ron 0
## 27 Barkin-Ladi Ron 0
## 28 Bokkos Ron 6
## 29 Mangu Ron 0
## 30 Riyom Ron 0
## 31 Guma Mwaghavul 0
## 32 Logo Mwaghavul 0
## 33 Barkin-Ladi Mwaghavul 0
## 34 Bokkos Mwaghavul 0
## 35 Mangu Mwaghavul 19
## 36 Riyom Mwaghavul 0
## 37 Guma Others 0
## 38 Logo Others 20
## 39 Barkin-Ladi Others 4
## 40 Bokkos Others 0
## 41 Mangu Others 2
## 42 Riyom Others 3
ggplot(dialect_bar %>% group_by(LGA) %>% mutate(Percentage = round(value/sum(value), 4)),
       aes(x = LGA, y = Percentage, fill = Dialects)) +
  geom_col() +
  geom_text(aes(label = paste0(Percentage*100, "%")),
            position = position_stack(vjust = 0.5))
Row Percentages
PercTable(dialect, row.vars = NULL, col.vars = NULL, justify = "right",
freq = TRUE, rfrq = "010", expected = FALSE, residuals = FALSE,
stdres = FALSE, margins = c(1,2), digits = 2)
##
## Tiv Wapan Idoma Berom Ron Mwaghavul
##
## Guma freq 136 12 2 0 0 0
## p.row 90.67% 8.00% 1.33% 0.00% 0.00% 0.00%
##
## Logo freq 63 4 42 0 0 0
## p.row 48.84% 3.10% 32.56% 0.00% 0.00% 0.00%
##
## Barkin-Ladi freq 0 0 0 48 0 0
## p.row 0.00% 0.00% 0.00% 92.31% 0.00% 0.00%
##
## Bokkos freq 0 0 0 0 6 0
## p.row 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
##
## Mangu freq 0 0 0 0 0 19
## p.row 0.00% 0.00% 0.00% 0.00% 0.00% 90.48%
##
## Riyom freq 0 0 0 8 0 0
## p.row 0.00% 0.00% 0.00% 72.73% 0.00% 0.00%
##
## Sum freq 199 16 44 56 6 19
## p.row 53.93% 4.34% 11.92% 15.18% 1.63% 5.15%
##
##
## Others Sum
##
## Guma freq 0 150
## p.row 0.00% .
##
## Logo freq 20 129
## p.row 15.50% .
##
## Barkin-Ladi freq 4 52
## p.row 7.69% .
##
## Bokkos freq 0 6
## p.row 0.00% .
##
## Mangu freq 2 21
## p.row 9.52% .
##
## Riyom freq 3 11
## p.row 27.27% .
##
## Sum freq 29 369
## p.row 7.86% .
##
Column Percentages
PercTable(dialect, row.vars = NULL, col.vars = NULL, justify = "right",
freq = TRUE, rfrq = "001", expected = FALSE, residuals = FALSE,
stdres = FALSE, margins = c(1,2), digits = 2)
Expected Frequencies and Residuals
PercTable(dialect, row.vars = NULL, col.vars = NULL, justify = "right",
freq = TRUE, rfrq = "000", expected = TRUE, residuals = TRUE,
stdres = FALSE, margins = c(1,2), digits = NULL)
Chi-square Test
chisq.test(dialect, correct=FALSE)
Create contingency table for Marital Status
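# matrix() fills column by column: each line below is one marital status across the six LGAs
marital = as.table(matrix(c(56, 34, 8, 0, 2, 4,
                            7, 9, 6, 0, 1, 0,
                            20, 14, 12, 1, 4, 2,
                            49, 57, 16, 2, 11, 3,
                            18, 15, 10, 3, 3, 2),
                          ncol = 5, nrow = 6,
                          dimnames = list(c("Guma", "Logo", "Barkin-Ladi", "Bokkos", "Mangu", "Riyom"),
                                          c("Married", "Widowed", "Divorced", "Separated", "Single"))))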
print(marital)
## Married Widowed Divorced Separated Single
## Guma 56 7 20 49 18
## Logo 34 9 14 57 15
## Barkin-Ladi 8 6 12 16 10
## Bokkos 0 0 1 2 3
## Mangu 2 1 4 11 3
## Riyom 4 0 2 3 2
marital_bar=melt(marital, varnames=c("LGA","Marital_Status"))
print(marital_bar)
## LGA Marital_Status value
## 1 Guma Married 56
## 2 Logo Married 34
## 3 Barkin-Ladi Married 8
## 4 Bokkos Married 0
## 5 Mangu Married 2
## 6 Riyom Married 4
## 7 Guma Widowed 7
## 8 Logo Widowed 9
## 9 Barkin-Ladi Widowed 6
## 10 Bokkos Widowed 0
## 11 Mangu Widowed 1
## 12 Riyom Widowed 0
## 13 Guma Divorced 20
## 14 Logo Divorced 14
## 15 Barkin-Ladi Divorced 12
## 16 Bokkos Divorced 1
## 17 Mangu Divorced 4
## 18 Riyom Divorced 2
## 19 Guma Separated 49
## 20 Logo Separated 57
## 21 Barkin-Ladi Separated 16
## 22 Bokkos Separated 2
## 23 Mangu Separated 11
## 24 Riyom Separated 3
## 25 Guma Single 18
## 26 Logo Single 15
## 27 Barkin-Ladi Single 10
## 28 Bokkos Single 3
## 29 Mangu Single 3
## 30 Riyom Single 2
ggplot(marital_bar %>% group_by(LGA) %>% mutate(Percentage = round(value/sum(value), 4)),
       aes(x = LGA, y = Percentage, fill = Marital_Status)) +
  geom_col() +
  geom_text(aes(label = paste0(Percentage*100, "%")),
            position = position_stack(vjust = 0.5))
Row Percentages
PercTable(marital, row.vars = NULL, col.vars = NULL, justify = "right",
freq = TRUE, rfrq = "010", expected = FALSE, residuals = FALSE,
stdres = FALSE, margins = c(1,2), digits = 2)
##
## Married Widowed Divorced Separated Single Sum
##
## Guma freq 56 7 20 49 18 150
## p.row 37.33% 4.67% 13.33% 32.67% 12.00% .
##
## Logo freq 34 9 14 57 15 129
## p.row 26.36% 6.98% 10.85% 44.19% 11.63% .
##
## Barkin-Ladi freq 8 6 12 16 10 52
## p.row 15.38% 11.54% 23.08% 30.77% 19.23% .
##
## Bokkos freq 0 0 1 2 3 6
## p.row 0.00% 0.00% 16.67% 33.33% 50.00% .
##
## Mangu freq 2 1 4 11 3 21
## p.row 9.52% 4.76% 19.05% 52.38% 14.29% .
##
## Riyom freq 4 0 2 3 2 11
## p.row 36.36% 0.00% 18.18% 27.27% 18.18% .
##
## Sum freq 104 23 53 138 51 369
## p.row 28.18% 6.23% 14.36% 37.40% 13.82% .
##
Column Percentages
PercTable(marital, row.vars = NULL, col.vars = NULL, justify = "right",
freq = TRUE, rfrq = "001", expected = FALSE, residuals = FALSE,
stdres = FALSE, margins = c(1,2), digits = 2)
##
## Married Widowed Divorced Separated Single Sum
##
## Guma freq 56 7 20 49 18 150
## p.col 53.85% 30.43% 37.74% 35.51% 35.29% 40.65%
##
## Logo freq 34 9 14 57 15 129
## p.col 32.69% 39.13% 26.42% 41.30% 29.41% 34.96%
##
## Barkin-Ladi freq 8 6 12 16 10 52
## p.col 7.69% 26.09% 22.64% 11.59% 19.61% 14.09%
##
## Bokkos freq 0 0 1 2 3 6
## p.col 0.00% 0.00% 1.89% 1.45% 5.88% 1.63%
##
## Mangu freq 2 1 4 11 3 21
## p.col 1.92% 4.35% 7.55% 7.97% 5.88% 5.69%
##
## Riyom freq 4 0 2 3 2 11
## p.col 3.85% 0.00% 3.77% 2.17% 3.92% 2.98%
##
## Sum freq 104 23 53 138 51 369
## p.col . . . . . .
##
Expected Frequencies and Residuals
PercTable(marital, row.vars = NULL, col.vars = NULL, justify = "right",
freq = TRUE, rfrq = "000", expected = TRUE, residuals = TRUE,
stdres = FALSE, margins = c(1,2), digits = NULL)
##
## Married Widowed Divorced Separated Single Sum
##
## Guma freq 56 7 20 49 18 150
## exp 42.276 9.350 21.545 56.098 20.732 .
## res 2.111 -0.768 -0.333 -0.948 -0.600 .
##
## Logo freq 34 9 14 57 15 129
## exp 36.358 8.041 18.528 48.244 17.829 .
## res -0.391 0.338 -1.052 1.261 -0.670 .
##
## Barkin-Ladi freq 8 6 12 16 10 52
## exp 14.656 3.241 7.469 19.447 7.187 .
## res -1.739 1.532 1.658 -0.782 1.049 .
##
## Bokkos freq 0 0 1 2 3 6
## exp 1.691 0.374 0.862 2.244 0.829 .
## res -1.300 -0.612 0.149 -0.163 2.384 .
##
## Mangu freq 2 1 4 11 3 21
## exp 5.919 1.309 3.016 7.854 2.902 .
## res -1.611 -0.270 0.566 1.123 0.057 .
##
## Riyom freq 4 0 2 3 2 11
## exp 3.100 0.686 1.580 4.114 1.520 .
## res 0.511 -0.828 0.334 -0.549 0.389 .
##
## Sum freq 104 23 53 138 51 369
## exp . . . . . .
## res . . . . . .
##
Chi-square Test
chisq.test(marital, correct=FALSE)
## Warning in chisq.test(marital, correct = FALSE): Chi-squared approximation may
## be incorrect
##
## Pearson's Chi-squared test
##
## data: marital
## X-squared = 33.218, df = 20, p-value = 0.03193
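A p-value below 0.05 says that marital status and LGA appear to be associated, but not how strong the association is. One commonly used effect size is Cramér's V, sqrt(X-squared / (n * min(r - 1, c - 1))). The sketch below computes it by hand as an illustrative addition, assuming the marital table defined above (rebuilt here). The small-expected-count warning still applies, so the result should be read with caution.

```r
# Rebuild the LGA-by-marital-status table (counts copied from the table above)
lga <- c("Guma", "Logo", "Barkin-Ladi", "Bokkos", "Mangu", "Riyom")
marital <- as.table(matrix(c(56, 34, 8, 0, 2, 4,
                             7, 9, 6, 0, 1, 0,
                             20, 14, 12, 1, 4, 2,
                             49, 57, 16, 2, 11, 3,
                             18, 15, 10, 3, 3, 2),
                           nrow = 6, ncol = 5,
                           dimnames = list(lga, c("Married", "Widowed", "Divorced",
                                                  "Separated", "Single"))))

# Pearson chi-square statistic (warning about small expected counts suppressed)
test <- suppressWarnings(chisq.test(marital, correct = FALSE))

n <- sum(marital)            # total number of respondents
k <- min(dim(marital)) - 1   # min(rows - 1, columns - 1)

# Cramer's V ranges from 0 (no association) to 1 (perfect association)
cramers_v <- sqrt(unname(test$statistic) / (n * k))
print(round(cramers_v, 3))
```

A value close to 0 indicates a weak association even when the test itself is statistically significant.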
Create contingency table for Educational Status
# matrix() fills column by column: each line below is one education level across the six LGAs
educational = as.table(matrix(c(134, 116, 48, 6, 15, 9,
                                10, 9, 4, 0, 3, 0,
                                2, 1, 0, 0, 0, 1,
                                4, 3, 0, 0, 3, 1,
                                0, 0, 0, 0, 0, 0),
                              ncol = 5, nrow = 6,
                              dimnames = list(c("Guma", "Logo", "Barkin-Ladi", "Bokkos", "Mangu", "Riyom"),
                                              c("No Formal Education", "Primary Education", "Secondary Education",
                                                "Adult/Vocational Education", "Tertiary Education"))))
print(educational)
## No Formal Education Primary Education Secondary Education
## Guma 134 10 2
## Logo 116 9 1
## Barkin-Ladi 48 4 0
## Bokkos 6 0 0
## Mangu 15 3 0
## Riyom 9 0 1
## Adult/Vocational Education Tertiary Education
## Guma 4 0
## Logo 3 0
## Barkin-Ladi 0 0
## Bokkos 0 0
## Mangu 3 0
## Riyom 1 0
educational_bar=melt(educational, varnames=c("LGA","Educational_Status"))
print(educational_bar)
## LGA Educational_Status value
## 1 Guma No Formal Education 134
## 2 Logo No Formal Education 116
## 3 Barkin-Ladi No Formal Education 48
## 4 Bokkos No Formal Education 6
## 5 Mangu No Formal Education 15
## 6 Riyom No Formal Education 9
## 7 Guma Primary Education 10
## 8 Logo Primary Education 9
## 9 Barkin-Ladi Primary Education 4
## 10 Bokkos Primary Education 0
## 11 Mangu Primary Education 3
## 12 Riyom Primary Education 0
## 13 Guma Secondary Education 2
## 14 Logo Secondary Education 1
## 15 Barkin-Ladi Secondary Education 0
## 16 Bokkos Secondary Education 0
## 17 Mangu Secondary Education 0
## 18 Riyom Secondary Education 1
## 19 Guma Adult/Vocational Education 4
## 20 Logo Adult/Vocational Education 3
## 21 Barkin-Ladi Adult/Vocational Education 0
## 22 Bokkos Adult/Vocational Education 0
## 23 Mangu Adult/Vocational Education 3
## 24 Riyom Adult/Vocational Education 1
## 25 Guma Tertiary Education 0
## 26 Logo Tertiary Education 0
## 27 Barkin-Ladi Tertiary Education 0
## 28 Bokkos Tertiary Education 0
## 29 Mangu Tertiary Education 0
## 30 Riyom Tertiary Education 0
ggplot(educational_bar %>% group_by(LGA) %>% mutate(Percentage = round(value/sum(value), 4)),
       aes(x = LGA, y = Percentage, fill = Educational_Status)) +
  geom_col() +
  geom_text(aes(label = paste0(Percentage*100, "%")),
            position = position_stack(vjust = 0.5))
Row Percentages
PercTable(educational, row.vars = NULL, col.vars = NULL, justify = "right",
freq = TRUE, rfrq = "010", expected = FALSE, residuals = FALSE,
stdres = FALSE, margins = c(1,2), digits = 2)
##
## No Formal Education Primary Education
##
## Guma freq 134 10
## p.row 89.33% 6.67%
##
## Logo freq 116 9
## p.row 89.92% 6.98%
##
## Barkin-Ladi freq 48 4
## p.row 92.31% 7.69%
##
## Bokkos freq 6 0
## p.row 100.00% 0.00%
##
## Mangu freq 15 3
## p.row 71.43% 14.29%
##
## Riyom freq 9 0
## p.row 81.82% 0.00%
##
## Sum freq 328 26
## p.row 88.89% 7.05%
##
##
## Secondary Education Adult/Vocational Education
##
## Guma freq 2 4
## p.row 1.33% 2.67%
##
## Logo freq 1 3
## p.row 0.78% 2.33%
##
## Barkin-Ladi freq 0 0
## p.row 0.00% 0.00%
##
## Bokkos freq 0 0
## p.row 0.00% 0.00%
##
## Mangu freq 0 3
## p.row 0.00% 14.29%
##
## Riyom freq 1 1
## p.row 9.09% 9.09%
##
## Sum freq 4 11
## p.row 1.08% 2.98%
##
##
## Tertiary Education Sum
##
## Guma freq 0 150
## p.row 0.00% .
##
## Logo freq 0 129
## p.row 0.00% .
##
## Barkin-Ladi freq 0 52
## p.row 0.00% .
##
## Bokkos freq 0 6
## p.row 0.00% .
##
## Mangu freq 0 21
## p.row 0.00% .
##
## Riyom freq 0 11
## p.row 0.00% .
##
## Sum freq 0 369
## p.row 0.00% .
##
Column Percentages
PercTable(educational, row.vars = NULL, col.vars = NULL, justify = "right",
freq = TRUE, rfrq = "001", expected = FALSE, residuals = FALSE,
stdres = FALSE, margins = c(1,2), digits = 2)
##
## No Formal Education Primary Education
##
## Guma freq 134 10
## p.col 40.85% 38.46%
##
## Logo freq 116 9
## p.col 35.37% 34.62%
##
## Barkin-Ladi freq 48 4
## p.col 14.63% 15.38%
##
## Bokkos freq 6 0
## p.col 1.83% 0.00%
##
## Mangu freq 15 3
## p.col 4.57% 11.54%
##
## Riyom freq 9 0
## p.col 2.74% 0.00%
##
## Sum freq 328 26
## p.col . .
##
##
## Secondary Education Adult/Vocational Education
##
## Guma freq 2 4
## p.col 50.00% 36.36%
##
## Logo freq 1 3
## p.col 25.00% 27.27%
##
## Barkin-Ladi freq 0 0
## p.col 0.00% 0.00%
##
## Bokkos freq 0 0
## p.col 0.00% 0.00%
##
## Mangu freq 0 3
## p.col 0.00% 27.27%
##
## Riyom freq 1 1
## p.col 25.00% 9.09%
##
## Sum freq 4 11
## p.col . .
##
##
## Tertiary Education Sum
##
## Guma freq 0 150
## p.col NA 40.65%
##
## Logo freq 0 129
## p.col NA 34.96%
##
## Barkin-Ladi freq 0 52
## p.col NA 14.09%
##
## Bokkos freq 0 6
## p.col NA 1.63%
##
## Mangu freq 0 21
## p.col NA 5.69%
##
## Riyom freq 0 11
## p.col NA 2.98%
##
## Sum freq 0 369
## p.col . .
##
Expected Frequencies and Residuals
PercTable(educational, row.vars = NULL, col.vars = NULL, justify = "right",
freq = TRUE, rfrq = "000", expected = TRUE, residuals = TRUE,
stdres = FALSE, margins = c(1,2), digits = NULL)
##
## No Formal Education Primary Education
##
## Guma freq 134 10
## exp 133.333 10.569
## res 0.058 -0.175
##
## Logo freq 116 9
## exp 114.667 9.089
## res 0.125 -0.030
##
## Barkin-Ladi freq 48 4
## exp 46.222 3.664
## res 0.261 0.176
##
## Bokkos freq 6 0
## exp 5.333 0.423
## res 0.289 -0.650
##
## Mangu freq 15 3
## exp 18.667 1.480
## res -0.849 1.250
##
## Riyom freq 9 0
## exp 9.778 0.775
## res -0.249 -0.880
##
## Sum freq 328 26
## exp . .
## res . .
##
##
## Secondary Education Adult/Vocational Education
##
## Guma freq 2 4
## exp 1.626 4.472
## res 0.293 -0.223
##
## Logo freq 1 3
## exp 1.398 3.846
## res -0.337 -0.431
##
## Barkin-Ladi freq 0 0
## exp 0.564 1.550
## res -0.751 -1.245
##
## Bokkos freq 0 0
## exp 0.065 0.179
## res -0.255 -0.423
##
## Mangu freq 0 3
## exp 0.228 0.626
## res -0.477 3.000
##
## Riyom freq 1 1
## exp 0.119 0.328
## res 2.551 1.174
##
## Sum freq 4 11
## exp . .
## res . .
##
##
## Tertiary Education Sum
##
## Guma freq 0 150
## exp 0.000 .
## res NA .
##
## Logo freq 0 129
## exp 0.000 .
## res NA .
##
## Barkin-Ladi freq 0 52
## exp 0.000 .
## res NA .
##
## Bokkos freq 0 6
## exp 0.000 .
## res NA .
##
## Mangu freq 0 21
## exp 0.000 .
## res NA .
##
## Riyom freq 0 11
## exp 0.000 .
## res NA .
##
## Sum freq 0 369
## exp . .
## res . .
##
Chi-square Test
chisq.test(educational, correct=FALSE)
## Warning in chisq.test(educational, correct = FALSE): Chi-squared approximation
## may be incorrect
##
## Pearson's Chi-squared test
##
## data: educational
## X-squared = NaN, df = 20, p-value = NA
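The NaN statistic arises because the "Tertiary Education" column contains only zeros: its expected counts are 0, and dividing by them in the Pearson formula is undefined. One possible remedy, sketched here as an illustration rather than as the original author's procedure, is to drop empty columns before testing. The name `educational_nz` is hypothetical.

```r
# Rebuild the LGA-by-education table (counts copied from the table above)
lga <- c("Guma", "Logo", "Barkin-Ladi", "Bokkos", "Mangu", "Riyom")
educational <- as.table(matrix(c(134, 116, 48, 6, 15, 9,
                                 10, 9, 4, 0, 3, 0,
                                 2, 1, 0, 0, 0, 1,
                                 4, 3, 0, 0, 3, 1,
                                 0, 0, 0, 0, 0, 0),
                               nrow = 6, ncol = 5,
                               dimnames = list(lga, c("No Formal Education",
                                                      "Primary Education",
                                                      "Secondary Education",
                                                      "Adult/Vocational Education",
                                                      "Tertiary Education"))))

# Keep only the columns with at least one respondent
educational_nz <- educational[, colSums(educational) > 0]

# Re-run the test; the small-expected-count warning is suppressed here,
# but it still applies to this sparse table
test <- suppressWarnings(chisq.test(educational_nz, correct = FALSE))
print(test)
```

With the empty column removed, the test has (6 - 1) * (4 - 1) = 15 degrees of freedom and returns a finite statistic. Many cells still have very small expected counts, so a simulated p-value (simulate.p.value = TRUE) may be a safer choice.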
Best wishes from:
Timothy A. OGUNLEYE
Department of Statistics, Faculty of Basic and Applied Sciences, Osun State University, Osogbo, Nigeria
Official Email: timothy.ogunleye@uniosun.edu.ng
Personal Email: thompsondx@gmail.com