1. Required Packages for this Analysis

# CODE TO LOAD PACKAGES
library(tidyverse)
library(AER)


2. Dataset

I have chosen to work with the Affairs data set from the AER package.


3. Description

Infidelity data, known as Fair’s Affairs. Cross-section data from a survey conducted by Psychology Today in 1969.


4. First 10 Rows

The first 10 rows of the dataset look like the following.

# CODE TO DISPLAY FIRST 10 ROWS
data("Affairs")
head(Affairs,10)
##    affairs gender age yearsmarried children religiousness education occupation
## 4        0   male  37        10.00       no             3        18          7
## 5        0 female  27         4.00       no             4        14          6
## 11       0 female  32        15.00      yes             1        12          1
## 16       0   male  57        15.00      yes             5        18          6
## 23       0   male  22         0.75       no             2        17          6
## 29       0 female  32         1.50       no             2        17          5
## 44       0 female  22         0.75       no             2        12          1
## 45       0   male  57        15.00      yes             2        14          4
## 47       0 female  32        15.00      yes             4        16          1
## 49       0   male  22         1.50       no             4        14          4
##    rating
## 4       4
## 5       4
## 11      4
## 16      5
## 23      3
## 29      5
## 44      3
## 45      4
## 47      2
## 49      5


5. Number of Observations and Variables

# CODE TO DISPLAY NUMBER OF OBSERVATIONS
nrow(Affairs)
## [1] 601
# CODE TO DISPLAY NUMBER OF VARIABLES
ncol(Affairs)
## [1] 9


6. View of Data Types

Below is a visual of the data type of each variable in my dataset.

# CODE TO GENERATE VIEW OF DATA TYPES
library(visdat)
vis_dat(Affairs)
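
Because the plot itself is not reproduced in text form, the same information can also be listed as a table. The following is a minimal sketch using base R's `sapply()` and `class()`; the name `type_table` is my own illustrative choice and is not part of the original analysis.

```r
# CODE TO LIST THE CLASS OF EACH COLUMN (text alternative to vis_dat)
library(AER)
data("Affairs")

# type_table is an illustrative name: one class label per column
type_table <- sapply(Affairs, class)
print(type_table)

# Count how many columns fall into each class
print(table(type_table))
```

This gives one class label per variable, which makes it easy to see which columns are factors and which are numeric before choosing summary statistics or plot types.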


Numeric Variables

7. Basic Statistics

Below are the basic descriptive statistics of the age variable.

# CODE TO GENERATE BASIC DESCRIPTIVE STATISTICS

mean(Affairs$age)
## [1] 32.48752
sd(Affairs$age)
## [1] 9.288762
min(Affairs$age)
## [1] 17.5
max(Affairs$age)
## [1] 57
summary(Affairs$age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   17.50   27.00   32.00   32.49   37.00   57.00
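
The separate calls above could also be collected into a single table. This is a sketch using `summarise()` from dplyr (part of the tidyverse); the name `age_stats` and the column names are assumptions of mine, chosen only for illustration.

```r
# CODE TO COLLECT THE SAME STATISTICS IN ONE TABLE
library(dplyr)
library(AER)
data("Affairs")

# age_stats and its column names are illustrative, not from the original analysis
age_stats <- Affairs %>%
  summarise(
    mean_age   = mean(age),
    sd_age     = sd(age),
    min_age    = min(age),
    median_age = median(age),
    max_age    = max(age)
  )
print(age_stats)
```

Keeping the statistics in one data frame makes it simpler to reuse them later, for example in a table or a plot annotation.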


8. Graph of one variable

# CODE TO GENERATE PLOT OF ONE VARIABLE WITH APPROPRIATE TITLES
ggplot(data = Affairs, mapping = aes(x = age)) +
  geom_bar() +
  labs(title = "Distribution of Respondent Age",
       subtitle = "Affairs data set (AER package)",
       x = "Age (years)", y = "Number of respondents")


9. Description of Graph

The distribution of age is right-skewed rather than normal. The graph shows the number of respondents at each age, and the count is highest between roughly ages 20 and 30.
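
The skew described above can also be checked numerically. The sketch below tabulates the counts per age value and computes a moment-based sample skewness by hand in base R; the names `age_counts` and `age_skew` are my own, and this particular skewness formula is one common choice among several.

```r
# CODE TO CHECK THE SHAPE OF THE AGE DISTRIBUTION
library(AER)
data("Affairs")

# Number of respondents at each recorded age value
age_counts <- table(Affairs$age)
print(age_counts)

# Moment-based sample skewness, computed by hand in base R;
# a positive value indicates a longer right tail
age <- Affairs$age
age_skew <- mean((age - mean(age))^3) / mean((age - mean(age))^2)^1.5
print(age_skew)
```

A positive skewness, together with a mean slightly above the median in the summary output, is consistent with a right-skewed distribution.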

Relationship Between Variables

Below is an analysis of years married with respect to age, grouped by whether the respondent has children.


10. Graph of two variables

# CODE TO GENERATE PLOT FROM TWO VARIABLES (in color)
# age is treated as a factor so that each recorded age gets its own box
ggplot(data = Affairs,
       mapping = aes(x = factor(age), y = yearsmarried, color = children)) +
  geom_boxplot() +
  labs(title = "Years Married by Age and Children",
       subtitle = "Data in context with age and years of marriage",
       x = "Age (years)", y = "Years married",
       caption = "Source: Psychology Today survey, 1969.")


11. Description of Graph

We can see from the graph that respondents aged around 40 to 50 are more likely to have children than those aged 20 to 30.
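
The relationship between age and children can also be checked with a cross-tabulation. This is a sketch using base R's `table()` and `prop.table()`; the names `children_by_age` and `children_share` are illustrative and not part of the original analysis.

```r
# CODE TO TABULATE CHILDREN BY AGE
library(AER)
data("Affairs")

# Rows are age values, columns are the children factor levels ("no", "yes")
children_by_age <- table(age = Affairs$age, children = Affairs$children)

# Row-wise proportions: share of each age group with and without children
children_share <- prop.table(children_by_age, margin = 1)
print(round(children_share, 2))
```

Reading down the "yes" column shows how the share of respondents with children changes from the youngest to the oldest age values.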