1 CLUSTER ANALYSIS

In data analytics there are two kinds of approaches: supervised and unsupervised. Clustering is an unsupervised method for finding subgroups of observations within a data set. When clustering, we want observations in the same group to show similar patterns and observations in different groups to be dissimilar.

When there is no response variable, an unsupervised method is suitable: it seeks relationships among the n observations without being trained on a response variable. Clustering thus allows us to identify homogeneous groups in the dataset and categorize them.

One of the simplest clustering methods is K-means, the most commonly used method for splitting a dataset into a set of k groups. A dataset with many variables and no response variable calls for an unsupervised approach. Cluster analysis is one such approach and is used, for example, for segmenting markets into groups of similar customers or patterns.

In summary, cluster analysis is a statistical technique that groups similar observations into clusters based on their characteristics. It divides the population or data points into groups such that points in the same group are more similar to each other than to points in other groups. The goal of clustering is to identify patterns or groups of similar objects within a data set of interest.
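As a minimal sketch of the K-means idea described above, the following uses base R's `kmeans()` on simulated data. The three "customer segments", the feature names `spend` and `visits`, and all values are hypothetical and exist only for illustration.

```r
# Minimal k-means sketch on simulated data (hypothetical, not the report's data).
set.seed(42)

# Three hypothetical customer segments, each centred at a different point.
centers <- matrix(c(0, 0,
                    5, 5,
                    0, 5), ncol = 2, byrow = TRUE)
segment <- rep(1:3, each = 50)

# 150 observations: segment centre plus Gaussian noise.
x <- centers[segment, ] + matrix(rnorm(300, sd = 0.5), ncol = 2)
colnames(x) <- c("spend", "visits")

# Standardise so both features contribute equally to Euclidean distance.
x_scaled <- scale(x)

# k = 3 clusters; nstart = 25 tries several random starts and keeps the best fit.
km <- kmeans(x_scaled, centers = 3, nstart = 25)

# Cross-tabulate the recovered clusters against the simulated segments.
print(table(cluster = km$cluster, segment = segment))
```

In practice k is not known in advance; a common heuristic is to compare `km$tot.withinss` across several values of k.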

Two research scenarios where cluster analysis may be used as the data analysis strategy.

Cluster analysis is widely used across industries, business functions, and research fields. To illustrate its usefulness in research, consider the following two examples.

1. Data is essential for brands and organizations that want to draw inferences about what their customers think. Cluster analysis is a critical component of data analysis in market research: it helps brands derive trends and identify groups of customers by demographics, purchase behavior, likes and dislikes, and more.

This market research method sorts information into smaller groups, which helps in understanding how different groups of people respond under comparable conditions. Organizations and academics may define clusters using different pre-defined criteria, but the underlying data analysis theme is consistent.

Furthermore, brands have long used cluster analysis to make sense of purchasing behavior and trends among their client base by applying demographic segmentation. Geographic location, gender, age, annual family income, and other criteria are commonly evaluated.

These parameters highlight how different customer groups make different purchasing decisions; as a result, large retailers use this information to decide how to promote to each audience. This also helps increase the return on spending while decreasing customer attrition.

2. Classes, or conceptually meaningful collections of items with similar properties, are fundamental to how people examine and describe the world. Indeed, humans are adept at grouping objects (clustering) and assigning specific objects to these groups (classification). Even very young children, for example, can quickly label the items in an image as buildings, automobiles, people, animals, plants, and so on. In the context of data interpretation, clusters are potential classes, and cluster analysis is the study of techniques for finding classes automatically. The following is an example from biology.

Biologists have spent many years developing a taxonomy (hierarchical classification) of all living things, which includes kingdoms, phyla, classes, orders, families, genera, and species. It is therefore not surprising that much of the early work in cluster analysis tried to develop a mathematical taxonomy capable of discovering such classification systems automatically. More recently, clustering has been used to examine the massive volumes of genetic data that are now available. For example, cluster analysis has been used to identify groups of genes with similar functions.

2 ONE WAY MANOVA

The one-way multivariate analysis of variance (one-way MANOVA) is used to determine whether there are any differences between independent groups on more than one continuous dependent variable. In this regard, it differs from a one-way ANOVA, which only measures one dependent variable. The following are some scenarios where we use a one-way MANOVA.

Two research scenarios where one way MANOVA may be used as the data analysis strategy.

1. You could use a one-way MANOVA to see if there were differences in perceptions of the attractiveness and intelligence of drug users in movies (i.e., the two dependent variables are “perceptions of attractiveness” and “perceptions of intelligence”, while the independent variable is “drug users in movies”, which has three independent groups: “non-user”, “experimenter”, and “regular user”). Alternatively, you could use a one-way MANOVA to see if there were differences in students’ short-term and long-term recall of facts based on four different lengths of lecture (i.e., the two dependent variables are “short-term memory recall” and “long-term memory recall”, and the independent variable is “lecture duration”, which has four independent groups: “30 minutes”, “60 minutes”, “90 minutes”, and “120 minutes”).

2. A researcher randomly allocates 33 individuals to one of three groups. Group 1 receives interactive dietary information from a website. Group 2 gets the same information from a nurse practitioner, and Group 3 gets it from a video recorded by the same nurse practitioner. To see whether the presentation types differ, the researcher looks at three separate ratings of the presentation: difficulty, usefulness, and importance. The researcher is particularly interested in whether the interactive website is preferable because it is the most cost-effective method of presenting information.

Additional Example

A clinical psychologist enrolls 100 persons with panic disorder in his study. For eight weeks, each participant receives one of four types of treatment. At the completion of treatment, each subject takes part in a structured interview during which the clinical psychologist rates them in three categories: physiological, emotional, and cognitive. The clinical psychologist wants to determine which sort of treatment best reduces panic disorder symptoms as measured by physiological, emotional, and cognitive measures.
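The dietary-information scenario above can be sketched with base R's `manova()`. The ratings below are simulated, and the equal split of 11 participants per group is an assumption (the scenario only says 33 individuals are allocated at random); nothing here is real study data.

```r
# One-way MANOVA sketch on simulated ratings (hypothetical data).
set.seed(1)

n_per_group <- 11  # assumption: 33 participants split equally into three groups
group <- factor(rep(c("website", "nurse", "video"), each = n_per_group))

# Three continuous dependent variables, one row per participant.
ratings <- data.frame(
  group      = group,
  difficulty = rnorm(33, mean = 5, sd = 1),
  usefulness = rnorm(33, mean = 6, sd = 1),
  importance = rnorm(33, mean = 6, sd = 1)
)

# Model all three ratings jointly as a function of presentation type.
fit <- manova(cbind(difficulty, usefulness, importance) ~ group, data = ratings)

# Multivariate test; Pillai's trace is the default and is relatively robust.
manova_table <- summary(fit, test = "Pillai")
print(manova_table)

# Follow-up univariate ANOVAs, one per dependent variable.
print(summary.aov(fit))
```

Because the simulated groups share the same means, a non-significant result is expected here; the point is the model specification, not the outcome.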

3 LINEAR DISCRIMINANT ANALYSIS

Linear Discriminant Analysis (LDA) is a dimensionality reduction technique: it reduces the number of dimensions (i.e., variables) in a dataset while retaining as much information as possible.

It finds the linear combinations of the original variables that provide the best possible separation between the groups. Below are some scenarios where LDA can be used.

Two research scenarios where linear discriminant analysis may be used as the data analysis strategy.

1. LDA can be used to identify customers. With LDA, we can select the features that characterize the demographic of shoppers who are more inclined to buy a specific product in a mall.

2. LDA is extremely useful in medicine for classifying a patient’s disease based on a variety of factors relating to the patient’s health and current medical treatment. It classifies the disease as mild, moderate, or severe based on these criteria. Doctors can then adjust the pace of treatment according to this classification.
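The disease-severity scenario can be sketched with `lda()` from the MASS package. The two predictors `marker_a` and `marker_b`, their distributions, and the group sizes are hypothetical and simulated purely for illustration.

```r
# LDA sketch on simulated patient data (hypothetical markers and values).
library(MASS)  # provides lda()

set.seed(7)
severity <- factor(rep(c("mild", "moderate", "severe"), each = 40),
                   levels = c("mild", "moderate", "severe"))
shift <- as.integer(severity) - 1  # 0, 1, 2 for mild, moderate, severe

patients <- data.frame(
  severity = severity,
  marker_a = rnorm(120, mean = 10 + 3 * shift, sd = 1),
  marker_b = rnorm(120, mean = 50 - 4 * shift, sd = 2)
)

# Fit the discriminant functions that best separate the three classes.
fit <- lda(severity ~ marker_a + marker_b, data = patients)

# With no new data supplied, predict() classifies the training observations.
pred <- predict(fit)

# Confusion table of actual versus predicted severity.
confusion <- table(actual = patients$severity, predicted = pred$class)
print(confusion)

# Coefficients of the linear discriminants (the linear combinations).
print(fit$scaling)
```

With three groups and two predictors there are at most two discriminant functions; `pred$x` holds each patient's scores on them, which is the reduced-dimension representation.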

4 FACTOR ANALYSIS

Factor analysis is a statistical method used to search for unobserved variables, called factors, from observed variables, called indicators. It uses the correlation structure among the observed variables to model a smaller number of unobserved, latent variables known as factors. Researchers use this method when subject-area knowledge suggests that latent factors cause observable variables to covary. Use factor analysis to identify the hidden variables.

Analysts often refer to the observed variables as indicators because they literally indicate information about the factor. Factor analysis treats these indicators as linear combinations of the factors in the analysis plus an error. The procedure assesses how much of the variance each factor explains within the indicators. The idea is that the latent factors create commonalities in some of the observed variables.

For example, socioeconomic status (SES) is a factor you can’t measure directly. However, you can assess occupation, income, and education levels. These variables all relate to socioeconomic status. People with a particular socioeconomic status tend to have similar values for the observable variables. If the factor (SES) has a strong relationship with these indicators, then it accounts for a large portion of the variance in the indicators. Hence, examples are as follows:

Two research scenarios where factor analysis may be used as the data analysis strategy.

1. Factor analysis is used to uncover “factors” that explain a wide range of test results. For example, intelligence studies found that people who perform well on a test of verbal ability also perform well on other tests that require verbal skills. Researchers explained this by using factor analysis to identify one factor, commonly referred to as verbal intelligence, which represents a person’s ability to solve problems involving verbal skills.

Factor analysis in psychology is most often associated with intelligence research. However, it also has been used to find factors in a broad range of domains such as personality, attitudes, beliefs, etc. It is linked to psychometrics, as it can assess the validity of an instrument by finding if the instrument indeed measures the postulated factors.

Hence, to find “factors” that can explain a range of test outcomes, factor analysis is performed. For instance, intelligence studies have shown that persons who perform well on verbal aptitude exams also perform well on other verbal aptitude tests. Researchers used factor analysis to identify one factor, commonly referred to as verbal intelligence, which reflects an individual’s capacity to solve challenges involving verbal skills.

2. Data mining and machine learning go hand in hand, and factor analysis can serve as a machine learning tool. It is a popular unsupervised technique for dimensionality reduction: it reduces the number of variables in a dataset to a smaller set of underlying factors. Combined with other machine learning methods, factor analysis can support data mining and speed up data exploration.

Factor analysis is also valuable in data mining in its own right. It simplifies data mining by grouping variables that are correlated. Uncovering links and correlations among variables has long been a challenge for data scientists, and this statistical technique helps address it.

Factor analysis can also aid marketing, which promotes products, services, and brands. Businesses use it to establish the links among aspects of a marketing campaign in order to improve long-term performance. It can also link customer satisfaction to post-campaign feedback to quantify campaign efficacy and audience impact. Thus, factor analysis may improve marketing input and customer satisfaction, increasing sales.
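The test-score example in scenario 1 can be sketched with base R's `factanal()`. The six test names, the second latent factor (`numeric_ability`), the loadings, and the sample size are all hypothetical; the data are simulated so that two latent factors drive six observed indicators.

```r
# Exploratory factor analysis sketch on simulated test scores (hypothetical).
set.seed(123)
n <- 500

# Two unobserved (latent) abilities.
verbal          <- rnorm(n)
numeric_ability <- rnorm(n)

# Each observed score is a linear function of one factor plus error.
noise <- function() rnorm(n, sd = 0.6)
scores <- data.frame(
  vocabulary = 0.80 * verbal + noise(),
  reading    = 0.70 * verbal + noise(),
  analogies  = 0.75 * verbal + noise(),
  arithmetic = 0.80 * numeric_ability + noise(),
  algebra    = 0.70 * numeric_ability + noise(),
  geometry   = 0.75 * numeric_ability + noise()
)

# Maximum-likelihood factor analysis with two factors and varimax rotation.
fa <- factanal(scores, factors = 2, rotation = "varimax")

# Loadings: how strongly each indicator relates to each factor.
print(fa$loadings, cutoff = 0.3)

# Uniquenesses: share of each indicator's variance NOT explained by the factors.
print(fa$uniquenesses)
```

If the simulation behaves as intended, the three verbal tests should load mainly on one factor and the three quantitative tests on the other.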

5 ON ‘canondat.sav’ DATA

‘canondat.sav’ data is not found

6 FROM THE VALUEGENESIS DATA (valuegendata)

a. Screen the data for violation of assumptions. Perform appropriate transformations, where appropriate. Be sure to explain the bases for your transformations.

library(mvtnorm)
library(QuantPsyc)
library(energy)
library(readxl)
library(rmarkdown)
valuegendata <- read_excel("C:/Users/63966/Downloads/valuegendata.xlsx")
valuegendata
## # A tibble: 2,315 × 37
##    ComJesus ImpRe…¹ Gender SchType Race  Birth…² Birth…³ Birth…⁴ parstat FathSDA
##    <chr>    <chr>   <chr>  <chr>   <chr> <chr>   <chr>   <chr>   <chr>   <chr>  
##  1 1        2       1      3       5     1       1       0       2       2      
##  2 2        4       1      2       5     0       0       0       2       3      
##  3 5        2       2      2       2     0       0       0       4       2      
##  4 5        3       2      2       4     1       0       0       1       3      
##  5 1        5       1      3       6     1       1       1       3       2      
##  6 4        1       1      2       6     1       1       1       2       2      
##  7 4        1       2      2       4     1       0       1       2       2      
##  8 2        3       2      3       3     0       0       0       1       3      
##  9 5        4       1      2       5     0       0       0       1       3      
## 10 2        1       1      2       2     1       0       1       1       2      
## # … with 2,305 more rows, 27 more variables: MothSDA <chr>, GradeLvel <chr>,
## #   baptism <chr>, howold <chr>, spiritualmaturity <dbl>, PerDev <dbl>,
## #   Grace <dbl>, Works <dbl>, CongClimate <dbl>, LV_Altruim <dbl>,
## #   LV_Adventism <dbl>, LV_Materialism <dbl>, Den_Loyal <dbl>,
## #   AdventStd_Diss <dbl>, AtSchl <dbl>, FamClim <dbl>, SchClimate <dbl>,
## #   QualRelEd <dbl>, SpiritInfluence <dbl>, Rate_Church <dbl>,
## #   Rate_School <dbl>, IntRelig <dbl>, ExtRelig <dbl>, FrndRel <dbl>, …

Null Hypothesis: The variables follow a multivariate normal distribution.

Alternative Hypothesis: The variables do not follow a multivariate normal distribution.

Running the Henze-Zirkler test with the mvn function in the MVN package, using the code mvn(data = valuegendata, mvnTest = "hz"), we see that the variables follow a multivariate normal distribution.
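As a complementary screening step, multivariate outliers can be flagged with Mahalanobis distances in base R. The dataset's `MAH_1` and `ProbMah` columns look like a stored distance and its probability, but the source does not define them, so that reading is an assumption. The sketch below uses simulated stand-in data; the column names are borrowed from the dataset only for illustration, and the 0.001 cutoff is a common convention rather than something the source specifies.

```r
# Mahalanobis-distance outlier screen on simulated data (hypothetical values).
set.seed(2023)
scales <- data.frame(
  Grace  = rnorm(200, mean = 22, sd = 6),
  Works  = rnorm(200, mean = 10, sd = 3),
  PerDev = rnorm(200, mean = 17, sd = 5)
)

# Keep complete rows only; the distance is undefined when values are missing.
x <- as.matrix(scales[complete.cases(scales), ])

# Squared Mahalanobis distance of each row from the multivariate centroid.
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Under multivariate normality, d2 is approximately chi-square with
# df equal to the number of variables; small p-values flag unusual cases.
p <- pchisq(d2, df = ncol(x), lower.tail = FALSE)

# Conventional (assumed) cutoff for multivariate outliers.
flagged <- which(p < 0.001)
print(length(flagged))
```

This screen identifies unusual cases; it does not replace a formal test of multivariate normality such as Henze-Zirkler.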

b. Run canonical correlation analysis to determine the nature of the relationships between the two sets of variables.

require(ggplot2)
require(GGally)
require(CCA)
colnames(valuegendata) <- c("comJesus", "ImpReligion", "Gender", "SchType", "Race", "Birth_ME", "Birth_Mother", "Birth_Dad", "parstat", "FathSDA", "MothSDA", "GradeLvel", "baptism", "howold", "spiritualmaturity", "PerDev", "Grace", "Works", "CongClimate", "LV_Altruim", "LV_Adventism", "LV_Materialism", "Den_Loyal", "AdventStd_Diss", "AtSchl", "FamClim", "SchClimate", "QualRelEd", "SpiritInfluence", "Rate_Church", "Rate_School", "IntRelig", "ExtRelig", "FrndRel", "AdventOrtho", "MAH_1", "ProbMah")
summary(valuegendata)
##    comJesus         ImpReligion           Gender            SchType         
##  Length:2315        Length:2315        Length:2315        Length:2315       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##      Race             Birth_ME         Birth_Mother        Birth_Dad        
##  Length:2315        Length:2315        Length:2315        Length:2315       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    parstat            FathSDA            MothSDA           GradeLvel        
##  Length:2315        Length:2315        Length:2315        Length:2315       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    baptism             howold          spiritualmaturity     PerDev     
##  Length:2315        Length:2315        Min.   :12.00     Min.   : 4.00  
##  Class :character   Class :character   1st Qu.:35.00     1st Qu.:13.00  
##  Mode  :character   Mode  :character   Median :41.00     Median :17.00  
##                                        Mean   :40.78     Mean   :17.08  
##                                        3rd Qu.:47.00     3rd Qu.:21.00  
##                                        Max.   :60.00     Max.   :32.00  
##                                                                         
##      Grace           Works        CongClimate     LV_Altruim     LV_Adventism  
##  Min.   : 7.00   Min.   : 3.00   Min.   :11.0   Min.   : 4.00   Min.   :2.000  
##  1st Qu.:17.00   1st Qu.: 7.00   1st Qu.:41.0   1st Qu.:11.00   1st Qu.:5.000  
##  Median :22.00   Median :10.00   Median :49.0   Median :12.93   Median :6.000  
##  Mean   :21.79   Mean   :10.12   Mean   :47.8   Mean   :12.35   Mean   :5.971  
##  3rd Qu.:27.00   3rd Qu.:14.00   3rd Qu.:56.0   3rd Qu.:14.00   3rd Qu.:7.000  
##  Max.   :35.00   Max.   :15.00   Max.   :66.0   Max.   :16.00   Max.   :8.000  
##                                                                                
##  LV_Materialism    Den_Loyal     AdventStd_Diss      AtSchl     
##  Min.   :2.000   Min.   : 5.00   Min.   : 8.00   Min.   : 4.00  
##  1st Qu.:3.000   1st Qu.:16.00   1st Qu.:27.00   1st Qu.:15.00  
##  Median :4.000   Median :19.00   Median :35.00   Median :18.00  
##  Mean   :4.508   Mean   :18.21   Mean   :34.47   Mean   :16.94  
##  3rd Qu.:6.000   3rd Qu.:21.00   3rd Qu.:42.00   3rd Qu.:20.00  
##  Max.   :8.000   Max.   :23.00   Max.   :64.00   Max.   :24.00  
##                                                                 
##     FamClim        SchClimate      QualRelEd     SpiritInfluence 
##  Min.   : 5.00   Min.   : 9.00   Min.   : 8.00   Min.   : 27.00  
##  1st Qu.:23.00   1st Qu.:22.00   1st Qu.:29.00   1st Qu.: 77.00  
##  Median :26.00   Median :25.00   Median :34.00   Median : 89.00  
##  Mean   :25.03   Mean   :24.52   Mean   :33.71   Mean   : 88.22  
##  3rd Qu.:29.00   3rd Qu.:28.00   3rd Qu.:40.00   3rd Qu.:101.00  
##  Max.   :30.00   Max.   :36.00   Max.   :48.00   Max.   :135.00  
##                                                                  
##   Rate_Church     Rate_School       IntRelig        ExtRelig    
##  Min.   :10.00   Min.   :10.00   Min.   : 7.00   Min.   : 7.00  
##  1st Qu.:40.00   1st Qu.:40.00   1st Qu.:23.00   1st Qu.:15.00  
##  Median :49.00   Median :45.00   Median :26.00   Median :18.00  
##  Mean   :49.04   Mean   :45.51   Mean   :25.78   Mean   :18.29  
##  3rd Qu.:57.00   3rd Qu.:53.38   3rd Qu.:29.00   3rd Qu.:21.00  
##  Max.   :70.00   Max.   :70.00   Max.   :35.00   Max.   :35.00  
##                                                                 
##     FrndRel       AdventOrtho        MAH_1            ProbMah        
##  Min.   : 4.00   Min.   : 25.0   Min.   : 0.7353   Min.   :0.000003  
##  1st Qu.:10.00   1st Qu.:128.0   1st Qu.: 4.8931   1st Qu.:0.257340  
##  Median :11.37   Median :138.0   Median : 7.1437   Median :0.521208  
##  Mean   :11.37   Mean   :134.5   Mean   : 7.9965   Mean   :0.509567  
##  3rd Qu.:13.00   3rd Qu.:144.0   3rd Qu.:10.1107   3rd Qu.:0.768939  
##  Max.   :16.00   Max.   :150.0   Max.   :40.3615   Max.   :0.999432  
##                                  NA's   :16        NA's   :16
xtabs(~comJesus, data = valuegendata)
## comJesus
##    1    2    3    4    5 
##   21  196  284 1026  772
psych <- valuegendata[, 1:18]
acad <- valuegendata[, 19:37]
ggpairs(psych)
## Warning: Removed 16 rows containing non-finite values (`stat_g_gally_count()`).
## Warning: Removed 16 rows containing missing values (`stat_boxplot()`).
## Warning: Removed 7 rows containing non-finite values (`stat_g_gally_count()`).
## Warning: Removed 7 rows containing missing values (`stat_boxplot()`).
## Warning: Removed 31 rows containing non-finite values (`stat_g_gally_count()`).
## Warning: Removed 31 rows containing missing values (`stat_boxplot()`).
## Warning: Removed 10 rows containing non-finite values (`stat_g_gally_count()`).
## Warning: Removed 10 rows containing missing values (`stat_boxplot()`).
## Warning: Removed 12 rows containing non-finite values (`stat_g_gally_count()`).
## Warning: Removed 12 rows containing missing values (`stat_boxplot()`).
## Warning: Removed 9 rows containing non-finite values (`stat_g_gally_count()`).
## Warning: Removed 9 rows containing missing values (`stat_boxplot()`).
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggpairs(acad)
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 16 rows containing missing values
## Warning: Removed 16 rows containing missing values (`geom_point()`).
## Warning: Removed 16 rows containing non-finite values (`stat_density()`).

c. Report your results. At the very least, you should present the following: the research problem under investigation; the hypothesis/hypotheses being tested; descriptive statistics (means, standard deviations, inter-correlations within and between sets of variables); and the results of the canonical correlation analysis. Be sure to include summary tables.

Research problem under investigation: Is there an association between the variables in SET 1 and the variables in SET 2?

*Null Hypothesis: Our two sets of variables are not linearly related.

*Alternative Hypothesis: Our two sets of variables are linearly related.

*RESULTS: The two sets of variables are not linearly related.
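
The items requested in part (c) can be produced along the following lines. This is only a minimal sketch, not the analysis actually run for this report: the built-in `iris` data stands in for `acad`, and the column choices for `set1` and `set2` are placeholders for the real SET 1 and SET 2 variables. It uses base R only (`cancor()` from the `stats` package) and tests the null hypothesis with Wilks' lambda and Bartlett's chi-square approximation, which is one common choice among several.

```r
# Stand-in data (assumption): iris is used only so the sketch runs.
# Replace `dat`, `set1`, and `set2` with the real SET 1 / SET 2 columns of acad.
dat  <- iris[, 1:4]
dat  <- dat[complete.cases(dat), ]   # drop rows with missing values up front
set1 <- as.matrix(dat[, c("Sepal.Length", "Sepal.Width")])   # placeholder SET 1
set2 <- as.matrix(dat[, c("Petal.Length", "Petal.Width")])   # placeholder SET 2

# Descriptive statistics: means and standard deviations
desc <- data.frame(mean = sapply(dat, mean), sd = sapply(dat, sd))
print(round(desc, 3))

# Inter-correlations within each set and between the two sets
print(round(cor(set1), 3))
print(round(cor(set2), 3))
print(round(cor(set1, set2), 3))

# Canonical correlation analysis
cc <- cancor(set1, set2)
r  <- cc$cor                         # canonical correlations

# Wilks' lambda test that all canonical correlations are zero
# (Bartlett's chi-square approximation)
n <- nrow(dat)
p <- ncol(set1)
q <- ncol(set2)
wilks  <- prod(1 - r^2)
chi_sq <- -(n - 1 - (p + q + 1) / 2) * log(wilks)
df     <- p * q
p_val  <- pchisq(chi_sq, df, lower.tail = FALSE)

# Summary table for the report
print(data.frame(wilks = wilks, chi_sq = chi_sq, df = df, p_value = p_val))
```

A small p-value would lead to rejecting the null hypothesis that the two sets are not linearly related; a large one would not. Filtering with `complete.cases()` before plotting or modelling also avoids the repeated "Removed 16 rows containing missing values" warnings that `ggpairs()` prints.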