Dataset: Loksabha 2019 Candidates General Information. (https://www.kaggle.com/datasets/themlphdstudent/lok-sabha-election-candidate-list-2004-to-2019)
# Importing required libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggrepel)
library(ggthemes)
# Loading our dataset
data <- read.csv('C:\\Users\\bhush\\Downloads\\Coursework\\I 590 INTRO TO R\\datasets\\data_final\\LokSabha2019_xl.csv')
data_cp <- data
# Age
mean_age <- mean(data_cp$Age, na.rm = TRUE)
data_cp$Age[is.na(data_cp$Age)] <- mean_age
# Total Assets
ta <- mean(data_cp$Total.Assets, na.rm = TRUE)
data_cp$Total.Assets[is.na(data_cp$Total.Assets)] <- ta
# Liabilities
li <- mean(data_cp$Liabilities, na.rm = TRUE)
data_cp$Liabilities[is.na(data_cp$Liabilities)] <- li
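The three imputation steps above follow the same pattern, so they could be folded into one helper. The sketch below is only an illustration: `impute_mean` is a hypothetical function name, and the small `toy` data frame stands in for `data_cp`, reusing its column names.

```r
# Hypothetical helper (not part of the original analysis): replace NA values
# in each named numeric column with that column's mean.
impute_mean <- function(df, cols) {
  for (col in cols) {
    col_mean <- mean(df[[col]], na.rm = TRUE)
    df[[col]][is.na(df[[col]])] <- col_mean
  }
  df
}

# Toy data standing in for data_cp (values are made up for illustration)
toy <- data.frame(
  Age          = c(30, NA, 50),
  Total.Assets = c(100, 200, NA),
  Liabilities  = c(NA, 10, 30)
)

toy_imputed <- impute_mean(toy, c("Age", "Total.Assets", "Liabilities"))
print(toy_imputed)
```

On the real data the equivalent call would be `data_cp <- impute_mean(data_cp, c("Age", "Total.Assets", "Liabilities"))`. Note that mean imputation leaves the column mean unchanged but reduces its variance.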
cor(select(data_cp, Winner, Total.Assets, Liabilities, Age, Criminal.Cases))
##                     Winner Total.Assets Liabilities        Age Criminal.Cases
## Winner          1.00000000   0.13784068  0.12381314 0.13280653    -0.02815683
## Total.Assets    0.13784068   1.00000000  0.50682529 0.11138832     0.03012918
## Liabilities     0.12381314   0.50682529  1.00000000 0.06541089     0.02640872
## Age             0.13280653   0.11138832  0.06541089 1.00000000     0.02234815
## Criminal.Cases -0.02815683   0.03012918  0.02640872 0.02234815     1.00000000
cor(data_cp$Liabilities,data_cp$Total.Assets, method="spearman")
## [1] 0.5438489
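The matrix above uses `cor()`'s default Pearson method, while the last call uses Spearman's rank correlation. The two can differ when a relationship is monotonic but not linear. The sketch below shows this on made-up vectors `x` and `y`, which are assumptions for illustration and not columns from the dataset.

```r
# Toy vectors (not from the dataset): y increases with x, but exponentially.
x <- 1:20
y <- exp(x / 2)

# Pearson measures linear association; Spearman measures monotonic association
# by correlating the ranks of the two variables.
pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")

print(c(pearson = pearson, spearman = spearman))
```

Because `y` is a strictly increasing function of `x`, the ranks agree exactly and the Spearman coefficient is 1, while the Pearson coefficient is lower.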
qqnorm(residuals(model))
qqline(residuals(model))
The normal Q-Q plot suggests that the residuals from ‘model’ are approximately normally distributed.
However, the points drift away from the reference line at the right-hand end of the plot, which indicates a few outliers in the upper tail of the distribution.
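A visual reading of a Q-Q plot can be backed up by listing the observations with large standardized residuals. The sketch below is a rough illustration only: it fits a toy linear model (`toy_fit`) on simulated data with one injected outlier, since the fitted models are not defined in this section, and the cutoff of 2 is a common rule of thumb rather than a value taken from the analysis.

```r
set.seed(42)

# Simulated data with a known linear trend (assumption for illustration)
toy <- data.frame(x = 1:50)
toy$y <- 2 * toy$x + rnorm(50)
toy$y[50] <- toy$y[50] + 15   # inject one large positive outlier

toy_fit <- lm(y ~ x, data = toy)

# qqnorm() can return the plotted coordinates without drawing anything
qq <- qqnorm(residuals(toy_fit), plot.it = FALSE)

# Standardized residuals; flag those beyond the rule-of-thumb cutoff
std_res <- rstandard(toy_fit)
flagged <- which(abs(std_res) > 2)
print(flagged)
```

The injected observation should appear in `flagged`; on a real model, the flagged rows are the ones to inspect when the upper tail of the Q-Q plot leaves the reference line.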
qqnorm(residuals(model1))
qqline(residuals(model1))
plot(residuals(model, type = "deviance") ~ fitted(model))
plot(residuals(model1, type = "deviance") ~ fitted(model1))
The residuals are scattered randomly around the zero line, with no obvious pattern.
However, a few observations have large residuals and stand out as outliers.
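The observations behind those large residuals can be picked out programmatically. The sketch below assumes, purely for illustration, a logistic regression on simulated data (`toy_glm`); the original `model` and `model1` are not defined in this section, so their family and predictors are unknown. The cutoff of 2 on the absolute deviance residual is a rule of thumb, not a value from the analysis.

```r
set.seed(7)

# Simulated binary outcome (assumption: stands in for a win/lose response)
toy <- data.frame(x = rnorm(200))
toy$win <- rbinom(200, 1, plogis(-1 + 1.5 * toy$x))

toy_glm <- glm(win ~ x, data = toy, family = binomial)

# Deviance residuals against fitted probabilities, with a zero reference line
dev_res <- residuals(toy_glm, type = "deviance")
plot(fitted(toy_glm), dev_res,
     xlab = "Fitted values", ylab = "Deviance residuals")
abline(h = 0, lty = 2)

# Indices of observations whose deviance residual exceeds the cutoff
large <- which(abs(dev_res) > 2)
print(large)
```

Indexing the data frame with `large` (for example `toy[large, ]`) shows the rows that drive the outlying points in the plot.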