Quiz #1

The Credit dataset is in the ISLR package. Use the help system to get information about the dataset.

# Install tidyverse only if it is not already available
if (!require(tidyverse)) {
  install.packages("tidyverse")
}

## Loading required package: tidyverse

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.0     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.1     ✔ tibble    3.2.0
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

# Install ISLR only if it is not already available
if (!require(ISLR)) {
  install.packages("ISLR")
}

## Loading required package: ISLR

library("tidyverse")
library("ISLR")
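The prompt asks for the dataset's help page, which the loading code does not open by itself. A minimal sketch, assuming the ISLR package is already installed:

```r
library(ISLR)

# Open the documentation page for the Credit dataset
help("Credit", package = "ISLR")

# Equivalent shorthand once ISLR is attached:
# ?Credit
```

The help page describes each column, which is useful context before inspecting the structure.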

Display the structure of the dataset

str(Credit)

## 'data.frame':    400 obs. of  12 variables:
##  $ ID       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Income   : num  14.9 106 104.6 148.9 55.9 ...
##  $ Limit    : int  3606 6645 7075 9504 4897 8047 3388 7114 3300 6819 ...
##  $ Rating   : int  283 483 514 681 357 569 259 512 266 491 ...
##  $ Cards    : int  2 3 4 3 2 4 2 2 5 3 ...
##  $ Age      : int  34 82 71 36 68 77 37 87 66 41 ...
##  $ Education: int  11 15 11 11 16 10 12 9 13 19 ...
##  $ Gender   : Factor w/ 2 levels " Male","Female": 1 2 1 2 1 1 2 1 2 2 ...
##  $ Student  : Factor w/ 2 levels "No","Yes": 1 2 1 1 1 1 1 1 1 2 ...
##  $ Married  : Factor w/ 2 levels "No","Yes": 2 2 1 1 2 1 1 1 1 2 ...
##  $ Ethnicity: Factor w/ 3 levels "African American",..: 3 2 2 2 3 3 1 2 3 1 ...
##  $ Balance  : int  333 903 580 964 331 1151 203 872 279 1350 ...

How many observations are there? How many categorical variables?

data("Credit",package="ISLR")

Osservazioni<-nrow(Credit)
Osservazioni

## [1] 400

VariabiliCategoriche<-sum(sapply(Credit,is.factor))
VariabiliCategoriche

## [1] 4
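To check which columns make up that count, the same logical vector can be used to index the column names. This is a small sketch rather than part of the original answer; `is_categorical` and `categorical_names` are names introduced here for illustration.

```r
library(ISLR)

# TRUE for every column stored as a factor
is_categorical <- sapply(Credit, is.factor)

# Names of the categorical columns
categorical_names <- names(Credit)[is_categorical]
categorical_names
```

Going by the `str()` output above, these should be Gender, Student, Married, and Ethnicity.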

Compute descriptive statistics at a glance for all variables in the dataset

summary(Credit)

##        ID            Income           Limit           Rating     
##  Min.   :  1.0   Min.   : 10.35   Min.   :  855   Min.   : 93.0  
##  1st Qu.:100.8   1st Qu.: 21.01   1st Qu.: 3088   1st Qu.:247.2  
##  Median :200.5   Median : 33.12   Median : 4622   Median :344.0  
##  Mean   :200.5   Mean   : 45.22   Mean   : 4736   Mean   :354.9  
##  3rd Qu.:300.2   3rd Qu.: 57.47   3rd Qu.: 5873   3rd Qu.:437.2  
##  Max.   :400.0   Max.   :186.63   Max.   :13913   Max.   :982.0  
##      Cards            Age          Education        Gender    Student  
##  Min.   :1.000   Min.   :23.00   Min.   : 5.00    Male :193   No :360  
##  1st Qu.:2.000   1st Qu.:41.75   1st Qu.:11.00   Female:207   Yes: 40  
##  Median :3.000   Median :56.00   Median :14.00                         
##  Mean   :2.958   Mean   :55.67   Mean   :13.45                         
##  3rd Qu.:4.000   3rd Qu.:70.00   3rd Qu.:16.00                         
##  Max.   :9.000   Max.   :98.00   Max.   :20.00                         
##  Married              Ethnicity      Balance       
##  No :155   African American: 99   Min.   :   0.00  
##  Yes:245   Asian           :102   1st Qu.:  68.75  
##            Caucasian       :199   Median : 459.50  
##                                   Mean   : 520.01  
##                                   3rd Qu.: 863.00  
##                                   Max.   :1999.00

Bonus: install the skimr package, load it, and try the skim() function on the dataset

# Install skimr only if it is not already available
if (!require("skimr")) {
  install.packages("skimr")
}

## Loading required package: skimr

library("skimr")

skim(Credit)

Data summary
Name	Credit
Number of rows	400
Number of columns	12
_______________________
Column type frequency:
factor	4
numeric	8
________________________
Group variables	None

Variable type: factor

skim_variable	complete_rate	ordered	n_unique	top_counts
Gender	1	FALSE	2	Fem: 207, Ma: 193
Student	1	FALSE	2	No: 360, Yes: 40
Married	1	FALSE	2	Yes: 245, No: 155
Ethnicity	1	FALSE	3	Cau: 199, Asi: 102, Afr: 99

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
ID	1	200.50	115.61	1.00	100.75	200.50	300.25	400.00	▇▇▇▇▇
Income	1	45.22	35.24	10.35	21.01	33.12	57.47	186.63	▇▂▁▁▁
Limit	1	4735.60	2308.20	855.00	3088.00	4622.50	5872.75	13913.00	▆▇▃▁▁
Rating	1	354.94	154.72	93.00	247.25	344.00	437.25	982.00	▆▇▃▁▁
Cards	1	2.96	1.37	1.00	2.00	3.00	4.00	9.00	▇▇▂▁▁
Age	1	55.67	17.25	23.00	41.75	56.00	70.00	98.00	▆▇▇▇▁
Education	1	13.45	3.13	5.00	11.00	14.00	16.00	20.00	▂▅▇▇▂
Balance	1	520.02	459.76	0.00	68.75	459.50	863.00	1999.00	▇▅▃▂▁

Create a dataset containing only the students who have at least 2 cards

Credit2 <- Credit %>%
  filter(Student == "Yes" & Cards >= 2)

Which of the two datasets contains more women?

Credit %>%
  filter(Gender == "Female") %>%
  summarize(count = n())

##   count
## 1   207

Credit2 %>%
  filter(Gender == "Female") %>%
  summarize(count = n())

##   count
## 1    24

print("The Credit dataset contains more women")

## [1] "The Credit dataset contains more women"
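Because the second dataset is a subset of the first, the absolute count can never be larger there, so the share of women may be a more informative comparison. A minimal base R sketch, assuming ISLR is installed; `share_female` is a hypothetical helper introduced here, and the subset is rebuilt with `subset()` so the block runs on its own.

```r
library(ISLR)

# Rebuild the subset: students holding at least 2 cards
Credit2 <- subset(Credit, Student == "Yes" & Cards >= 2)

# Hypothetical helper: number and share of women in a data frame
share_female <- function(df) {
  n_female <- sum(df$Gender == "Female")
  c(count = n_female, share = n_female / nrow(df))
}

share_female(Credit)
share_female(Credit2)
```

The `count` entries should match the 207 and 24 obtained above, while the `share` entries put the two datasets on a common scale.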

Print the categories of the Ethnicity variable

CategorieEthnicity <- levels(Credit$Ethnicity)
CategorieEthnicity

## [1] "African American" "Asian"            "Caucasian"

Victor Oshimen

2023-03-18