Decision Trees

Author

Giuseppe A. Veltri

Decision Trees in R

A decision tree is a supervised machine learning model that classifies observations or predicts values by asking a sequence of questions about the input variables. Supervised means that the model is trained and tested on data that already contains the desired outcome.

The decision tree may not always provide a clear-cut answer or decision. Instead, it may present options so the data scientist can make an informed decision on their own. Decision trees imitate human thinking, so it’s generally easy for data scientists to understand and interpret the results.

Let’s define some key terms of a decision tree.

  • Root node: The base of the decision tree, holding the whole data set before any split.

  • Splitting: The process of dividing a node into two or more sub-nodes.

  • Decision node: A sub-node that is split further into additional sub-nodes.

  • Leaf node: A sub-node that is not split any further; it represents a possible outcome.

  • Pruning: The process of removing sub-nodes from a decision tree.

  • Branch: A subsection of the decision tree consisting of multiple nodes.
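The terms above can be located in a fitted tree. The following is a minimal sketch, assuming the rpart package and the built-in iris data (neither choice comes from the definitions themselves). It counts decision and leaf nodes and then prunes the tree back to its root.

```r
# Minimal sketch: find the root, decision nodes, and leaves of an rpart tree.
library(rpart)

# Fit a small classification tree on the built-in iris data
fit <- rpart(Species ~ ., data = iris, method = "class")

# fit$frame has one row per node; the first row is the root (node 1)
nodes   <- fit$frame
is_leaf <- nodes$var == "<leaf>"   # rpart labels terminal nodes "<leaf>"

root_variable <- as.character(nodes$var[1])  # variable used in the first split
n_decision    <- sum(!is_leaf)               # nodes that split further (root included)
n_leaves      <- sum(is_leaf)                # terminal nodes

cat("Root splits on:", root_variable, "\n")
cat("Decision nodes:", n_decision, " Leaf nodes:", n_leaves, "\n")

# Pruning: a complexity parameter of 1 collapses every split
pruned <- prune(fit, cp = 1)
cat("Leaves after pruning:", sum(pruned$frame$var == "<leaf>"), "\n")
```

Because rpart builds binary trees, the number of leaves is always one more than the number of decision nodes.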

A decision tree resembles, well, a tree. The base of the tree is the root node. From the root node flows a series of decision nodes that depict decisions to be made, and from the decision nodes come leaf nodes that represent the consequences of those decisions. Each decision node represents a question or split point, and the leaf nodes that stem from it represent the possible answers. Leaf nodes sprout from decision nodes much as a leaf sprouts on a tree branch, which is why each subsection of a decision tree is called a “branch.”

Consider an example. You’re a golfer, and a consistent one at that. On any given day you want to predict which of two buckets your score will fall into: below par or over par.


There are two main types of decision trees: categorical and continuous. The divisions are based on the type of outcome variables used.

Categorical Variable Decision Tree

In a categorical variable decision tree, the answer neatly fits into one category or another. Was the coin toss heads or tails? Is the animal a reptile or a mammal? In this type of decision tree, data is placed into a single category based on the decisions at the nodes throughout the tree.
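To make the routing idea concrete, here is a small hand-written sketch of a categorical tree for the reptile-or-mammal question. The features `has_fur` and `cold_blooded` and the rules themselves are illustrative assumptions, not a model learned from data.

```r
# Hypothetical hand-written classification tree.
# Each if() is a decision node; each returned label is a leaf.
classify_animal <- function(has_fur, cold_blooded) {
  if (has_fur) {
    return("mammal")    # leaf reached from the root
  }
  if (cold_blooded) {
    return("reptile")   # leaf reached from the second decision node
  }
  "mammal"              # no fur but warm-blooded, e.g. a dolphin
}

# Illustrative observations (made up for this sketch)
animals <- data.frame(
  name         = c("dog", "lizard", "dolphin"),
  has_fur      = c(TRUE, FALSE, FALSE),
  cold_blooded = c(FALSE, TRUE, FALSE)
)

# Route every row through the tree; each row ends in exactly one category
animals$predicted <- mapply(classify_animal, animals$has_fur, animals$cold_blooded)
print(animals)
```

A fitted tree works the same way, except that the questions and thresholds are chosen from the training data rather than written by hand.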

Continuous Variable Decision Tree or Regression Tree

A continuous variable decision tree, also known as a regression tree, predicts a numeric outcome rather than a category, so there is no simple yes or no answer. Each leaf returns a number, typically the mean of the outcome among the training observations that reach that leaf.

Continuous variable decision trees are used to predict quantities. Like categorical variable decision trees, they can base each prediction on several input variables. They can capture both linear and non-linear relationships between the inputs and the outcome, provided a suitable algorithm is selected.
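As a minimal sketch of how a regression tree produces its numbers, the code below assumes the rpart package and the built-in mtcars data. It checks that the fitted value in every leaf equals the mean outcome of the training rows in that leaf. The choice of predictors (`wt`, `hp`, `cyl`) is arbitrary and only for illustration.

```r
# Minimal sketch: a regression tree predicts the leaf mean.
library(rpart)

# Fit a regression tree for fuel economy (method = "anova" means regression)
fit <- rpart(mpg ~ wt + hp + cyl, data = mtcars, method = "anova")

# fit$where records, for each training row, the leaf it falls into
leaf_id <- fit$where

# Mean observed mpg within each leaf, repeated for every row of that leaf
leaf_means <- ave(mtcars$mpg, leaf_id)

# Fitted values for the training data
fitted_mpg <- unname(predict(fit))

# The two agree: each leaf predicts the mean of its training observations
print(all.equal(fitted_mpg, leaf_means))
print(tapply(mtcars$mpg, leaf_id, mean))
```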

Classification tree

# This example uses the rpart and rpart.plot packages.

# Install the packages if not already installed
if (!requireNamespace("rpart", quietly = TRUE)) {
  install.packages("rpart")
}

if (!requireNamespace("rpart.plot", quietly = TRUE)) {
  install.packages("rpart.plot")
}

# Load the packages
library(rpart)
library(rpart.plot)

Load the iris data set, shuffle it, and split it into training and test sets. Then fit the classification tree, plot it, and evaluate it on the test set.

data(iris)
# Set the seed for reproducibility
set.seed(123)

# Shuffle the dataset
shuffled_iris <- iris[sample(nrow(iris)), ]

# Split the dataset into 70% training and 30% testing
train_index <- 1:round(0.7 * nrow(shuffled_iris))
train_data <- shuffled_iris[train_index, ]
test_data <- shuffled_iris[-train_index, ]
# Create the classification tree
tree_model <- rpart(Species ~ ., data = train_data, method = "class")
# Plot the tree
rpart.plot(tree_model, extra = 1)

# Make predictions
predicted_species <- predict(tree_model, test_data, type = "class")

# Calculate accuracy
accuracy <- mean(predicted_species == test_data$Species)
cat("Accuracy:", accuracy, "\n")
Accuracy: 0.9777778 
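Accuracy alone does not show which species are confused with each other. The sketch below repeats the split and fit so that it is self-contained, then builds a confusion matrix with base R's `table()`. The object names (`conf_mat`, `recall`) are illustrative, and the exact counts depend on the R version's random number generator, so none are shown here.

```r
# Minimal sketch: confusion matrix for the iris classification tree.
library(rpart)

set.seed(123)
shuffled <- iris[sample(nrow(iris)), ]
n_train  <- round(0.7 * nrow(shuffled))
train    <- shuffled[seq_len(n_train), ]
test     <- shuffled[-seq_len(n_train), ]

fit  <- rpart(Species ~ ., data = train, method = "class")
pred <- predict(fit, test, type = "class")

# Rows are predicted classes, columns are true classes
conf_mat <- table(Predicted = pred, Actual = test$Species)
print(conf_mat)

# Overall accuracy is the share of counts on the diagonal
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
cat("Accuracy:", accuracy, "\n")

# Per-class recall: correct predictions divided by the true class total
recall <- diag(conf_mat) / colSums(conf_mat)
print(round(recall, 3))
```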

Regression tree

# Load the mtcars dataset
data(mtcars)

# View the first few rows of the dataset
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
# Set a seed for reproducibility
set.seed(123)

# Split the data into training (70%) and testing (30%) sets
train_indices <- sample(1:nrow(mtcars), 0.7 * nrow(mtcars))
train_data <- mtcars[train_indices, ]
test_data <- mtcars[-train_indices, ]
# Fit the regression tree model using rpart
regression_tree <- rpart(mpg ~ ., data = train_data, method = "anova")

# Print the regression tree model
print(regression_tree)
n= 22 

node), split, n, deviance, yval
      * denotes terminal node

1) root 22 933.87320 21.05909  
  2) cyl>=5 12  72.94917 15.95833 *
  3) cyl< 5 10 174.05600 27.18000 *
# Plot the regression tree using rpart.plot
rpart.plot(regression_tree, type = 3, box.palette = "RdBu", shadow.col = "gray", nn = TRUE)

# Make predictions on the test data
predictions <- predict(regression_tree, test_data)

# Calculate mean squared error (MSE)
mse <- mean((test_data$mpg - predictions)^2)
print(paste("Mean squared error:", mse))
[1] "Mean squared error: 16.7763025"
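The mean squared error is expressed in squared miles per gallon, which is hard to read on its own. As a hedged sketch, the code below repeats the split and fit, reports the root mean squared error (RMSE) in mpg units, and compares it with a naive baseline that always predicts the training mean. The helper `rmse()` is defined here for illustration and is not part of rpart.

```r
# Minimal sketch: RMSE of the regression tree versus a mean-only baseline.
library(rpart)

set.seed(123)
train_idx <- sample(seq_len(nrow(mtcars)), floor(0.7 * nrow(mtcars)))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

fit  <- rpart(mpg ~ ., data = train, method = "anova")
pred <- predict(fit, test)

# Illustrative helper: root mean squared error
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

tree_rmse     <- rmse(test$mpg, pred)
baseline_rmse <- rmse(test$mpg, mean(train$mpg))  # always predict the training mean

cat("Tree RMSE:    ", round(tree_rmse, 2), "mpg\n")
cat("Baseline RMSE:", round(baseline_rmse, 2), "mpg\n")
```

A tree is only useful if its error is clearly below the baseline. With so few rows, the comparison can change noticeably from one random split to another.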

Model-based recursive partitioning

# Load the package for model-based recursive partitioning
library(partykit)

# Import the data; the CSV file should be in your R working directory
FinalKids7 <- read.csv(file = "KidsITA7.csv", header = TRUE)
attach(FinalKids7)

# Libraries
# Loading these packages prints startup messages about masked objects
# (for example, MASS masks select() from dplyr, and car masks logit()
# from psych and boot).
library(foreign)
library(corrplot)
library(REdaS)
library(psych)
library(QuantPsyc)
library(car)
library(ggplot2)
library(FactoMineR)
library(zoo)
library(rpart)
library(rpart.plot)
# party is attached after partykit, so its cforest(), ctree(), ctree_control(),
# mob(), mob_control(), varimp() and node_*() functions mask the partykit versions.
library(party)
library(sjPlot)
library(sjmisc)
library(RCA)
library(GPArotation)
library(ggparty)
#Explore the dataset
view_df(FinalKids7)
Following 1 variables have only missing values and are not shown:
Q2_RE [250]
Data frame: FinalKids7
ID Name Label Values Value Labels
1 X range: 2401-3200
2 REGISTRO range: 8-3119
3 COUNTRY <output omitted>
4 IDC <output omitted>
5 QTEST range: 1-1
6 Age range: 25-62
7 Gender range: 1-2
8 numchild range: 1-5
9 parentstatus range: 1-2
10 S5 range: 1-3
11 S6 range: 1-1
12 Q1 range: 1-2
13 Q2 range: 6-14
14 Q3_1 <output omitted>
15 Q3_2 <output omitted>
16 Q3_3 <output omitted>
17 Q3_4 <output omitted>
18 Q3_5 <output omitted>
19 Q3_6 <output omitted>
20 Q3_7 <output omitted>
21 Q3_8 <output omitted>
22 Q3_9 <output omitted>
23 Q3_OTHER
24 Q4_1 range: 0-1
25 Q4_2 range: 0-1
26 Q4_3 range: 0-1
27 Q4_4 range: 0-1
28 Q4_5 range: 0-1
29 Q4_6 range: 0-1
30 Q5_1 range: 0-1
31 Q5_2 range: 0-1
32 Q5_3 range: 0-1
33 Q5_4 range: 0-1
34 Q5_5 range: 0-1
35 Q6 range: 0-1
36 Q7 range: 1-6
37 Q8 range: 1-6
38 Q9_1 range: 1-5
39 Q9_2 range: 1-5
40 Q9_3 range: 1-5
41 Q9_4 range: 1-5
42 Q9_5 range: 1-5
43 Q9_6 range: 1-5
44 Q9_7 range: 1-5
45 Q9_8 range: 1-5
46 Q9_9 range: 1-5
47 Q9_10 range: 1-5
48 Q9_11 range: 1-5
49 Q9_12 range: 1-5
50 Q9_13 range: 1-5
51 Q9_14 range: 1-5
52 Q9_15 range: 1-5
53 Q9_16 range: 1-5
54 Q9_17 range: 1-5
55 Q10_1 range: 1-5
56 Q10_2 range: 1-5
57 Q10_3 range: 1-5
58 Q10_4 range: 1-5
59 Q10_5 range: 1-5
60 Q10_6 range: 1-5
61 Q10_7 range: 1-5
62 Q10_8 range: 1-5
63 Q10_9 range: 1-5
64 Q10_10 range: 1-5
65 Q10_11 range: 1-5
66 Q10_12 range: 1-5
67 Q10_13 range: 1-5
68 Q10_14 range: 1-5
69 Q10_15 range: 1-5
70 Q10_16 range: 1-5
71 Q10_17 range: 1-5
72 Q10_18 range: 1-5
73 Q10_19 range: 1-5
74 Q10_20 range: 1-5
75 Q10_21 range: 1-5
76 Q10_22 range: 1-5
77 Q10_23 range: 1-5
78 Q10_24 range: 1-5
79 Q11_1 range: 1-5
80 Q11_2 range: 1-5
81 Q11_3 range: 1-5
82 Q11_4 range: 1-5
83 Q11_5 range: 1-5
84 Q12_1 range: 0-1
85 Q12_2 range: 0-1
86 Q12_3 range: 0-1
87 Q12_4 range: 0-1
88 Q12_5 range: 0-1
89 Q13_1 range: 1-3
90 Q13_2 range: 1-3
91 Q13_3 range: 1-3
92 Q13_4 range: 1-3
93 Q13_5 range: 1-3
94 Q13_6 range: 1-3
95 Q13_7 range: 1-3
96 Q13_8 range: 1-3
97 Q13_9 range: 1-3
98 Q13_10 range: 1-3
99 Q13_11 range: 1-3
100 Q13_12 range: 1-3
101 Q13_13 range: 1-3
102 Q13_14 range: 1-3
103 Q13_15 range: 1-3
104 Q13_16 range: 1-3
105 Q13_17 range: 1-3
106 Q14_1 range: 1-5
107 Q14_2 range: 1-5
108 Q14_3 range: 1-5
109 Q14_4 range: 1-5
110 Q14_5 range: 1-5
111 Q14_6 range: 1-5
112 Q14_7 range: 1-5
113 Q14_8 range: 1-5
114 Q15_1 range: 0-1
115 Q15_2 range: 0-1
116 Q15_3 range: 0-1
117 Q15_4 range: 0-1
118 Q15_5 range: 0-1
119 Q15_6 range: 0-1
120 Q15_7 range: 0-1
121 Q15_8 range: 0-1
122 Q15_9 range: 0-1
123 Q16_1 range: 1-5
124 Q16_2 range: 1-5
125 Q16_3 range: 1-5
126 Q16_4 range: 1-5
127 Q16_5 range: 1-5
128 Q16_6 range: 1-5
129 Q17_1 range: 1-7
130 Q17_2 range: 1-7
131 Q17_3 range: 1-7
132 Q17_4 range: 1-7
133 Q17_5 range: 1-7
134 Q17_6 range: 1-7
135 Q17_7 range: 1-7
136 Q17_8 range: 1-7
137 Q17_9 range: 1-7
138 Q17_10 range: 1-7
139 Q18_1 range: 1-7
140 Q18_2 range: 1-7
141 Q18_3 range: 1-7
142 Q18_4 range: 1-7
143 Q18_5 range: 1-7
144 Q18_6 range: 1-7
145 Q18_7 range: 1-7
146 Q18_8 range: 1-7
147 Q18_9 range: 1-7
148 Q18_10 range: 1-7
149 Q19_1 <output omitted>
150 Q19_2 <output omitted>
151 Q19_3 <output omitted>
152 Q19_4 <output omitted>
153 Q19_5 <output omitted>
154 Q19_6 <output omitted>
155 Q19_7 <output omitted>
156 Q19_8 <output omitted>
157 Q19_9 <output omitted>
158 Q19_10 <output omitted>
159 Q20_1 range: 0-1
160 Q20_2 range: 0-1
161 Q21_1 range: 1-4
162 Q21_2 range: 1-4
163 Q21_3 range: 1-4
164 Q21_4 range: 1-4
165 Q21_5 range: 1-4
166 Q21_6 range: 1-4
167 Q21_7 range: 1-4
168 Q21_8 range: 1-4
169 Q22_1 range: 0-1
170 Q22_2 range: 0-1
171 Q22_3 range: 0-1
172 Q22_4 range: 0-1
173 Q22_5 range: 0-1
174 Q22_6 range: 0-1
175 Q22_7 range: 0-1
176 Q23_1 range: 1-7
177 Q23_2 range: 1-7
178 Q23_3 range: 1-7
179 Q23_4 range: 1-7
180 Q23_5 range: 1-7
181 Q23_6 range: 1-7
182 Q23_7 range: 1-7
183 Q24_1 range: 0-1
184 Q24_2 range: 0-1
185 Q24_3 range: 0-1
186 Q24_4 range: 0-1
187 Q24_5 range: 0-1
188 Q24_6 range: 0-1
189 Q25_1 range: 1-7
190 Q25_2 range: 1-7
191 Q25_3 range: 1-7
192 Q25_4 range: 1-7
193 Q25_5 range: 1-7
194 Q25_6 range: 1-7
195 Q26 range: 1-7
196 Q27 range: 1-7
197 Q28 range: 1-7
198 Q29 range: 1-7
199 Q30 range: 1-7
200 Q31 range: 1-7
201 Q32_1 range: 1-7
202 Q32_2 range: 1-7
203 Q32_3 range: 1-7
204 Q33_1 range: 1-5
205 Q33_2 range: 1-5
206 Q33_3 range: 1-5
207 Q33_4 range: 1-5
208 Q33_5 range: 1-5
209 Q33_6 range: 1-5
210 Q33_7 range: 1-5
211 Q33_8 range: 1-5
212 Q33_9 range: 1-5
213 Q33_10 range: 1-5
214 Q33_11 range: 1-5
215 Q33_12 range: 1-5
216 Q33_13 range: 1-5
217 Q33_14 range: 1-5
218 Q33_15 range: 1-5
219 Q33_16 range: 1-5
220 Q33_17 range: 1-5
221 Q33_18 range: 1-5
222 Q33_19 range: 1-5
223 Q33_20 range: 1-5
224 Q33_21 range: 1-5
225 Q33_22 range: 1-5
226 Q33_23 range: 1-5
227 Q34 range: 6-40
228 Q35 <output omitted>
229 Q36 range: 1-4
230 Q37 range: 1-8
231 Q38 range: 1-6
232 Q39 range: 1-14
233 Q40 range: 1-10
234 Q41 range: 2-13
235 Q42 range: 0-4
236 Q43 range: 1-5
237 Q44 range: 0-2
238 Q45 range: 1-6
239 Q46_1 range: 1-7
240 Q46_2 range: 1-7
241 Q46_3 range: 1-7
242 Q47_1 range: 1-7
243 Q47_2 range: 1-7
244 Q47_3 range: 1-7
245 Q12_6 range: 1-1
246 Q20_3 range: 0-1
247 Q33_24 range: 1-5
248 CEDAD range: 1-3
249 QEST range: 2-3
251 WEIGHT range: 1.0-1.0
252 Q2_re2 range: 1-2
253 S3_re3 range: 1-3
254 Q3_8N range: 1-8
255 Q4_6N range: 0-46
256 Q5_5N range: 0-5
257 Q9_1_re range: 0-1
258 Q9_2_re range: 0-1
259 Q9_3_re range: 0-1
260 Q9_4_re range: 0-1
261 Q9_5_re range: 0-1
262 Q9_6_re range: 0-1
263 Q9_7_re range: 0-1
264 Q9_8_re range: 0-1
265 Q9_9_re range: 0-1
266 Q9_10_re range: 0-1
267 Q9_11_re range: 0-1
268 Q9_12_re range: 0-1
269 Q9_13_re range: 0-1
270 Q9_14_re range: 0-1
271 Q9_15_re range: 0-1
272 Q9_16_re range: 0-1
273 Q9_17_re range: 0-1
274 Q11_1_re range: 0-1
275 Q11_2_re range: 0-1
276 Q11_3_re range: 0-1
277 Q11_4_re range: 0-1
278 Q11_5_re range: 0-1
279 Q11_5N range: 0-5
280 Q12_5N range: 0-5
281 Q14_1_re range: 0-1
282 Q14_2_re range: 0-1
283 Q14_3_re range: 0-1
284 Q14_4_re range: 0-1
285 Q14_5_re range: 0-1
286 Q14_6_re range: 0-1
287 Q14_7_re range: 0-1
288 Q14_8_re range: 0-1
289 Q14_8N range: 1-8
290 Q15_9N range: 0-81
291 Q16_1_re range: 0-1
292 Q16_2_re range: 0-1
293 Q16_3_re range: 0-1
294 Q16_4_re range: 0-1
295 Q16_5_re range: 0-1
296 Q16_6_re range: 0-1
297 Q16_6N range: 0-6
298 Q4_1_re range: 0-1
299 Q4_2_re range: 0-1
300 Q4_3_re range: 0-1
301 Q4_4_re range: 0-1
302 Q4_5_re range: 0-1
303 Q4_6_re range: 0-1
304 Q4_re6N range: 0-6
305 Weights2 range: 1.0-1.0
306 filter_. range: 0-0
307 age_re2 range: 1-3
308 Risk_1_score range: 1-49
309 Risk_2_score range: 1-49
310 Risk_3_score range: 1-49
311 Risk_4_score range: 1-49
312 Risk_5_score range: 1-49
313 Risk_6_score range: 1-49
314 Risk_7_score range: 1-49
315 Risk_8_score range: 1-49
316 Risk_9_score range: 1-49
317 Risk_10_score range: 1-49
318 Risk.perception.overall range: 10-490
319 Risk.overall.percentage range: 2.0-98.0
320 Efficacy.online.threat range: 1-49
321 Efficacy.online.marketing range: 1-49
322 Violent.Images.Risk_1_score. range: 0.0-1.0
323 Exposed.to.targeted.Ads.Risk_2_score. range: 0.0-1.0
324 Bullied.online.Risk_3_score. range: 0.0-1.0
325 Money.in.games.in.app.Risk_4_score. range: 0.0-1.0
326 Incentives.in.app.Risk_5_score. range: 0.0-1.0
327 Hidden.ads.Advergames.Risk_6_score. range: 0.0-1.0
328 Data.Tracking.Risk_7_score. range: 0.0-1.0
329 Ads.unhealhty.lifestyle.Risk_8_score. range: 0.0-1.0
330 Ads.unhealthy.food.Risk_9_score. range: 0.0-1.0
331 Digital.identity.theft.fraud.Risk_10_score. range: 0.0-1.0
332 Range.Scale.Risk.perception.overall. range: 0.0-1.0
333 Standardize.Risk.perception.overall. range: -2.0-2.1
334 Standardize.Efficacy.online.marketing. range: -1.2-2.9
335 Child.Age.3.groups <output omitted>
336 Education.two.groups <output omitted>
337 Parenting.mediation.styles
338 GenderTrue <output omitted>
339 ParentStF <output omitted>
340 digskillsparents range: 1.6-9.0
341 digskillschildren range: 1.0-99.0
342 poperational range: 1.0-9.0
343 pinfonav range: 1.4-9.0
344 psocialskills range: 1.4-9.0
345 pcreativeskills range: 1.0-9.0
346 pmobileskills range: 1.0-9.0
347 enabling range: -6.1-3.8
348 restrictive range: -2.0-10.2
349 style <output omitted>
350 personalrisks range: 1.0-7.0
351 healthrisks range: 1.0-7.0
352 financialrisks range: 1.0-7.0
353 riskprofile <output omitted>
354 socialstatus range: 1-10
355 regione <output omitted>
# Correlation and factor analysis of risks using perceived harm only
riskdataharm <- data.frame(Q17_1, Q17_2, Q17_3, Q17_4, Q17_5, Q17_6, Q17_7, Q17_8, Q17_9, Q17_10)
riskdataharmCOR <- cor(riskdataharm, use = "complete.obs", method = "pearson")
corrplot(riskdataharmCOR, method = "number", order = "AOE", type = "lower")

#Perform Factor Analysis (from previous analysis we know that we should have 3 factors)
farisksharm <- fa(riskdataharm,3, rotate  = "oblimin", scores = "regression", missing=T,
                  impute = "median", fm="minres" )
(farisksharm)
Factor Analysis using method =  minres
Call: fa(r = riskdataharm, nfactors = 3, rotate = "oblimin", scores = "regression", 
    missing = T, impute = "median", fm = "minres")
Standardized loadings (pattern matrix) based upon correlation matrix
         MR1   MR2   MR3   h2   u2 com
Q17_1   0.85  0.03 -0.07 0.68 0.32 1.0
Q17_2   0.05  0.64  0.04 0.50 0.50 1.0
Q17_3   0.72  0.05  0.06 0.64 0.36 1.0
Q17_4   0.25 -0.08  0.66 0.64 0.36 1.3
Q17_5  -0.02  0.17  0.76 0.78 0.22 1.1
Q17_6   0.02  0.59  0.28 0.68 0.32 1.4
Q17_7   0.54  0.10  0.20 0.61 0.39 1.3
Q17_8   0.60  0.32  0.01 0.75 0.25 1.5
Q17_9   0.10  0.74  0.00 0.66 0.34 1.0
Q17_10  0.83 -0.07  0.08 0.71 0.29 1.0

                       MR1  MR2  MR3
SS loadings           3.15 1.91 1.60
Proportion Var        0.31 0.19 0.16
Cumulative Var        0.31 0.51 0.67
Proportion Explained  0.47 0.29 0.24
Cumulative Proportion 0.47 0.76 1.00

 With factor correlations of 
     MR1  MR2  MR3
MR1 1.00 0.69 0.76
MR2 0.69 1.00 0.70
MR3 0.76 0.70 1.00

Mean item complexity =  1.2
Test of the hypothesis that 3 factors are sufficient.

df null model =  45  with the objective function =  6.87 with Chi Square =  5461.88
df of  the model are 18  and the objective function was  0.24 

The root mean square of the residuals (RMSR) is  0.02 
The df corrected root mean square of the residuals is  0.04 

The harmonic n.obs is  781 with the empirical chi square  38.64  with prob <  0.0032 
The total n.obs was  800  with Likelihood Chi Square =  190.28  with prob <  8.7e-31 

Tucker Lewis Index of factoring reliability =  0.92
RMSEA index =  0.109  and the 90 % confidence intervals are  0.096 0.124
BIC =  69.96
Fit based upon off diagonal values = 1
Measures of factor score adequacy             
                                                   MR1  MR2  MR3
Correlation of (regression) scores with factors   0.95 0.92 0.93
Multiple R square of scores with factors          0.91 0.84 0.86
Minimum correlation of possible factor scores     0.82 0.69 0.72
farisksharm$uniquenesses
    Q17_1     Q17_2     Q17_3     Q17_4     Q17_5     Q17_6     Q17_7     Q17_8 
0.3247476 0.4994333 0.3586525 0.3569362 0.2188308 0.3189625 0.3899694 0.2519643 
    Q17_9    Q17_10 
0.3407855 0.2856545 
farisksharm$loadings

Loadings:
       MR1    MR2    MR3   
Q17_1   0.853              
Q17_2          0.643       
Q17_3   0.721              
Q17_4   0.249         0.656
Q17_5          0.175  0.765
Q17_6          0.593  0.276
Q17_7   0.542  0.101  0.198
Q17_8   0.599  0.325       
Q17_9          0.744       
Q17_10  0.833              

                 MR1   MR2   MR3
SS loadings    2.668 1.480 1.147
Proportion Var 0.267 0.148 0.115
Cumulative Var 0.267 0.415 0.530
# Interpret the risk factors using the sheet with the variable descriptions

# Q2 = age of the child
# Q26 = perception of control; Q27 = perception of ease
# style = parental style of managing children's online access

# Model-based recursive partitioning
# mob() here is the version from the party package, which was loaded after
# partykit and masks its mob(); linearModel comes from modeltools, which party loads.
mob_obj <- mob(personalrisks ~ ParentStF + socialstatus + Q2 +
                 digskillsparents + Q26 + Q27 | Age + Child.Age.3.groups,
               data = FinalKids7, model = linearModel)
print(mob_obj)
1) Age <= 37; criterion = 0.999, statistic = 33.995
  2)*  weights = 174 
Terminal node model
Linear model with coefficients:
     (Intercept)  ParentStFParents      socialstatus                Q2  
         2.69460           0.47142          -0.02872           0.03012  
digskillsparents               Q26               Q27  
         0.61645          -0.23748           0.16477  

1) Age > 37
  3)*  weights = 551 
Terminal node model
Linear model with coefficients:
     (Intercept)  ParentStFParents      socialstatus                Q2  
        5.217362         -0.246169         -0.052174         -0.005624  
digskillsparents               Q26               Q27  
        0.277386         -0.069907          0.151011  

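The code above fits a model-based recursive partitioning tree that uses linear regression as the model in each node. The tree splits once, on the parent's age (Age <= 37 versus Age > 37), and fits a separate linear model of personalrisks in each of the two terminal nodes, with 174 and 551 observations respectively.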
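Since the survey file is not publicly bundled with R, the idea can also be tried on simulated data. The sketch below is an assumption-laden illustration rather than the analysis above: it uses `lmtree()` from partykit instead of `mob()` from party. The data frame `sim`, its variables `x`, `z`, `y`, and the data-generating process are all invented for the example.

```r
# Minimal sketch: model-based recursive partitioning on simulated data.
library(partykit)

set.seed(1)
n <- 400
sim <- data.frame(
  x = rnorm(n),   # regressor used inside each node's linear model
  z = runif(n)    # partitioning variable
)

# Assumed data-generating process: the slope of x flips sign at z = 0.5
sim$y <- ifelse(sim$z <= 0.5, 1 + 2 * sim$x, 1 - 2 * sim$x) + rnorm(n, sd = 0.5)

# Formula layout: response ~ regressors | partitioning variables
tree <- partykit::lmtree(y ~ x | z, data = sim)

print(tree)

# One row of regression coefficients per terminal node
print(coef(tree))

# Ids of the terminal nodes
terminal_ids <- nodeids(tree, terminal = TRUE)
```

The formula has the same three parts as in the `mob()` call: the response, the regressors of the node-level model, and, after the vertical bar, the variables that are tested as candidate splits.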