Decision Trees

Author

Giuseppe A. Veltri

Decision Trees in R

A decision tree is a supervised machine learning model that classifies observations or predicts values by asking a sequence of questions about the data. Supervised means that the model is trained and tested on data that already contains the desired outcome.

The decision tree may not always provide a clear-cut answer or decision. Instead, it may present options so the data scientist can make an informed decision on their own. Decision trees imitate human thinking, so it’s generally easy for data scientists to understand and interpret the results.

Let’s define some key terms of a decision tree.

Root node: The base of the decision tree.
Splitting: The process of dividing a node into multiple sub-nodes.
Decision node: When a sub-node is further split into additional sub-nodes.
Leaf node: When a sub-node does not further split into additional sub-nodes; represents possible outcomes.
Pruning: The process of removing sub-nodes of a decision tree.
Branch: A subsection of the decision tree consisting of multiple nodes.
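
Pruning, as defined in the list above, can be sketched with rpart, the package used later in this document. The following is a minimal illustration rather than a recommended workflow: it grows a deliberately large tree on the built-in iris data (the cp = 0 and minsplit = 2 settings, and the helper n_leaves, are choices made for this sketch) and then prunes it back to the complexity value with the lowest cross-validated error in the model's cptable.

```r
library(rpart)

set.seed(123)  # the cross-validation inside rpart is random

# Grow an intentionally oversized tree (settings chosen for illustration only)
big_tree <- rpart(Species ~ ., data = iris, method = "class",
                  control = rpart.control(cp = 0, minsplit = 2))

# Pick the complexity parameter with the smallest cross-validated error
cp_table <- big_tree$cptable
best_cp  <- cp_table[which.min(cp_table[, "xerror"]), "CP"]

# Remove the sub-nodes that do not pay for their added complexity
pruned_tree <- prune(big_tree, cp = best_cp)

# Count leaf nodes before and after pruning
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")
cat("Leaves before pruning:", n_leaves(big_tree), "\n")
cat("Leaves after pruning: ", n_leaves(pruned_tree), "\n")
```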

A decision tree resembles, well, a tree. The base of the tree is the root node. From the root node flows a series of decision nodes, each representing a question or split point. The leaf nodes that stem from a decision node represent the possible answers, that is, the consequences of those decisions. Leaf nodes sprout from decision nodes much as a leaf sprouts on a tree branch, which is why each subsection of a decision tree is called a “branch.”

As an example, suppose you’re a golfer, and a consistent one at that. On any given day you want to predict which of two buckets your score will fall into: below par or over par.
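
To make the golfer example concrete, a decision tree can be written out by hand as nested if/else rules. The predictors (wind_mph, practiced_this_week) and the thresholds below are invented for illustration and are not estimated from any data; a fitted tree would learn them from past rounds.

```r
# A hand-written decision tree for the golfer example.
# The variables and thresholds are hypothetical, chosen only to show the structure.
predict_score <- function(wind_mph, practiced_this_week) {
  if (wind_mph > 15) {                 # root node: first question
    "over par"                         # leaf node
  } else if (practiced_this_week) {    # decision node: second question
    "below par"                        # leaf node
  } else {
    "over par"                         # leaf node
  }
}

predict_score(wind_mph = 5,  practiced_this_week = TRUE)
predict_score(wind_mph = 20, practiced_this_week = TRUE)
```

Each path from the first question to a returned label corresponds to one branch of the tree.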

There are two main types of decision trees: categorical and continuous. The divisions are based on the type of outcome variables used.

Categorical Variable Decision Tree

In a categorical variable decision tree, the answer neatly fits into one category or another. Was the coin toss heads or tails? Is the animal a reptile or a mammal? In this type of decision tree, data is placed into a single category based on the decisions at the nodes throughout the tree.
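
How a node sorts data into categories can be illustrated with an impurity measure. The sketch below uses Gini impurity, which is the default splitting criterion for classification in rpart; the helper name gini and the candidate split Petal.Length < 2.5 are choices made for this example, using the built-in iris data.

```r
# Gini impurity of a vector of class labels: 1 - sum(p_k^2)
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Impurity of the root node (all 150 iris flowers, three balanced classes)
root_impurity <- gini(iris$Species)

# Candidate split (threshold picked by hand for illustration)
left  <- iris$Species[iris$Petal.Length <  2.5]
right <- iris$Species[iris$Petal.Length >= 2.5]

# Weighted impurity of the two child nodes
child_impurity <- (length(left)  * gini(left) +
                   length(right) * gini(right)) / nrow(iris)

cat("Root impurity: ", round(root_impurity, 3), "\n")
cat("After split:   ", round(child_impurity, 3), "\n")
cat("Impurity drop: ", round(root_impurity - child_impurity, 3), "\n")
```

The split isolates one species completely, so the weighted impurity falls from about 0.667 to about 0.333.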

Continuous Variable Decision Tree or Regression Tree

A continuous variable decision tree is one where there is not a simple yes or no answer: the outcome is a number. It’s also known as a regression tree because it predicts a continuous outcome variable, just as a regression model does.

Like a categorical variable decision tree, a continuous variable decision tree can use multiple predictors; the difference is that each leaf returns a numeric prediction rather than a category. Because the splits divide the predictor space into regions, the tree can capture both linear and non-linear relationships, although it approximates them with a step function.
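
For a numeric outcome, a common splitting rule (the one rpart uses with method = "anova") picks the split that most reduces the sum of squared deviations from the node mean, and each leaf predicts the mean of its observations. The following is a minimal sketch on the built-in mtcars data; the split cyl >= 5 is chosen by hand and the helper node_deviance is introduced only for this illustration.

```r
# Sum of squared deviations from the mean (the "deviance" of a node)
node_deviance <- function(y) sum((y - mean(y))^2)

y <- mtcars$mpg
in_left <- mtcars$cyl >= 5   # hand-picked split for this sketch

root_dev  <- node_deviance(y)
child_dev <- node_deviance(y[in_left]) + node_deviance(y[!in_left])

# Each leaf predicts the mean outcome of the observations it contains
leaf_means <- tapply(y, ifelse(in_left, "cyl >= 5", "cyl < 5"), mean)

cat("Root deviance:       ", round(root_dev, 2), "\n")
cat("Deviance after split:", round(child_dev, 2), "\n")
print(round(leaf_means, 2))
```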

Classification tree

The example below fits a classification tree to the built-in iris data set using the rpart package and plots it with rpart.plot.

# This example uses the rpart and rpart.plot packages.

# Install the packages if not already installed
if (!requireNamespace("rpart", quietly = TRUE)) {
  install.packages("rpart")
}

if (!requireNamespace("rpart.plot", quietly = TRUE)) {
  install.packages("rpart.plot")
}

# Load the packages
library(rpart)
library(rpart.plot)

Load the iris data set, shuffle it, and split it into training and test sets.

data(iris)

# Set the seed for reproducibility
set.seed(123)

# Shuffle the dataset
shuffled_iris <- iris[sample(nrow(iris)), ]

# Split the dataset into 70% training and 30% testing
train_index <- 1:round(0.7 * nrow(shuffled_iris))
train_data <- shuffled_iris[train_index, ]
test_data <- shuffled_iris[-train_index, ]

# Create the classification tree
tree_model <- rpart(Species ~ ., data = train_data, method = "class")

# Plot the tree
rpart.plot(tree_model, extra = 1)

# Make predictions
predicted_species <- predict(tree_model, test_data, type = "class")

# Calculate accuracy
accuracy <- mean(predicted_species == test_data$Species)
cat("Accuracy:", accuracy, "\n")

Accuracy: 0.9777778
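
Overall accuracy hides which species are confused with one another. As a possible follow-up, the sketch below repeats the same split and model and then cross-tabulates predicted against observed species with base R's table(). The object name confusion is introduced here for illustration.

```r
library(rpart)

# Recreate the split and model from the example above
set.seed(123)
shuffled_iris <- iris[sample(nrow(iris)), ]
train_index <- 1:round(0.7 * nrow(shuffled_iris))
train_data <- shuffled_iris[train_index, ]
test_data <- shuffled_iris[-train_index, ]
tree_model <- rpart(Species ~ ., data = train_data, method = "class")
predicted_species <- predict(tree_model, test_data, type = "class")

# Rows are predictions, columns are the true species
confusion <- table(Predicted = predicted_species, Actual = test_data$Species)
print(confusion)

# Accuracy is the share of observations on the diagonal
sum(diag(confusion)) / sum(confusion)
```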

Regression tree

# Load the mtcars dataset
data(mtcars)

# View the first few rows of the dataset
head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

# Set a seed for reproducibility
set.seed(123)

# Split the data into training (70%) and testing (30%) sets
train_indices <- sample(seq_len(nrow(mtcars)), floor(0.7 * nrow(mtcars)))
train_data <- mtcars[train_indices, ]
test_data <- mtcars[-train_indices, ]

# Fit the regression tree model using rpart
regression_tree <- rpart(mpg ~ ., data = train_data, method = "anova")

# Print the regression tree model
print(regression_tree)

n= 22 

node), split, n, deviance, yval
      * denotes terminal node

1) root 22 933.87320 21.05909  
  2) cyl>=5 12  72.94917 15.95833 *
  3) cyl< 5 10 174.05600 27.18000 *

# Plot the regression tree using rpart.plot
rpart.plot(regression_tree, type = 3, box.palette = "RdBu", shadow.col = "gray", nn = TRUE)

# Make predictions on the test data
predictions <- predict(regression_tree, test_data)

# Calculate mean squared error (MSE)
mse <- mean((test_data$mpg - predictions)^2)
print(paste("Mean squared error:", mse))

[1] "Mean squared error: 16.7763025"
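
A mean squared error is easier to judge next to a reference point. The sketch below refits the same model, reports the root mean squared error in the units of mpg, and compares it with a naive baseline that always predicts the training mean. The baseline comparison and the names rmse_tree and rmse_baseline are additions for illustration, not part of the original analysis.

```r
library(rpart)

# Recreate the split and model from the example above
set.seed(123)
train_indices <- sample(1:nrow(mtcars), 0.7 * nrow(mtcars))
train_data <- mtcars[train_indices, ]
test_data <- mtcars[-train_indices, ]
regression_tree <- rpart(mpg ~ ., data = train_data, method = "anova")
predictions <- predict(regression_tree, test_data)

# Root mean squared error, in miles per gallon
rmse_tree <- sqrt(mean((test_data$mpg - predictions)^2))

# Baseline: predict the training-set mean for every test car
rmse_baseline <- sqrt(mean((test_data$mpg - mean(train_data$mpg))^2))

cat("RMSE (tree):    ", round(rmse_tree, 2), "\n")
cat("RMSE (baseline):", round(rmse_baseline, 2), "\n")
```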

Model-based recursive partitioning

# Load the necessary packages
library(partykit)

Loading required package: grid

Loading required package: libcoin

Loading required package: mvtnorm

# Load an example dataset
# Import the data; the CSV file should be in your R working directory
FinalKids7 <- read.csv(file = "KidsITA7.csv", header = TRUE)
attach(FinalKids7)

###### Libraries ########
library(foreign)
library(corrplot)

corrplot 0.92 loaded

library(REdaS)
library(psych)
library(QuantPsyc)

Loading required package: boot


Attaching package: 'boot'

The following object is masked from 'package:psych':

    logit

Loading required package: dplyr


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Loading required package: purrr

Loading required package: MASS


Attaching package: 'MASS'

The following object is masked from 'package:dplyr':

    select


Attaching package: 'QuantPsyc'

The following object is masked from 'package:base':

    norm

library(car)

Loading required package: carData


Attaching package: 'car'

The following object is masked from 'package:purrr':

    some

The following object is masked from 'package:dplyr':

    recode

The following object is masked from 'package:boot':

    logit

The following object is masked from 'package:psych':

    logit

library(ggplot2)


Attaching package: 'ggplot2'

The following objects are masked from 'package:psych':

    %+%, alpha

library(FactoMineR)
library(corrplot)
library(zoo)


Attaching package: 'zoo'

The following objects are masked from 'package:base':

    as.Date, as.Date.numeric

library(rpart)
library(rpart.plot)
library(party)

Loading required package: modeltools

Loading required package: stats4


Attaching package: 'modeltools'

The following object is masked from 'package:car':

    Predict

Loading required package: strucchange

Loading required package: sandwich


Attaching package: 'party'

The following object is masked from 'package:dplyr':

    where

The following objects are masked from 'package:partykit':

    cforest, ctree, ctree_control, edge_simple, mob, mob_control,
    node_barplot, node_bivplot, node_boxplot, node_inner, node_surv,
    node_terminal, varimp

library(sjPlot)
library(sjmisc)


Attaching package: 'sjmisc'

The following object is masked from 'package:purrr':

    is_empty

library(RCA)

Loading required package: igraph


Attaching package: 'igraph'

The following object is masked from 'package:modeltools':

    clusters

The following objects are masked from 'package:purrr':

    compose, simplify

The following objects are masked from 'package:dplyr':

    as_data_frame, groups, union

The following objects are masked from 'package:stats':

    decompose, spectrum

The following object is masked from 'package:base':

    union

Loading required package: gplots


Attaching package: 'gplots'

The following object is masked from 'package:stats':

    lowess

library(GPArotation)


Attaching package: 'GPArotation'

The following objects are masked from 'package:psych':

    equamax, varimin

library(ggparty)

#Explore the dataset
view_df(FinalKids7)

Following 1 variables have only missing values and are not shown:

Q2_RE [250]

Data frame: FinalKids7
ID	Name	Label	Values	Value Labels
1	X		range: 2401-3200
2	REGISTRO		range: 8-3119
3	COUNTRY			<output omitted>
4	IDC			<output omitted>
5	QTEST		range: 1-1
6	Age		range: 25-62
7	Gender		range: 1-2
8	numchild		range: 1-5
9	parentstatus		range: 1-2
10	S5		range: 1-3
11	S6		range: 1-1
12	Q1		range: 1-2
13	Q2		range: 6-14
14	Q3_1			<output omitted>
15	Q3_2			<output omitted>
16	Q3_3			<output omitted>
17	Q3_4			<output omitted>
18	Q3_5			<output omitted>
19	Q3_6			<output omitted>
20	Q3_7			<output omitted>
21	Q3_8			<output omitted>
22	Q3_9			<output omitted>
23	Q3_OTHER
24	Q4_1		range: 0-1
25	Q4_2		range: 0-1
26	Q4_3		range: 0-1
27	Q4_4		range: 0-1
28	Q4_5		range: 0-1
29	Q4_6		range: 0-1
30	Q5_1		range: 0-1
31	Q5_2		range: 0-1
32	Q5_3		range: 0-1
33	Q5_4		range: 0-1
34	Q5_5		range: 0-1
35	Q6		range: 0-1
36	Q7		range: 1-6
37	Q8		range: 1-6
38	Q9_1		range: 1-5
39	Q9_2		range: 1-5
40	Q9_3		range: 1-5
41	Q9_4		range: 1-5
42	Q9_5		range: 1-5
43	Q9_6		range: 1-5
44	Q9_7		range: 1-5
45	Q9_8		range: 1-5
46	Q9_9		range: 1-5
47	Q9_10		range: 1-5
48	Q9_11		range: 1-5
49	Q9_12		range: 1-5
50	Q9_13		range: 1-5
51	Q9_14		range: 1-5
52	Q9_15		range: 1-5
53	Q9_16		range: 1-5
54	Q9_17		range: 1-5
55	Q10_1		range: 1-5
56	Q10_2		range: 1-5
57	Q10_3		range: 1-5
58	Q10_4		range: 1-5
59	Q10_5		range: 1-5
60	Q10_6		range: 1-5
61	Q10_7		range: 1-5
62	Q10_8		range: 1-5
63	Q10_9		range: 1-5
64	Q10_10		range: 1-5
65	Q10_11		range: 1-5
66	Q10_12		range: 1-5
67	Q10_13		range: 1-5
68	Q10_14		range: 1-5
69	Q10_15		range: 1-5
70	Q10_16		range: 1-5
71	Q10_17		range: 1-5
72	Q10_18		range: 1-5
73	Q10_19		range: 1-5
74	Q10_20		range: 1-5
75	Q10_21		range: 1-5
76	Q10_22		range: 1-5
77	Q10_23		range: 1-5
78	Q10_24		range: 1-5
79	Q11_1		range: 1-5
80	Q11_2		range: 1-5
81	Q11_3		range: 1-5
82	Q11_4		range: 1-5
83	Q11_5		range: 1-5
84	Q12_1		range: 0-1
85	Q12_2		range: 0-1
86	Q12_3		range: 0-1
87	Q12_4		range: 0-1
88	Q12_5		range: 0-1
89	Q13_1		range: 1-3
90	Q13_2		range: 1-3
91	Q13_3		range: 1-3
92	Q13_4		range: 1-3
93	Q13_5		range: 1-3
94	Q13_6		range: 1-3
95	Q13_7		range: 1-3
96	Q13_8		range: 1-3
97	Q13_9		range: 1-3
98	Q13_10		range: 1-3
99	Q13_11		range: 1-3
100	Q13_12		range: 1-3
101	Q13_13		range: 1-3
102	Q13_14		range: 1-3
103	Q13_15		range: 1-3
104	Q13_16		range: 1-3
105	Q13_17		range: 1-3
106	Q14_1		range: 1-5
107	Q14_2		range: 1-5
108	Q14_3		range: 1-5
109	Q14_4		range: 1-5
110	Q14_5		range: 1-5
111	Q14_6		range: 1-5
112	Q14_7		range: 1-5
113	Q14_8		range: 1-5
114	Q15_1		range: 0-1
115	Q15_2		range: 0-1
116	Q15_3		range: 0-1
117	Q15_4		range: 0-1
118	Q15_5		range: 0-1
119	Q15_6		range: 0-1
120	Q15_7		range: 0-1
121	Q15_8		range: 0-1
122	Q15_9		range: 0-1
123	Q16_1		range: 1-5
124	Q16_2		range: 1-5
125	Q16_3		range: 1-5
126	Q16_4		range: 1-5
127	Q16_5		range: 1-5
128	Q16_6		range: 1-5
129	Q17_1		range: 1-7
130	Q17_2		range: 1-7
131	Q17_3		range: 1-7
132	Q17_4		range: 1-7
133	Q17_5		range: 1-7
134	Q17_6		range: 1-7
135	Q17_7		range: 1-7
136	Q17_8		range: 1-7
137	Q17_9		range: 1-7
138	Q17_10		range: 1-7
139	Q18_1		range: 1-7
140	Q18_2		range: 1-7
141	Q18_3		range: 1-7
142	Q18_4		range: 1-7
143	Q18_5		range: 1-7
144	Q18_6		range: 1-7
145	Q18_7		range: 1-7
146	Q18_8		range: 1-7
147	Q18_9		range: 1-7
148	Q18_10		range: 1-7
149	Q19_1			<output omitted>
150	Q19_2			<output omitted>
151	Q19_3			<output omitted>
152	Q19_4			<output omitted>
153	Q19_5			<output omitted>
154	Q19_6			<output omitted>
155	Q19_7			<output omitted>
156	Q19_8			<output omitted>
157	Q19_9			<output omitted>
158	Q19_10			<output omitted>
159	Q20_1		range: 0-1
160	Q20_2		range: 0-1
161	Q21_1		range: 1-4
162	Q21_2		range: 1-4
163	Q21_3		range: 1-4
164	Q21_4		range: 1-4
165	Q21_5		range: 1-4
166	Q21_6		range: 1-4
167	Q21_7		range: 1-4
168	Q21_8		range: 1-4
169	Q22_1		range: 0-1
170	Q22_2		range: 0-1
171	Q22_3		range: 0-1
172	Q22_4		range: 0-1
173	Q22_5		range: 0-1
174	Q22_6		range: 0-1
175	Q22_7		range: 0-1
176	Q23_1		range: 1-7
177	Q23_2		range: 1-7
178	Q23_3		range: 1-7
179	Q23_4		range: 1-7
180	Q23_5		range: 1-7
181	Q23_6		range: 1-7
182	Q23_7		range: 1-7
183	Q24_1		range: 0-1
184	Q24_2		range: 0-1
185	Q24_3		range: 0-1
186	Q24_4		range: 0-1
187	Q24_5		range: 0-1
188	Q24_6		range: 0-1
189	Q25_1		range: 1-7
190	Q25_2		range: 1-7
191	Q25_3		range: 1-7
192	Q25_4		range: 1-7
193	Q25_5		range: 1-7
194	Q25_6		range: 1-7
195	Q26		range: 1-7
196	Q27		range: 1-7
197	Q28		range: 1-7
198	Q29		range: 1-7
199	Q30		range: 1-7
200	Q31		range: 1-7
201	Q32_1		range: 1-7
202	Q32_2		range: 1-7
203	Q32_3		range: 1-7
204	Q33_1		range: 1-5
205	Q33_2		range: 1-5
206	Q33_3		range: 1-5
207	Q33_4		range: 1-5
208	Q33_5		range: 1-5
209	Q33_6		range: 1-5
210	Q33_7		range: 1-5
211	Q33_8		range: 1-5
212	Q33_9		range: 1-5
213	Q33_10		range: 1-5
214	Q33_11		range: 1-5
215	Q33_12		range: 1-5
216	Q33_13		range: 1-5
217	Q33_14		range: 1-5
218	Q33_15		range: 1-5
219	Q33_16		range: 1-5
220	Q33_17		range: 1-5
221	Q33_18		range: 1-5
222	Q33_19		range: 1-5
223	Q33_20		range: 1-5
224	Q33_21		range: 1-5
225	Q33_22		range: 1-5
226	Q33_23		range: 1-5
227	Q34		range: 6-40
228	Q35			<output omitted>
229	Q36		range: 1-4
230	Q37		range: 1-8
231	Q38		range: 1-6
232	Q39		range: 1-14
233	Q40		range: 1-10
234	Q41		range: 2-13
235	Q42		range: 0-4
236	Q43		range: 1-5
237	Q44		range: 0-2
238	Q45		range: 1-6
239	Q46_1		range: 1-7
240	Q46_2		range: 1-7
241	Q46_3		range: 1-7
242	Q47_1		range: 1-7
243	Q47_2		range: 1-7
244	Q47_3		range: 1-7
245	Q12_6		range: 1-1
246	Q20_3		range: 0-1
247	Q33_24		range: 1-5
248	CEDAD		range: 1-3
249	QEST		range: 2-3
251	WEIGHT		range: 1.0-1.0
252	Q2_re2		range: 1-2
253	S3_re3		range: 1-3
254	Q3_8N		range: 1-8
255	Q4_6N		range: 0-46
256	Q5_5N		range: 0-5
257	Q9_1_re		range: 0-1
258	Q9_2_re		range: 0-1
259	Q9_3_re		range: 0-1
260	Q9_4_re		range: 0-1
261	Q9_5_re		range: 0-1
262	Q9_6_re		range: 0-1
263	Q9_7_re		range: 0-1
264	Q9_8_re		range: 0-1
265	Q9_9_re		range: 0-1
266	Q9_10_re		range: 0-1
267	Q9_11_re		range: 0-1
268	Q9_12_re		range: 0-1
269	Q9_13_re		range: 0-1
270	Q9_14_re		range: 0-1
271	Q9_15_re		range: 0-1
272	Q9_16_re		range: 0-1
273	Q9_17_re		range: 0-1
274	Q11_1_re		range: 0-1
275	Q11_2_re		range: 0-1
276	Q11_3_re		range: 0-1
277	Q11_4_re		range: 0-1
278	Q11_5_re		range: 0-1
279	Q11_5N		range: 0-5
280	Q12_5N		range: 0-5
281	Q14_1_re		range: 0-1
282	Q14_2_re		range: 0-1
283	Q14_3_re		range: 0-1
284	Q14_4_re		range: 0-1
285	Q14_5_re		range: 0-1
286	Q14_6_re		range: 0-1
287	Q14_7_re		range: 0-1
288	Q14_8_re		range: 0-1
289	Q14_8N		range: 1-8
290	Q15_9N		range: 0-81
291	Q16_1_re		range: 0-1
292	Q16_2_re		range: 0-1
293	Q16_3_re		range: 0-1
294	Q16_4_re		range: 0-1
295	Q16_5_re		range: 0-1
296	Q16_6_re		range: 0-1
297	Q16_6N		range: 0-6
298	Q4_1_re		range: 0-1
299	Q4_2_re		range: 0-1
300	Q4_3_re		range: 0-1
301	Q4_4_re		range: 0-1
302	Q4_5_re		range: 0-1
303	Q4_6_re		range: 0-1
304	Q4_re6N		range: 0-6
305	Weights2		range: 1.0-1.0
306	filter_.		range: 0-0
307	age_re2		range: 1-3
308	Risk_1_score		range: 1-49
309	Risk_2_score		range: 1-49
310	Risk_3_score		range: 1-49
311	Risk_4_score		range: 1-49
312	Risk_5_score		range: 1-49
313	Risk_6_score		range: 1-49
314	Risk_7_score		range: 1-49
315	Risk_8_score		range: 1-49
316	Risk_9_score		range: 1-49
317	Risk_10_score		range: 1-49
318	Risk.perception.overall		range: 10-490
319	Risk.overall.percentage		range: 2.0-98.0
320	Efficacy.online.threat		range: 1-49
321	Efficacy.online.marketing		range: 1-49
322	Violent.Images.Risk_1_score.		range: 0.0-1.0
323	Exposed.to.targeted.Ads.Risk_2_score.		range: 0.0-1.0
324	Bullied.online.Risk_3_score.		range: 0.0-1.0
325	Money.in.games.in.app.Risk_4_score.		range: 0.0-1.0
326	Incentives.in.app.Risk_5_score.		range: 0.0-1.0
327	Hidden.ads.Advergames.Risk_6_score.		range: 0.0-1.0
328	Data.Tracking.Risk_7_score.		range: 0.0-1.0
329	Ads.unhealhty.lifestyle.Risk_8_score.		range: 0.0-1.0
330	Ads.unhealthy.food.Risk_9_score.		range: 0.0-1.0
331	Digital.identity.theft.fraud.Risk_10_score.		range: 0.0-1.0
332	Range.Scale.Risk.perception.overall.		range: 0.0-1.0
333	Standardize.Risk.perception.overall.		range: -2.0-2.1
334	Standardize.Efficacy.online.marketing.		range: -1.2-2.9
335	Child.Age.3.groups			<output omitted>
336	Education.two.groups			<output omitted>
337	Parenting.mediation.styles
338	GenderTrue			<output omitted>
339	ParentStF			<output omitted>
340	digskillsparents		range: 1.6-9.0
341	digskillschildren		range: 1.0-99.0
342	poperational		range: 1.0-9.0
343	pinfonav		range: 1.4-9.0
344	psocialskills		range: 1.4-9.0
345	pcreativeskills		range: 1.0-9.0
346	pmobileskills		range: 1.0-9.0
347	enabling		range: -6.1-3.8
348	restrictive		range: -2.0-10.2
349	style			<output omitted>
350	personalrisks		range: 1.0-7.0
351	healthrisks		range: 1.0-7.0
352	financialrisks		range: 1.0-7.0
353	riskprofile			<output omitted>
354	socialstatus		range: 1-10
355	regione			<output omitted>

# Correlation and factor analysis of risks using perceived harm only
# (Q17_1 to Q17_10 are found through attach(FinalKids7))
riskdataharm <- data.frame(Q17_1, Q17_2, Q17_3, Q17_4, Q17_5, Q17_6, Q17_7, Q17_8, Q17_9, Q17_10)
riskdataharmCOR <- cor(riskdataharm, use = "complete.obs", method = "pearson")
corrplot(riskdataharmCOR, method = "number", order = "AOE", type = "lower")

#Perform Factor Analysis (from previous analysis we know that we should have 3 factors)
farisksharm <- fa(riskdataharm,3, rotate  = "oblimin", scores = "regression", missing=T,
                  impute = "median", fm="minres" )
(farisksharm)

Factor Analysis using method =  minres
Call: fa(r = riskdataharm, nfactors = 3, rotate = "oblimin", scores = "regression", 
    missing = T, impute = "median", fm = "minres")
Standardized loadings (pattern matrix) based upon correlation matrix
         MR1   MR2   MR3   h2   u2 com
Q17_1   0.85  0.03 -0.07 0.68 0.32 1.0
Q17_2   0.05  0.64  0.04 0.50 0.50 1.0
Q17_3   0.72  0.05  0.06 0.64 0.36 1.0
Q17_4   0.25 -0.08  0.66 0.64 0.36 1.3
Q17_5  -0.02  0.17  0.76 0.78 0.22 1.1
Q17_6   0.02  0.59  0.28 0.68 0.32 1.4
Q17_7   0.54  0.10  0.20 0.61 0.39 1.3
Q17_8   0.60  0.32  0.01 0.75 0.25 1.5
Q17_9   0.10  0.74  0.00 0.66 0.34 1.0
Q17_10  0.83 -0.07  0.08 0.71 0.29 1.0

                       MR1  MR2  MR3
SS loadings           3.15 1.91 1.60
Proportion Var        0.31 0.19 0.16
Cumulative Var        0.31 0.51 0.67
Proportion Explained  0.47 0.29 0.24
Cumulative Proportion 0.47 0.76 1.00

 With factor correlations of 
     MR1  MR2  MR3
MR1 1.00 0.69 0.76
MR2 0.69 1.00 0.70
MR3 0.76 0.70 1.00

Mean item complexity =  1.2
Test of the hypothesis that 3 factors are sufficient.

df null model =  45  with the objective function =  6.87 with Chi Square =  5461.88
df of  the model are 18  and the objective function was  0.24 

The root mean square of the residuals (RMSR) is  0.02 
The df corrected root mean square of the residuals is  0.04 

The harmonic n.obs is  781 with the empirical chi square  38.64  with prob <  0.0032 
The total n.obs was  800  with Likelihood Chi Square =  190.28  with prob <  8.7e-31 

Tucker Lewis Index of factoring reliability =  0.92
RMSEA index =  0.109  and the 90 % confidence intervals are  0.096 0.124
BIC =  69.96
Fit based upon off diagonal values = 1
Measures of factor score adequacy             
                                                   MR1  MR2  MR3
Correlation of (regression) scores with factors   0.95 0.92 0.93
Multiple R square of scores with factors          0.91 0.84 0.86
Minimum correlation of possible factor scores     0.82 0.69 0.72

farisksharm$uniquenesses

    Q17_1     Q17_2     Q17_3     Q17_4     Q17_5     Q17_6     Q17_7     Q17_8 
0.3247476 0.4994333 0.3586525 0.3569362 0.2188308 0.3189625 0.3899694 0.2519643 
    Q17_9    Q17_10 
0.3407855 0.2856545
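
The uniquenesses printed above relate directly to the h2 (communality) column of the factor analysis output: for standardized items, communality is one minus uniqueness. A small check using three of the values shown above, copied by hand into a vector (the names u2 and h2 follow the column labels in the output):

```r
# Uniquenesses copied from the output above (three items only)
u2 <- c(Q17_1 = 0.3247476, Q17_2 = 0.4994333, Q17_5 = 0.2188308)

# Communality: share of each item's variance explained by the factors
h2 <- 1 - u2
round(h2, 2)
```

The rounded results (0.68, 0.50, 0.78) agree with the h2 values reported for these items in the pattern matrix.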

farisksharm$loadings


Loadings:
       MR1    MR2    MR3   
Q17_1   0.853              
Q17_2          0.643       
Q17_3   0.721              
Q17_4   0.249         0.656
Q17_5          0.175  0.765
Q17_6          0.593  0.276
Q17_7   0.542  0.101  0.198
Q17_8   0.599  0.325       
Q17_9          0.744       
Q17_10  0.833              

                 MR1   MR2   MR3
SS loadings    2.668 1.480 1.147
Proportion Var 0.267 0.148 0.115
Cumulative Var 0.267 0.415 0.530

# Interpret the risk factors using the sheet with the variable descriptions

# Q2 = age of the child
# Q26 = perception of control; Q27 = perception of ease
# Style = parental style of managing children's online access

# Model-based recursive partitioning

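# mob() here is party::mob(): party was loaded after partykit and masks its mob().
# Predictors of the node model go before |, partitioning variables after it.
mob_obj <- mob(personalrisks ~ ParentStF + socialstatus + Q2 +
                 digskillsparents + Q26 + Q27 | Age + Child.Age.3.groups,
               data = FinalKids7, model = linearModel)
print(mob_obj)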

1) Age <= 37; criterion = 0.999, statistic = 33.995
  2)*  weights = 174 
Terminal node model
Linear model with coefficients:
     (Intercept)  ParentStFParents      socialstatus                Q2  
         2.69460           0.47142          -0.02872           0.03012  
digskillsparents               Q26               Q27  
         0.61645          -0.23748           0.16477  

1) Age > 37
  3)*  weights = 551 
Terminal node model
Linear model with coefficients:
     (Intercept)  ParentStFParents      socialstatus                Q2  
        5.217362         -0.246169         -0.052174         -0.005624  
digskillsparents               Q26               Q27  
        0.277386         -0.069907          0.151011

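The call above fits a model-based recursive partitioning tree with a linear regression (linearModel) as the node model: personalrisks is regressed on the predictors to the left of |, while Age and Child.Age.3.groups are the candidate partitioning variables. The tree splits once, at Age <= 37, giving two terminal nodes (174 and 551 observations) with separate coefficient estimates.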
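
What the two terminal nodes represent can be sketched in base R: the same linear model is fitted separately within each subgroup, so the coefficients are allowed to differ. The data below are simulated and the variable names (age, skills, risk) are hypothetical. The split at age 37 is imposed by hand here, whereas mob() selects the partitioning variable and split point itself using parameter instability tests.

```r
set.seed(1)

# Simulated data: the effect of skills on risk differs by age group
n <- 400
age <- sample(25:62, n, replace = TRUE)
skills <- runif(n, 1, 9)
slope <- ifelse(age <= 37, 0.6, 0.3)   # hypothetical group-specific slopes
risk <- 3 + slope * skills + rnorm(n, sd = 0.5)
sim <- data.frame(age, skills, risk)

# Fit the same linear model within each hand-made partition
fit_young <- lm(risk ~ skills, data = subset(sim, age <= 37))
fit_old   <- lm(risk ~ skills, data = subset(sim, age >  37))

# Compare the node-specific coefficients
rbind(`age <= 37` = coef(fit_young),
      `age > 37`  = coef(fit_old))
```

Reading the printed mob() output works the same way: each terminal node carries its own set of coefficients for the same model formula.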