# Loading and preparing the data

The dataset contains information about individual trips: destination, duration, traveler age, gender, nationality, accommodation type, and the costs of accommodation and transportation.

The numeric variables used in the dataset are:

- Duration (days): length of the trip in days
- Traveler age: age of the traveler
- Accommodation cost: price of accommodation
- Transportation cost: price of transportation

'data.frame': 137 obs. of 13 variables:
$ Trip.ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ Destination : chr "London, UK" "Phuket, Thailand" "Bali, Indonesia" "New York, USA" ...
$ Start.date : chr "01/05/2023" "15/06/2023" "01/07/2023" "15/08/2023" ...
$ End.date : chr "08/05/2023" "20/06/2023" "08/07/2023" "29/08/2023" ...
$ Duration..days. : int 7 5 7 14 7 5 10 7 7 7 ...
$ Traveler.name : chr "John Smith" "Jane Doe" "David Lee" "Sarah Johnson" ...
$ Traveler.age : int 35 28 45 29 26 42 33 25 31 39 ...
$ Traveler.gender : chr "Male" "Female" "Male" "Female" ...
$ Traveler.nationality: chr "American" "Canadian" "Korean" "British" ...
$ Accommodation.type : chr "Hotel" "Resort" "Villa" "Hotel" ...
$ Accommodation.cost : int 1200 800 1000 2000 700 1500 500 900 1200 2500 ...
$ Transportation.type : chr "Flight" "Flight" "Flight" "Flight" ...
$ Transportation.cost : int 600 500 700 1000 200 800 1200 600 200 800 ...
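
The `str()` output shows `Start.date` and `End.date` stored as character strings in day/month/year order. A minimal sketch of one possible conversion, using only the first four rows printed above; the data frame name `toy` and the column `duration_check` are hypothetical and not part of the notebook:

```r
# Toy rows copied from the str() output; the full data set is not reproduced here.
toy <- data.frame(
  Start.date      = c("01/05/2023", "15/06/2023", "01/07/2023", "15/08/2023"),
  End.date        = c("08/05/2023", "20/06/2023", "08/07/2023", "29/08/2023"),
  Traveler.gender = c("Male", "Female", "Male", "Female"),
  stringsAsFactors = FALSE
)

# Parse the day/month/year strings into Date objects.
toy$Start.date <- as.Date(toy$Start.date, format = "%d/%m/%Y")
toy$End.date   <- as.Date(toy$End.date, format = "%d/%m/%Y")

# Treat gender as a categorical variable.
toy$Traveler.gender <- factor(toy$Traveler.gender)

# Recompute the trip length in days as a consistency check (hypothetical column).
toy$duration_check <- as.integer(difftime(toy$End.date, toy$Start.date, units = "days"))
print(toy)
```

For these four rows the recomputed lengths are 7, 5, 7 and 14 days, which agrees with the `Duration..days.` values shown in the `str()` output.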
# Converting variable types
library(dplyr)
# Selecting the numeric variables
travel.num <- travel %>%
  select(Duration..days., Traveler.age, Accommodation.cost, Transportation.cost)
# Basic statistics
library(knitr)
library(kableExtra)
summary_stats <- travel.num %>%
  summarise(
    n = n(),
    mean_duration = mean(Duration..days., na.rm = TRUE),
    sd_duration = sd(Duration..days., na.rm = TRUE),
    mean_age = mean(Traveler.age, na.rm = TRUE),
    sd_age = sd(Traveler.age, na.rm = TRUE),
    mean_accommodation = mean(Accommodation.cost, na.rm = TRUE),
    sd_accommodation = sd(Accommodation.cost, na.rm = TRUE),
    mean_transport = mean(Transportation.cost, na.rm = TRUE),
    sd_transport = sd(Transportation.cost, na.rm = TRUE)
  )
kable(summary_stats, digits = 2, caption = "Basic statistics of the travel data") %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover", "condensed"))
n | mean_duration | sd_duration | mean_age | sd_age | mean_accommodation | sd_accommodation | mean_transport | sd_transport |
---|---|---|---|---|---|---|---|---|
137 | 7.61 | 1.6 | 33.18 | 7.15 | 1245.11 | 1337.35 | 645.18 | 584.48 |
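
The same kind of summary can also be sketched in base R. The example below is only an illustration: it uses the first ten rows printed by `str()`, so its numbers differ from the table for all 137 observations, and the names `toy` and `toy_stats` are hypothetical.

```r
# First ten rows of the four numeric columns, copied from the str() output.
toy <- data.frame(
  Duration..days.     = c(7, 5, 7, 14, 7, 5, 10, 7, 7, 7),
  Traveler.age        = c(35, 28, 45, 29, 26, 42, 33, 25, 31, 39),
  Accommodation.cost  = c(1200, 800, 1000, 2000, 700, 1500, 500, 900, 1200, 2500),
  Transportation.cost = c(600, 500, 700, 1000, 200, 800, 1200, 600, 200, 800)
)

# One column per variable, one row per statistic.
toy_stats <- sapply(toy, function(x) {
  c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
})
print(round(toy_stats, 2))
```

In the full table the standard deviation of accommodation cost (1337.35) exceeds its mean (1245.11), which suggests a strongly right-skewed distribution; reporting the median alongside the mean may be worth considering.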
# Plots

## Scatterplot

The plot shows how accommodation and transportation spending relate to each other: trips with higher accommodation costs often have higher transportation costs as well.
library(ggplot2)
ggplot(travel, aes(x = Accommodation.cost, y = Transportation.cost)) +
  geom_point(color = "steelblue", size = 3, alpha = 0.7) +
  labs(title = "Relationship between accommodation and transportation costs",
       x = "Accommodation cost (€)",
       y = "Transportation cost (€)") +
  theme_minimal()
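
The visual impression from the scatterplot could be backed by a correlation test. A minimal sketch with `cor.test()`, assuming a Pearson correlation is appropriate and using only the first ten rows printed by `str()`; the vector names are hypothetical and the result is not the value for the full data:

```r
# First ten accommodation and transportation costs from the str() output.
accommodation  <- c(1200, 800, 1000, 2000, 700, 1500, 500, 900, 1200, 2500)
transportation <- c(600, 500, 700, 1000, 200, 800, 1200, 600, 200, 800)

# Pearson correlation with a confidence interval and a test of H0: rho = 0.
ct <- cor.test(accommodation, transportation, method = "pearson")
print(ct$estimate)  # sample correlation coefficient
print(ct$conf.int)  # 95 % confidence interval
print(ct$p.value)   # p-value of the test
```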
## Boxplot

The boxplot shows that hotels and resorts generally have higher median prices than hostels or apartments. A few extreme values are also visible (more luxurious stays).
ggplot(travel, aes(x = Accommodation.type, y = Accommodation.cost, fill = Accommodation.type)) +
  geom_boxplot(alpha = 0.7) +
  labs(title = "Distribution of accommodation costs by type",
       x = "Accommodation type",
       y = "Accommodation cost (€)") +
  theme_minimal() +
  theme(legend.position = "none")
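
The quantities a boxplot draws can be computed directly. The sketch below is an illustration only: group medians use the four rows whose accommodation type is visible in the `str()` output, and the outlier rule (points beyond 1.5 times the hinge spread, the default in `boxplot.stats()`) is applied to the first ten costs. The names `toy`, `medians`, `cost10` and `bs` are hypothetical.

```r
# Four rows with a known accommodation type, copied from the str() output.
toy <- data.frame(
  Accommodation.type = c("Hotel", "Resort", "Villa", "Hotel"),
  Accommodation.cost = c(1200, 800, 1000, 2000)
)

# Median cost per accommodation type (the thick line inside each box).
medians <- tapply(toy$Accommodation.cost, toy$Accommodation.type, median)
print(sort(medians, decreasing = TRUE))

# First ten accommodation costs, without their types.
cost10 <- c(1200, 800, 1000, 2000, 700, 1500, 500, 900, 1200, 2500)

# boxplot.stats() returns the five box statistics and the points flagged as outliers.
bs <- boxplot.stats(cost10)
print(bs$stats)
print(bs$out)
```

On these ten values no point is flagged as an outlier; the extreme values mentioned above come from the full data set.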
# Hypothesis testing

## T-test: difference in accommodation costs between men and women

The test checks whether there is a statistically significant difference in accommodation costs between genders.
t.test(
  travel$Accommodation.cost[travel$Traveler.gender == "Male"],
  travel$Accommodation.cost[travel$Traveler.gender == "Female"]
)
Welch Two Sample t-test
data: travel$Accommodation.cost[travel$Traveler.gender == "Male"] and travel$Accommodation.cost[travel$Traveler.gender == "Female"]
t = -0.23362, df = 126.58, p-value = 0.8157
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-504.1316 397.6667
sample estimates:
mean of x mean of y
1217.910 1271.143
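
The same Welch test can be written with the formula interface, and the decision can be read from the returned object. A minimal sketch on simulated data; the data frame `sim`, its columns, and the 0.05 threshold are assumptions for illustration, not values from the notebook:

```r
# Simulated stand-in for the travel data (hypothetical values).
set.seed(1)
sim <- data.frame(
  gender = rep(c("Male", "Female"), each = 20),
  cost   = c(rnorm(20, mean = 1200, sd = 400), rnorm(20, mean = 1250, sd = 400))
)

# Formula interface: response ~ grouping variable with exactly two levels.
res_formula <- t.test(cost ~ gender, data = sim)

# Two-vector interface, as used in the notebook.
res_vectors <- t.test(sim$cost[sim$gender == "Male"],
                      sim$cost[sim$gender == "Female"])

# Both forms give the same p-value; only the sign of t depends on group order.
alpha <- 0.05
reject_h0 <- res_formula$p.value < alpha
print(res_formula$p.value)
print(reject_h0)
```

For the notebook's output, the p-value of 0.8157 is far above 0.05 and the 95 % confidence interval (-504.13, 397.67) contains zero, so the test gives no evidence of a difference in mean accommodation cost between men and women.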
## ANOVA: difference in transportation costs by transportation type

ANOVA tests whether mean transportation costs differ between transportation types (train, plane, bus, etc.).
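
The call that produced the table below is not shown in the notebook. A minimal sketch of a one-way ANOVA with `aov()`, assuming the model was `Transportation.cost ~ Transportation.type`; the data here are simulated with three hypothetical groups, so the numbers will not match the reported table:

```r
# Simulated stand-in data with three transportation types (hypothetical values).
set.seed(2)
sim <- data.frame(
  Transportation.type = rep(c("Flight", "Train", "Bus"), each = 10),
  Transportation.cost = c(rnorm(10, mean = 900, sd = 150),
                          rnorm(10, mean = 300, sd = 100),
                          rnorm(10, mean = 100, sd = 50))
)

# One-way ANOVA: does mean cost differ between transportation types?
fit <- aov(Transportation.cost ~ Transportation.type, data = sim)

# The first element of the summary is the ANOVA table itself.
tab <- summary(fit)[[1]]
print(tab)
```

The factor's degrees of freedom equal the number of groups minus one, so the value of 8 in the reported table implies nine distinct transportation types in the data.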
Df Sum Sq Mean Sq F value Pr(>F)
Transportation.type 8 31339350 3917419 33.66 <2e-16 ***
Residuals 127 14778320 116365
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
1 observation deleted due to missingness
## Linear regression: predicting transportation costs

The model examines how trip duration, traveler age, and accommodation cost affect transportation costs.
model <- lm(Transportation.cost ~ Duration..days. + Traveler.age + Accommodation.cost, data = travel)
summary(model)
Call:
lm(formula = Transportation.cost ~ Duration..days. + Traveler.age +
Accommodation.cost, data = travel)
Residuals:
Min 1Q Median 3Q Max
-1113.24 -264.22 -69.84 226.15 1121.18
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -185.37312 224.60543 -0.825 0.4107
Duration..days. 38.91657 19.51562 1.994 0.0482 *
Traveler.age 2.91813 4.35678 0.670 0.5042
Accommodation.cost 0.34921 0.02327 15.005 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 358.8 on 132 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.6315, Adjusted R-squared: 0.6232
F-statistic: 75.41 on 3 and 132 DF, p-value: < 2.2e-16
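
To make the coefficients concrete, the fitted equation can be evaluated by hand. The sketch below plugs the reported estimates into the linear predictor for a trip matching the first row of the data (7 days, age 35, accommodation cost 1200); the names `coefs`, `trip` and `predicted` are hypothetical.

```r
# Coefficient estimates copied from the summary(model) output.
coefs <- c(
  intercept     = -185.37312,
  duration      = 38.91657,
  age           = 2.91813,
  accommodation = 0.34921
)

# Predictor values for one trip; the leading 1 multiplies the intercept.
trip <- c(intercept = 1, duration = 7, age = 35, accommodation = 1200)

# Linear predictor: sum of coefficient times predictor value.
predicted <- sum(coefs * trip)
print(round(predicted, 2))  # 608.23
```

The observed transportation cost for that row is 600, so the prediction is close here, although the residual standard error of 358.8 shows that individual predictions can be far off. With the fitted object available, `predict(model, newdata = ...)` performs the same calculation.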
# Heatmap of the correlation matrix

The heatmap visualizes the correlations between the numeric variables. For example, a strong correlation between accommodation and transportation costs may indicate a more luxurious style of travel.
library(corrplot)
cor_matrix <- cor(travel.num, use = "complete.obs")
corrplot(cor_matrix, method = "color", type = "upper", tl.col = "black", tl.srt = 45)
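
The matrix behind the heatmap can also be inspected numerically, for instance to find the most strongly correlated pair of variables. A minimal base R sketch on the first ten rows printed by `str()`; the names `toy`, `cor_toy`, `off_diag` and `strongest` are hypothetical, and the result for the full data may differ:

```r
# First ten rows of the four numeric columns, copied from the str() output.
toy <- data.frame(
  Duration..days.     = c(7, 5, 7, 14, 7, 5, 10, 7, 7, 7),
  Traveler.age        = c(35, 28, 45, 29, 26, 42, 33, 25, 31, 39),
  Accommodation.cost  = c(1200, 800, 1000, 2000, 700, 1500, 500, 900, 1200, 2500),
  Transportation.cost = c(600, 500, 700, 1000, 200, 800, 1200, 600, 200, 800)
)

# Pairwise Pearson correlations; symmetric with ones on the diagonal.
cor_toy <- cor(toy, use = "complete.obs")
print(round(cor_toy, 2))

# Zero the diagonal so a variable is not matched with itself.
off_diag <- cor_toy
diag(off_diag) <- 0

# Position of the largest absolute correlation (first match of the symmetric pair).
idx <- which(abs(off_diag) == max(abs(off_diag)), arr.ind = TRUE)[1, ]
strongest <- c(rownames(cor_toy)[idx["row"]], colnames(cor_toy)[idx["col"]])
print(strongest)
```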