Data Dive 3: Group_by analysis

Visualization:

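We can now plot this smaller group on its own, keeping only the positive cases (Revenue == TRUE) from new visitors (VisitorType == "New_Visitor"). Each panel shows a quadratic trend line (lm with poly(x, 2), no confidence band) rather than the individual sessions: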

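library(ggplot2)    # ggplot(), geom_smooth(), labs(), theme_minimal()
library(dplyr)      # filter()
library(gridExtra)  # grid.arrange()

# Plot 1: Product Related Duration vs Bounce Rates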
plot1 <- ggplot(df_product_duration |> filter(Revenue == TRUE, VisitorType == "New_Visitor"), 
                aes(x = ProductRelated_Duration, y = BounceRates)) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE) +
  labs(title = "Bounce vs. Product",
       x = "Product Related Duration",
       y = "Bounce Rates") +
  scale_y_continuous(limits = c(0, NA)) +
  theme_minimal()

# Plot 2: Informational Duration vs Bounce Rates
plot2 <- ggplot(df_info_duration |> filter(Revenue == TRUE, VisitorType == "New_Visitor"), 
                aes(x = Informational_Duration, y = BounceRates)) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE) +
  labs(title = "Bounce vs. Informational",
       x = "Informational Duration",
       y = "Bounce Rates") +
  scale_y_continuous(limits = c(0, NA)) +
  theme_minimal()

# Plot 3: Administrative Duration vs Bounce Rates
plot3 <- ggplot(df_admin_duration |> filter(Revenue == TRUE, VisitorType == "New_Visitor"), 
                aes(x = Administrative_Duration, y = BounceRates)) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE) +
  labs(title = "Bounce vs. Admin",
       x = "Administrative Duration",
       y = "Bounce Rates") +
  scale_y_continuous(limits = c(0, NA)) +
  theme_minimal()

# Plot 4: Product Related Duration vs Exit Rates
plot4 <- ggplot(df_product_duration |> filter(Revenue == TRUE, VisitorType == "New_Visitor"), 
                aes(x = ProductRelated_Duration, y = ExitRates)) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE) +
  labs(title = "Exit vs. Product",
       x = "Product Related Duration",
       y = "Exit Rates") +
  scale_y_continuous(limits = c(0, NA)) +
  theme_minimal()

# Plot 5: Informational Duration vs Exit Rates
plot5 <- ggplot(df_info_duration |> filter(Revenue == TRUE, VisitorType == "New_Visitor"), 
                aes(x = Informational_Duration, y = ExitRates)) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE) +
  labs(title = "Exit vs. Informational",
       x = "Informational Duration",
       y = "Exit Rates") +
  scale_y_continuous(limits = c(0, NA)) +
  theme_minimal()

# Plot 6: Administrative Duration vs Exit Rates
plot6 <- ggplot(df_admin_duration |> filter(Revenue == TRUE, VisitorType == "New_Visitor"), 
                aes(x = Administrative_Duration, y = ExitRates)) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE) +
  labs(title = "Exit vs. Admin",
       x = "Administrative Duration",
       y = "Exit Rates") +
  scale_y_continuous(limits = c(0, NA)) +
  theme_minimal()

# Combine plots into a grid
Exit_vs_bounce3 <- grid.arrange(plot1, plot2, plot3, plot4, plot5, plot6,
                                ncol = 3,
                                top = "Rates vs. Duration (All Positive Cases, New Visitors Only)"
)
## Warning: Removed 15 rows containing missing values (`geom_smooth()`).
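The warning above most likely comes from combining a quadratic fit with scale_y_continuous(limits = c(0, NA)). ggplot2 replaces any value outside the scale limits with NA, and this also applies to the fitted curve that geom_smooth() evaluates at 80 evenly spaced points (its default n) across the x range. Any stretch where the fitted parabola dips below zero is therefore dropped, even though no observed rate is negative. The base-R sketch below reproduces this effect on hypothetical toy data, not the session data. If the only goal is to start the axis at zero, zooming with coord_cartesian() keeps those fitted values instead of removing them.

```r
# Hypothetical toy data (NOT the shopping-session data): a rate that drops
# quickly and then stays at zero as duration grows.
duration <- c(0, 1, 2, 3, 4)
rate     <- c(1, 0, 0, 0, 0)

# The same model geom_smooth() fits here: a quadratic via lm(y ~ poly(x, 2))
fit <- lm(rate ~ poly(duration, 2))

# geom_smooth() evaluates the fit on an evenly spaced grid (default n = 80)
grid_x <- seq(min(duration), max(duration), length.out = 80)
pred   <- predict(fit, newdata = data.frame(duration = grid_x))

# Every observed rate is >= 0, but part of the fitted curve is not;
# scale_y_continuous(limits = c(0, NA)) would turn those points into NA.
cat("observed minimum:", min(rate), "\n")
cat("fitted points below 0:", sum(pred < 0), "of", length(pred), "\n")
# For this toy data, 41 of the 80 grid points fall below zero.
```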

# grid.arrange() draws the combined figure as soon as it runs. Calling print()
# on the stored result only lists its grobs (a "TableGrob" summary), so use
# grid::grid.draw(Exit_vs_bounce3) if the figure needs to be redrawn later.
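The six panel definitions above differ only in the data frame, the x and y columns, and the labels. The sketch below shows one way to move the shared filter and layers into a single helper. It assumes ggplot2 3.0 or later (for the .data pronoun) and dplyr. The function rate_vs_duration() and the toy data frame are hypothetical names introduced for illustration and are not part of the original analysis. The layers are kept exactly as in the original, including the scale limits.

```r
library(ggplot2)  # ggplot(), aes(), geom_smooth(), .data pronoun
library(dplyr)    # filter()

# Hypothetical helper (not from the original analysis): builds one panel.
# x_col and y_col are column names passed as strings.
rate_vs_duration <- function(df, x_col, y_col, title, x_label, y_label) {
  df |>
    # Same subset as the original panels: purchases by new visitors only
    filter(Revenue == TRUE, VisitorType == "New_Visitor") |>
    ggplot(aes(x = .data[[x_col]], y = .data[[y_col]])) +
    # Inside geom_smooth(), x and y in the formula refer to the mapped aesthetics
    geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE) +
    labs(title = title, x = x_label, y = y_label) +
    scale_y_continuous(limits = c(0, NA)) +
    theme_minimal()
}

# Hypothetical toy data with the columns the helper expects
toy <- data.frame(
  Revenue = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  VisitorType = c(rep("New_Visitor", 4), "Returning_Visitor"),
  ProductRelated_Duration = c(10, 50, 120, 300, 40),
  BounceRates = c(0.05, 0.03, 0.02, 0.01, 0.20)
)

p <- rate_vs_duration(toy, "ProductRelated_Duration", "BounceRates",
                      "Bounce vs. Product", "Product Related Duration", "Bounce Rates")
```

With the original objects, plot1 would become rate_vs_duration(df_product_duration, "ProductRelated_Duration", "BounceRates", "Bounce vs. Product", "Product Related Duration", "Bounce Rates"), and the other five panels would follow the same pattern. Passing column names as strings through .data keeps the helper simple and avoids the need for {{ }} tidy-evaluation syntax.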

Final Hypothesis: New customers who spend more time on the site are more engaged and find what they need, which leads to lower bounce and exit rates. This could reflect more relevant content or products, better site usability, or targeted marketing efforts. The curves above are quadratic trend lines fitted only to purchasing sessions from new visitors, so they can suggest this relationship but cannot establish its cause. The general customer base, which includes both new and returning visitors, may show different engagement patterns depending on how familiar each visitor is with the site and what they came to do.
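One way to put a number on this hypothesis is to compute, for each visitor type, a rank (Spearman) correlation between time on page and bounce rate among purchasing sessions. Under the hypothesis, new visitors should show a clearly negative value. This is a swapped-in check, not the method used above. The sketch uses base R's split() and cor(method = "spearman") on a small hypothetical data frame whose values are invented purely for readability and say nothing about the actual results. A dplyr group_by(VisitorType) |> summarise(...) pipeline would be an equivalent way to write the same computation.

```r
# Hypothetical toy data (invented values, NOT results from the session data)
sessions <- data.frame(
  VisitorType = rep(c("New_Visitor", "Returning_Visitor"), each = 5),
  ProductRelated_Duration = c(20, 60, 150, 400, 900,  30, 80, 200, 500, 1000),
  BounceRates = c(0.08, 0.05, 0.03, 0.01, 0.00,  0.02, 0.04, 0.01, 0.03, 0.02)
)

# Spearman (rank) correlation between duration and bounce rate, per visitor type;
# ranks make the measure insensitive to the heavy skew typical of durations.
rho_by_type <- vapply(
  split(sessions, sessions$VisitorType),
  function(d) cor(d$ProductRelated_Duration, d$BounceRates, method = "spearman"),
  numeric(1)
)
print(rho_by_type)
```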