R Markdown
Question
library(ggplot2)
# Read the CSV file (adjust the path as needed)
data <- read.csv("dataa.csv")
# Renaming columns for ease of use
colnames(data) <- c("Country", "Internet", "Facebook")
# Display the first few rows of the dataset
head(data)
## Country Internet Facebook
## 1 Argentina 55.80% 48.8%
## 2 Australia 82.40% 51.5%
## 3 Belgium 82% 44.2%
## 4 Brazil 49.90% 29.5%
## 5 Canada 86.80% 51.9%
## 6 Chile 61.40% 55.5%
# Strip the percent sign and convert the Internet and Facebook columns to numeric
data$Internet <- as.numeric(gsub("%", "", data$Internet))
data$Facebook <- as.numeric(gsub("%", "", data$Facebook))
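as.numeric() returns NA (with a warning) for any entry that is still not a number after the percent sign is removed, so it can be worth checking for such entries before computing summaries. A minimal sketch on a small made-up vector rather than the real file; `raw`, `parsed` and `n_failed` are hypothetical names introduced only for illustration:

```r
# Hypothetical example values, not taken from dataa.csv
raw <- c("55.80%", "82%", "n/a")

# Strip the literal percent sign, then convert; warnings are silenced here
# because the NA check below reports the failures explicitly
parsed <- suppressWarnings(as.numeric(gsub("%", "", raw, fixed = TRUE)))

# Count and show the entries that failed to convert
n_failed <- sum(is.na(parsed))
raw[is.na(parsed)]
```

If any NA values were present in the real columns, sd() would return NA unless called with na.rm = TRUE, and summary() would report an extra NA's count.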
# Graphical Summary: Histogram for Internet Penetration
ggplot(data, aes(x = Internet)) +
  geom_histogram(binwidth = 5, fill = "blue", color = "black", alpha = 0.7) +
  labs(title = "Histogram of Internet Penetration",
       x = "Internet Penetration (%)",
       y = "Frequency")

# Graphical Summary: Histogram for Facebook Penetration
ggplot(data, aes(x = Facebook)) +
  geom_histogram(binwidth = 5, fill = "green", color = "black", alpha = 0.7) +
  labs(title = "Histogram of Facebook Penetration",
       x = "Facebook Penetration (%)",
       y = "Frequency")

# Numerical Summaries: Internet Penetration
internet_summary <- summary(data$Internet)
internet_sd <- sd(data$Internet)
# Numerical Summaries: Facebook Penetration
facebook_summary <- summary(data$Facebook)
facebook_sd <- sd(data$Facebook)
# Display summaries
print(internet_summary)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 12.60 43.65 56.90 59.16 81.25 94.00
print(internet_sd)
## [1] 22.38135
print(facebook_summary)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.10 24.38 34.45 33.82 47.08 56.40
print(facebook_sd)
## [1] 15.92896
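The two variables sit on different scales (means of about 59 and 34), so their standard deviations are not directly comparable. A rough sketch of two spread measures computed from the values printed above, the interquartile range and the coefficient of variation; the variable names are introduced here for illustration only:

```r
# Values copied from the printed summaries above
internet_q1 <- 43.65; internet_q3 <- 81.25; internet_mean <- 59.16
facebook_q1 <- 24.38; facebook_q3 <- 47.08; facebook_mean <- 33.82
internet_sd_printed <- 22.38135
facebook_sd_printed <- 15.92896

# Interquartile range: width of the middle 50% of countries
internet_iqr <- internet_q3 - internet_q1
facebook_iqr <- facebook_q3 - facebook_q1

# Coefficient of variation: standard deviation relative to the mean
internet_cv <- internet_sd_printed / internet_mean
facebook_cv <- facebook_sd_printed / facebook_mean

round(c(internet_iqr, facebook_iqr), 2)
round(c(internet_cv, facebook_cv), 2)
```

On the real data frame, IQR(data$Internet) and sd(data$Internet) / mean(data$Internet) would give the same quantities without copying numbers by hand.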
# Scatterplot to display the relationship between Internet and Facebook penetration
ggplot(data, aes(x = Internet, y = Facebook)) +
  geom_point(color = "purple") +
  labs(title = "Scatterplot of Internet vs. Facebook Penetration",
       x = "Internet Penetration (%)",
       y = "Facebook Penetration (%)") +
  theme_minimal()

# Conclusion: countries with higher Internet penetration tend to have higher Facebook penetration.
# Calculate the correlation coefficient
correlation_coefficient <- cor(data$Internet, data$Facebook)
print(correlation_coefficient)
## [1] 0.6108464
r = 0.6108: interpreting the result
The correlation coefficient of 0.6108 is positive and moderate in size, indicating a moderate positive linear relationship between Internet penetration and Facebook penetration in the dataset.
In general, countries with a higher Internet penetration rate also tend to have a higher Facebook penetration rate.
However, the coefficient is well short of 1, so the linear relationship is not strong. Other factors beyond Internet penetration may influence Facebook penetration, or the relationship may not be perfectly linear.
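One way to make "moderate" concrete is the coefficient of determination, r squared, which in a simple linear setting gives the share of the variance in one variable that a straight-line fit on the other accounts for. A small sketch using the coefficient printed earlier; `r` and `r_squared` are names introduced here:

```r
r <- 0.6108464        # correlation coefficient reported above
r_squared <- r^2      # proportion of variance accounted for by a linear fit
round(r_squared, 4)
## [1] 0.3731
```

Roughly 37% of the variation in Facebook penetration is associated with Internet penetration under a linear fit, leaving about 63% to other factors or to non-linear structure. If a significance test or confidence interval is wanted, cor.test(data$Internet, data$Facebook) from base R's stats package is one option.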