R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:

library(ggplot2)
data <- read.csv("~/TranProject/mydataset.csv")
head(data)
##        City    Country     Continent Population..2024. Population..2023.
## 1     Tokyo      Japan          Asia          37115035          37194105
## 2     Delhi      India          Asia          33807403          32941309
## 3  Shanghai      China          Asia          29867918          29210808
## 4     Dhaka Bangladesh          Asia          23935652          23209616
## 5 Sao Paulo     Brazil South America          22806704          22619736
## 6     Cairo      Egypt        Africa          22623874          22183201
##   Growth.Rate
## 1     -0.0021
## 2      0.0263
## 3      0.0225
## 4      0.0313
## 5      0.0083
## 6      0.0199
# The first graph is a histogram that shows the distribution of population growth rates across cities. It looks fairly symmetric, with no strong skew to the left or right, so growth rates are spread roughly evenly around the center of the distribution.

ggplot(data, aes(x = Growth.Rate)) +
  geom_histogram(binwidth = 0.01, fill = "blue", color = "black") +
  labs(title = "Distribution of Population Growth Rate", x = "Growth Rate", y = "Frequency") +
  theme_minimal()
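
A quick numeric companion to the histogram is to compare the mean and the median: when they sit close together, that is consistent with a roughly symmetric distribution. The sketch below uses only the six growth rates printed by head(data), so it illustrates the check rather than describing the full dataset; the names growth_rate and center are introduced here for illustration.

```r
# Growth rates for the six cities printed by head(data); not the full dataset.
growth_rate <- c(-0.0021, 0.0263, 0.0225, 0.0313, 0.0083, 0.0199)

# A mean well below the median hints at a left tail; well above, a right tail.
center <- c(mean = mean(growth_rate), median = median(growth_rate))
center

# Five-number summary plus the mean, to read alongside the histogram.
summary(growth_rate)
```

On the full data, the same check would be run on data$Growth.Rate.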

ggplot(data, aes(x = Population..2023., y = Population..2024.)) +
  geom_point(color = "red") +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "Population in 2023 vs Population in 2024", x = "Population in 2023", y = "Population in 2024") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

# The second graph is a scatter plot of each city's 2024 population against its 2023 population, with a fitted linear trend line. It helps me see the relationship between the two years: cities with larger populations in 2023 have similarly large populations in 2024, showing consistency over time.
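
The blue line drawn by geom_smooth(method = "lm") is an ordinary least-squares fit of y on x. A minimal sketch of the same fit with lm() is shown below, using only the six rows printed by head(data); the vector names pop_2023 and pop_2024 are introduced here for illustration. A slope near 1 would mean that 2024 populations track 2023 populations almost one for one.

```r
# Populations for the six cities printed by head(data); not the full dataset.
pop_2023 <- c(37194105, 32941309, 29210808, 23209616, 22619736, 22183201)
pop_2024 <- c(37115035, 33807403, 29867918, 23935652, 22806704, 22623874)

# Same model that geom_smooth(method = "lm") fits: y ~ x.
fit <- lm(pop_2024 ~ pop_2023)

# Intercept and slope of the fitted line.
coef(fit)
slope <- coef(fit)[["pop_2023"]]
slope
```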

# Calculate mean growth rate
mean_growth_rate <- mean(data$Growth.Rate)
mean_growth_rate
## [1] 0.02005131
# I calculated the mean growth rate to get an average of how populations are changing across all cities. The mean is about 0.020, or roughly 2%, so on average the cities in the dataset are growing. This is an unweighted mean: each city counts equally, regardless of its size.
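
To make the growth rate easier to interpret, it helps to check what the column measures. The sketch below assumes that Growth.Rate is the relative change from 2023 to 2024, rounded to four decimals, and tests that assumption on the six rows printed by head(data). The data frame cities and the other variable names are introduced here for illustration, and the mean it prints covers only those six cities.

```r
# First six rows as printed by head(data); the full CSV is not reproduced here.
cities <- data.frame(
  City = c("Tokyo", "Delhi", "Shanghai", "Dhaka", "Sao Paulo", "Cairo"),
  Population..2024. = c(37115035, 33807403, 29867918, 23935652, 22806704, 22623874),
  Population..2023. = c(37194105, 32941309, 29210808, 23209616, 22619736, 22183201),
  Growth.Rate = c(-0.0021, 0.0263, 0.0225, 0.0313, 0.0083, 0.0199)
)

# Assumption: Growth.Rate = (2024 population - 2023 population) / 2023 population.
computed <- with(cities, (Population..2024. - Population..2023.) / Population..2023.)

# TRUE if every computed rate agrees with the column once rounded to 4 decimals.
rate_matches <- all(abs(computed - cities$Growth.Rate) < 5e-5)
rate_matches

# Unweighted mean of the six rates (each city counts equally).
mean_rate_six <- mean(cities$Growth.Rate)
mean_rate_six
```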

# Calculate correlation between 2023 and 2024 population.
correlation <- cor(data$Population..2023., data$Population..2024.)
correlation
## [1] 0.999896
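# I ran a correlation analysis between the population sizes in 2023 and 2024. The coefficient of about 0.9999 indicates an almost perfect positive linear relationship: cities that were large in 2023 are still large in 2024.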
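
cor() returns only the point estimate. If a confidence interval or a test is wanted as well, cor.test() from base R provides both. The sketch below runs it on the six rows printed by head(data), so the numbers will differ from the full-dataset value above; the vector names are introduced here for illustration.

```r
# Populations for the six cities printed by head(data); not the full dataset.
pop_2023 <- c(37194105, 32941309, 29210808, 23209616, 22619736, 22183201)
pop_2024 <- c(37115035, 33807403, 29867918, 23935652, 22806704, 22623874)

# Pearson correlation (the default method for both cor and cor.test).
r <- cor(pop_2023, pop_2024)
r

# Same estimate, plus a 95% confidence interval and a test of zero correlation.
ct <- cor.test(pop_2023, pop_2024)
ct$estimate
ct$conf.int
```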


# Divide dataset into two groups: positive growth and zero-or-negative growth
positive_growth <- subset(data, Growth.Rate > 0)
negative_growth <- subset(data, Growth.Rate <= 0)
result <- t.test(positive_growth$Population..2024., negative_growth$Population..2024.)
result
## 
##  Welch Two Sample t-test
## 
## data:  positive_growth$Population..2024. and negative_growth$Population..2024.
## t = -0.92518, df = 34.851, p-value = 0.3612
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3394842  1269499
## sample estimates:
## mean of x mean of y 
##   2607893   3670565
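# I performed a Welch two-sample t-test to compare the 2024 populations of cities with positive growth rates against those with zero or negative growth rates. The p-value of 0.3612 is well above 0.05, and the 95% confidence interval (-3,394,842 to 1,269,499) includes zero, so the test does not show a significant difference in population size between the two groups. In this sample the positive-growth cities are actually smaller on average (about 2.61 million versus 3.67 million), but a difference of that size could easily be due to chance.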
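
The decision can be read directly off the numbers printed by t.test. The sketch below re-derives the two-sided p-value from the reported t statistic and degrees of freedom, then checks whether the confidence interval contains zero. The 0.05 threshold is an assumed conventional significance level, and the variable names are introduced here for illustration.

```r
# Values copied from the t.test output above.
t_stat <- -0.92518
dof <- 34.851
ci <- c(-3394842, 1269499)
mean_positive <- 2607893   # mean of x: cities with positive growth
mean_other <- 3670565      # mean of y: cities with zero or negative growth

alpha <- 0.05  # assumed significance level

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), dof)
p_value

# TRUE only if the difference is significant at the chosen level.
significant <- p_value < alpha
significant

# A 95% interval that contains zero leads to the same conclusion.
ci_contains_zero <- ci[1] < 0 && ci[2] > 0
ci_contains_zero

# Observed difference in means (x minus y); it sits at the interval's midpoint.
mean_diff <- mean_positive - mean_other
mean_diff
```

With the result object still in memory, the same quantities are available as result$p.value, result$conf.int and result$estimate.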

Note that adding the echo = FALSE parameter to a code chunk prevents printing of the R code that generated its output. The chunks in this document do not use it, so their code is shown alongside the results.