According to the data, all 10 cities with the highest happiness ratings are also ranked in the top 10 healthiest cities in the world.
Creating a vector named “latitude” (one value per city, in the same order as the cities in “city_ranks10”)
latitude <- c(52.353218, -33.8696, 48.20254, 59.332787, 55.567576, 60.169807, 33.58325, 52.501408, 41.3875, 49.26038)
Creating a vector named “longitude” (same city order as “latitude”)
longitude <- c(5.0027695, 151.20695, 16.3688, 18.064487, 12.569023, 24.938128, 130.3836, 13.4023285, 2.16835, -123.11336)
Adding column named “latitude”
city_ranks10['latitude'] <- latitude
Adding column named “longitude”
city_ranks10['longitude'] <- longitude
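Assigning the coordinate vectors by position only works if the rows of “city_ranks10” are in exactly the same order as the vectors. A minimal sketch of an order-independent alternative, assuming “city_ranks10” has a “City” column whose values match the names below: keep each city's coordinates in a small lookup table and join on the city name with base R's merge(). The helper name add_coordinates() is hypothetical, only four cities are shown, and the pairing of names to coordinates is my reading of the values in the vectors above rather than something stated in the data.

```r
# Lookup table pairing each city with its coordinates
# (values copied from the latitude/longitude vectors above; only 4 of the 10 cities shown).
city_coords <- data.frame(
  City = c("Amsterdam", "Vienna", "Helsinki", "Berlin"),
  latitude = c(52.353218, 48.20254, 60.169807, 52.501408),
  longitude = c(5.0027695, 16.3688, 24.938128, 13.4023285)
)

# Hypothetical helper: join coordinates onto any data frame with a matching City column.
# all.x = TRUE keeps cities without coordinates (their latitude/longitude become NA).
add_coordinates <- function(df, coords) {
  merge(df, coords, by = "City", all.x = TRUE, sort = FALSE)
}

# Example with rows deliberately out of order.
example <- data.frame(City = c("Berlin", "Amsterdam"))
with_coords <- add_coordinates(example, city_coords)
with_coords
```

Because the join matches on the city name, sorting or filtering the data frame beforehand cannot shift a coordinate onto the wrong city.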
View “city_ranks10”
view(city_ranks10)
Assigning world map polygons from map_data() to “world” (map_data() needs the “maps” package installed)
world <- map_data("world")
Introducing a new dataset
setwd("~/Data 110 Folder")
Book2 <- read.csv("Book2.csv")
View new dataset with longitude and latitude
view(Book2)
Remove “Rank” from “city_ranks10”
city_ranks10 <- subset(city_ranks10, select = -Rank)
Combine both datasets
City_15 <- rbind(Book2, city_ranks10)
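When given two data frames, rbind() matches columns by name, so it stops with an error if “Book2” and “city_ranks10” do not have exactly the same column names. A small sketch of a diagnostic, using a hypothetical helper column_mismatch() and made-up toy data frames, that lists which columns differ before combining:

```r
# Hypothetical helper: report columns that appear in only one of two data frames.
column_mismatch <- function(a, b) {
  list(
    only_in_a = setdiff(names(a), names(b)),
    only_in_b = setdiff(names(b), names(a))
  )
}

# Toy example: 'b' still has a Rank column that 'a' lacks.
a <- data.frame(City = "X", latitude = 1, longitude = 2)
b <- data.frame(City = "Y", Rank = 1, latitude = 3, longitude = 4)
column_mismatch(a, b)

# Once the names agree, rbind() aligns columns by name, even if their order differs.
b_fixed <- b[, c("longitude", "City", "latitude")]
combined <- rbind(a, b_fixed)
combined
```

The combined result takes its column order from the first data frame, so the order of columns in the second one does not matter.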
Create a world map of top 10 cities
ggplot() +
  geom_map(
    data = world, map = world,
    aes(map_id = region),
    color = "white", fill = "lightgray", size = 0.1
  ) +
  ggtitle("Where the Healthiest Cities Are Located") +
  geom_point(
    data = city_ranks10,
    aes(longitude, latitude, color = City),
    alpha = 0.7
  ) +
  # geom_map() takes its coordinates from "map", so x/y are not mapped in aes();
  # expand_limits() makes the axes cover the whole world map instead.
  expand_limits(x = world$long, y = world$lat)

View “City_15”
view(City_15)
Add new column “hours_worked”
City_15['hours_worked'] <- c(2137, 1965, 1779, 1779, 1718, 1434, 1712, 1501, 1452, 1380, 1540, 1644, 1386, 1686, 1670)
World map of hours worked for top 10 and lowest 5 cities
ggplot() +
  geom_map(
    data = world, map = world,
    aes(map_id = region),
    color = "white", fill = "lightgray", size = 0.1
  ) +
  ggtitle("Does Work Affect Health?") +
  geom_point(
    data = City_15,
    aes(longitude, latitude, color = City, size = hours_worked),
    alpha = 0.9
  ) +
  # Map coordinates come from "map"; expand_limits() sets the axis range to cover it.
  expand_limits(x = world$long, y = world$lat)

The dataset is from Kaggle.com, and its topic is healthy lifestyle cities of 2021 around the world. It compares healthy lifestyle metrics for the top 44 cities. The observable measures include: Sunshine hours (City), Cost of a bottle of water (City), Obesity levels (Country), Life expectancy in years (Country), Pollution index score (City), Annual avg. hours worked, Happiness levels (Country), Outdoor activities (in minutes), Number of take-out places (City), and Cost of a monthly gym membership (City).
I cleaned the data in several ways. I converted the water bottle and monthly gym membership costs from euros to US dollars so that readers used to US prices could get a better feel for each amount. I also categorized the data by assigning each “happiness level” rating to a low, medium, or high category. I narrowed the data to the first 10 and last 5 cities so that viewers could get a clearer picture of what is going on in the dataset. I merged and removed columns to make new datasets; for example, for “city_ranks10” I took the city and rank of the top 10 healthiest cities and then added latitude and longitude columns for each city. I had problems with the original column names and kept getting error messages whenever I used them. To fix this, I used clean_names() from the “janitor” package, which returns names containing only lowercase letters, with “_” as a separator. It handles special characters and spaces, appends numbers to duplicated names, and converts “%” to “percent” to retain meaning. More broadly, the main janitor functions format data frame column names, isolate partially duplicate records, and provide quick tabulations (frequency tables and crosstabs).
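The currency conversion and the happiness categorization described above can each be done in a few lines of base R. This is a sketch, not the exact code used in this analysis: the helpers to_usd() and bin_happiness() are hypothetical, the prices are assumed to be stored as text with a currency label (all non-numeric characters are stripped), and both the exchange rate and the happiness cut points are placeholders rather than the values behind this report.

```r
# Hypothetical helper: strip currency symbols/labels and convert to US dollars.
# 'rate' is USD per unit of the source currency; the value used below is a placeholder.
# Assumes '.' is the decimal mark (e.g. "1.50", not "1,50").
to_usd <- function(price_text, rate) {
  amount <- as.numeric(gsub("[^0-9.]", "", price_text))
  round(amount * rate, 2)
}

# Hypothetical helper: bin numeric happiness scores into three labelled levels.
# Cut points are placeholders; intervals are right-closed, so (6, 7] is "medium".
bin_happiness <- function(score, cuts = c(6, 7)) {
  cut(score,
      breaks = c(-Inf, cuts, Inf),
      labels = c("low", "medium", "high"))
}

to_usd(c("1.50 EUR", "30.00 EUR"), rate = 1.2)
bin_happiness(c(5.5, 6.5, 7.4))
```

Using cut() returns a factor whose levels stay in low, medium, high order, so plots show the categories in that order rather than alphabetically.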
I chose this dataset because I wanted data that would let me evaluate and form hypotheses about the causes and characteristics of a healthy city. It includes both quantitative and categorical variables, which let me come up with some distinctive results. Quantitative variables can be summarized and compared numerically, for example by ranking or averaging them, which makes patterns across many cities quick to spot. Categorical variables cannot be analyzed with the same statistics, but they sort cities into clear, concrete groups that are easy to count and compare. The first visualization is a bar graph of the top ten cities and their ranks. The rank combines many different ratings of what a healthy city has more or less of. For example, the highest-ranked cities have the lowest cost of water, the most outdoor activity, the fewest hours worked, and the lowest obesity percentage. The second visualization shows the level of happiness for each of those top ten cities. I assigned each happiness rating to a low, medium, or high level of happiness based on a scale I created. Every city in the top ten had higher happiness levels than the rest of the cities. I used facet wrap for this graph so that viewers could see each city's rating separately. My third visual is a map. The interesting aspect of this graph is that it shows where the top ten healthiest cities are: most of them are in Europe, and only one (Vancouver) is in North America. This may be attributable to strong health care systems and attention to the environment (good air and water quality) in many European countries.
These cities also have high levels of education, low unemployment, and high general happiness. Helsinki, Finland, for instance, has a fairly low number of cars, low pollution levels, and high life expectancy. Helsinki also has a great work-life balance (the city ranks 16th in the world) and a high number of annual vacation days. Some features of Vienna that aren't displayed in the data are its low crime rate, good public services, and high doctor-to-citizen ratio. Berlin has a strong business ecosystem, including plenty of jobs and the availability of essentials like housing, food, and transport. Openness in terms of gender equality and tolerance is also very high in the German capital. Amsterdam offers some of the most annual vacation days in the world and is known for its tolerance and diversity. It has all of the advantages of a big city while remaining very compact. It is bike-friendly and has the second-highest number of electric-car charging points in the world, which contributes to its low pollution.
My last visualization includes the top 10 and bottom 5 ranked cities. I think it is one of the more comprehensive visuals I made, since it shows all 15 cities together with their average annual hours worked. The longest working hours were in the Western Hemisphere, which could reflect differences in work culture and in how much importance is placed on work-life balance. For the last two visuals, I looked up the coordinates (longitude and latitude) of each city to make the maps and added longitude and latitude columns to the new datasets I made. One thing I could not get to work was a line graph, or some other type of graph, that displays every variable for each city in the data frame without using different shapes or sizes.
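One possible way to show every variable for each city without varying point shapes or sizes is to reshape the data into long format (one row per city and measure) and draw one small bar chart per measure with facet_wrap(). A minimal sketch, assuming a data frame with a “City” column plus numeric measure columns; the helper to_long(), the column names, and the values below are made up for illustration, and the plot is only drawn if ggplot2 is installed.

```r
# Hypothetical helper: reshape a wide data frame (one column per measure)
# into long format with one row per City x measure.
to_long <- function(df, measures) {
  stacked <- stack(df[measures])  # columns: values, ind (measure name)
  data.frame(
    City = rep(df$City, times = length(measures)),
    value = stacked$values,
    measure = stacked$ind
  )
}

# Made-up example data; replace with the real cleaned data frame.
wide <- data.frame(
  City = c("City A", "City B"),
  hours_worked = c(1400, 1700),
  obesity = c(20, 25)
)
long <- to_long(wide, c("hours_worked", "obesity"))

# One panel per measure, each with its own y-axis scale.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = City, y = value)) +
    geom_col() +
    facet_wrap(~ measure, scales = "free_y")
  print(p)
}
```

Setting scales = "free_y" matters here because the measures sit on very different scales (annual hours in the thousands next to percentages in the tens).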
I think it would be beneficial to include the top 10 and bottom 5 cities and see how they compare regarding the observable measures.
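A simple first step toward that comparison is to label each city's group and average every measure within each group. This sketch uses base R's aggregate() through a hypothetical helper compare_groups(); it assumes a “group” column (“top 10” or “bottom 5”) that does not yet exist in “City_15”, and the numbers are made up only to show the shape of the output.

```r
# Hypothetical helper: average each numeric measure within each group.
compare_groups <- function(df, measures, group_col = "group") {
  aggregate(df[measures], by = list(group = df[[group_col]]), FUN = mean)
}

# Made-up example data with a hypothetical 'group' column.
cities <- data.frame(
  City = c("A", "B", "C", "D"),
  group = c("top 10", "top 10", "bottom 5", "bottom 5"),
  hours_worked = c(1400, 1600, 2000, 2100)
)
summary_by_group <- compare_groups(cities, "hours_worked")
summary_by_group
```

Passing several column names in “measures” produces one averaged column per measure, which gives a compact side-by-side table of the two groups.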