This report analyzes the dataset Dog Breeds Around the World. It explores breed characteristics such as size, weight, grooming needs, exercise requirements, lifespan, shedding, health risk, and ratings for friendliness, intelligence, and training difficulty. The goal is to summarize how these traits vary across breeds so that dog owners and enthusiasts can make informed decisions based on breed-specific needs and behaviors.
# Load the packages used below: dplyr (mutate, select, group_by, summarise, %>%),
# tidyr (drop_na) and ggplot2 (plots)
library(dplyr)
library(tidyr)
library(ggplot2)
# Clean and transform the dataset (dog_data is the raw data frame read in earlier)
dog_data_clean <- dog_data %>%
  # Convert relevant columns to appropriate data types.
  # Values outside the listed `levels`, or non-numeric text passed to as.numeric(),
  # become NA and are removed by drop_na() below.
  mutate(
    Name = as.factor(Name),
    Origin = as.factor(Origin),
    Type = as.factor(Type),
    Unique.Feature = as.factor(Unique.Feature),
    Size = factor(Size, levels = c("Small", "Medium", "Large")),
    Grooming.Needs = factor(Grooming.Needs, levels = c("Low", "Moderate", "High", "Very High")),
    Exercise.Requirements..hrs.day. = as.numeric(Exercise.Requirements..hrs.day.),
    Good.with.Children = as.factor(Good.with.Children),
    Health.Issues.Risk = factor(Health.Issues.Risk, levels = c("Low", "Moderate", "High")),
    Average.Weight..kg. = as.numeric(Average.Weight..kg.),
    Training.Difficulty..1.10. = as.numeric(Training.Difficulty..1.10.),
    Friendly.Rating..1.10. = as.numeric(Friendly.Rating..1.10.),
    Intelligence.Rating..1.10. = as.numeric(Intelligence.Rating..1.10.),
    Shedding.Level = factor(Shedding.Level, levels = c("Low", "Moderate", "High", "Very High")),
    Life.Span = as.numeric(Life.Span)
  ) %>%
  drop_na()  # remove rows with missing values
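Both conversions in this pipeline fail quietly. factor(x, levels = ...) maps any value not listed in levels to NA, as.numeric() maps non-numeric text to NA (with a warning), and drop_na() then discards the whole row. The report does not show whether the raw file contains such values, so the sketch below only illustrates the mechanism on made-up rows. The breed names and the "Giant" and "10-12" values are hypothetical. To keep the sketch free of extra packages, it uses base R's complete.cases() in place of drop_na(). On the real data, comparing nrow(dog_data) with nrow(dog_data_clean) shows how many breeds the cleaning step removed.

```r
# Made-up rows for illustration; "Giant" is deliberately outside the declared levels
raw_rows <- data.frame(
  Name = c("Breed A", "Breed B", "Breed C"),
  Size = c("Small", "Giant", "Large"),
  stringsAsFactors = FALSE
)

# factor() silently maps any value not listed in `levels` to NA
raw_rows$Size_f <- factor(raw_rows$Size, levels = c("Small", "Medium", "Large"))

# Values that were present in the raw column but were lost by the conversion
lost <- raw_rows$Size[is.na(raw_rows$Size_f) & !is.na(raw_rows$Size)]
print(lost)  # [1] "Giant"

# Keeping only complete rows (what drop_na() does) removes that breed entirely
kept <- raw_rows[complete.cases(raw_rows), ]
print(nrow(kept))  # [1] 2

# as.numeric() behaves the same way for non-numeric text such as a range
life <- suppressWarnings(as.numeric(c("12", "10-12")))
print(life)  # [1] 12 NA
```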
# Display the first few rows of the cleaned data
head(dog_data_clean)
##               Name      Origin    Type         Unique.Feature
## 1    Affenpinscher     Germany     Toy       Monkey-like face
## 2     Afghan Hound Afghanistan   Hound        Long silky coat
## 3 Airedale Terrier     England Terrier    Largest of terriers
## 4            Akita       Japan Working         Strong loyalty
## 5 Alaskan Malamute  Alaska USA Working Strong pulling ability
## 6 American Bulldog         USA Working         Muscular build
##   Friendly.Rating..1.10. Life.Span   Size Grooming.Needs
## 1                      7        14  Small           High
## 2                      5        13  Large      Very High
## 3                      8        12 Medium           High
## 4                      6        11  Large       Moderate
## 5                      7        11  Large           High
## 6                      8        11  Large            Low
##   Exercise.Requirements..hrs.day. Good.with.Children Intelligence.Rating..1.10.
## 1                             1.5                Yes                          8
## 2                             2.0                 No                          4
## 3                             2.0                Yes                          7
## 4                             2.0      With Training                          7
## 5                             3.0                Yes                          6
## 6                             2.0                Yes                          6
##   Shedding.Level Health.Issues.Risk Average.Weight..kg.
## 1       Moderate                Low                   4
## 2           High           Moderate                  25
## 3       Moderate                Low                  21
## 4           High               High                  45
## 5      Very High           Moderate                  36
## 6       Moderate           Moderate                  42
##   Training.Difficulty..1.10.
## 1                          6
## 2                          8
## 3                          6
## 4                          9
## 5                          8
## 6                          7
# Summary statistics for every column of the cleaned data
summary(dog_data_clean)
##          Name            Origin          Type
##  Affenpinscher   :  1  England :24  Sporting    :25
##  Afghan Hound    :  1  Germany :16  Hound       :24
##  Airedale Terrier:  1  France  :11  Working     :20
##  Akita           :  1  Scotland:11  Terrier     :19
##  Alaskan Malamute:  1  USA     : 7  Herding     :18
##  American Bulldog:  1  China   : 6  Non-Sporting:18
##  (Other)         :136  (Other) :67  (Other)     :18
##         Unique.Feature       Friendly.Rating..1.10.    Life.Span      Size
##  Curly coat          :  2  Min.   : 5.000          Min.   : 8.00  Small :44
##  Energetic           :  2  1st Qu.: 7.000          1st Qu.:11.00  Medium:56
##  Pack hunting ability:  2  Median : 7.000          Median :12.00  Large :42
##  Ridge of fur on back:  2  Mean   : 7.444          Mean   :12.08
##  Short legs long body:  2  3rd Qu.: 8.000          3rd Qu.:13.00
##  Tri-colored markings:  2  Max.   :10.000          Max.   :16.00
##  (Other)             :130
##  Grooming.Needs  Exercise.Requirements..hrs.day.  Good.with.Children
##  Low      :31    Min.   :1.000                    No           : 12
##  Moderate :50    1st Qu.:1.500                    With Training: 20
##  High     :46    Median :2.000                    Yes          :110
##  Very High:15    Mean   :1.852
##                  3rd Qu.:2.000
##                  Max.   :3.000
##
##  Intelligence.Rating..1.10.  Shedding.Level  Health.Issues.Risk
##  Min.   : 4.000              Low      :30    Low     :55
##  1st Qu.: 7.000              Moderate :86    Moderate:67
##  Median : 7.000              High     :25    High    :20
##  Mean   : 7.127              Very High: 1
##  3rd Qu.: 8.000
##  Max.   :10.000
##
##  Average.Weight..kg.  Training.Difficulty..1.10.
##  Min.   : 2.00        Min.   :4.00
##  1st Qu.: 8.00        1st Qu.:6.00
##  Median :20.00        Median :7.00
##  Mean   :20.44        Mean   :6.57
##  3rd Qu.:28.75        3rd Qu.:7.00
##  Max.   :70.00        Max.   :9.00
##
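The bar plot that follows orders the Type levels by frequency with reorder(Type, -table(Type)[Type]). This works because indexing a table by a factor uses the factor's integer codes. Those codes line up with the table's entries, since both follow the factor's level order. Each row therefore receives minus the count of its own type, and reorder() sorts the levels by the mean of those scores. The base-R sketch below checks this on a hypothetical six-row factor, not the real data. It also compares the result with the more explicit idiom reorder(Type, Type, FUN = function(x) -length(x)), which should give the same order.

```r
# Toy stand-in for dog_data_clean$Type (hypothetical values, not the real data)
breed_type <- factor(c("Hound", "Toy", "Hound", "Working", "Hound", "Toy"))

# Indexing a table by a factor uses the factor's integer codes; these match
# the table's entries because both follow levels(breed_type).
# as.vector() just drops the table class for a plain integer vector.
per_row_count <- as.vector(table(breed_type)[breed_type])
print(per_row_count)  # [1] 3 2 3 1 3 2

# Same idea as the plot's reorder(Type, -table(Type)[Type]):
# each level is scored by the mean of its negated counts
by_table <- reorder(breed_type, -per_row_count)

# More explicit alternative: score each level by minus its group size
by_length <- reorder(breed_type, breed_type, FUN = function(x) -length(x))

print(levels(by_table))   # levels: Hound, Toy, Working
print(levels(by_length))  # levels: Hound, Toy, Working
```

In both forms the most frequent type becomes the first level. With coord_flip(), the first level is drawn at the bottom of the vertical axis.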
# Bar plot of dog breeds by Type, with Type levels ordered by frequency
ggplot(dog_data_clean, aes(x = reorder(Type, -table(Type)[Type]),
                           fill = Type)) +
  geom_bar() +
  theme_minimal() +
  coord_flip() +
  labs(title = "Distribution of Dog Breeds by Type",
       x = "Type",
       y = "Count")
# Boxplot of weight by dog size
ggplot(dog_data_clean, aes(x = Size, y = Average.Weight..kg., fill = Size)) +
  geom_boxplot() +
  theme_minimal() +
  labs(title = "Boxplot of Dog Weight by Size",
       x = "Size",
       y = "Weight (kg)")
# Scatter plot of life span vs average weight
ggplot(dog_data_clean, aes(x = Average.Weight..kg., y = Life.Span)) +
  geom_point(aes(color = Size), alpha = 0.7) +
  theme_minimal() +
  labs(title = "Life Span vs Average Weight",
       x = "Average Weight (kg)",
       y = "Life Span (years)")
# Histogram of exercise requirements (hours/day)
ggplot(dog_data_clean, aes(x = Exercise.Requirements..hrs.day.)) +
  geom_histogram(binwidth = 0.5, fill = "lightblue", color = "black") +
  theme_minimal() +
  labs(title = "Distribution of Exercise Requirements",
       x = "Exercise Requirements (hrs/day)",
       y = "Frequency")
# Correlation matrix for relevant numerical variables
cor_data <- dog_data_clean %>%
  select(Life.Span, Friendly.Rating..1.10.,
         Training.Difficulty..1.10., Average.Weight..kg.) %>%
  cor()
# Display the correlation matrix
print(cor_data)
##                              Life.Span Friendly.Rating..1.10.
## Life.Span                   1.00000000             0.06745533
## Friendly.Rating..1.10.      0.06745533             1.00000000
## Training.Difficulty..1.10. -0.14081077            -0.79068154
## Average.Weight..kg.        -0.68546553            -0.17073739
##                            Training.Difficulty..1.10. Average.Weight..kg.
## Life.Span                                  -0.1408108          -0.6854655
## Friendly.Rating..1.10.                     -0.7906815          -0.1707374
## Training.Difficulty..1.10.                  1.0000000           0.3173441
## Average.Weight..kg.                         0.3173441           1.0000000
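cor() defaults to Pearson's correlation, r = cov(x, y) / (sd(x) * sd(y)). Each off-diagonal entry in the matrix above therefore measures linear association on a -1 to 1 scale, and the matrix is symmetric with ones on the diagonal. The strongest entries are Friendly.Rating..1.10. vs Training.Difficulty..1.10. (about -0.79) and Average.Weight..kg. vs Life.Span (about -0.69). The sketch below checks the formula against cor() on five made-up weight and life-span pairs. These numbers are hypothetical and are not taken from the dataset.

```r
# Hypothetical weights (kg) and life spans (years) for five breeds
weight <- c(4, 10, 20, 35, 45)
life_span <- c(15, 14, 12, 11, 10)

# Pearson's r: covariance divided by the product of the standard deviations
r_manual <- cov(weight, life_span) / (sd(weight) * sd(life_span))

# cor() uses method = "pearson" by default, so the two values should agree
r_builtin <- cor(weight, life_span)

print(c(manual = r_manual, builtin = r_builtin))
```

cov() and sd() both use the n - 1 denominator, so it cancels in the ratio.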
# ANOVA for weight across grooming needs
anova_result <- aov(Average.Weight..kg. ~ Grooming.Needs, data = dog_data_clean)
summary(anova_result)
##                 Df Sum Sq Mean Sq F value Pr(>F)
## Grooming.Needs   3    226   75.41   0.411  0.745
## Residuals      138  25320  183.48
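The F value in this table is the ratio of the between-group mean square to the within-group mean square: F = (SS_between / df_between) / (SS_within / df_within). There are four grooming levels and 142 breeds, so df_between = 4 - 1 = 3 and df_within = 142 - 4 = 138. The reported mean squares give 75.41 / 183.48 ≈ 0.411. With p = 0.745, the test gives no evidence at conventional significance levels that average weight differs across grooming-need levels. The sketch below recomputes F by hand for a made-up two-group example and compares it with aov(). The weights and group labels are illustrative only.

```r
# Hypothetical weights (kg) for two grooming-need groups of three breeds each
weight <- c(2, 4, 6, 5, 7, 9)
grooming <- factor(rep(c("Low", "High"), each = 3), levels = c("Low", "High"))

grand_mean <- mean(weight)
group_means <- tapply(weight, grooming, mean)
group_sizes <- tapply(weight, grooming, length)

# Between-group sum of squares: size-weighted squared deviations of group means
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
# Within-group (residual) sum of squares: deviations from each row's group mean,
# where ave() returns the group mean aligned to every row
ss_within <- sum((weight - ave(weight, grooming))^2)

df_between <- nlevels(grooming) - 1
df_within <- length(weight) - nlevels(grooming)

f_manual <- (ss_between / df_between) / (ss_within / df_within)

# The same statistic from aov(); the first row of the summary table is the factor
f_aov <- summary(aov(weight ~ grooming))[[1]][["F value"]][1]

print(c(manual = f_manual, aov = f_aov))  # both equal 3.375 (13.5 / (16 / 4))
```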
# Visualize the ANOVA result
ggplot(dog_data_clean, aes(x = Grooming.Needs, y = Average.Weight..kg.,
                           fill = Grooming.Needs)) +
  geom_boxplot() +
  theme_minimal() +
  labs(title = "Weight Across Grooming Needs",
       x = "Grooming Needs",
       y = "Weight (kg)")
# Summary of characteristics by dog size
size_comparison <- dog_data_clean %>%
  group_by(Size) %>%
  summarise(
    Mean_Weight = mean(Average.Weight..kg., na.rm = TRUE),
    Median_Weight = median(Average.Weight..kg., na.rm = TRUE),
    Mean_LifeSpan = mean(Life.Span, na.rm = TRUE),
    Median_LifeSpan = median(Life.Span, na.rm = TRUE)
  )
# View the summary
print(size_comparison)
## # A tibble: 3 × 5
##   Size   Mean_Weight Median_Weight Mean_LifeSpan Median_LifeSpan
##   <fct>        <dbl>         <dbl>         <dbl>           <dbl>
## 1 Small          6.5          6.25          12.9              13
## 2 Medium        19.3         20             12.2              12
## 3 Large         36.5         35             11.0              11
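This report highlights the diversity of dog breeds in weight, size, grooming needs, exercise requirements, and lifespan. Several patterns stand out among the 142 breeds that remain after cleaning. Heavier breeds tend to live shorter lives: the correlation between average weight and life span is about -0.69, and mean life span falls from 12.9 years for small breeds to 11.0 years for large ones. Breeds rated friendlier tend to have lower training-difficulty ratings (correlation about -0.79). Average weight does not differ significantly across grooming-need levels (ANOVA F = 0.411, p = 0.745). Patterns like these can help dog owners and enthusiasts match a breed's needs and behaviors to their own circumstances.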