1. Introduction

This report analyzes dog breeds using a dataset titled Dog Breeds Around the World. It explores breed characteristics including size, weight, grooming needs, exercise requirements, lifespan, friendliness, and training difficulty. The goal is to give dog owners and enthusiasts insight into how these traits vary across breeds, so they can make informed decisions based on breed-specific needs and behaviors.

2. Data Preparation

Loading Libraries and Dataset

# Load necessary libraries
library(dplyr)    # data manipulation: mutate(), group_by(), summarise(), %>%
library(tidyr)    # drop_na()
library(ggplot2)  # plotting

# Read in the dataset
dog_data=read.csv("C:/Dog Breads Around The World.csv")
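By default `read.csv()` passes column headers through `make.names()` (`check.names = TRUE`), which replaces spaces, parentheses, slashes, and hyphens with dots. That is very likely why the columns carry names such as `Exercise.Requirements..hrs.day.`. In the sketch below, the original CSV headers are assumptions reconstructed from those names, and `tidy_names()` is a hypothetical helper for converting them to shorter snake_case names.

```r
# Assumed original CSV headers (reconstructed, not read from the file)
raw_headers <- c("Exercise Requirements (hrs/day)",
                 "Training Difficulty (1-10)",
                 "Average Weight (kg)")

# read.csv(check.names = TRUE) applies make.names() to headers like these
r_names <- make.names(raw_headers)
print(r_names)

# Hypothetical helper: collapse runs of non-alphanumeric characters to "_",
# trim leading/trailing "_", and lower-case the result
tidy_names <- function(x) {
  x <- gsub("[^A-Za-z0-9]+", "_", x)
  x <- gsub("^_+|_+$", "", x)
  tolower(x)
}
print(tidy_names(r_names))
```

Applied to the whole data frame, this would be `names(dog_data) <- tidy_names(names(dog_data))`; the rest of this report keeps the original dotted names.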

Data Cleaning and Transformation

# Clean and transform the dataset
dog_data_clean=dog_data %>%
  # Convert relevant columns to appropriate data types
  mutate(
    Name = as.factor(Name),
    Origin = as.factor(Origin),
    Type = as.factor(Type),
    Unique.Feature = as.factor(Unique.Feature),
    Size = factor(Size, levels = c("Small", "Medium", "Large")),
    Grooming.Needs = factor(Grooming.Needs, levels = c("Low", "Moderate", "High", "Very High")),
    Exercise.Requirements..hrs.day. = as.numeric(Exercise.Requirements..hrs.day.),
    Good.with.Children = as.factor(Good.with.Children),
    Health.Issues.Risk = factor(Health.Issues.Risk, levels = c("Low", "Moderate", "High")),
    Average.Weight..kg. = as.numeric(Average.Weight..kg.),
    Training.Difficulty..1.10. = as.numeric(Training.Difficulty..1.10.),
    Friendly.Rating..1.10. = as.numeric(Friendly.Rating..1.10.),
    Intelligence.Rating..1.10. = as.numeric(Intelligence.Rating..1.10.),
    Shedding.Level = factor(Shedding.Level, levels = c("Low", "Moderate", "High", "Very High")),
    Life.Span = as.numeric(Life.Span)
  ) %>%
  drop_na() # remove rows with missing values

# Display the first few rows of the cleaned data
head(dog_data_clean)
##               Name      Origin    Type         Unique.Feature
## 1    Affenpinscher     Germany     Toy       Monkey-like face
## 2     Afghan Hound Afghanistan   Hound        Long silky coat
## 3 Airedale Terrier     England Terrier    Largest of terriers
## 4            Akita       Japan Working         Strong loyalty
## 5 Alaskan Malamute  Alaska USA Working Strong pulling ability
## 6 American Bulldog         USA Working         Muscular build
##   Friendly.Rating..1.10. Life.Span   Size Grooming.Needs
## 1                      7        14  Small           High
## 2                      5        13  Large      Very High
## 3                      8        12 Medium           High
## 4                      6        11  Large       Moderate
## 5                      7        11  Large           High
## 6                      8        11  Large            Low
##   Exercise.Requirements..hrs.day. Good.with.Children Intelligence.Rating..1.10.
## 1                             1.5                Yes                          8
## 2                             2.0                 No                          4
## 3                             2.0                Yes                          7
## 4                             2.0      With Training                          7
## 5                             3.0                Yes                          6
## 6                             2.0                Yes                          6
##   Shedding.Level Health.Issues.Risk Average.Weight..kg.
## 1       Moderate                Low                   4
## 2           High           Moderate                  25
## 3       Moderate                Low                  21
## 4           High               High                  45
## 5      Very High           Moderate                  36
## 6       Moderate           Moderate                  42
##   Training.Difficulty..1.10.
## 1                          6
## 2                          8
## 3                          6
## 4                          9
## 5                          8
## 6                          7
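Because `drop_na()` runs after the type conversions, it removes not only rows that were missing in the CSV but also rows whose values failed conversion: text in a numeric column, or a category outside the declared `levels`. A minimal sketch for auditing this, using a hypothetical helper `na_introduced()` and made-up toy rows rather than the real dataset:

```r
# Hypothetical helper: for each shared column, count values that were
# present before conversion but became NA afterwards
na_introduced <- function(before, after) {
  cols <- intersect(names(before), names(after))
  sapply(cols, function(col) sum(is.na(after[[col]]) & !is.na(before[[col]])))
}

# Toy data (not from the dataset): "Giant" is not a declared Size level,
# and "n/a" cannot be parsed as a number
raw <- data.frame(Size = c("Small", "Medium", "Giant", "Large"),
                  Life.Span = c("14", "12", "10", "n/a"))
converted <- data.frame(
  Size = factor(raw$Size, levels = c("Small", "Medium", "Large")),
  Life.Span = suppressWarnings(as.numeric(raw$Life.Span))
)

print(na_introduced(raw, converted))    # NAs created per column
print(sum(!complete.cases(converted)))  # rows that drop_na() would remove
```

On the real data, compare `dog_data` with the converted data *before* the `drop_na()` step, since `drop_na()` changes the row count and the rows would no longer line up.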

Summary Statistics

# Summary statistics for all columns (counts for factors, quartiles and mean for numeric columns)
summary(dog_data_clean)
##                Name          Origin             Type   
##  Affenpinscher   :  1   England :24   Sporting    :25  
##  Afghan Hound    :  1   Germany :16   Hound       :24  
##  Airedale Terrier:  1   France  :11   Working     :20  
##  Akita           :  1   Scotland:11   Terrier     :19  
##  Alaskan Malamute:  1   USA     : 7   Herding     :18  
##  American Bulldog:  1   China   : 6   Non-Sporting:18  
##  (Other)         :136   (Other) :67   (Other)     :18  
##               Unique.Feature Friendly.Rating..1.10.   Life.Span         Size   
##  Curly coat          :  2    Min.   : 5.000         Min.   : 8.00   Small :44  
##  Energetic           :  2    1st Qu.: 7.000         1st Qu.:11.00   Medium:56  
##  Pack hunting ability:  2    Median : 7.000         Median :12.00   Large :42  
##  Ridge of fur on back:  2    Mean   : 7.444         Mean   :12.08              
##  Short legs long body:  2    3rd Qu.: 8.000         3rd Qu.:13.00              
##  Tri-colored markings:  2    Max.   :10.000         Max.   :16.00              
##  (Other)             :130                                                      
##    Grooming.Needs Exercise.Requirements..hrs.day.     Good.with.Children
##  Low      :31     Min.   :1.000                   No           : 12     
##  Moderate :50     1st Qu.:1.500                   With Training: 20     
##  High     :46     Median :2.000                   Yes          :110     
##  Very High:15     Mean   :1.852                                         
##                   3rd Qu.:2.000                                         
##                   Max.   :3.000                                         
##                                                                         
##  Intelligence.Rating..1.10.   Shedding.Level Health.Issues.Risk
##  Min.   : 4.000             Low      :30     Low     :55       
##  1st Qu.: 7.000             Moderate :86     Moderate:67       
##  Median : 7.000             High     :25     High    :20       
##  Mean   : 7.127             Very High: 1                       
##  3rd Qu.: 8.000                                                
##  Max.   :10.000                                                
##                                                                
##  Average.Weight..kg. Training.Difficulty..1.10.
##  Min.   : 2.00       Min.   :4.00              
##  1st Qu.: 8.00       1st Qu.:6.00              
##  Median :20.00       Median :7.00              
##  Mean   :20.44       Mean   :6.57              
##  3rd Qu.:28.75       3rd Qu.:7.00              
##  Max.   :70.00       Max.   :9.00              
## 
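`summary()` reports counts for factor columns and quartiles plus the mean for numeric columns, but no standard deviation. A minimal sketch of a numeric-only table, using a hypothetical helper `numeric_summary()` and toy values (not from the dataset):

```r
# Hypothetical helper: mean, standard deviation, and median of every numeric column
numeric_summary <- function(df) {
  num <- Filter(is.numeric, df)   # keep only numeric columns
  data.frame(
    mean   = sapply(num, mean),
    sd     = sapply(num, sd),
    median = sapply(num, median)
  )
}

# Toy data (not from the dataset)
toy <- data.frame(Name = c("A", "B", "C"),
                  Weight = c(2, 4, 9),
                  Life.Span = c(14, 12, 10))
print(numeric_summary(toy))
```

The same call, `numeric_summary(dog_data_clean)`, would give one row per numeric column of the cleaned data.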

3. Analysis

Distribution of Dog Breeds by Type

# Bar plot of dog breeds by Type, with types ordered by descending count
ggplot(dog_data_clean, aes(x = reorder(Type, -table(Type)[Type]),
                           fill = Type)) +
  geom_bar() +
  theme_minimal() +
  coord_flip() +
  labs(title = "Distribution of Dog Breeds by Type", 
       x = "Type",
       y = "Count")
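The `reorder(Type, -table(Type)[Type])` expression scores each row with the negated count of its type; `reorder()` averages those scores per level and sorts them in ascending order, so the most common type becomes the first level. An equivalent formulation that may read more clearly computes the score per level directly. A sketch on a toy vector (not from the dataset), together with the counts behind the bars:

```r
# Toy vector of breed types (not from the dataset)
types <- factor(c("Hound", "Toy", "Hound", "Working", "Hound", "Toy"))

# Score each level by its negated count; reorder() sorts levels by ascending
# score, so the most frequent type becomes the first level
by_freq <- reorder(types, types, FUN = function(v) -length(v))
print(levels(by_freq))

# The counts behind the bars, most frequent first
type_counts <- sort(table(types), decreasing = TRUE)
print(type_counts)
```

In the plot, this would be `aes(x = reorder(Type, Type, FUN = function(v) -length(v)), fill = Type)`.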

Weight Distribution by Size

# Boxplot of weight by dog size
ggplot(dog_data_clean, aes(x = Size,
                           y = Average.Weight..kg., fill = Size)) +
  geom_boxplot() +
  theme_minimal() +
  labs(title = "Boxplot of Dog Weight by Size",
       x = "Size", 
       y = "Weight (kg)")
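Each box summarizes one size group by its quartiles, and ggplot2's `geom_boxplot()` draws whiskers to the most extreme points within 1.5 times the interquartile range (IQR) of the box, plotting anything beyond as an outlier. A minimal sketch of those numbers for a single group, via a hypothetical helper `box_stats()` and toy weights; quantile conventions can differ slightly between plotting functions, so treat this as an approximation of what is drawn:

```r
# Hypothetical helper: the numbers a standard boxplot draws for one group
box_stats <- function(y, coef = 1.5) {
  q <- quantile(y, c(0.25, 0.5, 0.75), names = FALSE)  # default type-7 quantiles
  iqr <- q[3] - q[1]
  # points within coef * IQR of the box; whiskers stop at the extremes of these
  inside <- y[y >= q[1] - coef * iqr & y <= q[3] + coef * iqr]
  c(lower_whisker = min(inside), q1 = q[1], median = q[2], q3 = q[3],
    upper_whisker = max(inside), n_outliers = length(y) - length(inside))
}

# Toy weights in kg (not from the dataset); 30 lies far above the rest
print(box_stats(c(2, 4, 5, 6, 8, 30)))
```

For the real data, `tapply(dog_data_clean$Average.Weight..kg., dog_data_clean$Size, box_stats)` would return one such vector per size group.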

Lifespan vs Average Weight

# Scatter plot of life span vs average weight
ggplot(dog_data_clean, aes(x = Average.Weight..kg., y = Life.Span)) +
  geom_point(aes(color = Size), alpha = 0.7) +
  theme_minimal() +
  labs(title = "Life Span vs Average Weight", 
       x = "Average Weight (kg)", 
       y = "Life Span (years)")
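Life span and average weight are negatively correlated in this dataset (about -0.69 in the correlation matrix in Section 4). A simple linear fit of life span on weight is one way to put a slope on that trend. The sketch below uses made-up values (not from the dataset) that lie exactly on a line, so the fitted coefficients are easy to check:

```r
# Toy weights (kg) and life spans (years), chosen to lie on life = 15 - 0.1 * weight
toy <- data.frame(weight = c(10, 20, 30, 40),
                  life_span = c(14, 13, 12, 11))

fit <- lm(life_span ~ weight, data = toy)
print(coef(fit))  # intercept, then change in life span per additional kg
```

On the real data, `lm(Life.Span ~ Average.Weight..kg., data = dog_data_clean)` fits the same model, and adding `geom_smooth(method = "lm")` to the scatter plot overlays the fitted line.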

Exercise Requirements

# Histogram of exercise requirements (hours/day)
ggplot(dog_data_clean, aes(x = Exercise.Requirements..hrs.day.)) +
  geom_histogram(binwidth = 0.5, fill = "lightblue", color = "black") +
  theme_minimal() +
  labs(title = "Distribution of Exercise Requirements", 
       x = "Exercise Requirements (hrs/day)", 
       y = "Frequency")
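With `binwidth = 0.5`, each bar counts the breeds whose daily exercise falls in one half-hour interval. A minimal sketch of that bucketing with `cut()`, using a hypothetical helper `bin_counts()` and toy values; centering each bin on a multiple of the width is an assumption made here for readability and may not match ggplot2's own bin placement exactly:

```r
# Hypothetical helper: count values in bins of the given width,
# with bins centred on multiples of the width, e.g. (0.75, 1.25] for 1.0
bin_counts <- function(x, width = 0.5) {
  lo <- floor(min(x) / width) * width - width / 2
  hi <- ceiling(max(x) / width) * width + width / 2
  table(cut(x, breaks = seq(lo, hi, by = width)))
}

# Toy exercise requirements in hours/day (not from the dataset)
print(bin_counts(c(1, 1.5, 1.5, 2, 2, 2, 3)))
```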

Grooming Needs Distribution

# Bar plot of grooming needs
ggplot(dog_data_clean, aes(x = Grooming.Needs, fill = Grooming.Needs)) +
  geom_bar() +
  theme_minimal() +
  labs(title = "Distribution of Grooming Needs", 
       x = "Grooming Needs",
       y = "Count")
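The bar heights are counts; percentages are often easier to compare. A sketch with toy values (not from the dataset) showing `table()` and `prop.table()` on a factor with fixed levels, which keeps an unused level (here "Very High") as a zero instead of dropping it:

```r
# Toy grooming levels (not from the dataset); "Very High" does not occur
grooming <- factor(c("Low", "High", "Moderate", "High"),
                   levels = c("Low", "Moderate", "High", "Very High"))

counts <- table(grooming)
shares <- round(100 * prop.table(counts), 1)  # percentage of breeds per level
print(counts)
print(shares)
```

ggplot2 drops unused factor levels from a discrete axis by default; adding `scale_x_discrete(drop = FALSE)` to the bar plot keeps them visible.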

4. Correlation Analysis

# Pearson correlation matrix (the cor() default) for relevant numerical variables
cor_data=dog_data_clean %>%
  select(Life.Span, Friendly.Rating..1.10.,
         Training.Difficulty..1.10., Average.Weight..kg.) %>%
  cor()

# Display the correlation matrix
print(cor_data)
##                              Life.Span Friendly.Rating..1.10.
## Life.Span                   1.00000000             0.06745533
## Friendly.Rating..1.10.      0.06745533             1.00000000
## Training.Difficulty..1.10. -0.14081077            -0.79068154
## Average.Weight..kg.        -0.68546553            -0.17073739
##                            Training.Difficulty..1.10. Average.Weight..kg.
## Life.Span                                  -0.1408108          -0.6854655
## Friendly.Rating..1.10.                     -0.7906815          -0.1707374
## Training.Difficulty..1.10.                  1.0000000           0.3173441
## Average.Weight..kg.                         0.3173441           1.0000000
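`cor()` computes Pearson coefficients by default: the covariance of two variables divided by the product of their standard deviations. Because the 1-10 ratings are ordinal, a rank-based Spearman coefficient is a reasonable cross-check, and `cor.test()` adds a p-value for a single pair. A sketch on toy vectors (not from the dataset), with a hand-written `pearson()` helper for comparison:

```r
# Hand-written Pearson correlation, for comparison with cor()
pearson <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Toy ratings (not from the dataset)
friendly   <- c(9, 8, 7, 6, 5, 7)
difficulty <- c(4, 5, 6, 8, 9, 7)

print(pearson(friendly, difficulty))
print(cor(friendly, difficulty))                       # same value
print(cor(friendly, difficulty, method = "spearman"))  # rank-based alternative

# Test for a single pair; exact = FALSE because tied ranks rule out an exact p-value
print(cor.test(friendly, difficulty, method = "spearman", exact = FALSE))
```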

5. ANOVA: Weight Across Grooming Needs

# ANOVA for weight across grooming needs
anova_result=aov(Average.Weight..kg. ~ Grooming.Needs, data = dog_data_clean)
summary(anova_result)
##                 Df Sum Sq Mean Sq F value Pr(>F)
## Grooming.Needs   3    226   75.41   0.411  0.745
## Residuals      138  25320  183.48

# Visualize the ANOVA result
ggplot(dog_data_clean, aes(x = Grooming.Needs, y = Average.Weight..kg.,
                           fill = Grooming.Needs)) +
  geom_boxplot() +
  theme_minimal() +
  labs(title = "Weight Across Grooming Needs",
       x = "Grooming Needs",
       y = "Weight (kg)")
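The F value in the ANOVA table is the between-group mean square divided by the within-group mean square (75.41 / 183.48 ≈ 0.411 above). A sketch that recomputes a one-way ANOVA F statistic by hand on toy data (not from the dataset) and checks it against `aov()`; the helper name `one_way_f()` is hypothetical:

```r
# Hypothetical helper: one-way ANOVA F statistic computed from sums of squares
one_way_f <- function(y, g) {
  g <- factor(g)
  grand <- mean(y)
  means <- tapply(y, g, mean)
  sizes <- tapply(y, g, length)
  ss_between <- sum(sizes * (means - grand)^2)       # spread of group means
  ss_within  <- sum((y - means[as.integer(g)])^2)    # spread inside each group
  df_between <- nlevels(g) - 1
  df_within  <- length(y) - nlevels(g)
  (ss_between / df_between) / (ss_within / df_within)
}

# Toy data: three groups with means 2, 3 and 6
g <- factor(rep(c("a", "b", "c"), each = 3))
y <- c(1, 2, 3, 2, 3, 4, 5, 6, 7)

print(one_way_f(y, g))                           # F = 13 for these toy values
print(summary(aov(y ~ g))[[1]][["F value"]][1])  # same value from aov()
```

A non-significant result like the one above (p = 0.745) means these data give no evidence that mean weight differs across grooming levels; it does not show that the means are equal.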

6. Summary by Dog Size

# Summary of characteristics by dog size
size_comparison=dog_data_clean %>%
  group_by(Size) %>%
  summarise(
    Mean_Weight = mean(Average.Weight..kg., na.rm = TRUE),
    Median_Weight = median(Average.Weight..kg., na.rm = TRUE),
    Mean_LifeSpan = mean(Life.Span, na.rm = TRUE),
    Median_LifeSpan = median(Life.Span, na.rm = TRUE)
  )

# View the summary
print(size_comparison)
## # A tibble: 3 × 5
##   Size   Mean_Weight Median_Weight Mean_LifeSpan Median_LifeSpan
##   <fct>        <dbl>         <dbl>         <dbl>           <dbl>
## 1 Small          6.5          6.25          12.9              13
## 2 Medium        19.3         20             12.2              12
## 3 Large         36.5         35             11.0              11

7. Conclusion

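This report examined how dog breeds differ in weight, size, grooming needs, exercise requirements, and lifespan. Several patterns stand out in this dataset. Heavier breeds tend to live shorter lives: the correlation between average weight and life span is about -0.69, and mean life span falls from 12.9 years for small breeds to 11.0 years for large breeds. Friendliness and training difficulty are strongly negatively correlated (about -0.79), so breeds rated as friendlier also tend to be rated as easier to train. By contrast, average weight does not differ significantly across grooming-need levels (one-way ANOVA, F = 0.411, p = 0.745). These breed-specific patterns can help dog owners and enthusiasts anticipate a breed's care needs and behavior before choosing one.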