My dataset explores different property casualty insurance companies that operate in the state of Iowa. It contains information on premiums written, losses paid and taxes paid for the year 2019. The data was webscraped and compiled on https://catalog.data.gov/dataset/iowa-property-casualty-insurance-premiums-and-losses. Premiums written is the amount of money made off of premiums that were written in the past year, losses paid is the cost of losses the insurance companies had to pay to the insured, and taxes paid is the amount of taxes paid by those companies.
Load the library and dataset
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 39861 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): Iowa Code Chapter, State, Company Name, Line of Insurance, Iowa Com...
dbl (5): Year, Premiums Written, Losses Paid, Taxes Paid, NAIC Number
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
options(scipen=999) #Changes all scientific notation in visualizations into standard notaionhead(taxes_rough)
# A tibble: 6 × 10
Year `Iowa Code Chapter` State `Company Name` `Line of Insurance`
<dbl> <chr> <chr> <chr> <chr>
1 2019 515.48 WI 1st Auto & Casualty Insur… Fire
2 2019 515.48 WI 1st Auto & Casualty Insur… Allied Lines
3 2019 515.48 WI 1st Auto & Casualty Insur… Multiple Peril Crop
4 2019 515.48 WI 1st Auto & Casualty Insur… Federal Flood
5 2019 515.48 WI 1st Auto & Casualty Insur… Farmowners Multipl…
6 2019 515.48 WI 1st Auto & Casualty Insur… Homeowners Multipl…
# ℹ 5 more variables: `Premiums Written` <dbl>, `Losses Paid` <dbl>,
# `Taxes Paid` <dbl>, `NAIC Number` <dbl>, `Iowa Company Code` <chr>
Formatting the titles to make them easier to work with
names(taxes_rough) <-tolower(names(taxes_rough)) #Lowering the title namesnames(taxes_rough) <-gsub(" ", "_", names(taxes_rough)) #Adding an underscore to the title nameshead(taxes_rough)
# A tibble: 6 × 10
year iowa_code_chapter state company_name line_of_insurance premiums_written
<dbl> <chr> <chr> <chr> <chr> <dbl>
1 2019 515.48 WI 1st Auto & C… Fire 0
2 2019 515.48 WI 1st Auto & C… Allied Lines 0
3 2019 515.48 WI 1st Auto & C… Multiple Peril C… 0
4 2019 515.48 WI 1st Auto & C… Federal Flood 0
5 2019 515.48 WI 1st Auto & C… Farmowners Multi… 68977
6 2019 515.48 WI 1st Auto & C… Homeowners Multi… 30721
# ℹ 4 more variables: losses_paid <dbl>, taxes_paid <dbl>, naic_number <dbl>,
# iowa_company_code <chr>
Creating a new variable to help clean the dataset
taxes <- taxes_rough |>mutate(total_losses = losses_paid + taxes_paid) |>#Combines the losses paid and taxes paid into one variable called "total_losses"filter(total_losses !="0") # Filtering any rows with 0 as their total paid amount to clean empty values for companies that did not operate in that specific line of insurancehead(taxes)
# A tibble: 6 × 11
year iowa_code_chapter state company_name line_of_insurance premiums_written
<dbl> <chr> <chr> <chr> <chr> <dbl>
1 2019 515.48 WI 1st Auto & C… Farmowners Multi… 68977
2 2019 515.48 WI 1st Auto & C… Homeowners Multi… 30721
3 2019 515.48 WI 1st Auto & C… Other Liability … 53356
4 2019 515.48 WI 1st Auto & C… Other Private Pa… 1564112
5 2019 515.48 WI 1st Auto & C… Other Commerical… 90602
6 2019 515.48 WI 1st Auto & C… Private Passenge… 1507244
# ℹ 5 more variables: losses_paid <dbl>, taxes_paid <dbl>, naic_number <dbl>,
# iowa_company_code <chr>, total_losses <dbl>
Exploring the amount paid by insurance companies compared to the income gained by writting premiums
ggplot(taxes, aes(x = premiums_written, y = total_losses, color = line_of_insurance)) +geom_point()
Formatting the dataset for use in the first visualiation
taxes_in_hundred_thousands <- taxes |>mutate(total_losses = total_losses /100000) |># Dividing the total losses by 100,000 to produce a more readable visualizationmutate(premiums_written = premiums_written /100000) |># Dividing the premiums written by 100,000 to produce a more readable visualizationfilter(line_of_insurance =="Fire"| line_of_insurance =="Earthquake"| line_of_insurance =="Aircraft"| line_of_insurance =="Boiler and Machinery"| line_of_insurance =="Credit") # Filtering for only five lines of insurancehead(taxes_in_hundred_thousands)
# A tibble: 6 × 11
year iowa_code_chapter state company_name line_of_insurance premiums_written
<dbl> <chr> <chr> <chr> <chr> <dbl>
1 2019 515.48 WI 1st Auto & C… Boiler and Machi… 0.0183
2 2019 515.48 IA Acadia Insur… Boiler and Machi… 0.536
3 2019 515.48 PA ACE American… Fire 0.0424
4 2019 515.48 PA ACE American… Earthquake 0.0976
5 2019 515.48 PA ACE American… Boiler and Machi… 0.0006
6 2019 515.48 PA ACE American… Credit 0.0682
# ℹ 5 more variables: losses_paid <dbl>, taxes_paid <dbl>, naic_number <dbl>,
# iowa_company_code <chr>, total_losses <dbl>
Redoing the first visualization with better aesthetics
ggplot(taxes_in_hundred_thousands, aes(x = premiums_written, y = total_losses, colour = line_of_insurance)) +geom_point() +scale_color_manual(values =c("lightblue", "black", "blue", "green", "red")) +# Changing the colors of the insurance typeslabs(x ="Money Gained From Premiums Written (In Hundreds Of Thousands)",y ="Total Losses (In Hundreds Of Thousands)",title ="Amount Paid by Insurance Companies Over Premiums Written",color ="Type of Insurance Being Covered",caption ="https://catalog.data.gov/dataset/iowa-property-casualty-insurance-premiums-and-losses") # Changing the labels for the visualization
This visualization shows the total losses of a specific insurance type compared to the total gained from writing premiums. The x value shows the amount gained from the written premiums, the y value shows the total losses, being the loss from having to pay the insured person and the tax paid along with that. The color of each dot represents the type of insurance that was being dealt with in that company.
Exploring which line of insurance had the most insurance payouts
ggplot(taxes, aes(x = state, fill = line_of_insurance)) +geom_bar()
Creating a csv of the cleaned up taxes dataset for use in Tableu
#write.csv(taxes, "Iowa_Taxes.csv")
The second visualization with better aesthetics
This is the link to my Tableu visualization of the data with interactivity
This visualization shows the main states covering different types of insurance. The length of the column indicates the number of insurance sold by a company that operated mainly in that state. The color of each part of the column indicates the type of insurance being sold and the columns show which state the companies mainly operate in.
Ending Essay
I started off with a dataset with almost 40,000 entries and i felt the need to deal with some of the entries that had no good data to provide. I first started to make the dataset easier to work with by lower casing all the names and replacing spaces with underscores. I then created a variable called “total_losses” that combined the losses paid and taxes paid variables into one that showed the total losses for the row. I then removed any rows that contained no losses, which usually means that the insurance company had no activity in 2019 for a specific type of insurance, like fire coverage or injury coverage.