Download the cv2010on.csv file from Moodle, save it on your computer, and upload it to the data folder in RStudio. Revise the read.csv code line so that it matches both the name and the location (path) of the data file. The data contain civil cases filed in the federal district courts of the six New England states in 2011 and later. Each row represents a civil case, and each column one of its characteristics. DEF stands for defendants, PLT for plaintiffs, and nature_of_suit for the type of lawsuit. Change comics to civilcases.

Q1 How many cases are there?

36,643 cases

# Load packages
library(dplyr) # for dplyr functions such as glimpse(), mutate(), and filter()
library(ggplot2) # for ggplot2 functions such as ggplot()

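# Import data
# stringsAsFactors = TRUE keeps text columns as factors (no longer the default in R >= 4.0)
civilcases <- read.csv("/resources/rstudio/Business Statistics/data/cv2010on.csv", stringsAsFactors = TRUE)

# Convert data to a tibble (tbl_df() is deprecated in current dplyr)
civilcases <- as_tibble(civilcases)
str(civilcases)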
## Classes 'tbl_df', 'tbl' and 'data.frame':    36643 obs. of  6 variables:
##  $ DISTRICT      : Factor w/ 6 levels "CT","MA","ME",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ PLT           : Factor w/ 19900 levels "-8",":WALKER EL: VENUS-ANTOINETTE",..: 6393 3300 5130 19442 7175 3482 6269 4384 12436 13162 ...
##  $ DEF           : Factor w/ 19496 levels "-8","'47 BRAND, LLC",..: 8018 11968 5576 10445 5251 14988 7759 1510 8210 13180 ...
##  $ FILEYEAR      : int  2011 2011 2011 2011 2011 2011 2011 2011 2011 2011 ...
##  $ NOS           : int  445 385 442 440 440 190 440 442 190 110 ...
##  $ nature_of_suit: Factor w/ 44 levels "ADMINISTRATIVE PROCEDURE ACT/REVIEW OR APPEAL OF AGENCY\nDECISION",..: 7 37 9 28 28 29 28 9 29 19 ...
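
The case count for Q1 can also be extracted directly instead of being read off the str() header. A minimal sketch, assuming a small made-up data frame called `toy_cases` (a hypothetical stand-in for civilcases, with invented values):

```r
# Hypothetical stand-in for civilcases; the values are invented for illustration.
toy_cases <- data.frame(
  DISTRICT = factor(c("ME", "ME", "NH", "MA")),
  FILEYEAR = c(2011L, 2011L, 2012L, 2013L)
)

n_cases <- nrow(toy_cases)  # one row per case, so the row count is the case count
n_vars  <- ncol(toy_cases)  # number of recorded characteristics
print(n_cases)
```

Applied to the real data, nrow(civilcases) should agree with the observation count in the str() header.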

Q2 List all values in the DISTRICT variable.

Revise the levels code below so that R returns all levels (values) in the DISTRICT variable.

CT MA ME NH RI VT

Q3 How many HEALTH CARE / PHARM cases were filed in New Hampshire?

Revise the table code below so that R returns the answer for the question.

445 cases were filed.

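# Q2: levels of the DISTRICT variable
levels(civilcases$DISTRICT)
## [1] "CT" "MA" "ME" "NH" "RI" "VT"

# Exact category labels of nature_of_suit (needed for Q3 and Q6)
levels(civilcases$nature_of_suit)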
##  [1] "ADMINISTRATIVE PROCEDURE ACT/REVIEW OR APPEAL OF AGENCY\nDECISION"
##  [2] "ANTITRUST"                                                        
##  [3] "ARBITRATION"                                                      
##  [4] "ASSAULT, LIBEL, AND SLANDER"                                      
##  [5] "BANKS AND BANKING"                                                
##  [6] "CIVIL RIGHTS ACCOMMODATIONS"                                      
##  [7] "CIVIL RIGHTS ADA EMPLOYMENT"                                      
##  [8] "CIVIL RIGHTS ADA OTHER"                                           
##  [9] "CIVIL RIGHTS JOBS"                                                
## [10] "CONSUMER CREDIT"                                                  
## [11] "CONTRACT FRANCHISE"                                               
## [12] "CONTRACT PRODUCT LIABILITY"                                       
## [13] "COPYRIGHT"                                                        
## [14] "FAIR LABOR STANDARDS ACT"                                         
## [15] "FALSE CLAIMS ACT"                                                 
## [16] "FAMILY AND MEDICAL LEAVE ACT"                                     
## [17] "FOOD AND DRUG ACTS"                                               
## [18] "HEALTH CARE / PHARM"                                              
## [19] "INSURANCE"                                                        
## [20] "INTERSTATE COMMERCE"                                              
## [21] "LABOR/MANAGEMENT RELATIONS ACT"                                   
## [22] "LABOR/MANAGEMENT REPORT & DISCLOSURE"                             
## [23] "MEDICAL MALPRACTICE"                                              
## [24] "MOTOR VEHICLE PERSONAL INJURY"                                    
## [25] "MOTOR VEHICLE PRODUCT LIABILITY"                                  
## [26] "NEGOTIABLE INSTRUMENTS"                                           
## [27] "OCCUPATIONAL SAFETY/HEALTH"                                       
## [28] "OTHER CIVIL RIGHTS"                                               
## [29] "OTHER CONTRACT ACTIONS"                                           
## [30] "OTHER FRAUD"                                                      
## [31] "OTHER LABOR LITIGATION"                                           
## [32] "OTHER PERSONAL INJURY"                                            
## [33] "OTHER PERSONAL PROPERTY DAMAGE"                                   
## [34] "OTHER REAL PROPERTY ACTIONS"                                      
## [35] "OTHER STATUTORY ACTIONS"                                          
## [36] "PERSONAL INJURY -PRODUCT LIABILITY"                               
## [37] "PROPERTY DAMAGE -PRODUCT LIABILTY"                                
## [38] "RENT, LEASE, EJECTMENT"                                           
## [39] "SECURITIES, COMMODITIES, EXCHANGE"                                
## [40] "STOCKHOLDER'S SUITS"                                              
## [41] "TORT PRODUCT LIABILITY"                                           
## [42] "TORTS TO LAND"                                                    
## [43] "TRADEMARK"                                                        
## [44] "TRUTH IN LENDING"

# Cross-tabulating DISTRICT with itself: the diagonal holds each district's total case count
tab <- table(civilcases$DISTRICT, civilcases$DISTRICT)

tab
##     
##         CT    MA    ME    NH    RI    VT
##   CT  9718     0     0     0     0     0
##   MA     0 18705     0     0     0     0
##   ME     0     0  1988     0     0     0
##   NH     0     0     0  2556     0     0
##   RI     0     0     0     0  2628     0
##   VT     0     0     0     0     0  1048
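
The table above crosses DISTRICT with itself, so it reports only per-district totals. Counting one type of suit in one district needs nature_of_suit on one axis and DISTRICT on the other. The sketch below shows that pattern on a made-up data frame; `toy_cases` and its counts are hypothetical and are not the real answer to Q3.

```r
# Hypothetical toy data; the counts are invented and are not the real answer.
toy_cases <- data.frame(
  DISTRICT = factor(c("NH", "NH", "NH", "MA", "MA", "CT")),
  nature_of_suit = factor(c("HEALTH CARE / PHARM", "INSURANCE",
                            "HEALTH CARE / PHARM", "HEALTH CARE / PHARM",
                            "INSURANCE", "INSURANCE"))
)

# Rows are suit types, columns are districts.
toy_tab <- table(toy_cases$nature_of_suit, toy_cases$DISTRICT)

# Index the table by row and column label to read a single cell.
nh_health <- toy_tab["HEALTH CARE / PHARM", "NH"]
print(nh_health)
```

On the real data the same indexing would be applied to table(civilcases$nature_of_suit, civilcases$DISTRICT).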

Q4 What is the district that handles the largest number of civil cases?

Revise the barchart code below to find the answer.

MA handles the largest number of civil cases (18,705).

ggplot(civilcases, aes(x = DISTRICT)) + 
  geom_bar()
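
The bar chart answers Q4 visually. The same answer can be checked numerically from the district totals on the diagonal of the DISTRICT table printed under Q3. A minimal sketch, with the totals typed in by hand as a named vector (the name `district_totals` is an assumption for illustration):

```r
# District totals copied from the diagonal of the DISTRICT table shown under Q3.
district_totals <- c(CT = 9718, MA = 18705, ME = 1988,
                     NH = 2556, RI = 2628, VT = 1048)

# which.max() returns the position of the largest value; names() gives its label.
busiest <- names(which.max(district_totals))
print(busiest)

# Sorting shows the full ranking, largest first.
ranked <- sort(district_totals, decreasing = TRUE)
print(ranked)
```

The six totals add up to 36,643, the number of cases reported in Q1.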

Q5 Revise the code as instructed below. Don’t be surprised if your chart doesn’t look right. Explain the reason why.

Map district to the x-axis and nature of suit to color.

There are too many categories for the chart to be readable: nature_of_suit has 44 levels, so the legend and the color segments crowd the plot. That is why it does not look right.

ggplot(civilcases, aes(x = DISTRICT, fill = nature_of_suit)) + 
  geom_bar(position = "fill") # position = "fill" stacks the bars and scales each one to 100%

Q6 Which district has the largest share of HEALTH CARE / PHARM cases?

Simplify the chart in Q5 by filtering for the top five categories in the nature of suit:

  1. HEALTH CARE / PHARM
  2. OTHER CIVIL RIGHTS
  3. OTHER CONTRACT ACTIONS
  4. CIVIL RIGHTS JOBS
  5. PERSONAL INJURY -PRODUCT LIABILITY

MA has the largest share of health care and pharmaceutical cases.

Revise the filter code below.

civilcases_filtered <- 
  civilcases %>% 
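  # Keep only the five categories listed above (labels must match the factor levels exactly)
  filter(nature_of_suit %in% c("HEALTH CARE / PHARM",
                               "OTHER CIVIL RIGHTS",
                               "OTHER CONTRACT ACTIONS",
                               "CIVIL RIGHTS JOBS",
                               "PERSONAL INJURY -PRODUCT LIABILITY"))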

ggplot(civilcases_filtered, aes(x = DISTRICT, fill = nature_of_suit)) + 
  geom_bar(position = "fill") # position = "fill" stacks the bars and scales each one to 100%
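
The shares that position = "fill" draws can also be computed as numbers, which makes it easier to compare districts whose bars look similar. The sketch below uses base R's prop.table() on a made-up data frame; `toy_cases` and its counts are hypothetical and do not come from cv2010on.csv.

```r
# Hypothetical toy data; counts are invented, not taken from cv2010on.csv.
toy_cases <- data.frame(
  DISTRICT = c("MA", "MA", "MA", "MA", "NH", "NH", "NH", "NH"),
  nature_of_suit = c("HEALTH CARE / PHARM", "HEALTH CARE / PHARM",
                     "HEALTH CARE / PHARM", "OTHER CIVIL RIGHTS",
                     "HEALTH CARE / PHARM", "OTHER CIVIL RIGHTS",
                     "OTHER CIVIL RIGHTS", "OTHER CIVIL RIGHTS")
)

# Rows are districts, columns are suit types.
counts <- table(toy_cases$DISTRICT, toy_cases$nature_of_suit)

# margin = 1 divides each row by its row total, so each district sums to 1.
shares <- prop.table(counts, margin = 1)

# Pull out one column and find the district with the largest share.
health_share <- shares[, "HEALTH CARE / PHARM"]
top_district <- names(which.max(health_share))
print(round(shares, 2))
print(top_district)
```

On the real data the same steps would be run on civilcases_filtered, with the shares read from the "HEALTH CARE / PHARM" column.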