Download the cv2010on.csv file from Moodle, save it to your computer, and upload it into the data folder of your RStudio. Revise the read.csv code line so that it matches both the name and the location (path) of the data file. The data contain civil cases filed at the federal district courts in the six New England states in 2011 and later. Each row represents a civil case, and each column one of its characteristics. DEF stands for the defendant, PLT for the plaintiff, and nature_of_suit for the type of lawsuit.
Observational

## Q2. How many civil cases have been filed at the U.S. District Courts in New England?
Hint: See the result of the str command.

36643

## Q3. Who is the most frequent defendant in New England?
Hint: See the result of the summary command.

FRESENIUS MEDICAL CARE 3590

## Q4. In 2011, Fisher filed a lawsuit against the town of Hermon. What type of lawsuit (nature of suit) was it?
Hint: See the result of the head command.

Civil Rights ADA Employment
# Load packages
library(dplyr)
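# stringsAsFactors = TRUE keeps text columns as factors (as in the output below) under R >= 4.0
civilCases <- read.csv("/resources/rstudio/businessstatistics/data/cv2010on.csv", stringsAsFactors = TRUE)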
civilCases$FILEYEAR <- as.factor(civilCases$FILEYEAR)
str(civilCases)
## 'data.frame': 36643 obs. of 6 variables:
## $ DISTRICT : Factor w/ 6 levels "CT","MA","ME",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ PLT : Factor w/ 19900 levels "-8",":WALKER EL: VENUS-ANTOINETTE",..: 6393 3300 5130 19442 7175 3482 6269 4384 12436 13162 ...
## $ DEF : Factor w/ 19496 levels "-8","'47 BRAND, LLC",..: 8018 11968 5576 10445 5251 14988 7759 1510 8210 13180 ...
## $ FILEYEAR : Factor w/ 8 levels "2011","2012",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ NOS : int 445 385 442 440 440 190 440 442 190 110 ...
## $ nature_of_suit: Factor w/ 44 levels "ADMINISTRATIVE PROCEDURE ACT/REVIEW OR APPEAL OF AGENCY\nDECISION",..: 7 37 9 28 28 29 28 9 29 19 ...
summary(civilCases)
## DISTRICT PLT DEF
## CT: 9718 SMITH : 185 FRESENIUS MEDICAL CARE , ET AL: 3590
## MA:18705 SEALED : 173 ATRIUM MEDICAL CORPORAT, ET AL: 617
## ME: 1988 BROWN : 165 GLAXOSMITHKLINE LLC : 439
## NH: 2556 JOHNSON : 154 FRESENIUS USA, INC., ET AL : 379
## RI: 2628 WILLIAMS: 149 DAVOL, INC., ET AL : 189
## VT: 1048 MARRADI : 146 BOSTON SCIENTIFIC CORP. : 179
## (Other) :35671 (Other) :31250
## FILEYEAR NOS nature_of_suit
## 2014 :6503 Min. :110.0 HEALTH CARE / PHARM : 5145
## 2015 :5884 1st Qu.:360.0 OTHER CIVIL RIGHTS : 4101
## 2013 :4763 Median :367.0 OTHER CONTRACT ACTIONS : 3945
## 2017 :4505 Mean :422.1 CIVIL RIGHTS JOBS : 3026
## 2016 :4155 3rd Qu.:443.0 PERSONAL INJURY -PRODUCT LIABILITY: 2988
## 2011 :4153 Max. :899.0 OTHER PERSONAL INJURY : 2035
## (Other):6680 (Other) :15403
head(civilCases)
## DISTRICT PLT DEF FILEYEAR NOS
## 1 ME FISHER HERMON, TOWN OF 2011 445
## 2 ME CHANDONAIT NAVISTAR INC, ET AL 2011 385
## 3 ME DIONNE EDUCATION, MAINE DEPT, ET AL 2011 442
## 4 ME WILKINS MADORE, ET AL 2011 440
## 5 ME GIBBS DOROTHEA DIX, ET AL 2011 440
## 6 ME CHRISTIAN SCHNEIDER HOMES LLC, ET AL 2011 190
## nature_of_suit
## 1 CIVIL RIGHTS ADA EMPLOYMENT
## 2 PROPERTY DAMAGE -PRODUCT LIABILTY
## 3 CIVIL RIGHTS JOBS
## 4 OTHER CIVIL RIGHTS
## 5 OTHER CIVIL RIGHTS
## 6 OTHER CONTRACT ACTIONS
Revise the count code below so that the result has two columns: nature_of_suit and n.

Health Care / Pharm
# Count the number of cases by district and nature of suit
civilCases %>%
count(DISTRICT, nature_of_suit) %>%
arrange(desc(n)) # Sort the table by n in descending order
## # A tibble: 248 x 3
## DISTRICT nature_of_suit n
## <fct> <fct> <int>
## 1 MA HEALTH CARE / PHARM 4493
## 2 MA PERSONAL INJURY -PRODUCT LIABILITY 2044
## 3 MA OTHER CONTRACT ACTIONS 1991
## 4 CT OTHER CIVIL RIGHTS 1513
## 5 MA OTHER CIVIL RIGHTS 1455
## 6 CT CIVIL RIGHTS JOBS 1389
## 7 CT OTHER CONTRACT ACTIONS 1066
## 8 CT CONSUMER CREDIT 943
## 9 MA CIVIL RIGHTS JOBS 843
## 10 MA OTHER PERSONAL INJURY 804
## # ... with 238 more rows
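One possible revision for the count exercise above is to drop DISTRICT from count(), so that cases are tallied by nature_of_suit alone. The sketch below is a minimal illustration, not the official solution. To keep it runnable on its own it uses a small made-up data frame, toyCases, which is a hypothetical stand-in for civilCases and does not come from cv2010on.csv.

```r
library(dplyr)

# Hypothetical stand-in for civilCases; the rows are invented for illustration
# and are NOT taken from cv2010on.csv.
toyCases <- data.frame(
  DISTRICT = c("MA", "MA", "CT", "NH", "NH", "ME"),
  nature_of_suit = c("HEALTH CARE / PHARM", "HEALTH CARE / PHARM",
                     "OTHER CIVIL RIGHTS", "HEALTH CARE / PHARM",
                     "OTHER CONTRACT ACTIONS", "OTHER CIVIL RIGHTS")
)

# Count by nature_of_suit only, so the result has two columns: nature_of_suit and n
suitCounts <- toyCases %>%
  count(nature_of_suit) %>%
  arrange(desc(n))  # Sort the table by n in descending order

print(suitCounts)
```

With the real data, replacing toyCases with civilCases should put the most frequent type of lawsuit in the first row.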
Revise the filter code below so that the result returns only a single row: the NH row for “HEALTH CARE / PHARM”.

17.4%
civilCases %>%
count(DISTRICT, nature_of_suit) %>%
# Group by district
group_by(DISTRICT) %>%
# Compute each suit type's share of the cases within its district
mutate(prop = n / sum(n)) %>%
# Keep only the health care / pharmaceutical rows
filter(nature_of_suit == "HEALTH CARE / PHARM")
## # A tibble: 6 x 4
## # Groups: DISTRICT [6]
## DISTRICT nature_of_suit n prop
## <fct> <fct> <int> <dbl>
## 1 CT HEALTH CARE / PHARM 53 0.00545
## 2 MA HEALTH CARE / PHARM 4493 0.240
## 3 ME HEALTH CARE / PHARM 72 0.0362
## 4 NH HEALTH CARE / PHARM 445 0.174
## 5 RI HEALTH CARE / PHARM 67 0.0255
## 6 VT HEALTH CARE / PHARM 15 0.0143
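One way to narrow the result above to the single NH row is to add a second condition to filter(). The sketch below is a minimal illustration under assumptions, not the official solution. It again uses a made-up data frame, toyCases, as a hypothetical stand-in for civilCases, so the proportion it produces is not the 17.4% reported for the real data.

```r
library(dplyr)

# Hypothetical stand-in for civilCases; the rows are invented for illustration.
toyCases <- data.frame(
  DISTRICT = c("MA", "MA", "CT", "NH", "NH", "ME"),
  nature_of_suit = c("HEALTH CARE / PHARM", "HEALTH CARE / PHARM",
                     "OTHER CIVIL RIGHTS", "HEALTH CARE / PHARM",
                     "OTHER CONTRACT ACTIONS", "OTHER CIVIL RIGHTS")
)

nhHealth <- toyCases %>%
  count(DISTRICT, nature_of_suit) %>%
  # Group by district so that sum(n) is the district's total number of cases
  group_by(DISTRICT) %>%
  # Share of each suit type within its district
  mutate(prop = n / sum(n)) %>%
  # Both conditions must hold: district is NH and the suit type matches
  filter(DISTRICT == "NH", nature_of_suit == "HEALTH CARE / PHARM") %>%
  ungroup()

print(nhHealth)
```

Separating conditions with a comma inside filter() combines them with a logical AND, which is why only one row remains.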
First, divide the population into homogeneous groups (strata). Then randomly sample from each group. For example, this sampling method may be used if we want to make sure that low-, medium-, and high-income classes are equally represented in a study. This is stratified sampling.
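The two steps described above can be sketched with dplyr as follows. This is only an illustration under assumptions: the population data frame, its id and income_class columns, and the stratum size of 3 are all hypothetical and are not part of the assignment data.

```r
library(dplyr)

set.seed(1)  # Make the random draw reproducible

# Hypothetical population with unequal numbers of people per income class
population <- data.frame(
  id = 1:30,
  income_class = rep(c("low", "medium", "high"), times = c(15, 10, 5))
)

stratifiedSample <- population %>%
  # Step 1: divide the population into homogeneous groups (strata)
  group_by(income_class) %>%
  # Step 2: randomly sample the same number of people from each group
  slice_sample(n = 3) %>%
  ungroup()

# Each income class contributes exactly 3 people to the sample
print(table(stratifiedSample$income_class))
```

Drawing the same number from every stratum gives equal representation even though the classes differ in size in the population.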