#install.packages("readr")
#install.packages("summarytools")
library(readr)
library(summarytools)

Group members

Grisha R.
Ilanit
Netta

Data Source

Data from:
https://www.kaggle.com/datasets/tsaustin/us-used-car-sales-data

Motivations

Data Loading

data <- read_csv("used_car_sales.csv")

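We summarize every column with summarytools::dfSummary(), which reports each variable's type, its summary statistics or most frequent values, and the counts of valid and missing entries: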

summarytools::dfSummary(data)

Data Frame Summary
data
Dimensions: 122144 x 13
Duplicates: 0

No  Variable       Stats / Values                   Freqs (% of Valid)       Valid             Missing

1   ID             Mean (sd) : 85094.2 (47787)      122144 distinct values   122144 (100.0%)   0 (0.0%)
    [numeric]      min < med < max:
                   1 < 85555.5 < 165801
                   IQR (CV) : 82531.2 (0.6)

2   pricesold      Mean (sd) : 10808.6 (13987.3)    4424 distinct values     122144 (100.0%)   0 (0.0%)
    [numeric]      min < med < max:
                   0 < 6500 < 404990
                   IQR (CV) : 10850 (1.3)

3   yearsold       Mean (sd) : 2019.4 (0.5)         2018 :  1172 ( 1.0%)     122144 (100.0%)   0 (0.0%)
    [numeric]      min < med < max:                 2019 : 73939 (60.5%)
                   2018 < 2019 < 2020               2020 : 47033 (38.5%)
                   IQR (CV) : 1 (0)

4   zipcode        1. 92868                         3663 ( 3.0%)             121235 (99.3%)    909 (0.7%)
    [character]    2. 928                           2198 ( 1.8%)
                   3. 481                           1214 ( 1.0%)
                   4. 48150                         1150 ( 0.9%)
                   5. 77477                         1006 ( 0.8%)
                   6. 191                           720 ( 0.6%)
                   7. 330                           693 ( 0.6%)
                   8. 19114                         651 ( 0.5%)
                   9. 17319                         564 ( 0.5%)
                   10. 331                          519 ( 0.4%)
                   [ 15472 others ]                 108857 (89.8%)

5   Mileage        Mean (sd) : 1404291 (33355926)   60843 distinct values    122144 (100.0%)   0 (0.0%)
    [numeric]      min < med < max:
                   0 < 90000 < 1235668876
                   IQR (CV) : 95407.8 (23.8)

6   Make           1. Ford                          22027 (18.0%)            122144 (100.0%)   0 (0.0%)
    [character]    2. Chevrolet                     21171 (17.3%)
                   3. Toyota                        6676 ( 5.5%)
                   4. Mercedes-Benz                 6241 ( 5.1%)
                   5. Dodge                         5899 ( 4.8%)
                   6. BMW                           5128 ( 4.2%)
                   7. Jeep                          4543 ( 3.7%)
                   8. Cadillac                      3657 ( 3.0%)
                   9. Volkswagen                    3589 ( 2.9%)
                   10. Honda                        3451 ( 2.8%)
                   [ 454 others ]                   39762 (32.6%)

7   Model          1. Mustang                       4478 ( 3.7%)             121571 (99.5%)    573 (0.5%)
    [character]    2. Corvette                      3183 ( 2.6%)
                   3. F-150                         2561 ( 2.1%)
                   4. Camaro                        2164 ( 1.8%)
                   5. Other                         2020 ( 1.7%)
                   6. F-250                         1858 ( 1.5%)
                   7. Other Pickups                 1813 ( 1.5%)
                   8. Wrangler                      1721 ( 1.4%)
                   9. 3-Series                      1470 ( 1.2%)
                   10. C-10                         1451 ( 1.2%)
                   [ 4279 others ]                  98852 (81.3%)

8   Year           Mean (sd) : 3959.4 (198451.4)    148 distinct values      122144 (100.0%)   0 (0.0%)
    [numeric]      min < med < max:
                   0 < 2000 < 20140000
                   IQR (CV) : 31 (50.1)

9   Trim           1. XLT                           1477 ( 2.0%)             73241 (60.0%)     48903 (40.0%)
    [character]    2. SE                            1096 ( 1.5%)
                   3. LT                            956 ( 1.3%)
                   4. –                             854 ( 1.2%)
                   5. Limited                       814 ( 1.1%)
                   6. GT                            813 ( 1.1%)
                   7. LX                            795 ( 1.1%)
                   8. LS                            736 ( 1.0%)
                   9. Convertible                   690 ( 0.9%)
                   10. Sport                        677 ( 0.9%)
                   [ 24972 others ]                 64333 (87.8%)

10  Engine         1. 350                           2085 ( 2.2%)             95087 (77.8%)     27057 (22.2%)
    [character]    2. V8                            1894 ( 2.0%)
                   3. V6                            929 ( 1.0%)
                   4. 5.0                           612 ( 0.6%)
                   5. 5.7                           590 ( 0.6%)
                   6. 302                           538 ( 0.6%)
                   7. 6                             485 ( 0.5%)
                   8. 3.6 LITER V6 ENGINE           437 ( 0.5%)
                   9. 5.7L Gas V8                   435 ( 0.5%)
                   10. 3L V6 24V                    432 ( 0.5%)
                   [ 22391 others ]                 86650 (91.1%)

11  BodyType       1. Sedan                         18216 (18.0%)            101362 (83.0%)    20782 (17.0%)
    [character]    2. Coupe                         18046 (17.8%)
                   3. SUV                           15353 (15.1%)
                   4. Convertible                   12327 (12.2%)
                   5. Standard Cab Pickup           4289 ( 4.2%)
                   6. Crew Cab Pickup               3708 ( 3.7%)
                   7. Hatchback                     2805 ( 2.8%)
                   8. Extended Cab Pickup           2693 ( 2.7%)
                   9. Wagon                         2586 ( 2.6%)
                   10. 4dr Car                      2369 ( 2.3%)
                   [ 2321 others ]                  18970 (18.7%)

12  NumCylinders   Mean (sd) : 17586.5 (6144603)    17 distinct values       122144 (100.0%)   0 (0.0%)
    [numeric]      min < med < max:
                   0 < 6 < 2147483647
                   IQR (CV) : 4 (349.4)

13  DriveType      1. RWD                           42207 (43.4%)            97305 (79.7%)     24839 (20.3%)
    [character]    2. 4WD                           20143 (20.7%)
                   3. FWD                           17029 (17.5%)
                   4. AWD                           9256 ( 9.5%)
                   5. –                             725 ( 0.7%)
                   6. 2WD                           572 ( 0.6%)
                   7. 4x4                           336 ( 0.3%)
                   8. REAR WHEEL DRIVE              213 ( 0.2%)
                   9. Front Wheel Drive             174 ( 0.2%)
                   10. 4X4                          172 ( 0.2%)
                   [ 2806 others ]                  6478 ( 6.7%)
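
The summary above reports extreme values that are hard to read as real measurements: Year ranges from 0 to 20140000, Mileage reaches 1235668876, and NumCylinders reaches 2147483647. A minimal sketch of how such rows could be flagged before further analysis is shown here. The helper name flag_implausible and all of its bounds are our own assumptions rather than anything defined by the data source, and the toy rows are made up for illustration, reusing only the medians and extremes reported in the summary.

```r
# Flag rows whose Year, Mileage, or NumCylinders fall outside plausible bounds.
# All bounds are illustrative assumptions and should be tuned to the analysis.
flag_implausible <- function(df,
                             year_range = c(1900, 2020),
                             max_mileage = 1e6,
                             cylinder_range = c(1, 16)) {
  bad_year <- is.na(df$Year) |
    df$Year < year_range[1] | df$Year > year_range[2]
  bad_mileage <- is.na(df$Mileage) |
    df$Mileage < 0 | df$Mileage > max_mileage
  bad_cyl <- is.na(df$NumCylinders) |
    df$NumCylinders < cylinder_range[1] | df$NumCylinders > cylinder_range[2]
  # One logical column per check, plus a combined indicator.
  data.frame(bad_year, bad_mileage, bad_cyl,
             any_bad = bad_year | bad_mileage | bad_cyl)
}

# Toy rows (not real records): medians, minima, and maxima from the summary.
toy <- data.frame(
  Year = c(2000, 0, 20140000),
  Mileage = c(90000, 0, 1235668876),
  NumCylinders = c(6, 0, 2147483647)
)

flags <- flag_implausible(toy)
colSums(flags)  # number of rows failing each check
```

Applied to the loaded data, data[!flag_implausible(data)$any_bad, ] would keep only the rows that pass all three checks. Whether a value such as NumCylinders = 0 is an error or a legitimate entry is a judgment call that the bounds make explicit.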