#install.packages("readr")
#install.packages("summarytools")
library(readr)
library(summarytools)
Grisha R.
Ilanit
Netta
Data from:
https://www.kaggle.com/datasets/tsaustin/us-used-car-sales-data
We 😍 cars and business, so why not combine them in a nice data analysis project? This way we can have fun while studying advanced statistical methodologies.
We would like to investigate the effect of mileage and chronological age on car pricing. Additionally, we would like to discover which models hold their price well. 😱
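One way to frame the first question is a regression of log price on mileage and age. The sketch below is only an illustration of that idea, not the analysis used in this project: the data frame `sim` is simulated, its coefficients are arbitrary, and the `age` column (year sold minus model year) is an assumed derived variable that does not exist in the raw data.

```r
# Simulated data; the coefficients below are arbitrary and chosen only for the demo.
set.seed(1)
n <- 200
sim <- data.frame(
  Mileage = runif(n, 0, 200000),
  age     = sample(0:25, n, replace = TRUE)  # assumed: yearsold - Year
)
sim$pricesold <- exp(10 - 4e-6 * sim$Mileage - 0.05 * sim$age + rnorm(n, sd = 0.2))

# Log-price regression: each coefficient reads as an approximate
# proportional change in price per unit of the predictor.
fit <- lm(log(pricesold) ~ Mileage + age, data = sim)
coef(fit)
```

Working on the log scale keeps the model from predicting negative prices and makes the mileage and age effects comparable across cheap and expensive cars.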
data <- read_csv("used_car_sales.csv")
Here we present a summary of the data in a readable format:
summarytools::dfSummary(data)
Data Frame Summary
data
Dimensions: 122144 x 13
Duplicates: 0
| No | Variable | Stats / Values | Freqs (% of Valid) | Graph | Valid | Missing |
|---|---|---|---|---|---|---|
| 1 | ID [numeric] | Mean (sd) : 85094.2 (47787) min < med < max: 1 < 85555.5 < 165801 IQR (CV) : 82531.2 (0.6) | 122144 distinct values | . : . : . : . : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : | 122144 (100.0%) | 0 (0.0%) |
| 2 | pricesold [numeric] | Mean (sd) : 10808.6 (13987.3) min < med < max: 0 < 6500 < 404990 IQR (CV) : 10850 (1.3) | 4424 distinct values | : : : : : | 122144 (100.0%) | 0 (0.0%) |
| 3 | yearsold [numeric] | Mean (sd) : 2019.4 (0.5) min < med < max: 2018 < 2019 < 2020 IQR (CV) : 1 (0) | 2018 : 1172 ( 1.0%) 2019 : 73939 (60.5%) 2020 : 47033 (38.5%) | IIIIIIIIIIII IIIIIII | 122144 (100.0%) | 0 (0.0%) |
| 4 | zipcode [character] | 1. 92868 2. 928 3. 481 4. 48150 5. 77477 6. 191 7. 330 8. 19114 9. 17319 10. 331 [ 15472 others ] | 3663 ( 3.0%) 2198 ( 1.8%) 1214 ( 1.0%) 1150 ( 0.9%) 1006 ( 0.8%) 720 ( 0.6%) 693 ( 0.6%) 651 ( 0.5%) 564 ( 0.5%) 519 ( 0.4%) 108857 (89.8%) | | 121235 (99.3%) | 909 (0.7%) |
| 5 | Mileage [numeric] | Mean (sd) : 1404291 (33355926) min < med < max: 0 < 90000 < 1235668876 IQR (CV) : 95407.8 (23.8) | 60843 distinct values | : : : : : | 122144 (100.0%) | 0 (0.0%) |
| 6 | Make [character] | 1. Ford 2. Chevrolet 3. Toyota 4. Mercedes-Benz 5. Dodge 6. BMW 7. Jeep 8. Cadillac 9. Volkswagen 10. Honda [ 454 others ] | 22027 (18.0%) 21171 (17.3%) 6676 ( 5.5%) 6241 ( 5.1%) 5899 ( 4.8%) 5128 ( 4.2%) 4543 ( 3.7%) 3657 ( 3.0%) 3589 ( 2.9%) 3451 ( 2.8%) 39762 (32.6%) | III III I I | 122144 (100.0%) | 0 (0.0%) |
| 7 | Model [character] | 1. Mustang 2. Corvette 3. F-150 4. Camaro 5. Other 6. F-250 7. Other Pickups 8. Wrangler 9. 3-Series 10. C-10 [ 4279 others ] | 4478 ( 3.7%) 3183 ( 2.6%) 2561 ( 2.1%) 2164 ( 1.8%) 2020 ( 1.7%) 1858 ( 1.5%) 1813 ( 1.5%) 1721 ( 1.4%) 1470 ( 1.2%) 1451 ( 1.2%) 98852 (81.3%) | | 121571 (99.5%) | 573 (0.5%) |
| 8 | Year [numeric] | Mean (sd) : 3959.4 (198451.4) min < med < max: 0 < 2000 < 20140000 IQR (CV) : 31 (50.1) | 148 distinct values | : : : : : | 122144 (100.0%) | 0 (0.0%) |
| 9 | Trim [character] | 1. XLT 2. SE 3. LT 4. – 5. Limited 6. GT 7. LX 8. LS 9. Convertible 10. Sport [ 24972 others ] | 1477 ( 2.0%) 1096 ( 1.5%) 956 ( 1.3%) 854 ( 1.2%) 814 ( 1.1%) 813 ( 1.1%) 795 ( 1.1%) 736 ( 1.0%) 690 ( 0.9%) 677 ( 0.9%) 64333 (87.8%) | | 73241 (60.0%) | 48903 (40.0%) |
| 10 | Engine [character] | 1. 350 2. V8 3. V6 4. 5.0 5. 5.7 6. 302 7. 6 8. 3.6 LITER V6 ENGINE 9. 5.7L Gas V8 10. 3L V6 24V [ 22391 others ] | 2085 ( 2.2%) 1894 ( 2.0%) 929 ( 1.0%) 612 ( 0.6%) 590 ( 0.6%) 538 ( 0.6%) 485 ( 0.5%) 437 ( 0.5%) 435 ( 0.5%) 432 ( 0.5%) 86650 (91.1%) | | 95087 (77.8%) | 27057 (22.2%) |
| 11 | BodyType [character] | 1. Sedan 2. Coupe 3. SUV 4. Convertible 5. Standard Cab Pickup 6. Crew Cab Pickup 7. Hatchback 8. Extended Cab Pickup 9. Wagon 10. 4dr Car [ 2321 others ] | 18216 (18.0%) 18046 (17.8%) 15353 (15.1%) 12327 (12.2%) 4289 ( 4.2%) 3708 ( 3.7%) 2805 ( 2.8%) 2693 ( 2.7%) 2586 ( 2.6%) 2369 ( 2.3%) 18970 (18.7%) | III III III II | 101362 (83.0%) | 20782 (17.0%) |
| 12 | NumCylinders [numeric] | Mean (sd) : 17586.5 (6144603) min < med < max: 0 < 6 < 2147483647 IQR (CV) : 4 (349.4) | 17 distinct values | : : : : : | 122144 (100.0%) | 0 (0.0%) |
| 13 | DriveType [character] | 1. RWD 2. 4WD 3. FWD 4. AWD 5. – 6. 2WD 7. 4x4 8. REAR WHEEL DRIVE 9. Front Wheel Drive 10. 4X4 [ 2806 others ] | 42207 (43.4%) 20143 (20.7%) 17029 (17.5%) 9256 ( 9.5%) 725 ( 0.7%) 572 ( 0.6%) 336 ( 0.3%) 213 ( 0.2%) 174 ( 0.2%) 172 ( 0.2%) 6478 ( 6.7%) | IIIIIIII IIII III I | 97305 (79.7%) | 24839 (20.3%) |
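The summary shows values that cannot be real: a maximum `Year` of 20140000, a maximum `Mileage` above 1.2 billion, and a minimum `pricesold` of 0. A minimal cleaning sketch follows, assuming such rows should be dropped before modeling. The helper `clean_cars()`, its bounds (`max_mileage`, `min_year`), and the derived `age` column are hypothetical choices for illustration, and the `toy` rows are made up so the block runs without the CSV.

```r
# Toy rows standing in for `data`; values are made up for illustration only.
toy <- data.frame(
  pricesold = c(6500, 0, 12000, 8000),
  yearsold  = c(2019, 2020, 2019, 2018),
  Mileage   = c(90000, 45000, 1235668876, 120000),
  Year      = c(2000, 2015, 2010, 20140000)
)

# Hypothetical plausibility bounds (assumptions, not taken from the source).
clean_cars <- function(df, max_mileage = 500000, min_year = 1900) {
  keep <- df$pricesold > 0 &            # drop zero-price listings
    df$Mileage <= max_mileage &         # drop implausible odometer readings
    df$Year >= min_year &               # drop missing or zero model years
    df$Year <= df$yearsold + 1          # allow next-year models, nothing later
  out <- df[keep, , drop = FALSE]
  out$age <- out$yearsold - out$Year    # chronological age at time of sale
  out
}

cleaned <- clean_cars(toy)
cleaned
```

On the real data the same call would be `clean_cars(data)`; the thresholds should be checked against the distribution of each column before they are trusted.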