Dataset name: STAT 8051 Kaggle Competition Codebook - Group 4
Basic summary statistics and codebook, excluding ID variable, for the training dataset from the 2020 Travelers Modeling Competition - Predicting Claim Cost
Metadata for search engines
Date published: 2020-12-13
Creator:
| name | value |
|---|---|
| 1 | Linh Nguyen |
# Variables
Market value of the vehicle in $10,000’s
Distribution of values for veh_value
0 missing values.
| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| veh_value | Market value of the vehicle in $10,000’s | numeric | 0 | 1 | 0 | 1.6 | 26 | 1.866813 | 1.283358 | ▇▁▁▁▁ |
The basic unit of risk underlying an insurance premium
Distribution of values for exposure
0 missing values.
| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| exposure | The basic unit of risk underlying an insurance premium | numeric | 0 | 1 | 0.0028 | 0.46 | 1 | 0.4778526 | 0.2846047 | ▇▇▇▆▆ |
Type of vehicles
Distribution of values for veh_body
0 missing values.
| name | label | data_type | ordered | value_labels | n_missing | complete_rate | n_unique | top_counts |
|---|---|---|---|---|---|---|---|---|
| veh_body | Type of vehicles | factor | FALSE | 1. BUS, 2. CONVT, 3. COUPE, 4. HBACK, 5. HDTOP, 6. MCARA, 7. MIBUS, 8. PANVN, 9. RDSTR, 10. SEDAN, 11. STNWG, 12. TRUCK, 13. UTE | 0 | 1 | 13 | SED: 7410, HBA: 6347, STN: 5348, UTE: 1529 |
Age of vehicles
Distribution of values for veh_age
0 missing values.
| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | n_value_labels | hist |
|---|---|---|---|---|---|---|---|---|---|---|---|
| veh_age | Age of vehicles | haven_labelled | 0 | 1 | 1 | 3 | 4 | 2.676382 | 1.067311 | 2 | ▅▁▆▁▁▇▁▇ |
| name | value |
|---|---|
| Youngest | 1 |
| Oldest | 4 |
Gender of driver
Distribution of values for gender
0 missing values.
| name | label | data_type | ordered | value_labels | n_missing | complete_rate | n_unique | top_counts |
|---|---|---|---|---|---|---|---|---|
| gender | Gender of driver | factor | FALSE | 1. F, 2. M | 0 | 1 | 2 | F: 12850, M: 9760 |
Driving area of residence
Distribution of values for area
0 missing values.
| name | label | data_type | ordered | value_labels | n_missing | complete_rate | n_unique | top_counts |
|---|---|---|---|---|---|---|---|---|
| area | Driving area of residence | factor | FALSE | 1. A, 2. B, 3. C, 4. D, 5. E, 6. F | 0 | 1 | 6 | C: 6846, A: 5436, B: 4445, D: 2723 |
Driver’s age category
Distribution of values for dr_age
0 missing values.
| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | n_value_labels | hist |
|---|---|---|---|---|---|---|---|---|---|---|---|
| dr_age | Driver’s age category | haven_labelled | 0 | 1 | 1 | 3 | 6 | 3.488147 | 1.426596 | 2 | ▃▆▁▇▇▁▅▃ |
| name | value |
|---|---|
| Young | 1 |
| Old | 6 |
Indicator of claim
Distribution of values for claim_ind
0 missing values.
| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | n_value_labels | hist |
|---|---|---|---|---|---|---|---|---|---|---|---|
| claim_ind | Indicator of claim | haven_labelled | 0 | 1 | 0 | 0 | 1 | 0.0678461 | 0.2514872 | 2 | ▇▁▁▁▁▁▁▁ |
| name | value |
|---|---|
| No | 0 |
| Yes | 1 |
The number of claims
Distribution of values for claim_count
0 missing values.
| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| claim_count | The number of claims | numeric | 0 | 1 | 0 | 0 | 3 | 0.0720035 | 0.2745954 | ▇▁▁▁▁ |
Claim amount
Distribution of values for claim_cost
0 missing values.
| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| claim_cost | Claim amount | numeric | 0 | 1 | 0 | 0 | 57896 | 140.0076 | 1123.338 | ▇▁▁▁▁ |
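The codebook never states the number of rows, but it can be recovered from the tables above. The sketch below is a rough back-of-the-envelope check in Python, not part of the original analysis. It sums the two gender counts to get the sample size, then multiplies the printed means by that size to recover column totals. The variable names are illustrative, and the average cost per policy with a claim assumes that claim_cost is 0 whenever claim_ind is 0, which the codebook does not confirm.

```python
# Summary statistics copied from the codebook tables above.
gender_counts = {"F": 12850, "M": 9760}  # both levels of gender are listed
mean_claim_ind = 0.0678461
mean_claim_count = 0.0720035
mean_claim_cost = 140.0076
mean_exposure = 0.4778526

# gender has no missing values and only two levels, so the counts sum to n.
n_rows = sum(gender_counts.values())

# A mean times n recovers the column total, up to rounding in the printed mean.
n_with_claim = round(mean_claim_ind * n_rows)
n_claims = round(mean_claim_count * n_rows)
total_exposure = mean_exposure * n_rows
total_cost = mean_claim_cost * n_rows

# Assumption (not stated in the codebook): claim_cost is 0 when claim_ind is 0.
avg_cost_per_claimant = total_cost / n_with_claim

# Claims per unit of exposure, a common frequency measure.
claims_per_exposure = n_claims / total_exposure

print(n_rows, n_with_claim, n_claims)  # 22610 1534 1628
print(round(avg_cost_per_claimant, 2), round(claims_per_exposure, 4))
```

The products land very close to whole numbers (about 1534.0003 and 1627.9991 before rounding), which suggests the sample size of 22,610 is consistent with the printed means.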
JSON-LD metadata
The following JSON-LD can be found by search engines, if you share this codebook publicly on the web.
{
"name": "STAT 8051 Kaggle Competition Codebook - Group 4",
"description": "Basic summary statistics and codebook, excluding ID variable, for the training dataset from the 2020 Travelers Modeling Competition - Predicting Claim Cost\n\n\n## Table of variables\nThis table contains variable names, labels, and number of missing values.\nSee the complete codebook for more.\n\n|name |label | n_missing|\n|:-----------|:------------------------------------------------------|---------:|\n|veh_value |Market value of the vehicle in $10,000’s | 0|\n|exposure |The basic unit of risk underlying an insurance premium | 0|\n|veh_body |Type of vehicles | 0|\n|veh_age |Age of vehicles | 0|\n|gender |Gender of driver | 0|\n|area |Driving area of residence | 0|\n|dr_age |Driver’s age category | 0|\n|claim_ind |Indicator of claim | 0|\n|claim_count |The number of claims | 0|\n|claim_cost |Claim amount | 0|\n\n### Note\nThis dataset was automatically described using the [codebook R package](https://rubenarslan.github.io/codebook/) (version 0.9.2).",
"creator": "Linh Nguyen",
"datePublished": "2020-12-13",
"keywords": ["veh_value", "exposure", "veh_body", "veh_age", "gender", "area", "dr_age", "claim_ind", "claim_count", "claim_cost"],
"@context": "http://schema.org/",
"@type": "Dataset",
"variableMeasured": [
{
"name": "veh_value",
"description": "Market value of the vehicle in $10,000’s",
"@type": "propertyValue"
},
{
"name": "exposure",
"description": "The basic unit of risk underlying an insurance premium",
"@type": "propertyValue"
},
{
"name": "veh_body",
"description": "Type of vehicles",
"value": "1. BUS,\n2. CONVT,\n3. COUPE,\n4. HBACK,\n5. HDTOP,\n6. MCARA,\n7. MIBUS,\n8. PANVN,\n9. RDSTR,\n10. SEDAN,\n11. STNWG,\n12. TRUCK,\n13. UTE",
"@type": "propertyValue"
},
{
"name": "veh_age",
"description": "Age of vehicles",
"value": "1. Youngest,\n4. Oldest",
"maxValue": 4,
"minValue": 1,
"@type": "propertyValue"
},
{
"name": "gender",
"description": "Gender of driver",
"value": "1. F,\n2. M",
"@type": "propertyValue"
},
{
"name": "area",
"description": "Driving area of residence",
"value": "1. A,\n2. B,\n3. C,\n4. D,\n5. E,\n6. F",
"@type": "propertyValue"
},
{
"name": "dr_age",
"description": "Driver’s age category",
"value": "1. Young,\n6. Old",
"maxValue": 6,
"minValue": 1,
"@type": "propertyValue"
},
{
"name": "claim_ind",
"description": "Indicator of claim",
"value": "0. No,\n1. Yes",
"maxValue": 1,
"minValue": 0,
"@type": "propertyValue"
},
{
"name": "claim_count",
"description": "The number of claims",
"@type": "propertyValue"
},
{
"name": "claim_cost",
"description": "Claim amount",
"@type": "propertyValue"
}
]
}
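As a minimal sketch of how this metadata could be consumed, the Python snippet below parses a trimmed excerpt of the JSON-LD above with the standard `json` module and converts the `value` strings into code-to-label mappings. The helper name `parse_value_labels` and the `variables` dictionary are hypothetical names introduced for illustration, and the excerpt keeps only three of the ten variables for brevity.

```python
import json

# Excerpt of the JSON-LD above, trimmed to three variables for brevity.
jsonld_text = """
{
  "name": "STAT 8051 Kaggle Competition Codebook - Group 4",
  "@context": "http://schema.org/",
  "@type": "Dataset",
  "variableMeasured": [
    {"name": "veh_value",
     "description": "Market value of the vehicle in $10,000’s",
     "@type": "propertyValue"},
    {"name": "veh_age",
     "description": "Age of vehicles",
     "value": "1. Youngest,\\n4. Oldest",
     "maxValue": 4, "minValue": 1,
     "@type": "propertyValue"},
    {"name": "claim_ind",
     "description": "Indicator of claim",
     "value": "0. No,\\n1. Yes",
     "maxValue": 1, "minValue": 0,
     "@type": "propertyValue"}
  ]
}
"""


def parse_value_labels(value):
    # Hypothetical helper: split "code. label" pairs separated by a comma
    # and a newline, returning a dict that maps integer codes to labels.
    labels = {}
    for item in value.split(",\n"):
        code, _, label = item.partition(". ")
        labels[int(code)] = label.strip()
    return labels


dataset = json.loads(jsonld_text)

# Collect each variable's description and value labels (empty if it has none).
variables = {}
for var in dataset["variableMeasured"]:
    variables[var["name"]] = {
        "description": var["description"],
        "labels": parse_value_labels(var["value"]) if "value" in var else {},
    }

for name, info in variables.items():
    print(name, info["labels"])
```

Variables without a `value` entry, such as veh_value, end up with an empty label mapping, which mirrors how the codebook lists value labels only for factor and labelled columns.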