Metadata

Description

Dataset name: STAT 8051 Kaggle Competition Codebook - Group 4

Basic summary statistics and codebook, excluding the ID variable, for the training dataset from the 2020 Travelers Modeling Competition - Predicting Claim Cost.

Metadata for search engines

  • Date published: 2020-12-13

  • Creator: Linh Nguyen

  • Keywords: veh_value, exposure, veh_body, veh_age, gender, area, dr_age, claim_ind, claim_count, claim_cost

Variables

veh_value

Market value of the vehicle in $10,000’s
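Because veh_value is recorded in units of $10,000, a value has to be multiplied by 10,000 to read it in dollars: the median of 1.6 corresponds to $16,000 and the maximum of 26 to $260,000. A minimal sketch of that conversion; the helper name `to_dollars` is hypothetical.

```python
# Hypothetical helper: veh_value is stored in units of $10,000.
UNIT_DOLLARS = 10_000


def to_dollars(veh_value: float) -> float:
    """Convert a veh_value (in $10,000's) to a dollar amount."""
    return veh_value * UNIT_DOLLARS


# Summary statistics reported in this codebook for veh_value.
for name, value in [("min", 0), ("median", 1.6), ("max", 26)]:
    print(f"{name}: ${to_dollars(value):,.0f}")
```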

Distribution

[Figure: Distribution of values for veh_value]

0 missing values.

Summary statistics

| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | hist |
|:--|:--|:--|--:|--:|--:|--:|--:|--:|--:|:--|
| veh_value | Market value of the vehicle in $10,000’s | numeric | 0 | 1 | 0 | 1.6 | 26 | 1.866813 | 1.283358 | ▇▁▁▁▁ |
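The hist column is a text sparkline: the values are split into bins and each bin's count is drawn as a Unicode block character whose height grows with the count. The column layout (n_missing, complete_rate, hist, top_counts) appears to follow the skimr R package's summaries; the sketch below illustrates the idea only, assuming equal-width bins and a six-level set of block characters consistent with the ones in these tables. It is not skimr's exact implementation, whose binning rules may differ, and `spark_hist` is a hypothetical name.

```python
# Six block characters (U+2581..U+2587, skipping U+2584), lowest bar first.
BARS = ["\u2581", "\u2582", "\u2583", "\u2585", "\u2586", "\u2587"]


def spark_hist(values, n_bins=5):
    """Return a text sparkline of equal-width bin counts (illustrative sketch)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant data
    counts = [0] * n_bins
    for v in values:
        # Clamp the maximum value into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    top = max(counts)
    # Scale each count to an index into BARS; counts near zero get the lowest bar.
    return "".join(BARS[round(c / top * (len(BARS) - 1))] for c in counts)


print(spark_hist([0, 0, 0, 0, 1, 2, 9]))  # ▇▂▁▁▂
```

A right-skewed variable such as veh_value, with most values near the minimum and a long upper tail, produces the same tall-first-bar pattern (▇▁▁▁▁) seen in its summary row.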

exposure

The basic unit of risk underlying an insurance premium

Distribution

[Figure: Distribution of values for exposure]

0 missing values.

Summary statistics

| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | hist |
|:--|:--|:--|--:|--:|--:|--:|--:|--:|--:|:--|
| exposure | The basic unit of risk underlying an insurance premium | numeric | 0 | 1 | 0.0028 | 0.46 | 1 | 0.4778526 | 0.2846047 | ▇▇▇▆▆ |
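Exposure is commonly used as the denominator when turning claim counts into rates. The codebook does not state its unit; the range 0.0028 to 1 is consistent with the fraction of a policy year during which coverage was in force, but treat that as an assumption. A minimal sketch of a pooled claim frequency; the records below are made up for illustration and are not taken from the competition data, and `claim_frequency` is a hypothetical name.

```python
# Illustrative records only; field names follow this codebook.
policies = [
    {"exposure": 0.5, "claim_count": 1},
    {"exposure": 1.0, "claim_count": 0},
    {"exposure": 0.25, "claim_count": 0},
]


def claim_frequency(records):
    """Pooled claim frequency: total claims divided by total exposure."""
    total_claims = sum(r["claim_count"] for r in records)
    total_exposure = sum(r["exposure"] for r in records)
    return total_claims / total_exposure


print(claim_frequency(policies))  # 1 / 1.75, about 0.571
```

Pooling (summing claims and exposure before dividing) keeps records with very short exposure from dominating the estimate, which averaging per-record ratios would not.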

veh_body

Type of vehicles

Distribution

[Figure: Distribution of values for veh_body]

0 missing values.

Summary statistics

| name | label | data_type | ordered | value_labels | n_missing | complete_rate | n_unique | top_counts |
|:--|:--|:--|:--|:--|--:|--:|--:|:--|
| veh_body | Type of vehicles | factor | FALSE | 1. BUS, 2. CONVT, 3. COUPE, 4. HBACK, 5. HDTOP, 6. MCARA, 7. MIBUS, 8. PANVN, 9. RDSTR, 10. SEDAN, 11. STNWG, 12. TRUCK, 13. UTE | 0 | 1 | 13 | SED: 7410, HBA: 6347, STN: 5348, UTE: 1529 |

veh_age

Age of vehicles

Distribution

[Figure: Distribution of values for veh_age]

0 missing values.

Summary statistics

| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | n_value_labels | hist |
|:--|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|:--|
| veh_age | Age of vehicles | haven_labelled | 0 | 1 | 1 | 3 | 4 | 2.676382 | 1.067311 | 2 | ▅▁▆▁▁▇▁▇ |

Value labels

Response choices

| name | value |
|:--|--:|
| Youngest | 1 |
| Oldest | 4 |
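veh_age is a labelled numeric (haven_labelled) with only two value labels (n_value_labels = 2): 1 = Youngest and 4 = Oldest. The intermediate codes, presumably 2 and 3, carry no label, so a display of the codes has to fall back to the bare number for them. A minimal sketch with a hypothetical helper `describe`:

```python
# Value labels for veh_age as listed in this codebook; 2 and 3 are unlabelled.
VEH_AGE_LABELS = {1: "Youngest", 4: "Oldest"}


def describe(code, labels):
    """Return 'code (label)' if the code is labelled, else the bare code as text."""
    label = labels.get(code)
    return f"{code} ({label})" if label else str(code)


print([describe(c, VEH_AGE_LABELS) for c in range(1, 5)])
# ['1 (Youngest)', '2', '3', '4 (Oldest)']
```

The same pattern applies to dr_age, where only 1 = Young and 6 = Old are labelled.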

gender

Gender of driver

Distribution

[Figure: Distribution of values for gender]

0 missing values.

Summary statistics

| name | label | data_type | ordered | value_labels | n_missing | complete_rate | n_unique | top_counts |
|:--|:--|:--|:--|:--|--:|--:|--:|:--|
| gender | Gender of driver | factor | FALSE | 1. F, 2. M | 0 | 1 | 2 | F: 12850, M: 9760 |
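Because gender has only two levels and no missing values, its top counts add up to the number of rows in the training data: 12,850 + 9,760 = 22,610. The sketch below does that arithmetic and converts the counts to shares.

```python
# Top counts for gender as reported in this codebook.
gender_counts = {"F": 12850, "M": 9760}

# Two levels and no missing values, so the counts cover every row.
n_rows = sum(gender_counts.values())
shares = {level: count / n_rows for level, count in gender_counts.items()}

print(n_rows)  # 22610
print({k: round(v, 3) for k, v in shares.items()})  # {'F': 0.568, 'M': 0.432}
```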

area

Driving area of residence

Distribution

[Figure: Distribution of values for area]

0 missing values.

Summary statistics

| name | label | data_type | ordered | value_labels | n_missing | complete_rate | n_unique | top_counts |
|:--|:--|:--|:--|:--|--:|--:|--:|:--|
| area | Driving area of residence | factor | FALSE | 1. A, 2. B, 3. C, 4. D, 5. E, 6. F | 0 | 1 | 6 | C: 6846, A: 5436, B: 4445, D: 2723 |

dr_age

Driver’s age category

Distribution

[Figure: Distribution of values for dr_age]

0 missing values.

Summary statistics

| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | n_value_labels | hist |
|:--|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|:--|
| dr_age | Driver’s age category | haven_labelled | 0 | 1 | 1 | 3 | 6 | 3.488147 | 1.426596 | 2 | ▃▆▁▇▇▁▅▃ |

Value labels

Response choices

| name | value |
|:--|--:|
| Young | 1 |
| Old | 6 |

claim_ind

Indicator of claim

Distribution

[Figure: Distribution of values for claim_ind]

0 missing values.

Summary statistics

| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | n_value_labels | hist |
|:--|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|:--|
| claim_ind | Indicator of claim | haven_labelled | 0 | 1 | 0 | 0 | 1 | 0.0678461 | 0.2514872 | 2 | ▇▁▁▁▁▁▁▁ |

Value labels

Response choices

| name | value |
|:--|--:|
| No | 0 |
| Yes | 1 |

claim_count

The number of claims

Distribution

[Figure: Distribution of values for claim_count]

0 missing values.

Summary statistics

| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | hist |
|:--|:--|:--|--:|--:|--:|--:|--:|--:|--:|:--|
| claim_count | The number of claims | numeric | 0 | 1 | 0 | 0 | 3 | 0.0720035 | 0.2745954 | ▇▁▁▁▁ |
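Because claim_ind is a 0/1 variable, its mean is the share of policies with a claim, and multiplying the reported means by the number of rows recovers approximate totals. Taking n = 22,610 (the F and M counts of gender, which has no missing values), that gives about 1,534 policies with a claim and about 1,628 claims in total. The helper `claim_indicator` is hypothetical and assumes claim_ind is 1 exactly when claim_count is positive; the codebook only labels the variable "Indicator of claim", but the two totals are consistent with that coding.

```python
N_ROWS = 22_610  # F + M counts from the gender summary

# Means reported in this codebook.
mean_claim_ind = 0.0678461
mean_claim_count = 0.0720035

policies_with_claim = round(mean_claim_ind * N_ROWS)
total_claims = round(mean_claim_count * N_ROWS)


def claim_indicator(claim_count: int) -> int:
    """Assumed coding: 1 (Yes) if the policy has at least one claim, else 0 (No)."""
    return 1 if claim_count > 0 else 0


print(policies_with_claim, total_claims)  # 1534 1628
```

The difference of 94 claims would then come from policies with two or three claims (the maximum of claim_count is 3).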

claim_cost

Claim amount

Distribution

[Figure: Distribution of values for claim_cost]

0 missing values.

Summary statistics

| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | hist |
|:--|:--|:--|--:|--:|--:|--:|--:|--:|--:|:--|
| claim_cost | Claim amount | numeric | 0 | 1 | 0 | 0 | 57896 | 140.0076 | 1123.338 | ▇▁▁▁▁ |
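Dividing the mean of claim_cost by the mean of claim_count gives the average cost per claim (claim severity), because the common factor n cancels between total cost and total claims. The sketch assumes claim_cost is in dollars, which the codebook does not state; with the reported means the result is roughly 1,944 per claim.

```python
# Means reported in this codebook (claim_cost unit assumed to be dollars).
mean_claim_cost = 140.0076
mean_claim_count = 0.0720035

# (sum of cost / n) / (sum of claims / n) = sum of cost / sum of claims.
severity = mean_claim_cost / mean_claim_count
print(f"{severity:,.2f}")
```

Because the published means are rounded, this is an approximation of the severity that the raw data would give.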

Missingness report

None of the ten variables has missing values (n_missing = 0 and complete_rate = 1 throughout).


JSON-LD metadata

The following JSON-LD can be found by search engines if you share this codebook publicly on the web.

{
  "name": "STAT 8051 Kaggle Competition Codebook - Group 4",
  "description": "Basic summary statistics and codebook, excluding ID variable, for the training dataset from the 2020 Travelers Modeling Competition - Predicting Claim Cost\n\n\n## Table of variables\nThis table contains variable names, labels, and number of missing values.\nSee the complete codebook for more.\n\n|name        |label                                                  | n_missing|\n|:-----------|:------------------------------------------------------|---------:|\n|veh_value   |Market value of the vehicle in $10,000’s               |         0|\n|exposure    |The basic unit of risk underlying an insurance premium |         0|\n|veh_body    |Type of vehicles                                       |         0|\n|veh_age     |Age of vehicles                                        |         0|\n|gender      |Gender of driver                                       |         0|\n|area        |Driving area of residence                              |         0|\n|dr_age      |Driver’s age category                                  |         0|\n|claim_ind   |Indicator of claim                                     |         0|\n|claim_count |The number of claims                                   |         0|\n|claim_cost  |Claim amount                                           |         0|\n\n### Note\nThis dataset was automatically described using the [codebook R package](https://rubenarslan.github.io/codebook/) (version 0.9.2).",
  "creator": "Linh Nguyen",
  "datePublished": "2020-12-13",
  "keywords": ["veh_value", "exposure", "veh_body", "veh_age", "gender", "area", "dr_age", "claim_ind", "claim_count", "claim_cost"],
  "@context": "http://schema.org/",
  "@type": "Dataset",
  "variableMeasured": [
    {
      "name": "veh_value",
      "description": "Market value of the vehicle in $10,000’s",
      "@type": "propertyValue"
    },
    {
      "name": "exposure",
      "description": "The basic unit of risk underlying an insurance premium",
      "@type": "propertyValue"
    },
    {
      "name": "veh_body",
      "description": "Type of vehicles",
      "value": "1. BUS,\n2. CONVT,\n3. COUPE,\n4. HBACK,\n5. HDTOP,\n6. MCARA,\n7. MIBUS,\n8. PANVN,\n9. RDSTR,\n10. SEDAN,\n11. STNWG,\n12. TRUCK,\n13. UTE",
      "@type": "propertyValue"
    },
    {
      "name": "veh_age",
      "description": "Age of vehicles",
      "value": "1. Youngest,\n4. Oldest",
      "maxValue": 4,
      "minValue": 1,
      "@type": "propertyValue"
    },
    {
      "name": "gender",
      "description": "Gender of driver",
      "value": "1. F,\n2. M",
      "@type": "propertyValue"
    },
    {
      "name": "area",
      "description": "Driving area of residence",
      "value": "1. A,\n2. B,\n3. C,\n4. D,\n5. E,\n6. F",
      "@type": "propertyValue"
    },
    {
      "name": "dr_age",
      "description": "Driver’s age category",
      "value": "1. Young,\n6. Old",
      "maxValue": 6,
      "minValue": 1,
      "@type": "propertyValue"
    },
    {
      "name": "claim_ind",
      "description": "Indicator of claim",
      "value": "0. No,\n1. Yes",
      "maxValue": 1,
      "minValue": 0,
      "@type": "propertyValue"
    },
    {
      "name": "claim_count",
      "description": "The number of claims",
      "@type": "propertyValue"
    },
    {
      "name": "claim_cost",
      "description": "Claim amount",
      "@type": "propertyValue"
    }
  ]
}
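The JSON-LD block is ordinary JSON, so any JSON parser can read it. The Python sketch below uses the standard-library json module on a trimmed copy (two of the ten variableMeasured entries) and lists each variable with its optional minValue/maxValue bounds; the function name `variable_table` is hypothetical.

```python
import json

# A trimmed copy of the JSON-LD above (two variables only, for brevity).
JSON_LD = """
{
  "@context": "http://schema.org/",
  "@type": "Dataset",
  "name": "STAT 8051 Kaggle Competition Codebook - Group 4",
  "variableMeasured": [
    {"name": "veh_age", "description": "Age of vehicles",
     "value": "1. Youngest,\\n4. Oldest", "maxValue": 4, "minValue": 1,
     "@type": "propertyValue"},
    {"name": "claim_cost", "description": "Claim amount",
     "@type": "propertyValue"}
  ]
}
"""


def variable_table(meta):
    """Return (name, description, minValue, maxValue) for each measured variable."""
    return [
        (v["name"], v["description"], v.get("minValue"), v.get("maxValue"))
        for v in meta["variableMeasured"]
    ]


meta = json.loads(JSON_LD)
for row in variable_table(meta):
    print(row)
# ('veh_age', 'Age of vehicles', 1, 4)
# ('claim_cost', 'Claim amount', None, None)
```

Using `dict.get` for the bounds matters because only the labelled variables (veh_age, dr_age, claim_ind) carry minValue and maxValue in this metadata.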