We can think of the path a passenger takes as two functions of \(t\). That is, \(\text{path}(t)=(x(t),y(t))\). Then what we really would like to know is
Randomness enters through the \((x,y)\) coordinates: \(x(t)\mid x(t-1)\sim N(x(t-1)+\mu,\sigma^2)\) and, similarly, \(y(t)\mid y(t-1)\sim N(y(t-1)+\mu,\sigma^2)\). Each coordinate is therefore its last value plus a drift \(\mu\) plus Gaussian noise (a random walk with drift). It would be interesting to see whether the \(x\) and \(y\) increments are correlated.
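A minimal simulation sketch of this random walk, using only the Python standard library. The parameter `rho`, a correlation between the \(x\) and \(y\) increments, is a hypothetical addition for probing the question above; `rho=0` gives independent coordinates. `simulate_path` and `increment_correlation` are illustrative names, not from any library.

```python
import math
import random
from typing import List, Tuple


def simulate_path(n_steps: int, mu: float, sigma: float, rho: float = 0.0,
                  start: Tuple[float, float] = (0.0, 0.0),
                  seed: int = 0) -> List[Tuple[float, float]]:
    """Simulate x(t) ~ N(x(t-1) + mu, sigma^2) and y(t) ~ N(y(t-1) + mu, sigma^2).

    `rho` is a hypothetical correlation between the x and y increments."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must be in [-1, 1]")
    rng = random.Random(seed)
    x, y = start
    path = [(x, y)]
    for _ in range(n_steps):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Build two standard normals whose correlation is rho.
        e_x = z1
        e_y = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        # New position = last position + drift + noise.
        x += mu + sigma * e_x
        y += mu + sigma * e_y
        path.append((x, y))
    return path


def increment_correlation(path: List[Tuple[float, float]]) -> float:
    """Sample Pearson correlation between successive x and y increments."""
    dx = [b[0] - a[0] for a, b in zip(path, path[1:])]
    dy = [b[1] - a[1] for a, b in zip(path, path[1:])]
    mx, my = sum(dx) / len(dx), sum(dy) / len(dy)
    cov = sum((u - mx) * (v - my) for u, v in zip(dx, dy))
    vx = sum((u - mx) ** 2 for u in dx)
    vy = sum((v - my) ** 2 for v in dy)
    return cov / math.sqrt(vx * vy)
```

Fitting goes the other way: under this model the sample mean of the increments estimates \(\mu\), and `increment_correlation` applied to observed traces would indicate whether the two coordinates can be modeled separately.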
The accident rate and the path \((x,y)\) also depend on other cars, weather, and the geography of the area. This is where spatial statistics can help: it models data indexed by location, where observations at nearby \((x,y)\) coordinates tend to be correlated (spatial autocorrelation).
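One standard measure of spatial autocorrelation is Moran's I. The sketch below assumes accident counts have already been binned onto a rectangular grid and uses binary rook (up/down/left/right) neighbor weights; both choices are assumptions for illustration. Values near +1 indicate that similar counts cluster, values near -1 indicate a checkerboard-like alternation, and with no spatial structure the expected value is \(-1/(N-1)\) for \(N\) cells.

```python
from typing import Sequence


def morans_i(grid: Sequence[Sequence[float]]) -> float:
    """Moran's I for values on a rectangular grid with binary rook weights
    (w_ij = 1 if cells i and j share an edge, else 0)."""
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(v for row in grid for v in row) / n
    z = [[v - mean for v in row] for row in grid]  # deviations from the mean
    denom = sum(v * v for row in z for v in row)
    if denom == 0:
        raise ValueError("Moran's I is undefined for a constant grid")
    num = 0.0    # sum over ordered neighbor pairs of z_i * z_j
    w_total = 0  # sum of weights = number of ordered neighbor pairs
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    num += z[r][c] * z[rr][cc]
                    w_total += 1
    return (n / w_total) * (num / denom)
```

For example, a 2x2 checkerboard `[[1, -1], [-1, 1]]` gives -1, while a grid whose counts increase row by row gives a positive value.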
Additionally, what we really care about is the riskiness profile of a driver. That is, we create a risk score, where risk is defined as the accident rate. We can think of instantaneous risk as a function of \((x,y)\), that is, \(\text{risk}_{\text{inst}}(t)=f(x(t),y(t))\), and of the overall risk of a trip as \(\text{risk}=\int_{t_{\text{start}}}^{t_{\text{end}}} f(x(t),y(t))\,dt\), where \((x(t_{\text{start}}),y(t_{\text{start}}))\) and \((x(t_{\text{end}}),y(t_{\text{end}}))\) are the start and end points. Since insurance is priced on a per-exposure basis, we would like to model risk per mile. We would then need a separate model to estimate the number of miles driven, which (treating the per-mile rate and the mileage as independent) gives:
\[E[\text{number of accidents}]=E\left[\frac{\text{number of accidents}}{\text{mile}}\right]\cdot E\left[\frac{\text{total miles}}{\text{time}}\right]\cdot\text{time}.\]
The first model (accidents per mile) is hard. The second (miles driven) is easy: multiply a trailing average of miles per day by the number of remaining days.
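A minimal sketch of both pieces, assuming the trip is sampled at a fixed interval `dt`, coordinates are in miles, and `f` is some already-fitted instantaneous-risk function (hypothetical here). The time integral is approximated with the trapezoidal rule; the mileage model is the trailing average described above, with a 30-day default window chosen arbitrarily. All function names are illustrative.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]


def trip_risk(path: Sequence[Point], dt: float,
              f: Callable[[float, float], float]) -> float:
    """Trapezoidal approximation of the integral of f(x(t), y(t)) dt
    for positions sampled every dt time units."""
    values = [f(x, y) for x, y in path]
    return sum(0.5 * (a + b) * dt for a, b in zip(values, values[1:]))


def trip_miles(path: Sequence[Point]) -> float:
    """Length of the sampled path, assuming coordinates are in miles."""
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))


def expected_accidents(accidents_per_mile: float, daily_miles: Sequence[float],
                       remaining_days: float, window: int = 30) -> float:
    """(accidents per mile) * (trailing average miles per day) * (remaining days)."""
    recent = list(daily_miles)[-window:]
    avg_miles_per_day = sum(recent) / len(recent)
    return accidents_per_mile * avg_miles_per_day * remaining_days
```

Dividing `trip_risk` by `trip_miles` gives a per-mile risk for a single trip, which is the quantity the first model would need to predict.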
To build a list of risk factors, search for the following:
Causes of accidents.
Map of car accidents.
Time Since Last Drive: proxy for sleepiness, lateness, or busyness.
Time Since Last Accident: proxy for memory.
Time/Date of Driving: proxy for drunkenness.
\(\text{speed}/\arccos(\theta)\): reckless driving.
Score for reckless driving.
Geography or spatial feature: proxy for weather or traffic.
Lane-change frequency.
Sharp Curves.
Racing.
Density of area or street.
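Several of these factors are simple functions of timestamps. A minimal sketch, assuming trips and accidents are stored as `datetime` values; the 22:00 to 04:00 late-night window is an arbitrary stand-in for the drunk-driving proxy, and `trip_time_features` is a hypothetical helper.

```python
from datetime import datetime
from typing import Dict, Optional


def hours_between(earlier: Optional[datetime], later: datetime) -> Optional[float]:
    """Hours from `earlier` to `later`, or None if there was no earlier event."""
    if earlier is None:
        return None
    return (later - earlier).total_seconds() / 3600.0


def trip_time_features(trip_start: datetime,
                       last_trip_end: Optional[datetime],
                       last_accident: Optional[datetime]) -> Dict[str, object]:
    # Time since last drive: proxy for sleepiness, lateness, or busyness.
    since_drive = hours_between(last_trip_end, trip_start)
    # Time since last accident (converted to days): proxy for memory.
    since_accident = hours_between(last_accident, trip_start)
    return {
        "hours_since_last_drive": since_drive,
        "days_since_last_accident": None if since_accident is None else since_accident / 24.0,
        "day_of_week": trip_start.weekday(),  # Monday = 0 ... Sunday = 6
        # Time/date of driving: assumed late-night window 22:00 to 04:00.
        "late_night": trip_start.hour >= 22 or trip_start.hour < 4,
    }
```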
Going 55 mph in a 20 mph zone through a 90-degree turn is worse than going 80 mph in a 65 mph zone on the highway. This suggests an interaction between speed (relative to the limit) and curviness/turniness.
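One way to encode an interaction between speeding and turn sharpness is a design row containing the two main effects and their product. A minimal sketch; the normalizations (speeding as a fraction of the limit, turn angle divided by 180) are assumptions, not something the data has confirmed, and `speed_turn_features` is a hypothetical name.

```python
from typing import Tuple


def speed_turn_features(speed_mph: float, limit_mph: float,
                        turn_deg: float) -> Tuple[float, float, float]:
    """Return (relative speeding, turn sharpness, their product).

    relative speeding = max(0, speed - limit) / limit
    turn sharpness    = turn angle / 180 (0 = straight, 1 = U-turn)
    The product is the hypothetical interaction term."""
    if limit_mph <= 0:
        raise ValueError("limit_mph must be positive")
    if not 0.0 <= turn_deg <= 180.0:
        raise ValueError("turn_deg must be in [0, 180]")
    speeding = max(0.0, speed_mph - limit_mph) / limit_mph
    sharpness = turn_deg / 180.0
    return speeding, sharpness, speeding * sharpness
```

With these choices the interaction term is 0.875 for 55 in a 20 at 90 degrees and 0 for 80 in a 65 on a straight highway, so a positive coefficient on it would rank the first case as riskier even though the raw excess speed is larger in the second.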
There seems to be a latitude/longitude interaction: accidents cluster in certain areas and their surroundings, while some streets have almost none.
Many accidents happen at intersections, so one candidate feature is the number of turns sharper than \(75^\circ\).
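A minimal sketch of counting sharp turns from a sampled path, assuming consecutive GPS points \((x,y)\) in a planar coordinate system. The turn angle at each interior point is the absolute change in heading, wrapped into \([0,180]\); repeated points (a stopped car) are dropped because they have no heading. The 75-degree default threshold comes from the note above; the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def turn_angles(path: Sequence[Point]) -> List[float]:
    """Absolute heading change (degrees, in [0, 180]) at each interior point."""
    # Drop consecutive duplicate points: a stopped car has no heading.
    pts = [p for i, p in enumerate(path) if i == 0 or p != path[i - 1]]
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(pts, pts[1:], pts[2:]):
        h1 = math.atan2(y1 - y0, x1 - x0)  # heading into the point
        h2 = math.atan2(y2 - y1, x2 - x1)  # heading out of the point
        diff = math.degrees(h2 - h1)
        diff = (diff + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
        angles.append(abs(diff))
    return angles


def count_sharp_turns(path: Sequence[Point], threshold_deg: float = 75.0) -> int:
    """Number of turns sharper than `threshold_deg` degrees."""
    return sum(1 for a in turn_angles(path) if a > threshold_deg)
```

For example, a path tracing three sides of a unit square contains two 90-degree turns and so counts as 2.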
There is definitely a day/time/location interaction, since drunk drivers are out on Friday nights at specific locations.
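It would be interesting to estimate the probability of being in another accident given that a driver has already had one, \(P(\text{accident}\mid\text{accidents}=1)\).
Additionally, for drivers with no accidents, the probability of being in (or reporting) one, \(P(\text{accident}\mid\text{accidents}=0)\).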
Alternatively, rather than being inherently risky, drivers who have been in accidents may simply keep driving the same accident-prone roads.
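A minimal sketch of the empirical version of these conditional probabilities, assuming each driver's accident counts are available for two consecutive periods (a hypothetical data layout). It estimates \(P(\text{accident in period 2}\mid k\text{ accidents in period 1})\) for every observed \(k\).

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def conditional_accident_rates(history: Iterable[Tuple[int, int]]) -> Dict[int, float]:
    """history: one (accidents_in_period_1, accidents_in_period_2) pair per driver.

    Returns {k: P(at least one period-2 accident | k period-1 accidents)}."""
    totals: Dict[int, int] = defaultdict(int)  # drivers with k earlier accidents
    hits: Dict[int, int] = defaultdict(int)    # ... who then had another accident
    for before, after in history:
        totals[before] += 1
        if after >= 1:
            hits[before] += 1
    return {k: hits[k] / totals[k] for k in sorted(totals)}
```

A gap between the \(k=1\) and \(k=0\) estimates would not, by itself, separate risky drivers from the same-road explanation; that would require also conditioning on the roads driven.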
The path is a vector-valued function \(\text{path}(t)=(x(t),y(t))\), i.e., a position vector.
We can think of each point as a tuple \(((x,y),\text{driver},\text{action},\text{degree},\text{direction})\)
where:
driver \(\in\{p_1,\dots,p_n\}\)
action \(\in\{\text{stop},\text{go},\text{turn left},\text{turn right}\}\)
degree \(\in [0,180]\)
direction \(\in [0,360]\)
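The tuple can be written as a small typed record. A sketch, assuming drivers are identified by string IDs such as `"p_1"`; `PathPoint` and `Action` are hypothetical names, and the range checks mirror the sets above.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    STOP = "stop"
    GO = "go"
    TURN_LEFT = "turn left"
    TURN_RIGHT = "turn right"


@dataclass(frozen=True)
class PathPoint:
    """One observation ((x, y), driver, action, degree, direction)."""
    x: float
    y: float
    driver: str        # one of p_1, ..., p_n, stored as an ID string
    action: Action
    degree: float      # in [0, 180]
    direction: float   # in [0, 360]

    def __post_init__(self) -> None:
        # Validate the fields against the ranges defined above.
        if not isinstance(self.action, Action):
            raise TypeError("action must be an Action")
        if not 0.0 <= self.degree <= 180.0:
            raise ValueError(f"degree must be in [0, 180], got {self.degree}")
        if not 0.0 <= self.direction <= 360.0:
            raise ValueError(f"direction must be in [0, 360], got {self.direction}")
```

`Action("turn left")` maps a raw label from the data to its enum member.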
I expect weekly and monthly seasonality and a yearly trend.
I expect to be able to model future events causally, namely future accidents.
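A first check for the weekly seasonality is the average accident count by day of week. A minimal sketch, assuming daily accident counts keyed by `date`; grouping by another calendar field (for example day of month) would check the monthly pattern, and the yearly trend would need a separate model.

```python
from collections import defaultdict
from datetime import date
from typing import Dict


def weekday_profile(daily_counts: Dict[date, int]) -> Dict[int, float]:
    """Mean accident count per day of week (Monday = 0 ... Sunday = 6)."""
    sums: Dict[int, float] = defaultdict(float)
    days: Dict[int, int] = defaultdict(int)
    for d, count in daily_counts.items():
        sums[d.weekday()] += count
        days[d.weekday()] += 1
    return {wd: sums[wd] / days[wd] for wd in sorted(days)}
```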
I expect driver coefficients to evolve over time.