Not all telematics data is created equally

July 27, 2017
| United States

Telematics data is changing automobile insurance. There are more than 20 filed usage-based insurance (UBI) products in four states in the U.S. and at least 10 in almost every state. These programs are giving insurers an opportunity to differentiate their product and create a positive dialogue with their customers (as opposed to only communicating through bills or claim handling). But the primary reason most insurers are turning to telematics is the unprecedented improvement in pricing accuracy. This is only possible if the right data is analyzed in the right way.

Granular telematics data is the right data

A starting point for many insurers has been collecting summarized telematics data. Basic data such as total miles driven, number of harsh brakes, percentage of driving during high-risk times or on different road types, average trip duration and average speed is easy to manage and analyze. While this approach may work in the short term, much of the predictive power of telematics data is lost when it is summarized.

Granular telematics data (typically second-by-second GPS data plus higher-frequency accelerometer and other sensor readings) is more analytically complex and incurs additional overhead costs for storage and management. However, for auto insurers, the benefit of granular data will prove to be well worth the additional effort and cost. Granular data benefits include:

  • Contextual data. If you know the location of a vehicle at every second, you can add further predictive factors to your model. With this, you can learn whether the trip was on a highway or in a city, the type of junctions the driver needed to navigate and whether it was sunny or snowy.
  • Analytical flexibility. Relying on events and averages limits any analysis to the collected events. For example, most harsh braking thresholds are set at levels such that 99.9% of the deceleration events are discarded. With more granular information, you can study any behavior, at any threshold, at any point in the future. This will be particularly helpful when insurers need to quantify the impact of nonuniform data elements defined and provided by OEMs and others who can only provide nongranular data.
  • Behavior modification. Highly summarized information can identify drivers who are more likely to have an accident, but does not necessarily identify behaviors that cause accidents. A driver with a lot of harsh brakes is more likely to have an accident, but harsh brakes themselves don’t cause accidents; rather, they help drivers avoid them. If you want to reduce accidents, you need to change the behaviors that cause accidents. And you can only identify these behaviors if you have the granular telematics data.
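
To make the analytical flexibility point concrete, here is a minimal sketch, assuming one speed reading per second in miles per hour. The function name, the trip values and the thresholds are all hypothetical illustrations, not figures from any insurer's program. With the raw readings retained, the same trip can be rescored at any deceleration threshold after the fact.

```python
from typing import List


def count_harsh_brakes(speeds_mph: List[float], threshold_mph_per_s: float) -> int:
    """Count braking events in second-by-second speed data.

    A second qualifies when speed drops by at least the threshold.
    Consecutive qualifying seconds are treated as a single event.
    """
    events = 0
    in_event = False
    for prev, curr in zip(speeds_mph, speeds_mph[1:]):
        decel = prev - curr  # positive when the vehicle is slowing
        if decel >= threshold_mph_per_s:
            if not in_event:
                events += 1
            in_event = True
        else:
            in_event = False
    return events


# Hypothetical trip: one reading per second, in mph.
trip = [30, 31, 30, 22, 15, 14, 14, 10, 9, 9, 3, 0]

# Thresholds are illustrative assumptions; any value can be tested later.
for threshold in (4.0, 6.0, 8.0):
    print(threshold, count_harsh_brakes(trip, threshold))
```

If only the event count at a single preset threshold had been stored, the other two counts could not be recovered.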

Modeling approach matters too

The goal of an insurance pricing model is to differentiate drivers who are more or less likely to have an accident. To do this, an insurer will use predictive modeling techniques with driving behaviors as the predictors and the associated insurance claims as the response variable. The best modeling approach uses granular data and actual claims. Other approaches can be used, but have significant flaws and will ultimately not be as predictive (Figure 1).

Figure 1. What different levels of data analysis tell you — or may not tell you

Willis Towers Watson Media

Judgmental approach

When actual claim data isn’t available, the score is typically developed judgmentally. The analyst must correctly identify risk behaviors, quantify how much more or less risky each behavior is, and determine how to adjust for attributes that may be highly correlated (e.g., total miles, average trip length, road type and speed) to avoid double counting. The results of this type of educated guesswork can be far from what was intended. For example, we have tested scores developed judgmentally and found minimal benefits compared to just using verified mileage. In one case, the score was actually worse than using verified mileage. Given that mileage was a component of that score, this meant the other behaviors added judgmentally destroyed some of the value of verified mileage.

Retrospective study

Some insurers use prior claim history instead of the actual claims during the period the driver was tracked. By definition, this type of model is predicting whether a person who exhibits certain driving behaviors was more or less likely to have had an accident in the past. The effectiveness of this approach, therefore, relies on the assumption that people continue to drive in a similar way. This should be a major concern, given that the prior driving occurred before the driver was being monitored and incentivized to change their driving. It is a particularly questionable assumption when the insurer’s telematics product (and, thereby, the score) is intended to change driving behavior.

Prospective study with summarized telematics data

When initially working with telematics data, most insurance analysts will create a data set with a record for each driver or vehicle that includes a series of summarized telematics factors and the number of associated claims that occurred during the time period the data was collected. This is a valid approach if you only have nongranular data and will, by definition, result in a model that predicts how likely it is that drivers who exhibit those patterns are going to have a claim. However, the best this approach can hope for is to determine driving profiles that correlate with the chance of having an accident. It is impossible to determine causality.
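
The data set described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Trip` fields, the two summarized factors and the sample values are hypothetical, and a real program would carry many more factors plus an exposure measure.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trip:
    # Hypothetical per-trip summary; field names are assumptions.
    driver_id: str
    miles: float
    harsh_brakes: int
    night_miles: float


def build_driver_records(trips: List[Trip], claims_by_driver: Dict[str, int]) -> List[dict]:
    """Roll trips up to one record per driver and attach the claim count
    observed during the same period the telematics data was collected."""
    totals = defaultdict(lambda: {"miles": 0.0, "harsh_brakes": 0, "night_miles": 0.0})
    for trip in trips:
        rec = totals[trip.driver_id]
        rec["miles"] += trip.miles
        rec["harsh_brakes"] += trip.harsh_brakes
        rec["night_miles"] += trip.night_miles

    records = []
    for driver_id in sorted(totals):
        rec = totals[driver_id]
        miles = rec["miles"]
        records.append({
            "driver_id": driver_id,
            "miles": miles,
            "harsh_brakes_per_100_miles": 100 * rec["harsh_brakes"] / miles if miles else 0.0,
            "night_share": rec["night_miles"] / miles if miles else 0.0,
            "claims": claims_by_driver.get(driver_id, 0),  # response variable
        })
    return records


# Hypothetical sample data.
trips = [
    Trip("A", miles=10.0, harsh_brakes=1, night_miles=0.0),
    Trip("A", miles=30.0, harsh_brakes=1, night_miles=10.0),
    Trip("B", miles=50.0, harsh_brakes=0, night_miles=0.0),
]
records = build_driver_records(trips, claims_by_driver={"A": 1})
for record in records:
    print(record)
```

Each record pairs summarized predictors with the claims from the same period, which is what makes the study prospective. Nothing in the record says when a claim occurred relative to any particular maneuver, so the data can support correlation but not causality.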

Prospective study with granular telematics data

If a program goal is to provide the driver with information to change their driving behaviors, the insurer must identify driving choices that lead to accidents. This requires knowing exactly when accidents occurred and structuring the data set such that an analyst can study choices that led to accidents. Care must be taken to avoid structuring the data in such a way as to create a self-fulfilling model. For example, many accidents are preceded by a harsh brake. Including the pre-crash harsh brake and the accident in the data set will lead to harsh brakes looking like a highly predictive causal factor. Instead, the analyst needs to determine what behavior caused the need to brake harshly to avoid the impending accident. This type of analysis is only possible with granular telematics data.
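
One way to guard against the self-fulfilling model is to separate harsh brakes that immediately precede a known crash from all other harsh brakes before building predictors. The sketch below assumes timestamps in seconds and a 10-second look-back window; both the window and the function name are hypothetical choices for illustration. It covers only the data-structuring step, not the harder task of finding what caused the need to brake.

```python
from typing import List, Tuple


def split_pre_crash_brakes(
    brake_times: List[float],
    crash_times: List[float],
    window_s: float = 10.0,  # assumed look-back window, not an industry standard
) -> Tuple[List[float], List[float]]:
    """Split harsh-brake timestamps into (pre_crash, other).

    A brake is 'pre_crash' if it falls within window_s seconds before,
    or at the moment of, any recorded crash.
    """
    pre_crash, other = [], []
    for t in brake_times:
        if any(0 <= crash - t <= window_s for crash in crash_times):
            pre_crash.append(t)
        else:
            other.append(t)
    return pre_crash, other


# Hypothetical timestamps, in seconds from the start of the observation period.
brake_times = [120.0, 845.0, 1990.0, 2000.0]
crash_times = [2003.0]

pre_crash, other = split_pre_crash_brakes(brake_times, crash_times)
print("excluded as part of the crash itself:", pre_crash)
print("kept as candidate predictors:", other)
```

This kind of split is possible only when the crash time and the individual brake events are both known to the second, which is the case for granular data but not for per-driver totals.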

Telematics data interacts with traditional rating factors

If telematics data were the only data used for rating, the job would be done. But insurers also use a number of traditional factors that are proxies for how, how much, when and where a vehicle is operated (Figure 2). Combining these factors optimally with telematics factors is a nontrivial challenge.

To do it effectively, the analyst must create a data set that includes the telematics factors (or resulting score), traditional factors and the associated claim data. Then, by analyzing that data set within a multivariate framework, the analyst can determine the optimum combination of telematics and traditional factors.
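
The sketch below illustrates why the multivariate framework matters. It is not the method used in any particular rating plan: it uses a classical multiplicative minimum bias iteration as a stand-in for a multivariate model, and the claim counts and exposures are invented so that the true relativities are 2.0 for the telematics band and 1.5 for the traditional band. Because the two factors are correlated in the exposure, one-way relativities overstate both.

```python
def one_way_relativities(claims, exposure):
    """Relativity of each row level to the first row, ignoring the other factor."""
    freqs = [sum(c_row) / sum(e_row) for c_row, e_row in zip(claims, exposure)]
    return [f / freqs[0] for f in freqs]


def minimum_bias(claims, exposure, iterations=100):
    """Multiplicative minimum bias iteration for a two-factor table.

    Alternately solves for row and column factors so that fitted claims
    match observed claims in total for every level of each factor.
    Returns relativities scaled to the first level of each factor.
    """
    n_rows, n_cols = len(claims), len(claims[0])
    row = [1.0] * n_rows
    col = [1.0] * n_cols
    for _ in range(iterations):
        row = [
            sum(claims[i]) / sum(exposure[i][j] * col[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        col = [
            sum(claims[i][j] for i in range(n_rows))
            / sum(exposure[i][j] * row[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
    return [r / row[0] for r in row], [c / col[0] for c in col]


# Hypothetical data. Rows: telematics score band (low risk, high risk).
# Columns: a traditional factor band (low, high). Most high-score drivers
# also sit in the high traditional band, so the factors overlap.
exposure = [[800, 200], [200, 800]]
claims = [[40, 15], [20, 120]]

telematics_one_way = one_way_relativities(claims, exposure)
traditional_one_way = one_way_relativities(list(zip(*claims)), list(zip(*exposure)))
telematics_fit, traditional_fit = minimum_bias(claims, exposure)

print("one-way:     ", telematics_one_way[1], traditional_one_way[1])
print("multivariate:", telematics_fit[1], traditional_fit[1])
```

Multiplying the two one-way relativities would charge the high/high cell far more than its experience supports, which is the double counting the multivariate analysis is meant to remove. In practice an analyst would more likely fit a generalized linear model with many factors; the two-factor table is only the smallest case that shows the effect.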

Figure 2. Traditional rating factors vs. telematics data — there’s obviously an overlap


Data inequality will grow

Right now, optimizing the combination of traditional and telematics data isn’t a priority, as telematics data isn’t generally collected until after the initial policy is written. Insurance companies tend to modify their traditional rating plan to remove the most obvious overlaps (e.g., annual mileage discounts) and apply the telematics score to the otherwise applicable rate.

However, in the near future, auto manufacturers making connected cars, telecommunication companies and others will make the telematics data available at the point of initial policy issuance. When this data is broadly available, insurers that create a fully formed rating plan will have a huge competitive advantage and subject the rest of the industry to potentially severe adverse selection.