
Geocoding: The underappreciated science of catastrophe modeling

May 21, 2018

By Tim Edwards

“Geocoding” is the process of converting descriptive address information for portfolio locations into geographic coordinates, so that hazard can be assessed at each location. It is a core step in the catastrophe modeling process, used for anything from a single-location assessment to bulk look-ups for an entire portfolio.

Despite advances in peril mapping and geocoding technology, the geocoding process often remains a challenge. Here we consider these challenges and how the quality of the data can be improved.

Is the value of high-quality geocoding fully understood?

Many parts of the world are moving toward finer data resolutions, but as an industry, how well do we understand the quality and sensitivity of these data (Figure 1)?

Figure 1. Illustration shows how improving geocoding resolution can increase the accuracy of risk assessment.

Source: RMS U.K. flood model outline


At a time of heightened introspection about the uses for exposure data and associated modeling outputs, we consider these challenges and how they can best be overcome.

When is high-resolution geocoding most needed?

There are numerous instances when having a high geographic resolution is essential — for example, when modeling a portfolio of risks with large sums insured where the accumulation of risk across multiple assureds is important to consider.

For instance, when insuring high-value assets such as industrial facilities, an accurate output depends on capturing the risk from each building separately. That level of granularity requires high-resolution data and geocoding.

For perils that are localized in their extent, such as floods, earthquakes and fires, high-resolution peril modeling depends on knowing exactly where the insured asset is located.

Frequent changes to high-resolution geocoding data

However, high-resolution geocoding data are subject to more frequent changes over time. Street networks, postal codes and other administrative boundaries can change annually or even quarterly, so multiple versions of the same geographic location need to be stored and models require periodic updates. Accurately capturing geographic boundaries or locating a precise building address requires the most up-to-date geographic data available in the market.

High-resolution geocoding information tends to change more often because of population increases (or decreases), new construction developments or politically led boundary changes. By contrast, coarser administrative areas (e.g., CRESTA zones, states, provinces and regions) have fewer boundary changes and are therefore updated much less frequently (Figure 2).

Figure 2. Increasing the analysis resolution — changes to Colonia level (Admin 3) geocoding for Mexico, RMS v16 (left) and RMS v17 (right) due to enhanced geographic coverage

In v17 there is broader coverage, which will lead to more successful matches at the Colonia level if data are captured at this resolution.

How to evaluate whether high resolution equals better data

Best practice is to evaluate the impact of geographic data changes on modeled loss to determine whether an update is warranted. The longer the interval between updates, the larger the change in loss that can be expected for a given location, account or portfolio. If geocoding data are updated every one to three years, loss changes typically range from a fraction of a percent to less than 5%. If the interval is longer than three years, it is not uncommon to see loss changes greater than 5%, 10% or, in some cases, even 20%.
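As a rough illustration of such an evaluation, the sketch below compares modeled losses before and after a geocoding data update and flags locations whose loss moves by more than a chosen threshold. The function names, the location IDs, the loss figures and the 5% default threshold are all assumptions made for the example; they are not taken from any particular model or vendor.

```python
def loss_change_pct(old_loss, new_loss):
    """Percentage change in modeled loss after a geocoding data update."""
    if old_loss == 0:
        raise ValueError("old_loss must be non-zero to compute a percentage change")
    return (new_loss - old_loss) / old_loss * 100.0


def flag_material_changes(losses_old, losses_new, threshold_pct=5.0):
    """Return {location_id: pct_change} where the absolute change exceeds the threshold.

    Locations missing from the new run, or with zero loss in the old run,
    are skipped in this simplified sketch.
    """
    flagged = {}
    for loc_id, old in losses_old.items():
        new = losses_new.get(loc_id)
        if new is None or old == 0:
            continue
        change = loss_change_pct(old, new)
        if abs(change) > threshold_pct:
            flagged[loc_id] = change
    return flagged


# Hypothetical modeled losses per location under two geocoding data versions.
old = {"LOC1": 100_000.0, "LOC2": 250_000.0, "LOC3": 80_000.0}
new = {"LOC1": 101_000.0, "LOC2": 280_000.0, "LOC3": 72_000.0}

flagged = flag_material_changes(old, new)
portfolio_change = loss_change_pct(sum(old.values()), sum(new.values()))

for loc_id, change in flagged.items():
    print(f"{loc_id}: {change:+.1f}%")
print(f"Portfolio: {portfolio_change:+.1f}%")
```

Reporting both the location-level and the portfolio-level change is deliberate: offsetting movements at individual locations can hide inside a small portfolio-level figure.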

When assessing geocoding quality, is there too much focus on achieving the highest-resolution geocoding possible? In seeking the precise location of each risk in a portfolio, there is a trade-off between completeness and accuracy.

“Completeness,” in this context, is the proportion of the portfolio of risks that is geocoded to a high resolution. To increase the robustness of any portfolio analysis, there is a perceived need to geocode as high a proportion as possible at the highest resolution.

However, this can sometimes be achieved at the expense of accuracy, which reflects how close the geocoded and actual coordinates are to one another. Inaccurate geocoding renders any analysis of such data fairly redundant (Figure 3).

Figure 3. Theoretical yet commonly observed example showing how an increase in geocoding “completeness” in moving to point B is typically at the expense of accuracy
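The trade-off can be made concrete with two simple ratios. The sketch below is a minimal illustration: the `GeocodeResult` record, the set of resolutions treated as "high", the 100-meter tolerance and the sample figures are all hypothetical choices, and accuracy is only measurable here for locations whose true position has been independently verified.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GeocodeResult:
    location_id: str
    resolution: str            # e.g. "building", "street", "postcode", "city"
    error_m: Optional[float]   # distance to a verified position in meters, if known


# Assumption: which match levels count as "high resolution".
HIGH_RES = {"building", "street"}


def completeness(results: List[GeocodeResult]) -> float:
    """Share of locations geocoded to a high resolution."""
    return sum(r.resolution in HIGH_RES for r in results) / len(results)


def accuracy(results: List[GeocodeResult], tolerance_m: float = 100.0) -> Optional[float]:
    """Share of verified high-resolution matches lying within the tolerance."""
    verified = [r for r in results if r.resolution in HIGH_RES and r.error_m is not None]
    if not verified:
        return None
    return sum(r.error_m <= tolerance_m for r in verified) / len(verified)


# Hypothetical portfolio geocoded with strict matching (A) and lenient matching (B).
setting_a = [
    GeocodeResult("LOC1", "building", 20.0),
    GeocodeResult("LOC2", "street", 50.0),
    GeocodeResult("LOC3", "postcode", None),
    GeocodeResult("LOC4", "postcode", None),
]
setting_b = [
    GeocodeResult("LOC1", "building", 20.0),
    GeocodeResult("LOC2", "street", 50.0),
    GeocodeResult("LOC3", "street", 900.0),
    GeocodeResult("LOC4", "building", 4000.0),
]

for name, results in (("A", setting_a), ("B", setting_b)):
    print(name, completeness(results), accuracy(results))
```

In this made-up example, the lenient setting doubles completeness but halves accuracy, which is the pattern the figure describes.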


Comparing the accuracy of geocoding between two or more providers

Obtaining latitude/longitude coordinates for the same locations from two providers enables a comparison. Using simple coordinate geometry, the distance between each pair of points can be calculated, and the cumulative frequency of those distances can be assessed (Figure 4).

Figure 4. In this example evaluating the distance between coordinates from providers A and B shows that 90% of locations have coordinates within 3,000 meters of one another
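A comparison along these lines can be sketched in a few lines of Python. The article does not specify a distance formula; the haversine great-circle distance on a spherical Earth is used here as one reasonable choice. The coordinates are invented for illustration, and the helper names `share_within` and `percentile_distance` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; a spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def share_within(distances, threshold_m):
    """Cumulative frequency: share of locations whose providers agree within threshold_m."""
    return sum(d <= threshold_m for d in distances) / len(distances)


def percentile_distance(distances, pct):
    """Smallest observed distance covering at least pct% of locations (nearest rank)."""
    ordered = sorted(distances)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical coordinates for the same three locations from providers A and B.
provider_a = [(51.5074, -0.1278), (48.8566, 2.3522), (50.6879, 2.8810)]
provider_b = [(51.5080, -0.1280), (48.8570, 2.3500), (43.0000, 0.5000)]

distances = [haversine_m(*a, *b) for a, b in zip(provider_a, provider_b)]

print(f"Share within 3,000 m: {share_within(distances, 3000):.0%}")
print(f"90th percentile distance: {percentile_distance(distances, 90):,.0f} m")
```

Evaluating `share_within` over a range of thresholds yields the cumulative frequency curve; large outliers, such as the third pair above, are the natural candidates for manual validation.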


The locations with the greatest distance between providers, the highest level of risk, or both can then be validated against local insight or alternative providers to gauge which geocoder provides the more accurate coordinates (Figure 5).
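One way to decide which locations to validate first is to rank them by a simple score. The sketch below multiplies the inter-provider distance by the sum insured; this scoring rule is an assumption chosen for illustration, not an established method, and the location records are invented.

```python
def validation_queue(locations, top_n=10):
    """Return location IDs ordered by a hypothetical priority score.

    Score = distance between providers (m) x sum insured, so locations
    that are both far apart and high value rise to the top.
    """
    ranked = sorted(
        locations,
        key=lambda loc: loc["distance_m"] * loc["sum_insured"],
        reverse=True,
    )
    return [loc["id"] for loc in ranked[:top_n]]


# Hypothetical locations with inter-provider distance and sum insured.
locations = [
    {"id": "A", "distance_m": 50.0, "sum_insured": 10_000_000},
    {"id": "B", "distance_m": 850_000.0, "sum_insured": 2_000_000},
    {"id": "C", "distance_m": 4_000.0, "sum_insured": 50_000_000},
    {"id": "D", "distance_m": 20.0, "sum_insured": 500_000},
]

queue = validation_queue(locations, top_n=2)
print(queue)
```

Other weightings, such as a measure of local hazard gradient in place of sum insured, may suit a given portfolio better.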

Figure 5a. Geocoder A is correct: locates Rue Lavoisier, Armentières, 59160 as being in Lille, in northeast France

Figure 5b. Geocoder B is incorrect: locates Rue Lavoisier, Armentières, 59160 as being near the Pyrenees


In summary, the key to a robust approach to geocoding is to:

  • Use the highest level of geocoding possible, especially for portfolios with high sums insured that are exposed to perils with high hazard gradients.
  • Use geocoding providers that regularly update detailed geocoding data to reflect boundary changes, and be able to readily quantify the impact of those updates on loss assessments.
  • Use intelligent algorithms that can read address data in multiple languages and account for text being misspelt or jumbled with numbers.
  • Be mindful of the trade-off between geocoding completeness and accuracy. Any high-resolution geocode should not be achieved by compromising the accuracy of the original data.
