Adventures in GIS Programming: Techniques for Correcting Geospatial Data from Low-End Devices

GPS-Map

The world is full of data and events that we might wish to measure or track. We normally resort to all kinds of sensors to help pick up data points. But sensors are noisy and we cannot completely rely on them for the perfect data points. The GPS on my phone gives me different coordinates every time I sit in my favourite spot in the living room to watch the games. I used to have a thermometer which I would place in the fridge for some time then compare the temperature reading with the one displayed on the fridge. They were almost always different.

Most of the time the solution to these slight variations are obvious. If I find that the readings on my thermometer are different from the ones reported by the fridge, I could round off the temperature readings, ignore the temperature difference or simply buy a new fridge.

But what do we do when the sensor is very noisy, or the environment makes data collection difficult, or the user acts in a way that causes the sensor to pick the wrong data point? We may be trying to track the movement of all vehicles in our fleet. We may want to ensure that our field mapping drone mapped the entire field correctly. I work on mobile applications that collect geospatial data and the mobile devices, mostly low-end devices, create noisy and unreliable data points.

In this article, I will share some of the GIS programming techniques to use to solve these sorts of filtering problems for geospatial data. The geospatial data I will be working with is GPS Polygons (or Map polygons).

I use many different algorithms, some are well-known while others are simply my own techniques to solve the problem. In both cases, the goal is to transform the coordinates of a given GPS polygon into something that is more representative of a decently mapped field. The coordinates were collected using an Android device and no other geospatial information was collected besides the coordinates. For visualisation, I will be using the Geojson mapping tool tool at https://geojson.io/

Before we go into the techniques, let’s understand what a bad map polygon looks like and some of the causes.

Consider the sample polygons below

You will quickly notice that there are many problems with the polygons

  1. Spikes: Abrupt and extreme coordinates may be found in geospatial measurements that are far away from the actual values. These are noise, or outliers, if you like, and they are called spikes. What causes these extreme deviations you ask? It is not always a fault of the device; electrical interference from nearby electrical power lines or even lightning frequently caused spikes. These two sources of interference also affect magnetometers and conductivity meters .
  2. Sharp Angles: You may also find sharp (<30o) angles. Depending on the kind of field, sharp angles can be bad coordinates. Note that this is solely field-type and use-case dependent. For some use-cases, a 30o angle is a perfectly good angle, for others, even a 90o angle may be considered bad.
  3. Lakes: This is not a standard term. By lakes, I am referring to an area of a polygon that seems to drift away from the bigger polygon. It is formed when a set of coordinates form separate polygons outwards that are somewhat still attached to the bigger polygon. This usually happens when the user who picked the polygon tried to dodge (or go around) an obstacle on their path
  4. Islands: This is also a non-standard term. It is the opposite of a lake. An Island occurs when regions of your polygon partially or fully enclose an empty area as seen in the figure above. Instead of going outward around an obstacle, a field mapper may decide to go inward around the obstacle, thus creating an Island.
  5. Switchbacks, Knots, and Loops: These types of errors are introduced when the digitizer has an unsteady movement as they go around the field to be digitized. The polygon ends up with extra vertices and/or nodes.  In the case of switchbacks, the polygon contains extra vertices and the line ends up with a bend in it.  With knots and loops, the line folds back onto itself and sometimes intersect to form an X and small polygon-like geometry which I call a lake.

Other problems Include

  • Duplicate coordinates: When the rate at which a digitizer picks a single coordinate is faster than the rate at which GPS sensors receive coordinates updates, then there will be duplicate coordinates in the data. Duplicates coordinates can be nuances especially when you plan to run some expensive data processing algorithms on the polygons.
  • Too many coordinates: Purely requirements-based. In my experience, I have seen organisations who require their polygons to contain not more than a particular number of coordinates. The big question would then be, if you have a polygon that has more than the required number of coordinates, which do you remove and which do you keep knowing that each coordinate you remove alters the outline of the polygon?
Is there an industry standard algorithm for solving these kinds of problems?

Yes. In 1960, Rudolf E. Kálmán, a Hungarian-American Mathematician developed an algorithm known as the Kalman filter that is used in signal processing and control systems. The Kalman filter uses a series of observed measurements over time to estimate an unknown measurement. It does this by using a system’s dynamic model (e.g. physical laws of motion), known control inputs to that system, and multiple sequential measurements (such as coordinates) to form an estimate of the system’s varying quantities. Engineers have adapted this algorithm in various fields including in geospatial data collection to clean out noise. In geospatial data collection, the algorithm uses GPS accuracy, speed and altitude as known control inputs and a polygon’s coordinates as the sequence of measurement. Thus, it is an example of a data/sensor fusion algorithm – a holistic one.

In my case, however, all I have is the coordinates. No GPS accuracy. No speed data. No altitude information. To use a Kalman filter, I will need to create my own adaptation of the algorithm that does not require any control inputs. So unfortunately, I will not apply the Kalman filter.

Now, let us review some techniques for solving polygon errors.

Removing Duplicate coordinates

Two coordinates are duplicates if their longitude and latitude values are the same. Removing the coordinates is a precursor to all the algorithms and techniques I will use, so it is an important step. Duplicate coordinates do not contribute anything to polygons and removing them means less coordinates for me to work with. So there is nothing to lose.

My approach to removing duplicates is a simple python code that iterates through a list of coordinates, adds each coordinate to a new list, but checks if the coordinate already exists before doing so. The result is a new list of unique coordinates. Once duplicates are out of the way, we can apply any of the smoothing techniques below

Using law of cosines to detect sharp angles

This technique uses the law of cosines to detect sharp angles. In trigonometry, the law of cosines relates the lengths of the sides of a triangle to the cosine of one of its angles. In very simple terms, according to the law of cosine, if you know the length of the 3 sides of a triangle, then you will be able to calculate all of its interior angles. In this case, a length is the distance between 2 coordinates.

Sample polygon with a sharp angle

From this sample polygon, if we know distances a, b and c, then we can find the angle B thus:

Cosine formula
The cosine formula on which this technique is based

To implement this technique, You will need to pick every consecutive 3 coordinates in a polygon, and calculate angle B. You can calculate distances a, b and c from the coordinate. I used a nice python module called geopy to do this. If the angle is less than a threshold (30o in my case), you can either remove the coordinate at B from the list or try to smooth it. Note that if you smooth it, you may still have a little bump around the area. But removing the coordinate will remove every trace of its existence. This method works so well and it is capable of removing even spikes.

Be careful not to make the threshold angle too large. Make it too large and you will end up with a polygon that looks nothing like the original.

In my experience, removing the coordinates at sharp angles tends to create new sharp angles sometimes. You can solve this by repeating this process on the resulting list of coordinates. Repeating the process 2-3 more times should be enough to remove all sharp angles including any new ones created in the process. A more sophisticated approach would be to remove or smooth based on the angle B and length of sides c and a. Which means  instead of removing every offending coordinate or smoothing every offending coordinate, we could smooth coordinates that can be smoothed and remove those that cannot be smoothed.

Here’s the result of this technique on the map above

Spikeless Polygon
Polygon after applying the law of cosines

Notice that the long spike has gone. The cross lines have also gone. Also, not very noticeable, but many small sharp angles have gone as well. We still have one stubborn spike though. We also have the Island to get rid of. Let’s see if the next technique helps.

Nearest Neighbour Smoothing

This is a correction technique that smooths coordinates by sampling neighbouring coordinates. The aim is to correct erratic coordinates in polygons, not to remove coordinates.

Smoothing is most commonly achieved by iterating through and changing the coordinates based on the values of the neighbouring coordinates. For example, we can change every latitude and longitude with an algorithm such as the following:

Nearest neighbour smoothing algorithm
Nearest neighbour smoothing algorithm

The bigger the coefficient, the bigger the impact of the corresponding neighbouring coordinate to the modified location of the current coordinate. The coefficients I use in this example (0.3, 0.4, 0.3) are somewhat arbitrary, but in most cases you will want their sum to equal 1.0. (A more sophisticated approach, for example, would be to use the distance between points and then, the closer the point, the bigger the corresponding coefficient.)

This is the result after performing a nearest neighbour smoothing.

Nearest neighbour smoothing
Nearest neighbour smoothing

Notice how the angle on the left area has adjusted, not removed, but adjusted. Note also that the stubborn spike has reduced to a stump. We still have the Island though. At this point, it is completely up to you and your specific use-case to apply more techniques or stop.

Compute the Convex-hull of the polygon

In computational geometry, the convex hull of a set is the smallest convex polygon that contains all the points of it. To say that in simpler terms, given a set of points in a Euclidean space, the smallest polygon that contains all the points in the set is the convex hull. In the Image below, the left box contains a set of points. The right box shows the convex hull of the points on these points.

Convex-hull diagram
Convex-hull diagram

Computing the convex hull of our polygon is the final blow that will eventually knock off any Islands, lakes or leftover stumps.

Here is the convex hull. A clean polygon without spikes, sharp angles, Islands, lakes or cross-lines.

Convex hull of polygon
Convex hull of polygon
How does the final polygon compare to the original polygon?

We can apply colours and plot the final map on top of the original map for comparison.

Final cleaned out polygon
Final cleaned out polygon

Beautiful isn’t it? In this image, the red polygon is the original polygon that we started with and the green polygon is the final polygon, after we made all corrections.

Other Techniques you can try out

Least Squares fit: Just because it is least squares fit does not mean that it must to be linear. If you have time or acceleration information, you can least-squares-fit a quadratic curve to the coordinates. Then this would fit a scenario in which the user is accelerating. Note that by least squares fit I mean using the coordinates as the dependent variable and time as the independent variable.

Minimum Spanning Tree: Don’t quote me on this one. I perceive there is some possibility of applying the MST algorithm with the shortest path algorithm to automatically remove spikes. I am just not sure if it will work in all cases.

Putting Everything together

We’ve discussed some of the more common types of GPS polygon errors to expect with low-end GPS devices. We’ve provided an understanding of what causes them as well as some GIS programming techniques for correcting them. We’ve seen that in some cases, you may need to combine multiple techniques to get a good output. Because one technique alone might not be sufficient.

In some cases, we can correct the polygons with a high degree of confidence. While In other cases, we can at least alert the user to portions of the track that appear questionable.

In many cases, the techniques we’ve outlined can be a satisfactory “80% solution”. This is good enough to provide a reasonable level of automated improvement of the accuracy of GPS polygons.

You may also like...