Calculating Shared Border Lengths Between Polygons

In a previous post, I showed how to use the aggregate function to find neighbor polygons using QGIS. Using aggregate functions on the same layer allows us to easily do geoprocessing operations between features of a layer. This is very useful in many analysis that would typically require writing custom python scripts.

Here I demonstrate another powerful function array_foreach that allows one to iterate over other features in QGIS expressions – enabling even more powerful analysis by writing just a single expression.

Here’s the problem statement. Given a polygon layer – such as US States – calculate what is the length of shared border between each polygon.

This will first require us to find out which states share border with each other and then calculate length of their intersection. For a pair of polygons – this can be done manually or by putting them in separate layers and intersecting them. But since we have 50 states – doing it for each pair of states manually is not possible. But with help of the QGIS expression engine, it can be done in using built-in functions without any custom code.

If you want to follow along, download the states shapefile cb_2018_us_state_500k.zip provided by the United States Census Bureau.

Locate the downloaded file in the Browser panel. Expand it and drag the cb_2018_us_state_500k.shp to the canvas. First we need to prepare the layer. Since the exercise involves calculating lengths, the layer needs to be in a projected CRS. (While QGIS can perform ellipsoidal length calculations for layers in geographic CRSs, the length function – which we need – can only uses layer CRS for calculation). From Processing Toolbox, search and locate the ‘Reproject layer’ algorithm.

Set the Target CRS to ‘EPSG:5070 NAD83 Conus Albers’. This is a general purpose projection with minimal distortions in the continental united states with meters as units. Once the processing finishes and the new ‘Reprojected’ layer loads, change the project CRS also to EPSG:5070. Next, search and locate the ‘Field Calculator’ algorithm.

We will now add a new string field called ‘neighbors’ which will have a comma-separated list of all neighboring states. The state code is contained in the column STUSPS. So we can use the aggregate function array_agg expression to find all intersecting features and extract the state code. The resulting list has the state itself since the geometry intersects with itself. We use the array_remove_all function to remove the code of the state being evaluated from the results.

array_to_string(array_remove_all(aggregate(
layer:= 'Reprojected',
aggregate:='array_agg',
expression:=STUSPS,
filter:=intersects($geometry, geometry(@parent))
), "STUSPS"))

A new layer ‘Calculated’ will load once the processing finishes. This layer now has a ‘neighbor’ column which has a list of all neighboring states for each feature.

Now we are ready to calculate the border lengths. The idea is to intersect each feature with all the neighboring features and calculate the length of the intersection. But the features are polygons and since they just touch each other, and the intersection of them will be empty. So the trick is to first convert the polygon features to polylines. The intersection of the 2 neighboring polyline features will be a polyline representing the border. You may be tempted to use the ‘Polygons to lines’ algorithm, but this algorithm assigns the shared boundary between the polygons to only one of the resulting features. The correct algorithm is ‘Boundary’ which creates the topological boundary of the polygon.

Once you have the ‘Boundary’ layer, we can apply our expression to compute the border lengths using the ‘Field Calculator’ algorithm.

We will add a new string field ‘lengths’ which will have length in miles for each neighboring state. The key function here is array_foreach. This function takes a list and applies an expression to each element and returns a list with computed values. Each member of the list can be accessed using @element. The expression can be broken down into following logical parts

  • Step 1: Convert the comma-separated list of neighbors to a list (function string_to_array)
  • Step 2: For each neighboring state code, find the corresponding feature geometry (function get_feature and geometry)
  • Step 3: Find the intersecting geometry between the feature and the neighbor (function intersection)
  • Step 4: Compute the length of the intersecting geometry. (function length)
  • Step 5: The length is computed in the layer CRS units which are meters, so we convert it to miles and round it down to the nearest integer (function floor)
  • Step 6: Convert the resulting list to a comma-separated string to be saved to the field. (function array_to_string)

The final expression is below

array_to_string(array_foreach(string_to_array( "neighbors" ),
floor(0.000621371*length(intersection(geometry(get_feature('Boundary', 'STUSPS', trim(@element))), $geometry)))))

The resulting ‘Calculated’ layer will have a field ‘length’ which will have length of the shared border for each neighbor.

A quick check on the results show they make sense. California shares it borders with Nevada, Arizona and Oregon with respective borders being 613 mi, 235 mi, 215 mi long. These match up quite well with lengths recorded in other sources.

Length of Complex Curves

From this dataset, we derive that the longest state border in the United States is between Texas and Oklahoma at 922 miles. This is much higher than other estimates, possibly due to the very complex shape of the red river that forms the border in the census bureau shapefile. I could not find any official source of the border length, but if you know any – please let me know in the comments. The length of curved lines such as these can also vary widely depending on the scale of measurement. Read more at Ireland’s Fractal Coastline

Check out my other posts on aggregate() expressions to see other applications of this very powerful function.

42 Comments

Leave a Comment

    • thanks for this outstanding blog. Just a quick question why the result of the length in mil instead of meter as the CRS.

  1. Hey, thanks for this, its really useful. I have an issue, though, the final step only gives me the length if there is only one neighboring region. Otherwise I get a zero. :S

    Also, how would you adapt this methodology to calculate the coastal length of all coastal regions?Something along the lines of: Total boundary length minus the sum of the neighboring region length (what you have already done).

    • The reason for your error would be if you chose ‘Integer’ as the field type so it fails when there is a comma separated string. Make sure you are adding a field of type ‘String’.

      For coastline, you can definitely do the calculation as you mentioned. Add another field that is perimeter – sum of neighbors.

      To get sum of neighbors from the array, you will have to use the eval() function. See the trick I show in this video https://youtu.be/IXPCec8vgLA?t=866

      • I made sure I selected ‘String’. That is not the issue. I am doing this for Europe, and I think the issue is with selecting the CRS. I’m having trouble working out which is the source and which is my target CRS when I reproject…

        Thanks for the link to the video.

  2. […] We now have a list of neighbors for each station in the layer stations_with_neighbors. To apply the spatial homogeneity test, we should know the distances to each station. We can now use the array_foreach() function to add a new field for every station with the distance to each of its neighbors. I have outlined a similar technique for measure border lengths in detail in my post Calculating Shared Border Lengths Between Polygons. […]

  3. .Hi. This is really gre3at stuff – I’d never get close to this result without the above.

    I am at the stage of calculating the boundary lengths – all previous has worked.

    But what I am getting in the ‘lengths’ field is just a list of commas with no values.

    This is my version of your code:

    array_to_string(

    array_foreach(

    string_to_array(

    “Neighbours”),

    floor(

    length(

    intersection(

    geometry(

    get_feature(

    ‘Boundary’, ‘$ID’,

    trim(@element

    ))),

    $geometry

    )))))

    Field $ID is a string(9)
    Field ID is a qlonglong int8
    … but contain the same ID data.

    Are you able to advise, please?

    Thank you

    Bruce Mitchell

      • Thank you Ujaval, I agree with ‘ID’ instead of ‘$ID’. However, I am finding that while your method works when using the Processing Toolbox, when I try exactly the same sequence using the Graphical Modeller, the lengths are not calculated.

        I copied the code that worked from the Processing Toolbox window into the Graphical Modeller, only to find that it no longer did.

        Output preview in Processing Toolbox showed ‘1306,944,639’
        Output preview in Graphical Modeller showed ‘,,’

        As far as I can see, the process is IDENTICAL. However, when done this way, you only get the comma separated list, without any measurements.

        Is this a bug in QGIS, or am I missing something?

        The code is:

        array_to_string(array_foreach(string_to_array(“Neighbours”),
        floor(length(intersection( geometry
        (get_feature(‘Boundary’, ‘ID’ ,@element)), $geometry )))))

        Occasionally it works in Graphical Modeller and I get the lengths. Then, if I run it again without changing anything, it doesn’t and I’m back to the comma separated empty list.

        Might this have something to do with the output from the two stages (boundary and Field Calc?) Have tried Temp file, SHP and Geopackage.

        Many thanks,

        Bruce

      • Don’t worry about output preview because in modeler you do not have access to attributes of previous layer (which will be computed when model runs). I have successfully used it from processing modeler. Aggregate function doesn’t work in modeler (due to how intermediate layers are computed) but array_foreach works fine consistently for me. Your will have to change the codes to refer to the layer’s using variable syntax. See the last section of this post for details https://spatialthoughts.com/2020/11/26/spatial-homogeneity-testing-qgis/

  4. Thank you for your very quick reply. Just a quckie forbefore I get into applying your suggestion: I notice your first screenshot in the BUILD A PROCESSING MODEL section is titled Model Designer, whereas I have been using Processing Modeller (QGIS 3.10.3) . Has Model Designer taken over from Processing Modeller in a more recent version, or is it another thing altogether, please?

    Many thanks,

      • I am making steps towards my ultimate goal, but ought to check that that goal is achievable. I hope you will be able to advise – you appear to be eminently well qualified to do so 🙂

        I have 110,000 small areal units for which I have the centroids only. These fit into 1,400 districts. I want to generate Thiessen Polygons for the small areal units, and to clip these to the larger districts.

        I have done all the above, and broken all multiparts into singleparts. I have identified neighbours and shared boundary lengths for all singleparts.

        Next, I intend to dissolve …

        each Thiessen singlepart fragment that does not contain a small areal units centroid
        into
        the neighbouring Thiessen singlepart fragments that DOES contain a small areal units centroid
        AND
        with which it shares the longest boundary.

        Can you advise please whether this is possible? It seems so logically.

        ‘How’ isn’t yet an issue – I just want to be reassured that the path is possible, so I am just asking for a simple one-liner response..

        Many thanks.

        Bruce

      • To be clear, the fragments without the Centroid would only be at the edges of the districts, right?

        If that’s the case, you can add a new column with the id of neighbour with longest boundary if there is no Centroid inside, otherwise just add the id of the polygon itself (use if condition). After that dissolve by the newly added column and you are done.

      • Thank you very much, Ujaval. I’m looking into that now. Yes, all fragments without centroids are on periphery of district polygons.

        Kind regards,

        Bruce

  5. Hi!

    First off: Thank you SO much for your post. I was so happy when I found it I did a little dance. 🙂

    I don’t work much with QGIS so this is all very new to me but your explanations are fantastic. I downloaded your dataset and everything worked out perfectly.

    However, when I tried it with my own dataset, I ran into a few problems, and I was wondering if you coud point me in the right direction to solve them?

    1 – I can calculate the neighbouring polygons (yay!) but when I do, two of my polygons suddenly vanish. They are not in the “calculated” layer. Do you happen to know why?

    2 – When I try to calculate the final intersection lenghts, I get a combination of “,0,,0,0,,” instead of actual values. I made sure to select “string” as a field type, and to pick the right CRS. Where might my problme lie? (I literally only changed the field name of the ID and didn’t touch anything else in the code.)

    and

    3 – Is there any way to filter out neighbors on the basis of another layer? In my case, I have a separate layer that shows water courses as lines. I want all my terrestrial units to show up as neighbours of each other – unless there is a water course separating them. (The geometries are inputs for a model.)

    I would be INCREDIBLY grateful for any and all help on this. In any case – this entire site is AMAZING: you’re already a life-saver for this desperately tired (and desperate) grad student. 🙂

    All the best,

    Christina

    • Hi Christina – glad the post turned out to be useful for you.

      Regarding 1) it is likely that your polygons have invalid geometry. Run the “Fix Geometries” algorithm and try running the calculations on the result.

      2) Could be an error in the Expression. Hard to say without looking at your dataset and expression. If you can, share the data or a sample at ujaval@spatialthoughts.com

      3) Maybe first convert the water courses layer to polygons (Lines to Polygon algorithm), then run Difference algorithm between your terrestrial polygons and the water polygons – giving you a geometry of polygons which will work as expected.

      • Hi!

        Thank you so much for your quick response – I’m honestly blown away. 🙂

        I tried to run the fix geometries option that you mentioned, but without luck there. I’ll send you a quick email in a moment.

        Thank you so so much for taking a look! Going by how amazingly proficient you are, you can probably figure out in 30 seconds flat what I’ve spent four hours trying to puzzle out.

        You’re the best!

        Christina

  6. Hi Ujaval. I followed the exact steps in this blog but my final resulting layer has NULL values in length field. Could you please help me identify my mistake?

  7. Hi Ujaval, thanks a lot for all your posts. They are extremely useful. I’m not a GIS expert nor a GIS programmer, and this allows me to produce inputs for my econometric models.

    I sucesfully replicated your example. Now, I need to do the same for all countries in the world. I have two questions that would like to ask you:

    1. Which is your recommended projection for computing this across the world? I suspect that for measuring distances, there is no one that works around the whole globe. In that case, is possible to include a change of projection within the loop, e.g., by continent?

    2. Is there a way of quickly moving from the fields separated by commas to a symmetric matrix which shows length borders from columns and rows?

    thanks,
    Matías

    • You are correct that you need to include a change of projection. The right way to do this would be to create a custom projection for each geometry and use it for the computation. I do not know of a way to do this with expressions. I have some PyQGIS code to do this at https://spatialthoughts.com/2019/04/05/geodesic-buffers-in-qgis/

      For getting the matrix, how would you store it in the attribute table? If you can represent that as a text string, then you can format the results using an expression.

  8. Hi, thanks a lot for this contribution Ujaval. I am trying to do something similar but what I want to do is adding a new field with the shared border of different polygons to only one of the polygons in attribute table. How can I do this?

    • That should be simpler than shown in the post. Use the expression like below and replace the get_feature() with your layer name and the query for the polygon. If the one polygon has an attribute NAME with value ‘MyPolygon’, then an expression like this will work.

      length(intersection(geometry(get_feature(‘Boundary’, ‘NAME’, ‘MyPolygon’)), $geometry)))))

  9. Thanks a lot! This method is really great. I’m wondering if it is possible to sum the results. I achieved to have comma-separated string but how can I create a new column to show the total length for each row?

    • There is now an array_sum() function. Use string_to_array() to create an array from the list of neighbor lengths and the use array_sum()

  10. Thanks for the article!
    I have a doubt and an issue
    Doubt: I have more than 13,000 records in my shapefile, will it take time to process at the initial step?
    Issue: I created a dummy data similar my dataset with 4 records and at the field calculator step after boundary creation, I faced
    Eval error: Column “boundary” not found
    I gave field type as string.

    array_to_string(array_foreach(string_to_array( “neighbors” ),
    floor(0.000621371*length(intersection(geometry(get_feature(‘boundary’, ‘type’, ‘new’)), $geometry)))))

      • I worked with my original dataset and got NULL values in ‘neighbors’ field in the output. I set type as string with 80 width. May I know where did the mistake

        array_to_string(array_remove_all(aggregate(
        layer:= ‘union_2001_2011f’,
        aggregate:=’array_agg’,
        expression:=”ID”,
        filter:=intersects($geometry, geometry(@parent)) AND “typef” = ‘old’
        ), “ID”))

      • If I remember correctly, your field is “Id” (small d). The expressions are case senstive and quote sensitive (remember double-quotes refer to field and single quotes for text)

  11. Thank you. I corrected that. Another issue am facing is that to find neighbors, code with only intersect condition works
    array_to_string(array_remove_all(aggregate(
    layer:= ‘union_2001_2011f’,
    aggregate:=’array_agg’,
    expression:=Id,
    filter:=intersects($geometry, geometry(@parent))
    ), “Id”))
    But when I give the condition, the output has NULL values
    array_to_string(array_remove_all(aggregate(
    layer:= ‘union_2001_2011f’,
    aggregate:=’array_agg’,
    expression:=Id,
    filter:=intersects($geometry, geometry(@parent)) AND “type” = ‘old’
    ), “Id”))
    Could you please where it went wrong?

  12. I have one more doubt. Say if the polygons intersect at vertices, I get output in lengths field as 5,,
    Instead, if i want the algorithm to fill in ‘0’ and get output as 5,0,0
    What changes should be made in the field calculator?

    • Use the coalesce() function. This function returns the first non null value. Put the length value you are calculating as the first argument and 0 as the second. Whenever length is null it will output 0.

      Something like

      array_to_string(array_foreach(string_to_array( “neighbors” ),
      coalesce(floor(0.000621371*length(intersection(geometry(get_feature(‘boundary’, ‘type’, ‘new’)), $geometry))), 0)))

      • Thank you. It worked. But I want to calculate the length in meter upto 2 decimal places. So previously I used

        array_to_string(array_foreach(string_to_array( “neighbors” ),
        round(length(intersection(geometry(get_feature(‘Boundary’, ‘Id’, trim(@element))), $geometry)),2)))

        In order to incorporate coalesce(), i updated as

        array_to_string(array_foreach(string_to_array( “neighbors” ),coalesce(
        round(length(intersection(geometry(get_feature(‘Boundary’, ‘Id’, trim(@element))), $geometry))),0,2)))

        But the output has values as integers instead of real numbers.

  13. Do you have a solution of how to do this using python or where you might recommend help regarding this? Specifically geopandas

  14. Thanks a lot from me also, Ujaval, for this very very interesting article. I managed to repeat almost everything just as you recommended but whatever I do, I end with a lot of shared borders with missing length values (45,,,318,,17 or 45,0,0,318,0 17 if I use the coalesce function you told Linda). That has to happen with polygons that only share one vertex, I understand that, but I have many cases where polygons share a whole segment and still I get Null (oder zero) values. One thing I know is that it seems to be reciprocal – that is, if poly A doesn’t give me length for the border with poly B, but only for C, then poly B will also not give me a length for the border with poly A. Have you come across this particular problem?

    • This is a common problem and it has to do with ‘topology’. If the polygons are not digitized with correct topology then they are not adjacent and do not share any border. Likely your polygons have either small gaps (slivers) or do not share a common edge (even if it visually appears to be). I prefer fixing these in mapshaper. Head to mapshaper.org, drop your shapefile, open Console and type clean and press enter. Then download the results. Hopefully that resolves your issue. If that doesn’t help – feel free to send a copy of your dataset to me at ujaval@spatialthoughts.com and I will take a look.

  15. Of course – I should have known. These are data from field surveys digitized by a lot of different people with a lot of different GI systems and although I did a few basic things to correct for major topological problems in ArcGIS and QGIS, that wasn’t anyway good enough. With the shapes cleaned by mapshaper only one of 84 calculates incorrectly, and that will be some topology thing as well. Thanks again for the blog and especially for the quick reply!

Leave a Reply to ujavalCancel reply