Find Neighbor Polygons using Summary Aggregate Function in QGIS

Read my previous posts Summary Aggregate and Spatial Filters and Advanced Aggregate Expressions to Automate QA to learn more about the powerful aggregate function.

The aggregate function in QGIS was designed to work with 2 separate input vector layers, but we can also make it work with a single layer. Essentially performing spatial queries for features within the layer.

Consider this problem. In a layer containing zip codes, we want to find all neighboring zip codes for every polygon. This can be done very easily by creating a new field with an expression with the aggregate function – using the layer itself as the parent layer.

Below is the Zip Codes shapefile from the City of Seattle Open Data Portal. Our task is to add another field which contains all neighboring zip codes for each feature. Note the layer name is Zip_Codes

Open the Attribute Table for the layer. Note the field name containing the 5-digit zip code is ZIPCODE . Open Field Calculator.

Create a new field with the following expression. The key here is the spatial filter intersects($geometry, geometry(@parent)) that finds all polygons from the layer that intersect the feature being processed.

aggregate(
 layer:= 'Zip_Codes',
 aggregate:='concatenate',
 expression:=ZIPCODE,
 concatenator:=', ',
 filter:=intersects($geometry, geometry(@parent))
 )

Note: If you are not getting accurate results, try the following

  • Add a small buffer to the geometry. Change the filter statement to something like filter:=intersects($geometry, buffer(geometry(@parent), 1)) where 1 is the distance in the units of your layer’s CRS. Change it to an appropriate small number for your CRS units.
  • Instead of aggregating using the same layer, duplicate the layer by right-clicking the layer and selecting ‘Duplicate’. And when running the aggregate function, use the duplicate layer name instead of the original layer.

You will now have a new field that has the neighbors zip codes for every feature in the layer!

The concatenate aggregate expects strings as inputs, so if the field you are trying to aggregate is not a string, you will get an error. To use integer fields with concatenate, you must first convert it to a string using the built-in conversion function to_string(). If the ZIPCODE field was of integer type, you can use it as following

aggregate(
 layer:= 'Zip_Codes',
 aggregate:='concatenate',
 expression:=to_string("ZIPCODE"),
 concatenator:=', ',
 filter:=intersects($geometry, geometry(@parent))
 )

There are many type of aggregate that you can use. If you want to count numbering zip codes, you can use the count aggregate instead of concatenate. The following expression would calculate the number of neighboring polygons.

aggregate(
 layer:= 'Zip_Codes',
 aggregate:='count',
 expression:=$id,
 filter:=touches($geometry, geometry(@parent))
 )

Read Part-2 of this post to learn how to calculate shared border length between neighbor polygons.

52 Comments

Leave a Comment

  1. Hi Ujaval, thanks for sharing this post.

    I have tried your example with the world borders shapefile (https://thematicmapping.org/downloads/world_borders.php) and QGIS returns some strange results (e.g. “Cuba” touches “USA”?!! and other countries that share a common border but they don’t appear in the results).

    Any suggestions to overcome this problem? By the way, I am using QGIS 3.4.7. The code I used can be seen below:

    aggregate(
    layer:= ‘TM_WORLD_BORDERS-0.3′,
    aggregate:=’concatenate’,
    expression:=NAME,
    concatenator:=’, ‘,
    filter:=touches($geometry, geometry(@parent))
    )

    ps1. When I use the count function, it returns zero for all the countries.
    ps2. When I use the Zip Codes shapefile, the same issue appears (e.g. in the zip code 98406, where should be 6 neighbors, but only 3 appear).

  2. Thanks for the super helpful article!

    For anyone else who might be using a shapefile that has polygons which theoretically *should* touch but which do not actually share points (which is what the `touches` function expects), you can use this trick: first, select all your features, then select Vector > Geoprocessing Tools > Buffer and add a small buffer (e.g. 1m, but adjust as you need). This will cause the polygons to expand in size by 1m, and, hence, overlap. You can then use the aggregation code in the article, but replace the `touches` function with `intersects`, since adjacent features now intersect each other: `filter:=intersects($geometry, geometry(@parent))`.

    NB: this will cause polygons that are “kitty corner” to each other (e.g. Utah and New Mexico, to use the example of US states) to show up as adjacent. For my purposes, this is not a problem; it might be for you.

  3. Hi Ujaval,
    thanks for this, because it is the closest I was able to find as a solution to my problem, nevertheless it comes up with empty cells.
    So, I have a grid of my study area, and I want to look into the autocorrelation between the neighbouring quadrants. In my case the field code is the id of the quadrants and I’m was expecting a string of respective neighbouring ids, yet this – seemingly ideal – solution of yours results NULL.
    Is there something I miss?
    Or is this not a solution for my problem?

      • Cheers. It works now!!!
        Though not flawless: in the middle of the grid it sometimes produces 8, sometimes less neighbouring cells. Tried increasing buffer size, did not solved anythin.
        Any suggestion?
        Once again thank you for your help.

    • I found a solution. Instead of using the aggregate function on the same layer, the process gives correct results if you duplicate the layer and run the computation.
      1. Rename your layer to ‘original’.
      2. Right-click and select ‘Duplicate’. Rename it to ‘copy’.
      3. Add a attribute to the ‘copy’ layer with the following expression. I made a change to remove the id of the polygon from the list of neighbors.

      array_to_string(array_remove_all(string_to_array(aggregate(
      layer:= 'original',
      aggregate:='concatenate',
      expression:=id2,
      concatenator:=',',
      filter:=intersects($geometry, geometry(@parent))
      ), ','), "id2" ), ',')

      See the result at https://drive.google.com/file/d/16_QRQAfN-tMWFrutETGwp2m_o0dHsvKe/view?usp=sharing

  4. Hello Chinmay! This process is very slow in computing the number neighbouring polygons. I have around 41000 features and it looks like it will take forever to do that!

  5. Hi there, I am not sure that the method is correct. Looking at your first screenshot, ZIP 98020 code has 3 (or maybe 4) neighbors – 98346, 98177, 98026 (and maybe 98133). However, your formula in the fourth and fifth screenshot show as a result for ZIP 98020 only 2 neighbors. Why is that? I noticed this since in my test data in same cases I get a result of 0 neighbors, while the investigated polygon clearly has a neighbor. Altering the intersect/touch or adding a buffer did not work for me. Any clues?

    • Hi Mike – you are correct. I think ‘touches’ is very sensitive and works only in cases where polygons actually share the boundary. I have subsequently found that intersects works better. Try replacing touches with intersects and see if it helps. I will edit the post with the same.

      Also the following might work if the polygons do not intersect as well. Buffer the geometry slightly (here 1 is the distance in the layer CRS, so change accordingly to a small number for your layer)

      filter:=intersects($geometry, buffer(geometry(@parent), 1)).

  6. Hi Ujaval,

    I’m trying to do something very similar but I’m only looking for pairs along a provincial border. Ideally, I would love QGIS to return the neighbours that are only across a specific border. For example, for a specific census division in British Columbia, I want it to return all of the postal codes that are neighbouring it in Alberta. In my data, I have provincial identifiers as well as census division identifiers. Here’s a link to the data if you need to have a look https://www12.statcan.gc.ca/census-recensement/alternative_alternatif.cfm?l=eng&dispext=zip&teng=lcsd000a16a_e.zip&k=%20%20%20%2023264&loc=http://www12.statcan.gc.ca/census-recensement/2011/geo/bound-limit/files-fichiers/2016/lcsd000a16a_e.zip

    Do you have any advice on how I could accomplish this? I’m still fairly new to QGIS and I’ve been trying out some things but with no luck! I’m certain there is a simple line of code I could use. Thanks for any advice you can give!

    • Hi Jerome,

      This is doable. In the aggregate function, you have a ‘filter’ expression. There you can add a condition to select features that match a census division. Something like

      aggregate(
      layer:= ‘Zip_Codes’,
      aggregate:=’concatenate’,
      expression:=ZIPCODE,
      concatenator:=’, ‘,
      filter:=intersects($geometry, geometry(@parent)) AND PRNAME = ‘Alberta’
      )

      In the dataset you sent, there are no zipcode column. So I am assuming that is a separate layer. You will have to do a table join to get the census division and zipcodes in the same layer. If you can share the zipcode data, I can give you more detailed instructions.

      • Hi ujaval, thank you so much for your quick reply! I

        I should have been more clear on exactly what I’m trying to accomplish. I want to know for any given census subdivision (CSD) which other CSDs border it that are also in another province. In other words, I’m trying to pair each CSD with its provincial neighbour.

        The reason I’m doing this is that I’m essentially replicating a 2010 study on minimum wage by Dube, Lester, and Reich (https://escholarship.org/uc/item/86w5m90m) but with Canadian data. If you look at figure 2 in that paper (https://www.mitpressjournals.org/action/showImage?doi=10.1162%2FREST_a_00039&iName=master.img-001.jpg&type=master), it should be clear what I’m trying to accomplish. In the Dube paper, they pair counties that share a common state border. In my paper, I need to pair CSD’s that share a common provincial border.

        As for the code you provided, it does not want to seem to work for me. It just gives me null results. I’ve tried using the Provincial Unique Identifier instead and tried different province names, but I just keep getting null results. I also tried using a buffer and that didn’t seem to solve the problem either, still a bunch of nulls! Not sure if you have any other ideas or reasons why this is happening?

        Again, thank you so much, this has been incredibly helpful!

      • Hi ujaval,

        I wrote a reply on here about a week ago, but I haven’t seen that comment appear on this webpage. So, if there are two similar replies to this, please ignore this one!

        Anyways, I maybe should have been more clear on what I am trying to accomplish. Basically, I would like to get a string variable for each observation that contains a list of all of that observation’s neighbours that are in another province. So for example, the “neighbours” variable for Toronto (which is far from the provincial border) should be blank. Whereas the “neighbours” variable for Ottawa (which is on the Ontario-Quebec border) should have a list of all of the polygons in Quebec that neighbour it, and none of it’s Ontario neighbours. In other words, if you look at the data I posted above and go to Ottawa (CSDUID = 3506008), the “neighbours” variable for Ottawa should contain “2481017, 2480060, 2482030, 2484005” wheres the “neighbours” variable for Toronto (CSDUID = 3520005) should be empty. I hope this makes sense.

        Just for added clarity, I’m essentially replicating this study (https://escholarship.org/uc/item/86w5m90m) with Canadian data. Here is a map they provided in their paper in case that makes things clearer (https://www.mitpressjournals.org/action/showImage?doi=10.1162/REST_a_00039&iName=master.img-001.jpg&w=361&h=257)

        Also, I tried adding the ” AND PRNAME = ‘Alberta’ ” and I kept getting null results whether I added a buffer or not. I also tried AND PRUID = 10 and also got null results. I’m not sure what the issue is there?!

        Thanks again for all of your help, I really appreciate it!!

  7. Hi Ujaval,

    Thank you very much for this.

    Do you know if it is possible to do this kind of aggregation but limiting it to some kind of condition. E.g. I want to form aggregate neighbouring blocks but limiting the population inside this new conglomerate to 1000.

    We have a block layer and of course a variable with the population in it.

    Thank you in advance for your response or any workabout around this!!

    Braulio

  8. Hi Ujaval,
    First thank you for your great work, though unfortunately it hasn’t worked out for me yet 🙁
    I edited the aggregate function accordingly, with:
    Output field type: Text (String)
    expression:= cell_id #this field is the identifier I’m using for that layer.
    However, I keep getting this error:
    Output preview: Expression is invalid (more info => Eval error: Could not calculate aggregate for cell_id)

    (I tried to increase the length of the field to be created, but it didn’t help.)
    What could be the reason for that?

  9. Dear Juval, thank you for your helpful post. I am working with QGIS 3.8 and trying to find out number of points in neighbouring hexagons (after doing a grid of 20 x 20 meters). I used this for counting the number of neighbouring hexagons:
    aggregate(
    layer:= ‘copy’,
    aggregate:=’concatenate’,
    expression:=id2,
    concatenator:=’, ‘,
    filter:=intersects($geometry, geometry(@parent))
    )

    Then I used this for summing the number of points in the neighbouring hexagons:
    aggregate(
    layer:= ‘copy’,
    aggregate:=’sum’,
    expression:=NUMPOINTS,
    filter:= intersects ($geometry, geometry(@parent))
    )

    This worked for one file (I have many), for others get stuck and QGIS 3.8 not responding (I think because of higher number of points?). Can you help me on this please?

  10. Hi Ujuval,
    Thank you for the perfect solution to my problem! Unfortunately, I cant seem to get it to work. I am using the following code as instructed:
    aggregate(
    layer:= ‘ndmMjrt3RimboPgn1m’,
    aggregate:=’concatenate’,
    expression:=gridcode,
    concatenator:=’, ‘,
    filter:=intersects($geometry, geometry(@parent))
    )

    But my output is Null.

    I have tried it with and without a buffer border.

    Here are the data. Thank you for the helpin advance.

    https://drive.google.com/drive/folders/10tp7ePPIP8Uag8UteWGOLKo2O0_MvO0O?usp=sharing

    Kind Regards
    Nedim

    • Hi Nedim,

      You are using the ‘concatenate’ aggregate to create. a comma separated string. It expectes the inputs to be strings too, but your column is a number. So you can convert it to a string and it will work. Use the function to_string() in the expression and it will take care of the conversion. You may also want to use concatenate_unique which will remove the duplicates for you. Below is the expression that works.

      aggregate(
      layer:= ‘ndmMjrt3RimboPgn’,
      aggregate:=’concatenate_unique’,
      expression:=to_string(gridcode),
      concatenator:=’, ‘,
      filter:=intersects($geometry, geometry(@parent))
      )

      • Hi Ujival,
        Thank you, this was really helpful.
        I have a follow-up question as well: Is it possible to have the length of the neighbouring border shown as well?
        For example,
        polygon 1 with value 2
        is bordering with polygon 2 for 10 meters, and polygon 3 for 4 meters, and polygon 4 for 18 meters.
        And so on for the whole data-set

        Kind regards
        Nedim.

  11. Interesting question. After hours of fiddling around, I have a solution for you. I tested it on a small subset of your dataset.

    First, run ‘Fix geometries’ algorithm as you have many invalid geometries. Next, do not use ‘gridcode’ for neighbor calculation since they are not unique. Use ‘id’ instead. Add a new ‘neighbors’ field with following expression

    array_to_string(array_remove_all(string_to_array(aggregate(
    layer:= ‘Fixed geometries’,
    aggregate:=’concatenate_unique’,
    expression:=to_string(id),
    concatenator:=’,’,
    filter:=intersects($geometry, geometry(@parent)))), “id” ))

    Now, run the ‘Boundary’ algorithm from processing toolbox to convert the polygons to lines. Then add an attribute ‘lengths’ using the following expression. You will have a comma-separated list of lengths.

    array_to_string(array_foreach(string_to_array( “neighbors” ),
    coalesce(length(intersection(geometry(get_feature(‘Boundary’, ‘id’, @element)), $geometry)), 0)))

    Here’s the subset with the results
    https://drive.google.com/open?id=1VvPTfTjUwfs61DkquVq2zZy3d632ZOr8

  12. Hi Ujaval,

    Thanks a lot for this really helpful blogpost. I am trying to do this for the Assembly Constituencies of India using the shapefiles from here – http://projects.datameet.org/maps/assembly-constituencies/. I created a new unique ID for the AC codes as they were not unique, but I still only see ‘invalid expression’. I also tried the tips you recommended (adding buffer, duplicating layer, using touches instead of intersects), but was not able to get it to work. Your help and suggestions would be greatly appreciated.

    Thank you!

    Shruti

      • I first created a new variable ‘ID’ that is unique, and then, following your code above, used the expression:

        aggregate(
        layer:= ‘India_AC’,
        aggregate:=’concatenate’,
        expression:=ID,
        concatenator:=’, ‘,
        filter:=intersects($geometry, geometry(@parent))
        )

        I am new to this, so apologies in advance if I have made a very basic error.

  13. Hi Shruti,

    The ID field is an integer, so when you try to aggregate it using ‘concatenate’ it fails since concatenate creates a string. Change your expression to to_string(“ID”) which will create a string from the ID and pass it to concatenate.

    aggregate(
    layer:= ‘India_AC’,
    aggregate:=’concatenate’,
    expression:=to_string(“ID”),
    concatenator:=’, ‘,
    filter:=intersects($geometry, geometry(@parent))
    )

    • Hi Ujaval,

      Thank you so much. I don’t see an invalid expression anymore and the code seems to be running, but it has been running for over 8 hours now! There are about 4000 unique IDs in the data. Do you have a sense of how long this process usually takes?

      • That seems long, but you are doing pair-wise computation for each polygon which are 4000×4000=16000000 operations. Test on a small subset (for a district or state) first.

  14. hello thanks for your very clear and detailed article, but I wanted to ask if it is possible to use order_by in descending sense because I wanted to sort the polygons from smallest to largest and then with array_last () get the largest polygon. I tried array_first () but the first name of the array is the analyzed polygon.

      • many thanks for your reply. I will study it and in case I write if I have not understood something

      • ok my script works
        array_to_string(array_remove_all(aggregate(
        layer:= ‘africa’,
        aggregate:=’array_agg’,
        expression:= “ADM0_A3″ ,
        filter:=intersects($geometry, geometry(@parent)),order_by:=”area”
        ), “ADM0_A3” ))

        but…
        I have the feeling that “order by” set on the area field is not working, is it set in descending or ascending order? and in case the default can be changed?
        my goal is to put the neighboring polygons in order from the largest to the smallest.
        Thanks in advance

  15. […] We have a point layer named ghcn_stations that has the location of rain-gauge stations in Florida. We will use the aggregate() function to find the neighboring stations within a distance of 10km (Rmax). I describe the technique for finding neighbors in detail in my another post Find Neighbor Polygons using Summary Aggregate Function in QGIS. […]

  16. I tried this out on a US ztca510 shape file. It seems to freeze the application with a Not Responding. Perhaps the dataset is too large?

    aggregate(
    layer:= ‘tl_2019_us_zcta510′,
    aggregate:=’concatenate’,
    expression:=ZCTA5CE10,
    concatenator:=’, ‘,
    filter:=intersects($geometry, geometry(@parent))
    )

    • One limitation of the aggregate function is that it does not use a spatial index. If your layer has 1000 polygons, it will need to do 1000 x 1000 = 1000000 operations to find the neighbours. So it can be quite slow with large layers. Do this with smaller regions or let it run for a longer time.

  17. Hi,
    one additional challenge to this great algorithm is the implementation in to a processing model. In such a model I would run a couple of preprocessing steps like ‘fix’ or ‘dissolve’ to prepare the data. However, I have challenges to call the layer and expression. Any idea how I could adapt the aggregate function to run inside the fieldcalculator inside a Model? Thank You!

  18. Hi,
    I’m trying to use the following “native:aggregate” processing algorithm:

    # aggregate
    alg_params = {
    ‘AGGREGATES’: [{‘aggregate’: ‘concatenate’,’delimiter’: ‘,’,’input’: “ZIPCODE”,’length’: 250,’name’: ‘ZZZ’,’precision’: 0, ‘type’: 10}],
    ‘GROUP_BY’: ‘$area’,
    ‘INPUT’: parameters[‘Inputpolygonlayer’],
    ‘OUTPUT’: QgsProcessing.TEMPORARY_OUTPUT
    }
    outputs[‘Aggregate’] = processing.run(‘native:aggregate’, alg_params, context=context, feedback=feedback, is_child_algorithm=True)

    “ZIPOCODE” is the field with the numeric data to be aggregated, and “ZZZ” is the text field with the aggregation.

    I always get the following error:
    Evaluation error in expression “concatenate(ZIPCODE, $area, TRUE, ‘,’)”: Could not calculate aggregate for: ZIPCODE

    Where is my mistake?

      • Many thanks for your reply.

        Now I’m trying to use “native:aggregate” processing algorithm to find Neighbor Polygons of some selected features in a polygon layer, through the following code:

        # Aggregate
        alg_params = {
        ‘AGGREGATES’: [{‘aggregate’: ‘concatenate’,’delimiter’: ‘,’,’input’: ‘AUTO’,’length’: 250,’name’: ‘ZZZ082640′,’precision’: 0,’type’: 10}],
        ‘FILTER’: ‘=intersects($geometry, geometry(@parent)),order_by:=$area’,
        ‘INPUT’: parameters[‘Inputpolygonlayer’],
        ‘OUTPUT’: QgsProcessing.TEMPORARY_OUTPUT
        }
        outputs[‘Aggregate’] = processing.run(‘native:aggregate’, alg_params, context=context, feedback=feedback, is_child_algorithm=True)

        ‘AUTO’ is a text field with data to be aggregated.

        However, the filter used (‘FILTER’: ‘=intersects($geometry, geometry(@parent)),order_by:=$area’) seams not working. Have I miss something?

        Regards,
        Antonio

      • I think the filter should be ‘FILTER’: ‘intersects($geometry, geometry(@parent)),order_by:=$area’ (no = sign)

  19. Hi there.
    I used all the informations upthere and it all works for me finding the neighbours of the polygons.
    Actually i am working in cadastral parcels. My problem is that i need the neighbours ‘parcel numbers’ to be arranged in a correct order regarding to their position related to the parent parcel . No problem if it is North West South East , just need that the order to be the same for every parcel. Please can you help me adding something to the upthere scripts 🙂

    • You can use order_by parameter in aggregate() function to control the order.
      One option is to sort all neighbors, clock-wise direction using the angle between the parcel’s centroid and centroids of all other parcels. Replace “fid” with “parc_nr” for your dataset.

      array_to_string(array_remove_all(aggregate(
      layer:=@layer,
      aggregate:=’array_agg’,
      expression:=”fid”,
      filter:=intersects($geometry, geometry(@parent)),
      order_by:=azimuth( centroid(geometry(@parent)), centroid($geometry))), “fid”))

      If you really want separate north,south, east and west neighbors, try something like this
      Add an attribute with north parcel(s) using an expression like below

      array_to_string(array_remove_all(aggregate(
      layer:=@layer,
      aggregate:=’array_agg’,
      expression:=”fid”,
      filter:=intersects($geometry, geometry(@parent)) and (angle_at_vertex(make_line( centroid(geometry(@parent)), centroid($geometry)), 0)<  45 OR angle_at_vertex(make_line( centroid(geometry(@parent)), centroid($geometry)), 0) > 315)), “fid”))

      similarly change angles and add attributes for parcels in other directions
      Explanation at https://twitter.com/spatialthoughts/status/1186546679442395137

  20. I have made polygon grid with QGIS create grid tool and MMQGIS grid creation tool. I am trying to find the neighboring polygons for each polygon but it can only considers and lists the neighboring polygons that have higher ID. Any thoughts why this might happen? So in Practise it iterates from upper left corner and does not list the cells already iterated (have smaller ID/row number).

Leave a Reply