Find Neighbor Polygons using Summary Aggregate Function in QGIS

Read my previous posts Summary Aggregate and Spatial Filters and Advanced Aggregate Expressions to Automate QA to learn more about the powerful aggregate function.

The aggregate function in QGIS was designed to work with 2 separate input vector layers, but we can also make it work with a single layer. Essentially performing spatial queries for features within the layer.

Consider this problem. In a layer containing zip codes, we want to find all neighboring zip codes for every polygon. This can be done very easily by creating a new field with an expression with the aggregate function – using the layer itself as the parent layer.

Below is the Zip Codes shapefile from the City of Seattle Open Data Portal. Our task is to add another field which contains all neighboring zip codes for each feature. Note the layer name is Zip_Codes

Open the Attribute Table for the layer. Note the field name containing the 5-digit zip code is ZIPCODE . Open Field Calculator.

Create a new field with the following expression. The key here is the spatial filter intersects($geometry, geometry(@parent)) that finds all polygons from the layer that intersect the feature being processed.

aggregate(
 layer:= 'Zip_Codes',
 aggregate:='concatenate',
 expression:=ZIPCODE,
 concatenator:=', ',
 filter:=intersects($geometry, geometry(@parent))
 )

Note: If you are not getting accurate results, try the following

  • Add a small buffer to the geometry. Change the filter statement to something like filter:=intersects($geometry, buffer(geometry(@parent), 1)) where 1 is the distance in the units of your layer’s CRS. Change it to an appropriate small number for your CRS units.
  • Instead of aggregating using the same layer, duplicate the layer by right-clicking the layer and selecting ‘Duplicate’. And when running the aggregate function, use the duplicate layer name instead of the original layer.

You will now have a new field that has the neighbors zip codes for every feature in the layer!

The concatenate aggregate expects strings as inputs, so if the field you are trying to aggregate is not a string, you will get an error. To use integer fields with concatenate, you must first convert it to a string using the built-in conversion function to_string(). If the ZIPCODE field was of integer type, you can use it as following

aggregate(
 layer:= 'Zip_Codes',
 aggregate:='concatenate',
 expression:=to_string("ZIPCODE"),
 concatenator:=', ',
 filter:=intersects($geometry, geometry(@parent))
 )

There are many type of aggregate that you can use. If you want to count numbering zip codes, you can use the count aggregate instead of concatenate. The following expression would calculate the number of neighboring polygons.

aggregate(
 layer:= 'Zip_Codes',
 aggregate:='count',
 expression:=$id,
 filter:=touches($geometry, geometry(@parent))
 )

34 Comments

Leave a Comment

  1. Hi Ujaval, thanks for sharing this post.

    I have tried your example with the world borders shapefile (https://thematicmapping.org/downloads/world_borders.php) and QGIS returns some strange results (e.g. “Cuba” touches “USA”?!! and other countries that share a common border but they don’t appear in the results).

    Any suggestions to overcome this problem? By the way, I am using QGIS 3.4.7. The code I used can be seen below:

    aggregate(
    layer:= ‘TM_WORLD_BORDERS-0.3′,
    aggregate:=’concatenate’,
    expression:=NAME,
    concatenator:=’, ‘,
    filter:=touches($geometry, geometry(@parent))
    )

    ps1. When I use the count function, it returns zero for all the countries.
    ps2. When I use the Zip Codes shapefile, the same issue appears (e.g. in the zip code 98406, where should be 6 neighbors, but only 3 appear).

  2. Thanks for the super helpful article!

    For anyone else who might be using a shapefile that has polygons which theoretically *should* touch but which do not actually share points (which is what the `touches` function expects), you can use this trick: first, select all your features, then select Vector > Geoprocessing Tools > Buffer and add a small buffer (e.g. 1m, but adjust as you need). This will cause the polygons to expand in size by 1m, and, hence, overlap. You can then use the aggregation code in the article, but replace the `touches` function with `intersects`, since adjacent features now intersect each other: `filter:=intersects($geometry, geometry(@parent))`.

    NB: this will cause polygons that are “kitty corner” to each other (e.g. Utah and New Mexico, to use the example of US states) to show up as adjacent. For my purposes, this is not a problem; it might be for you.

  3. Hi Ujaval,
    thanks for this, because it is the closest I was able to find as a solution to my problem, nevertheless it comes up with empty cells.
    So, I have a grid of my study area, and I want to look into the autocorrelation between the neighbouring quadrants. In my case the field code is the id of the quadrants and I’m was expecting a string of respective neighbouring ids, yet this – seemingly ideal – solution of yours results NULL.
    Is there something I miss?
    Or is this not a solution for my problem?

      • Cheers. It works now!!!
        Though not flawless: in the middle of the grid it sometimes produces 8, sometimes less neighbouring cells. Tried increasing buffer size, did not solved anythin.
        Any suggestion?
        Once again thank you for your help.

    • I found a solution. Instead of using the aggregate function on the same layer, the process gives correct results if you duplicate the layer and run the computation.
      1. Rename your layer to ‘original’.
      2. Right-click and select ‘Duplicate’. Rename it to ‘copy’.
      3. Add a attribute to the ‘copy’ layer with the following expression. I made a change to remove the id of the polygon from the list of neighbors.

      array_to_string(array_remove_all(string_to_array(aggregate(
      layer:= 'original',
      aggregate:='concatenate',
      expression:=id2,
      concatenator:=',',
      filter:=intersects($geometry, geometry(@parent))
      ), ','), "id2" ), ',')

      See the result at https://drive.google.com/file/d/16_QRQAfN-tMWFrutETGwp2m_o0dHsvKe/view?usp=sharing

  4. Hello Chinmay! This process is very slow in computing the number neighbouring polygons. I have around 41000 features and it looks like it will take forever to do that!

  5. Hi there, I am not sure that the method is correct. Looking at your first screenshot, ZIP 98020 code has 3 (or maybe 4) neighbors – 98346, 98177, 98026 (and maybe 98133). However, your formula in the fourth and fifth screenshot show as a result for ZIP 98020 only 2 neighbors. Why is that? I noticed this since in my test data in same cases I get a result of 0 neighbors, while the investigated polygon clearly has a neighbor. Altering the intersect/touch or adding a buffer did not work for me. Any clues?

    • Hi Mike – you are correct. I think ‘touches’ is very sensitive and works only in cases where polygons actually share the boundary. I have subsequently found that intersects works better. Try replacing touches with intersects and see if it helps. I will edit the post with the same.

      Also the following might work if the polygons do not intersect as well. Buffer the geometry slightly (here 1 is the distance in the layer CRS, so change accordingly to a small number for your layer)

      filter:=intersects($geometry, buffer(geometry(@parent), 1)).

  6. Hi Ujaval,

    I’m trying to do something very similar but I’m only looking for pairs along a provincial border. Ideally, I would love QGIS to return the neighbours that are only across a specific border. For example, for a specific census division in British Columbia, I want it to return all of the postal codes that are neighbouring it in Alberta. In my data, I have provincial identifiers as well as census division identifiers. Here’s a link to the data if you need to have a look https://www12.statcan.gc.ca/census-recensement/alternative_alternatif.cfm?l=eng&dispext=zip&teng=lcsd000a16a_e.zip&k=%20%20%20%2023264&loc=http://www12.statcan.gc.ca/census-recensement/2011/geo/bound-limit/files-fichiers/2016/lcsd000a16a_e.zip

    Do you have any advice on how I could accomplish this? I’m still fairly new to QGIS and I’ve been trying out some things but with no luck! I’m certain there is a simple line of code I could use. Thanks for any advice you can give!

    • Hi Jerome,

      This is doable. In the aggregate function, you have a ‘filter’ expression. There you can add a condition to select features that match a census division. Something like

      aggregate(
      layer:= ‘Zip_Codes’,
      aggregate:=’concatenate’,
      expression:=ZIPCODE,
      concatenator:=’, ‘,
      filter:=intersects($geometry, geometry(@parent)) AND PRNAME = ‘Alberta’
      )

      In the dataset you sent, there are no zipcode column. So I am assuming that is a separate layer. You will have to do a table join to get the census division and zipcodes in the same layer. If you can share the zipcode data, I can give you more detailed instructions.

      • Hi ujaval, thank you so much for your quick reply! I

        I should have been more clear on exactly what I’m trying to accomplish. I want to know for any given census subdivision (CSD) which other CSDs border it that are also in another province. In other words, I’m trying to pair each CSD with its provincial neighbour.

        The reason I’m doing this is that I’m essentially replicating a 2010 study on minimum wage by Dube, Lester, and Reich (https://escholarship.org/uc/item/86w5m90m) but with Canadian data. If you look at figure 2 in that paper (https://www.mitpressjournals.org/action/showImage?doi=10.1162%2FREST_a_00039&iName=master.img-001.jpg&type=master), it should be clear what I’m trying to accomplish. In the Dube paper, they pair counties that share a common state border. In my paper, I need to pair CSD’s that share a common provincial border.

        As for the code you provided, it does not want to seem to work for me. It just gives me null results. I’ve tried using the Provincial Unique Identifier instead and tried different province names, but I just keep getting null results. I also tried using a buffer and that didn’t seem to solve the problem either, still a bunch of nulls! Not sure if you have any other ideas or reasons why this is happening?

        Again, thank you so much, this has been incredibly helpful!

      • Hi ujaval,

        I wrote a reply on here about a week ago, but I haven’t seen that comment appear on this webpage. So, if there are two similar replies to this, please ignore this one!

        Anyways, I maybe should have been more clear on what I am trying to accomplish. Basically, I would like to get a string variable for each observation that contains a list of all of that observation’s neighbours that are in another province. So for example, the “neighbours” variable for Toronto (which is far from the provincial border) should be blank. Whereas the “neighbours” variable for Ottawa (which is on the Ontario-Quebec border) should have a list of all of the polygons in Quebec that neighbour it, and none of it’s Ontario neighbours. In other words, if you look at the data I posted above and go to Ottawa (CSDUID = 3506008), the “neighbours” variable for Ottawa should contain “2481017, 2480060, 2482030, 2484005” wheres the “neighbours” variable for Toronto (CSDUID = 3520005) should be empty. I hope this makes sense.

        Just for added clarity, I’m essentially replicating this study (https://escholarship.org/uc/item/86w5m90m) with Canadian data. Here is a map they provided in their paper in case that makes things clearer (https://www.mitpressjournals.org/action/showImage?doi=10.1162/REST_a_00039&iName=master.img-001.jpg&w=361&h=257)

        Also, I tried adding the ” AND PRNAME = ‘Alberta’ ” and I kept getting null results whether I added a buffer or not. I also tried AND PRUID = 10 and also got null results. I’m not sure what the issue is there?!

        Thanks again for all of your help, I really appreciate it!!

  7. Hi Ujaval,

    Thank you very much for this.

    Do you know if it is possible to do this kind of aggregation but limiting it to some kind of condition. E.g. I want to form aggregate neighbouring blocks but limiting the population inside this new conglomerate to 1000.

    We have a block layer and of course a variable with the population in it.

    Thank you in advance for your response or any workabout around this!!

    Braulio

  8. Hi Ujaval,
    First thank you for your great work, though unfortunately it hasn’t worked out for me yet 🙁
    I edited the aggregate function accordingly, with:
    Output field type: Text (String)
    expression:= cell_id #this field is the identifier I’m using for that layer.
    However, I keep getting this error:
    Output preview: Expression is invalid (more info => Eval error: Could not calculate aggregate for cell_id)

    (I tried to increase the length of the field to be created, but it didn’t help.)
    What could be the reason for that?

  9. Dear Juval, thank you for your helpful post. I am working with QGIS 3.8 and trying to find out number of points in neighbouring hexagons (after doing a grid of 20 x 20 meters). I used this for counting the number of neighbouring hexagons:
    aggregate(
    layer:= ‘copy’,
    aggregate:=’concatenate’,
    expression:=id2,
    concatenator:=’, ‘,
    filter:=intersects($geometry, geometry(@parent))
    )

    Then I used this for summing the number of points in the neighbouring hexagons:
    aggregate(
    layer:= ‘copy’,
    aggregate:=’sum’,
    expression:=NUMPOINTS,
    filter:= intersects ($geometry, geometry(@parent))
    )

    This worked for one file (I have many), for others get stuck and QGIS 3.8 not responding (I think because of higher number of points?). Can you help me on this please?

  10. Hi Ujuval,
    Thank you for the perfect solution to my problem! Unfortunately, I cant seem to get it to work. I am using the following code as instructed:
    aggregate(
    layer:= ‘ndmMjrt3RimboPgn1m’,
    aggregate:=’concatenate’,
    expression:=gridcode,
    concatenator:=’, ‘,
    filter:=intersects($geometry, geometry(@parent))
    )

    But my output is Null.

    I have tried it with and without a buffer border.

    Here are the data. Thank you for the helpin advance.

    https://drive.google.com/drive/folders/10tp7ePPIP8Uag8UteWGOLKo2O0_MvO0O?usp=sharing

    Kind Regards
    Nedim

    • Hi Nedim,

      You are using the ‘concatenate’ aggregate to create. a comma separated string. It expectes the inputs to be strings too, but your column is a number. So you can convert it to a string and it will work. Use the function to_string() in the expression and it will take care of the conversion. You may also want to use concatenate_unique which will remove the duplicates for you. Below is the expression that works.

      aggregate(
      layer:= ‘ndmMjrt3RimboPgn’,
      aggregate:=’concatenate_unique’,
      expression:=to_string(gridcode),
      concatenator:=’, ‘,
      filter:=intersects($geometry, geometry(@parent))
      )

      • Hi Ujival,
        Thank you, this was really helpful.
        I have a follow-up question as well: Is it possible to have the length of the neighbouring border shown as well?
        For example,
        polygon 1 with value 2
        is bordering with polygon 2 for 10 meters, and polygon 3 for 4 meters, and polygon 4 for 18 meters.
        And so on for the whole data-set

        Kind regards
        Nedim.

  11. Interesting question. After hours of fiddling around, I have a solution for you. I tested it on a small subset of your dataset.

    First, run ‘Fix geometries’ algorithm as you have many invalid geometries. Next, do not use ‘gridcode’ for neighbor calculation since they are not unique. Use ‘id’ instead. Add a new ‘neighbors’ field with following expression

    array_to_string(array_remove_all(string_to_array(aggregate(
    layer:= ‘Fixed geometries’,
    aggregate:=’concatenate_unique’,
    expression:=to_string(id),
    concatenator:=’,’,
    filter:=intersects($geometry, geometry(@parent)))), “id” ))

    Now, run the ‘Boundary’ algorithm from processing toolbox to convert the polygons to lines. Then add an attribute ‘lengths’ using the following expression. You will have a comma-separated list of lengths.

    array_to_string(array_foreach(string_to_array( “neighbors” ),
    coalesce(length(intersection(geometry(get_feature(‘Boundary’, ‘id’, @element)), $geometry)), 0)))

    Here’s the subset with the results
    https://drive.google.com/open?id=1VvPTfTjUwfs61DkquVq2zZy3d632ZOr8

  12. Hi Ujaval,

    Thanks a lot for this really helpful blogpost. I am trying to do this for the Assembly Constituencies of India using the shapefiles from here – http://projects.datameet.org/maps/assembly-constituencies/. I created a new unique ID for the AC codes as they were not unique, but I still only see ‘invalid expression’. I also tried the tips you recommended (adding buffer, duplicating layer, using touches instead of intersects), but was not able to get it to work. Your help and suggestions would be greatly appreciated.

    Thank you!

    Shruti

      • I first created a new variable ‘ID’ that is unique, and then, following your code above, used the expression:

        aggregate(
        layer:= ‘India_AC’,
        aggregate:=’concatenate’,
        expression:=ID,
        concatenator:=’, ‘,
        filter:=intersects($geometry, geometry(@parent))
        )

        I am new to this, so apologies in advance if I have made a very basic error.

  13. Hi Shruti,

    The ID field is an integer, so when you try to aggregate it using ‘concatenate’ it fails since concatenate creates a string. Change your expression to to_string(“ID”) which will create a string from the ID and pass it to concatenate.

    aggregate(
    layer:= ‘India_AC’,
    aggregate:=’concatenate’,
    expression:=to_string(“ID”),
    concatenator:=’, ‘,
    filter:=intersects($geometry, geometry(@parent))
    )

    • Hi Ujaval,

      Thank you so much. I don’t see an invalid expression anymore and the code seems to be running, but it has been running for over 8 hours now! There are about 4000 unique IDs in the data. Do you have a sense of how long this process usually takes?

      • That seems long, but you are doing pair-wise computation for each polygon which are 4000×4000=16000000 operations. Test on a small subset (for a district or state) first.

Leave a Reply