Read my previous posts Summary Aggregate and Spatial Filters and Advanced Aggregate Expressions to Automate QA to learn more about the powerful aggregate function.

The aggregate function in QGIS was designed to work with 2 separate input vector layers, but* we can also make it work with a single layer*. Essentially performing spatial queries for features within the layer.

Consider this problem. In a layer containing zip codes, we want to find all neighboring zip codes for every polygon. This can be done very easily by creating a new field with an expression with the aggregate function – using the layer itself as the parent layer.

Below is the Zip Codes shapefile from the City of Seattle Open Data Portal. Our task is to add another field which contains all neighboring zip codes for each feature. Note the layer name is **Zip_Codes**

Open the Attribute Table for the layer. Note the field name containing the 5-digit zip code is * ZIPCODE* . Open Field Calculator.

Create a new field with the following expression. The key here is the spatial filter *intersects($geometry, geometry(@parent))* that finds all polygons from the layer that intersect the feature being processed.

```
aggregate(
layer:= 'Zip_Codes',
aggregate:='concatenate',
expression:=ZIPCODE,
concatenator:=', ',
filter:=intersects($geometry, geometry(@parent))
)
```

*Note: If you are not getting accurate results, try the following*

- Add a small buffer to the geometry. Change the filter statement to something like f
*ilter:=intersects($geometry, buffer(geometry(@parent), 1))*where 1 is the distance in the units of your layer’s CRS. Change it to an appropriate small number for your CRS units. - Instead of aggregating using the same layer,
*duplicate the layer by right-clicking the layer and selecting ‘Duplicate’. And when running the aggregate function, use the duplicate layer*name instead of the original layer.

You will now have a new field that has the neighbors zip codes for every feature in the layer!

The `concatenate`

aggregate expects strings as inputs, so if the field you are trying to aggregate is *not a string*, you will get an error. To use integer fields with concatenate, you must first convert it to a string using the built-in conversion function `to_string`

(). If the ZIPCODE field was of integer type, you can use it as following

```
aggregate(
layer:= 'Zip_Codes',
aggregate:='concatenate',
expression:=to_string("ZIPCODE"),
concatenator:=', ',
filter:=intersects($geometry, geometry(@parent))
)
```

There are many type of *aggregate *that you can use. If you want to count numbering zip codes, you can use the ** count **aggregate instead of

*concatenate*. The following expression would calculate the number of neighboring polygons.

```
aggregate(
layer:= 'Zip_Codes',
aggregate:='count',
expression:=$id,
filter:=touches($geometry, geometry(@parent))
)
```

Hi Ujaval, thanks for sharing this post.

I have tried your example with the world borders shapefile (https://thematicmapping.org/downloads/world_borders.php) and QGIS returns some strange results (e.g. “Cuba” touches “USA”?!! and other countries that share a common border but they don’t appear in the results).

Any suggestions to overcome this problem? By the way, I am using QGIS 3.4.7. The code I used can be seen below:

aggregate(

layer:= ‘TM_WORLD_BORDERS-0.3′,

aggregate:=’concatenate’,

expression:=NAME,

concatenator:=’, ‘,

filter:=touches($geometry, geometry(@parent))

)

ps1. When I use the count function, it returns zero for all the countries.

ps2. When I use the Zip Codes shapefile, the same issue appears (e.g. in the zip code 98406, where should be 6 neighbors, but only 3 appear).

Hmm. There is some funkiness going on with the geometries in that layer. There are a bunch of invalid geometries and the United States polygon crosses the anti-meridian that can cause such weird results. If you use the Natural Earth Admin0 countries file, it gives the correct results https://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-admin-0-countries/

Thanks for the answer. I will have to review some topics on how to create valid geometries (or correct invalid ones) to have this function working properly. And using the Natural Earth Data, the function (almost) works right (e.g. Venezuela appears without neighbors).

[…] Useful idea on “Find Neighbor Polygons using Summary Aggregate Function in QGIS“ […]

Thanks for the super helpful article!

For anyone else who might be using a shapefile that has polygons which theoretically *should* touch but which do not actually share points (which is what the `touches` function expects), you can use this trick: first, select all your features, then select Vector > Geoprocessing Tools > Buffer and add a small buffer (e.g. 1m, but adjust as you need). This will cause the polygons to expand in size by 1m, and, hence, overlap. You can then use the aggregation code in the article, but replace the `touches` function with `intersects`, since adjacent features now intersect each other: `filter:=intersects($geometry, geometry(@parent))`.

NB: this will cause polygons that are “kitty corner” to each other (e.g. Utah and New Mexico, to use the example of US states) to show up as adjacent. For my purposes, this is not a problem; it might be for you.

Thanks Frank. FWIW, you can also do this operation inline without creating a new layer, filter:=intersects($geometry, buffer(geometry(@parent), 1))`.

Even better! Thanks!

Hi Ujaval,

thanks for this, because it is the closest I was able to find as a solution to my problem, nevertheless it comes up with empty cells.

So, I have a grid of my study area, and I want to look into the autocorrelation between the neighbouring quadrants. In my case the field code is the id of the quadrants and I’m was expecting a string of respective neighbouring ids, yet this – seemingly ideal – solution of yours results NULL.

Is there something I miss?

Or is this not a solution for my problem?

Try replacing ‘touches’ with ‘intersects’ and see if that helps.

Cheers. It works now!!!

Though not flawless: in the middle of the grid it sometimes produces 8, sometimes less neighbouring cells. Tried increasing buffer size, did not solved anythin.

Any suggestion?

Once again thank you for your help.

Depends on how your data is structured. Can you share more what your data looks like and what is the goal?

It is a 1000×1000 m grid consisting of ca 4000 quadrants (see the data: https://drive.google.com/drive/folders/1_g71w0ZRDicEJE8NmLqvC_VTiGPrev7e?usp=sharing). I want to calculate the autocorrelation of processes in the neighbouring quadrants, for this I need a list of neighbouring quadrants ids, so I can run calculations.

I found a solution. Instead of using the aggregate function on the same layer, the process gives correct results if you duplicate the layer and run the computation.

1. Rename your layer to ‘original’.

2. Right-click and select ‘Duplicate’. Rename it to ‘copy’.

3. Add a attribute to the ‘copy’ layer with the following expression. I made a change to remove the id of the polygon from the list of neighbors.

`array_to_string(array_remove_all(string_to_array(aggregate(`

layer:= 'original',

aggregate:='concatenate',

expression:=id2,

concatenator:=',',

filter:=intersects($geometry, geometry(@parent))

), ','), "id2" ), ',')

See the result at https://drive.google.com/file/d/16_QRQAfN-tMWFrutETGwp2m_o0dHsvKe/view?usp=sharing

I’m genuinely grateful for your help.

Hello Chinmay! This process is very slow in computing the number neighbouring polygons. I have around 41000 features and it looks like it will take forever to do that!

Hi there, I am not sure that the method is correct. Looking at your first screenshot, ZIP 98020 code has 3 (or maybe 4) neighbors – 98346, 98177, 98026 (and maybe 98133). However, your formula in the fourth and fifth screenshot show as a result for ZIP 98020 only 2 neighbors. Why is that? I noticed this since in my test data in same cases I get a result of 0 neighbors, while the investigated polygon clearly has a neighbor. Altering the intersect/touch or adding a buffer did not work for me. Any clues?

Hi Mike – you are correct. I think ‘touches’ is very sensitive and works only in cases where polygons actually share the boundary. I have subsequently found that intersects works better. Try replacing touches with intersects and see if it helps. I will edit the post with the same.

Also the following might work if the polygons do not intersect as well. Buffer the geometry slightly (here 1 is the distance in the layer CRS, so change accordingly to a small number for your layer)

filter:=intersects($geometry, buffer(geometry(@parent), 1)).

Hi Ujaval,

I’m trying to do something very similar but I’m only looking for pairs along a provincial border. Ideally, I would love QGIS to return the neighbours that are only across a specific border. For example, for a specific census division in British Columbia, I want it to return all of the postal codes that are neighbouring it in Alberta. In my data, I have provincial identifiers as well as census division identifiers. Here’s a link to the data if you need to have a look https://www12.statcan.gc.ca/census-recensement/alternative_alternatif.cfm?l=eng&dispext=zip&teng=lcsd000a16a_e.zip&k=%20%20%20%2023264&loc=http://www12.statcan.gc.ca/census-recensement/2011/geo/bound-limit/files-fichiers/2016/lcsd000a16a_e.zip

Do you have any advice on how I could accomplish this? I’m still fairly new to QGIS and I’ve been trying out some things but with no luck! I’m certain there is a simple line of code I could use. Thanks for any advice you can give!

Hi Jerome,

This is doable. In the aggregate function, you have a ‘filter’ expression. There you can add a condition to select features that match a census division. Something like

aggregate(

layer:= ‘Zip_Codes’,

aggregate:=’concatenate’,

expression:=ZIPCODE,

concatenator:=’, ‘,

filter:=intersects($geometry, geometry(@parent)) AND PRNAME = ‘Alberta’

)

In the dataset you sent, there are no zipcode column. So I am assuming that is a separate layer. You will have to do a table join to get the census division and zipcodes in the same layer. If you can share the zipcode data, I can give you more detailed instructions.

Hi ujaval, thank you so much for your quick reply! I

I should have been more clear on exactly what I’m trying to accomplish. I want to know for any given census subdivision (CSD) which other CSDs border it that are also in another province. In other words, I’m trying to pair each CSD with its provincial neighbour.

The reason I’m doing this is that I’m essentially replicating a 2010 study on minimum wage by Dube, Lester, and Reich (https://escholarship.org/uc/item/86w5m90m) but with Canadian data. If you look at figure 2 in that paper (https://www.mitpressjournals.org/action/showImage?doi=10.1162%2FREST_a_00039&iName=master.img-001.jpg&type=master), it should be clear what I’m trying to accomplish. In the Dube paper, they pair counties that share a common state border. In my paper, I need to pair CSD’s that share a common provincial border.

As for the code you provided, it does not want to seem to work for me. It just gives me null results. I’ve tried using the Provincial Unique Identifier instead and tried different province names, but I just keep getting null results. I also tried using a buffer and that didn’t seem to solve the problem either, still a bunch of nulls! Not sure if you have any other ideas or reasons why this is happening?

Again, thank you so much, this has been incredibly helpful!

Hi ujaval,

I wrote a reply on here about a week ago, but I haven’t seen that comment appear on this webpage. So, if there are two similar replies to this, please ignore this one!

Anyways, I maybe should have been more clear on what I am trying to accomplish. Basically, I would like to get a string variable for each observation that contains a list of all of that observation’s neighbours that are in another province. So for example, the “neighbours” variable for Toronto (which is far from the provincial border) should be blank. Whereas the “neighbours” variable for Ottawa (which is on the Ontario-Quebec border) should have a list of all of the polygons in Quebec that neighbour it, and none of it’s Ontario neighbours. In other words, if you look at the data I posted above and go to Ottawa (CSDUID = 3506008), the “neighbours” variable for Ottawa should contain “2481017, 2480060, 2482030, 2484005” wheres the “neighbours” variable for Toronto (CSDUID = 3520005) should be empty. I hope this makes sense.

Just for added clarity, I’m essentially replicating this study (https://escholarship.org/uc/item/86w5m90m) with Canadian data. Here is a map they provided in their paper in case that makes things clearer (https://www.mitpressjournals.org/action/showImage?doi=10.1162/REST_a_00039&iName=master.img-001.jpg&w=361&h=257)

Also, I tried adding the ” AND PRNAME = ‘Alberta’ ” and I kept getting null results whether I added a buffer or not. I also tried AND PRUID = 10 and also got null results. I’m not sure what the issue is there?!

Thanks again for all of your help, I really appreciate it!!

Hi Ujaval,

Thank you very much for this.

Do you know if it is possible to do this kind of aggregation but limiting it to some kind of condition. E.g. I want to form aggregate neighbouring blocks but limiting the population inside this new conglomerate to 1000.

We have a block layer and of course a variable with the population in it.

Thank you in advance for your response or any workabout around this!!

Braulio

Hi Braulio – your problem is a bit different. Instead of finding the neighbors that meet some criteria – you want to aggregate neighboring polygons till a certain population threshold. I don’t think the aggregate function will help here. I did solve this problem a few years back using a pyqgis script. It needs slight modifications for QGIS3, but you get the general idea of the approach

https://gis.stackexchange.com/questions/138107/qgis-tool-for-aggregating-population-data-in-fishnet

Hi Ujaval,

First thank you for your great work, though unfortunately it hasn’t worked out for me yet 🙁

I edited the aggregate function accordingly, with:

Output field type: Text (String)

expression:= cell_id #this field is the identifier I’m using for that layer.

However, I keep getting this error:

Output preview: Expression is invalid (more info => Eval error: Could not calculate aggregate for cell_id)

(I tried to increase the length of the field to be created, but it didn’t help.)

What could be the reason for that?

Dear Juval, thank you for your helpful post. I am working with QGIS 3.8 and trying to find out number of points in neighbouring hexagons (after doing a grid of 20 x 20 meters). I used this for counting the number of neighbouring hexagons:

aggregate(

layer:= ‘copy’,

aggregate:=’concatenate’,

expression:=id2,

concatenator:=’, ‘,

filter:=intersects($geometry, geometry(@parent))

)

Then I used this for summing the number of points in the neighbouring hexagons:

aggregate(

layer:= ‘copy’,

aggregate:=’sum’,

expression:=NUMPOINTS,

filter:= intersects ($geometry, geometry(@parent))

)

This worked for one file (I have many), for others get stuck and QGIS 3.8 not responding (I think because of higher number of points?). Can you help me on this please?

Hi Ujuval,

Thank you for the perfect solution to my problem! Unfortunately, I cant seem to get it to work. I am using the following code as instructed:

aggregate(

layer:= ‘ndmMjrt3RimboPgn1m’,

aggregate:=’concatenate’,

expression:=gridcode,

concatenator:=’, ‘,

filter:=intersects($geometry, geometry(@parent))

)

But my output is Null.

I have tried it with and without a buffer border.

Here are the data. Thank you for the helpin advance.

https://drive.google.com/drive/folders/10tp7ePPIP8Uag8UteWGOLKo2O0_MvO0O?usp=sharing

Kind Regards

Nedim

Hi Nedim,

You are using the ‘concatenate’ aggregate to create. a comma separated string. It expectes the inputs to be strings too, but your column is a number. So you can convert it to a string and it will work. Use the function to_string() in the expression and it will take care of the conversion. You may also want to use concatenate_unique which will remove the duplicates for you. Below is the expression that works.

aggregate(

layer:= ‘ndmMjrt3RimboPgn’,

aggregate:=’concatenate_unique’,

expression:=to_string(gridcode),

concatenator:=’, ‘,

filter:=intersects($geometry, geometry(@parent))

)

Hi Ujival,

Thank you, this was really helpful.

I have a follow-up question as well: Is it possible to have the length of the neighbouring border shown as well?

For example,

polygon 1 with value 2

is bordering with polygon 2 for 10 meters, and polygon 3 for 4 meters, and polygon 4 for 18 meters.

And so on for the whole data-set

Kind regards

Nedim.

Interesting question. After hours of fiddling around, I have a solution for you. I tested it on a small subset of your dataset.

First, run ‘Fix geometries’ algorithm as you have many invalid geometries. Next, do not use ‘gridcode’ for neighbor calculation since they are not unique. Use ‘id’ instead. Add a new ‘neighbors’ field with following expression

array_to_string(array_remove_all(string_to_array(aggregate(

layer:= ‘Fixed geometries’,

aggregate:=’concatenate_unique’,

expression:=to_string(id),

concatenator:=’,’,

filter:=intersects($geometry, geometry(@parent)))), “id” ))

Now, run the ‘Boundary’ algorithm from processing toolbox to convert the polygons to lines. Then add an attribute ‘lengths’ using the following expression. You will have a comma-separated list of lengths.

array_to_string(array_foreach(string_to_array( “neighbors” ),

coalesce(length(intersection(geometry(get_feature(‘Boundary’, ‘id’, @element)), $geometry)), 0)))

Here’s the subset with the results

https://drive.google.com/open?id=1VvPTfTjUwfs61DkquVq2zZy3d632ZOr8

[…] a previous post, I showed how to use the aggregate function to find neighbor polygons using QGIS. Using aggregate functions on the same layer allows us to easily do geoprocessing […]

Hi Ujaval,

Thanks a lot for this really helpful blogpost. I am trying to do this for the Assembly Constituencies of India using the shapefiles from here – http://projects.datameet.org/maps/assembly-constituencies/. I created a new unique ID for the AC codes as they were not unique, but I still only see ‘invalid expression’. I also tried the tips you recommended (adding buffer, duplicating layer, using touches instead of intersects), but was not able to get it to work. Your help and suggestions would be greatly appreciated.

Thank you!

Shruti

Can you copy/paste your expression here?

I first created a new variable ‘ID’ that is unique, and then, following your code above, used the expression:

aggregate(

layer:= ‘India_AC’,

aggregate:=’concatenate’,

expression:=ID,

concatenator:=’, ‘,

filter:=intersects($geometry, geometry(@parent))

)

I am new to this, so apologies in advance if I have made a very basic error.

Hi Shruti,

The ID field is an integer, so when you try to aggregate it using ‘concatenate’ it fails since concatenate creates a string. Change your expression to to_string(“ID”) which will create a string from the ID and pass it to concatenate.

Hi Ujaval,

Thank you so much. I don’t see an invalid expression anymore and the code seems to be running, but it has been running for over 8 hours now! There are about 4000 unique IDs in the data. Do you have a sense of how long this process usually takes?

That seems long, but you are doing pair-wise computation for each polygon which are 4000×4000=16000000 operations. Test on a small subset (for a district or state) first.

hello thanks for your very clear and detailed article, but I wanted to ask if it is possible to use order_by in descending sense because I wanted to sort the polygons from smallest to largest and then with array_last () get the largest polygon. I tried array_first () but the first name of the array is the analyzed polygon.

There is a better way. Use array_remove_all() to remove the name of the analyzed polygon. See how to do in this post https://spatialthoughts.com/2020/04/08/calculating-shared-border-lengths/

many thanks for your reply. I will study it and in case I write if I have not understood something

ok my script works

array_to_string(array_remove_all(aggregate(

layer:= ‘africa’,

aggregate:=’array_agg’,

expression:= “ADM0_A3″ ,

filter:=intersects($geometry, geometry(@parent)),order_by:=”area”

), “ADM0_A3” ))

but…

I have the feeling that “order by” set on the area field is not working, is it set in descending or ascending order? and in case the default can be changed?

my goal is to put the neighboring polygons in order from the largest to the smallest.

Thanks in advance

Use $area instead of area

[…] We have a point layer named ghcn_stations that has the location of rain-gauge stations in Florida. We will use the aggregate() function to find the neighboring stations within a distance of 10km (Rmax). I describe the technique for finding neighbors in detail in my another post Find Neighbor Polygons using Summary Aggregate Function in QGIS. […]

I tried this out on a US ztca510 shape file. It seems to freeze the application with a Not Responding. Perhaps the dataset is too large?

aggregate(

layer:= ‘tl_2019_us_zcta510′,

aggregate:=’concatenate’,

expression:=ZCTA5CE10,

concatenator:=’, ‘,

filter:=intersects($geometry, geometry(@parent))

)

One limitation of the aggregate function is that it does not use a spatial index. If your layer has 1000 polygons, it will need to do 1000 x 1000 = 1000000 operations to find the neighbours. So it can be quite slow with large layers. Do this with smaller regions or let it run for a longer time.

As of December 25, 2020 the url for the Seattle Zip Code data is: https://data.seattle.gov/dataset/Zip-Codes/dk58-w4ct

[…] I tried to use the "Aggregate" function, with the expression: *sum("flat_num"=25")* but QGIS groups ALL the polygons together. I think I need to find the correct expression. For the neighboring polygons condition, I use: https://spatialthoughts.com/2019/05/23/neighbor-polygons-aggregate-qgis/ […]

Hi,

one additional challenge to this great algorithm is the implementation in to a processing model. In such a model I would run a couple of preprocessing steps like ‘fix’ or ‘dissolve’ to prepare the data. However, I have challenges to call the layer and expression. Any idea how I could adapt the aggregate function to run inside the fieldcalculator inside a Model? Thank You!

Try using @layer to refer to the layer. i.e. layer:=@layer

This was a lifesaver, thanks so much for posting this!