Notice removed Draw attention by Babel
Bounty Ended with MrXsquared's answer chosen by Babel
added 236 characters in body
Babel

Histogram created with the DataPlotly plugin: distribution of construction years, with the highest peak at 1980/81 and lower peaks at 1928-1931 etc.

Visualization

Notice added Draw attention by Babel
Bounty Started worth 100 reputation by Babel

Here are a few ideas I had for making the expression more efficient. All of these approaches have shortcomings, however. In particular, I'm stuck on how to combine them in the most efficient way. And there are probably other approaches I haven't considered yet.

  1. Run it sequentially, like in the if-clause above: for features 1 to 999, then for 999 to 1999 and so on. Not very efficient, however.

  2. Using limit:=1000 to cap the array created by overlay_nearest() at 1000 elements is inflexible: the newer a building is, the higher the chance that an older one lies very close to it. Thus, for the majority of the buildings (those constructed in the last few decades), you won't need a fixed number of 1000 nearest neighbors; 50 or 100 or so would be enough. So the fixed value 1000 could be replaced by a formula that returns a value that decreases as the construction year increases. However, how do I get an "optimum" formula, based on the distribution of the values in my field construction_year? For this, compare the statistical values below.

  3. A "match" does not have to be found for every feature in one pass. For some features, the first round with the limits defined will not find a matching id of an older building. These NULL cases could be calculated (based on a condition if NULL) in a next iteration. So an iterative approach could be used, but how do I set it up?
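Ideas 2 and 3 can be sketched together outside QGIS. The following is a minimal Python illustration under stated assumptions, not a QGIS expression: the brute-force `k_nearest()` merely stands in for `overlay_nearest()` with a `limit`, and all function names, the `miss_prob`, `floor` and `cap` parameters, and the sample points are hypothetical. The initial limit assumes that building ages are spatially random: if a share `p` of all buildings is older than the current one, the chance that none of the `k` nearest neighbors is older is `(1 - p)^k`, so `k = log(miss_prob) / log(1 - p)` keeps that chance near `miss_prob`. Features left without a match are retried with a doubled limit.

```python
from bisect import bisect_left
from math import ceil, hypot, log


def adaptive_limit(year, sorted_years, miss_prob=0.01, floor=10, cap=1000):
    """Neighbor count so that (1 - p)^k <= miss_prob, clamped to [floor, cap]."""
    # p = share of buildings strictly older than `year`
    p = bisect_left(sorted_years, year) / len(sorted_years)
    k = ceil(log(miss_prob) / log(1.0 - p))
    return max(floor, min(cap, k))


def k_nearest(points, i, k):
    """Indices of the k points nearest to points[i] (brute force, for illustration)."""
    _, x, y, _ = points[i]
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: hypot(points[j][1] - x, points[j][2] - y))
    return others[:k]


def nearest_older_iterative(points, miss_prob=0.01, floor=10, cap=1000):
    """points: list of (fid, x, y, year). Returns {fid: fid of nearest older point}."""
    years = sorted(p[3] for p in points)
    result, pending = {}, {}
    for i, (fid, _, _, year) in enumerate(points):
        if year == years[0]:
            result[fid] = None  # oldest building(s): no older neighbor exists
        else:
            pending[i] = adaptive_limit(year, years, miss_prob, floor, cap)
    while pending:  # each pass retries only the features still unmatched
        retry = {}
        for i, k in pending.items():
            fid, year = points[i][0], points[i][3]
            match = next(
                (points[j][0] for j in k_nearest(points, i, k) if points[j][3] < year),
                None,
            )
            if match is None:
                retry[i] = k * 2  # widen the search in the next pass
            else:
                result[fid] = match
        pending = retry
    return result


# Hypothetical sample: (fid, x, y, construction_year)
sample = [
    (1, 0.0, 0.0, 1958),
    (2, 1.0, 0.0, 1986),
    (3, 0.0, 2.0, 1969),
    (4, 5.0, 0.0, 1940),
    (5, 6.0, 0.0, 1996),
]
links = nearest_older_iterative(sample, miss_prob=0.5, floor=1, cap=1)
```

With `cap=1`, the point from 1958 only finds its match (the 1940 point) in the third pass, after the limit has doubled from 1 to 2 to 4. The result is still exact, because neighbors are scanned in order of distance and the limit keeps growing until an older point is reached. Real buildings of similar age tend to cluster, so the spatial-randomness assumption likely underestimates the limit needed in old town centres.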

This shows the principle: each point is labeled with its construction year and connected by a red arrow to the nearest point with an older construction year. The point labeled 1958 at the very bottom is connected to a point labeled 1940, even though four neighboring points are nearer, because those have newer construction dates: 1986, 1969, 1996 and 1960. The search continues until the first (nearest) point with an older construction date is found.


Tweeted twitter.com/StackGIS/status/1394352031859544067
added 686 characters in body

The setting

I have a GeoPackage point layer in QGIS 3.18 with over 350,000 features for an area of about 1700 km² (extent ca. 50 × 60 km), representing centroids of buildings. The points contain an attribute with the construction year of the building, ranging from 1000 to 2020. A few statistical values, based on Basic statistics for fields, can be found below. The CRS is EPSG:2056 (projected CRS for Switzerland, units=m).

What I want to do

The idea is to find, for each building, the nearest building that is older, and to create an attribute nearest_older holding the fid of that building; see the visualization at the bottom.

Conceptually, this is similar to Topographic isolation: for a summit, find the minimum distance to a point of equal/higher elevation.
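To pin down the definition, here is a brute-force reference sketch in Python rather than a QGIS expression. It is far too slow for 350,000 features (it compares every pair), but it is useful for checking a faster method on a small subset. The function name and the sample coordinates are hypothetical, and "older" is assumed to mean a strictly smaller construction year, unlike the equal/higher rule of topographic isolation.

```python
from math import hypot


def nearest_older(points):
    """points: list of (fid, x, y, year).

    Returns {fid: fid of the nearest strictly older point, or None}.
    """
    result = {}
    for fid, x, y, year in points:
        best_fid, best_dist = None, float("inf")
        for other_fid, ox, oy, other_year in points:
            if other_year >= year:
                continue  # same age or newer: not a candidate
            dist = hypot(ox - x, oy - y)  # planar distance, fine for a projected CRS
            if dist < best_dist:
                best_fid, best_dist = other_fid, dist
        result[fid] = best_fid
    return result


# Hypothetical sample: (fid, x, y, construction_year)
sample = [
    (1, 0.0, 0.0, 1958),
    (2, 1.0, 0.0, 1986),
    (3, 0.0, 2.0, 1969),
    (4, 5.0, 0.0, 1940),
    (5, 6.0, 0.0, 1996),
]
reference = nearest_older(sample)
# reference == {1: 4, 2: 1, 3: 1, 4: None, 5: 4}
```

The point from 1958 skips its two nearer but newer neighbors and links to the 1940 point, while the oldest point gets no link at all.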

