Skip to content

Conversation

@jesspav
Copy link
Collaborator

@jesspav jesspav commented Dec 9, 2025

First implementation for RS_SRID and RS_CRS.

Important: these are not fast enough. I need to invest into caching or thinking through another way of storing the CRS strings as this solution is currently too slow. I experimented with having an LRUCache in this function and it was 4x faster. Which is great, but this

Example:

> select rs_srid(rs_example());
┌───────────────────────┐
│ rs_srid(rs_example()) │
│         uint32        │
╞═══════════════════════╡
│                  4326 │
└───────────────────────┘


> select rs_crs(rs_example());
┌──────────────────────┐
│ rs_crs(rs_example()) │
│         utf8         │
╞══════════════════════╡
│ OGC:CRS84            │
└──────────────────────┘

Times (after PR with linked CRS perf improvements):

native-raster-rs_crs-Array(Raster(64, 64))
                        time:   [4.1085 ms 4.1121 ms 4.1171 ms]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  4 (4.00%) high severe

native-raster-rs_srid-Array(Raster(64, 64))
                        time:   [1.9177 ms 1.9313 ms 1.9466 ms]
Found 14 outliers among 100 measurements (14.00%)
  4 (4.00%) high mild
  10 (10.00%) high severe

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I think an LRU cache may also help, for this specific case you are also deserializing all of the raster fields while you only need a single struct child to compute the CRS or SRID.

For the case of the Crs, I think you can just return a clone of that column (i.e., you shouldn't need to peek into the values at all, just clone the ArrayRef!)

@jesspav
Copy link
Collaborator Author

jesspav commented Dec 9, 2025

For the case of the Crs, I think you can just return a clone of that column (i.e., you shouldn't need to peek into the values at all, just clone the ArrayRef!)

I did consider just spitting out the same thing as we read as that would definitely be very fast, but I thought that it would create odd inconsistencies. For example, if you set a raster's CRS to 'EPSG:4326' it would spit that out. If you transform a raster to 'EPSG:4326' then its CRS would be 'OGC:CRS84'. Maybe that doesn't matter.

Either way, when we run functions like intersects we are going to need to ask if the raster with 'EPSG:4326' has the same CRS as the raster/geo with CRS 'OGC:CRS84' (after doing a fast string equality check that will fail), so having a CRS cache is still going to be important.

I have some nice flamegraphs that are pointing to quite a few additional opportunities.

@jesspav
Copy link
Collaborator Author

jesspav commented Dec 9, 2025

The current state of CRS:

LngLat:
native-raster-rs_crs-Array(Raster(64, 64)) =>                time:   [22.640 ms 22.726 ms 22.815 ms]
native-raster-rs_srid-Array(Raster(64, 64)) =>               time:   [22.817 ms 22.864 ms 22.912 ms]

EPSG:3857 :
native-raster-rs_crs-Array(Raster(64, 64)) =>                time:   [40.386 ms 40.486 ms 40.583 ms]
native-raster-rs_srid-Array(Raster(64, 64)) =>               time:   [34.926 ms 35.034 ms 35.147 ms]

After the caching and a few more changes I have it down to:

LngLat:
native-raster-rs_crs-Array(Raster(64, 64)) =>                time:   [4.0605 ms 4.0707 ms 4.0813 ms]
native-raster-rs_srid-Array(Raster(64, 64)) =>               time:   [4.0278 ms 4.0364 ms 4.0451 ms]

EPSG:3857 :
native-raster-rs_crs-Array(Raster(64, 64)) =>                time:   [4.1869 ms 4.1989 ms 4.2110 ms]
native-raster-rs_srid-Array(Raster(64, 64)) =>               time:   [4.1030 ms 4.1114 ms 4.1196 ms]
```

@jesspav
Copy link
Collaborator Author

jesspav commented Dec 9, 2025

I can also see a plus for trying to do more validation during read/build so that we already have it in the right printable/comparable format.

@jesspav jesspav changed the title Rs srid [WIP] Add RS_SRID and RS_CRS Dec 12, 2025
@jesspav
Copy link
Collaborator Author

jesspav commented Dec 18, 2025

Perf improvements with #446 have provided a 5x and 10x improvements to these functions. Updated the summary with the new times.
This is now ready for review.

@jesspav jesspav marked this pull request as ready for review December 18, 2025 21:52
Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@paleolimbot paleolimbot changed the title [WIP] Add RS_SRID and RS_CRS feat(rust/sedona-raster-functions): Add RS_SRID and RS_CRS Dec 19, 2025
@paleolimbot paleolimbot merged commit 37c30f3 into apache:main Dec 19, 2025
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants