-
Notifications
You must be signed in to change notification settings - Fork 38
feat(rust/sedona-raster-functions): Add RS_SRID and RS_CRS #430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I think an LRU cache may also help, for this specific case you are also deserializing all of the raster fields while you only need a single struct child to compute the CRS or SRID.
For the case of the Crs, I think you can just return a clone of that column (i.e., you shouldn't need to peek into the values at all, just clone the ArrayRef!)
I did consider just spitting out the same thing as we read as that would definitely be very fast, but I thought that it would create odd inconsistencies. For example, if you set a raster's CRS to 'EPSG:4326' it would spit that out. If you transform a raster to 'EPSG:4326' then its CRS would be 'OGC:CRS84'. Maybe that doesn't matter. Either way, when we run functions like intersects we are going to need to ask if the raster with 'EPSG:4326' has the same CRS as the raster/geo with CRS 'OGC:CRS84' (after doing a fast string equality check that will fail), so having a CRS cache is still going to be important. I have some nice flamegraphs that are pointing to quite a few additional opportunities. |
|
The current state of CRS: After the caching and a few more changes I have it down to: |
|
I can also see a plus for trying to do more validation during read/build so that we already have it in the right printable/comparable format. |
|
Perf improvements with #446 have provided a 5x and 10x improvements to these functions. Updated the summary with the new times. |
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
First implementation for
RS_SRIDandRS_CRS.Important: these are not fast enough. I need to invest into caching or thinking through another way of storing the CRS strings as this solution is currently too slow. I experimented with having an LRUCache in this function and it was 4x faster. Which is great, but this
Example:
Times (after PR with linked CRS perf improvements):