-
Notifications
You must be signed in to change notification settings - Fork 38
perf: Implement (2x) faster ST_Buffer kernel using geo
#233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| } | ||
|
|
||
| #[test] | ||
| fn test_empty_geometry() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test suite here is the same as the geos buffer test suite plus this new function, which I also copied over to geos st_buffer to be sure it works the same.
| // PostGIS returns POLYGON EMPTY for all empty geometries | ||
| let is_empty = is_geometry_empty(wkb).map_err(|e| DataFusionError::External(Box::new(e)))?; | ||
| if is_empty { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was running into an error here with POINT EMPTY since geo apparently doesn't support it.
sedona-db/rust/sedona-geo/src/to_geo.rs
Lines 62 to 66 in 3d75664
| not_impl_err!( | |
| "geo kernel implementation on {}, {}, or {} not supported", | |
| "MULTIPOINT with EMPTY child", | |
| "POINT EMPTY", | |
| "GEOMETRYCOLLECTION" |
My current workaround is to use WKBExecutor here instead of GeoTypesExecutor and use our native is_geometry_empty check.
Wondering if there's a better way we can handle empty points in our item_to_geometry() method. The docstring for geo's try_to_point() function we are in there says returning None represents an empty point. Though returning None is not a safe option, so I can't think of anything better than this workaround atm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if there's a better way we can handle empty points in item_geometry()
We could have it return something like enum ItemToGeometryResult { Unsupported(Wkb), Supported(Geometry))? (No need to do that here unless you're excited about it).
I suppose it might be more complicated because each algorithm might have different considerations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still not the cleanest solution, but i do like the idea. though, i prefer to keep it simple for now. we can revisit this as we get a better idea of what we need to handle as we encounter this issue more for other functions.
|
Are there any convenient existing ways to benchmark this against the |
This should work: # on main
cargo bench -- st_buffer
# switch to your branch
cargo bench -- st_buffer |
geogeo
|
Looks like this is indeed faster. I've updated the PR description with the benchmark results. |
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
We should probably also look into optimizing GeosGeometry writing to BinaryBuilder...to_wkb() and then builder.write() is probably not as fast as peeking into all the geometries and writing them to the output in place. (I think this will still be faster though and is easier since it's all rust!)
| @pytest.mark.parametrize("eng", [SedonaDB, PostGIS]) | ||
| @pytest.mark.parametrize( | ||
| ("geom", "dist", "expected"), | ||
| [ | ||
| ("POINT EMPTY", 2.0, "POLYGON EMPTY"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the distance doesn't matter here it might be easier for future us to mentally parse if the parameters were just geom and expected (or even just geom since the result is identical)
| // PostGIS returns POLYGON EMPTY for all empty geometries | ||
| let is_empty = is_geometry_empty(wkb).map_err(|e| DataFusionError::External(Box::new(e)))?; | ||
| if is_empty { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if there's a better way we can handle empty points in item_geometry()
We could have it return something like enum ItemToGeometryResult { Unsupported(Wkb), Supported(Geometry))? (No need to do that here unless you're excited about it).
I suppose it might be more complicated because each algorithm might have different considerations.
closes #232
Performance is about 2x faster:
sedona-geos
sedona-geo (this PR)