Conversation

@petern48
Collaborator

@petern48 petern48 commented Oct 19, 2025

closes #232

The geo implementation is about 2x faster than the geos one:

sedona-geos

geos-st_buffer-ArrayScalar(Polygon(10), Float64(1.0, 10.0))
                        time:   [1.2389 s 1.2621 s 1.2871 s]

geos-st_buffer-ArrayScalar(Polygon(50), Float64(1.0, 10.0))
                        time:   [2.3664 s 2.3742 s 2.3825 s]

sedona-geo (this PR)

geo-st_buffer-ArrayScalar(Polygon(10), Float64(1.0, 10.0))
                        time:   [597.68 ms 613.95 ms 632.17 ms]

geo-st_buffer-ArrayScalar(Polygon(50), Float64(1.0, 10.0))
                        time:   [1.1246 s 1.1285 s 1.1327 s]
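For reference, the "about 2x" figure can be checked from the middle (point estimate) values in the output above. This is only a small arithmetic sketch; the function names are illustrative and not part of the PR.

```rust
/// Ratio of baseline time to candidate time; values above 1.0 mean the
/// candidate is faster.
fn speedup(baseline_secs: f64, candidate_secs: f64) -> f64 {
    baseline_secs / candidate_secs
}

/// Speedups computed from the middle values reported above, in seconds.
/// Both come out slightly above 2x.
fn reported_speedups() -> [f64; 2] {
    [
        speedup(1.2621, 0.61395), // Polygon(10): geos 1.2621 s vs geo 613.95 ms
        speedup(2.3742, 1.1285),  // Polygon(50): geos 2.3742 s vs geo 1.1285 s
    ]
}
```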

}

#[test]
fn test_empty_geometry() {
Collaborator Author

The test suite here is the same as the geos buffer test suite plus this new function, which I also copied over to geos st_buffer to be sure it works the same.

Comment on lines +105 to +107
// PostGIS returns POLYGON EMPTY for all empty geometries
let is_empty = is_geometry_empty(wkb).map_err(|e| DataFusionError::External(Box::new(e)))?;
if is_empty {
Collaborator Author

I was running into an error here with POINT EMPTY since geo apparently doesn't support it.

not_impl_err!(
"geo kernel implementation on {}, {}, or {} not supported",
"MULTIPOINT with EMPTY child",
"POINT EMPTY",
"GEOMETRYCOLLECTION"

My current workaround is to use WKBExecutor here instead of GeoTypesExecutor and use our native is_geometry_empty check.

Wondering if there's a better way to handle empty points in our item_to_geometry() method. The docstring for geo's try_to_point() function, which we call there, says that returning None represents an empty point. Returning None is not a safe option, though, so I can't think of anything better than this workaround at the moment.
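To make the workaround concrete, here is a minimal sketch of the "check emptiness on the raw WKB first" pattern. It is not the code in this PR: `is_wkb_empty` and `buffer_or_empty` are hypothetical names, the check only looks at the top-level geometry, it assumes ISO WKB type codes (no EWKB flags), and it relies on the common convention that POINT EMPTY is written with NaN coordinates. The real is_geometry_empty may behave differently.

```rust
/// WKB for POLYGON EMPTY: little-endian marker, geometry type 3, zero rings.
const POLYGON_EMPTY_WKB: [u8; 9] = [0x01, 0x03, 0, 0, 0, 0, 0, 0, 0];

fn read_u32(bytes: &[u8], little_endian: bool) -> Option<u32> {
    let mut raw = [0u8; 4];
    raw.copy_from_slice(bytes.get(..4)?);
    Some(if little_endian { u32::from_le_bytes(raw) } else { u32::from_be_bytes(raw) })
}

fn read_f64(bytes: &[u8], little_endian: bool) -> Option<f64> {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(bytes.get(..8)?);
    Some(if little_endian { f64::from_le_bytes(raw) } else { f64::from_be_bytes(raw) })
}

/// Hypothetical, simplified emptiness check (top level only).
fn is_wkb_empty(wkb: &[u8]) -> Result<bool, String> {
    let little_endian = match wkb.first() {
        Some(0) => false,
        Some(1) => true,
        _ => return Err("invalid WKB byte order".to_string()),
    };
    let type_code = read_u32(&wkb[1..], little_endian).ok_or("truncated WKB header")?;
    let body = &wkb[5..];
    if type_code % 1000 == 1 {
        // POINT: assumed to be empty when both coordinates are NaN.
        let x = read_f64(body, little_endian).ok_or("truncated point")?;
        let y = read_f64(&body[8..], little_endian).ok_or("truncated point")?;
        Ok(x.is_nan() && y.is_nan())
    } else {
        // Every other type starts with an element count; zero means empty.
        let count = read_u32(body, little_endian).ok_or("truncated count")?;
        Ok(count == 0)
    }
}

/// Short-circuits empty input to POLYGON EMPTY before the geo conversion
/// (represented here by the `buffer_non_empty` callback) is attempted.
fn buffer_or_empty<F>(wkb: &[u8], buffer_non_empty: F) -> Result<Vec<u8>, String>
where
    F: FnOnce(&[u8]) -> Result<Vec<u8>, String>,
{
    if is_wkb_empty(wkb)? {
        return Ok(POLYGON_EMPTY_WKB.to_vec());
    }
    buffer_non_empty(wkb)
}
```

The point of the pattern is that the conversion that rejects POINT EMPTY is never reached for empty input, so the kernel can return POLYGON EMPTY for every empty geometry.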

Member

> Wondering if there's a better way we can handle empty points in item_geometry()

We could have it return something like enum ItemToGeometryResult { Unsupported(Wkb), Supported(Geometry) }? (No need to do that here unless you're excited about it.)

I suppose it might be more complicated because each algorithm might have different considerations.
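A rough sketch of what that could look like, purely for discussion. `Wkb` and `Geometry` here are stand-in types rather than the real ones, and `classify` and `next_step` are hypothetical helpers that only illustrate how a kernel might branch on the result.

```rust
/// Stand-in for the raw WKB item (the real code has its own type).
#[derive(Debug, PartialEq)]
struct Wkb(Vec<u8>);

/// Stand-in for a converted geo-types geometry.
#[derive(Debug, PartialEq)]
enum Geometry {
    Point { x: f64, y: f64 },
}

/// Result of trying to convert an item: either a geometry that geo can
/// handle, or the original bytes handed back to the caller.
#[derive(Debug, PartialEq)]
enum ItemToGeometryResult {
    Unsupported(Wkb),
    Supported(Geometry),
}

/// Hypothetical helper: wraps the outcome of an attempted conversion.
fn classify(wkb: Wkb, converted: Option<Geometry>) -> ItemToGeometryResult {
    match converted {
        Some(geom) => ItemToGeometryResult::Supported(geom),
        None => ItemToGeometryResult::Unsupported(wkb),
    }
}

/// Each kernel decides what to do with unsupported input. A buffer kernel
/// could, for example, check emptiness on the raw bytes and emit POLYGON EMPTY.
fn next_step(result: &ItemToGeometryResult) -> &'static str {
    match result {
        ItemToGeometryResult::Supported(_) => "run geo kernel",
        ItemToGeometryResult::Unsupported(_) => "kernel-specific fallback",
    }
}
```

Handing the bytes back in the Unsupported variant keeps the decision with each algorithm, which is where the different considerations mentioned above would live.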

Collaborator Author

Still not the cleanest solution, but I do like the idea. I'd prefer to keep it simple for now, though. We can revisit this once we have a better idea of what we need to handle, as we run into this issue in other functions.

@petern48
Collaborator Author

Are there any convenient existing ways to benchmark this against the geos implementation? I'm not aware of any, and I'm not yet sure whether this is faster.

@paleolimbot
Member

> Are there any convenient existing ways to benchmark this against the geos implementation?

This should work:

# on main
cargo bench -- st_buffer
# switch to your branch
cargo bench -- st_buffer

@petern48 petern48 changed the title from "perf: ST_Buffer implementation using geo" to "perf: Implement (2x) faster ST_Buffer kernel using geo" Dec 14, 2025
@petern48
Collaborator Author

Looks like this is indeed faster. I've updated the PR description with the benchmark results.

@petern48 petern48 marked this pull request as ready for review December 14, 2025 07:25
@petern48 petern48 requested a review from paleolimbot December 14, 2025 07:25
Member

@paleolimbot paleolimbot left a comment

Thank you!

We should probably also look into optimizing how GeosGeometry is written to the BinaryBuilder: to_wkb() followed by builder.write() is probably not as fast as peeking into all the geometries and writing them to the output in place. (I think this will still be faster, though, and it's easier since it's all Rust!)

Comment on lines 270 to 274
@pytest.mark.parametrize("eng", [SedonaDB, PostGIS])
@pytest.mark.parametrize(
("geom", "dist", "expected"),
[
("POINT EMPTY", 2.0, "POLYGON EMPTY"),
Member

Given that the distance doesn't matter here, it might be easier for future us to mentally parse if the parameters were just geom and expected (or even just geom, since the result is identical).

@petern48 petern48 merged commit 0f51162 into apache:main Dec 16, 2025
14 checks passed
@petern48 petern48 deleted the st_buffer_geo branch December 16, 2025 15:28

Development

Successfully merging this pull request may close these issues.

perf: Investigate if geo's ST_Buffer implementation is faster than current geos

2 participants