

@yutannihilation (Contributor) commented Dec 8, 2025

Related to #180.

It seems a bit hard to tackle #180 directly, so this pull request adds a simple validation script that checks whether the document reflects the implementation correctly. It checks (1) whether the functions listed in the reference match the actual implementation, and (2) whether the functions are sorted in alphabetical order.

I still need to check whether each item is an actual problem or just a false positive, but this is the current result:

uv run ./docs/scripts/validate_sql.py

Functions only in implementation:
  - st_aswkb
  - st_aswkt
  - st_distance
  - st_distancesphere
  - st_distancespheroid
  - st_frechetdistance
  - st_geogfromtext
  - st_geometryfromtext
  - st_geomfromtext
  - st_hausdorffdistance
  - st_line_interpolate_point
  - st_line_locate_point
  - st_numinteriorrings

Functions only in document:
  - st_closestpoint
  - st_linelocatepoint


Traceback (most recent call last):
  File "/path/to/sedona-db/./docs/scripts/validate_sql.py", line 51, in <module>
    raise RuntimeError(
RuntimeError: There are some mismatch between the SQL reference and the actual implementation!
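The two checks described at the top of this PR (matching function sets, and alphabetical order) can be sketched roughly as below. This is a minimal illustration, not the actual `validate_sql.py`: it assumes the documented and implemented names are already available as two lists, and it lowercases everything because the output above reports lowercase names. The helper names are hypothetical.

```python
def compare_functions(documented, implemented):
    """Return (only_in_implementation, only_in_document), both sorted.

    Names are lowercased first (an assumption based on the lowercase output).
    """
    doc = {name.lower() for name in documented}
    impl = {name.lower() for name in implemented}
    return sorted(impl - doc), sorted(doc - impl)


def out_of_order(names):
    """Return each name that sorts before the name listed just above it."""
    keys = [name.lower() for name in names]
    return [cur for prev, cur in zip(keys, keys[1:]) if cur < prev]


def validate(documented, implemented):
    """Print any mismatches and raise RuntimeError if there are some."""
    only_impl, only_doc = compare_functions(documented, implemented)
    unsorted = out_of_order(documented)
    reports = [
        ("Functions only in implementation:", only_impl),
        ("Functions only in document:", only_doc),
        ("Functions out of alphabetical order:", unsorted),
    ]
    for title, names in reports:
        if names:
            print(title)
            for name in names:
                print(f"  - {name}")
            print()
    if only_impl or only_doc or unsorted:
        raise RuntimeError(
            "There are some mismatches between the SQL reference and the implementation!"
        )
```

For example, `compare_functions(["ST_Area", "ST_ClosestPoint"], ["st_area", "st_aswkt"])` returns `(['st_aswkt'], ['st_closestpoint'])`. The order check only compares neighbours, which is enough to detect that a list is unsorted, though it does not pinpoint every misplaced entry.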

TODO:

  • Handle aliases
  • Check the order of the functions section by section
  • Insert missing functions into sql.md
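For the TODO item about checking the order section by section, one possible approach is to split sql.md on its headings and check each section independently. This sketch assumes, hypothetically, that a section starts with a `## ` heading and each function entry with a `### ` heading; the real layout of sql.md may differ, so the two patterns would need adjusting.

```python
import re

# Hypothetical sql.md layout: "## <Section>" opens a section and
# "### <FunctionName>" opens one function entry inside it.
SECTION_RE = re.compile(r"^##\s+(.+?)\s*$")
FUNCTION_RE = re.compile(r"^###\s+(.+?)\s*$")


def functions_by_section(markdown_text):
    """Map each section title to its function names, in document order."""
    sections = {}
    current = None
    for line in markdown_text.splitlines():
        function = FUNCTION_RE.match(line)
        if function and current is not None:
            sections[current].append(function.group(1))
            continue
        section = SECTION_RE.match(line)
        if section:
            current = section.group(1)
            sections.setdefault(current, [])
    return sections


def unsorted_sections(markdown_text):
    """Return {section: names} for every section not in case-insensitive order."""
    result = {}
    for section, names in functions_by_section(markdown_text).items():
        keys = [name.lower() for name in names]
        if keys != sorted(keys):
            result[section] = names
    return result


example = """\
## Measurement
### ST_Length
### ST_Area
## Constructors
### ST_MakeLine
### ST_Point
"""
print(unsorted_sections(example))  # {'Measurement': ['ST_Length', 'ST_Area']}
```

Checking per section lets the reference keep its topical grouping while still requiring alphabetical order within each group.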

@yutannihilation yutannihilation marked this pull request as draft December 8, 2025 15:18
@paleolimbot (Member) left a comment


I know you're still working on this... this is great! Even as we restructure the SQL docs, these checks are great to have and will still apply somewhere.

This could be run in https://github.com/apache/sedona-db/blob/main/ci/scripts/build-docs.sh so that we can enforce it in CI.

@yutannihilation (Contributor, Author)

Thanks! Yeah, it's always good to have validation. What I'm still wondering is when we should enforce it... Before making this check mandatory, I'd like the script to be able to insert a new section into sql.md, built from the function's description and examples, for any function name that is missing from the document.
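The generation step described above could look roughly like this sketch. The heading level, the Markdown layout of an entry, and the helper names `render_section` and `insertion_index` are all hypothetical; the description and example SQL would have to come from wherever the implementation exposes them.

```python
import bisect

FENCE = "`" * 3  # built indirectly so this snippet can itself sit inside a Markdown fence


def render_section(name, description, example_sql):
    """Render a hypothetical sql.md entry for one missing function."""
    return (
        f"### {name}\n\n"
        f"{description.strip()}\n\n"
        f"{FENCE}sql\n{example_sql.strip()}\n{FENCE}\n"
    )


def insertion_index(existing_names, new_name):
    """Index at which new_name keeps existing_names in case-insensitive order.

    Assumes existing_names is already sorted case-insensitively.
    """
    keys = [name.lower() for name in existing_names]
    return bisect.bisect_left(keys, new_name.lower())


print(insertion_index(["ST_Area", "ST_Buffer", "ST_Length"], "ST_Distance"))  # 2
print(render_section(
    "ST_NumInteriorRings",
    "Returns the number of interior rings of a polygon.",
    "SELECT ST_NumInteriorRings(ST_GeomFromText('POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))'))",
))
```

Computing the insertion point with `bisect` keeps the generated entry consistent with the alphabetical-order check, so a freshly inserted section would not immediately fail validation.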

WHEN 'rs' THEN 'raster'
ELSE 'unknown'
END AS data_type,
count(*) OVER (PARTITION BY description) > 1 as has_alias
Contributor Author


It seems the alias information cannot be retrieved via SQL. I added a tweak to guess it from the description (functions that share the same description are flagged with `has_alias`), but I'm not sure this is reliable enough.
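The description-based guess in the query above (`count(*) OVER (PARTITION BY description) > 1`) can be mirrored in Python to inspect which functions it would group together. This is a hypothetical helper, not part of the PR; it takes `(name, description)` pairs as input, and the names and descriptions in the example are illustrative only.

```python
from collections import defaultdict


def guess_alias_groups(functions):
    """Group function names whose descriptions are identical.

    functions: iterable of (name, description) pairs.
    Returns sorted groups of distinct, lowercased names with more than one member.
    """
    by_description = defaultdict(set)
    for name, description in functions:
        by_description[description.strip()].add(name.lower())
    return sorted(sorted(names) for names in by_description.values() if len(names) > 1)


print(guess_alias_groups([
    ("ST_AsText", "Return the WKT representation."),
    ("ST_AsWKT", "Return the WKT representation."),
    ("ST_Area", "Return the area."),
]))  # [['st_astext', 'st_aswkt']]
```

Unlike a plain `count(*)`, this sketch counts distinct names, so a function that appears in several rows (for example once per signature, if the query yields such rows) is not mistaken for an alias. Because it keys on exact description text, unrelated functions that share a generic description would still be reported as aliases, and true aliases with slightly different descriptions would be missed, which is the reliability concern raised here.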

I might rewrite the validation script in Rust, just as DataFusion does to generate its function documentation:

https://github.com/apache/datafusion/blob/5a01e68643a198a1aaa7124524d7be5be7df24ec/datafusion/core/src/bin/print_functions_docs.rs#L174-L179
