Skip to content

Added GenericAliasHandler class [ENG-226]#76

Merged
eywalker merged 2 commits intowalkerlab:devfrom
brian-arnold:claude/generic-alias-hasher
Mar 10, 2026
Merged

Added GenericAliasHandler class [ENG-226]#76
eywalker merged 2 commits intowalkerlab:devfrom
brian-arnold:claude/generic-alias-hasher

Conversation

@brian-arnold
Copy link
Collaborator

The semantic hasher had no registered handler for types.GenericAlias — the type Python uses to represent parameterized generics like dict[int, str] or list[int]. Plain types like dict or list worked fine, but as soon as key/value types were specified, the hasher raised a TypeError in strict mode.

This affected both user-written type annotations on @function_pod functions and orcapod's own internal schema inference, which produces parameterized generics like dict[key_type, value_type].

The fix adds a GenericAliasHandler that produces a stable serializable representation by recursively hashing origin and args, and registers it for both types.GenericAlias and typing._GenericAlias.

Original error message:

Traceback (most recent call last): File "/home/barnold/personal_repos/orcapod-testing/02_simple_ephys_examples/ephys_pipeline.py", line 25, in <module> @function_pod(output_keys=["cluster_spike_trains"]) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/core/function_pod.py", line 633, in decorator packet_function = PythonPacketFunction( ^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/core/packet_function.py", line 359, in init self._output_schema_hash = semantic_hasher.hash_object( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/hashing/semantic_hashing/semantic_hasher.py", line 167, in hash_object expanded = self._expand_structure( ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/hashing/semantic_hashing/semantic_hasher.py", line 253, in _expand_structure return self._expand_mapping(obj, _visited, resolver=resolver) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/hashing/semantic_hashing/semantic_hasher.py", line 313, in _expand_mapping items[str_key] = self._expand_element(v, _visited, resolver=resolver) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/hashing/semantic_hashing/semantic_hasher.py", line 301, in _expand_element return self.hash_object(obj, resolver=resolver).to_string() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/hashing/semantic_hashing/semantic_hasher.py", line 198, in hash_object fallback = self._handle_unknown(obj) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/hashing/semantic_hashing/semantic_hasher.py", line 380, in _handle_unknown raise TypeError( TypeError: BaseSemanticHasher (strict): no TypeHandlerProtocol registered for type 'types.GenericAlias' and it does not implement ContentIdentifiableProtocol. Register a TypeHandlerProtocol via the TypeHandlerRegistry or implement identity_structure() on the class. 

@brian-arnold brian-arnold requested a review from eywalker March 6, 2026 01:30
@codecov-commenter
Copy link

codecov-commenter commented Mar 6, 2026

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 84.21053% with 3 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...capod/hashing/semantic_hashing/builtin_handlers.py 84.21% 3 Missing ⚠️

📢 Thoughts on this report? Let us know!

@brian-arnold brian-arnold changed the title Added GenericAliasHandler class Added GenericAliasHandler class [ENG-226] Mar 6, 2026
Copy link
Contributor

@eywalker eywalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding support to this! Could you add new tests that cover the GenericAliasHandler usage?

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for hashing parameterized generic type annotations (e.g. dict[int, str]) by introducing and registering a new semantic hashing handler, and fixes a truthiness check in Python→Arrow conversion to avoid failures on containers with ambiguous boolean evaluation.

Changes:

  • Introduce GenericAliasHandler and register it for types.GenericAlias and (programmatically) typing._GenericAlias.
  • Update the default data context (v0.1.json) to register the new handler for types.GenericAlias.
  • Adjust Python→Arrow list/map converters to use value is not None rather than if value.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
src/orcapod/semantic_types/universal_converter.py Avoids truthiness checks when converting list/map values to Arrow-friendly structures.
src/orcapod/hashing/semantic_hashing/builtin_handlers.py Adds and registers GenericAliasHandler for parameterized generic annotations.
src/orcapod/contexts/data/v0.1.json Registers GenericAliasHandler in the versioned default context’s TypeHandlerRegistry.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +175 to +194
class GenericAliasHandler:
"""
Handler for generic alias type annotations such as ``dict[int, list[int]]``
(``types.GenericAlias``) and ``typing`` generics (``typing._GenericAlias``).

Produces a stable dict containing the origin type and a list of hashed
argument types so that structurally identical generic annotations always
yield the same hash, and structurally different ones yield different hashes.
"""

def handle(self, obj: Any, hasher: "SemanticHasherProtocol") -> Any:
origin = getattr(obj, "__origin__", None)
args = getattr(obj, "__args__", None) or ()
if origin is None:
return f"generic_alias:{obj!r}"
return {
"__type__": "generic_alias",
"origin": hasher.hash_object(origin).to_string(),
"args": [hasher.hash_object(arg).to_string() for arg in args],
}
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are extensive semantic hasher tests, but none appear to cover parameterized generics (e.g. dict[int, str], list[int], typing.Dict[int, str]). Adding a regression test that hashes these types in strict mode (and asserts stable equality/inequality across different args) would prevent this handler from silently breaking in future refactors.

Copilot uses AI. Check for mistakes.
Comment on lines 631 to 648
elif origin is list:
element_converter = self.get_python_to_arrow_converter(args[0])
return (
lambda value: [element_converter(item) for item in value]
if value
if value is not None
else []
)

elif origin is dict:
key_converter = self.get_python_to_arrow_converter(args[0])
value_converter = self.get_python_to_arrow_converter(args[1])
return (
lambda value: [
{"key": key_converter(k), "value": value_converter(v)}
for k, v in value.items()
]
if value
if value is not None
else []
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change fixes truthiness checks for list/dict values, but there’s no regression test ensuring python_to_arrow conversion works for iterable containers with an ambiguous/invalid __bool__ (e.g. numpy arrays raise on if value). Consider adding a test using a small custom iterable whose __bool__ raises, and verifying conversion still succeeds when the value is empty/non-empty.

Copilot uses AI. Check for mistakes.
Comment on lines 60 to 65
[{"_type": "types.BuiltinFunctionType"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.FunctionHandler", "_config": {"function_info_extractor": {"_ref": "function_info_extractor"}}}],
[{"_type": "types.MethodType"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.FunctionHandler", "_config": {"function_info_extractor": {"_ref": "function_info_extractor"}}}],
[{"_type": "builtins.type"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.TypeObjectHandler", "_config": {}}],
[{"_type": "types.GenericAlias"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.GenericAliasHandler", "_config": {}}],
[{"_type": "pyarrow.Table"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.ArrowTableHandler", "_config": {"arrow_hasher": {"_ref": "arrow_hasher"}}}],
[{"_type": "pyarrow.RecordBatch"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.ArrowTableHandler", "_config": {"arrow_hasher": {"_ref": "arrow_hasher"}}}]
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type_handler_registry in this context only registers types.GenericAlias, but the PR description (and register_builtin_handlers) also registers typing._GenericAlias. Because the default semantic_hasher here uses this JSON-constructed registry, typing generics like typing.Dict[int, str] / typing.Optional[int] can still raise in strict mode. Consider adding a typing._GenericAlias entry (or switching this context to a registry implementation that conditionally registers it).

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brian-arnold this is a good point. Shall we register _GenericAlias as well?

registry.register(type, TypeObjectHandler())

# generic alias type annotations: dict[int, str], list[str], etc.
import types as _types
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

register_builtin_handlers imports types as _types twice (once for functions, again for GenericAlias). This is redundant and makes the section a bit harder to read; you can reuse the existing _types import from the functions block instead of re-importing it.

Suggested change
import types as _types

Copilot uses AI. Check for mistakes.
…ping._SpecialForm objects (like typing.Union) that appear as __origin__ on typing generics

  - Registered typing._SpecialForm alongside typing._GenericAlias in register_builtin_handlers, and removed the duplicate import types as _types
  - Added typing._GenericAlias and typing._SpecialForm entries to v0.1.json so the JSON-constructed hasher also handles them correctly
@brian-arnold
Copy link
Collaborator Author

@eywalker I addressed all comments above — I've added a TestGenericAliasHandler test class with 11 tests.

I also needed to add a SpecialFormHandler for typing._SpecialForm (e.g. typing.Union), since Optional[int].origin is typing.Union, which was unregistered and caused strict-mode failures.

I also added typing._GenericAlias (to support e.g. Dict[int,str] as well as dict[int,str]) and typing._SpecialForm entries to v0.1.json so the JSON-constructed hasher handles them too.

All new and existing tests pass.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +363 to +376
# generic alias type annotations: dict[int, str], list[str], etc.
generic_alias_handler = GenericAliasHandler()
registry.register(_types.GenericAlias, generic_alias_handler)
# typing._GenericAlias covers Optional[X], Union[X, Y], Dict[K, V], etc.
# typing._SpecialForm covers typing.Union, typing.ClassVar, etc. which
# appear as __origin__ on those generics (e.g. Optional[int].__origin__
# is typing.Union, a _SpecialForm).
try:
import typing as _typing

registry.register(_typing._GenericAlias, generic_alias_handler) # type: ignore[attr-defined]
registry.register(_typing._SpecialForm, SpecialFormHandler()) # type: ignore[attr-defined]
except AttributeError:
pass
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

types.GenericAlias and typing._GenericAlias are now handled, but PEP 604 union annotations (instances of types.UnionType, e.g. int | None / int | str) are still unhandled in strict mode and will raise the same "no TypeHandlerProtocol registered" TypeError when they appear in user annotations or internal schema inference (e.g. default_type | None in semantic_types/pydata_utils.py). Consider adding a handler/registration for types.UnionType (and ensure it hashes equivalently to typing.Union[...] / typing.Optional[...] if you want those forms to be interchangeable).

Copilot uses AI. Check for mistakes.
Comment on lines +539 to +544
def test_optional_hashed(self, hasher):
"""Optional[int] (a typing._GenericAlias) produces a ContentHash."""
import typing

assert isinstance(hasher.hash_object(typing.Optional[int]), ContentHash)

Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new tests cover typing.Optional[...] / typing.Union[...], but they don’t cover PEP 604 unions (int | None, int | str), which are common in Python 3.11+ and are used internally in semantic_types/pydata_utils.py (e.g. default_type | None). Adding at least one test that hasher.hash_object(int | None) succeeds (and ideally matches typing.Optional[int] if that equivalence is intended) would prevent regressions.

Copilot uses AI. Check for mistakes.
Comment on lines +1034 to +1036
assert reg.has_handler(_types.GenericAlias)
assert reg.has_handler(_typing._GenericAlias) # type: ignore[attr-defined]
assert reg.has_handler(_typing._SpecialForm) # type: ignore[attr-defined]
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this test is asserting that the default registry includes the new typing handlers, it would be good to also assert support for PEP 604 unions (types.UnionType) here if you add that handler. Otherwise the default-context registry can still be missing coverage for int | None annotations even though typing.Optional[...] works.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@eywalker eywalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brian-arnold would you mind creating issues corresponding to my comments? When issues are created in Linear, I'm happy to merge this!


def handle(self, obj: Any, hasher: "SemanticHasherProtocol") -> Any:
name = getattr(obj, "_name", None) or repr(obj)
return f"special_form:typing.{name}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the case of Optional[int], I'd like this to hash to identical value as int|None as these two are semantically identical. I'm going to merge this PR but let's be sure to address this correctly when implementing for UnionType

if origin is None:
return f"generic_alias:{obj!r}"
return {
"__type__": "generic_alias",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm starting to think that it'd actually be better to consider how to handle both type and Generic alias in a uniform/consistent manner. Let's address this as a separate issue.

return (
lambda value: [element_converter(item) for item in value]
if value
if value is not None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually not sure about dropping the value if it's None. I think this ought to be correctly expressed?

@eywalker eywalker merged commit 1b84eb2 into walkerlab:dev Mar 10, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants