Added GenericAliasHandler class [ENG-226] by brian-arnold · Pull Request #76 · walkerlab/orcapod-python

brian-arnold · 2026-03-06T01:30:55Z

The semantic hasher had no registered handler for types.GenericAlias — the type Python uses to represent parameterized generics like dict[int, str] or list[int]. Plain types like dict or list worked fine, but as soon as key/value types were specified, the hasher raised a TypeError in strict mode.

This affected both user-written type annotations on @function_pod functions and orcapod's own internal schema inference, which produces parameterized generics like dict[key_type, value_type].

The fix adds a GenericAliasHandler that produces a stable serializable representation by recursively hashing origin and args, and registers it for both types.GenericAlias and typing._GenericAlias.

Original error message:

Traceback (most recent call last): File "/home/barnold/personal_repos/orcapod-testing/02_simple_ephys_examples/ephys_pipeline.py", line 25, in <module> @function_pod(output_keys=["cluster_spike_trains"]) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/core/function_pod.py", line 633, in decorator packet_function = PythonPacketFunction( ^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/core/packet_function.py", line 359, in init self._output_schema_hash = semantic_hasher.hash_object( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/hashing/semantic_hashing/semantic_hasher.py", line 167, in hash_object expanded = self._expand_structure( ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/hashing/semantic_hashing/semantic_hasher.py", line 253, in _expand_structure return self._expand_mapping(obj, _visited, resolver=resolver) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/hashing/semantic_hashing/semantic_hasher.py", line 313, in _expand_mapping items[str_key] = self._expand_element(v, _visited, resolver=resolver) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/hashing/semantic_hashing/semantic_hasher.py", line 301, in _expand_element return self.hash_object(obj, resolver=resolver).to_string() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/hashing/semantic_hashing/semantic_hasher.py", line 198, in hash_object fallback = self._handle_unknown(obj) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/barnold/personal_repos/orcapod-python/src/orcapod/hashing/semantic_hashing/semantic_hasher.py", line 380, in _handle_unknown raise TypeError( TypeError: BaseSemanticHasher (strict): no TypeHandlerProtocol registered for type 'types.GenericAlias' and it does not implement ContentIdentifiableProtocol. Register a TypeHandlerProtocol via the TypeHandlerRegistry or implement identity_structure() on the class.

codecov-commenter · 2026-03-06T01:37:02Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 84.21053% with 3 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...capod/hashing/semantic_hashing/builtin_handlers.py	84.21%	3 Missing ⚠️

📢 Thoughts on this report? Let us know!

eywalker

Thanks for adding support to this! Could you add new tests that cover the GenericAliasHandler usage?

Copilot

Pull request overview

Adds support for hashing parameterized generic type annotations (e.g. dict[int, str]) by introducing and registering a new semantic hashing handler, and fixes a truthiness check in Python→Arrow conversion to avoid failures on containers with ambiguous boolean evaluation.

Changes:

Introduce GenericAliasHandler and register it for types.GenericAlias and (programmatically) typing._GenericAlias.
Update the default data context (v0.1.json) to register the new handler for types.GenericAlias.
Adjust Python→Arrow list/map converters to use value is not None rather than if value.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File	Description
`src/orcapod/semantic_types/universal_converter.py`	Avoids truthiness checks when converting list/map values to Arrow-friendly structures.
`src/orcapod/hashing/semantic_hashing/builtin_handlers.py`	Adds and registers `GenericAliasHandler` for parameterized generic annotations.
`src/orcapod/contexts/data/v0.1.json`	Registers `GenericAliasHandler` in the versioned default context’s `TypeHandlerRegistry`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-06T20:44:25Z

src/orcapod/hashing/semantic_hashing/builtin_handlers.py

+class GenericAliasHandler:
+    """
+    Handler for generic alias type annotations such as ``dict[int, list[int]]``
+    (``types.GenericAlias``) and ``typing`` generics (``typing._GenericAlias``).
+
+    Produces a stable dict containing the origin type and a list of hashed
+    argument types so that structurally identical generic annotations always
+    yield the same hash, and structurally different ones yield different hashes.
+    """
+
+    def handle(self, obj: Any, hasher: "SemanticHasherProtocol") -> Any:
+        origin = getattr(obj, "__origin__", None)
+        args = getattr(obj, "__args__", None) or ()
+        if origin is None:
+            return f"generic_alias:{obj!r}"
+        return {
+            "__type__": "generic_alias",
+            "origin": hasher.hash_object(origin).to_string(),
+            "args": [hasher.hash_object(arg).to_string() for arg in args],
+        }


There are extensive semantic hasher tests, but none appear to cover parameterized generics (e.g. dict[int, str], list[int], typing.Dict[int, str]). Adding a regression test that hashes these types in strict mode (and asserts stable equality/inequality across different args) would prevent this handler from silently breaking in future refactors.

Copilot · 2026-03-06T20:44:26Z

src/orcapod/semantic_types/universal_converter.py

        elif origin is list:
            element_converter = self.get_python_to_arrow_converter(args[0])
            return (
                lambda value: [element_converter(item) for item in value]
-                if value
+                if value is not None
                else []
            )

        elif origin is dict:
            key_converter = self.get_python_to_arrow_converter(args[0])
            value_converter = self.get_python_to_arrow_converter(args[1])
            return (
                lambda value: [
                    {"key": key_converter(k), "value": value_converter(v)}
                    for k, v in value.items()
                ]
-                if value
+                if value is not None
                else []


This change fixes truthiness checks for list/dict values, but there’s no regression test ensuring python_to_arrow conversion works for iterable containers with an ambiguous/invalid __bool__ (e.g. numpy arrays raise on if value). Consider adding a test using a small custom iterable whose __bool__ raises, and verifying conversion still succeeds when the value is empty/non-empty.

Copilot · 2026-03-06T20:44:26Z

src/orcapod/contexts/data/v0.1.json

                [{"_type": "types.BuiltinFunctionType"}, {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.FunctionHandler", "_config": {"function_info_extractor": {"_ref": "function_info_extractor"}}}],
                [{"_type": "types.MethodType"},          {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.FunctionHandler", "_config": {"function_info_extractor": {"_ref": "function_info_extractor"}}}],
                [{"_type": "builtins.type"},       {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.TypeObjectHandler",   "_config": {}}],
+                [{"_type": "types.GenericAlias"},  {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.GenericAliasHandler",  "_config": {}}],
                [{"_type": "pyarrow.Table"},        {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.ArrowTableHandler",  "_config": {"arrow_hasher": {"_ref": "arrow_hasher"}}}],
                [{"_type": "pyarrow.RecordBatch"},  {"_class": "orcapod.hashing.semantic_hashing.builtin_handlers.ArrowTableHandler",  "_config": {"arrow_hasher": {"_ref": "arrow_hasher"}}}]


type_handler_registry in this context only registers types.GenericAlias, but the PR description (and register_builtin_handlers) also registers typing._GenericAlias. Because the default semantic_hasher here uses this JSON-constructed registry, typing generics like typing.Dict[int, str] / typing.Optional[int] can still raise in strict mode. Consider adding a typing._GenericAlias entry (or switching this context to a registry implementation that conditionally registers it).

@brian-arnold this is a good point. Shall we register _GenericAlias as well?

Copilot · 2026-03-06T20:44:26Z

src/orcapod/hashing/semantic_hashing/builtin_handlers.py

    registry.register(type, TypeObjectHandler())

+    # generic alias type annotations: dict[int, str], list[str], etc.
+    import types as _types


register_builtin_handlers imports types as _types twice (once for functions, again for GenericAlias). This is redundant and makes the section a bit harder to read; you can reuse the existing _types import from the functions block instead of re-importing it.

Suggested change

import types as _types

…ping._SpecialForm objects (like typing.Union) that appear as __origin__ on typing generics - Registered typing._SpecialForm alongside typing._GenericAlias in register_builtin_handlers, and removed the duplicate import types as _types - Added typing._GenericAlias and typing._SpecialForm entries to v0.1.json so the JSON-constructed hasher also handles them correctly

brian-arnold · 2026-03-09T18:46:07Z

@eywalker I addressed all comments above — I've added a TestGenericAliasHandler test class with 11 tests.

I also needed to add a SpecialFormHandler for typing._SpecialForm (e.g. typing.Union), since Optional[int].origin is typing.Union, which was unregistered and caused strict-mode failures.

I also added typing._GenericAlias (to support e.g. Dict[int,str] as well as dict[int,str]) and typing._SpecialForm entries to v0.1.json so the JSON-constructed hasher handles them too.

All new and existing tests pass.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copilot · 2026-03-09T18:52:55Z

src/orcapod/hashing/semantic_hashing/builtin_handlers.py

+    # generic alias type annotations: dict[int, str], list[str], etc.
+    generic_alias_handler = GenericAliasHandler()
+    registry.register(_types.GenericAlias, generic_alias_handler)
+    # typing._GenericAlias covers Optional[X], Union[X, Y], Dict[K, V], etc.
+    # typing._SpecialForm covers typing.Union, typing.ClassVar, etc. which
+    # appear as __origin__ on those generics (e.g. Optional[int].__origin__
+    # is typing.Union, a _SpecialForm).
+    try:
+        import typing as _typing
+
+        registry.register(_typing._GenericAlias, generic_alias_handler)  # type: ignore[attr-defined]
+        registry.register(_typing._SpecialForm, SpecialFormHandler())  # type: ignore[attr-defined]
+    except AttributeError:
+        pass


types.GenericAlias and typing._GenericAlias are now handled, but PEP 604 union annotations (instances of types.UnionType, e.g. int | None / int | str) are still unhandled in strict mode and will raise the same "no TypeHandlerProtocol registered" TypeError when they appear in user annotations or internal schema inference (e.g. default_type | None in semantic_types/pydata_utils.py). Consider adding a handler/registration for types.UnionType (and ensure it hashes equivalently to typing.Union[...] / typing.Optional[...] if you want those forms to be interchangeable).

Copilot · 2026-03-09T18:52:56Z

tests/test_hashing/test_semantic_hasher.py

+    def test_optional_hashed(self, hasher):
+        """Optional[int] (a typing._GenericAlias) produces a ContentHash."""
+        import typing
+
+        assert isinstance(hasher.hash_object(typing.Optional[int]), ContentHash)
+


The new tests cover typing.Optional[...] / typing.Union[...], but they don’t cover PEP 604 unions (int | None, int | str), which are common in Python 3.11+ and are used internally in semantic_types/pydata_utils.py (e.g. default_type | None). Adding at least one test that hasher.hash_object(int | None) succeeds (and ideally matches typing.Optional[int] if that equivalence is intended) would prevent regressions.

Copilot · 2026-03-09T18:52:56Z

tests/test_hashing/test_semantic_hasher.py

+        assert reg.has_handler(_types.GenericAlias)
+        assert reg.has_handler(_typing._GenericAlias)  # type: ignore[attr-defined]
+        assert reg.has_handler(_typing._SpecialForm)  # type: ignore[attr-defined]


Since this test is asserting that the default registry includes the new typing handlers, it would be good to also assert support for PEP 604 unions (types.UnionType) here if you add that handler. Otherwise the default-context registry can still be missing coverage for int | None annotations even though typing.Optional[...] works.

eywalker

@brian-arnold would you mind creating issues corresponding to my comments? When issues are created in Linear, I'm happy to merge this!

eywalker · 2026-03-09T20:53:41Z

src/orcapod/hashing/semantic_hashing/builtin_handlers.py

+
+    def handle(self, obj: Any, hasher: "SemanticHasherProtocol") -> Any:
+        name = getattr(obj, "_name", None) or repr(obj)
+        return f"special_form:typing.{name}"


For the case of Optional[int], I'd like this to hash to identical value as int|None as these two are semantically identical. I'm going to merge this PR but let's be sure to address this correctly when implementing for UnionType

eywalker · 2026-03-09T20:54:47Z

src/orcapod/hashing/semantic_hashing/builtin_handlers.py

+        if origin is None:
+            return f"generic_alias:{obj!r}"
+        return {
+            "__type__": "generic_alias",


I'm starting to think that it'd actually be better to consider how to handle both type and Generic alias in a uniform/consistent manner. Let's address this as a separate issue.

eywalker · 2026-03-09T20:56:05Z

src/orcapod/semantic_types/universal_converter.py

            return (
                lambda value: [element_converter(item) for item in value]
-                if value
+                if value is not None


I'm actually not sure about dropping the value if it's None. I think this ought to be correctly expressed?

Added GenericAliasHandler class

8c0b91a

brian-arnold requested a review from eywalker March 6, 2026 01:30

brian-arnold changed the title ~~Added GenericAliasHandler class~~ Added GenericAliasHandler class [ENG-226] Mar 6, 2026

eywalker requested changes Mar 6, 2026

View reviewed changes

eywalker requested a review from Copilot March 6, 2026 20:39

Copilot started reviewing on behalf of eywalker March 6, 2026 20:39 View session

Copilot AI reviewed Mar 6, 2026

View reviewed changes

brian-arnold requested review from Copilot and eywalker March 9, 2026 18:46

Copilot started reviewing on behalf of brian-arnold March 9, 2026 18:46 View session

Copilot AI reviewed Mar 9, 2026

View reviewed changes

eywalker approved these changes Mar 9, 2026

View reviewed changes

This was referenced Mar 10, 2026

Revisit handling of None as part of identity structure #78

Open

consider how to handle both type and Generic alias in a uniform/consistent manner #79

Open

eywalker merged commit 1b84eb2 into walkerlab:dev Mar 10, 2026
8 checks passed

Conversation

brian-arnold commented Mar 6, 2026

Uh oh!

codecov-commenter commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

eywalker left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

eywalker Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

brian-arnold commented Mar 9, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

eywalker left a comment

Choose a reason for hiding this comment

Uh oh!

eywalker Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

eywalker Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

eywalker Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

codecov-commenter commented Mar 6, 2026 •

edited

Loading