Skip to content

bpo-28053: Allow custom reducer when using multiprocessing#15058

Open
pierreglaser wants to merge 32 commits intopython:mainfrom
pierreglaser:inject-custom-context-into-process-cls
Open

bpo-28053: Allow custom reducer when using multiprocessing#15058
pierreglaser wants to merge 32 commits intopython:mainfrom
pierreglaser:inject-custom-context-into-process-cls

Conversation

@pierreglaser
Copy link
Contributor

@pierreglaser pierreglaser commented Jul 31, 2019

This PR is a follow up on #9959. It implements the suggestions I made here #9959 (comment).

I'm making it a new PR to get the CI working now and gather separate feedback.

  • pass the current context to Process objects
  • split AbstractPickler into AbstractPickler and AbstractUnpickler
  • clean everything up (still WIP)

https://bugs.python.org/issue28053

@pierreglaser
Copy link
Contributor Author

You will notice that for the Process class, I chose to pass the reducer (which is the only bit of information we need from the context for now) to the Process class at construction instead of the Context

    def process(self, *args, **kwargs):
        return self._Process(*args, reducer=self.get_reducer(), **kwargs)

This comes from the fact that theProcess is actually relying on the default context to be properly defined:

class Process(process.BaseProcess):
_start_method = None
@staticmethod
def _Popen(process_obj):
return _default_context.get_context().Process._Popen(process_obj)

For the process class, passing the context object a Process bind to would require changing it to something like:

class Process(process.BaseProcess):
    _start_method = None
    @staticmethod
    def _Popen(process_obj):
          if self._ctx is not None:
              return self._ctx.get_context()._Process._Popen(process_obj)
          return _default_context.get_context()._Process._Popen(process_obj)

with _Process being a Process class specified (or not, in this case the default is Process) by the user. You can see the infinite recursion here as if _Process is not specified or set to Process, self._ctx.get_context()._Process._Popen is Process._Popen... To prevent this situation we would need to add a bunch of guards, and this ends up being not so trivial.

So implementation-wise, it is easier to simply pass the reducer in BaseContext.process, and not the whole context. Also, one noticeable consequence is that the reducer in Process is static, as it is not called using something like self._ctx.get_reducer() as it is done elsewhere.

But I am opened to suggestions.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The checks below aren't strict enough; they pass for a .register()ed subclass of Abstract<Reducer|Pickler|Unpickler>, but the code won't actually work with such a class due to depending on undocumented APIs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand what you mean by "Undocumented API". This PR comes with documentation for the AbstractReducer, AbstractPickler, and AbstractUnpickler.

Copy link

@pppery pppery Aug 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is that the documentation suggests that the below code should work:

class DummyReducer:
    def get_pickler_class():
        return pickle.Pickler
    def get_unpickler_class():
        return pickle.Unpickler
AbstractReducer.register(DummyReducer)
multiprocessing.set_reducer(DummyReducer())

However, in practice, it appears that running that code (and then doing something that uses reduction) will produce a (rather cryptic) error message in some cases. Either the documentation should be clear that virtual subclasses don't work and the code should raise an error when a virtual subclass is passed, or the code should be updated to handle virtual subclasses.

Copy link
Contributor Author

@pierreglaser pierreglaser Aug 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is that the documentation suggests that the below code should work

Not really: the get_pickler_class API documentation specifies that get_pickler_class must return anAbstractPickler subclass, but your DummyReducer (which itself, should be an AbstractReducer subclass) simply returns a Pickler, which is not an AbstractPickler subclass. Or maybe I misunderstood your message?

Copy link

@pppery pppery Aug 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't misunderstand anything, my example was accidentally bogus, and I misunderstood the code. The actual bug I was trying to point out appears not to exist with reducers, but instead with picklers.

class DummyReducer(AbstractReducer):
    def get_pickler_class():
        return DummyPickler
class DummyPickler(pickle.Pickler):
    pass
AbstractPickler.register(DummyPickler)
multiprocessing.set_reducer(DummyReducer())

This seemingly harmless code appears to have the effect of breaking the passing of all of the custom types that multiprocessing registers, in an entirely non-intuitive way, which I think should either be fixed or documented.

@pierreglaser
Copy link
Contributor Author

@pitrou what could I do to ease the review process of this PR?

@pitrou
Copy link
Member

pitrou commented Aug 7, 2019

@pierreglaser The first necessary condition is for me to find some time... sorry :-/

@pierreglaser
Copy link
Contributor Author

@pitrou no worries :)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the attributes defined between here and __init__ appear to be used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as below.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This property does not appear to be used.

Copy link
Contributor Author

@pierreglaser pierreglaser Feb 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is work taken from @pablogsal previous commits (see #9959). I assume there was a good reason for this, but he'll probably know better than me.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method (a) doesn't appear to be used and (b) doesn't necessarily work, because nothing requires the get_pickler_class to define a register method.

Copy link
Contributor Author

@pierreglaser pierreglaser Feb 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding (b), this is stated in the documentation that get_pickler_class should return a subclass of an AbstractPickler, that itself (as a subclass of ForkingPickler), implements a register method. Regarding (a), same comment as above (this @pablogsal's work), but I'll think about whether this function can be removed.

@pierreglaser
Copy link
Contributor Author

pierreglaser commented Feb 2, 2020

@pablogsal friendly ping: I am curious to know what are your thoughts on this.

@@ -0,0 +1 @@
Fix the implementation of custom reducers for in `multiprocessing`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'for in' Either a word is missing or there is an extra word.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, it looks like "for" is the extra word. I'd personally go with:

Suggested change
Fix the implementation of custom reducers for in `multiprocessing`.
Fix the implementation of custom reducers in :mod:`multiprocessing`.

(Also adds the sphinx role :mod: to provide an inline link to the module. It's optional for Misc/NEWS.d, but we may as well include it with the typo fix. This makes it easier to move over to the "What's New" doc later.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that `multiprocessing` will cause make check to fail: it should either be :mod:`multiprocessing` (as suggested by @aeros) or ``multiprocessing``.

@terryjreedy
Copy link
Member

(sorry, failed rebasing attempt)

Rebasing is, as evidenced by repeated failures like this, error prone. Besides the obvious noise, removing the spurious review requests does not unsubscribe people.

If you were just trying to update your branch to your current repository/fork master (once synchronized), rebasing is also not needed. After checking out the branch, git pull origin/master will, if there are no conflicts, pull, merge, and commit updates to the branch with a standard commit message.

@tiran tiran removed their request for review April 17, 2021 21:05
@rhettinger rhettinger removed their request for review May 3, 2022 06:31
Copy link
Contributor

@MaxwellDupre MaxwellDupre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Run multiprocessing tests:
Ran 369 tests in 215.720s
OK (skipped=35)
Ran 39 tests in 22.709s
OK
Ran 369 tests in 85.400s
OK (skipped=39)
Ran 369 tests in 99.032s
OK (skipped=32)
Look ok to me.

Gets the current reduction class in use by the module's primitive internals.

For example, substituting the reducer class to use the :mod:`pickle` protocol
version 2 to be able to communicate with a Python 2.x programs.::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change documentation and examples not to reference Python 2 as that long EOL at this point. Just say it in the abstract "communicate with processes using older Python versions" and offer some explanation as to how this is even possible by linking to the relevant APIs in multiprocessing that enable people to create such unexpected monsters. (see #67592 (comment))

Most multiprocessing users view it as a module with a single parent process controlling identical by definition Python version child processes that it launches and controls.

.. method:: set_reducer(reduction)

Sets a reduction instance to be used for serialization and deserialization
by the module primitive internals. **reduction** must be an instance of a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
by the module primitive internals. **reduction** must be an instance of a
by the module primitive internals. *reduction* must be an instance of a

Arguments should be marked in italic: https://devguide.python.org/documenting/#rest-inline-markup


.. method:: dump(obj)

Write a pickled representation of obj to the open file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Write a pickled representation of obj to the open file.
Write a pickled representation of *obj* to the open file.

Base class providing access to custom ``Pickler`` and ``Unpickler``
classes to be used by ``multiprocessing`` when serializing objects.

.. method:: get_pickler_class():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.. method:: get_pickler_class():
.. method:: get_pickler_class()

This method must return an subclass of :class:`AbstractPickler` to be used
by ``multiprocessing`` when serializing objects.

.. method:: get_unpickler_class():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.. method:: get_unpickler_class():
.. method:: get_unpickler_class()

@@ -0,0 +1 @@
Fix the implementation of custom reducers for in `multiprocessing`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that `multiprocessing` will cause make check to fail: it should either be :mod:`multiprocessing` (as suggested by @aeros) or ``multiprocessing``.

Comment on lines +1199 to +1203
.. currentmodule:: multiprocessing

Several primitives of the :mod:`multiprocessing` module such as
:class:`multiprocessing.Queue`, :class:`multiprocessing.connection.Listener` or
:class:`multiprocessing.connection.Server` need to serialize and deserialize Python
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't either the currentmodule directive or the multiprocessing.*s in the roles redundant in this part?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.