[PyTorch] Store Tensor explicitly in IValue #48824
Conversation
Enables the following diff, which will make toTensor() return `const Tensor&` and allow callers to avoid refcounting overhead.

Differential Revision: [D25324617](https://our.internmc.facebook.com/intern/diff/D25324617/)
aten/src/ATen/core/ivalue.h
Outdated
```cpp
/// @private [doxygen private]
~IValue() {
  if (is_intrusive_ptr) {
    c10::raw::intrusive_ptr::decref(payload.as_intrusive_ptr);
  } else if (isTensor()) {
```
Do we gain something by putting the `isTensor` check first? I would assume that `Tensor` objects are much more common than non-Tensor `intrusive_ptr`s.
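A minimal sketch of the ordering being suggested, with `std::shared_ptr<int>` standing in for `at::Tensor` and all names illustrative rather than the actual PyTorch code:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <new>
#include <utility>

// Simplified stand-in for the IValue layout under discussion.
struct MiniIValue {
  union Payload {
    std::shared_ptr<int> as_tensor;  // stand-in for at::Tensor
    int64_t as_int;
    Payload() : as_int(0) {}
    ~Payload() {}  // the owner destroys the active member explicitly
  } payload;
  bool is_tensor = false;

  MiniIValue() = default;
  explicit MiniIValue(std::shared_ptr<int> t) : is_tensor(true) {
    new (&payload.as_tensor) std::shared_ptr<int>(std::move(t));
  }
  MiniIValue(const MiniIValue&) = delete;
  bool isTensor() const { return is_tensor; }

  // The reviewer's suggestion: test the (presumably more frequent)
  // tensor case first, so the hot path takes the first branch.
  ~MiniIValue() {
    if (isTensor()) {
      payload.as_tensor.~shared_ptr();
    }
    // other payloads in this sketch are trivially destructible
  }
};
```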
aten/src/ATen/core/ivalue.h
Outdated
```cpp
  return *this;
}

// Tear down our state.
```
can we deduplicate this logic with the logic in the constructor by moving them into a `destroy()` method?
Sure, but that's more work for the inliner to get right and these code paths are critical. I can try it.
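The pattern the reviewer is asking for might look like this sketch, where a simple refcounted handle stands in for the real payload and both teardown sites funnel through one helper (names are hypothetical):

```cpp
#include <cassert>
#include <utility>

// Stand-in refcounted object; `live` counts allocations for verification.
struct RefCounted {
  static int live;
  RefCounted() { ++live; }
  ~RefCounted() { --live; }
  int refs = 1;
};
int RefCounted::live = 0;

struct Value {
  RefCounted* ptr = nullptr;

  static Value adopt(RefCounted* p) { Value v; v.ptr = p; return v; }

  // Single teardown path, shared by the destructor and move assignment,
  // so the refcounting logic lives in exactly one place.
  void destroy() {
    if (ptr && --ptr->refs == 0) delete ptr;
    ptr = nullptr;
  }

  Value() = default;
  Value(Value&& rhs) noexcept : ptr(std::exchange(rhs.ptr, nullptr)) {}
  Value& operator=(Value&& rhs) noexcept {
    destroy();  // same logic as the destructor, no duplication
    ptr = std::exchange(rhs.ptr, nullptr);
    return *this;
  }
  ~Value() { destroy(); }
};
```

As the reply notes, whether the extra call level inlines cleanly on the hot paths is the open question, not the correctness of the refactor.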
aten/src/ATen/core/ivalue.h
Outdated
```cpp
  c10::raw::intrusive_ptr::decref(payload.as_intrusive_ptr);
}

if (rhs.isTensor()) {
```
and deduplicate this logic with the one from the move constructor?
```cpp
if (isTensor() && rhs.isTensor()) {
  std::swap(payload.as_tensor, rhs.payload.as_tensor);
} else if (isTensor()) {
  at::Tensor t = std::move(payload.as_tensor);
```
ugh this is more involved than I had hoped. I guess it's UB to just relocate the Tensor without destructing and constructing again?
IIUC, you have to construct it in rhs.payload using placement new or it's UB. Skipping the destructor call is legit, and I'll probably try that.
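The well-defined version of that relocation can be sketched as follows, with `std::shared_ptr<int>` standing in for `at::Tensor` (both are non-trivially-copyable handle types); each union member's lifetime is begun explicitly via placement new, which is what a bare memcpy of the tensor member would not do:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <new>
#include <utility>

// Illustrative union slot; names mirror the discussion, not the real code.
union Slot {
  std::shared_ptr<int> as_tensor;  // stand-in for at::Tensor
  int64_t as_int;
  Slot() : as_int(0) {}
  ~Slot() {}
};

// lhs holds the "tensor", rhs holds a trivial payload. Move the tensor
// aside, end its lifetime, copy the trivial bytes over, then placement-new
// the saved tensor into rhs's slot. (The PR's optimization is to skip the
// explicit destructor call on the moved-from tensor, which is a no-op.)
void swap_tensor_with_int(Slot& lhs, Slot& rhs) {
  std::shared_ptr<int> t = std::move(lhs.as_tensor);
  lhs.as_tensor.~shared_ptr();
  std::memcpy(&lhs, &rhs, sizeof(Slot));
  new (&rhs.as_tensor) std::shared_ptr<int>(std::move(t));
}
```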
```cpp
  memcpy(&payload, &rhs.payload, sizeof(payload));
  new (&rhs.payload.as_tensor) at::Tensor(std::move(t));
} else if (rhs.isTensor()) {
  rhs.swap(*this);
```
this is potentially slow because it needs to do the `isTensor` checks again (depending on how smart the compiler is with inlining this and proving that the extra branches are never executed). Not sure if relevant in practice, but if you want to optimize it, you could just move lines 332 to 335 into their own subfunction `swapWithTensor(lhs, rhs)` or something like that and call it from both the `isTensor()` and `rhs.isTensor()` cases.
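A sketch of that factoring, with `std::string` standing in for `at::Tensor` and all names hypothetical: the asymmetric case calls one helper with the operands flipped, so the flags are dispatched on exactly once instead of recursing through `swap()`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>
#include <string>
#include <utility>

struct V {
  union P {
    std::string as_tensor;  // stand-in for at::Tensor
    int64_t as_int;
    P() : as_int(0) {}
    ~P() {}
  } payload;
  bool is_tensor = false;

  V() = default;
  V(const V&) = delete;
  ~V() {
    if (is_tensor) payload.as_tensor.~basic_string();
  }

  // The suggested helper: tensor_side holds the non-trivial member,
  // other holds a trivial payload.
  static void swapWithTensor(V& tensor_side, V& other) {
    std::string t = std::move(tensor_side.payload.as_tensor);
    tensor_side.payload.as_tensor.~basic_string();
    std::memcpy(&tensor_side.payload, &other.payload, sizeof(P));
    new (&other.payload.as_tensor) std::string(std::move(t));
    std::swap(tensor_side.is_tensor, other.is_tensor);
  }

  void swap(V& rhs) {
    if (is_tensor && rhs.is_tensor) {
      std::swap(payload.as_tensor, rhs.payload.as_tensor);
    } else if (is_tensor) {
      swapWithTensor(*this, rhs);
    } else if (rhs.is_tensor) {
      swapWithTensor(rhs, *this);  // flipped operands, no second dispatch
    } else {
      std::swap(payload.as_int, rhs.payload.as_int);
    }
  }
};
```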
```cpp
  struct {
    DeviceType type;
    DeviceIndex index;
  } as_device;

  Payload() : as_int(0) {}
  ~Payload() {}
```
Any reason you're user-defining the destructor? `= default` should do the trick and would not make the destructor user-defined, or just keep it omitted as before.
Unions with non-POD types in them are a pain. The destructor cannot be defaulted -- do you run `~Tensor()` or not? So we have to define it to do nothing.
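A small demonstration of the point, with `std::shared_ptr<int>` as the stand-in non-POD member: a union containing a non-trivially-destructible member gets a *deleted* destructor by default (the compiler can't know which member is active), so `= default` would make the union unusable; the empty body says "destroy nothing -- the enclosing class runs the right member's destructor explicitly."

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <new>

union U {
  std::shared_ptr<int> as_tensor;  // non-trivially destructible member
  int64_t as_int;
  U() : as_int(0) {}
  ~U() {}  // deliberately a no-op; `= default` would be deleted here
};
```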
```cpp
};

IValue(Payload p, Tag t, bool i) : payload(p), tag(t), is_intrusive_ptr(i) {}
IValue(const Payload& p, Tag t, bool i) : tag(t), is_intrusive_ptr(i) {
```
even the largest Payload should be only 64 bits, and Payload has trivial copy/move constructors, so I would assume passing by value is better. Is passing by reference here related to the Itanium ABI thing you posted about?
> Payload has trivial copy/move constructors

Not with `Tensor` in it -- do you run the Tensor copy/move constructors or not? It's not copyable.
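This is checkable directly: once the union contains a non-trivially-copyable member, its copy constructor is implicitly deleted, so a by-value `Payload` parameter would not even compile -- hence the `const Payload&` in the new constructor. A sketch with `std::shared_ptr<int>` standing in for `at::Tensor`:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <type_traits>

union P {
  std::shared_ptr<int> as_tensor;  // non-trivially-copyable member
  int64_t as_int;
  P() : as_int(0) {}
  ~P() {}
};

// The copy constructor is implicitly deleted: the compiler cannot know
// which member is active, so P cannot be passed by value.
static_assert(!std::is_copy_constructible<P>::value,
              "P's copy constructor is deleted");
```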
@smessmer could you take another look? I've had to change the approach to improve performance.
This pull request has been merged in 1b31e13.
Summary: Pull Request resolved: pytorch#48824

Enables the following diff, which will make toTensor() return `const Tensor&` and allow callers to avoid refcounting overhead.

ghstack-source-id: 119327370

Test Plan: ivalue_test. Internal benchmark to ensure perf parity. Some interesting steps during the debugging process:

- First version was about a 5% regression
- Directly implementing move construction instead of using swap lowered the regression to 2-3%
- Directly implementing move assign was maybe a 0.5% improvement
- Adding C10_ALWAYS_INLINE on move assign got our regression to negligible
- Fixing toTensor() to actually be correct regressed us again, but omitting the explicit dtor call as exhaustively spelled out in a comment fixed it

Reviewed By: bwasti

Differential Revision: D25324617

fbshipit-source-id: 7518c1c67f6f2661f151b43310aaddf4fb6e511a