[PyTorch] Store Tensor explicitly in IValue #48824


Closed
wants to merge 13 commits

Conversation

swolchok (Contributor) commented Dec 4, 2020

Stack from ghstack:


Enables following diff, which will make toTensor() return
`const Tensor&` and allow callers to avoid refcounting overhead.

Differential Revision: [D25324617](https://our.internmc.facebook.com/intern/diff/D25324617/)
swolchok added a commit that referenced this pull request Dec 4, 2020

ghstack-source-id: 117841198
Pull Request resolved: #48824
dr-ci bot commented Dec 4, 2020

💊 CI failures summary and remediations

As of commit 3b45f26 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)


/// @private [doxygen private]
~IValue() {
if (is_intrusive_ptr) {
c10::raw::intrusive_ptr::decref(payload.as_intrusive_ptr);
} else if (isTensor()) {
Contributor:

Do we gain something by putting the isTensor check first? I would assume that Tensor objects are much more common than non-Tensor intrusive_ptr.

return *this;
}

// Tear down our state.
Contributor:

Can we deduplicate this logic with the logic in the constructor by moving them into a destroy() method?

Contributor Author:

Sure, but that's more work for the inliner to get right and these code paths are critical. I can try it.
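The deduplication suggested above could look roughly like the following hypothetical sketch; none of these names come from the PR, and a heap-allocated `std::string` stands in for the refcounted payload:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Illustrative stand-in for IValue: the destructor and move assignment
// both funnel their teardown through one destroy() helper.
struct Value {
  std::string* ptr = nullptr;  // stand-in for payload.as_intrusive_ptr

  Value() = default;
  explicit Value(std::string s) : ptr(new std::string(std::move(s))) {}
  Value(Value&& rhs) noexcept : ptr(rhs.ptr) { rhs.ptr = nullptr; }
  Value(const Value&) = delete;
  Value& operator=(const Value&) = delete;
  ~Value() { destroy(); }

  Value& operator=(Value&& rhs) noexcept {
    if (this != &rhs) {
      destroy();  // same teardown path as the destructor
      ptr = rhs.ptr;
      rhs.ptr = nullptr;
    }
    return *this;
  }

 private:
  // The real code could mark this C10_ALWAYS_INLINE so the hot
  // destructor/move-assign paths stay cheap after deduplication.
  void destroy() {
    delete ptr;
    ptr = nullptr;
  }
};
```

Whether the inliner keeps this as cheap as the hand-duplicated version is exactly the concern raised in the reply above.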

c10::raw::intrusive_ptr::decref(payload.as_intrusive_ptr);
}

if (rhs.isTensor()) {
Contributor:

and deduplicate this logic with the one from the move constructor?

if (isTensor() && rhs.isTensor()) {
std::swap(payload.as_tensor, rhs.payload.as_tensor);
} else if (isTensor()) {
at::Tensor t = std::move(payload.as_tensor);
Contributor:

ugh this is more involved than I had hoped. I guess it's UB to just relocate the Tensor without destructing and constructing again?

Contributor Author:

IIUC, you have to construct it in rhs.payload using placement new or it's UB. Skipping the destructor call is legit, and I'll probably try that.

memcpy(&payload, &rhs.payload, sizeof(payload));
new (&rhs.payload.as_tensor) at::Tensor(std::move(t));
} else if (rhs.isTensor()) {
rhs.swap(*this);
Contributor:

this is potentially slow because it needs to do the isTensor checks again (depending on how smart the compiler is with inlining this and proving that the extra branches are never executed). Not sure if relevant in practice, but if you want to optimize it, you could just move lines 332 to 335 into their own subfunction swapWithTensor(lhs, rhs) or something like that and call it from both the isTensor() and rhs.isTensor() case.
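The swap strategy being discussed can be sketched as follows. This is a hypothetical illustration, not the PR's code: `std::string` stands in for `at::Tensor`, and the explicit destructor call is kept for clarity even though the PR discussion notes it can legitimately be elided:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>
#include <string>
#include <utility>

union Payload {
  int64_t as_int;
  std::string as_str;  // stand-in for at::Tensor (non-trivial)
  Payload() : as_int(0) {}
  ~Payload() {}  // no-op; the owner destroys the active member
};

struct Value {
  Payload payload;
  bool is_str = false;  // stand-in for the isTensor() check

  Value() = default;
  explicit Value(int64_t i) { payload.as_int = i; }
  explicit Value(std::string s) : is_str(true) {
    new (&payload.as_str) std::string(std::move(s));
  }
  Value(const Value&) = delete;
  Value& operator=(const Value&) = delete;
  ~Value() {
    if (is_str) payload.as_str.~basic_string();
  }

  void swap(Value& rhs) {
    if (is_str && rhs.is_str) {
      std::swap(payload.as_str, rhs.payload.as_str);
    } else if (is_str) {
      // Move the active non-trivial member out, copy the raw payload
      // bytes over, then placement-new into the other side so the
      // "active member" bookkeeping stays well-defined.
      std::string t = std::move(payload.as_str);
      payload.as_str.~basic_string();
      std::memcpy(&payload, &rhs.payload, sizeof(payload));
      new (&rhs.payload.as_str) std::string(std::move(t));
      std::swap(is_str, rhs.is_str);
    } else if (rhs.is_str) {
      rhs.swap(*this);  // re-runs the checks, as the reviewer notes
    } else {
      std::swap(payload.as_int, rhs.payload.as_int);
    }
  }
};
```

The `rhs.swap(*this)` re-dispatch in the third branch is the double-check cost the reviewer is pointing at; hoisting the tensor-on-one-side case into its own helper would avoid it.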

struct {
DeviceType type;
DeviceIndex index;
} as_device;

Payload() : as_int(0) {}
~Payload() {}
Contributor:

Any reason you're user-defining the destructor? `= default` should do the trick and would not make the destructor user-defined, or just keep it omitted as before.

Contributor Author:

Unions with non-POD types in them are a pain. The destructor cannot be defaulted -- should it run `~Tensor()` or not? So we have to define it to do nothing.
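The point above can be demonstrated with a small hypothetical example (std::string stands in for `at::Tensor`): a union with a non-trivial member has its destructor implicitly deleted, because the compiler cannot know which member is active, so a user-provided no-op destructor is required and the enclosing class must destroy the active member itself:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Without a user-provided destructor, the union's destructor is
// implicitly deleted due to the non-trivial std::string member.
union PayloadNoDtor {
  int64_t as_int;
  std::string as_str;
  PayloadNoDtor() : as_int(0) {}
};

// A user-provided no-op destructor restores destructibility; it
// deliberately runs no member destructor.
union PayloadWithDtor {
  int64_t as_int;
  std::string as_str;
  PayloadWithDtor() : as_int(0) {}
  ~PayloadWithDtor() {}
};

static_assert(!std::is_destructible<PayloadNoDtor>::value,
              "implicit destructor is deleted");
static_assert(std::is_destructible<PayloadWithDtor>::value,
              "user-provided no-op destructor works");
```

Note that `~Payload() = default;` would not help here: a defaulted destructor for such a union is still defined as deleted, which is why it has to be user-defined.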

};

IValue(Payload p, Tag t, bool i) : payload(p), tag(t), is_intrusive_ptr(i) {}
IValue(const Payload& p, Tag t, bool i) : tag(t), is_intrusive_ptr(i) {
Contributor:

Even the largest Payload should only be 64 bits, and Payload has trivial copy/move constructors, so I would assume passing by value is better. Is passing by reference here related to the Itanium ABI thing you posted about?

Contributor Author:

> Payload has trivial copy/move constructors

Not with Tensor in it -- do you run the Tensor copy/move constructors or not? It's not copyable.
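The reply above can be checked with a hypothetical stand-in (std::string in place of `at::Tensor`): once a non-trivial type is a union member, the union's copy and move constructors are implicitly deleted, so the Payload can no longer be passed by value and must go by reference:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

union Payload {
  int64_t as_int;
  std::string as_str;  // non-trivial member
  Payload() : as_int(0) {}
  ~Payload() {}  // no-op; owner destroys the active member
};

static_assert(!std::is_copy_constructible<Payload>::value,
              "copy constructor is implicitly deleted");
static_assert(!std::is_move_constructible<Payload>::value,
              "move constructor is implicitly deleted");
```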

swolchok added a commit that referenced this pull request Dec 4, 2020
Pull Request resolved: #48824

ghstack-source-id: 117906329
swolchok requested a review from smessmer December 15, 2020 22:37
swolchok (Contributor Author) commented:

@smessmer could you take another look? I've had to change the approach to improve performance.

swolchok added a commit that referenced this pull request Dec 19, 2020
Pull Request resolved: #48824

ghstack-source-id: 118955313
facebook-github-bot (Contributor) commented:

This pull request has been merged in 1b31e13.

facebook-github-bot deleted the gh/swolchok/30/head branch January 10, 2021 15:17
hwangdeyu pushed a commit to hwangdeyu/pytorch that referenced this pull request Jan 14, 2021
Summary:
Pull Request resolved: pytorch#48824

Enables following diff, which will make toTensor() return
`const Tensor&` and allow callers to avoid refcounting overhead.
ghstack-source-id: 119327370

Test Plan:
ivalue_test

Internal benchmark to ensure perf parity. Some interesting steps
during the debugging process:

- First version was about a 5% regression
- Directly implementing move construction instead of using swap
  lowered the regression to 2-3%
- Directly implementing move assign was maybe an 0.5% improvement
- Adding C10_ALWAYS_INLINE on move assign got our regression to
  negligible
- Fixing toTensor() to actually be correct regressed us again, but
  omitting the explicit dtor call as exhaustively spelled out in a
  comment fixed it.

Reviewed By: bwasti

Differential Revision: D25324617

fbshipit-source-id: 7518c1c67f6f2661f151b43310aaddf4fb6e511a