Skip to content

[PyTorch] Store Tensor explicitly in IValue #48824

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 13 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
131 changes: 106 additions & 25 deletions aten/src/ATen/core/ivalue.h
Original file line number Diff line number Diff line change
Expand Up @@ -163,23 +163,61 @@ struct CAFFE2_API IValue final {
c10::raw::intrusive_ptr::incref(payload.as_intrusive_ptr);
}
}
IValue(IValue&& rhs) noexcept : IValue() {
swap(rhs);
IValue(IValue&& rhs) noexcept : tag(rhs.tag), is_intrusive_ptr(rhs.is_intrusive_ptr) {
if (isTensor()) {
new (&payload.as_tensor) at::Tensor(std::move(rhs.payload.as_tensor));
rhs.payload.as_tensor.~Tensor();
} else {
memcpy(&payload, &rhs.payload, sizeof(payload));
}
rhs.tag = Tag::None;
rhs.is_intrusive_ptr = false;
}

/// @private [doxygen private]
~IValue() {
if (is_intrusive_ptr) {
c10::raw::intrusive_ptr::decref(payload.as_intrusive_ptr);
} else if (isTensor()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we gain something by putting the isTensor check first? I would assume that Tensor objects are much more common than non-Tensor intrusive_ptr.

payload.as_tensor.~Tensor();
}
}
IValue& operator=(IValue&& rhs) & noexcept {
IValue(std::move(rhs)).swap(*this); // this also sets rhs to None

// Always-inline for performance -- this gets called frequently
// inside the core of the static runtime.
C10_ALWAYS_INLINE IValue& operator=(IValue&& rhs) & noexcept {
if (&rhs == this) {
return *this;
}

// Tear down our state.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we deduplicate this logic with the logic in the constructor by moving them into a destroy() method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but that's more work for the inliner to get right and these code paths are critical. I can try it.

if (isTensor()) {
payload.as_tensor.~Tensor();
} else if (is_intrusive_ptr) {
c10::raw::intrusive_ptr::decref(payload.as_intrusive_ptr);
}

if (rhs.isTensor()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and deduplicate this logic with the one from the move constructor?

new (&payload.as_tensor) at::Tensor(std::move(rhs.payload.as_tensor));
rhs.payload.as_tensor.~Tensor();
} else {
// No need to specially handle rhs being an intrusive_ptr -- we
// steal the reference.
memcpy(&payload, &rhs.payload, sizeof(payload));
}

tag = rhs.tag;
is_intrusive_ptr = rhs.is_intrusive_ptr;
rhs.tag = Tag::None;
rhs.is_intrusive_ptr = false;
return *this;
}

IValue& operator=(IValue const& rhs) & {
IValue(rhs).swap(*this);
return *this;
}

void dump() const;

/**
Expand Down Expand Up @@ -288,7 +326,19 @@ struct CAFFE2_API IValue final {

/// @private [doxygen private]
void swap(IValue& rhs) noexcept {
std::swap(payload, rhs.payload);
if (isTensor() && rhs.isTensor()) {
std::swap(payload.as_tensor, rhs.payload.as_tensor);
} else if (isTensor()) {
at::Tensor t = std::move(payload.as_tensor);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh this is more involved than I had hoped. I guess it's UB to just relocate the Tensor without destructing and constructing again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, you have to construct it in rhs.payload using placement new or it's UB. Skiping the destructor call is legit, and I'll probably try that.

payload.as_tensor.~Tensor();
memcpy(&payload, &rhs.payload, sizeof(payload));
new (&rhs.payload.as_tensor) at::Tensor(std::move(t));
} else if (rhs.isTensor()) {
rhs.swap(*this);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is potentially slow because it needs to do the isTensor checks again (depending on how smart the compiler is with inlining this and proving that the extra branches are never executed). Not sure if relevant in practice, but if you want to optimize it, you could just move lines 332 to 335 into their own subfunction swapWithTensor(lhs, rhs) or something like that and call it from both the isTensor() and rhs.isTensor() case.

return;
} else {
std::swap(reinterpret_cast<char(&)[sizeof(payload)]>(*&payload), reinterpret_cast<char(&)[sizeof(payload)]>(*&rhs.payload));
}
std::swap(is_intrusive_ptr, rhs.is_intrusive_ptr);
std::swap(tag, rhs.tag);
}
Expand All @@ -297,21 +347,16 @@ struct CAFFE2_API IValue final {
// While some of these accessors could be generated through templates,
// we prefer to write them manually for clarity

IValue(at::Tensor t) : tag(Tag::Tensor), is_intrusive_ptr(t.defined()) {
// Note: the undefined tensor is not refcounted, so while it
// is tagged as a tensor, is_intrusive_ptr is set to false.
// This is not an optional optimization: our incref call
// *will not* do the right thing when called on an
// undefined tensor.
payload.as_intrusive_ptr = t.unsafeReleaseTensorImpl();
IValue(at::Tensor t) : tag(Tag::Tensor), is_intrusive_ptr(false) {
new (&payload.as_tensor) at::Tensor(std::move(t));
}
bool isTensor() const {
return Tag::Tensor == tag;
}
at::Tensor toTensor() &&;
at::Tensor toTensor() const&;
at::TensorImpl* unsafeToTensorImpl() const {
return static_cast<at::TensorImpl*>(payload.as_intrusive_ptr);
return payload.as_tensor.unsafeGetTensorImpl();
}

const IValue& toIValue() const {
Expand Down Expand Up @@ -565,7 +610,7 @@ struct CAFFE2_API IValue final {
c10::intrusive_ptr<ivalue::EnumHolder> toEnumHolder() const&;

// None
IValue() : payload{0}, tag(Tag::None), is_intrusive_ptr(false) {}
IValue() : tag(Tag::None), is_intrusive_ptr(false) {}
bool isNone() const {
return Tag::None == tag;
}
Expand Down Expand Up @@ -826,13 +871,23 @@ struct CAFFE2_API IValue final {
double as_double;
bool as_bool;
c10::intrusive_ptr_target* as_intrusive_ptr;
at::Tensor as_tensor;
struct {
DeviceType type;
DeviceIndex index;
} as_device;

Payload() : as_int(0) {}
~Payload() {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason you're user-defining the destructor? = default should do the trick and would not make the destructor user defined, or just keep it omitted as before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unions with non-POD types in them are a pain. The destructor cannot be defined by default -- do you run ~Tensor() or not? So, we have to define it to do nothing.

};

IValue(Payload p, Tag t, bool i) : payload(p), tag(t), is_intrusive_ptr(i) {}
IValue(const Payload& p, Tag t, bool i) : tag(t), is_intrusive_ptr(i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even the largest Payload should be 64bit only and Payload has trivial copy/move constructors, so I would assume passing by value is better. Is passing by reference here related to the Itanium ABI thing you posted about?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Payload has trivial copy/move constructors

Not with Tensor in it -- do you run the Tensor copy/move constructors or not? It's not copyable.

if (isTensor()) {
new (&payload.as_tensor) at::Tensor(p.as_tensor);
} else {
memcpy(&payload, &p, sizeof(payload));
}
}

Payload payload;
Tag tag;
Expand All @@ -852,9 +907,14 @@ struct CAFFE2_API WeakIValue final {
}
}
WeakIValue(const IValue& rhs)
: payload(rhs.payload),
tag(rhs.tag),
: tag(rhs.tag),
is_intrusive_ptr(rhs.is_intrusive_ptr) {
if (rhs.isTensor()) {
payload.as_intrusive_ptr = rhs.unsafeToTensorImpl();
} else {
static_assert(sizeof(payload) == sizeof(rhs.payload), "IValue and WeakIValue payload sizes don't match!");
memcpy(&payload, &rhs.payload, sizeof(payload));
}
if (is_intrusive_ptr) {
c10::raw::weak_intrusive_ptr::incref(payload.as_intrusive_ptr);
}
Expand Down Expand Up @@ -888,17 +948,28 @@ struct CAFFE2_API WeakIValue final {

IValue lock() const {
if (!is_intrusive_ptr) {
return IValue(payload, tag, false);
IValue::Payload newPayload;
memcpy(&newPayload, &payload, sizeof(newPayload));
return IValue(newPayload, tag, false);
}
auto temp = c10::weak_intrusive_ptr<c10::intrusive_ptr_target>::reclaim(
payload.as_intrusive_ptr);
IValue::Payload pl;
pl.as_intrusive_ptr = temp.lock().release();
temp.release();
if (!pl.as_intrusive_ptr) {
return IValue();
if (IValue::Tag::Tensor == tag) {
auto ip = temp.lock().release();
if (!ip) {
return IValue();
} else {
return IValue(std::move(ip));
}
} else {
return IValue(pl, tag, true);
IValue::Payload pl;
pl.as_intrusive_ptr = temp.lock().release();
temp.release();
if (!pl.as_intrusive_ptr) {
return IValue();
} else {
return IValue(pl, tag, true);
}
}
}

Expand Down Expand Up @@ -928,7 +999,17 @@ struct CAFFE2_API WeakIValue final {
}

private:
IValue::Payload payload;
union Payload {
int64_t as_int;
double as_double;
bool as_bool;
c10::intrusive_ptr_target* as_intrusive_ptr;
struct {
DeviceType type;
DeviceIndex index;
} as_device;
};
Payload payload;
IValue::Tag tag;
bool is_intrusive_ptr;
};
Expand Down
14 changes: 11 additions & 3 deletions aten/src/ATen/core/ivalue_inl.h
Original file line number Diff line number Diff line change
Expand Up @@ -130,12 +130,20 @@ inline c10::intrusive_ptr<ivalue::EnumHolder> IValue::toEnumHolder() const& {
}
inline at::Tensor IValue::toTensor() && {
AT_ASSERT(isTensor(), "Expected Tensor but got ", tagKind());
return at::Tensor(
moveToIntrusivePtr<at::TensorImpl, at::UndefinedTensorImpl>());
auto result = std::move(payload.as_tensor);
// As far as I can tell, omitting the usual explicit destructor call
// is not UB in and of itself, and it's a slight perf win. The
// destructor is a no-op, because the moved-from Tensor is
// effectively an intrusive_ptr in the null state, so we don't need
// the behavior for correctness reasons either. Leaving this
// explanatory comment, including commented-out destructor call, to
// make this abundantly clear. payload.as_tensor.~Tensor();
clearToNone();
return result;
}
inline at::Tensor IValue::toTensor() const& {
AT_ASSERT(isTensor(), "Expected Tensor but got ", tagKind());
return at::Tensor(toIntrusivePtr<at::TensorImpl, at::UndefinedTensorImpl>());
return payload.as_tensor;
}
inline c10::Stream IValue::toStream() && {
return c10::Stream::unpack(payload.as_int);
Expand Down
66 changes: 66 additions & 0 deletions aten/src/ATen/test/ivalue_test.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,72 @@ TEST(IValueTest, Basic) {
ASSERT_EQ(tv.use_count(), 2);
}

TEST(IValueTest, Swap) {
at::Tensor tv1 = at::rand({3, 4});
IValue tensor(tv1), scalar(42);
// swap() has special cases depending on which side is tensor or
// not. Exercise all 4 combinations.
tensor.swap(scalar);
tensor.swap(scalar);
tensor.swap(tensor);
scalar.swap(scalar);
}

TEST(IValueTest, MoveConstruct) {
at::Tensor t1 = at::rand({3, 4});

{
IValue sourceTensor(t1);
IValue target(std::move(sourceTensor));
EXPECT_TRUE(target.toTensor().equal(t1));
EXPECT_TRUE(sourceTensor.isNone());
}

{
IValue sourceScalar(42);
IValue target(std::move(sourceScalar));
EXPECT_EQ(target, IValue(42));
EXPECT_TRUE(sourceScalar.isNone());
}
}

TEST(IValueTest, MoveAssign) {
at::Tensor tv1 = at::rand({3, 4});
at::Tensor tv2 = at::rand({3, 4});

// 1: tensor to tensor
{
IValue targetTensor(tv1), sourceTensor(tv2);
targetTensor = std::move(sourceTensor);
EXPECT_TRUE(targetTensor.toTensor().equal(tv2));
EXPECT_TRUE(sourceTensor.isNone());
}

// 2: tensor to scalar
{
IValue targetScalar(42), sourceTensor(tv1);
targetScalar = std::move(sourceTensor);
EXPECT_TRUE(targetScalar.toTensor().equal(tv1));
EXPECT_TRUE(sourceTensor.isNone());
}

// 3: scalar to tensor
{
IValue targetTensor(tv1), sourceScalar(42);
targetTensor = std::move(sourceScalar);
EXPECT_EQ(targetTensor, 42);
EXPECT_TRUE(sourceScalar.isNone());
}

// 4: scalar to scalar
{
IValue targetScalar(42), sourceScalar(43);
targetScalar = std::move(sourceScalar);
EXPECT_EQ(targetScalar, 43);
EXPECT_TRUE(sourceScalar.isNone());
}
}

TEST(IValueTest, Tuple) {
std::tuple<int64_t, at::Tensor> t = std::make_tuple(123, at::randn({1}));
auto iv = IValue(t);
Expand Down
8 changes: 8 additions & 0 deletions c10/macros/Macros.h
Original file line number Diff line number Diff line change
Expand Up @@ -186,6 +186,14 @@ namespace at { namespace cuda { using namespace c10::hip; }}
#define C10_NOINLINE
#endif

#if __has_attribute(always_inline) || defined(__GNUC__)
#define C10_ALWAYS_INLINE __attribute__((__always_inline__)) inline
#elif defined(_MSC_VER)
#define C10_ALWAYS_INLINE __forceinline
#else
#define C10_ALWAYS_INLINE inline
#endif

#include <sstream>
#include <string>

Expand Down