[PyTorch] RFC: Add tuple inline storage #64066


Closed
swolchok wants to merge 50 commits
Changes from 1 commit
Commits (50)
5f06e2f
[PyTorch] RFC: Add tuple inline storage
swolchok Aug 26, 2021
2e3fb4b
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Aug 27, 2021
8496eb9
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Aug 27, 2021
a2726bc
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Aug 27, 2021
2451f9f
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Aug 27, 2021
9dcf2ae
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Aug 30, 2021
a7d23fd
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Aug 31, 2021
ca08c5f
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Aug 31, 2021
8fce745
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 3, 2021
031bc4a
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 7, 2021
8a7ef90
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 8, 2021
e03440d
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 9, 2021
757281a
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 9, 2021
9a1cc85
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 9, 2021
27eca8b
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 9, 2021
b4c2fd4
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 13, 2021
9e66764
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 14, 2021
a5e5bfe
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 15, 2021
1475ad1
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 16, 2021
436ccc5
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 20, 2021
305c8a1
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 20, 2021
edad395
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 20, 2021
a010999
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 21, 2021
0719493
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 21, 2021
5df6b5b
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 22, 2021
8c71367
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 22, 2021
23264d8
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 22, 2021
51f9a65
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 22, 2021
9232e2e
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 23, 2021
5f4f9f8
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 23, 2021
dcf699e
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 23, 2021
121a192
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 24, 2021
d897b89
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 27, 2021
d82cd75
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 29, 2021
09ee5d8
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 30, 2021
4fddc8d
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 30, 2021
0ce2d6d
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Sep 30, 2021
59e3375
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Oct 5, 2021
7e17dd6
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Oct 6, 2021
cf1d072
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Oct 8, 2021
a66f9a9
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Oct 8, 2021
adfdb14
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Oct 8, 2021
161a240
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Oct 11, 2021
236cf76
give up on making TupleElements noncopyable on "[PyTorch] RFC: Add tuple inline storage"
swolchok Oct 11, 2021
be384f7
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Oct 12, 2021
5b55485
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Oct 12, 2021
88f8ce9
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Oct 13, 2021
eadc443
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Oct 14, 2021
4d1ef84
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Oct 14, 2021
b58e12f
Update on "[PyTorch] RFC: Add tuple inline storage"
swolchok Oct 15, 2021
[PyTorch] RFC: Add tuple inline storage
I noticed a bunch of time being spent heap-allocating Tuples
in the unpickler. 1-, 2-, and 3-element Tuples are apparently common
enough that they get their own bytecode instructions, so I decided to
try also giving them their own representation. We store up to 3
IValues inline in `Tuple` rather than doing a second heap allocation
for a `std::vector<IValue>`.
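As a usage sketch (illustrative only — it assumes the small-arity create() overloads added further down in this diff and an ATen build; the helper name is hypothetical):

#include <ATen/core/ivalue.h>

using c10::IValue;

// With inline storage, a 2-element tuple keeps both IValues inside the
// Tuple object itself; previously create() also heap-allocated a
// std::vector<IValue> just to hold them.
c10::intrusive_ptr<c10::ivalue::Tuple> makePair(IValue a, IValue b) {
  return c10::ivalue::Tuple::create(std::move(a), std::move(b));
}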

Differential Revision: [D30592622](https://our.internmc.facebook.com/intern/diff/D30592622/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D30592622/)!

[ghstack-poisoned]
swolchok committed Aug 26, 2021
commit 5f06e2fb9ac9067acf8a7aecaae3cd8c592ed642
10 changes: 5 additions & 5 deletions aten/src/ATen/core/ivalue.cpp
@@ -51,19 +51,19 @@ TORCH_API c10::intrusive_ptr<ConstantString> ConstantString::create(
}

bool operator==(const ivalue::Tuple& lhs, const ivalue::Tuple& rhs) {
- return lhs.elements_.size() == rhs.elements_.size() &&
+ return lhs.size() == rhs.size() &&
// see [container equality]
std::equal(
- lhs.elements_.cbegin(),
- lhs.elements_.cend(),
- rhs.elements_.cbegin(),
+ lhs.elements().cbegin(),
+ lhs.elements().cend(),
+ rhs.elements().cbegin(),
_fastEqualsForContainer);
}

TupleTypePtr Tuple::type() const {
if (!type_) {
type_ = TupleType::create(
- fmap(elements_, [&](const IValue& v) { return v.type(); }));
+ fmap(elements(), [&](const IValue& v) { return v.type(); }));
}
return type_;
}
266 changes: 262 additions & 4 deletions aten/src/ATen/core/ivalue_inl.h
@@ -19,6 +19,7 @@
#include <c10/core/TensorImpl.h>
#include <c10/core/UndefinedTensorImpl.h>
#include <c10/util/intrusive_ptr.h>
#include <c10/util/irange.h>
#include <c10/util/hash.h>

namespace torch {
@@ -249,9 +250,227 @@ struct TORCH_API ConstantString final : c10::intrusive_ptr_target {

struct Future;

// XXX: must add test coverage!
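// Representation: up to 3 IValues are stored inline in a union with a
// std::vector<IValue> fallback. Invariant: when inlineSize_ is nonzero (1-3),
// elementsInline_[0, inlineSize_) holds the live elements; when it is zero,
// elementsVector_ is the active union member (possibly empty).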
struct TORCH_API TupleElements {
private:
size_t inlineSize_;
union {
std::vector<IValue> elementsVector_;
IValue elementsInline_[3];
};

void destroyInline() {
for (const auto ii : c10::irange(inlineSize_)) {
elementsInline_[ii].~IValue();
}
}
public:

using iterator = IValue*;
using const_iterator = const IValue*;

TupleElements() : inlineSize_(0) {
new (&elementsVector_) std::vector<IValue>();
}

explicit TupleElements(std::vector<IValue> elements)
: inlineSize_(0), elementsVector_(std::move(elements)) {}

explicit TupleElements(IValue&& e1)
: inlineSize_(1) {
new (&elementsInline_[0]) IValue(std::move(e1));
}

explicit TupleElements(IValue&& e1, IValue&& e2)
: inlineSize_(2) {
new (&elementsInline_[0]) IValue(std::move(e1));
new (&elementsInline_[1]) IValue(std::move(e2));
}

explicit TupleElements(IValue&& e1, IValue&& e2, IValue&& e3)
: inlineSize_(3) {
new (&elementsInline_[0]) IValue(std::move(e1));
new (&elementsInline_[1]) IValue(std::move(e2));
new (&elementsInline_[2]) IValue(std::move(e3));
}

~TupleElements() {
if (inlineSize_) {
destroyInline();
} else {
elementsVector_.~vector();
}
}

// Simply not implemented; no particular reason not to implement in
// the future except that it seems unnecessary.
TupleElements(const TupleElements&) = delete;
TupleElements& operator=(const TupleElements&) = delete;

TupleElements(TupleElements&& rhs) noexcept
: inlineSize_(rhs.inlineSize_) {
if (inlineSize_) {
for (const auto ii : c10::irange(inlineSize_)) {
new (&elementsInline_[ii]) IValue(std::move(rhs.elementsInline_[ii]));
}
} else {
new (&elementsVector_) std::vector<IValue>(std::move(rhs.elementsVector_));
}
}

TupleElements& operator=(TupleElements&& rhs) noexcept {
if (inlineSize_) {
if (rhs.inlineSize_) {
for (const auto ii : c10::irange(std::min(inlineSize_, rhs.inlineSize_))) {
elementsInline_[ii] = std::move(rhs.elementsInline_[ii]);
}
if (rhs.inlineSize_ > inlineSize_) {
for (const auto ii : c10::irange(inlineSize_, rhs.inlineSize_)) {
new (&elementsInline_[ii]) IValue(std::move(rhs.elementsInline_[ii]));
}
} else {
// Shrinking case: destroy the leftover inline elements of *this* so they
// are not leaked once inlineSize_ is reduced to rhs.inlineSize_ below.
for (const auto ii : c10::irange(rhs.inlineSize_, inlineSize_)) {
elementsInline_[ii].~IValue();
}
}
} else {
destroyInline();
new (&elementsVector_) std::vector<IValue>(std::move(rhs.elementsVector_));
}
} else {
if (rhs.inlineSize_) {
elementsVector_.~vector();
for (const auto ii : c10::irange(rhs.inlineSize_)) {
new (&elementsInline_[ii]) IValue(std::move(rhs.elementsInline_[ii]));
}
} else {
elementsVector_ = std::move(rhs.elementsVector_);
}
}
inlineSize_ = rhs.inlineSize_;
return *this;
}

c10::ArrayRef<IValue> asArrayRef() const {
if (inlineSize_) {
return c10::ArrayRef<IValue>(elementsInline_, inlineSize_);
} else {
return elementsVector_;
}
}

void setContents(std::vector<IValue>&& contents) {
if (inlineSize_) {
for (const auto ii : c10::irange(inlineSize_)) {
elementsInline_[ii].~IValue();
}
new (&elementsVector_) std::vector<IValue>(std::move(contents));
// The vector is now the active union member; without resetting inlineSize_,
// size() and element access would keep using the stale inline slots.
inlineSize_ = 0;
} else {
elementsVector_ = std::move(contents);
}
}

size_t size() const {
return inlineSize_ ? inlineSize_ : elementsVector_.size();
}

IValue& operator[](size_t idx) {
if (inlineSize_) {
return elementsInline_[idx];
} else {
return elementsVector_[idx];
}
}

const IValue& operator[](size_t idx) const {
if (inlineSize_) {
return elementsInline_[idx];
} else {
return elementsVector_[idx];
}
}

IValue& at(size_t idx) {
if (inlineSize_) {
TORCH_INTERNAL_ASSERT_DEBUG_ONLY(inlineSize_ <= 3);
TORCH_CHECK(idx < inlineSize_, "TupleElements: invalid index Index = ", idx, "; Length = ", inlineSize_);
return elementsInline_[idx];
} else {
return elementsVector_.at(idx);
}
}

const IValue& at(size_t idx) const {
if (inlineSize_) {
TORCH_INTERNAL_ASSERT_DEBUG_ONLY(inlineSize_ <= 3);
TORCH_CHECK(idx < inlineSize_, "TupleElements: invalid index Index = ", idx, "; Length = ", inlineSize_);
return elementsInline_[idx];
} else {
return elementsVector_.at(idx);
}
}

iterator begin() {
if (inlineSize_) {
return elementsInline_;
} else {
return elementsVector_.empty() ? nullptr : &elementsVector_[0];
}
}

iterator end() {
if (inlineSize_) {
return elementsInline_ + inlineSize_;
} else {
// Form the one-past-the-end pointer with pointer arithmetic; indexing the
// vector at size() would be undefined behavior.
return elementsVector_.empty() ? nullptr : &elementsVector_[0] + elementsVector_.size();
}
}

const_iterator begin() const {
if (inlineSize_) {
return elementsInline_;
} else {
return elementsVector_.empty() ? nullptr : &elementsVector_[0];
}
}

const_iterator end() const {
if (inlineSize_) {
return elementsInline_ + inlineSize_;
} else {
// See the note in the non-const end() above.
return elementsVector_.empty() ? nullptr : &elementsVector_[0] + elementsVector_.size();
}
}

const_iterator cbegin() const {
return begin();
}

const_iterator cend() const {
return end();
}

std::vector<IValue> vec() const & {
return asArrayRef().vec();
}

IValue& back() {
return *(end() - 1);
}

const IValue& back() const {
return *(end() - 1);
}

std::vector<IValue> vec() && {
std::vector<IValue> result;
result.reserve(size());
for (auto&& iv : *this) {
result.push_back(std::move(iv));
}
return result;
}
};

struct TORCH_API Tuple : c10::intrusive_ptr_target {
private:
- std::vector<IValue> elements_;
+ TupleElements elements_;
Review comment (Contributor):
Why do we need that new TupleElements class instead of just using a c10::SmallVector<IValue, 3>?
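For reference, the alternative mentioned here would look roughly like the following sketch (c10::SmallVector<T, N> is the existing small-size-optimized vector in c10/util/SmallVector.h, which keeps up to N elements inline and spills to the heap beyond that; whether it fits Tuple's constraints is exactly the open question):

#include <c10/util/SmallVector.h>

// Hypothetical alternative member: size/capacity bookkeeping and the
// inline-vs-heap switch would be handled by SmallVector rather than by a
// hand-rolled union.
c10::SmallVector<IValue, 3> elements_;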

mutable std::shared_ptr<TupleType>
type_; // lazily computed for unnamed tuples

@@ -263,28 +482,55 @@ struct TORCH_API Tuple : c10::intrusive_ptr_target {
std::shared_ptr<TupleType> type_) {
return c10::make_intrusive<Tuple>(std::move(elements_), type_);
}

static c10::intrusive_ptr<Tuple> create(std::vector<IValue> elements_) {
return c10::make_intrusive<Tuple>(std::move(elements_));
}

static c10::intrusive_ptr<Tuple> create(TupleElements elements_) {
return c10::make_intrusive<Tuple>(std::move(elements_));
}

static c10::intrusive_ptr<Tuple> create(IValue e1) {
return c10::make_intrusive<Tuple>(std::move(e1));
}

static c10::intrusive_ptr<Tuple> create(IValue e1, IValue e2) {
return c10::make_intrusive<Tuple>(std::move(e1), std::move(e2));
}

static c10::intrusive_ptr<Tuple> create(IValue e1, IValue e2, IValue e3) {
return c10::make_intrusive<Tuple>(std::move(e1), std::move(e2), std::move(e3));
}

template <typename... Args>
static c10::intrusive_ptr<Tuple> create(Args&&... elements_) {
return c10::make_intrusive<Tuple>(
std::vector<IValue>{IValue(std::forward<Args>(elements_))...});
}

Tuple(const Tuple& rhs) = delete;

c10::ArrayRef<IValue> elements() const& {
- return elements_;
+ return elements_.asArrayRef();
}

- std::vector<IValue> elements() && {
+ TupleElements elements() && {
return std::move(elements_);
}

void setElements(std::vector<IValue>&& elements) {
elements_.setContents(std::move(elements));
}

void setElements(TupleElements&& elements) {
elements_ = std::move(elements);
}

size_t size() const {
return elements_.size();
}

std::shared_ptr<TupleType> type() const;

static size_t hash(const Tuple& t) {
Expand All @@ -297,7 +543,19 @@ struct TORCH_API Tuple : c10::intrusive_ptr_target {

private:
Tuple(std::vector<IValue> elements, std::shared_ptr<TupleType> type = nullptr)
: elements_(std::move(elements)), type_(std::move(type)) {}

Tuple(TupleElements&& elements, std::shared_ptr<TupleType> type = nullptr)
: elements_(std::move(elements)), type_(std::move(type)) {}

explicit Tuple(IValue&& e1, std::shared_ptr<TupleType> type = nullptr)
: elements_(std::move(e1)), type_(std::move(type)) {}

explicit Tuple(IValue&& e1, IValue&& e2, std::shared_ptr<TupleType> type = nullptr)
: elements_(std::move(e1), std::move(e2)), type_(std::move(type)) {}

explicit Tuple(IValue&& e1, IValue&& e2, IValue&& e3, std::shared_ptr<TupleType> type = nullptr)
: elements_(std::move(e1), std::move(e2), std::move(e3)), type_(std::move(type)) {}

friend class c10::intrusive_ptr<Tuple>;
};
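To make the new container concrete, here is a brief usage sketch (hypothetical, assuming an ATen build and that TupleElements lives alongside Tuple in c10::ivalue as in this diff):

#include <ATen/core/ivalue.h>
#include <vector>

using c10::IValue;
using c10::ivalue::TupleElements;

void tupleElementsDemo() {
  // Three or fewer elements: stored inline, no separate heap allocation.
  TupleElements small(IValue(1), IValue(2.5), IValue(true));
  // Read access goes through a uniform ArrayRef view.
  c10::ArrayRef<IValue> view = small.asArrayRef();
  TORCH_INTERNAL_ASSERT(view.size() == 3);

  // More than three elements: falls back to the vector representation.
  std::vector<IValue> many{IValue(1), IValue(2), IValue(3), IValue(4)};
  TupleElements big(std::move(many));
  TORCH_INTERNAL_ASSERT(big.size() == 4);

  // vec() materializes a std::vector copy regardless of representation,
  // which is what call sites like python_remote_call.cpp below rely on.
  std::vector<IValue> copy = big.vec();
  TORCH_INTERNAL_ASSERT(copy.size() == 4);
}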
@@ -55,28 +55,25 @@ std::unique_ptr<PropagateGradientsReq> PropagateGradientsReq::fromMessage(
payload_size,
*rpc::RpcAgent::getCurrentRpcAgent()->getTypeResolver(),
message.tensors());
- std::vector<at::IValue> tupleElements = std::move(*std::move(tuple).toTuple()).elements();
+ auto tupleElements = std::move(*std::move(tuple).toTuple()).elements();

// Build PropagateGradientsReq.
TORCH_INTERNAL_ASSERT(tupleElements.size() >= 3);

// Retrieve retainGraph.
bool retainGraph = tupleElements.back().toBool();
- tupleElements.pop_back();

// Build AutogradMetadata.
// NOLINTNEXTLINE(cppcoreguidelines-init-variables)
int64_t autogradContextId, autogradMessageId;
autogradMessageId = tupleElements.back().toInt();
- tupleElements.pop_back();
autogradContextId = tupleElements.back().toInt();
- tupleElements.pop_back();
Review comment (Contributor):

Why are you removing these pop_back() calls? Unless I'm missing something obvious, without them autogradMessageId and autogradContextId will be the same. This can't possibly be correct. Didn't any test catch it? CC @pritamdamania87

Review comment (Contributor):

Yes, this doesn't seem right.


AutogradMetadata autogradMetadata(autogradContextId, autogradMessageId);

// Retrieve the gradient tensors.
- std::vector<Variable> grads(tupleElements.size());
- for(const auto i : c10::irange(tupleElements.size())) {
+ std::vector<Variable> grads(tupleElements.size() - 3);
+ for(const auto i : c10::irange(tupleElements.size() - 3)) {
grads[i] = std::move(tupleElements[i]).toTensor();
}
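On the review question above about the removed pop_back() calls: as written, retainGraph, autogradMessageId, and autogradContextId are all read from the same last element. One possible shape of a fix is to index from the end instead of popping (a sketch only, not necessarily what the PR settled on; the payload layout [grad_0 ... grad_{n-1}, autogradContextId, autogradMessageId, retainGraph] is taken from the original code above):

// Hypothetical indexed version, with n == tupleElements.size():
const auto n = tupleElements.size();
bool retainGraph = tupleElements[n - 1].toBool();
int64_t autogradMessageId = tupleElements[n - 2].toInt();
int64_t autogradContextId = tupleElements[n - 3].toInt();
std::vector<Variable> grads(n - 3);
for (const auto i : c10::irange(n - 3)) {
  grads[i] = std::move(tupleElements[i]).toTensor();
}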

@@ -46,7 +46,7 @@ std::unique_ptr<RRefBackwardReq> RRefBackwardReq::fromMessage(
payload_size,
*rpc::RpcAgent::getCurrentRpcAgent()->getTypeResolver(),
message.tensors());
- std::vector<at::IValue> tupleElements = std::move(*std::move(tuple).toTuple()).elements();
+ auto tupleElements = std::move(*std::move(tuple).toTuple()).elements();

// Build RRefBackwardReq.
TORCH_INTERNAL_ASSERT(tupleElements.size() == 3);
2 changes: 1 addition & 1 deletion torch/csrc/distributed/rpc/python_remote_call.cpp
@@ -42,7 +42,7 @@ std::unique_ptr<PythonRemoteCall> PythonRemoteCall::fromMessage(
payload_size,
*RpcAgent::getCurrentRpcAgent()->getTypeResolver(),
message.tensors());
- auto values = std::move(*std::move(value).toTuple()).elements();
+ auto values = std::move(*std::move(value).toTuple()).elements().vec();

// remove the last elements from values and convert it back to an RRef
TORCH_INTERNAL_ASSERT(