[PyTorch] Stack-allocate boxed args for RecordFunction #76266

Closed
wants to merge 12 commits into from

Conversation

swolchok
Contributor

@swolchok swolchok commented Apr 22, 2022

Stack from ghstack (oldest at bottom):

Saving a heap allocation in this path improves performance.

Differential Revision: D34090699

@facebook-github-bot
Contributor

facebook-github-bot commented Apr 22, 2022

❌ 2 New Failures

As of commit 48895d1 (more details on the Dr. CI page):

  • 2/2 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build trunk / linux-bionic-cuda10.2-py3.9-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu) (1/2)

Step: "Test"

2022-05-20T17:48:12.2313759Z RuntimeError: test_sparse_csr failed!
2022-05-20T17:48:11.5246675Z 
2022-05-20T17:48:11.5246809Z Generating XML reports...
2022-05-20T17:48:11.6841472Z Generated XML report: test-reports/python-unittest/test_sparse_csr/TEST-TestSparseCSRCUDA-20220520174739.xml
2022-05-20T17:48:11.6843227Z Generated XML report: test-reports/python-unittest/test_sparse_csr/TEST-TestSparseCSRSampler-20220520174739.xml
2022-05-20T17:48:11.7354585Z Generated XML report: test-reports/python-unittest/test_sparse_csr/TEST-TestSparseCompressedCUDA-20220520174739.xml
2022-05-20T17:48:12.2308554Z Traceback (most recent call last):
2022-05-20T17:48:12.2311054Z   File "/var/lib/jenkins/workspace/test/run_test.py", line 1074, in <module>
2022-05-20T17:48:12.2311819Z     main()
2022-05-20T17:48:12.2312281Z   File "/var/lib/jenkins/workspace/test/run_test.py", line 1052, in main
2022-05-20T17:48:12.2312995Z     raise RuntimeError(err_message)
2022-05-20T17:48:12.2313759Z RuntimeError: test_sparse_csr failed!
2022-05-20T17:48:12.7866072Z 
2022-05-20T17:48:12.7866712Z real	6m31.042s
2022-05-20T17:48:12.7867040Z user	9m50.392s
2022-05-20T17:48:12.7867309Z sys	1m17.978s
2022-05-20T17:48:12.7867742Z + cleanup
2022-05-20T17:48:12.7868112Z + retcode=1
2022-05-20T17:48:12.7868488Z + set +x
2022-05-20T17:48:12.7918914Z ##[error]Process completed with exit code 1.
2022-05-20T17:48:12.7967158Z ##[group]Run pytorch/pytorch/.github/actions/get-workflow-job-id@master
2022-05-20T17:48:12.7967514Z with:

See GitHub Actions build pull / linux-bionic-rocm5.1-py3.7 / test (default, 1, 2, linux.rocm.gpu) (2/2)

Step: "Test"

2022-05-20T18:40:52.8636762Z RuntimeError: test_sparse_csr failed!
2022-05-20T18:40:49.3638896Z 
2022-05-20T18:40:49.3639226Z Generating XML reports...
2022-05-20T18:40:49.6296118Z Generated XML report: test-reports/python-unittest/test_sparse_csr/TEST-TestSparseCSRCUDA-20220520184015.xml
2022-05-20T18:40:49.6320853Z Generated XML report: test-reports/python-unittest/test_sparse_csr/TEST-TestSparseCSRSampler-20220520184015.xml
2022-05-20T18:40:49.6881829Z Generated XML report: test-reports/python-unittest/test_sparse_csr/TEST-TestSparseCompressedCUDA-20220520184015.xml
2022-05-20T18:40:52.8617632Z Traceback (most recent call last):
2022-05-20T18:40:52.8618978Z   File "test/run_test.py", line 1074, in <module>
2022-05-20T18:40:52.8628622Z     main()
2022-05-20T18:40:52.8629753Z   File "test/run_test.py", line 1052, in main
2022-05-20T18:40:52.8635423Z     raise RuntimeError(err_message)
2022-05-20T18:40:52.8636762Z RuntimeError: test_sparse_csr failed!
2022-05-20T18:40:54.8404957Z 
2022-05-20T18:40:54.8405731Z real	6m34.281s
2022-05-20T18:40:54.8406521Z user	22m34.657s
2022-05-20T18:40:54.8407229Z sys	3m6.947s
2022-05-20T18:40:54.8408388Z + cleanup
2022-05-20T18:40:54.8409083Z + retcode=1
2022-05-20T18:40:54.8409736Z + set +x
2022-05-20T18:40:54.8541425Z ##[error]Process completed with exit code 1.
2022-05-20T18:40:54.8629772Z ##[group]Run # copy test results back to the mounted workspace, needed sudo, resulting permissions were correct

This comment was automatically generated by Dr. CI (expand for details).

swolchok added a commit that referenced this pull request Apr 22, 2022
Saving a heap allocation in this path improves performance.

Differential Revision: [D34090699](https://our.internmc.facebook.com/intern/diff/D34090699/)

ghstack-source-id: 154626794
Pull Request resolved: #76266
@swolchok
Contributor Author

An interesting follow-up would be to make it possible to avoid heap-allocating a Stack for calls to boxed KernelFunctions made via make_boxed_from_unboxed_functor. I have no idea how common this is, though.

template <typename T>
C10_DISPATCHER_INLINE_UNLESS_MOBILE void box(IValueStorage* dest, T& arg, int& lastIdx) {
  new (&dest[lastIdx]) IValue(arg);
  lastIdx++;

Contributor

This needs to be kept consistent with our other boxing logic, is that right?

Contributor

I'd suggest a cross-ref with aten/src/ATen/core/boxing/impl/boxing.h (or perhaps moving it to this file entirely)

using IValueStorage = std::aligned_storage_t<sizeof(IValue), alignof(IValue)>;

template <typename T>
C10_DISPATCHER_INLINE_UNLESS_MOBILE void box(IValueStorage* dest, T& arg, int& lastIdx) {
Contributor

The name here is a bit ambiguous; what we're doing is placement new boxing; would be nice if the name had this so people don't try to use this invalidly.

Contributor Author

it's in detail and takes a weird typedef; they'll have trouble using it without writing obvious "here be dragons" stuff like aligned_storage
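
For readers unfamiliar with the pattern under discussion, here is a minimal, self-contained sketch of placement-new boxing into stack-allocated aligned storage. It is not the code from this PR: `Value` is a hypothetical stand-in for `IValue`, and `boxInto`, `boxAll`, `slot`, `destroyAll`, and `demo` are illustrative names that do not exist in PyTorch. The sketch assumes C++17. It mainly shows why the helper is easy to misuse: the storage holds no live objects until placement new runs, and the caller has to destroy each constructed slot by hand.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <type_traits>

// Hypothetical stand-in for IValue: a type with a non-trivial destructor.
struct Value {
  std::string repr;
  explicit Value(int v) : repr(std::to_string(v)) {}
  explicit Value(const std::string& s) : repr(s) {}
};

// Raw, suitably aligned bytes for one Value. No Value object lives here
// until placement new constructs one.
using ValueStorage = std::aligned_storage_t<sizeof(Value), alignof(Value)>;

// Construct a Value from arg in the next free slot and advance the index.
template <typename T>
void boxInto(ValueStorage* dest, const T& arg, int& lastIdx) {
  new (&dest[lastIdx]) Value(arg);
  lastIdx++;
}

// Box every argument in order; returns the number of constructed slots.
// The caller must provide at least sizeof...(Args) slots.
template <typename... Args>
int boxAll(ValueStorage* dest, const Args&... args) {
  int lastIdx = 0;
  (boxInto(dest, args, lastIdx), ...);
  return lastIdx;
}

// View an already-constructed slot as a Value.
inline Value& slot(ValueStorage* storage, int i) {
  return *std::launder(reinterpret_cast<Value*>(&storage[i]));
}

// The storage has no destructor of its own, so each constructed Value
// must be destroyed explicitly.
inline void destroyAll(ValueStorage* storage, int count) {
  for (int i = 0; i < count; ++i) {
    slot(storage, i).~Value();
  }
}

// Box two arguments on the stack, read them back, then clean up.
inline std::string demo() {
  ValueStorage storage[2];  // lives on the stack; no heap-allocated array
  int n = boxAll(storage, 42, std::string("hello"));
  std::string out = slot(storage, 0).repr + "," + slot(storage, 1).repr;
  destroyAll(storage, n);
  return out;
}
```

In this sketch, forgetting `destroyAll` leaks the strings owned by each `Value`, and reading a slot before it is constructed is undefined behavior, which is the kind of misuse the naming comment above is worried about.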

Contributor

@ezyang ezyang left a comment

mechanically looks all fine, just some naming / file placement stuff

swolchok added a commit that referenced this pull request Apr 25, 2022
Pull Request resolved: #76266

Saving a heap allocation in this path improves performance.
ghstack-source-id: 154745500

Differential Revision: [D34090699](https://our.internmc.facebook.com/intern/diff/D34090699/)
swolchok added a commit that referenced this pull request Apr 26, 2022
Pull Request resolved: #76266

Saving a heap allocation in this path improves performance.
ghstack-source-id: 154777631

Differential Revision: [D34090699](https://our.internmc.facebook.com/intern/diff/D34090699/)
swolchok added a commit that referenced this pull request Apr 26, 2022
Pull Request resolved: #76266

Saving a heap allocation in this path improves performance.
ghstack-source-id: 154795837

Differential Revision: [D34090699](https://our.internmc.facebook.com/intern/diff/D34090699/)
swolchok added a commit that referenced this pull request May 2, 2022
Pull Request resolved: #76266

Saving a heap allocation in this path improves performance.
ghstack-source-id: 155249947

Differential Revision: [D34090699](https://our.internmc.facebook.com/intern/diff/D34090699/)
swolchok added a commit that referenced this pull request May 4, 2022
Pull Request resolved: #76266

Saving a heap allocation in this path improves performance.
ghstack-source-id: 155492055

Differential Revision: [D34090699](https://our.internmc.facebook.com/intern/diff/D34090699/)
swolchok added a commit that referenced this pull request May 9, 2022
Pull Request resolved: #76266

Saving a heap allocation in this path improves performance.
ghstack-source-id: 155856030

Differential Revision: [D34090699](https://our.internmc.facebook.com/intern/diff/D34090699/)
swolchok added a commit that referenced this pull request May 9, 2022
Pull Request resolved: #76266

Saving a heap allocation in this path improves performance.
ghstack-source-id: 155873881

Differential Revision: [D34090699](https://our.internmc.facebook.com/intern/diff/D34090699/)
swolchok added a commit that referenced this pull request May 12, 2022
Pull Request resolved: #76266

Saving a heap allocation in this path improves performance.
ghstack-source-id: 156231743

Differential Revision: [D34090699](https://our.internmc.facebook.com/intern/diff/D34090699/)
swolchok added a commit that referenced this pull request May 18, 2022
Pull Request resolved: #76266

Saving a heap allocation in this path improves performance.
ghstack-source-id: 156690927

Differential Revision: [D34090699](https://our.internmc.facebook.com/intern/diff/D34090699/)
swolchok added a commit that referenced this pull request May 19, 2022
Pull Request resolved: #76266

Saving a heap allocation in this path improves performance.
ghstack-source-id: 156845963

Differential Revision: [D34090699](https://our.internmc.facebook.com/intern/diff/D34090699/)
swolchok added a commit that referenced this pull request May 20, 2022
Pull Request resolved: #76266

Saving a heap allocation in this path improves performance.
ghstack-source-id: 156914882

Differential Revision: [D34090699](https://our.internmc.facebook.com/intern/diff/D34090699/)
@swolchok
Contributor Author

@pytorchbot merge

@github-actions
Contributor

Hey @swolchok.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

facebook-github-bot pushed a commit that referenced this pull request May 25, 2022
Summary:
Pull Request resolved: #76266

Saving a heap allocation in this path improves performance.

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/80c4919bec29a8234e7289c058b905352d12d18e

Reviewed By: aaronenyeshi

Differential Revision: D34090699

Pulled By: swolchok

fbshipit-source-id: 4a6a6623237e89de58e7bc350f5703726a09515e
@facebook-github-bot facebook-github-bot deleted the gh/swolchok/501/head branch May 28, 2022 14:17
4 participants