-
Notifications
You must be signed in to change notification settings - Fork 24.7k
[PyTorch] Use real if constexpr behind macro in hot template #51368
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/) [ghstack-poisoned]
💊 CI failures summary and remediationsAs of commit f309519 (more details on the Dr. CI page):
Extra GitHub checks: 1 failed
This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions to the (internal) Dr. CI Users group. |
This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/) [ghstack-poisoned]
Pull Request resolved: #51368 This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. ghstack-source-id: 120697490 Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)
This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/) [ghstack-poisoned]
Pull Request resolved: #51368 This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. ghstack-source-id: 120761381 Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)
This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/) [ghstack-poisoned]
Pull Request resolved: #51368 This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. ghstack-source-id: 121256867 Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)
This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/) [ghstack-poisoned]
Pull Request resolved: #51368 This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. ghstack-source-id: 121300917 Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)
This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/) [ghstack-poisoned]
Pull Request resolved: #51368 This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. ghstack-source-id: 121363469 Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)
…hind macro in hot template" This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/) [ghstack-poisoned]
Pull Request resolved: #51368 This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. ghstack-source-id: 121378959 Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)
Do we have any data on the perf benefit from this? I can believe the lambdas in the polyfill API add overhead (or inline disruption risk, etc.) over the native feature, but given that we'll want to use this kind of surface-level #ifdef sparingly for readability's sake, it'd be good to establish a ballpark quantitative threshold for when it's justified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah nvm my comment, saw the numbers in D26154964. To answer my own question, I'd say ~8% reduction in build time makes the surface-level splintering tolerable.
(BTW, wasn't marked as a reviewer on GH and missed it on Phab, hence the delay, apologies)
@bhosmer note that IIUC, Ubuntu 16.04 LTS is the blocker for raising our minimum C++ version to C++17. It hits end of life in April, so we should be able to move to C++17 and use |
Yup! @smessmer actually has the move to C++17 on his H1 roadmap, so you're right, it'll clean up pretty quick. |
This pull request has been merged in 6e1a5b1. |
…#51368) Summary: Pull Request resolved: pytorch#51368 This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. ghstack-source-id: 121378959 Test Plan: Confirmed this speeds up RegisterCPU.cpp optimized build by simply running builds under `time(1)`: previous diff: [50.53, 50.41, 50.57, 50.67, 50.94] mean: 50.6 std: 0.179 this diff: [45.71, 45.89, 46.21, 48.51, 45.84] mean: 46.4 std: 1.05 Reviewed By: bhosmer Differential Revision: D26154964 fbshipit-source-id: 62ee2f5a872007db032dfebf7ad4d1b6e1ce63d1
Stack from ghstack:
This seems to noticably reduce build times, at least for
RegisterCPU.cpp. It makes sense that a compiler builtin would be
faster than simulating the same builtin with templates.
Identified with templight.
Differential Revision: D26154964