Skip to content

[PyTorch] Use real if constexpr behind macro in hot template #51368

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

swolchok
Copy link
Contributor

@swolchok swolchok commented Jan 29, 2021

Stack from ghstack:

This seems to noticably reduce build times, at least for
RegisterCPU.cpp. It makes sense that a compiler builtin would be
faster than simulating the same builtin with templates.

Identified with templight.

Differential Revision: D26154964

This seems to noticably reduce build times, at least for
RegisterCPU.cpp. It makes sense that a compiler builtin would be
faster than simulating the same builtin with templates.

Identified with templight.

Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Jan 29, 2021

💊 CI failures summary and remediations

As of commit f309519 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

Extra GitHub checks: 1 failed


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

This seems to noticably reduce build times, at least for
RegisterCPU.cpp. It makes sense that a compiler builtin would be
faster than simulating the same builtin with templates.

Identified with templight.

Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Jan 29, 2021
Pull Request resolved: #51368

This seems to noticably reduce build times, at least for
RegisterCPU.cpp. It makes sense that a compiler builtin would be
faster than simulating the same builtin with templates.

Identified with templight.
ghstack-source-id: 120697490

Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)
This seems to noticably reduce build times, at least for
RegisterCPU.cpp. It makes sense that a compiler builtin would be
faster than simulating the same builtin with templates.

Identified with templight.

Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Feb 1, 2021
Pull Request resolved: #51368

This seems to noticably reduce build times, at least for
RegisterCPU.cpp. It makes sense that a compiler builtin would be
faster than simulating the same builtin with templates.

Identified with templight.
ghstack-source-id: 120761381

Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)
This seems to noticably reduce build times, at least for
RegisterCPU.cpp. It makes sense that a compiler builtin would be
faster than simulating the same builtin with templates.

Identified with templight.

Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Feb 8, 2021
Pull Request resolved: #51368

This seems to noticably reduce build times, at least for
RegisterCPU.cpp. It makes sense that a compiler builtin would be
faster than simulating the same builtin with templates.

Identified with templight.
ghstack-source-id: 121256867

Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)
This seems to noticably reduce build times, at least for
RegisterCPU.cpp. It makes sense that a compiler builtin would be
faster than simulating the same builtin with templates.

Identified with templight.

Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Feb 9, 2021
Pull Request resolved: #51368

This seems to noticably reduce build times, at least for
RegisterCPU.cpp. It makes sense that a compiler builtin would be
faster than simulating the same builtin with templates.

Identified with templight.
ghstack-source-id: 121300917

Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)
This seems to noticably reduce build times, at least for
RegisterCPU.cpp. It makes sense that a compiler builtin would be
faster than simulating the same builtin with templates.

Identified with templight.

Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Feb 9, 2021
Pull Request resolved: #51368

This seems to noticably reduce build times, at least for
RegisterCPU.cpp. It makes sense that a compiler builtin would be
faster than simulating the same builtin with templates.

Identified with templight.
ghstack-source-id: 121363469

Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)
…hind macro in hot template"

This seems to noticably reduce build times, at least for
RegisterCPU.cpp. It makes sense that a compiler builtin would be
faster than simulating the same builtin with templates.

Identified with templight.

Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Feb 9, 2021
Pull Request resolved: #51368

This seems to noticably reduce build times, at least for
RegisterCPU.cpp. It makes sense that a compiler builtin would be
faster than simulating the same builtin with templates.

Identified with templight.
ghstack-source-id: 121378959

Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)
@bhosmer
Copy link

bhosmer commented Feb 14, 2021

Do we have any data on the perf benefit from this? I can believe the lambdas in the polyfill API add overhead (or inline disruption risk, etc.) over the native feature, but given that we'll want to use this kind of surface-level #ifdef sparingly for readability's sake, it'd be good to establish a ballpark quantitative threshold for when it's justified.

Copy link

@bhosmer bhosmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nvm my comment, saw the numbers in D26154964. To answer my own question, I'd say ~8% reduction in build time makes the surface-level splintering tolerable.

(BTW, wasn't marked as a reviewer on GH and missed it on Phab, hence the delay, apologies)

@swolchok
Copy link
Contributor Author

@bhosmer note that IIUC, Ubuntu 16.04 LTS is the blocker for raising our minimum C++ version to C++17. It hits end of life in April, so we should be able to move to C++17 and use if constexpr directly in May.

@bhosmer
Copy link

bhosmer commented Feb 16, 2021

@bhosmer note that IIUC, Ubuntu 16.04 LTS is the blocker for raising our minimum C++ version to C++17. It hits end of life in April, so we should be able to move to C++17 and use if constexpr directly in May.

Yup! @smessmer actually has the move to C++17 on his H1 roadmap, so you're right, it'll clean up pretty quick.

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 6e1a5b1.

@facebook-github-bot facebook-github-bot deleted the gh/swolchok/105/head branch February 21, 2021 15:16
xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021
…#51368)

Summary:
Pull Request resolved: pytorch#51368

This seems to noticably reduce build times, at least for
RegisterCPU.cpp. It makes sense that a compiler builtin would be
faster than simulating the same builtin with templates.

Identified with templight.
ghstack-source-id: 121378959

Test Plan:
Confirmed this speeds up RegisterCPU.cpp optimized build by
simply running builds under `time(1)`:

previous diff: [50.53, 50.41, 50.57, 50.67, 50.94]
mean: 50.6 std: 0.179

this diff: [45.71, 45.89, 46.21, 48.51, 45.84]
mean: 46.4 std: 1.05

Reviewed By: bhosmer

Differential Revision: D26154964

fbshipit-source-id: 62ee2f5a872007db032dfebf7ad4d1b6e1ce63d1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants