[Pytorch] Specialize guts of c10::optional for 32-bit scalars (#47015)
Summary:
Pull Request resolved: #47015
c10::optional always has non-trivial copy and move operations. This change specializes it for 32-bit scalars so that its copy and move operations are trivial in that case. Ideally, we would instead rely on P0602 "variant and optional should propagate copy/move triviality" and use `std::optional` (or implement that functionality ourselves). We can't use `std::optional` because we are stuck with C++14, and implementing all of P0602 ourselves would add even more complexity. We could do it, but this should be a helpful first step.
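To illustrate the idea, here is a minimal C++14 sketch of selecting a trivially copyable storage type for 32-bit scalars. It is not the c10::optional source: `GeneralStorage`, `TrivialStorage`, and `StorageFor` are hypothetical names, and the selection condition (`std::is_scalar` plus `sizeof(T) == 4`) is an assumption based on the wording of the summary.

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <type_traits>

// Hypothetical general-purpose storage. The user-provided copy constructor
// and destructor make it non-trivially copyable for every T, even for int.
template <typename T>
struct GeneralStorage {
  bool has_value_ = false;
  union {
    char dummy_;
    T value_;
  };

  GeneralStorage() : dummy_() {}
  explicit GeneralStorage(const T& v) : has_value_(true), value_(v) {}
  GeneralStorage(const GeneralStorage& other)
      : has_value_(other.has_value_), dummy_() {
    if (other.has_value_) {
      new (&value_) T(other.value_);  // copy-construct the payload in place
    }
  }
  ~GeneralStorage() {
    if (has_value_) {
      value_.~T();
    }
  }
};

// Hypothetical storage for small scalars. It declares no copy/move
// operations and no destructor, so the compiler-generated ones are trivial.
template <typename T>
struct TrivialStorage {
  static_assert(std::is_scalar<T>::value && sizeof(T) == 4,
                "only meant for 32-bit scalars");
  bool has_value_ = false;
  T value_ = T();

  TrivialStorage() = default;
  explicit TrivialStorage(T v) : has_value_(true), value_(v) {}
};

// Assumed selection rule: 32-bit scalars get the trivial storage.
template <typename T>
using StorageFor = typename std::conditional<
    std::is_scalar<T>::value && sizeof(T) == 4,
    TrivialStorage<T>,
    GeneralStorage<T>>::type;

// Compile-time checks of the property the summary describes.
static_assert(!std::is_trivially_copyable<GeneralStorage<std::int32_t>>::value,
              "general storage is never trivially copyable");
static_assert(std::is_trivially_copyable<StorageFor<std::int32_t>>::value,
              "32-bit integer storage is trivially copyable");
static_assert(std::is_trivially_copyable<StorageFor<float>>::value,
              "32-bit float storage is trivially copyable");
static_assert(!std::is_trivially_copyable<StorageFor<double>>::value,
              "64-bit scalars still use the general storage here");
```

A trivially copyable type can be copied as plain bytes, which is what lets the compiler avoid calling out-of-line copy and move constructors for the specialized case.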
ghstack-source-id: 115886743
Test Plan:
Collect Callgrind instruction counts for `torch.empty(())`. Data:
Make empty c10-ful (#46092):
```
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7ffaed1128e0>
torch.empty(())
                           All          Noisy symbols removed
    Instructions:       648005                     632899
    Baseline:             4144                       3736
100 runs per measurement, 1 thread
```
This diff atop #46092:
```
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f943f1dc8e0>
torch.empty(())
                           All          Noisy symbols removed
    Instructions:       602347                     591005
    Baseline:             4106                       3736
100 runs per measurement, 1 thread
```
(6.6% improvement vs #46092)
Pass optionals by const reference (#46598):
```
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f1abb3988e0>
torch.empty(())
                           All          Noisy symbols removed
    Instructions:       601349                     590005
    Baseline:             4162                       3736
100 runs per measurement, 1 thread
```
(6.8% improvement vs #46092)
This diff atop #46598 (i.e., both together):
```
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f9577c22850>
torch.empty(())
                           All          Noisy symbols removed
    Instructions:       596095                     582451
    Baseline:             4162                       3736
100 runs per measurement, 1 thread
Warning: PyTorch was not built with debug symbols.
Source information may be limited. Rebuild with
REL_WITH_DEB_INFO=1 for more detailed results.
```
(another 1.3% savings vs #46598 alone!)
#46598 outperformed this change slightly, and combining the two leads to further benefits. I guess we should do both! (Though I still don't understand why passing optionals that should fit in a register by const reference would help...)
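As a sanity check on the percentages quoted above, the sketch below recomputes them from the second column of each Callgrind measurement ("Noisy symbols removed"). That the percentages were derived from that column is an inference from the numbers, not something the test plan states; `percent_saved` and the constant names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Relative reduction in instruction count, as a percentage of `before`.
constexpr double percent_saved(double before, double after) {
  return 100.0 * (before - after) / before;
}

// "Noisy symbols removed" instruction counts copied from the test plan.
constexpr double kBase46092 = 632899;      // Make empty c10-ful (#46092)
constexpr double kThisDiff = 591005;       // this diff atop #46092
constexpr double kConstRef46598 = 590005;  // pass optionals by const reference
constexpr double kBoth = 582451;           // this diff atop #46598

// percent_saved(kBase46092, kThisDiff)      is about 6.6
// percent_saved(kBase46092, kConstRef46598) is about 6.8
// percent_saved(kConstRef46598, kBoth)      is about 1.3
```

Under this reading, the last figure is measured against #46598 alone rather than against the #46092 baseline.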
Reviewed By: smessmer
Differential Revision: D24552280
fbshipit-source-id: 4d93bfcffafebd8c01559398513fa6b9db959d11