Don't leave a math puzzle for the compiler in BMI decoder

swolchok · facebook-github-bot · commit eebf98682265 · 2024-03-04T23:40:51.000-08:00
Summary: We computed a trailing-zero count equal to `8 * intBytes - 1`, converted it to `intBytes`, and then shifted by `8 * intBytes - 1`, leaving clang to rediscover the original value. Saving the shift value directly causes clang to generate shorter code.
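
A minimal sketch of why the two forms agree, assuming the invariant stated in the code comment (the trailing-zero count of the toggled continuation bits is `8 * intBytes - 1`). The helper names `maskViaIntBytes`, `maskViaShift`, and `masksAgree` are hypothetical and exist only for illustration; they are not part of VarintUtils-inl.h. `__builtin_ctzll` is the GCC/Clang builtin already used in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers; only the arithmetic mirrors the diff.
// Precondition for both: toggled != 0 and its trailing-zero count
// is 8*k - 1 for some k in 1..8.

// Old form: derive intBytes first, then rebuild the shift from it.
inline uint64_t maskViaIntBytes(uint64_t toggled) {
  size_t intBytes = (__builtin_ctzll(toggled) >> 3) + 1;
  return (1ULL << (8 * intBytes - 1)) - 1;
}

// New form: keep the trailing-zero count and shift by it directly.
inline uint64_t maskViaShift(uint64_t toggled) {
  size_t maskShift = __builtin_ctzll(toggled);
  return (1ULL << maskShift) - 1;
}

// Checks that both forms produce the same mask for every k in 1..8.
inline bool masksAgree() {
  for (size_t k = 1; k <= 8; ++k) {
    uint64_t toggled = 1ULL << (8 * k - 1);
    if (maskViaIntBytes(toggled) != maskViaShift(toggled)) {
      return false;
    }
  }
  return true;
}
```

Under that precondition, `(tz >> 3) + 1 == k` and `8 * k - 1 == tz`, so the second function skips a round trip that the first one asks the compiler to simplify away.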

Differential Revision: D54440459

fbshipit-source-id: 5f1380f8b38fd706ed91b903d6b212e9f791f626
diff --git a/thrift/lib/cpp/util/VarintUtils-inl.h b/thrift/lib/cpp/util/VarintUtils-inl.h
@@ -234,10 +234,10 @@ inline size_t readContiguousVarintMediumSlowU64BMI2(
   }
   // By reset data bits and toggle the continuation bits, the tailing zeros
   // should be intBytes*8-1
-  size_t intBytes =
-      (__builtin_ctzll(continuationBits ^ kContinuationBitMask) >> 3) + 1;
+  size_t maskShift = __builtin_ctzll(continuationBits ^ kContinuationBitMask);
+  size_t intBytes = (maskShift >> 3) + 1;
 
-  uint64_t mask = (1ULL << (8 * intBytes - 1)) - 1;
+  uint64_t mask = (1ULL << maskShift) - 1;
   // You might think it would make more sense to to the pext first and mask
   // afterwards (avoiding having two pexts in a single dependency chain at 3
   // cycles / pop); this seems not to be borne out in microbenchmarks. The