bpo-47177: Replace f_lasti with prev_instr#32208
Conversation
markshannon
left a comment
There was a problem hiding this comment.
Looks very promising.
I think there is an off by one error, and I have a few suggestions and questions.
Objects/genobject.c
Outdated
| code->co_filename, | ||
| PyCode_Addr2Line(frame->f_code, frame->f_lasti*sizeof(_Py_CODEUNIT)), | ||
| int addr = _PyInterpreterFrame_LASTI(frame) * sizeof(_Py_CODEUNIT); | ||
| int line = PyCode_Addr2Line(frame->f_code, addr); |
There was a problem hiding this comment.
Slightly off-topic, but I wonder why we are going to the expense of computing the line here, rather than just storing the instruction offset?
Python/ceval.c
Outdated
| } while (0) | ||
| #else | ||
| #define INSTRUCTION_START(op) frame->f_lasti = INSTR_OFFSET(); next_instr++ | ||
| #define INSTRUCTION_START(op) (frame->next_instr = ++next_instr) |
There was a problem hiding this comment.
Here's something to consider.
Rather that storing next_instr, we could store last_instr (strictly, next_instr_less_one) as it would break the dependency of the store on the addition.
frame->next_instr_less_one = next_instr; next_instr++
The compiler can then merge the next_instr++ with any subsequent JUMPBY(INLINE_CACHE_ENTRIES_...)
It would also simplify the changes as you wouldn't need to change all the offsets by one.
Thoughts?
There was a problem hiding this comment.
Interesting. I'll try that out in another branch.
There was a problem hiding this comment.
There was a problem hiding this comment.
The performance numbers show no difference between that branch and this one. The other branch does result in a net reduction of "adjust by 1" moves, so I think I slightly prefer it.
Thoughts?
There was a problem hiding this comment.
I prefer the other version. It should be quicker, and feels less intrusive. I'm not surprised that there is no measurable difference in performance, I would expect it to be very slight.
|
When you're done making the requested changes, leave the comment: |
|
My performance test shows the same 2% speedup but with less variation, and no 6% slowdown for |
I was initially suspicious of the broad improvement too, but I was able to reproduce it with two fresh runs: DetailsSo now we're triply sure. |
|
Did you get the version with |
|
It's ready, just slipped off my radar. Would you prefer a separate PR, or just to bring those commits over to this branch? |
|
Whatever is easier. I don't mind |
f_lasti with next_instrf_lasti with prev_instr
|
I brought the commits over here. |
|
🤖 New build scheduled with the buildbot fleet by @markshannon for commit 670b94b 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again. |
|
Looks good. |
|
It seems there are many reference leaks caught by various buildbots. e.g.: https://buildbot.python.org/all/#/builders/766/builds/117 |
|
Looking into it. |
|
Oh wait, it looks like those are failing on other PRs, too. |
|
Feel free to merge once you are confident that all the failures are unrelated. I think they are unrelated, but I've not checked them individually. |
|
The refleaks disappear when I apply the fix in #32403. So not my fault. 🙂 |
|
The coverage project uses the PyFrameObject.f_frame.f_lasti field and so is broken by this PR. I wrote coveragepy/coveragepy#1331 to read the Python attribute, but the coverage maintainer has worries about worse performance. Maybe adding a new PyFrame_GetLastInstr() getter function would help. |
|
The PR adds a |
A nice 2% improvement:
Details
https://bugs.python.org/issue47177