
Author webknjaz
Recipients webknjaz
Date 2021-04-19.11:50:46
Message-id <[email protected]>
In-reply-to
Content
I noticed that https://github.com/python/cpython/runs/2378199636 (a coverage job on the last commit on master at the time of writing) takes suspiciously long to complete.

I did some investigation and noticed that this job on the 3.9 branch succeeds (all of the job runs on the first page in the list are green — https://github.com/python/cpython/actions/workflows/coverage.yml?query=branch%3A3.9)

But then I took a look at the runs on master and discovered that the last successful run was 4 months ago — https://github.com/python/cpython/actions.html?query=is%3Asuccess+branch%3Amaster&workflow_file_name=coverage.yml.

The last success is https://github.com/python/cpython/actions/runs/444323166 and after that, starting with https://github.com/python/cpython/actions/runs/444405699, it fails consistently.

Notably, all of the failures are caused by the job timing out after *6 hours*: the GitHub platform just kills those jobs, since 6h is the default per-job timeout in GHA.

It's also important to mention that before every job started timing out, effectively burning 6 hours of GHA time for each merge and producing no useful reports, there were occasional 6h timeouts, but they weren't consistent.

Looking into the successful runs from the past, on master and other jobs, I haven't noticed it taking more than 1h35m to complete with a successful outcome. Taking this as a baseline, I suggest changing the timeout of the whole job, or maybe of just the one step that actually runs coverage.

Action items:
* Set job timeout in GHA to 1h40m (allowing a bit of extra time for exceptionally slow jobs) — this will make sure that the failure/timeout is reported sooner than 6h
* Figure out why this started happening in the first place.
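
For illustration, the first action item could look roughly like the fragment below. This is only a sketch, not the actual contents of coverage.yml: the job id, runner, step name and command are placeholders I made up. The only parts taken from GHA itself are the `timeout-minutes` key, which is accepted at both the job and the step level, and its default of 360 minutes.

```yaml
# .github/workflows/coverage.yml (illustrative fragment, not the real file)
jobs:
  coverage:                      # hypothetical job id
    runs-on: ubuntu-latest       # assumed runner
    timeout-minutes: 100         # 1h40m; without this key GHA defaults to 360 (6h)
    steps:
      - name: Run coverage       # hypothetical step name
        run: ./run-coverage.sh   # placeholder command
        # Alternative: cap only this step instead of the whole job
        # timeout-minutes: 100
```

Capping the whole job is the simpler option; a step-level cap would make it more obvious that the coverage run itself is what hangs.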

I'm going to send a PR addressing the first point but feel free to pick up the investigation part — I don't expect to have time for this anytime soon.

P.S. FTR the last timeout of this type happened two months ago — https://github.com/python/cpython/actions.html?page=4&query=branch%3A3.9&workflow_file_name=coverage.yml.
History
Date                 User      Action  Args
2021-04-19 11:50:46  webknjaz  set     recipients: + webknjaz
2021-04-19 11:50:46  webknjaz  set     messageid: <[email protected]>
2021-04-19 11:50:46  webknjaz  link    issue43888 messages
2021-04-19 11:50:46  webknjaz  create