Skip to content

Re-run flaky tests#5140

Merged
ethomson merged 2 commits intomasterfrom
ethomson/flaky_ci
Jun 24, 2019
Merged

Re-run flaky tests#5140
ethomson merged 2 commits intomasterfrom
ethomson/flaky_ci

Conversation

@ethomson
Copy link
Member

Re-run the flaky online integration tests. Since these hit actual network endpoints and those may be occasionally down, we want to re-run them up to 5 times. This prevents us from failing the builds just because the badssl endpoint is down, for example.

Copy link
Member

@pks-t pks-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm generally 👍 on this, feels like it is long overdue. I've been wondering about whether this is the right layer to fix this, though. Wouldn't it make more sense to have this in the test code itself? Like that, developers would benefit from it, too, and we'd be able to mark single tests as flaky instead of having to re-do all tests of a particular test suite.

It's fine if your answer to the above is "no", though. :)

$TestCommand += " -r${BuildDir}\results_${TestName}.xml"

Invoke-Expression $TestCommand
if ($LastExitCode -ne 0) { $global:Success = $false }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I wondered about recently: do we really need the Powershell file at all? I saw that Azure does support Bash on Windows via Git for Windows. If that's the case, then we should drop this file completely so that we don't always have to keep both in sync. I'd naturally volunteer to do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, it does. I have no love for PowerShell... I'm happy to do that, too, but I've got a few other things going on before I can look at it.

@ethomson
Copy link
Member Author

I'm generally 👍 on this, feels like it is long overdue. I've been wondering about whether this is the right layer to fix this, though. Wouldn't it make more sense to have this in the test code itself? Like that, developers would benefit from it, too, and we'd be able to mark single tests as flaky instead of having to re-do all tests of a particular test suite.

Right. This was my first thought as well, actually. But then I realized it would require more thought than I wanted to give it. 😀

I think there are two issues here:

  1. We need a way to decorate tests as flaky. This could be as simple as /* flaky */ at the end of a declaration... in fact, that's probably the smart way to do this as far as clar's parsing goes.

(But then I started yakshaving, because I wish clar's parser was smart enough to deal with commented out tests. So then I started thinking more about using llvm or something, and then things got really out of control.)

We could also mark a suite as flaky on the command-line, which might actually be smarter still. eg, -fonline.

  1. All tests would need to get tightened up (or we'd fail them for memory leaks). This is not impossible by any means, but it is an added effort to go through and identify them. But if we do the whole-suite marking on the command line then we'd need to do a bunch en masse.

🤷‍♂

So I decided to knock this out quickly so that I'd be less angry. 😉


if [ "$FAILED" -ne 0 ]; then
SUCCESS=0
fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we were executing failure, don't we have to do that now, too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ach, that ended up dead. I removed it, but I also improved the error message a little, inspired by that function.

@pks-t
Copy link
Member

pks-t commented Jun 24, 2019

Fair enough ;) So let's get this merged quickly -- we can still improve in the future.

@ethomson ethomson force-pushed the ethomson/flaky_ci branch 3 times, most recently from b9f1625 to 1dc004e Compare June 24, 2019 21:27
ethomson added 2 commits June 24, 2019 22:54
Our online tests are occasionally flaky since they hit real network
endpoints.  Re-run them up to 5 times if they fail, to allow us to
avoid having to fail the whole build.
Our online tests are occasionally flaky since they hit real network
endpoints.  Re-run them up to 5 times if they fail, to allow us to
avoid having to fail the whole build.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments