Runner lambdas v20221019-100329

@github-actions github-actions released this 19 Oct 10:04
7a7c4ea
FIX: Don't remove EC2 instance when fails to remove githubRunner (#904)

`removeGithubRunner[Org || Repo]` used to terminate the EC2 instance
itself, so there was no need to call `terminateRunner` again. As a result,
runners that failed to be unregistered from GHA could still be terminated
on EC2.

As a fix, `removeGithubRunner` no longer terminates the instance or
generates logs. This lets `scaleDown` control when to call
`terminateRunner` and generate the proper logs and metrics, which avoids
this issue in the future.
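
The control flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual lambda code: the `Runner` and `Deps` types, the `scaleDownRunner` helper, the boolean return value of `removeGithubRunner`, and the metric names are all assumptions made for the example, and the calls are synchronous only for brevity.

```typescript
// Hypothetical, simplified types; the real lambda code is richer and asynchronous.
interface Runner {
  instanceId: string;
}

interface Deps {
  // Unregisters the runner from GitHub Actions only; returns true on success.
  // It does not touch the EC2 instance and does not log.
  removeGithubRunner: (runner: Runner) => boolean;
  // Terminates the EC2 instance.
  terminateRunner: (runner: Runner) => void;
  // Records an app-level metric (names below are illustrative).
  metric: (name: string) => void;
}

// The scale-down step alone decides whether the instance is terminated.
function scaleDownRunner(runner: Runner, deps: Deps): boolean {
  const unregistered = deps.removeGithubRunner(runner);
  if (!unregistered) {
    // Unregistration failed: keep the instance and record the failure.
    deps.metric("runner.unregister.failure");
    return false;
  }
  // Only a successfully unregistered runner is terminated, and the
  // termination is tracked in the same place it is triggered.
  deps.terminateRunner(runner);
  deps.metric("runner.terminate.success");
  return true;
}
```

Keeping termination and its metric in one place means every terminate call has a matching app-level record.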

This bug also explains why, in the past, more EC2 instances were being
kept at their minimum time: instances with less than the minimum time were
unregistered and terminated without being tracked in the main application
metric. This becomes clear when we compare the API calls to terminate with
the count of app-level terminations.

![Screenshot 2022-10-18 at 09 21 47](https://user-images.githubusercontent.com/4520845/196364535-5aaab331-2080-44be-b6af-0702f99d50d9.png)
![Screenshot 2022-10-18 at 09 26 19](https://user-images.githubusercontent.com/4520845/196364542-376ff99f-617e-4e82-b459-dfc8364219ad.png)

Bug initially flagged in
[#87134](https://github.com/pytorch/pytorch/issues/87134)