[Meta] Observations during the 8.16.0 release
I've created this as a new list of things that went wrong and can be improved. It is very similar to "Observations during the 8.15.2 release".
Some of the things in https://gitlab.com/gitlab-com/organization/issues/1 still hold true or have worsened in this release:
- "We entered the after Christmas week..." While the release was after Christmas, due to the summit taking place, most of the release work had to be done in EMEA time, and the RM and RM trainee were not in the same TZ during that period.
- We had a couple of merge requests with the label without the milestone. This situation got worse: I found a total of 18 with either the milestone or `pick into stable` missing. This ignores others where `pick into stable` and the milestone were set but shouldn't have been (in cases where there was already a revert, or similar situations).
- CE EE specs take around an hour to finish, maybe more. Definitely more, recently. And if one fails, that may add another hour of waiting.
- Deploy to staging takes time
- Package installation is slow during the unpack phase
- Some workers did not restart Unicorn correctly so manual restart was necessary
Other things were fixed, such as `mailroom`, which no longer needs to be manually killed for a deployment.
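The label/milestone mismatch described above (18 merge requests with either the milestone or `pick into stable` missing) is the kind of thing that could be caught by a small audit script before each RC. The following is only a minimal sketch: it assumes each merge request is available as a dict with a `labels` list and a `milestone` that is either `None` or a dict with a `title`, which is roughly how the GitLab API serializes merge requests. The function name and the sample data are hypothetical.

```python
from typing import Iterable

PICK_LABEL = "pick into stable"  # label name as used in this issue


def find_mismatched_mrs(merge_requests: Iterable[dict], milestone: str) -> list[dict]:
    """Return MRs that have exactly one of: the pick label, the release milestone."""
    mismatched = []
    for mr in merge_requests:
        has_label = PICK_LABEL in mr.get("labels", [])
        # Assumed shape: "milestone" is None or a dict with a "title" key.
        has_milestone = (mr.get("milestone") or {}).get("title") == milestone
        if has_label != has_milestone:
            mismatched.append(
                {"iid": mr["iid"], "missing": "milestone" if has_label else "label"}
            )
    return mismatched


# Hypothetical sample data, for illustration only.
sample = [
    {"iid": 1, "labels": ["pick into stable"], "milestone": {"title": "8.16"}},
    {"iid": 2, "labels": ["pick into stable"], "milestone": None},
    {"iid": 3, "labels": ["bug"], "milestone": {"title": "8.16"}},
    {"iid": 4, "labels": [], "milestone": {"title": "8.17"}},
]
print(find_mismatched_mrs(sample, "8.16"))
# [{'iid': 2, 'missing': 'milestone'}, {'iid': 3, 'missing': 'label'}]
```

An MR flagged as missing the label is not necessarily wrong (it may have been merged before the stable branch was created), so the output is a list for the RM to review and not a list of errors.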
Other things that we can improve:
- Last-minute MRs the day before the release and on the day of the release. Some of them fixed bugs that were not regressions or not that critical. I think in the last 3 days we should focus more on regressions. Some of the fixes caused other problems too. Something like "Don't merge master into the stable branch after RC1 created" will definitely help.
- Tests in CE master and EE master were broken the day before the release, mainly due to the MWPS bug (which merged some MRs automatically). I know this could be difficult to spot if you have a lot of noise in your inbox, etc.
- Some features were merged too late (just about the time when we required setting the `pick into stable` label). This caused new issues with not much time to fix them; also, not many people were available since this happened during the weekend.
- Better coordination between the EMEA & Americas RMs is needed. This was also a special case since we had the summit, but there was a change in RM and in the end, the EMEA/APAC team did most of the work. This meant we had to work long hours most of the days to get the release ready.
- The CHANGELOG for the security releases was in `dev` `master`, so we had to rebase and drop the commits so we wouldn't push this info to the release, which wasn't a security release nor was it announced.
- There were problems syncing `dev` and `github` as they were not up to date by around 3 days.
- I had to leave a note in issues that were merged with the task `EE compat check` failing, meaning there was no EE counterpart and some of the conflicts weren't straightforward. This is worse during the weekend when there are fewer people available to fix those. This should hopefully improve with https://gitlab.com/gitlab-org/gitlab-ee/issues/1505 and https://gitlab.com/gitlab-org/gitlab-ce/issues/25932
- Other deployment issues that happened in RC3: Deployment of 8.16.0 RC3 post mortem
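
For the `dev` / `github` sync problem noted above, one way to notice drift early is to compare branch heads on the two remotes. This is a rough sketch, not how the release tooling actually works. It assumes you have captured the output of `git ls-remote` for each remote (one SHA and one ref name per line, separated by whitespace). The function names, SHAs, and branch list are made up for illustration.

```python
def parse_ls_remote(output: str) -> dict[str, str]:
    """Map ref name -> commit SHA from `git ls-remote` output."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split(None, 1)  # split on the first run of whitespace
        refs[ref.strip()] = sha
    return refs


def out_of_sync(refs_a: dict[str, str], refs_b: dict[str, str], branches: list[str]) -> list[str]:
    """Return the branches whose head differs, or is missing, between two remotes."""
    stale = []
    for branch in branches:
        ref = f"refs/heads/{branch}"
        if refs_a.get(ref) is None or refs_a.get(ref) != refs_b.get(ref):
            stale.append(branch)
    return stale


# Placeholder output; real SHAs are 40 hex characters.
dev_output = "aaa111\trefs/heads/master\nbbb222\trefs/heads/8-16-stable\n"
github_output = "aaa111\trefs/heads/master\nccc333\trefs/heads/8-16-stable\n"

print(out_of_sync(parse_ls_remote(dev_output), parse_ls_remote(github_output),
                  ["master", "8-16-stable"]))
# ['8-16-stable']
```

This only reports that the heads differ, not by how much or which remote is behind. Working that out would need the commits fetched locally.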