Zero downtime upgrades
DETAILS: Tier: Free, Premium, Ultimate Offering: Self-managed
It's possible to upgrade to a newer major, minor, or patch version of GitLab without having to take your GitLab instance offline. However, for this to work there are the following requirements:
- You can only upgrade one minor release at a time. So from 13.1 to 13.2, not to 13.3. If you skip releases, database modifications may be run in the wrong sequence and leave the database schema in a broken state.
- You have to use post-deployment migrations.
- You are using PostgreSQL. Starting from GitLab 12.1, MySQL is not supported.
- You have set up a multi-node GitLab instance. Cloud Native Hybrid installations do not support zero-downtime upgrades.
If you want to upgrade multiple releases or do not meet the other requirements, upgrade with downtime instead.
If you meet all the requirements above, follow these instructions in order. The steps differ depending on your deployment type:
Deployment type | Description |
---|---|
Gitaly or Gitaly Cluster | GitLab CE/EE using HA architecture for Gitaly or Gitaly Cluster |
Multi-node / PostgreSQL HA | GitLab CE/EE using HA architecture for PostgreSQL |
Multi-node / Redis HA | GitLab CE/EE using HA architecture for Redis |
Geo | GitLab EE with Geo enabled |
Multi-node / HA with Geo | GitLab EE on multiple nodes with Geo enabled |
Each type of deployment requires that you hot reload the puma and sidekiq processes on all nodes running these services after you've upgraded. The reason for this is that those processes each load the GitLab Rails application, which reads and loads the database schema into memory when starting up. Each of these processes must be reloaded (or restarted in the case of sidekiq) to re-read any database changes that have been made by post-deployment migrations.
Most of the time you can safely upgrade from a patch release to the next minor release even if that patch release is not the latest. For example, upgrading from 14.1.1 to 14.2.0 should be safe even if 14.1.2 has been released. We recommend that you check the release posts of any releases between your current and target version in case they include migrations that require you to upgrade one release at a time.
We also recommend you verify the version specific upgrading instructions relevant to your upgrade path.
Some releases may also include so-called "background migrations". These migrations are performed in the background by Sidekiq and are often used for migrating data. Background migrations are only added in the monthly releases.
Certain major/minor releases may require a set of background migrations to be
finished. To guarantee this, such a release processes any remaining jobs
before continuing the upgrading procedure. While this doesn't require downtime
(if the above conditions are met) we require that you
wait for background migrations to complete
between each major/minor release upgrade.
The time necessary to complete these migrations can be reduced by increasing the number of Sidekiq workers that can process jobs in the background_migration queue. To see the size of this queue, check for background migrations before upgrading.
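Waiting for background migrations between upgrades can be scripted. The following is a minimal sketch, not an official tool: wait_until_zero is a hypothetical helper that repeatedly runs whatever command you pass it and returns once that command prints 0. The command that reports the remaining job count is left to you, because it depends on your GitLab version.

```shell
# wait_until_zero DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it prints exactly "0".
# Returns 0 when the count reaches zero, 1 if COMMAND itself fails.
wait_until_zero() {
  local delay="$1" remaining
  shift
  while :; do
    # Capture the count printed by the caller-supplied command
    remaining="$("$@")" || return 1
    if [ "$remaining" = "0" ]; then
      return 0
    fi
    echo "Still ${remaining} background migration jobs remaining" >&2
    sleep "$delay"
  done
}
```

As an example, assuming your release documents the Rails runner one-liner for counting remaining jobs, a call could look like `wait_until_zero 60 sudo gitlab-rails runner -e production 'puts Gitlab::BackgroundMigration.remaining'`. Verify the exact command against the documentation for your version before relying on it.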
As a guideline, any database smaller than 10 GB doesn't take too much time to upgrade; perhaps an hour at most per minor release. Larger databases however may require more time, but this is highly dependent on the size of the database and the migrations that are being performed.
To help explain this, let's look at some examples:
Example 1: You are running a large GitLab installation using version 13.4.2, which is the latest patch release of 13.4. When GitLab 13.5.0 is released, this installation can be safely upgraded to 13.5.0 without requiring downtime if the requirements mentioned above are met. You can also skip 13.5.0 and upgrade to 13.5.1 after it's released, but you cannot upgrade straight to 13.6.0; you have to first upgrade to a 13.5.Z release.
Example 2: You are running a large GitLab installation using version 13.4.2, which is the latest patch release of 13.4. GitLab 13.5 includes some background migrations, and 14.0 requires these to be completed (processing any remaining jobs for you). Skipping 13.5 is not possible without downtime, and due to the background migrations would require potentially hours of downtime depending on how long it takes for the background migrations to complete. To work around this you have to upgrade to 13.5.Z first, then wait at least a week before upgrading to 14.0.
Example 3: You use MySQL as the database for GitLab. Any upgrade to a new major/minor release requires downtime. If a release includes any background migrations this could potentially lead to hours of downtime, depending on the size of your database. To work around this you must use PostgreSQL and meet the other online upgrade requirements mentioned above.
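The version rule from examples 1 and 2 can be checked before you start. The following is a rough sketch only: is_single_minor_step is a hypothetical helper, it compares just the major and minor components, and it deliberately refuses to judge upgrades that cross a major version because it cannot know which minor release is the last one of a major series.

```shell
# is_single_minor_step FROM TO
# Hypothetical helper. Exit status:
#   0 = TO is in the same or the next minor release of the same major
#   1 = TO skips at least one minor release (or is an older minor)
#   2 = TO is in a different major version, which this sketch does not judge
# Patch components are ignored.
is_single_minor_step() {
  local from="$1" to="$2"
  local from_major from_minor to_major to_minor rest

  # Split "MAJOR.MINOR.PATCH" using parameter expansion only
  from_major="${from%%.*}"; rest="${from#*.}"; from_minor="${rest%%.*}"
  to_major="${to%%.*}";     rest="${to#*.}";   to_minor="${rest%%.*}"

  if [ "$from_major" -ne "$to_major" ]; then
    return 2
  fi

  if [ "$to_minor" -eq "$from_minor" ] || [ "$to_minor" -eq $((from_minor + 1)) ]; then
    return 0
  fi
  return 1
}
```

For instance, `is_single_minor_step 13.4.2 13.5.1` succeeds, while `is_single_minor_step 13.4.2 13.6.0` fails with status 1, matching example 1.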
Multi-node / HA deployment
WARNING: You can only upgrade one minor release at a time. So from 15.6 to 15.7, not to 15.8. If you attempt more than one minor release, the upgrade may fail.
Use a load balancer in front of web (Puma) nodes
With Puma, single node zero-downtime updates are no longer possible. To achieve HA with zero-downtime updates, at least two nodes are required to be used with a load balancer which distributes the connections properly across both nodes.
The load balancer in front of the application nodes must be configured to use the proper health check endpoints to determine whether a service is accepting traffic. For Puma, use the /-/readiness endpoint. For Sidekiq and other services, the /readiness endpoint can be used.
Upgrades on web (Puma) nodes must be done in a rolling manner, one after another, ensuring at least one node is always up to serve traffic. This is required to ensure zero-downtime.
Puma enters a blackout period as part of the upgrade, during which nodes continue to accept connections but mark their respective health check endpoints as unhealthy. On seeing this, the load balancer should disconnect them gracefully.
Puma restarts only after completing all the currently-processing requests. This ensures data and service integrity. After the nodes have restarted, the health check endpoints are marked healthy.
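When upgrading nodes one after another, it can help to wait until a node reports ready again before moving on. The following is a minimal sketch under stated assumptions: wait_for_readiness is a hypothetical helper, the URL you pass is your own node's address, and the node is assumed to answer the readiness endpoint with a non-error HTTP status only when it is healthy.

```shell
# wait_for_readiness URL [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: polls URL until curl succeeds or the attempts run out.
# Returns 0 when the endpoint answered successfully, 1 otherwise.
wait_for_readiness() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP error responses,
    # -sS hides the progress meter but still prints errors
    if curl -fsS -o /dev/null "$url"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

A call such as `wait_for_readiness "http://gitlab-app-1.example.com/-/readiness"` (the host name is a placeholder) could run after reloading Puma on a node and before starting on the next one.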
To update an HA instance behind a load balancer to the latest GitLab version, update the nodes in the following order:
- Select one application node as a deploy node and complete the following steps on it:
  - Create an empty file at /etc/gitlab/skip-auto-reconfigure. This prevents upgrades from running gitlab-ctl reconfigure, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:
    sudo touch /etc/gitlab/skip-auto-reconfigure
  - Update the GitLab package.
  - Get the regular migrations and latest code in place. Before running this step, the deploy node's /etc/gitlab/gitlab.rb configuration file must have gitlab_rails['auto_migrate'] = true to permit regular migrations:
    sudo SKIP_POST_DEPLOYMENT_MIGRATIONS=true gitlab-ctl reconfigure
  - Ensure services use the latest code:
    sudo gitlab-ctl hup puma
    sudo gitlab-ctl restart sidekiq
- Complete the following steps on the other Puma/Sidekiq nodes, one after another. Always ensure at least one of these nodes is up and running, and connected to the load balancer, before proceeding to the next node.
  - Update the GitLab package and ensure a reconfigure is run as part of it. If not (due to the /etc/gitlab/skip-auto-reconfigure file being present), run sudo gitlab-ctl reconfigure manually.
  - Ensure services use the latest code:
    sudo gitlab-ctl hup puma
    sudo gitlab-ctl restart sidekiq
- On the deploy node, run the post-deployment migrations:
  sudo gitlab-rake db:migrate
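The ordering above can be summarized as a dry-run plan. The following sketch only prints commands and executes nothing; print_upgrade_plan and the node names are hypothetical, and the package upgrade itself appears as a comment because the exact command depends on your distribution and is an assumption here.

```shell
# print_upgrade_plan DEPLOY_NODE OTHER_NODE...
# Hypothetical helper: prints, in order, the commands to run on each node.
print_upgrade_plan() {
  local deploy_node="$1" node
  shift

  echo "## ${deploy_node} (deploy node)"
  echo "sudo touch /etc/gitlab/skip-auto-reconfigure"
  echo "# upgrade the GitLab package here (command depends on your distribution)"
  echo "sudo SKIP_POST_DEPLOYMENT_MIGRATIONS=true gitlab-ctl reconfigure"
  echo "sudo gitlab-ctl hup puma"
  echo "sudo gitlab-ctl restart sidekiq"

  # Remaining Puma/Sidekiq nodes, strictly one after another
  for node in "$@"; do
    echo "## ${node} (keep at least one other node behind the load balancer)"
    echo "# upgrade the GitLab package here, then reconfigure if it did not run automatically"
    echo "sudo gitlab-ctl reconfigure"
    echo "sudo gitlab-ctl hup puma"
    echo "sudo gitlab-ctl restart sidekiq"
  done

  # Post-deployment migrations run last, and only on the deploy node
  echo "## ${deploy_node} (deploy node, after all other nodes are done)"
  echo "sudo gitlab-rake db:migrate"
}
```

Running `print_upgrade_plan app1 app2 app3` gives a checklist you can review before touching any node.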
Gitaly or Gitaly Cluster
Gitaly nodes can be located on their own server, either as part of a sharded setup, or as part of Gitaly Cluster.
Before you update the main GitLab application you must (in order):
- Upgrade the Gitaly nodes that reside on separate servers.
- Upgrade Praefect if using Gitaly Cluster.
Because of a known issue, Gitaly and Gitaly Cluster upgrades cause some downtime.
Upgrade Gitaly nodes
Upgrade the GitLab package on the Gitaly nodes one at a time to ensure access to Git repositories is maintained.
Upgrade Praefect
From the Praefect nodes, select one to be your Praefect deploy node. You install the new Omnibus package on the deploy node first and run database migrations.
- On the Praefect deploy node:
  - Create an empty file at /etc/gitlab/skip-auto-reconfigure. This prevents upgrades from running gitlab-ctl reconfigure, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:
    sudo touch /etc/gitlab/skip-auto-reconfigure
  - Ensure that praefect['auto_migrate'] = true is set in /etc/gitlab/gitlab.rb.
- On all remaining Praefect nodes, ensure that praefect['auto_migrate'] = false is set in /etc/gitlab/gitlab.rb to prevent reconfigure from automatically running database migrations.
- On the Praefect deploy node:
  - To apply the Praefect database migrations and restart Praefect, run:
    sudo gitlab-ctl reconfigure
- On all remaining Praefect nodes:
  - Ensure nodes are running the latest code:
    sudo gitlab-ctl reconfigure
PostgreSQL
Pick a node to be the Deploy Node. It can be any application node, but it must be the same node throughout the process.
Deploy node
- Create an empty file at /etc/gitlab/skip-auto-reconfigure. This prevents upgrades from running gitlab-ctl reconfigure, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:
  sudo touch /etc/gitlab/skip-auto-reconfigure
All nodes including the Deploy node
- To prevent reconfigure from automatically running database migrations, ensure that gitlab_rails['auto_migrate'] = false is set in /etc/gitlab/gitlab.rb.
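Because this setting has to be in place on every node before reconfigure runs, a quick check can catch a node that was missed. The following is a small sketch: auto_migrate_disabled is a hypothetical helper that only looks for an uncommented line setting the value to false. It does not evaluate the Ruby file, so a later line that overrides the setting would go unnoticed.

```shell
# auto_migrate_disabled [FILE]
# Hypothetical helper: succeeds when FILE contains an uncommented line
# that sets gitlab_rails['auto_migrate'] = false.
# FILE defaults to /etc/gitlab/gitlab.rb.
auto_migrate_disabled() {
  local file="${1:-/etc/gitlab/gitlab.rb}"
  grep -Eq "^[[:space:]]*gitlab_rails\['auto_migrate'\][[:space:]]*=[[:space:]]*false" "$file"
}
```

For example, `auto_migrate_disabled || echo "auto_migrate is not disabled on this node"` could run on each node before the upgrade starts.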
PostgreSQL only nodes
- Ensure nodes are running the latest code:
  sudo gitlab-ctl reconfigure
Deploy node
- If you're using PgBouncer:
  You must bypass PgBouncer and connect directly to the database leader before running migrations.
  Rails uses an advisory lock when attempting to run a migration to prevent concurrent migrations from running on the same database. These locks are not shared across transactions, resulting in ActiveRecord::ConcurrentMigrationError and other issues when running database migrations using PgBouncer in transaction pooling mode.
  To find the leader node, run the following on a database node:
    sudo gitlab-ctl patroni members
  Then, in your gitlab.rb file on the deploy node, update gitlab_rails['db_host'] and gitlab_rails['db_port'] with the database leader's host and port.
- To get the regular database migrations and latest code in place, run:
  sudo gitlab-ctl reconfigure
  sudo SKIP_POST_DEPLOYMENT_MIGRATIONS=true gitlab-rake db:migrate
All nodes excluding the Deploy node
- Ensure nodes are running the latest code:
  sudo gitlab-ctl reconfigure
Deploy node
- Run post-deployment database migrations on the deploy node to complete the migrations:
  sudo gitlab-rake db:migrate
For nodes that run Puma or Sidekiq
- Hot reload puma and restart sidekiq:
  sudo gitlab-ctl hup puma
  sudo gitlab-ctl restart sidekiq
- If you're using PgBouncer:
  Change your gitlab.rb to point back to PgBouncer and run:
  sudo gitlab-ctl reconfigure
If you do not want to run zero downtime upgrades in the future, make sure you remove /etc/gitlab/skip-auto-reconfigure and revert setting gitlab_rails['auto_migrate'] = false in /etc/gitlab/gitlab.rb after you've completed these steps.
Redis HA (using Sentinel)
DETAILS: Tier: Premium, Ultimate Offering: Self-managed
Package upgrades may involve version updates to the bundled Redis service. On instances using Redis for scaling, upgrades must follow a proper order to ensure minimum downtime, as specified below. This document assumes the official guides are being followed to set up Redis HA.
In the application node
According to official Redis documentation, the easiest way to update an HA instance using Sentinel is to upgrade the secondaries one after the other, perform a manual failover from current primary (running old version) to a recently upgraded secondary (running a new version), and then upgrade the original primary. For this, we must know the address of the current Redis primary.
- If your application node is running GitLab 12.7.0 or later, you can use the following command to get the address of the current Redis primary:
  sudo gitlab-ctl get-redis-master
- If your application node is running a version older than GitLab 12.7.0, you have to run the underlying redis-cli command (which the get-redis-master command uses) to fetch information about the primary:
  - Get the address of one of the Sentinel nodes specified as gitlab_rails['redis_sentinels'] in /etc/gitlab/gitlab.rb.
  - Get the Redis primary name specified as redis['master_name'] in /etc/gitlab/gitlab.rb.
  - Run the following command:
    sudo /opt/gitlab/embedded/bin/redis-cli -h <sentinel host> -p <sentinel port> SENTINEL get-master-addr-by-name <redis master name>
In the Redis secondary nodes
- Set gitlab_rails['rake_cache_clear'] = false in gitlab.rb if you haven't already. Otherwise, you might receive the error Redis::CommandError: READONLY You can't write against a read only replica. during the reconfigure that follows installation of the new package.
- Install the package for the new version.
- Run sudo gitlab-ctl reconfigure if a reconfigure is not run as part of installation (due to the /etc/gitlab/skip-auto-reconfigure file being present).
- If reconfigure warns about a pending Redis/Sentinel restart, restart the corresponding service:
  sudo gitlab-ctl restart redis
  sudo gitlab-ctl restart sentinel
In the Redis primary node
Before upgrading the Redis primary node, we must perform a failover so that one of the recently upgraded secondary nodes becomes the new primary. After the failover is complete, we can go ahead and upgrade the original primary node.
- Stop the Redis service on the Redis primary node so that it fails over to a secondary node:
  sudo gitlab-ctl stop redis
- Wait for the failover to complete. You can verify it by periodically checking the details of the current Redis primary node (as mentioned above). If it starts reporting a new IP, the failover is complete.
- Start Redis again on that node, so that it starts following the current primary node:
  sudo gitlab-ctl start redis
- Install the package for the new version.
- Run sudo gitlab-ctl reconfigure if a reconfigure is not run as part of installation (due to the /etc/gitlab/skip-auto-reconfigure file being present).
- If reconfigure warns about a pending Redis/Sentinel restart, restart the corresponding service:
  sudo gitlab-ctl restart redis
  sudo gitlab-ctl restart sentinel
Update the application node
Install the package for the new version and follow the regular package upgrade procedure.
Geo deployment
DETAILS: Tier: Premium, Ultimate Offering: Self-managed
WARNING: You can only upgrade one minor release at a time.
The order of steps is important. While following these steps, make sure you follow them in the right order, on the correct node.
Update the Geo primary site
Log in to your primary node and run the following:
- Create an empty file at /etc/gitlab/skip-auto-reconfigure. This prevents upgrades from running gitlab-ctl reconfigure, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:
  sudo touch /etc/gitlab/skip-auto-reconfigure
- Edit /etc/gitlab/gitlab.rb and ensure the following is present:
  gitlab_rails['auto_migrate'] = false
- Reconfigure GitLab:
  sudo gitlab-ctl reconfigure
- Update the GitLab package.
- To get the database migrations and latest code in place, run:
  sudo gitlab-ctl reconfigure
- After the node is updated and reconfigure finished successfully, complete the migrations:
  sudo SKIP_POST_DEPLOYMENT_MIGRATIONS=true gitlab-rake db:migrate
- Copy the /etc/gitlab/gitlab-secrets.json file from the primary site to the secondary site if they're different. The file must be the same on all of a site's nodes.
Update the Geo secondary site
On each secondary node, run the following:
- Create an empty file at /etc/gitlab/skip-auto-reconfigure. This prevents upgrades from running gitlab-ctl reconfigure, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:
  sudo touch /etc/gitlab/skip-auto-reconfigure
- Edit /etc/gitlab/gitlab.rb and ensure the following is present:
  gitlab_rails['auto_migrate'] = false
- Reconfigure GitLab:
  sudo gitlab-ctl reconfigure
- Update the GitLab package.
- To get the database migrations and latest code in place, run:
  sudo gitlab-ctl reconfigure
- Run post-deployment database migrations, specific to the Geo database:
  sudo gitlab-rake db:migrate:geo
Finalize the update
After all secondary nodes are updated, finalize the update on the primary node:
- Run post-deployment database migrations:
  sudo gitlab-rake db:migrate
- After the update is finalized on the primary node, hot reload puma and restart the sidekiq and geo-logcursor services on all primary and secondary nodes:
  sudo gitlab-ctl hup puma
  sudo gitlab-ctl restart sidekiq
  sudo gitlab-ctl restart geo-logcursor
After updating all nodes (both primary and all secondaries), check their status:
- Verify Geo configuration and dependencies:
  sudo gitlab-rake gitlab:geo:check
If you do not want to run zero downtime upgrades in the future, make sure you remove /etc/gitlab/skip-auto-reconfigure and revert setting gitlab_rails['auto_migrate'] = false in /etc/gitlab/gitlab.rb after you've completed these steps.
Multi-node / HA deployment with Geo
DETAILS: Tier: Premium, Ultimate Offering: Self-managed
WARNING: You can only upgrade one minor release at a time. You must also start with the Gitaly cluster, updating Gitaly one node at a time. This ensures access to the Git repositories for the remainder of the upgrade process.
This section describes the steps required to upgrade a multi-node / HA deployment with Geo. Some steps must be performed on a particular node. This node is known as the "deploy node" and is noted through the following instructions.
Updates must be performed in the following order:
- Update Geo primary multi-node deployment.
- Update Geo secondary multi-node deployments.
- Post-deployment migrations and checks.
Step 1: Choose a "deploy node" for each deployment
You now must choose:
- One instance for use as the primary "deploy node" on the Geo primary multi-node deployment.
- One instance for use as the secondary "deploy node" on each Geo secondary multi-node deployment.
Deploy nodes must be configured to be running Puma or Sidekiq or the geo-logcursor daemon. To avoid any downtime, they must not be in use during the update:
- If running Puma, remove the deploy node from the load balancer.
- If running Sidekiq, ensure the deploy node is not processing jobs:
  sudo gitlab-ctl stop sidekiq
- If running the geo-logcursor daemon, ensure the deploy node is not processing events:
  sudo gitlab-ctl stop geo-logcursor
For zero-downtime, Puma, Sidekiq, and geo-logcursor must be running on other nodes during the update.
Step 2: Update the Geo primary multi-node deployment
On all primary nodes including the primary "deploy node"
- Create an empty file at /etc/gitlab/skip-auto-reconfigure. This prevents upgrades from running gitlab-ctl reconfigure, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:
  sudo touch /etc/gitlab/skip-auto-reconfigure
- To prevent reconfigure from automatically running database migrations, ensure that gitlab_rails['auto_migrate'] = false is set in /etc/gitlab/gitlab.rb.
- Ensure nodes are running the latest code:
  sudo gitlab-ctl reconfigure
On primary Gitaly only nodes
- Ensure nodes are running the latest code:
  sudo gitlab-ctl reconfigure
On the primary "deploy node"
- If you're using PgBouncer:
  You must bypass PgBouncer and connect directly to the database leader before running migrations.
  Rails uses an advisory lock when attempting to run a migration to prevent concurrent migrations from running on the same database. These locks are not shared across transactions, resulting in ActiveRecord::ConcurrentMigrationError and other issues when running database migrations using PgBouncer in transaction pooling mode.
  To find the leader node, run the following on a database node:
    sudo gitlab-ctl patroni members
  Then, in your gitlab.rb file on the deploy node, update gitlab_rails['db_host'] and gitlab_rails['db_port'] with the database leader's host and port.
- To get the regular database migrations and latest code in place, run:
  sudo gitlab-ctl reconfigure
  sudo SKIP_POST_DEPLOYMENT_MIGRATIONS=true gitlab-rake db:migrate
- If this deploy node is used to serve requests or process jobs, then you may return it to service at this point.
  - To serve requests, add the deploy node to the load balancer.
  - To process Sidekiq jobs again, start Sidekiq:
    sudo gitlab-ctl start sidekiq
On all primary nodes excluding the primary "deploy node"
- Ensure nodes are running the latest code:
  sudo gitlab-ctl reconfigure
For all primary nodes that run Puma or Sidekiq including the primary "deploy node"
- Hot reload puma and restart sidekiq:
  sudo gitlab-ctl hup puma
  sudo gitlab-ctl restart sidekiq
- Copy the /etc/gitlab/gitlab-secrets.json file from the primary site to the secondary site if they're different. The file must be the same on all of a site's nodes.
Step 3: Update each Geo secondary multi-node deployment
Only proceed if you have successfully completed all steps on the Geo primary multi-node deployment.
On all secondary nodes including the secondary "deploy node"
- Create an empty file at /etc/gitlab/skip-auto-reconfigure. This prevents upgrades from running gitlab-ctl reconfigure, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:
  sudo touch /etc/gitlab/skip-auto-reconfigure
- To prevent reconfigure from automatically running database migrations, ensure that geo_secondary['auto_migrate'] = false is set in /etc/gitlab/gitlab.rb.
- Ensure nodes are running the latest code:
  sudo gitlab-ctl reconfigure
On secondary Gitaly only nodes
- Ensure nodes are running the latest code:
  sudo gitlab-ctl reconfigure
On the secondary "deploy node"
- To get the regular database migrations and latest code in place, run:
  sudo gitlab-ctl reconfigure
  sudo SKIP_POST_DEPLOYMENT_MIGRATIONS=true gitlab-rake db:migrate:geo
- If this deploy node is used to serve requests or perform background processing, then you may return it to service at this point.
  - To serve requests, add the deploy node to the load balancer.
  - To process Sidekiq jobs again, start Sidekiq:
    sudo gitlab-ctl start sidekiq
  - To process Geo events again, start the geo-logcursor daemon:
    sudo gitlab-ctl start geo-logcursor
On all secondary nodes excluding the secondary "deploy node"
- Ensure nodes are running the latest code:
  sudo gitlab-ctl reconfigure
For all secondary nodes that run Puma, Sidekiq, or the geo-logcursor daemon including the secondary "deploy node"
- Hot reload puma and restart the sidekiq and geo-logcursor services:
  sudo gitlab-ctl hup puma
  sudo gitlab-ctl restart sidekiq
  sudo gitlab-ctl restart geo-logcursor
Step 4: Run post-deployment migrations and checks
On the primary "deploy node"
- Run post-deployment database migrations:
  sudo gitlab-rake db:migrate
- Verify Geo configuration and dependencies:
  sudo gitlab-rake gitlab:geo:check
- If you're using PgBouncer:
  Change your gitlab.rb to point back to PgBouncer and run:
  sudo gitlab-ctl reconfigure
On all secondary "deploy nodes"
- Run post-deployment database migrations, specific to the Geo database:
  sudo gitlab-rake db:migrate:geo
- Verify Geo configuration and dependencies:
  sudo gitlab-rake gitlab:geo:check
- Verify Geo status:
  sudo gitlab-rake geo:status