Back up and restore GitLab
DETAILS:
Tier: Free, Premium, Ultimate
Offering: Self-managed
Your software or organization depends on the data in your GitLab instance. You need to ensure this data is protected from adverse events such as:
- Corrupted data
- Accidental deletion of data
- Ransomware attacks
- Unexpected cloud provider downtime
You can mitigate all of these risks with a disaster recovery plan that includes backups.
Back up GitLab
For detailed information on backing up GitLab, see Backup GitLab.
Restore GitLab
For detailed information on restoring GitLab, see Restore GitLab.
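For orientation, the following commands show a basic backup and restore on a Linux package installation; see the linked pages for prerequisites, options, and the equivalents for other installation types.

```shell
# Create a backup archive of the GitLab instance.
sudo gitlab-backup create

# Restore from an existing backup archive.
sudo gitlab-backup restore
```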
Migrate to a new server
For detailed information on using back up and restore to migrate to a new server, see Migrate to a new server.
Additional notes
This documentation applies to GitLab Community Edition and Enterprise Edition. We back up GitLab.com and ensure your data is secure. However, you can't use these methods to export or back up your data from GitLab.com yourself.
Issues are stored in the database and can't be stored in Git itself.
GitLab backup archive creation process
When working with GitLab backups, you might need to know how GitLab creates backup archives. To create backup archives, GitLab:
- If creating an incremental backup, extracts the previous backup archive and reads its backup_information.yml file.
- Updates or generates the backup_information.yml file.
- Runs all backup sub-tasks:
  - db to back up the GitLab PostgreSQL database (not Gitaly Cluster).
  - repositories to back up Git repositories.
  - uploads to back up attachments.
  - builds to back up CI job output logs.
  - artifacts to back up CI job artifacts.
  - pages to back up Pages content.
  - lfs to back up LFS objects.
  - terraform_state to back up Terraform states.
  - registry to back up container registry images.
  - packages to back up packages.
  - ci_secure_files to back up project-level secure files.
- Archives the backup staging area into a tar file.
- Optional. Uploads the new backup archive to object storage.
- Cleans up backup staging directory files that are now archived.
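Individual sub-tasks can be skipped when the archive is created. A minimal sketch, assuming a Linux package installation:

```shell
# Create a backup but skip the registry and packages sub-tasks.
# SKIP accepts a comma-separated list of sub-task names.
sudo gitlab-backup create SKIP=registry,packages
```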
Backup ID
Backup IDs identify individual backup archives. You need the backup ID of a backup archive when you restore GitLab and multiple backup archives are available.
Backup archives are saved in a directory set in backup_path, which is specified in the config/gitlab.yml file.
- By default, backup archives are stored in /var/opt/gitlab/backups.
- By default, backup archive filenames are <backup-id>_gitlab_backup.tar, where <backup-id> identifies the time when the backup archive was created, the GitLab version, and the GitLab edition.
For example, if the archive filename is 1493107454_2018_04_25_10.6.4-ce_gitlab_backup.tar, the backup ID is 1493107454_2018_04_25_10.6.4-ce.
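When you restore from a specific archive on a Linux package installation, you pass its backup ID to the restore Rake task; a minimal sketch:

```shell
# Restore from the archive 1493107454_2018_04_25_10.6.4-ce_gitlab_backup.tar
# by passing its backup ID.
sudo gitlab-backup restore BACKUP=1493107454_2018_04_25_10.6.4-ce
```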
Backup staging directory
The backup staging directory is a place to temporarily:
- Store backup artifacts on disk before the GitLab backup archive is created.
- Extract backup archives on disk before restoring a backup or creating an incremental backup.
This directory is the same directory where completed GitLab backup archives are created. When creating an untarred backup, the backup artifacts are left in this directory and no archive is created.
Example backup staging directory with untarred backup:
backups/
├── 1701728344_2023_12_04_16.7.0-pre_gitlab_backup.tar
├── 1701728447_2023_12_04_16.7.0-pre_gitlab_backup.tar
├── artifacts.tar.gz
├── backup_information.yml
├── builds.tar.gz
├── ci_secure_files.tar.gz
├── db
│ ├── ci_database.sql.gz
│ └── database.sql.gz
├── lfs.tar.gz
├── packages.tar.gz
├── pages.tar.gz
├── repositories
│ ├── manifests/
│ ├── @hashed/
│ └── @snippets/
├── terraform_state.tar.gz
└── uploads.tar.gz
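If you want the backup artifacts left untarred in this directory, as in the example above, the archiving step can be skipped. A minimal sketch, assuming a Linux package installation:

```shell
# Leave the backup artifacts in the staging directory
# instead of archiving them into a single tar file.
sudo gitlab-backup create SKIP=tar
```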
backup_information.yml file
The backup_information.yml file saves all backup inputs that are not included in the backup itself. It includes information such as:
- The time the backup was created.
- The version of GitLab that generated the backup.
- Any options that were specified, such as skipped sub-tasks.
This information is used by some sub-tasks to determine how to:
- Restore the backup.
- Link data in the backup with external services, such as server-side repository backups.
This file is saved into the backup staging directory.
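The exact contents vary by GitLab version and by the options used when the backup was created. The following is an illustrative sketch only, not an authoritative list of keys:

```yaml
# Illustrative example of backup_information.yml contents.
# Keys and values vary by GitLab version and backup options.
:backup_created_at: 2023-12-04 16:59:04.000000000 +00:00
:gitlab_version: 16.7.0-pre
:tar_version: tar (GNU tar) 1.34
:installation_type: omnibus-gitlab
:skipped: 'registry'
```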
Database backups
Database backups are created and restored by a GitLab backup sub-task called db. The database sub-task uses pg_dump to create a SQL dump. The output of pg_dump is piped through gzip to create a compressed SQL file. This file is saved to the backup staging directory.
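Conceptually, the result is equivalent to the following pipeline. This is a simplified illustration only; the real sub-task passes additional pg_dump options, handles multiple databases (such as a separate CI database), and manages connection settings itself:

```shell
# Simplified illustration of the db sub-task: dump the database with pg_dump
# and compress the stream with gzip into the backup staging directory.
pg_dump gitlabhq_production | gzip > db/database.sql.gz
```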
Repository backups
Repository backups are created and restored by a GitLab backup sub-task called repositories. The repositories sub-task uses a Gitaly command, gitaly-backup, to create Git repository backups:
- GitLab uses its database to tell gitaly-backup which repositories to back up.
- gitaly-backup then calls a series of RPCs on Gitaly to collect the repository backup data for each repository. This data is streamed into a directory structure in the GitLab backup staging directory.
sequenceDiagram
box Backup host
participant Repositories sub-task
participant gitaly-backup
end
Repositories sub-task->>+gitaly-backup: List of repositories
loop Each repository
gitaly-backup->>+Gitaly: ListRefs request
Gitaly->>-gitaly-backup: List of Git references
gitaly-backup->>+Gitaly: CreateBundleFromRefList request
Gitaly->>-gitaly-backup: Git bundle file
gitaly-backup->>+Gitaly: GetCustomHooks request
Gitaly->>-gitaly-backup: Custom hooks archive
end
gitaly-backup->>-Repositories sub-task: Success/failure
Storages configured to use Gitaly Cluster are backed up in the same way as standalone Gitaly storages. When Gitaly Cluster receives the RPC calls from gitaly-backup, it is responsible for rebuilding its own database, so there is no need to back up the Gitaly Cluster database separately. Because backups operate through RPCs, each repository is backed up only once, regardless of the replication factor.
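Repository backups of large instances can take a long time. The repositories sub-task can run multiple gitaly-backup jobs in parallel; for example, a sketch assuming a Linux package installation:

```shell
# Back up repositories with up to 4 concurrent repository backups overall,
# limited to 2 concurrent backups per repository storage.
sudo gitlab-backup create GITLAB_BACKUP_MAX_CONCURRENCY=4 GITLAB_BACKUP_MAX_STORAGE_CONCURRENCY=2
```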
Server-side repository backups
You can configure repository backups as server-side repository backups. When specified, gitaly-backup makes a single RPC call for each repository to create the backup. This RPC does not transmit any repository data. Instead, it triggers the Gitaly node that stores the physical repository to upload the backup data directly to object storage. Because the data is no longer transmitted through RPCs from Gitaly, server-side backups require much less network transfer and no disk storage on the machine that runs the backup Rake task. The backups stored in object storage are linked to the created backup archive by the backup ID.
sequenceDiagram
box Backup host
participant Repositories sub-task
participant gitaly-backup
end
Repositories sub-task->>+gitaly-backup: List of repositories
loop Each repository
gitaly-backup->>+Gitaly: BackupRepository request
Gitaly->>+Object-storage: Git references file
Object-storage->>-Gitaly: Success/failure
Gitaly->>+Object-storage: Git bundle file
Object-storage->>-Gitaly: Success/failure
Gitaly->>+Object-storage: Custom hooks archive
Object-storage->>-Gitaly: Success/failure
Gitaly->>+Object-storage: Backup manifest file
Object-storage->>-Gitaly: Success/failure
Gitaly->>-gitaly-backup: Success/failure
end
gitaly-backup->>-Repositories sub-task: Success/failure
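A minimal sketch of enabling this mode on a Linux package installation, assuming each Gitaly node already has an object-storage backup destination configured (for example, through the go_cloud_url setting in its Gitaly configuration):

```shell
# Ask gitaly-backup to trigger server-side backups on the Gitaly nodes
# instead of streaming repository data back to the backup host.
sudo gitlab-backup create REPOSITORIES_SERVER_SIDE=true
```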
File backups
The following GitLab backup sub-tasks back up files:
- uploads
- builds
- artifacts
- pages
- lfs
- terraform_state
- registry
- packages
- ci_secure_files
These file sub-tasks determine a set of files within a directory specific to the task. This set of files is then passed to tar to create an archive. This archive is piped (not saved to disk) through gzip to save a compressed tar file to the backup staging directory.
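Conceptually, each file sub-task produces something like the following pipeline. This is a simplified illustration; the real sub-task builds the file list itself, and the directory shown is a placeholder:

```shell
# Simplified illustration of a file sub-task: archive the task's files with tar
# and pipe the stream through gzip (no intermediate tar file is written to disk).
tar --create --file=- --directory <artifacts-directory> . | gzip --fast > artifacts.tar.gz
```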
Because backups are created from live instances, the files that tar is trying to archive can sometimes be modified while the backup is being created. In that case, you can use an alternate "copy" strategy: rsync first creates a copy of the files to back up, and those copies are then passed to tar as usual. With this strategy, the machine running the backup Rake task must have enough storage for both the copied files and the compressed archive.
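To use this strategy, pass STRATEGY=copy to the backup Rake task. A minimal sketch, assuming a Linux package installation:

```shell
# Copy the files with rsync first, then archive the copies with tar,
# avoiding errors caused by files changing while tar reads them.
sudo gitlab-backup create STRATEGY=copy
```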