Repository storage
DETAILS: Tier: Free, Premium, Ultimate Offering: Self-managed
GitLab stores repositories on repository storage. Repository storage is either:
- Physical storage configured with a gitaly_address that points to a Gitaly node.
- Virtual storage that stores repositories on a Gitaly Cluster.
WARNING:
Repository storage could be configured as a path that points directly to the directory where the repositories are stored. GitLab directly accessing a directory containing repositories is deprecated. You should configure GitLab to access repositories through a physical or virtual storage.
For more information on:
- Configuring Gitaly, see Configure Gitaly.
- Configuring Gitaly Cluster, see Configure Gitaly Cluster.
Hashed storage
Hashed storage stores projects on disk in a location based on a hash of the project's ID. This makes the folder structure immutable and eliminates the need to synchronize state from URLs to disk structure. This means that renaming a group, user, or project:
- Costs only the database transaction.
- Takes effect immediately.
The hash also helps spread the repositories more evenly on the disk. The top-level directory contains fewer folders than the total number of top-level namespaces.
The hash format is based on the hexadecimal representation of a SHA256, calculated with SHA256(project.id). The top-level folder uses the first two characters, followed by another folder with the next two characters. They are both stored in a special @hashed folder so they can co-exist with existing legacy storage projects. For example:
# Project's repository:
"@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}.git"
# Wiki's repository:
"@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}.wiki.git"
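The layout above can be sketched in plain Ruby. This is a minimal illustration, not GitLab's implementation: hashed_repository_paths is a hypothetical helper name, and it assumes the project ID is hashed as its decimal string, which the text above does not spell out.

```ruby
require 'digest'

# Hypothetical helper: derive hashed storage paths from a numeric project ID.
# Assumption: the ID is hashed as its decimal string representation.
def hashed_repository_paths(project_id)
  hash = Digest::SHA256.hexdigest(project_id.to_s)
  base = "@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}"
  { repository: "#{base}.git", wiki: "#{base}.wiki.git" }
end

paths = hashed_repository_paths(16)
puts paths[:repository]
puts paths[:wiki]
```

Under that assumption, project ID 16 should yield the @hashed/b1/7e/... path used in the examples on this page.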
Translate hashed storage paths
Troubleshooting problems with Git repositories, adding hooks, and other tasks require you to translate between the human-readable project name and the hashed storage path. You can translate:
- From a project's name to its hashed path.
- From a hashed path to a project's name.
From project name to hashed path
Administrators can look up a project's hashed path from its name or ID using:
- The Admin Area.
- A Rails console.
To look up a project's hashed path in the Admin Area:
- On the left sidebar, at the bottom, select Admin Area.
- Select Overview > Projects and select the project.
- Locate the Relative path field. The value is similar to:
"@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git"
To look up a project's hashed path using a Rails console:
- Start a Rails console.
- Run a command similar to one of these examples (use either the project's ID or its full path):
Project.find(16).disk_path
Project.find_by_full_path('group/project').disk_path
From hashed path to project name
Administrators can look up a project's name from its hashed relative path using:
- A Rails console.
- The config file in the *.git directory.
To look up a project's name using the Rails console:
- Start a Rails console.
- Run a command similar to this example:
ProjectRepository.find_by(disk_path: '@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9').project
The quoted string in that command is the directory tree you can find on your GitLab server. For example, on a default Linux package installation this would be /var/opt/gitlab/git-data/repositories/@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git with .git from the end of the directory name removed.
The output includes the project ID and the project name. For example:
=> #<Project id:16 it/supportteam/ticketsystem>
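The path translation described above (dropping the storage root and the trailing .git) can be sketched as follows. disk_path_from_directory and DEFAULT_REPOSITORIES_ROOT are hypothetical names, and the root is assumed to be the default Linux package location. The sketch covers project repositories only, not *.wiki.git directories.

```ruby
# Assumption: repositories live under the default Linux package root.
DEFAULT_REPOSITORIES_ROOT = '/var/opt/gitlab/git-data/repositories'

# Hypothetical helper: turn an on-disk repository directory into the value
# passed as disk_path in the Rails console lookup.
def disk_path_from_directory(directory, root: DEFAULT_REPOSITORIES_ROOT)
  directory.delete_prefix("#{root}/").delete_suffix('.git')
end

directory = '/var/opt/gitlab/git-data/repositories/@hashed/b1/7e/' \
            'b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git'
puts disk_path_from_directory(directory)
```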
To look up a project's name using the config file in the *.git directory:
- Locate the *.git directory. This directory is located in /var/opt/gitlab/git-data/repositories/@hashed/, where the first four characters of the hash are the first two directories in the path under @hashed/. For example, on a default Linux package installation the *.git directory of the hash b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9 would be /var/opt/gitlab/git-data/repositories/@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git.
- Open the config file and locate the fullpath= key under [gitlab].
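If you need to read that key from many repositories, a small script can help. The sketch below is illustrative only: gitlab_fullpath is a hypothetical helper, the sample config text is made up for the example, and the parser handles only the simple key = value layout, not the full Git config syntax.

```ruby
# Hypothetical helper: return the value of `fullpath` in the [gitlab] section
# of a Git config file's text, or nil if it is not present.
def gitlab_fullpath(config_text)
  in_gitlab_section = false
  config_text.each_line do |line|
    stripped = line.strip
    if stripped.start_with?('[')
      # Track whether the current section header is exactly [gitlab].
      in_gitlab_section = (stripped == '[gitlab]')
    elsif in_gitlab_section && (match = stripped.match(/\Afullpath\s*=\s*(.+)\z/))
      return match[1].strip
    end
  end
  nil
end

# Illustrative config contents, not copied from a real server.
sample = <<~CONFIG
  [core]
      bare = true
  [gitlab]
      fullpath = it/supportteam/ticketsystem
CONFIG

puts gitlab_fullpath(sample)
```

To use it against a real repository, you could pass the result of File.read on the config file inside the *.git directory.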
Hashed object pools
Object pools are repositories used to deduplicate forks of public and internal projects and contain the objects from the source project. Using objects/info/alternates, the source project and forks use the object pool for shared objects. For more information, see How Git object deduplication works in GitLab.
Objects are moved from the source project to the object pool when housekeeping is run on the source project. Object pool repositories are stored similarly to regular repositories in a directory called @pools instead of @hashed:
# object pool paths
"@pools/#{hash[0..1]}/#{hash[2..3]}/#{hash}.git"
WARNING:
Do not run git prune or git gc in object pool repositories, which are stored in the @pools directory. This can cause data loss in the regular repositories that depend on the object pool.
Group wiki storage
Unlike project wikis that are stored in the @hashed directory, group wikis are stored in a directory called @groups.
Like project wikis, group wikis follow the hashed storage folder convention, but use a hash of the group ID rather than the project ID.
For example:
# group wiki paths
"@groups/#{hash[0..1]}/#{hash[2..3]}/#{hash}.wiki.git"
Gitaly Cluster storage
If Gitaly Cluster is used, Praefect manages storage locations. The internal path used by Praefect for the repository differs from the hashed path. For more information, see Praefect-generated replica paths.
Object storage support
This table shows which objects can be stored in each storage type:
Storable object | Hashed storage | S3 compatible |
---|---|---|
Repository | Yes | - |
Attachments | Yes | - |
Avatars | No | - |
Pages | No | - |
Docker Registry | No | - |
CI/CD job logs | No | - |
CI/CD artifacts | No | Yes |
CI/CD cache | No | Yes |
LFS objects | Similar | Yes |
Repository pools | Yes | - |
Files stored in an S3-compatible endpoint can have the same advantages as hashed storage, as long as they are not prefixed with #{namespace}/#{project_name}. This is true for CI/CD cache and LFS objects.
Avatars
Each file is stored in a directory that matches the id assigned to it in the database. The filename is always avatar.png for user avatars. When an avatar is replaced, the Upload model is destroyed and a new one takes its place with a different id.
CI/CD artifacts
CI/CD artifacts are S3-compatible.
LFS objects
LFS Objects in GitLab implement a similar storage pattern using two characters and two-level folders, following the Git implementation:
"shared/lfs-objects/#{oid[0..1]}/#{oid[2..3]}/#{oid[4..-1]}"
# Based on object `oid`: `8909029eb962194cfb326259411b22ae3f4a814b5be4f80651735aeef9f3229c`, path will be:
"shared/lfs-objects/89/09/029eb962194cfb326259411b22ae3f4a814b5be4f80651735aeef9f3229c"
LFS objects are also S3-compatible.
Configure where new repositories are stored
After you configure multiple repository storages, you can choose where new repositories are stored:
- On the left sidebar, at the bottom, select Admin Area.
- Select Settings > Repository.
- Expand Repository storage.
- Enter values in the Storage nodes for new repositories fields.
- Select Save changes.
Each repository storage path can be assigned a weight from 0-100. When a new project is created, these weights are used to determine the storage location the repository is created on.
The higher the weight of a given repository storage path relative to other repository storage paths, the more often it is chosen ((storage weight) / (sum of all weights) * 100 = chance %).
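To make the formula concrete, the following sketch converts a set of weights into selection chances. storage_chances is a hypothetical helper and the storage names and weights are example values. It only applies the formula above and does not reproduce how GitLab actually picks a storage.

```ruby
# Hypothetical helper: convert storage weights into selection chances (percent)
# using (storage weight) / (sum of all weights) * 100.
def storage_chances(weights)
  total = weights.values.sum
  # Avoid dividing by zero when every weight is 0.
  return weights.transform_values { 0.0 } if total.zero?

  weights.transform_values { |weight| weight * 100.0 / total }
end

# Example weights, chosen for illustration only.
p storage_chances('default' => 75, 'storage2' => 25)
```

With these example weights, default is chosen about three times as often as storage2.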
By default, if repository weights have not been configured earlier:
- default is weighted 100.
- All other storages are weighted 0.
NOTE:
If all storage weights are 0 (for example, when default does not exist), GitLab attempts to create new repositories on default, regardless of the configuration or if default exists. See the tracking issue for more information.
Move repositories
To move a repository to a different repository storage (for example, from default to storage2), use the same process as migrating to Gitaly Cluster.