Understanding GitLab Pipelines - Creating CI/CD Flow

CI and CD are concepts that have been around for a long time, but the specifics of implementing them can often be murky. Continuous integration and delivery are critical components of modern software development, and GitLab has some fantastic features for creating pipelines. GitLab’s documentation is excellent, but when I learn, I need to understand the fundamental principles first, because they let me build further knowledge on top of them. This article covers what GitLab CI/CD pipelines do and looks at some of the best practices to keep in mind when setting up your own continuous integration or delivery pipeline.

The example project is agnostic of language and technology, so anyone can follow it.

CI/CD (Continuous Integration/Continuous Delivery)

A CI/CD pipeline cannot realistically run on your local machine alone.

Continuous integration won’t work because you would have to integrate everything yourself: pull changes from the remote repository whenever you decide to, and trigger the validating commands, e.g., running unit tests, by hand.

Continuous deployment also won’t work because you would have to trigger all deployment tasks manually, which is very risky: the environment where you run the commands may change over time, and build results may differ.

What you need is a predictable environment where all jobs can run. GitLab provides exactly that.

GitLab is a DevOps platform. It is not just a Git repository but also a set of tools that allows you to run unit tests, execute a build job, deploy an application, and much more. In short, it enables you to configure CI/CD pipelines.

Since it is a web application, any team member can access it, which enables the collaboration that the software development lifecycle depends on.

To define a CI/CD pipeline, create a .gitlab-ci.yml file in the root directory of your repository. This YAML file, which GitLab interprets, describes the pipeline configuration, including its stages and jobs.

The pipeline consists of stages and jobs. As the name suggests, a job is a particular piece of work to do, e.g., building your application. Jobs that can run at the same time are grouped into the same stage. In the example pipeline, there are four stages: Test, Build, Image, and Deploy. Each of them runs its own jobs, e.g., the Test stage contains the jobs Test API and Test UI.
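As a rough illustration, a pipeline with these four stages might be declared as follows. This is only a minimal sketch, not the configuration of the example project: the job names (test-api, test-ui, build-app, build-image, deploy-app) and the echo commands are placeholders assumed for illustration.

```yaml
# .gitlab-ci.yml - hypothetical sketch; job names and commands are placeholders
stages:            # stages run one after another, in this order
  - test
  - build
  - image
  - deploy

test-api:          # both test jobs belong to the "test" stage and may run in parallel
  stage: test
  script:
    - echo "run API tests here"

test-ui:
  stage: test
  script:
    - echo "run UI tests here"

build-app:
  stage: build
  script:
    - echo "build the application here"

build-image:
  stage: image
  script:
    - echo "build the Docker image here"

deploy-app:
  stage: deploy
  script:
    - echo "deploy the application here"
```

By default, a job in a later stage starts only after all jobs in the previous stage have succeeded.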

More on CI/CD pipelines here: https://docs.gitlab.com/ee/ci/introduction/#gitlab-cicd-workflow

GitLab Pipelines Explained

Jobs in pipelines depend on each other. A pipeline job may require some input, produce some output, use a cache, and need some configuration.

If you want to run pipelines on your own, you need a better understanding of artifacts and cache, which allow jobs to share data, as well as environment variables, which let jobs work according to the provided configuration.

Artifacts

To deploy an application, you probably need Docker images. To create Docker images, you need to build your application. To build your application, you have to be sure that unit tests pass, etc.

If you preview the dependency graph for any GitLab project, you may notice that jobs depend on each other. Defining such dependencies is required because a single job doesn’t do the whole job on its own.

A job needs some input from a previous job and produces some output for subsequent jobs. This data, which is passed between jobs, is called an artifact.

Using artifacts allows a job to prepare a small portion of data, archive it and pass it to another job. Thanks to that, jobs have low complexity because they are responsible only for a single thing.
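A minimal sketch of one job handing its output to the next could look like this. The job names, the dist directory, and the commands are assumptions made for illustration only.

```yaml
# Hypothetical sketch: build-app hands its output to package-app
stages:
  - build
  - package

build-app:
  stage: build
  script:
    - mkdir -p dist
    - echo "compiled output" > dist/app.txt   # placeholder for a real build command
  artifacts:
    paths:
      - dist/            # archived and uploaded to GitLab when the job finishes
    expire_in: 1 week    # optional: how long GitLab keeps the archive

package-app:
  stage: package
  script:
    - cat dist/app.txt   # artifacts from earlier stages are restored before the script runs
```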

More on artifacts here: https://docs.gitlab.com/ee/ci/pipelines/job_artifacts.html

Cache

For optimization purposes, you may use a cache. The cache is excellent for storing downloaded external dependencies, like 3rd-party libraries.

If your list of dependencies has not changed, there is no need to download dependencies for each test run or build. In contrast to artifacts, the cache is shared between pipelines. More on caching here: https://docs.gitlab.com/ee/ci/caching/
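As a sketch, a job could cache its dependency directory and tie the cache key to the file that lists the dependencies. The job name, the package.json file, and the node_modules directory are assumptions chosen for illustration; any dependency manifest and download directory would work the same way.

```yaml
# Hypothetical sketch: cache downloaded dependencies, keyed on the dependency manifest
install-dependencies:
  script:
    - echo "install dependencies into node_modules/ here"   # placeholder command
  cache:
    key:
      files:
        - package.json     # assumed file; the cache key changes when this file changes
    paths:
      - node_modules/      # saved after the job and restored before later runs
```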

CI/CD Variables

The repository is accessible to all team members, so not everything should be stored there. This applies especially to secrets, e.g., credentials, connection strings, etc.

To protect such sensitive data, in GitLab, you can define environment variables that may be accessed only by project maintainers. Variables may be accessed in the .gitlab-ci.yml file using the variable key, without exposing its value.

Additionally, you may also want to use information about your build environment, e.g., allow running deploy jobs only on the main branch, etc. To do so, you can use some predefined variables.
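For example, a deploy job could combine a secret defined in the project settings with a predefined variable, roughly like this. The variable name DEPLOY_TOKEN and the deploy.sh script are hypothetical; CI_COMMIT_BRANCH is one of GitLab’s predefined variables.

```yaml
# Hypothetical sketch: a deploy job that uses a secret and runs only on main
deploy-app:
  stage: deploy
  script:
    # DEPLOY_TOKEN is an assumed variable defined in the project's CI/CD settings;
    # it is available to the script as an environment variable.
    - ./deploy.sh          # placeholder script that reads $DEPLOY_TOKEN
  rules:
    - if: $CI_COMMIT_BRANCH == "main"   # predefined variable holding the branch name
```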

GitLab CI/CD variables are a great solution that allows you to customize your CI/CD pipelines and protect your secrets.

More details on CI/CD variables: https://docs.gitlab.com/ee/ci/variables/

How Does GitLab Do Its Job?

At first glance, GitLab pipelines may look like a black box, but they are a little less scary once you understand how they use artifacts, cache, and CI/CD variables.

You probably know how to compile, run tests, or build your project on a local machine. You also know how to build and run Docker images on your machine.

How is it possible that a web application like GitLab does such a tremendous job?

The answer is GitLab Runner and its executors.

How Does GitLab Runner Work?

GitLab Runner is an open-source service written in Go that is responsible for running your pipeline. It communicates with GitLab and delegates jobs to executors.

Each runner, when it becomes available, sends requests to the GitLab instance, asking to be assigned jobs.

GitLab, when asked, is responsible for dividing work between runners. Still, most of the work is done by runners and executors, which is good because it allows sharing the workload between multiple servers.

An interesting fact: in the beginning, the runner was written in Ruby, so it required Ruby with all its dependencies. It was heavy.

In 2015, Kamil Trzciński created his own runner as a side task. He wrote it in Go, which is great for multi-tasking and parallelization. GitLab noticed this brilliant solution, and right now it is the default runner used by GitLab. You can learn more about it from Kamil’s presentation on YouTube. Kamil works for GitLab now.

Are GitLab Runners Safe?

By default, you may want to use shared runners provided by GitLab. You may worry that using runners installed on servers managed by GitLab is risky because your source code may leak. This is a reasonable concern.

Instead of using shared runners, however, you can use your own runners installed on your machine. It is a better solution for performance and security reasons.

You can register several runners and use them all the time without the usage limits defined by GitLab, which right now are 400 minutes per month in the free tier. It means that you can collaborate with your team members without any unwanted interruptions, which is necessary for continuous integration.

For beginners, I recommend using your own GitLab Runner installed as a Docker container because it is great for fast prototyping.

Just as important, with your own runners, GitLab CI/CD runs on the machines of your choice. This means that the whole source code is downloaded only to machines managed by you. Runners managed by GitLab are surely secure, but still - you never know.

What Are GitLab Executors?

An executor is a service that receives assignments from the runner and executes jobs defined in .gitlab-ci.yml. There are several types of executors, which lets you select the environment where a job is executed. The simplest one is the shell executor, which uses a shell on the machine where the runner is installed - it means that it may be your laptop. Unfortunately, the shell executor does not guarantee a clean environment for each job and requires manual installation of the necessary software.

For beginners, I recommend the Docker executor, which runs each job in a fresh container and therefore guarantees a clean environment. You choose the executor when registering your runner in GitLab.
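With the Docker executor, the image keyword in .gitlab-ci.yml selects the container image in which a job runs. A minimal sketch, with a hypothetical job name:

```yaml
# Hypothetical sketch: with the Docker executor, the job runs inside an alpine container
print-os:
  image: alpine
  script:
    - cat /etc/os-release   # prints information about the container's operating system
```

Because the container is created from the image for each job, software installed by one job does not leak into the next one.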

GitLab Runner Execution Flow

Let’s see how runners and executors collaborate with GitLab by looking at the GitLab Runner execution flow. This is how an entire pipeline is executed.

First of all, GitLab must be aware that the runner exists and, for security reasons, must be sure that it is managed by the owner of the repository. To gain access to the repository, the runner must be registered using the token provided by GitLab. It is a pretty straightforward process. As I already mentioned, I recommend using GitLab Runner installed in a container.

Then the runner asks for jobs to execute. If there is something to do, the runner downloads the job details and triggers an executor. The executor clones the Git repository, downloads artifacts, and executes the jobs defined in .gitlab-ci.yml.

When the executor is done, it uploads the job output and job status to GitLab.

GitLab Pipeline Jobs

To better understand GitLab runners, let’s look at an example GitLab repository with a pipeline and jobs defined in .gitlab-ci.yml. The pipeline consists of four jobs: update artifact, update cache, update C&A, and check, which verifies the final status of the pipeline.

Each job executes the same commands, which display the content of the working directory and show the content of cache and artifact files.

- ls # show all files and directories in working directory
- if test -f ./cache-file; then cat ./cache-file; fi; # cat content of cache-file
- if test -f ./artifact-file; then cat ./artifact-file; fi; # cat content of artifact-file

For the sake of simplicity, I use only one file as cache and one file as an artifact, but you can also define a whole directory.

Additionally, jobs have their own responsibilities. Job update artifact adds some text to artifact-file:

- echo "artifact updated in job 1A, secret [$SECRET_KEY], on branch [$CI_COMMIT_BRANCH]">>./artifact-file

Job update cache also adds some text, but to cache-file:

- echo "cache created in job 1B, secret [$SECRET_KEY], on branch [$CI_COMMIT_BRANCH]">>./cache-file

Job update C&A updates both cache and artifact:

- echo "update artifact in job 2A, secret [$SECRET_KEY], on branch [$CI_COMMIT_BRANCH]">>./artifact-file
- echo "update cache in job 2A, secret [$SECRET_KEY], on branch [$CI_COMMIT_BRANCH]">>./cache-file

Note the usage of the predefined variable CI_COMMIT_BRANCH and the environment variable SECRET_KEY, which is defined in the repository’s CI/CD settings. Variables are injected into Docker containers when the pipeline runs.

That’s basically it. As you can see, the jobs are straightforward: the only responsibility of each job is to create or update files. Still, this is enough for further explanation.
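Putting the fragments above together, the whole file could look roughly like the sketch below. Only the job names and the commands come from the description above. The stage names, the use of default and before_script for the shared settings, and the body of the check job are my assumptions, so treat this as a hypothetical reconstruction rather than the exact file from the repository.

```yaml
# Hypothetical reconstruction; stage names, the "default" section, and the
# body of the "check" job are assumptions.
stages:
  - first
  - second
  - check

default:
  image: alpine
  before_script:             # commands shared by every job
    - ls
    - if test -f ./cache-file; then cat ./cache-file; fi;
    - if test -f ./artifact-file; then cat ./artifact-file; fi;
  cache:
    key:
      files:
        - cache-key          # the cache key is calculated from this file
    paths:
      - cache-file
  artifacts:
    paths:
      - artifact-file

"update artifact":
  stage: first
  script:
    - echo "artifact updated in job 1A, secret [$SECRET_KEY], on branch [$CI_COMMIT_BRANCH]">>./artifact-file

"update cache":
  stage: first
  script:
    - echo "cache created in job 1B, secret [$SECRET_KEY], on branch [$CI_COMMIT_BRANCH]">>./cache-file

"update C&A":
  stage: second
  script:
    - echo "update artifact in job 2A, secret [$SECRET_KEY], on branch [$CI_COMMIT_BRANCH]">>./artifact-file
    - echo "update cache in job 2A, secret [$SECRET_KEY], on branch [$CI_COMMIT_BRANCH]">>./cache-file

check:
  stage: check
  script:
    - echo "all previous jobs succeeded"   # assumed placeholder command
```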

You can say that this is not a practical application for GitLab pipelines, but that’s what most of the pipelines do in real projects. Based on source code, build files are created - it may be a jar file, dist directory, or docker image, but still, it is only a set of files.

GitLab Runner Step by Step

GitLab Runner may run multiple jobs defined in .gitlab-ci.yml. Let’s see how a specific runner operates by looking at the very first job executed in the pipeline. In the job logs, you can find several interesting entries.

Download a Docker image inside which the whole job is executed - in this case, it is a very lightweight Linux image called alpine.

Using Docker executor with image alpine ...
Pulling docker image alpine ...

Clone the Git repository and mount it inside the Docker container.

Getting source from Git repository
Created fresh repository.
Checking out 0e585cb3 as main…

Restore the cache based on the key, which is calculated from the file cache-key in the repository. In real-life projects, cache keys could be calculated from build.gradle or package.json. By default, the cache is stored where GitLab Runner is installed - so there is no need to download the cache from any external server.

Checking cache for 1069cd7938a4180b9c6b962f6476ed257d5c2a35...
Successfully extracted cache

Note that there are no job artifacts to download at this point because this is the very first job in the pipeline. Jobs in later stages download the artifacts produced by earlier jobs in the same pipeline before running their script.

Execute the shell script defined in the YAML file.

Execute job defined in gitlab-ci.yml
Executing "step_script" stage of the job script
Using docker image sha256:14119a... for alpine

After the job is done, preserve the updated cache for future use.

Saving cache for successful job
Creating cache a9406f87e1a5d81018be360ad0ca2f3f0b38fa37...
No URL provided, cache will be not uploaded to shared cache server. Cache will be stored only locally.

After the job is done, upload job artifacts to make them available in the GitLab UI.

Uploading artifacts...
./artifact-file: found 1 matching files and directories
Uploading artifacts as "archive" to coordinator...

That’s basically it. If the exit code is zero, it means that the job succeeded.

Cleaning up project directory and file based variables
Job succeeded

Artifacts and Cache Operations

Let’s see how artifacts and cache files (that are not under source control) are used by GitLab pipelines based on the example below.

There are three pipelines in this repository, which are started on the same main branch.

First Pipeline

The first pipeline is the very first one started in the repository, so it runs in a brand new environment, with no cache and no artifacts.

Let’s see the output of the very first run of the update artifact job. The same applies to the very first run of the update cache job.

$ ls
README.md
cache-key
$ if test -f ./cache-file; then cat ./cache-file; fi;
$ if test -f ./artifact-file; then cat ./artifact-file; fi;
$ echo "artifact updated in job 1A, secret [$SECRET_KEY], on branch [$CI_COMMIT_BRANCH]">>./artifact-file

As you can see, there are only two files in the working directory - these are the files from the repository: README.md and cache-key. There is neither a cache-file nor an artifact-file.

The situation is different for the subsequent job, update C&A.

$ ls
README.md
artifact-file
cache-file
cache-key
$ if test -f ./cache-file; then cat ./cache-file; fi;
cache created in job 1B, secret [secret_value], on branch [main]
$ if test -f ./artifact-file; then cat ./artifact-file; fi;
artifact updated in job 1A, secret [secret_value], on branch [main]

As you can see, there are two additional files:

artifact-file with content artifact updated in job 1A, secret [secret_value], on branch [main]
cache-file with content cache created in job 1B, secret [secret_value], on branch [main]

It means that both cache and artifact files are accessible to subsequent jobs in the pipeline.

Second Pipeline

The second pipeline has access only to cache files. Look at the output of the first job in the second pipeline - update artifact:

$ ls
README.md
cache-file
cache-key
$ if test -f ./cache-file; then cat ./cache-file; fi;
cache created in job 1B, secret [secret_value], on branch [main]
update cache in job 2A, secret [&SECRET_KEY], on branch [main]
$ if test -f ./artifact-file; then cat ./artifact-file; fi;

Only the cache-file is available. It means that the cache is shared between pipelines, but artifacts are not.

Third Pipeline

After the second pipeline finished, the file cache-key was updated. It means that the cache was invalidated and couldn’t be used anymore.

Let’s look at the third pipeline. After the cache-key update, the update artifact job didn’t have access to the outdated cache, as you can see below:

$ ls
README.md
cache-key
$ if test -f ./cache-file; then cat ./cache-file; fi;
$ if test -f ./artifact-file; then cat ./artifact-file; fi;

There were only two files in the working directory, as at the beginning.

GitLab CI/CD Is for Humans

Getting acquainted with the GitLab Runner execution flow allowed me to understand that running pipelines on GitLab is just running the same commands that I could run on my local machine.

The main difference is that GitLab pipelines allow collaboration and provide a clean environment for each build because Docker images may be used. An environment that ensures repeatability is an essential requirement of a software development lifecycle with CI/CD configured.

Thanks to artifacts, the results of commands are available to subsequent jobs, and thanks to the cache, the same data does not have to be downloaded over and over again.

If you understand the basic concepts of GitLab pipelines, feel free to clone this repository and run some experiments on your new CI/CD project. You can run some jobs only on a specific branch; you can open merge requests and run jobs after merging. You could detect code changes and run some jobs if there were changes in particular directories.
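As a starting point for such experiments, the rules keyword can express all three conditions. The sketch below is hypothetical: the job names and the docs directory are assumptions, while CI_COMMIT_BRANCH and CI_PIPELINE_SOURCE are GitLab’s predefined variables.

```yaml
# Hypothetical sketch: three ways to limit when a job runs
deploy-main:
  script:
    - echo "runs only for commits on main, e.g., after a merge"
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

test-merge-request:
  script:
    - echo "runs only in merge request pipelines"
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

build-docs:
  script:
    - echo "runs only when files under docs/ changed"
  rules:
    - changes:
        - docs/**/*          # assumed directory
```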

About the author

Wojciech Marusarz

Software Engineer

Wojciech enjoys working with small teams where the quality of the code and the project's direction are essential. In the long run, this allows him to have a broad understanding of the subject, develop personally and look for challenges. He deals with programming in Java and Kotlin. Additionally, Wojciech is interested in Big Data tools, making him a perfect candidate for various Data-Intensive Application implementations.
