- Sun 25 July 2021
- server admin
- Gaige B. Paulsen
- #server admin, #gitlab, #ansible
In June, I mentioned in an article about Docker on SmartOS that we are doing some work with GitLab these days as a replacement for my venerable Gitolite server (and, to an increasing extent, Jenkins).
Deploying from Pelican
I'm likely going to write more on GitLab in the near future, but for now, I'd like to document some things I've learned about deploying with GitLab.
This blog is deployed in a semi-automated fashion. As mentioned
previously,
it is compiled using pelican
and served as static pages using nginx.
As such, once modifications are made, I'm ready to verify that they look OK and work correctly on the Stage Server; once I'm happy with that deployment, it's time to push to production.
Historically, I started out by doing a complete rebuild of the server serving
up the pages. That got tedious if I was writing a lot of posts (or, at least,
if I was writing posts more frequently than OS and nginx releases).
Eventually, I modified my ansible scripts so that they had a publish tag,
which would skip the re-provisioning process and the building of new certs, etc.,
and just deploy the latest Pelican, build the pages, and reset the cache. In fact,
it would do so in a separate directory, so that it could flash-cut the web pages.
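For a sense of what that looks like, here is a minimal sketch of tasks carrying such a tag; the task names, paths, and the build_id variable are illustrative assumptions rather than my actual playbook:
# Hypothetical tasks tagged "publish" -- running with `-t publish` executes
# only these, skipping provisioning and certificate tasks that carry other tags
- name: Build the site into a fresh directory
  command: pelican content -o /var/www/releases/{{ build_id }}
  tags: [publish]

- name: Flash-cut by repointing the live symlink at the new build
  file:
    src: "/var/www/releases/{{ build_id }}"
    path: /var/www/current
    state: link
  tags: [publish]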
While rolling out GitLab, I started playing with the CI tools and realized there was a lot I could do with it, much of it more easily than I could with Jenkins. As such, an automatic build to stage followed by a manually-triggered build to production was simple to configure.
So, I set out on my next automation journey with GitLab...
Access control
One nice thing about running the CI under the rubric of the SCM is that you can grant permissions to do source-related things just from the SCM. This makes it simple to pull from multiple repositories and perform other SCM-specific tasks.
However, this doesn't specifically extend beyond the CI and SCM and into the deployment. So, my next question was how to control access to the hosts and make sure that I could control them, and retrieve the code without trouble.
Further, I wanted to re-use the ansible playbooks that I used to deploy the systems (albeit with tags to reduce the plays), while limiting access to the stage and production servers (not the SmartOS global zones they're deployed from). Since I was reusing these mechanisms, I wanted to leave the existing ssh-based access controls in place.
As an aside, I could now switch my deployment method for git repositories to using deployment or personal access tokens, but I'd rather not right now.
SSH solution
My existing deployment pattern automatically deals with what I refer to as
ssh_access_keys
, which are SSH keys that are used for root access to the
servers. These are generally used infrequently (there are separate deployment
keys that are multi-server), but when accessing only the VM, the
ssh_access_keys
are precisely the right tool.
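As a sketch of the pattern (not my exact playbook), applying a list variable like ssh_access_keys is a single looped task, assuming the ansible.posix collection is available:
# Hypothetical task: install each entry of ssh_access_keys as a root authorized key
- name: Install root access keys
  ansible.posix.authorized_key:
    user: root
    key: "{{ item }}"
    state: present
  loop: "{{ ssh_access_keys }}"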
When running on the CI server, I need to adopt the ssh key as part of the
CI process, and I use ssh-agent
to do that (one agent per running CI
process, segregated by the socket/pid combination). It's simple to start this by
using:
eval $(ssh-agent -s)
This creates the agent and sets the shell variables so the agent is reachable.
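The output being eval'd is just a few shell assignments, roughly like this (the socket path and pids vary per run):
SSH_AUTH_SOCK=/tmp/ssh-XXXXXXXX/agent.12345; export SSH_AUTH_SOCK;
SSH_AGENT_PID=12346; export SSH_AGENT_PID;
echo Agent pid 12346;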
Then comes the real trick: loading the ssh key into the agent. I had a vague recollection that it was possible to load a key from a shell variable, and here's how to do it:
echo "$HOST_DEPLOY_KEY" | tr -d '\r' | ssh-add -
Ansible ssh control paths
While getting this put together, I ran across an issue with the length of the
path for ANSIBLE_SSH_CONTROL_PATH
, which is used by SSH to persist connections
(in our configuration). Especially on Solaris (and derivatives, like SmartOS),
there's a limit on the length of the control file's path, and that caused a problem
with the relatively deep nesting that gitlab runners use for their paths. The solution
was to define a bespoke path:
export ANSIBLE_SSH_CONTROL_PATH_DIR=/tmp/${CI_JOB_ID}-${CI_COMMIT_SHORT_SHA}/.ansible/cp
Note that this path is in /tmp
, not in ~
and certainly not in the build directory;
however, it does change for every job and repo.
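The same setting can live in ansible.cfg instead of the environment; I stick with the environment variable because it lets me fold the CI job variables into the path, but a static equivalent would look roughly like this:
# ansible.cfg equivalent (no per-job component, hence my preference for the env var)
[ssh_connection]
control_path_dir = /tmp/.ansible/cp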
Final gitlab script
script:
  - eval $(ssh-agent -s)
  - echo "$HOST_DEPLOY_KEY" | tr -d '\r' | ssh-add -
  - export HOME=$(pwd)
  - export ANSIBLE_SSH_CONTROL_PATH_DIR=/tmp/${CI_JOB_ID}-${CI_COMMIT_SHORT_SHA}/.ansible/cp
  - 'git config --global url."https://gitlab-ci-token:${CI_JOB_TOKEN}@your.git.server/".insteadOf git@your.git.server:'
  - git clone https://gitlab-ci-token:${CI_JOB_TOKEN}@your.git.server/playbooks/ansible-web.git
  - cd ansible-web
  - ansible-galaxy install -r requirements.yml -f
  - ansible-playbook -i stage -t publish --vault-password-file $VAULT_SECRET -e cert_renew_days=0 pelican.yml
Putting it all together:
- set up SSH (lines 1 & 2)
- set HOME so that we're not stomping on another cache; this may not be necessary if you can guarantee that only one runner will be running at a time in each account (line 3)
- set the ansible control path (line 4)
- rewrite our git URLs globally (line 5)
- check out our ansible playbook repository (line 6)
- change into the checkout and install the ansible galaxy requirements (lines 7 & 8)
- run the playbook against our stage server (line 9)
You might be wondering about line 5, where we use an interesting feature of git to rewrite the URLs. This might not be explicitly necessary if I were to allow the ssh key that I use for deployment access to all of my dependencies in my git repo. However, I've left it that way for future compatibility and because it confines this particular script to being run by the CI server.
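For reference, that git config invocation just writes an insteadOf mapping into $HOME/.gitconfig (which, with HOME pointed at the working directory, stays local to this job); the resulting entry looks roughly like this, with the job token expanded in place:
[url "https://gitlab-ci-token:<job token>@your.git.server/"]
    insteadOf = git@your.git.server: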
So, for those keeping score: the GitLab server runs this script on a SmartOS host running the gitlab runner, and thus ansible runs on SmartOS. Theoretically, the ansible could run on basically anything (my Jenkins versions of this ran on macOS Jenkins nodes), but our provisioning is done from SmartOS these days, so keeping things the same is a good thing.
Manually-triggered releases
I mentioned in the beginning that I was going to be manually triggering the release to production. This is done using a rule in the GitLab CI configuration:
deploy-prod:
  tags: [ansible]
  stage: prod
  environment:
    name: production
    url: https://${SERVER}
  script:
    - "... see above ..."
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
      when: manual
This job definition requires that the runner be tagged ansible
, names the stage
prod
, sets up a production environment with the URL pointing at our final
server, includes the script above, and then, only on the main branch, holds
for manual release.
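For completeness, the automatic deployment to stage is the same shape minus the manual gate. The job below is only a sketch: the stage names, the staging environment, and the STAGE_SERVER variable are my illustrative assumptions (and the real configuration also needs a top-level stages: list):
stages: [stage, prod]

deploy-stage:
  tags: [ansible]
  stage: stage
  environment:
    name: staging
    url: https://${STAGE_SERVER}
  script:
    - "... see above ..."
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH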
Script locations
One additional note I'll make is that I made some potentially interesting
decisions on where to place the gitlab scripts. Since I tend to have multiple
hosts (or groups) using the same ansible plays, I knew I wanted
a place to share the scripts calling them. Their requirements tend to be more
aligned with the ansible playbooks than the code that is deployed. As such, I
placed the gitlab-ci jobs as templates in my ansible playbook repositories in
a gitlab-deploy
directory. I aligned the names with the playbooks.
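The resulting layout of the playbook repository ends up looking something like this (illustrative):
playbooks/ansible-web/
├── pelican.yml          # the playbook itself
├── requirements.yml     # galaxy requirements installed by the CI script
└── gitlab-deploy/
    └── pelican.yml      # CI job template, named to match the playbook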
To call these, I use the include
directive in the .gitlab-ci.yml
files for
the repositories I'm deploying:
# This will work, but not on the python runner (yet)
include:
  - project: 'playbooks/ansible-web'
    file: 'gitlab-deploy/pelican.yml'

variables:
  SERVER_GROUP: gaiges_pages
  SERVER: www.gaige.net
Note the additional variables. Since this deployment script is used by both the Gaige's Pages and Cartographica blogs, I needed a way to pass in the server and group names.
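Inside the shared template, those show up as ordinary CI variables, so the job can use them wherever it needs the host-specific bits. A sketch of the relevant lines (the -l limit here is my illustration of the idea, not necessarily the exact invocation in my template):
  environment:
    url: https://${SERVER}
  script:
    - ansible-playbook -i stage -l "${SERVER_GROUP}" -t publish --vault-password-file $VAULT_SECRET -e cert_renew_days=0 pelican.yml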