- Sat 24 April 2021
- server admin
- Gaige B. Paulsen
- #server admin, #programming, #ansible
Quite a while back, RS wrote a comprehensive ansible role for handling
Let's Encrypt certificate issuance and renewal.
We both use this role extensively, which is why it was a significant issue when it
suddenly started throwing type errors deep inside the dnspython library during an
nsupdate call in a critical part of the script.
A cursory examination of the component parts indicated that the most likely cause was
a change to the dnspython library, which had recently been upgraded from 1.16 to 2.0.
Although there wasn't anything we could find online indicating other people had suffered
this breakage (which should have been a clue), it hadn't been out very long, it crashed
in a module that indicated it was checking something with IPv6, we use a lot of IPv6 on
our systems, many people use no IPv6, and well, we hadn't changed anything...
This was an annoyance, but relatively easy to avoid in one of the following ways:
- Pin the dnspython libraries to <2.0 in pip
- On the Mac, use brew's ansible and manually roll back the dnspython libraries in the installed version[1]
I used both, as we ran ansible on both SmartOS and macOS.
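With a pin like this in place, it can be worth failing fast when an unpinned environment sneaks through. The following is a minimal sketch and not part of the role: the helper names is_pinned and installed_dnspython are hypothetical, and it assumes only the major version of the installed dnspython distribution matters.

```python
from importlib import metadata


def is_pinned(version: str, max_major: int = 2) -> bool:
    """Return True if `version` (e.g. "1.16.0") has a major number below `max_major`."""
    major = int(version.split(".")[0])
    return major < max_major


def installed_dnspython():
    """Return the installed dnspython version string, or None if it is absent."""
    try:
        return metadata.version("dnspython")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_dnspython()
    if found is None:
        print("dnspython is not installed in this environment")
    elif is_pinned(found):
        print(f"dnspython {found} satisfies the <2.0 pin")
    else:
        print(f"dnspython {found} is 2.0 or newer; the pin is not in effect")
```

Run with the same interpreter that ansible uses (for a brew install, the one inside that formula's Cellar directory), otherwise it reports on the wrong site-packages.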
Taking brew hackery up a notch
After maintaining this for a while, I needed to upgrade some modules in ansible, and
needed to keep my CI environment (running on Macs under Jenkins) in sync with what we
were running on my desktop, laptop, and servers. That led me to create my own tap in
homebrew by cloning the standard ansible formula and using my own repository.
The addition of this tap meant that I could configure and test it once, then deploy it on all of my Macs (and anyone else with access to the tap on my private git server could do the same).
One thing leads to another
After a few more months of using this tap on my Macs (and slowly moving ahead the ansible
version on the SmartOS machines, but keeping dnspython pinned), I needed to upgrade the
version of ansible at home (due to a project that I'll likely write about later, using
ansible to configure my Jenkins agents). The driver here was the need to execute homebrew
commands on an M1 Mac, something that didn't work out of the box with ansible 2.9, which
is what I was pinned to.
Ever-hopeful, I first decided to see if my aforementioned problem was "fixed" by unlinking[2]
my private tap's version of ansible and installing homebrew's version.
Sadly, running the ansible playbook just resulted in the familiar crash. I looked at it
for a few minutes, decided the bug that was introduced in summer 2020 was still there, and
set about building a new tap for version 3.2.0 of ansible. This went smoothly, but after
updating my formula, installing took a long time, on the order of a few minutes. Why was
the standard homebrew install so much faster?
A bottle for monsieur?
A quick investigation revealed that most brew packages are installed these days using bottles, or pre-built versions of the entire subdirectory that ends up in the Cellar. That seemed like a significant win, especially since I was going to install this at least 5 times each update, so I decided to figure out how to create my own custom bottles for my custom tap.
Thanks to a good article on Custom Tap and Bottles with Homebrew by Yehowshua Immanuel, I was on my way quickly after rebuilding from my tap formula once for each platform of Mac that I run (Intel Catalina, Intel Big Sur, and ARM Big Sur at this time).
The final verdict
After all this work, and getting a great solution in place for working around the
perceived bug in dnspython, I took another quick look at the bug that was popping up in
our role. I'd contributed to random python projects in the past and also contributed to
ansible directly, so I was familiar with the process and figured I could track the
problem down. I fired up PyCharm to get a little better perspective on the particular
bug and settled in to build a minimal reproduction of the problem with the nsupdate
command in ansible.
A few minutes (literally) into the investigation, I found myself looking at what
seemed like completely reasonable arguments to the dns.query.tcp method, which were
raising exceptions because it could not determine whether my hostname was an IPv4
or IPv6 address. I immediately checked the current docs for nsupdate in ansible and,
indeed, the server argument is now designated an IP address (v4 or v6). Checking whether
we'd just been lucky and ignoring this all along, I went back to the ansible 2.9
documentation and verified that it was silent on the issue of what was in the string argument.
At some point between 2.9 of ansible and 3.0, they documented the change caused by
the underlying library and I missed that change.
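One way to sidestep this is to resolve the hostname before it ever reaches the module. The sketch below uses only the Python standard library. The helper to_ip_literal is hypothetical (it is not part of ansible or dnspython), and taking the first address the resolver returns is an assumption that may not suit a multi-homed server.

```python
import ipaddress
import socket


def to_ip_literal(server: str) -> str:
    """Return an IP literal for `server`.

    An IPv4/IPv6 literal is returned in normalized form; anything else is
    treated as a hostname and resolved through the system resolver.
    """
    try:
        # ip_address() raises ValueError for anything that is not a literal.
        return str(ipaddress.ip_address(server))
    except ValueError:
        pass
    # getaddrinfo yields (family, type, proto, canonname, sockaddr) tuples;
    # sockaddr[0] is the address string for both AF_INET and AF_INET6.
    infos = socket.getaddrinfo(server, 53, proto=socket.IPPROTO_TCP)
    return infos[0][4][0]


if __name__ == "__main__":
    # Literals pass straight through without touching the resolver.
    print(to_ip_literal("127.0.0.1"))
    print(to_ip_literal("2001:0db8::0001"))
```

The returned string is the kind of value the server argument of nsupdate now expects, so a hostname can stay in the inventory and be converted just before the call.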
A few take-aways:
- Once again, a reminder that checking your arguments against current documentation is often time well spent.
- Assuming a behavior that goes against your expectations is a bug when nobody else is complaining about it is often a recipe for a lot of work.
- Homebrew is a really well thought out package, and if you need to maintain your own tools, it may be well worth it to use private taps and bottles; they're easy to create and super-easy to use.
Every once in a while, it's good to have your own assumptions challenged. I made a point
of commenting on the bug report for ansible regarding this filed by someone else.
Hopefully they'll find my information useful.
[1] This experience led me to a nifty thing about brew, which is that many installations have every dependency installed in the Cellar directory for that specific package, including (for most python tools) its own copy of site-packages. This makes it very easy to pin specific versions of dependencies and to run a number of python tools with different libraries and even interpreters.
[2] Everyone who uses brew should be familiar with the link and unlink commands, which allow you to keep a version or command installed while switching to another one. In my case, I was using a tap that had named versions (the best example of this I can think of is Postgresql, which has separate versions for current, 12, 11, 10, 9.6 and even some of the deprecated versions; use at your own peril). So, I could brew unlink ansible@2.9.13 and brew install ansible to move my private copy out of the way and use the brew-standard version for testing.