- Tue 05 October 2021
- server admin
- Gaige B. Paulsen
- #server admin, #backups, #bacula
After 18 months of using Bacula and sending copies of my data to the cloud (in this case, a cloud I operate in another location) using an S3-compatible storage mechanism, I noticed I had a lot of data sitting around on my current server for backups. When I set out to move to Bacula, I decided to use long retention times for my core monthly full backups, which resulted in more than a small number of terabytes of data.
At the time of the implementation (and still the case at the time of this writing), the automatic options in Bacula for pruning/truncating local copies of cloud datasets were:
- No (do not remove cache)
- AfterUpload (each part removed directly after upload)
- AtEndOfJob (each part removed at the end of the job)
None of these would work for me, as I want to retain the data for months locally, only giving up my cached copy when I'm outside of my normal restore window, or when I need the space.
There are a number of ways to prune, depending on how much you want to get into the Bacula mindset.
Manual purge using find
It turns out that if you leave the label intact (the label being part.1
in the volume directory), you can delete any parts in the cloud volume and they will
be auto-retrieved during a restore. This will allow you to override any settings you
have in bacula-dir.conf for your CacheRetention and just manually purge in any
way you like. In my case, I made use of find:
find .  -regextype posix-egrep -regex '.*\/Vol-.*\/part\.([2-9]|..+)' -exec rm \{\} \;
This particular command uses a POSIX extended regular expression to find any file that sits in a
directory whose name starts with Vol- and that is named part.N, where N is any value other than 1.
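Before deleting anything, it can be worth previewing what the pattern matches. The following is a minimal sketch, not part of my original process: list_prunable_parts is a hypothetical helper name, and it assumes GNU find (as the command above does). It only prints matches, and it is slightly stricter than the pattern above in that it limits matches to regular files with numeric part suffixes.

```shell
# Hypothetical helper: list cached parts that could be removed, i.e. every
# part.N under a Vol-* directory except the part.1 label. Assumes GNU find.
list_prunable_parts() {
    cache_dir="$1"
    # part.2 through part.9, or any part number with two or more digits
    find "$cache_dir" -type f -regextype posix-egrep \
        -regex '.*/Vol-[^/]*/part\.([2-9]|[0-9]{2,})' -print
}

# Example (the path is a placeholder):
#   list_prunable_parts /path/to/cloud/cache
```

Once the list looks right, replacing -print with -delete in the function would remove the same files.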
Manual pruning using bconsole
Bacula's console (bconsole) has a Cloud command which can be used to force a
prune operation. The cloud prune command respects the CacheRetention setting and
has a number of command-line parameters to allow you to specify what you want to prune.
You can prune by storage, pool, or even MediaType.  There is also a parameter
to prune AllPools.
In my case, I used:
cloud prune AllFromPool Storage=Cloud-CT Pool=File
which breaks down to:
- cloud: the command
- prune: the sub-command
- AllFromPool: run the prune on all volumes in the pool
- Storage=: use the specific Storage definition (in this case, Cloud-CT)
- Pool=: use the specific Pool (in this case, File)
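Since bconsole reads commands from standard input, the same prune can be run non-interactively. This is a rough sketch under assumptions: build_prune_cmd is a hypothetical helper name, and it assumes bconsole is on the PATH and already configured to reach your Director.

```shell
# Hypothetical helper: build the cloud prune command for a given
# Storage and Pool, using the same form as the command shown above.
build_prune_cmd() {
    storage="$1"
    pool="$2"
    printf 'cloud prune AllFromPool Storage=%s Pool=%s\n' "$storage" "$pool"
}

# Example (assumes bconsole can connect to the Director):
#   build_prune_cmd Cloud-CT File | bconsole
```

Keeping the command construction separate from the call to bconsole makes it easy to print and check the command before it is actually run.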
For ClueTrust, we use 3 different pools in our storage:
- File for the full backups (historical naming convention)
- Inc-File for the daily incremental backups (from the last File backup)
- Diff-File for the weekly differential backups (from the last File backup)
In this case, I only want to prune the full backups that are outside of the range of the
incremental and differential backups. To that end, I've set the CacheRetention
appropriately in my bacula-dir.conf file, so I can trust Bacula to clear these
correctly.
Automatic pruning using bacula admin jobs
I've read that this is possible, but I haven't found the appropriate documentation yet. At this point, I can't recommend it, but the other two processes work fine and are easily scripted if need be.