- Mon 04 April 2022
- server admin
- Gaige B. Paulsen
- #elasticsearch, #server admin
I've been an ELK Stack (Elasticsearch, Logstash, Kibana, and Beats) user for quite some time, using exclusively the open source version of the stack.
Generally it works well and, with some exceptions, supports our mostly Solaris-based environment (using LX zones to host most of the beefier components, and using custom-built beats and other lightweight senders).
Index Lifecycle Management (ILM)
A couple of years ago, I started using ILM, which automatically rotates indexes through various stages of life based on use of the index:
- hot: index is being actively updated and queried
- warm: index is not being updated, but is still being queried
- cold: no longer being updated, but queried infrequently
- frozen: no longer being updated, queried rarely
- delete: no longer needed and may be deleted
Generally, you set up a name pattern which is followed automatically for the indexes,
creating a user-and-machine-friendly YYYY-MM-DD-NNN suffix, such as
mail-7.4.2-2020-04-22-0027, so that the name denotes its creation date and has
a numeric discriminator in case you need to rotate due to volume instead of dates.
ILM supports rotation based on a number of factors, and it's common to specify a
monthly rotation plus a file size rotation, in order to keep indexes a manageable size.
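As a rough illustration of an age-plus-size rollover, here is a minimal Python sketch that builds the JSON body for a `PUT _ilm/policy/<name>` request. The 30d and 50gb defaults are borrowed from the policy shown in the explain output later in this post; the function name is hypothetical and nothing is sent to a cluster.

```python
import json


def rollover_policy(max_age: str = "30d", max_size: str = "50gb") -> dict:
    """Build an ILM policy body with a single hot-phase rollover action."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "min_age": "0ms",
                    "actions": {
                        # Roll over when either condition is met.
                        "rollover": {"max_age": max_age, "max_size": max_size}
                    },
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(rollover_policy(), indent=2))
```

A real policy would normally add later phases (warm, cold, delete) with their own `min_age` values; they are left out here to keep the sketch short.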
Fixing a bad index template
During one of my index creation steps, I used the wrong name template and ended up with all of my subsequent indexes (for over a year) being named with the same date and an incrementing discriminator value. This was practically fine, but annoying because it made it difficult to determine if the indexes were being deleted and rotated through stages correctly.
The fix for this was to make sure that the index being written by my Logstash component
was <mail-8.0.0-{now/d}-000032> (as an example), not mail-8.0.0-{now/d}-000032; the
difference being when the value was evaluated. In the former, the date math expression is
kept with the index, so it can be evaluated again at each rollover; in the latter, it's
evaluated when the index is created and the result is a provided_name of
mail-8.0.0-date-created-000032 (where date-created is the literal creation date), which
means the date won't change.
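To make the difference concrete, here is a small Python sketch that mimics how a date math name resolves on a given day. It is a simplification that only understands the `{now/d}` form used above and assumes the default `yyyy.MM.dd` output format; the helper names are hypothetical. It also shows the percent-encoding that a date math name needs when it appears in a request path.

```python
import re
from datetime import date
from urllib.parse import quote

# Only the <prefix{now/d}suffix> shape is handled; real date math supports more.
_DATE_MATH = re.compile(r"^<(?P<prefix>.*)\{now/d\}(?P<suffix>.*)>$")


def resolve_index_name(name: str, today: date) -> str:
    """Resolve a date math index name for the given day; plain names pass through."""
    m = _DATE_MATH.match(name)
    if m is None:
        return name  # already a literal name, so the date is frozen
    return f"{m['prefix']}{today.strftime('%Y.%m.%d')}{m['suffix']}"


def encode_for_path(name: str) -> str:
    """Percent-encode a date math name for use in a request URI."""
    return quote(name, safe="")


if __name__ == "__main__":
    name = "<mail-8.0.0-{now/d}-000032>"
    print(resolve_index_name(name, date.today()))
    print(encode_for_path(name))
```

The bracketed form yields a different name on each day it is resolved, while a literal name comes back unchanged, which is the behavior that caused every index to carry the same date.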
I tested this change by modifying the provided_name of the running index, and then
executing the manual rollover by sending:
POST <rollover-target>/_rollover/<target-index>
and specifying the new name template as the <target-index>, and then subsequently
setting index.lifecycle.indexing_complete to true on the index so that the lack
of an automated rollover didn't cause error messages.
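The two requests involved can be sketched as data in Python. This only builds (method, path, body) tuples and does not talk to a cluster. The alias and index names in the usage example are hypothetical, and applying `indexing_complete` to the index that was just rolled over is my reading of the step described above.

```python
from urllib.parse import quote


def manual_rollover_plan(alias: str, old_index: str, new_index: str) -> list:
    """Return the requests for a manual rollover as (method, path, body) tuples."""
    return [
        # Roll the alias over to an explicitly named (date math) target index.
        ("POST", f"/{quote(alias, safe='')}/_rollover/{quote(new_index, safe='')}", None),
        # Tell ILM the old index is done so it does not wait for its own rollover.
        ("PUT", f"/{quote(old_index, safe='')}/_settings",
         {"index.lifecycle.indexing_complete": True}),
    ]


if __name__ == "__main__":
    plan = manual_rollover_plan(
        "mail",                           # hypothetical rollover alias
        "mail-8.0.0-2021.01.01-000031",   # hypothetical current write index
        "<mail-8.0.0-{now/d}-000032>",
    )
    for method, path, body in plan:
        print(method, path, body)
```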
Changing historic names
The remaining problem was the names of the old indexes. Although Elasticsearch does not
have a rename command, it does have two other useful commands, reindex and clone.
reindex will reindex all of the documents in the original index into a new
index, which allows you to change the format of the index and settings prior to reindexing
the data (in fact, you must create and provision the new index first, or you're likely
going to either delete the reindexed copy or re-reindex it).
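For comparison with the clone approach, the reindex route can be sketched as two requests: create the destination with the settings you want, then copy the documents. This Python sketch only builds the request tuples; the function name and the settings in the usage example are hypothetical placeholders.

```python
def reindex_plan(source: str, dest: str, dest_settings: dict) -> list:
    """Return (method, path, body) tuples: provision the destination, then reindex."""
    return [
        # Create the destination first so it gets the intended settings.
        ("PUT", f"/{dest}", {"settings": dest_settings}),
        # Copy every document from the source into the destination.
        ("POST", "/_reindex", {"source": {"index": source}, "dest": {"index": dest}}),
    ]


if __name__ == "__main__":
    for step in reindex_plan("old-index", "new-index", {"number_of_replicas": 1}):
        print(step)
```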
clone makes a complete clone of the index by using hard links (if possible on the
underlying OS). This makes it particularly fast (at least for the primary shards) and
allows you to create the new index with all of the attributes of the old index.
So, for the equivalent of renaming the indexes, you clone the old index to the new
name, and then delete the original. In this case, you're only rebuilding the replicas,
and by deleting the old index, the hard links become referenced only by the new index.
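The clone-then-delete sequence can be sketched as an ordered list of requests. Two steps here are not spelled out above: the clone API requires the source index to be write-blocked first, and waiting for the clone to reach green before deleting the original is a safety assumption of mine. The function name and index names are hypothetical, and nothing is sent.

```python
def rename_plan(old: str, new: str, wait_for: str = "green") -> list:
    """Return (method, path, body) tuples that approximate renaming an index."""
    return [
        # The clone API only accepts a source index that blocks writes.
        ("PUT", f"/{old}/_settings", {"index.blocks.write": True}),
        # Hard-link the primary shards into the new name.
        ("POST", f"/{old}/_clone/{new}", None),
        # Assumption: let the replicas of the clone rebuild before deleting.
        ("GET", f"/_cluster/health/{new}?wait_for_status={wait_for}", None),
        # Drop the original; the segment files now belong to the clone alone.
        ("DELETE", f"/{old}", None),
    ]


if __name__ == "__main__":
    for step in rename_plan("mail-old-name-000131", "mail-new-name-000131"):
        print(step)
```

On a cluster that cannot allocate replicas (a single node, for instance), the health check would never reach green, so `wait_for="yellow"` may be the more practical choice there.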
Setting the ILM time
The final problem was that ILM now believed all of these indexes to be brand new. They were going to get rotated into another phase (or deleted) based on the date that I cloned them, not based on the last write date.
As an aside: ILM treats the creation date of the index as its key date (the
lifecycle_date_millis) until the index is closed for the first rollover, at which point
the lifecycle_date_millis is set to that first rollover date. This way, the rollover
itself is based on the creation date of the index (rollover 30d after creation, for
example) and subsequent actions are based on the date that the index was closed.
By cloning the index, I'd reset the creation date, and thus the lifecycle_date_millis.
Not surprisingly, this was a pretty easy fix: determine the rollover date and then
reset the value.
In my case, I double-checked the expected dates by executing a timestamp query:
GET mail-7.4.2-2022.02.23-000131/_search?size=0
{
  "aggs": {
    "max_date": {
      "max": {"field": "@timestamp"}
    },
    "min_date": {
      "min": {"field": "@timestamp"}
    }
  }
}
And then updating the index settings:
PUT mail-7.4.2-2022.02.23-000131/_settings
{
  "settings": {
    "index": {
      "lifecycle.origination_date": 1646629200000
    }
  }
}
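The origination date is expressed in milliseconds since the Unix epoch. A minimal Python sketch for producing that number from a timezone-aware datetime, and for wrapping it in the same settings body as above, might look like this (the helper names are hypothetical):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to integer epoch milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return (moment - EPOCH) // timedelta(milliseconds=1)


def origination_settings(moment: datetime) -> dict:
    """Build the _settings body that sets index.lifecycle.origination_date."""
    return {"settings": {"index": {"lifecycle.origination_date": to_epoch_millis(moment)}}}


if __name__ == "__main__":
    # 2022-03-07T05:00:00Z is the instant behind the value used above.
    print(origination_settings(datetime(2022, 3, 7, 5, 0, tzinfo=timezone.utc)))
```

Integer division by a one-millisecond `timedelta` avoids the floating-point rounding that multiplying `timestamp()` by 1000 can introduce.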
Finally, check the ILM information:
GET mail-7.4.2-2022.02.23-000131/_ilm/explain
and verify that the age is as expected:
{
  "indices" : {
    "mail-7.4.2-2022.02.23-000131" : {
      "index" : "mail-7.4.2-2022.02.23-000131",
      "managed" : true,
      "policy" : "mail",
      "lifecycle_date_millis" : 1646629200000,
      "age" : "28.25d",
      "phase" : "hot",
      "phase_time_millis" : 1649003776346,
      "action" : "complete",
      "action_time_millis" : 1649004257477,
      "step" : "complete",
      "step_time_millis" : 1649004257477,
      "phase_execution" : {
        "policy" : "mail",
        "phase_definition" : {
          "min_age" : "0ms",
          "actions" : {
            "rollover" : {
              "max_size" : "50gb",
              "max_age" : "30d"
            }
          }
        },
        "version" : 3,
        "modified_date_in_millis" : 1647091493509
      }
    }
  }
}
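If you would rather check the age yourself than trust the formatted `age` string, a short Python sketch can compute it from `lifecycle_date_millis` in a parsed explain response. The function name is hypothetical, and the instant used in the usage example is simply one chosen to land on the 28.25d shown above.

```python
from datetime import datetime, timedelta, timezone


def index_age_days(explain: dict, index: str, now: datetime) -> float:
    """Age of an index in days, measured from its lifecycle_date_millis."""
    millis = explain["indices"][index]["lifecycle_date_millis"]
    started = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return (now - started) / timedelta(days=1)


if __name__ == "__main__":
    name = "mail-7.4.2-2022.02.23-000131"
    response = {"indices": {name: {"lifecycle_date_millis": 1646629200000}}}
    # Hypothetical "now": 2022-04-04T11:00:00Z.
    print(index_age_days(response, name, datetime(2022, 4, 4, 11, 0, tzinfo=timezone.utc)))
```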
And you're set.