Learnings from the field – Running Fluent Bit on Azure Kubernetes Service – Part 2

This is the second part of a three-part series about “Learnings from the field – Running Fluent Bit on Azure Kubernetes Service”.

-> https://www.danielstechblog.io/learnings-from-the-field-running-fluent-bit-on-azure-kubernetes-service-part-1/

Logging is one of the central aspects of operating Kubernetes. The easiest way to get started is to use the solution your cloud provider offers. On Azure, this is Azure Monitor Container Insights, which can also be used on Google Kubernetes Engine and Amazon Elastic Kubernetes Service via Azure Arc.

If you are looking for a platform-agnostic approach that is also highly customizable, you will probably end up with Fluent Bit. Besides running Fluent Bit on Kubernetes for your container logs, you can also run it on VMs or bare-metal servers. Nevertheless, the focus of this series is on Fluent Bit running on Azure Kubernetes Service with Azure Log Analytics as the logging backend.

In this series, I share specific learnings from the field of operating Fluent Bit on Azure Kubernetes Service.

Why should I use filesystem buffering?

When working with Fluent Bit, you control the memory usage of the input plugins with the setting Mem_Buf_Limit. Otherwise, you risk Fluent Bit running out of memory in a high-load environment with backpressure.

-> https://docs.fluentbit.io/manual/administration/buffering-and-storage#buffering-and-memory
-> https://docs.fluentbit.io/manual/administration/backpressure

Backpressure can occur when the configured output plugin cannot flush the log data to its destination. The most likely causes are network issues or an unavailable logging backend, in our case Log Analytics.

So, what happens when you run into a backpressure scenario where the input plugin reaches its Mem_Buf_Limit threshold?

The input plugin pauses the log ingestion, and you might lose log data, especially in the case of the tail plugin when log file rotation occurs. You can prevent that by configuring and using filesystem buffering.

-> https://docs.fluentbit.io/manual/administration/buffering-and-storage#filesystem-buffering-to-the-rescue

Filesystem buffering allows the input plugin to keep registering new log data in a backpressure scenario and to store the log chunks on disk rather than in memory. Once the output plugin is able to flush log data to its backend again, Fluent Bit processes the log data in memory and starts processing the log chunks stored on disk.

By combining the Mem_Buf_Limit setting with filesystem buffering, you significantly reduce the risk of losing log data. However, depending on your configuration and the length of an outage of your logging backend, you can still lose log data.

The configuration of the filesystem buffering is done centrally in the [SERVICE] section of the Fluent Bit configuration. For a minimal configuration using the defaults, set the storage.path in the [SERVICE] section.

...
  fluent-bit.conf: |
    [SERVICE]
        Flush                     15
        Log_Level                 info
        Daemon                    Off
        Parsers_File              parsers.conf
        storage.path              /var/log/flb-storage/
...

In the input plugin configuration, set storage.type to filesystem.

...
  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Alias             logs_input
        Tag               kubernetes.logs.*
        Path              /var/log/containers/*.log
        Parser            cri_kubernetes_logs
        DB                /var/log/flb_kubernetes_log.db
        ...
        Mem_Buf_Limit     10mb
        storage.type      filesystem
...
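
The minimal configuration above relies on the defaults of the storage layer. As a sketch rather than a recommendation, the same [SERVICE] section can spell those settings out explicitly. The values shown are, to my knowledge, the defaults listed in the Fluent Bit documentation; they are not taken from my setup and are meant as a starting point for tuning.

```yaml
  fluent-bit.conf: |
    [SERVICE]
        # Directory for the on-disk chunks; setting it enables the storage layer.
        storage.path              /var/log/flb-storage/
        # normal or full; full syncs more strictly to disk at the cost of throughput.
        storage.sync              normal
        # Maximum number of chunks that are kept in memory at the same time.
        storage.max_chunks_up     128
        # Memory hint for processing undelivered chunks found on disk at startup.
        storage.backlog.mem_limit 5M
```

The chunks picked up from disk at startup are the ones that appear with an input named storage_backlog in the Fluent Bit log output.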

Losing log data due to output plugin retry configuration

Fluent Bit uses a Scheduler to decide when it is time to flush log data through the configured output plugins. The output plugin returns one of three possible statuses: OK, Retry, or Error. OK means the log data has been successfully flushed. An Error status indicates an unrecoverable error, and the log data is lost.

When a Retry status is returned, the Scheduler decides how long to wait before retrying to flush the log data. By default, only one retry happens. If the log data cannot be flushed on that retry either, it is lost, and Fluent Bit logs the following warnings.

...
[2023/01/25 09:11:04] [ warn] [engine] failed to flush chunk '1-1674637849.649241392.flb', retry in 10 seconds: task_id=83, input=logs_input > output=logs_output (out_id=0)
[2023/01/25 09:11:24] [ warn] [engine] chunk '1-1674637849.649241392.flb' cannot be retried: task_id=83, input=logs_input > output=logs_output
...
[2023/01/25 09:13:10] [ warn] [engine] failed to flush chunk '1-1674637886.887525595.flb', retry in 11 seconds: task_id=30, input=storage_backlog.2 > output=logs_output (out_id=0)
[2023/01/25 09:13:31] [ warn] [engine] chunk '1-1674637886.887525595.flb' cannot be retried: task_id=30, input=storage_backlog.2 > output=logs_output
...

Using the default Retry_Limit configuration results in losing log data in the event of network issues or a backend outage. You configure the Retry_Limit in each output plugin individually. When it is set to no_limits or False, Fluent Bit retries flushing the log data until the return status is OK. Otherwise, you specify a number that fits your needs, for example 10 or 60, whatever is suitable for your use case.

...
  output-kubernetes.conf: |
    [OUTPUT]
        Name            azure
        Alias           logs_output
        Match           kubernetes.logs.*
        ...
        Retry_Limit     10

Give Fluent Bit enough time during a shutdown

Another case where you can lose log data is a voluntary disruption of the Fluent Bit pod on a node. These voluntary disruptions happen during cluster autoscaler scale-down operations, Kubernetes upgrades, node reboots, or updates of the Fluent Bit daemon set.

Fluent Bit’s own grace period is 5 seconds if not specified otherwise. 5 seconds might be too short when the output plugin reports a Retry status during the final flush. Hence, my recommendation is to configure the grace period via the Grace parameter.

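Keep in mind that the default termination grace period for Kubernetes pods is 30 seconds. Fluent Bit’s grace period should stay below the pod’s termination grace period; otherwise, Kubernetes kills the container before Fluent Bit has finished its shutdown.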

For instance, we use a 60-second grace period within Fluent Bit and a 75-second termination grace period in Kubernetes for the daemon set.

Fluent Bit configuration:

...
  fluent-bit.conf: |
    [SERVICE]
        Flush                     15
        Grace                     60
        Log_Level                 info
        Daemon                    Off
        Parsers_File              parsers.conf
        storage.path              /var/log/flb-storage/
...

Kubernetes daemon set configuration:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
  ...
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    ...
    spec:
      terminationGracePeriodSeconds: 75
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:1.9.6
...
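
One related assumption is worth making explicit: the chunks under storage.path and the DB file of the tail plugin only survive a replacement of the Fluent Bit pod if they are stored on the node rather than in the container's own filesystem. Below is a minimal sketch, assuming /var/log is mounted from the node via a hostPath volume. The volume name varlog is hypothetical, and the remaining daemon set settings are reduced to the bare minimum.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      terminationGracePeriodSeconds: 75
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:1.9.6
        volumeMounts:
        # Covers /var/log/containers, /var/log/flb-storage/ and the tail DB file.
        - name: varlog
          mountPath: /var/log
      volumes:
      # Hypothetical volume name; the path lives on the node itself.
      - name: varlog
        hostPath:
          path: /var/log
```

With such a mount, the new pod on the same node can pick up the chunks that the previous pod did not manage to flush within its grace period.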

Azure Log Analytics’ TimeGenerated field

As mentioned at the beginning, I am using Log Analytics as the logging backend for Fluent Bit. Log Analytics has a TimeGenerated field for every log line that, by default, represents the timestamp when this specific log line was ingested into Log Analytics.

For application log data, it is crucial that Log Analytics uses the original timestamp for the TimeGenerated field, as this makes queries a lot easier. The TimeGenerated field is the default field Log Analytics uses to select log data for a specific time range in a log query.

Fortunately, the Log Analytics API provides the request header field time-generated-field, which you can use to point Log Analytics to the field in the log data that contains the timestamp to use for the TimeGenerated field.

-> https://learn.microsoft.com/en-us/azure/azure-monitor/logs/data-collector-api?WT.mc_id=AZ-MVP-5000119#request-headers

You must configure the Fluent Bit output plugin specifically to achieve this by setting Time_Generated to on and providing the field name via Time_Key.

-> https://docs.fluentbit.io/manual/pipeline/outputs/azure#configuration-parameters

Below is an example output plugin configuration when you use the CRI parser from the Fluent Bit documentation.

...
  output-kubernetes.conf: |
    [OUTPUT]
        Name            azure
        Alias           logs_output
        Match           kubernetes.logs.*
        ...
        Time_Key        @time
        Time_Generated  on
        Retry_Limit     10

Keep in mind that when the timestamp provided for the TimeGenerated field is more than two days older than the time Log Analytics receives the log data, Log Analytics uses the ingestion time for the TimeGenerated field instead. Under normal circumstances, you should not run into this edge case. However, a long backpressure scenario combined with a generous retry configuration can delay log data long enough to trigger it.

Outlook

That is all for part two of the series “Learnings from the field – Running Fluent Bit on Azure Kubernetes Service”.

In the third and last part, I cover how to gather the logs of Fluent Bit itself. Stay tuned.