Use Fluent Bit for Kubernetes events gathering on Azure Kubernetes Service

For a while now, Fluent Bit has had an input plugin that allows us to gather Kubernetes events, modify them, and ingest them into a logging backend.

-> https://docs.fluentbit.io/manual/pipeline/inputs/kubernetes-events

Today we look at how to configure and deploy Fluent Bit to gather Kubernetes events on an Azure Kubernetes Service cluster and ingest them into an Azure Data Explorer cluster.

Deployment

By default, Fluent Bit runs as a Kubernetes daemon set on every node in a Kubernetes cluster to gather container logs. At the time of writing, the Kubernetes Events input plugin should not be configured on a Fluent Bit daemon set installation as the input plugin has no leader election functionality. Every pod of the daemon set would gather the same Kubernetes events, and we would ingest each event once per node.

-> https://github.com/fluent/fluent-bit/discussions/6942

The only viable option for the Kubernetes Events input plugin is a Kubernetes deployment with a single replica.

Furthermore, we need external storage for the database that the Kubernetes Events input plugin uses to track the state of events that have already been gathered.

In this example for an Azure Kubernetes Service cluster, I have chosen an Azure file share as external storage. Unfortunately, we cannot use one of the built-in storage classes as they all lack an important mount option.

The nobrl mount option must be set, otherwise Fluent Bit complains about a locked database. nobrl prevents the client from sending byte range lock requests to the server.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-csi-fluent-bit
provisioner: file.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
mountOptions:
  - mfsymlinks
  - actimeo=30
  - nosharesock
  - nobrl # nobrl is required for Fluent Bit to work correctly
parameters:
  skuName: Standard_LRS

With the above-mentioned storage class, we hand over the Azure Storage Account creation to Azure. So, no pre-provisioning is required, and the Storage Account will be created within the Azure Kubernetes Service node resource group.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fluent-bit-kubernetes-events
  namespace: logging # must match the namespace of the deployment that mounts the claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile-csi-fluent-bit
  resources:
    requests:
      storage: 5Gi

For the persistent volume claim that represents the Azure file share, we choose 5 GiB as the initial storage capacity.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: fluent-bit-kubernetes-events
    version: v3.2.3
    kubernetes.io/cluster-service: "true"
  name: fluent-bit-kubernetes-events
  namespace: logging
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: fluent-bit-kubernetes-events
  template:
    metadata:
      labels:
        app: fluent-bit-kubernetes-events
        version: v3.2.3
        kubernetes.io/cluster-service: "true"
    spec:
      terminationGracePeriodSeconds: 75
      containers:
        - name: fluent-bit-kubernetes-events
          image: cr.fluentbit.io/fluent/fluent-bit:3.2.3
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 2020
          livenessProbe:
            httpGet:
              path: /api/v1/health
              port: 2020
            failureThreshold: 3
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          env:
            - name: FLUENT_ADX_TENANT_ID
              valueFrom:
                secretKeyRef:
                  name: azuredataexplorer
                  key: tenant_id
            - name: FLUENT_ADX_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  name: azuredataexplorer
                  key: client_id
            - name: FLUENT_ADX_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  name: azuredataexplorer
                  key: client_secret
            - name: CLUSTER
              value: aks-azst-1
            - name: REGION
              value: northeurope
            - name: ENVIRONMENT
              value: prod
            - name: NODE_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.hostIP
          volumeMounts:
            - name: fluent-bit-kubernetes-events-config
              mountPath: /fluent-bit/etc/
            - name: fluent-bit-kubernetes-events-data
              mountPath: /fluent-bit/data/
          resources:
            limits:
              cpu: 500m
              memory: 750Mi
            requests:
              cpu: 75m
              memory: 325Mi
          securityContext:
            runAsNonRoot: true
            runAsUser: 65534
            runAsGroup: 65534
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
      volumes:
        - name: fluent-bit-kubernetes-events-config
          configMap:
            name: fluent-bit-kubernetes-events-config
        - name: fluent-bit-kubernetes-events-data
          persistentVolumeClaim:
            claimName: fluent-bit-kubernetes-events
      serviceAccountName: fluent-bit-kubernetes-events
      priorityClassName: system-cluster-critical

The deployment is kept simple and only has three specific configurations.

First, the termination grace period is increased to 75 seconds to give Fluent Bit enough time to shut down during the pod termination phase. This value sits above the Grace setting of 60 seconds in the Fluent Bit [SERVICE] section.

Second, the priority class system-cluster-critical is set, as we do not want the scheduler to preempt our Fluent Bit pod under normal circumstances when pods with higher priority need to be scheduled.

Third, we use the recreate strategy to prevent interference between two pods accessing the database simultaneously.
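
The deployment references a service account named fluent-bit-kubernetes-events, which is not shown above. The following is a minimal sketch of what such a service account and its permissions could look like. It assumes that read access (get, list, watch) to events in the core API group is what the input plugin needs for the watch stream. The manifests in the linked repository may differ and may grant additional resources.

```yaml
# Assumption: sketch only, not taken from the original repository.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit-kubernetes-events
  namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-kubernetes-events
rules:
  # Events live in the core API group, hence the empty string.
  - apiGroups: [""]
    resources:
      - events
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-kubernetes-events
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit-kubernetes-events
subjects:
  - kind: ServiceAccount
    name: fluent-bit-kubernetes-events
    namespace: logging
```

A cluster role is used here instead of a namespaced role because events are gathered from all namespaces.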

Configuration

As of version 3.1, Fluent Bit uses a Kubernetes watch stream to retrieve Kubernetes events via the input plugin. Hence, we use the default configuration for the input plugin, followed by several filters to prepare the data for the Azure Data Explorer output plugin.

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-kubernetes-events-config
  namespace: logging
data:
  # General settings
  # ======================================================
  fluent-bit.conf: |
    [SERVICE]
        Flush                     15
        # Ensures that log chunks, where the flush failed previously, are flushed on container termination
        Grace                     60
        Log_Level                 info
        Daemon                    Off
        HTTP_Server               On
        HTTP_Listen               0.0.0.0
        HTTP_Port                 2020
        Health_Check              On
        HC_Errors_Count           5
        HC_Retry_Failure_Count    5
        HC_Period                 60
        # Backpressure fallback
        storage.path              /fluent-bit/data/flb-storage/
        storage.sync              normal
        storage.checksum          off
        storage.backlog.mem_limit 50M

    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-kubernetes.conf

  # Kubernetes Events configuration
  # ======================================================
  input-kubernetes.conf: |
    [INPUT]
        Name                kubernetes_events
        Alias               events_input
        Tag                 kubernetes.events.*
        DB                  /fluent-bit/data/flb_kubernetes_events.db
        DB.sync             normal
        kube_retention_time 1h
        Log_Level           warning

  filter-kubernetes.conf: |
    [FILTER]
        Name         nest
        Alias        events_filter_1
        Match        kubernetes.events.*
        Operation    lift
        Nested_under involvedObject
        Add_prefix   involvedObject_

    [FILTER]
        Name         nest
        Alias        events_filter_2
        Match        kubernetes.events.*
        Operation    lift
        Nested_under source
        Add_prefix   source_

    [FILTER]
        Name         nest
        Alias        events_filter_3
        Match        kubernetes.events.*
        Operation    lift
        Nested_under metadata
        Add_prefix   metadata_

    [FILTER]
        Name      modify
        Alias     events_filter_4
        Match     kubernetes.events.*
        Condition Key_does_not_exist source_host
        Add       source_host        ""

    [FILTER]
        Name      modify
        Alias     events_filter_5
        Match     kubernetes.events.*
        Add       Cluster                    ${CLUSTER}
        Add       Region                     ${REGION}
        Add       Environment                ${ENVIRONMENT}
        Rename    metadata_creationTimestamp CreationTimestamp
        Rename    source_component           SourceComponent
        Rename    source_host                SourceComputer
        Rename    reportingComponent         ReportingComponent
        Rename    reportingInstance          ReportingComputer
        Rename    involvedObject_kind        Kind
        Rename    involvedObject_apiVersion  ApiVersion
        Rename    involvedObject_name        Name
        Rename    involvedObject_namespace   Namespace
        Rename    count                      Count
        Rename    action                     Action
        Rename    reason                     Reason
        Rename    message                    Message
        Rename    type                       KubeEventType
        Rename    firstTimestamp             FirstSeen
        Rename    lastTimestamp              LastSeen
        Remove    metadata
        Remove    involvedObject
        Remove    source
        Remove    eventTime
        Remove    involvedObject_resourceVersion
        Remove    involvedObject_uid
        Remove    involvedObject_fieldPath
        Remove    involvedObject_labels
        Remove    involvedObject_annotations
        Remove    metadata_name
        Remove    metadata_namespace
        Remove    metadata_uid
        Remove    metadata_resourceVersion
        Remove    metadata_managedFields

  output-kubernetes.conf: |
    [OUTPUT]
        Name                        azure_kusto
        Match                       kubernetes.events.*
        Tenant_Id                   ${FLUENT_ADX_TENANT_ID}
        Client_Id                   ${FLUENT_ADX_CLIENT_ID}
        Client_Secret               ${FLUENT_ADX_CLIENT_SECRET}
        Ingestion_Endpoint          https://ingest-adxaks.northeurope.kusto.windows.net
        Database_Name               Kubernetes
        Table_Name                  KubeEvents
        Ingestion_Mapping_Reference FluentBitMappingEvents
        Log_Key                     log
        Include_Tag_Key             Off
        Include_Time_Key            On
        Time_Key                    TimeGenerated
        Retry_Limit                 False
        Log_Level                   info
        compression_enabled         on
        ingestion_endpoint_connect_timeout 60
        ingestion_resources_refresh_interval 3600
        # buffering_enabled false

Before we roll out the Fluent Bit deployment, we prepare the Azure Data Explorer side with a new table called KubeEvents in the Kubernetes database.

.create table KubeEvents (
    TimeGenerated: datetime, Namespace: string, Name: string, Kind: string, ApiVersion: string, KubeEventType: string, Action: string,
    Reason: string, Message: string, Count: string, CreationTimestamp: datetime, FirstSeen: datetime, LastSeen: datetime,
    SourceComponent: string, SourceComputer: string, ReportingComponent: string, ReportingComputer: string,
    Cluster: string, Region: string, Environment: string
    )

Afterwards, we set the ingestion mapping.

.create-or-alter table KubeEvents ingestion json mapping "FluentBitMappingEvents"
    ```[
    {"column": "TimeGenerated", "datatype": "datetime", "Properties": {"Path": "$.TimeGenerated"}},
    {"column": "Namespace", "datatype": "string", "Properties": {"Path": "$.log.Namespace"}},
    {"column": "Name", "datatype": "string", "Properties": {"Path": "$.log.Name"}},
    {"column": "Kind", "datatype": "string", "Properties": {"Path": "$.log.Kind"}},
    {"column": "ApiVersion", "datatype": "string", "Properties": {"Path": "$.log.ApiVersion"}},
    {"column": "KubeEventType", "datatype": "string", "Properties": {"Path": "$.log.KubeEventType"}},
    {"column": "Action", "datatype": "string", "Properties": {"Path": "$.log.Action"}},
    {"column": "Reason", "datatype": "string", "Properties": {"Path": "$.log.Reason"}},
    {"column": "Message", "datatype": "string", "Properties": {"Path": "$.log.Message"}},
    {"column": "Count", "datatype": "string", "Properties": {"Path": "$.log.Count"}},
    {"column": "CreationTimestamp", "datatype": "datetime", "Properties": {"Path": "$.log.CreationTimestamp"}},
    {"column": "FirstSeen", "datatype": "datetime", "Properties": {"Path": "$.log.FirstSeen"}},
    {"column": "LastSeen", "datatype": "datetime", "Properties": {"Path": "$.log.LastSeen"}},
    {"column": "SourceComponent", "datatype": "string", "Properties": {"Path": "$.log.SourceComponent"}},
    {"column": "SourceComputer", "datatype": "string", "Properties": {"Path": "$.log.SourceComputer"}},
    {"column": "ReportingComponent", "datatype": "string", "Properties": {"Path": "$.log.ReportingComponent"}},
    {"column": "ReportingComputer", "datatype": "string", "Properties": {"Path": "$.log.ReportingComputer"}},
    {"column": "Cluster", "datatype": "string", "Properties": {"Path": "$.log.Cluster"}},
    {"column": "Region", "datatype": "string", "Properties": {"Path": "$.log.Region"}},
    {"column": "Environment", "datatype": "string", "Properties": {"Path": "$.log.Environment"}}
    ]```

Rollout

Once everything is prepared, we roll out the Fluent Bit deployment to gather Kubernetes events and ingest them into the Azure Data Explorer cluster.

❯ ./deploy.sh TENANT_ID CLIENT_ID CLIENT_SECRET
❯ ./deploy.sh 00000000-0000-0000-0000-000000000000 00000000-0000-0000-0000-000000000000 'Pa$$W0rd'
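
The contents of deploy.sh are not reproduced in this post. Presumably, the script uses the three arguments to create the azuredataexplorer secret that the deployment reads via secretKeyRef. A hedged sketch of an equivalent manifest, with the key names taken from the deployment and placeholder values, could look like this:

```yaml
# Assumption: sketch of the secret that the deployment expects.
# The key names match the secretKeyRef entries, the values are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: azuredataexplorer
  namespace: logging
type: Opaque
stringData:
  tenant_id: "00000000-0000-0000-0000-000000000000"
  client_id: "00000000-0000-0000-0000-000000000000"
  client_secret: "<client secret>"
```

Using stringData lets us provide the values in plain text, and Kubernetes stores them base64-encoded.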

As seen in the screenshot below, the single Fluent Bit pod for the Kubernetes event gathering is running in our Azure Kubernetes Service cluster.

Besides that, we see the first data flowing into the Azure Data Explorer table.

Summary

Since Fluent Bit switched to the Kubernetes watch stream, the configuration for the input plugin is straightforward. The only challenge is configuring external storage for the database that keeps track of which Kubernetes events have already been processed.

The examples can be found in my GitHub repositories.

-> https://github.com/neumanndaniel/scripts/blob/main/Azure_Data_Explorer/Fluent_Bit_Kubernetes/Kubernetes_Events_ADX_Output.kql
-> https://github.com/neumanndaniel/kubernetes/tree/master/fluent-bit/azure-data-explorer-kubernetes-events