For a while now, Fluent Bit has had an input plugin that allows us to gather Kubernetes events, modify them, and ingest them into the logging backend.
-> https://docs.fluentbit.io/manual/pipeline/inputs/kubernetes-events
Today we look at how to configure and deploy Fluent Bit to gather Kubernetes events on an Azure Kubernetes Service cluster and ingest them into an Azure Data Explorer cluster.
Deployment
By default, Fluent Bit runs as a Kubernetes daemon set on every node in a Kubernetes cluster to gather container logs. At the time of writing, the Kubernetes Events input plugin should not be configured on a Fluent Bit daemon set installation, as the input plugin does not provide leader election. Every pod would gather the same cluster-wide Kubernetes events, so each event would be ingested once per node.
-> https://github.com/fluent/fluent-bit/discussions/6942
The only viable option for the Kubernetes Events input plugin is a Kubernetes deployment with a single replica.
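The duplication problem can be illustrated with a small Python sketch. It is a toy model, not Fluent Bit code: the `forward_events` function and the node names are made up for illustration, and the only assumption is that every collector sees the same cluster-wide event stream and nothing elects a single leader.

```python
def forward_events(cluster_events, collectors):
    """Toy model: every collector watches the same cluster-wide event
    stream and forwards everything it sees to the backend."""
    backend = []
    for collector in collectors:
        for event in cluster_events:
            backend.append((collector, event))
    return backend


events = ["pod-a Scheduled", "pod-a Pulled", "pod-a Started"]

# Daemon set on a three-node cluster: one collector per node.
daemon_set = forward_events(events, ["node-1", "node-2", "node-3"])

# Deployment with a single replica: exactly one collector.
single_replica = forward_events(events, ["deployment-pod"])

print(len(daemon_set), len(single_replica))
```

With three nodes, every event reaches the backend three times, while the single replica forwards each event once.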
Furthermore, we need external storage for the database that the Kubernetes Events input plugin uses to track the state of events that have already been gathered.
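To illustrate why this state has to survive a pod restart, here is a minimal Python sketch using SQLite. The table layout (`seen_events` with `uid` and `resource_version`) and the function names are hypothetical and do not reflect Fluent Bit's actual database schema. The sketch only shows the idea: events that were recorded before a restart are not emitted again when the database file is still there.

```python
import os
import sqlite3
import tempfile


def open_state(path):
    """Open (or create) the state database. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_events ("
        "uid TEXT NOT NULL, resource_version TEXT NOT NULL, "
        "PRIMARY KEY (uid, resource_version))"
    )
    return conn


def collect(conn, events):
    """Return only the events that have not been recorded yet."""
    fresh = []
    for uid, resource_version in events:
        cursor = conn.execute(
            "INSERT OR IGNORE INTO seen_events VALUES (?, ?)",
            (uid, resource_version),
        )
        if cursor.rowcount == 1:  # row was inserted, so the event is new
            fresh.append((uid, resource_version))
    conn.commit()
    return fresh


with tempfile.TemporaryDirectory() as data_dir:
    db_path = os.path.join(data_dir, "events.db")

    # First pod lifetime: two events arrive.
    conn = open_state(db_path)
    first_run = collect(conn, [("a", "1"), ("b", "1")])
    conn.close()

    # After a restart the same events show up again, plus one new event.
    conn = open_state(db_path)
    second_run = collect(conn, [("a", "1"), ("b", "1"), ("c", "1")])
    conn.close()

print(first_run, second_run)
```

If the database lived on the pod's ephemeral file system instead, the second run would start from an empty table and treat all three events as new.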
In the case of an Azure Kubernetes Service cluster, I have chosen an Azure File Share as external storage for this example. Unfortunately, we cannot use one of the already existing storage classes, as they are all missing an important mount option.
The nobrl mount option must be set; otherwise, Fluent Bit will complain about a locked database. nobrl prevents the client from sending byte range lock requests to the server.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-csi-fluent-bit
provisioner: file.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
mountOptions:
  - mfsymlinks
  - actimeo=30
  - nosharesock
  - nobrl # nobrl is required for Fluent Bit to work correctly
parameters:
  skuName: Standard_LRS
With the above-mentioned storage class, we hand over the Azure Storage Account creation to Azure. So, no pre-provisioning is required, and the Storage Account will be created within the Azure Kubernetes Service node resource group.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fluent-bit-kubernetes-events
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile-csi-fluent-bit
  resources:
    requests:
      storage: 5Gi
For the persistent volume claim, which represents the Azure File Share, we choose 5 GiB as the initial storage capacity.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: fluent-bit-kubernetes-events
    version: v3.2.3
    kubernetes.io/cluster-service: "true"
  name: fluent-bit-kubernetes-events
  namespace: logging
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: fluent-bit-kubernetes-events
  template:
    metadata:
      labels:
        app: fluent-bit-kubernetes-events
        version: v3.2.3
        kubernetes.io/cluster-service: "true"
    spec:
      terminationGracePeriodSeconds: 75
      containers:
        - name: fluent-bit-kubernetes-events
          image: cr.fluentbit.io/fluent/fluent-bit:3.2.3
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 2020
          livenessProbe:
            httpGet:
              path: /api/v1/health
              port: 2020
            failureThreshold: 3
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          env:
            - name: FLUENT_ADX_TENANT_ID
              valueFrom:
                secretKeyRef:
                  name: azuredataexplorer
                  key: tenant_id
            - name: FLUENT_ADX_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  name: azuredataexplorer
                  key: client_id
            - name: FLUENT_ADX_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  name: azuredataexplorer
                  key: client_secret
            - name: CLUSTER
              value: aks-azst-1
            - name: REGION
              value: northeurope
            - name: ENVIRONMENT
              value: prod
            - name: NODE_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.hostIP
          volumeMounts:
            - name: fluent-bit-kubernetes-events-config
              mountPath: /fluent-bit/etc/
            - name: fluent-bit-kubernetes-events-data
              mountPath: /fluent-bit/data/
          resources:
            limits:
              cpu: 500m
              memory: 750Mi
            requests:
              cpu: 75m
              memory: 325Mi
          securityContext:
            runAsNonRoot: true
            runAsUser: 65534
            runAsGroup: 65534
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
      volumes:
        - name: fluent-bit-kubernetes-events-config
          configMap:
            name: fluent-bit-kubernetes-events-config
        - name: fluent-bit-kubernetes-events-data
          persistentVolumeClaim:
            claimName: fluent-bit-kubernetes-events
      serviceAccountName: fluent-bit-kubernetes-events
      priorityClassName: system-cluster-critical
The deployment is kept simple and only has three specific configurations.
First, the increased termination grace period, which gives Fluent Bit enough time to shut down during the pod termination phase.
Second, the priority class, as we do not want our Fluent Bit pod to be preempted when pods with a higher priority need to be scheduled on the node.
Third, the Recreate strategy, which prevents two pods from accessing the database simultaneously during a rollout.
Configuration
As of version 3.1, Fluent Bit uses a Kubernetes watch stream to retrieve Kubernetes events via the input plugin. Hence, we use the default configuration for the input plugin, followed by several filters to prepare the data for the Azure Data Explorer output plugin.
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-kubernetes-events-config
  namespace: logging
data:
  # General settings
  # ======================================================
  fluent-bit.conf: |
    [SERVICE]
        Flush                     15
        # Ensures that log chunks, where the flush failed previously, are flushed on container termination
        Grace                     60
        Log_Level                 info
        Daemon                    Off
        HTTP_Server               On
        HTTP_Listen               0.0.0.0
        HTTP_Port                 2020
        Health_Check              On
        HC_Errors_Count           5
        HC_Retry_Failure_Count    5
        HC_Period                 60
        # Backpressure fallback
        storage.path              /fluent-bit/data/flb-storage/
        storage.sync              normal
        storage.checksum          off
        storage.backlog.mem_limit 50M
    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-kubernetes.conf
  # Kubernetes Events configuration
  # ======================================================
  input-kubernetes.conf: |
    [INPUT]
        Name                kubernetes_events
        Alias               events_input
        Tag                 kubernetes.events.*
        DB                  /fluent-bit/data/flb_kubernetes_events.db
        DB.sync             normal
        kube_retention_time 1h
        Log_Level           warning
  filter-kubernetes.conf: |
    [FILTER]
        Name         nest
        Alias        events_filter_1
        Match        kubernetes.events.*
        Operation    lift
        Nested_under involvedObject
        Add_prefix   involvedObject_
    [FILTER]
        Name         nest
        Alias        events_filter_2
        Match        kubernetes.events.*
        Operation    lift
        Nested_under source
        Add_prefix   source_
    [FILTER]
        Name         nest
        Alias        events_filter_3
        Match        kubernetes.events.*
        Operation    lift
        Nested_under metadata
        Add_prefix   metadata_
    [FILTER]
        Name      modify
        Alias     events_filter_4
        Match     kubernetes.events.*
        Condition Key_does_not_exist source_host
        Add       source_host ""
    [FILTER]
        Name   modify
        Alias  events_filter_5
        Match  kubernetes.events.*
        Add    Cluster ${CLUSTER}
        Add    Region ${REGION}
        Add    Environment ${ENVIRONMENT}
        Rename metadata_creationTimestamp CreationTimestamp
        Rename source_component SourceComponent
        Rename source_host SourceComputer
        Rename reportingComponent ReportingComponent
        Rename reportingInstance ReportingComputer
        Rename involvedObject_kind Kind
        Rename involvedObject_apiVersion ApiVersion
        Rename involvedObject_name Name
        Rename involvedObject_namespace Namespace
        Rename count Count
        Rename action Action
        Rename reason Reason
        Rename message Message
        Rename type KubeEventType
        Rename firstTimestamp FirstSeen
        Rename lastTimestamp LastSeen
        Remove metadata
        Remove involvedObject
        Remove source
        Remove eventTime
        Remove involvedObject_resourceVersion
        Remove involvedObject_uid
        Remove involvedObject_fieldPath
        Remove involvedObject_labels
        Remove involvedObject_annotations
        Remove metadata_name
        Remove metadata_namespace
        Remove metadata_uid
        Remove metadata_resourceVersion
        Remove metadata_managedFields
  output-kubernetes.conf: |
    [OUTPUT]
        Name                                 azure_kusto
        Match                                kubernetes.events.*
        Tenant_Id                            ${FLUENT_ADX_TENANT_ID}
        Client_Id                            ${FLUENT_ADX_CLIENT_ID}
        Client_Secret                        ${FLUENT_ADX_CLIENT_SECRET}
        Ingestion_Endpoint                   https://ingest-adxaks.northeurope.kusto.windows.net
        Database_Name                        Kubernetes
        Table_Name                           KubeEvents
        Ingestion_Mapping_Reference          FluentBitMappingEvents
        Log_Key                              log
        Include_Tag_Key                      Off
        Include_Time_Key                     On
        Time_Key                             TimeGenerated
        Retry_Limit                          False
        Log_Level                            info
        compression_enabled                  on
        ingestion_endpoint_connect_timeout   60
        ingestion_resources_refresh_interval 3600
        # buffering_enabled                  false
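To make the effect of the filter chain easier to follow, here is a rough Python sketch that mimics it on a made-up sample event. It is an approximation under stated assumptions, not Fluent Bit's implementation: `lift` and `transform` are hypothetical helper names, the sample event is invented, and the rename rule is simplified to "rename if the source key exists and the target key does not".

```python
def lift(record, nested_under, prefix):
    """Approximate the nest filter's lift operation: move the children of
    one nested map to the top level and prepend a prefix to their keys."""
    out = {}
    for key, value in record.items():
        if key == nested_under and isinstance(value, dict):
            for child_key, child_value in value.items():
                out[prefix + child_key] = child_value
        else:
            out[key] = value
    return out


RENAMES = [
    ("metadata_creationTimestamp", "CreationTimestamp"),
    ("source_component", "SourceComponent"),
    ("source_host", "SourceComputer"),
    ("reportingComponent", "ReportingComponent"),
    ("reportingInstance", "ReportingComputer"),
    ("involvedObject_kind", "Kind"),
    ("involvedObject_apiVersion", "ApiVersion"),
    ("involvedObject_name", "Name"),
    ("involvedObject_namespace", "Namespace"),
    ("count", "Count"), ("action", "Action"), ("reason", "Reason"),
    ("message", "Message"), ("type", "KubeEventType"),
    ("firstTimestamp", "FirstSeen"), ("lastTimestamp", "LastSeen"),
]

REMOVES = [
    "metadata", "involvedObject", "source", "eventTime",
    "involvedObject_resourceVersion", "involvedObject_uid",
    "involvedObject_fieldPath", "involvedObject_labels",
    "involvedObject_annotations", "metadata_name", "metadata_namespace",
    "metadata_uid", "metadata_resourceVersion", "metadata_managedFields",
]


def transform(event, cluster, region, environment):
    # Filters 1-3: lift the nested maps with a prefix.
    record = lift(event, "involvedObject", "involvedObject_")
    record = lift(record, "source", "source_")
    record = lift(record, "metadata", "metadata_")
    # Filter 4: guarantee that source_host exists.
    record.setdefault("source_host", "")
    # Filter 5: add static fields, rename, then remove leftovers.
    record.setdefault("Cluster", cluster)
    record.setdefault("Region", region)
    record.setdefault("Environment", environment)
    for old, new in RENAMES:
        if old in record and new not in record:
            record[new] = record.pop(old)
    for key in REMOVES:
        record.pop(key, None)
    return record


# Invented sample event, reduced to a few fields.
sample = {
    "metadata": {"name": "web-1.17f3", "namespace": "default",
                 "creationTimestamp": "2025-01-01T10:00:00Z"},
    "involvedObject": {"kind": "Pod", "name": "web-1", "namespace": "default",
                       "apiVersion": "v1", "uid": "1234"},
    "source": {"component": "kubelet"},
    "reason": "Pulled", "message": "Image pulled", "type": "Normal", "count": 1,
}

row = transform(sample, "aks-azst-1", "northeurope", "prod")
print(sorted(row))
```

The sample event has no `source.host`, so the fourth filter adds an empty `source_host`, which the fifth filter then renames to `SourceComputer`. This keeps the column present in every record.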
Before we roll out the Fluent Bit deployment, we prepare the Azure Data Explorer side with a new table called KubeEvents in the Kubernetes database.
.create table KubeEvents (
    TimeGenerated: datetime,
    Namespace: string,
    Name: string,
    Kind: string,
    ApiVersion: string,
    KubeEventType: string,
    Action: string,
    Reason: string,
    Message: string,
    Count: string,
    CreationTimestamp: datetime,
    FirstSeen: datetime,
    LastSeen: datetime,
    SourceComponent: string,
    SourceComputer: string,
    ReportingComponent: string,
    ReportingComputer: string,
    Cluster: string,
    Region: string,
    Environment: string
)
Afterwards, we set the ingestion mapping.
.create-or-alter table KubeEvents ingestion json mapping "FluentBitMappingEvents"
```[
  {"column": "TimeGenerated", "datatype": "datetime", "Properties": {"Path": "$.TimeGenerated"}},
  {"column": "Namespace", "datatype": "string", "Properties": {"Path": "$.log.Namespace"}},
  {"column": "Name", "datatype": "string", "Properties": {"Path": "$.log.Name"}},
  {"column": "Kind", "datatype": "string", "Properties": {"Path": "$.log.Kind"}},
  {"column": "ApiVersion", "datatype": "string", "Properties": {"Path": "$.log.ApiVersion"}},
  {"column": "KubeEventType", "datatype": "string", "Properties": {"Path": "$.log.KubeEventType"}},
  {"column": "Action", "datatype": "string", "Properties": {"Path": "$.log.Action"}},
  {"column": "Reason", "datatype": "string", "Properties": {"Path": "$.log.Reason"}},
  {"column": "Message", "datatype": "string", "Properties": {"Path": "$.log.Message"}},
  {"column": "Count", "datatype": "string", "Properties": {"Path": "$.log.Count"}},
  {"column": "CreationTimestamp", "datatype": "datetime", "Properties": {"Path": "$.log.CreationTimestamp"}},
  {"column": "FirstSeen", "datatype": "datetime", "Properties": {"Path": "$.log.FirstSeen"}},
  {"column": "LastSeen", "datatype": "datetime", "Properties": {"Path": "$.log.LastSeen"}},
  {"column": "SourceComponent", "datatype": "string", "Properties": {"Path": "$.log.SourceComponent"}},
  {"column": "SourceComputer", "datatype": "string", "Properties": {"Path": "$.log.SourceComputer"}},
  {"column": "ReportingComponent", "datatype": "string", "Properties": {"Path": "$.log.ReportingComponent"}},
  {"column": "ReportingComputer", "datatype": "string", "Properties": {"Path": "$.log.ReportingComputer"}},
  {"column": "Cluster", "datatype": "string", "Properties": {"Path": "$.log.Cluster"}},
  {"column": "Region", "datatype": "string", "Properties": {"Path": "$.log.Region"}},
  {"column": "Environment", "datatype": "string", "Properties": {"Path": "$.log.Environment"}}
]```
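The mapping is repetitive: the time key sits at the top level of the ingested JSON document, and every other column is read from below the `log` key, which matches the Log_Key and Time_Key settings of the output plugin. As a sketch, the mapping can be generated from the column list instead of being typed by hand. The helper names `build_mapping` and `resolve` are hypothetical, and `resolve` only handles simple dotted paths such as `$.log.Reason`, not full JSONPath.

```python
import json

COLUMNS = [
    ("TimeGenerated", "datetime"), ("Namespace", "string"), ("Name", "string"),
    ("Kind", "string"), ("ApiVersion", "string"), ("KubeEventType", "string"),
    ("Action", "string"), ("Reason", "string"), ("Message", "string"),
    ("Count", "string"), ("CreationTimestamp", "datetime"),
    ("FirstSeen", "datetime"), ("LastSeen", "datetime"),
    ("SourceComponent", "string"), ("SourceComputer", "string"),
    ("ReportingComponent", "string"), ("ReportingComputer", "string"),
    ("Cluster", "string"), ("Region", "string"), ("Environment", "string"),
]


def build_mapping(columns, time_key="TimeGenerated", log_key="log"):
    """Build the ingestion mapping entries from (column, datatype) pairs."""
    mapping = []
    for name, datatype in columns:
        path = f"$.{name}" if name == time_key else f"$.{log_key}.{name}"
        mapping.append(
            {"column": name, "datatype": datatype, "Properties": {"Path": path}}
        )
    return mapping


def resolve(path, document):
    """Resolve a simple dotted path like $.log.Reason; None if it is absent."""
    node = document
    for part in path.split(".")[1:]:  # skip the leading "$"
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


mapping = build_mapping(COLUMNS)

# Invented example of a document as the output plugin would send it.
document = {
    "TimeGenerated": "2025-01-01T10:00:05Z",
    "log": {"Namespace": "default", "Name": "web-1", "Reason": "Pulled"},
}
row = {m["column"]: resolve(m["Properties"]["Path"], document) for m in mapping}

print(json.dumps(mapping, indent=2))
```

The printed JSON can be pasted into the `.create-or-alter table` command. Columns without a matching key in the document, such as `Action` in the example, resolve to `None` in this sketch.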
Rollout
Once everything is prepared, we roll out the Fluent Bit deployment to gather Kubernetes events and ingest them to the Azure Data Explorer cluster.
❯ ./deploy.sh TENANT_ID CLIENT_ID CLIENT_SECRET
❯ ./deploy.sh 00000000-0000-0000-0000-000000000000 00000000-0000-0000-0000-000000000000 'Pa$$W0rd'
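The content of deploy.sh is not shown here. Assuming it at least has to provide the azuredataexplorer secret with the keys tenant_id, client_id, and client_secret that the deployment references, the following Python sketch builds such a secret manifest. The function name `build_secret` is hypothetical, and the values are the placeholders from the example call above.

```python
import base64
import json


def build_secret(tenant_id, client_id, client_secret, namespace="logging"):
    """Build a Kubernetes Secret manifest with base64-encoded values."""

    def encode(value):
        return base64.b64encode(value.encode("utf-8")).decode("ascii")

    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "azuredataexplorer", "namespace": namespace},
        "type": "Opaque",
        "data": {
            "tenant_id": encode(tenant_id),
            "client_id": encode(client_id),
            "client_secret": encode(client_secret),
        },
    }


manifest = build_secret(
    "00000000-0000-0000-0000-000000000000",
    "00000000-0000-0000-0000-000000000000",
    "Pa$$W0rd",  # placeholder value from the example call
)
print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON manifests, the output could be piped into `kubectl apply -f -`. When the client secret is passed on the command line instead, it should be single-quoted, as an unquoted `$$` is expanded by the shell to its process ID.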
As seen in the screenshot below, the single Fluent Bit pod for the Kubernetes event gathering is running in our Azure Kubernetes Service cluster.
Besides that, we see the first data flowing into the Azure Data Explorer table.
Summary
Since Fluent Bit switched to the Kubernetes watch stream, the configuration of the input plugin is straightforward. The only challenge is configuring external storage for the database that keeps track of which Kubernetes events have already been processed.
The examples can be found in my GitHub repositories.
-> https://github.com/neumanndaniel/scripts/blob/main/Azure_Data_Explorer/Fluent_Bit_Kubernetes/Kubernetes_Events_ADX_Output.kql
-> https://github.com/neumanndaniel/kubernetes/tree/master/fluent-bit/azure-data-explorer-kubernetes-events