For a while now, Fluent Bit has shipped an input plugin that lets us gather Kubernetes events, modify them, and ingest them into a logging backend.
-> https://docs.fluentbit.io/manual/pipeline/inputs/kubernetes-events
Today we look at how to configure and deploy Fluent Bit to gather Kubernetes events on an Azure Kubernetes Service cluster and ingest them into an Azure Data Explorer cluster.
Deployment
By default, Fluent Bit runs as a Kubernetes daemon set on every node in a Kubernetes cluster to gather container logs. At the time of writing, the Kubernetes Events input plugin should not be configured on a Fluent Bit daemon set installation, as the input plugin has no leader election functionality. Every pod would gather the same Kubernetes events over and over again.
-> https://github.com/fluent/fluent-bit/discussions/6942
The only viable option for the Kubernetes Events input plugin is a Kubernetes deployment with a single replica.
Furthermore, we need external storage for the database that the Kubernetes Events input plugin uses to track the state of events that have already been gathered.
For an Azure Kubernetes Service cluster, I have chosen an Azure File Share as the external storage in this example. Unfortunately, we cannot use one of the already existing storage classes, as they are all missing an important mount option.
The nobrl option must be set, otherwise Fluent Bit complains about a locked database. nobrl avoids sending byte-range lock requests to the server.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-csi-fluent-bit
provisioner: file.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
mountOptions:
  - mfsymlinks
  - actimeo=30
  - nosharesock
  - nobrl # nobrl is required for Fluent Bit to work correctly
parameters:
  skuName: Standard_LRS
With the above-mentioned storage class, we hand over the Azure Storage Account creation to Azure. So, no pre-provisioning is required, and the Storage Account will be created within the Azure Kubernetes Service node resource group.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fluent-bit-kubernetes-events
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile-csi-fluent-bit
  resources:
    requests:
      storage: 5Gi
For the persistent volume claim that represents the Azure File Share, we choose 5 GiB as the initial storage capacity.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: fluent-bit-kubernetes-events
    version: v3.2.3
    kubernetes.io/cluster-service: "true"
  name: fluent-bit-kubernetes-events
  namespace: logging
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: fluent-bit-kubernetes-events
  template:
    metadata:
      labels:
        app: fluent-bit-kubernetes-events
        version: v3.2.3
        kubernetes.io/cluster-service: "true"
    spec:
      terminationGracePeriodSeconds: 75
      containers:
        - name: fluent-bit-kubernetes-events
          image: cr.fluentbit.io/fluent/fluent-bit:3.2.3
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 2020
          livenessProbe:
            httpGet:
              path: /api/v1/health
              port: 2020
            failureThreshold: 3
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          env:
            - name: FLUENT_ADX_TENANT_ID
              valueFrom:
                secretKeyRef:
                  name: azuredataexplorer
                  key: tenant_id
            - name: FLUENT_ADX_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  name: azuredataexplorer
                  key: client_id
            - name: FLUENT_ADX_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  name: azuredataexplorer
                  key: client_secret
            - name: CLUSTER
              value: aks-azst-1
            - name: REGION
              value: northeurope
            - name: ENVIRONMENT
              value: prod
            - name: NODE_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.hostIP
          volumeMounts:
            - name: fluent-bit-kubernetes-events-config
              mountPath: /fluent-bit/etc/
            - name: fluent-bit-kubernetes-events-data
              mountPath: /fluent-bit/data/
          resources:
            limits:
              cpu: 500m
              memory: 750Mi
            requests:
              cpu: 75m
              memory: 325Mi
          securityContext:
            runAsNonRoot: true
            runAsUser: 65534
            runAsGroup: 65534
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
      volumes:
        - name: fluent-bit-kubernetes-events-config
          configMap:
            name: fluent-bit-kubernetes-events-config
        - name: fluent-bit-kubernetes-events-data
          persistentVolumeClaim:
            claimName: fluent-bit-kubernetes-events
      serviceAccountName: fluent-bit-kubernetes-events
      priorityClassName: system-cluster-critical
The deployment is kept simple and has only three specific configurations.
First, the increased termination grace period gives Fluent Bit enough time to shut down during the pod termination phase.
Second, the priority class ensures that our Fluent Bit deployment is not evicted from the node by the scheduler when pods with higher priority are scheduled under normal configuration circumstances.
Third, the recreate strategy prevents two pods from accessing the database simultaneously during a rollout.
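One thing the deployment references that is not shown above is the service account. The input plugin reads events via the Kubernetes API, so the service account needs cluster-wide read access to events. A minimal sketch of what these objects could look like follows below; the names match the deployment, but the actual manifests live in the GitHub repository linked at the end.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit-kubernetes-events
  namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-kubernetes-events
rules:
  # The kubernetes_events input needs read access to events cluster-wide
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-kubernetes-events
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit-kubernetes-events
subjects:
  - kind: ServiceAccount
    name: fluent-bit-kubernetes-events
    namespace: logging
```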
Configuration
As of version 3.1, Fluent Bit uses a Kubernetes watch stream to retrieve Kubernetes events via the input plugin. Hence, we use the default configuration for the input plugin, followed by several filters to prepare the data for the Azure Data Explorer output plugin.
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-kubernetes-events-config
  namespace: logging
data:
  # General settings
  # ======================================================
  fluent-bit.conf: |
    [SERVICE]
        Flush 15
        # Ensures that log chunks, where the flush failed previously, are flushed on container termination
        Grace 60
        Log_Level info
        Daemon Off
        HTTP_Server On
        HTTP_Listen 0.0.0.0
        HTTP_Port 2020
        Health_Check On
        HC_Errors_Count 5
        HC_Retry_Failure_Count 5
        HC_Period 60
        # Backpressure fallback
        storage.path /fluent-bit/data/flb-storage/
        storage.sync normal
        storage.checksum off
        storage.backlog.mem_limit 50M

    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-kubernetes.conf

  # Kubernetes Events configuration
  # ======================================================
  input-kubernetes.conf: |
    [INPUT]
        Name kubernetes_events
        Alias events_input
        Tag kubernetes.events.*
        DB /fluent-bit/data/flb_kubernetes_events.db
        DB.sync normal
        kube_retention_time 1h
        Log_Level warning

  filter-kubernetes.conf: |
    [FILTER]
        Name nest
        Alias events_filter_1
        Match kubernetes.events.*
        Operation lift
        Nested_under involvedObject
        Add_prefix involvedObject_

    [FILTER]
        Name nest
        Alias events_filter_2
        Match kubernetes.events.*
        Operation lift
        Nested_under source
        Add_prefix source_

    [FILTER]
        Name nest
        Alias events_filter_3
        Match kubernetes.events.*
        Operation lift
        Nested_under metadata
        Add_prefix metadata_

    [FILTER]
        Name modify
        Alias events_filter_4
        Match kubernetes.events.*
        Condition Key_does_not_exist source_host
        Add source_host ""

    [FILTER]
        Name modify
        Alias events_filter_5
        Match kubernetes.events.*
        Add Cluster ${CLUSTER}
        Add Region ${REGION}
        Add Environment ${ENVIRONMENT}
        Rename metadata_creationTimestamp CreationTimestamp
        Rename source_component SourceComponent
        Rename source_host SourceComputer
        Rename reportingComponent ReportingComponent
        Rename reportingInstance ReportingComputer
        Rename involvedObject_kind Kind
        Rename involvedObject_apiVersion ApiVersion
        Rename involvedObject_name Name
        Rename involvedObject_namespace Namespace
        Rename count Count
        Rename action Action
        Rename reason Reason
        Rename message Message
        Rename type KubeEventType
        Rename firstTimestamp FirstSeen
        Rename lastTimestamp LastSeen
        Remove metadata
        Remove involvedObject
        Remove source
        Remove eventTime
        Remove involvedObject_resourceVersion
        Remove involvedObject_uid
        Remove involvedObject_fieldPath
        Remove involvedObject_labels
        Remove involvedObject_annotations
        Remove metadata_name
        Remove metadata_namespace
        Remove metadata_uid
        Remove metadata_resourceVersion
        Remove metadata_managedFields

  output-kubernetes.conf: |
    [OUTPUT]
        Name azure_kusto
        Match kubernetes.events.*
        Tenant_Id ${FLUENT_ADX_TENANT_ID}
        Client_Id ${FLUENT_ADX_CLIENT_ID}
        Client_Secret ${FLUENT_ADX_CLIENT_SECRET}
        Ingestion_Endpoint https://ingest-adxaks.northeurope.kusto.windows.net
        Database_Name Kubernetes
        Table_Name KubeEvents
        Ingestion_Mapping_Reference FluentBitMappingEvents
        Log_Key log
        Include_Tag_Key Off
        Include_Time_Key On
        Time_Key TimeGenerated
        Retry_Limit False
        Log_Level info
        compression_enabled on
        ingestion_endpoint_connect_timeout 60
        ingestion_resources_refresh_interval 3600
        # buffering_enabled false
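The chain of nest filters above uses the lift operation to move the keys of the nested involvedObject, source, and metadata maps to the top level with a key prefix. A small Python sketch of the equivalent transform (not Fluent Bit code, just an illustration of what lift does to a record):

```python
def lift(record, nested_under, add_prefix=""):
    """Mimic Fluent Bit's nest filter with Operation lift:
    move the keys of record[nested_under] to the top level,
    each prefixed with add_prefix."""
    lifted = {k: v for k, v in record.items() if k != nested_under}
    for key, value in record.get(nested_under, {}).items():
        lifted[add_prefix + key] = value
    return lifted

event = {
    "reason": "Scheduled",
    "involvedObject": {"kind": "Pod", "name": "nginx-0", "namespace": "web"},
}
print(lift(event, "involvedObject", "involvedObject_"))
# {'reason': 'Scheduled', 'involvedObject_kind': 'Pod',
#  'involvedObject_name': 'nginx-0', 'involvedObject_namespace': 'web'}
```

The subsequent modify filter then renames these prefixed keys into the column names of the Azure Data Explorer table.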
Before we roll out the Fluent Bit deployment, we prepare the Azure Data Explorer side with a new table called KubeEvents in the Kubernetes database.
.create table KubeEvents (
TimeGenerated: datetime, Namespace: string, Name: string, Kind: string, ApiVersion: string, KubeEventType: string, Action: string,
Reason: string, Message: string, Count: string, CreationTimestamp: datetime, FirstSeen: datetime, LastSeen: datetime,
SourceComponent: string, SourceComputer: string, ReportingComponent: string, ReportingComputer: string,
Cluster: string, Region: string, Environment: string
)
Afterwards, we set the ingestion mapping.
.create-or-alter table KubeEvents ingestion json mapping "FluentBitMappingEvents"
```[
{"column": "TimeGenerated", "datatype": "datetime", "Properties": {"Path": "$.TimeGenerated"}},
{"column": "Namespace", "datatype": "string", "Properties": {"Path": "$.log.Namespace"}},
{"column": "Name", "datatype": "string", "Properties": {"Path": "$.log.Name"}},
{"column": "Kind", "datatype": "string", "Properties": {"Path": "$.log.Kind"}},
{"column": "ApiVersion", "datatype": "string", "Properties": {"Path": "$.log.ApiVersion"}},
{"column": "KubeEventType", "datatype": "string", "Properties": {"Path": "$.log.KubeEventType"}},
{"column": "Action", "datatype": "string", "Properties": {"Path": "$.log.Action"}},
{"column": "Reason", "datatype": "string", "Properties": {"Path": "$.log.Reason"}},
{"column": "Message", "datatype": "string", "Properties": {"Path": "$.log.Message"}},
{"column": "Count", "datatype": "string", "Properties": {"Path": "$.log.Count"}},
{"column": "CreationTimestamp", "datatype": "datetime", "Properties": {"Path": "$.log.CreationTimestamp"}},
{"column": "FirstSeen", "datatype": "datetime", "Properties": {"Path": "$.log.FirstSeen"}},
{"column": "LastSeen", "datatype": "datetime", "Properties": {"Path": "$.log.LastSeen"}},
{"column": "SourceComponent", "datatype": "string", "Properties": {"Path": "$.log.SourceComponent"}},
{"column": "SourceComputer", "datatype": "string", "Properties": {"Path": "$.log.SourceComputer"}},
{"column": "ReportingComponent", "datatype": "string", "Properties": {"Path": "$.log.ReportingComponent"}},
{"column": "ReportingComputer", "datatype": "string", "Properties": {"Path": "$.log.ReportingComputer"}},
{"column": "Cluster", "datatype": "string", "Properties": {"Path": "$.log.Cluster"}},
{"column": "Region", "datatype": "string", "Properties": {"Path": "$.log.Region"}},
{"column": "Environment", "datatype": "string", "Properties": {"Path": "$.log.Environment"}}
]```
Rollout
Once everything is prepared, we roll out the Fluent Bit deployment to gather Kubernetes events and ingest them into the Azure Data Explorer cluster.
❯ ./deploy.sh TENANT_ID CLIENT_ID CLIENT_SECRET
❯ ./deploy.sh 00000000-0000-0000-0000-000000000000 00000000-0000-0000-0000-000000000000 Pa$$W0rd
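The deploy.sh script itself is not listed in this post. Based on the secret name and keys the deployment consumes, a hedged sketch of what it has to do could look like this; the manifest file names are placeholders, and the real script is in the repository linked at the end:

```shell
#!/usr/bin/env bash
set -euo pipefail

TENANT_ID="$1"; CLIENT_ID="$2"; CLIENT_SECRET="$3"

# Create or update the secret the deployment reads its ADX credentials from
kubectl create secret generic azuredataexplorer \
  --namespace logging \
  --from-literal=tenant_id="${TENANT_ID}" \
  --from-literal=client_id="${CLIENT_ID}" \
  --from-literal=client_secret="${CLIENT_SECRET}" \
  --dry-run=client -o yaml | kubectl apply -f -

# Apply the remaining manifests (file names are placeholders)
kubectl apply -f storage-class.yaml -f pvc.yaml -f configmap.yaml -f deployment.yaml
```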
As seen in the screenshot below, the single Fluent Bit pod for the Kubernetes event gathering is running in our Azure Kubernetes Service cluster.
Besides that, we see the first data flowing into the Azure Data Explorer table.
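With data arriving, a quick sanity check against the new table could look like the following KQL query; the one-hour time window is just an example:

```kql
KubeEvents
| where TimeGenerated > ago(1h)
| summarize Events = count() by Namespace, Reason, KubeEventType
| order by Events desc
```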
Summary
Since Fluent Bit switched to the Kubernetes watch stream, the configuration of the input plugin is straightforward. The only challenge is configuring external storage for the database that keeps a record of which Kubernetes events have already been processed.
The examples can be found on my GitHub repository.
-> https://github.com/neumanndaniel/scripts/blob/main/Azure_Data_Explorer/Fluent_Bit_Kubernetes/Kubernetes_Events_ADX_Output.kql
-> https://github.com/neumanndaniel/kubernetes/tree/master/fluent-bit/azure-data-explorer-kubernetes-events

