
Designing the Azure Data Explorer table structure for Azure Diagnostic Logs or Defender for Cloud data ingestion

In my recent blog posts about Azure Data Explorer, I wrote about Activity Logs and Diagnostic Logs ingestion.

-> https://www.danielstechblog.io/ingesting-azure-diagnostic-logs-into-azure-data-explorer/
-> https://www.danielstechblog.io/export-azure-kubernetes-service-control-plane-logs-to-azure-data-explorer/

Today, I would like to discuss how to design the Azure Data Explorer table for the Diagnostic Logs or Defender for Cloud log data ingestion. Depending on your preferences, you can choose between a generic design that covers all Diagnostic Logs data from different services and a more customized design that supports you in your day-to-day business.

Today, we take a look at the latter: creating a customized table design. As an example, we use Microsoft Defender for Cloud alerts that are ingested via an Event Hub into our Azure Data Explorer cluster.

First, we configure the continuous export of the Microsoft Defender for Cloud alerts to an Event Hub. This can be done via the Azure portal or via Infrastructure as Code. In our example, we use the following Terraform module.

-> https://github.com/neumanndaniel/terraform/tree/master/modules/microsoft_defender_continuous_export

module "microsoft_defender_continuous_export_eventhub" {
  source = "../modules/microsoft_defender_continuous_export"

  name                       = "ExportToEventHub"
  resource_group_name        = "continuous-export-config"
  location                   = "northeurope"
  type                       = "eventhub"
  eventhub_id                = "/subscriptions/<subscription_id>/resourceGroups/adx/providers/Microsoft.EventHub/namespaces/adxaks/eventhubs/microsoft_defender_for_cloud"
  eventhub_connection_string = "Endpoint=sb://adxaks.servicebus.windows.net/;SharedAccessKeyName=mdfc_send;SharedAccessKey=<shared_access_key>;EntityPath=microsoft_defender_for_cloud"
}

Azure portal - Continuous Export configuration

Prerequisites

Our first step is provisioning a new database, called MDfC, in our Azure Data Explorer cluster.

-> https://learn.microsoft.com/en-us/azure/data-explorer/create-cluster-and-database?WT.mc_id=AZ-MVP-5000119&tabs=full#create-a-database

Afterward, we open the query editor, enable streaming ingestion on the database, and create the table MDfCRawRecords together with its ingestion mapping.

.alter database MDfC policy streamingingestion enable

.create table MDfCRawRecords (Records: dynamic)

.create table MDfCRawRecords ingestion json mapping 'MDfCRawRecordsMapping' '[{"column":"Records","Properties":{"path":"$"}}]'

This table contains the security alert records as they arrive from the Event Hub into the Azure Data Explorer cluster.
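
To double-check the setup, we can list the table's ingestion mappings with the built-in management command.

.show table MDfCRawRecords ingestion json mappings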

Before we can start designing the table that holds the final result set, we connect the Event Hub to the Azure Data Explorer cluster.

-> https://learn.microsoft.com/en-us/azure/data-explorer/create-event-hubs-connection?WT.mc_id=AZ-MVP-5000119&tabs=portalADX%2Cget-data-2

Designing the Azure Data Explorer table

Microsoft Defender for Cloud has a nice feature to trigger sample security alerts for testing purposes.

-> https://learn.microsoft.com/en-us/azure/defender-for-cloud/alert-validation?WT.mc_id=AZ-MVP-5000119

We use this functionality to populate security alert data into the MDfCRawRecords table.

Azure portal - Defender for Cloud sample alerts

Once the data has arrived in our Azure Data Explorer table, we examine the structure to design our final table MDfCSecurityAlerts.

Azure Data Explorer - MDfCRawRecords table with sample data
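
Besides the screenshot above, a quick way to examine the structure in the query editor is listing the top-level JSON keys of a few records; bag_keys() is a built-in KQL function, and the query is just an inspection sketch.

MDfCRawRecords
| take 10
| project Keys = bag_keys(Records), Records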

As you might have noticed, the ingestion mapping for the Records column uses the root path $. This allows us to retrieve the JSON data exactly as it is emitted by Microsoft Defender for Cloud. Azure Diagnostic Logs, for instance, are emitted to an Event Hub under the path $.records. Even in that case, the root path $ works.
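
For comparison, this is how an ingestion mapping pinned to the $.records path could look; the DiagnosticRawRecords table and the mapping name are hypothetical examples, not part of this setup.

.create table DiagnosticRawRecords (Records: dynamic)

.create table DiagnosticRawRecords ingestion json mapping 'DiagnosticRawRecordsMapping' '[{"column":"Records","Properties":{"path":"$.records"}}]'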

A fallback solution for inspecting the emitted data structure is the Data Explorer feature of an Event Hub.

Azure Event Hub Data Explorer

This comes in handy when the expected data structure is different from the emitted data structure and the ingestion mapping does not work.

So, after examining the data structure, we define our final table MDfCSecurityAlerts and the corresponding KQL function for the data ingestion into this table.

Let us start with the KQL function.

.create function SecurityAlertRecordsExpand() {
    MDfCRawRecords
    | extend events = Records
    | where isnotnull(events.Severity) and isnotnull(events.SystemAlertId)
    | project
        TimeGenerated = todatetime(events.TimeGenerated),
        StartTimeUtc = todatetime(events.StartTimeUtc),
        EndTimeUtc = todatetime(events.EndTimeUtc),
        Status = tostring(events.Status),
        Severity = tostring(events.Severity),
        CompromisedEntity = tostring(events.CompromisedEntity),
        Intent = tostring(events.Intent),
        AlertType = tostring(events.AlertType),
        AlertName = tostring(events.AlertDisplayName),
        AlertDescription = tostring(events.Description),
        AlertId = tostring(events.SystemAlertId),
        VendorName = tostring(events.VendorName),
        ResourceId = tostring(events.AzureResourceId),
        Properties = events.ExtendedProperties,
        Link = tostring(events.AlertUri),
        Incident = tobool(events.IsIncident)
}

The KQL function queries the MDfCRawRecords table and uses the extend operator to map the Records column onto events. For Azure Diagnostic Logs, you would use the mv-expand operator instead, as the Records column would contain several entries and you would want a separate row for each of them. In the case of the Microsoft Defender for Cloud security alerts, each record holds a single alert, so we use extend.
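
For illustration, here is a minimal sketch of such a Diagnostic Logs variant, reusing the hypothetical DiagnosticRawRecords table from above. The time, resourceId, and category fields belong to the common Azure Diagnostic Logs schema; any additional columns would depend on the emitting service.

.create function DiagnosticRecordsExpand() {
    DiagnosticRawRecords
    | mv-expand events = Records
    | project
        TimeGenerated = todatetime(events.time),
        ResourceId = tostring(events.resourceId),
        Category = tostring(events.category)
}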

Before we use the project operator to map the results to our table structure, we apply a filter. The filter is necessary because we might emit additional Microsoft Defender for Cloud data to the Event Hub, which would then also end up in the MDfCRawRecords table. Those entries should not land in the MDfCSecurityAlerts table, so we only keep records that have a Severity and a SystemAlertId.
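
We can test the KQL function directly in the query editor against the sample data before wiring it up to an update policy.

SecurityAlertRecordsExpand()
| take 10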

Azure Data Explorer testing KQL function with sample data

After creating the KQL function, we can finally create the table MDfCSecurityAlerts.

.create table MDfCSecurityAlerts (
    TimeGenerated: datetime, StartTimeUtc: datetime, EndTimeUtc: datetime,
    Status: string, Severity: string, CompromisedEntity: string,
    Intent: string, AlertType: string, AlertName: string, AlertDescription: string,
    AlertId: string, VendorName: string, ResourceId: string, Properties: dynamic,
    Link: string, Incident: bool
    )
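
A quick sanity check is comparing the function output with the new table's schema; getschema is a built-in KQL operator and .show table ... cslschema a built-in management command.

SecurityAlertRecordsExpand()
| getschema

.show table MDfCSecurityAlerts cslschema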

Last but not least, we set the retention on the MDfCRawRecords table to 0 days.

.alter-merge table MDfCRawRecords policy retention softdelete = 0d

We do not want to keep records in this table, as the following update policy executes the KQL function whenever new data arrives in the MDfCRawRecords table and ingests the result into the MDfCSecurityAlerts table.

.alter table MDfCSecurityAlerts policy update @'[{"Source": "MDfCRawRecords", "Query": "SecurityAlertRecordsExpand()", "IsEnabled": true, "IsTransactional": true}]'

Azure Data Explorer MDfCSecurityAlerts table
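
With the pipeline in place, day-to-day queries against the customized table become straightforward. As an illustrative example, the following query summarizes the security alerts of the last day per severity and alert name.

MDfCSecurityAlerts
| where TimeGenerated > ago(1d)
| summarize AlertCount = count() by Severity, AlertName
| order by AlertCount desc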

Summary

Using an Azure Data Explorer cluster as a target for Azure Diagnostic Logs or Microsoft Defender for Cloud data, instead of Azure Log Analytics, has the advantage that you can customize the final table to support your day-to-day business. On the other hand, it requires a bit more upfront work, but it will pay off.

The example KQL file and the Terraform module can be found in my GitHub repository.

-> https://github.com/neumanndaniel/scripts/blob/main/Azure_Data_Explorer/Microsoft_Defender_for_Cloud/Security_Alerts.kql
-> https://github.com/neumanndaniel/terraform/tree/master/modules/microsoft_defender_continuous_export
