Log Ingestion Lag in Cloud-Based SIEMs

Lag in log ingestion is a topic that comes up now and then in our Azure Sentinel design discussions with customers. We have even heard concerns about the speed of light being a constraint for certain critical log sources. Such concerns would be valid for security controls designed to protect strategic infrastructure, but one has to find the right tool for the job.

To set expectations, any cloud-based analytics solution will be affected by ingestion lag, as the on-premises logs have to travel a longer, slower path. In addition to that, a large, powerful, distributed cloud environment (like Azure, AWS, GCP, etc.) will experience delays in the synchronization and replication of data within the cloud itself. The cloud vendors are cautious about committing to SLAs on ingestion delays precisely because the delay depends on so many factors.

In today’s hybrid infrastructure, the lag affects on-prem SIEMs as well, because the logs from cloud-based applications have to be ingested over the same path, only in the opposite direction. With more and more parts of the IT infrastructure moving to the cloud, an on-prem analytical platform will eventually become a liability.

That being said, a table-top IR exercise would probably show that during a security incident there are other, more significant delays, brought in by uncertainty about the type of incident, the validation of a security event vs. a security incident, who should be involved, and so on. The lag in log ingestion and the associated delay in detection would rarely be brought up as a concern. It may become a concern when the maturity level of the incident response is approaching that mythical level 5.

To take Azure Sentinel as an example, the ingestion delay depends on the time of day as well, not just on the table. There is an agent collection lag, an agent processing lag, a lag for parsing/enriching in Azure, and an Azure indexing lag (until the record gets into Log Analytics). The metadata around each log entry carries some detail about the discrepancies in log ingestion/indexing (see the article mentioned below), which can be used to determine more precisely when the actual event happened.
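As a rough illustration of how those timestamps can be used, the lag for a record is simply the difference between when the event was generated and when it was indexed. The sketch below is a minimal example in Python, not Sentinel code: the function names and the sample timestamps are made up for illustration, and it assumes each record is available as a pair of timezone-aware (generated, ingested) timestamps.

```python
from datetime import datetime, timezone
from statistics import median


def ingestion_lag_seconds(time_generated, time_ingested):
    """Seconds between event generation and indexing (hypothetical helper)."""
    return (time_ingested - time_generated).total_seconds()


def summarize_lag(records):
    """Return min / median / max lag for (generated, ingested) timestamp pairs."""
    lags = sorted(ingestion_lag_seconds(g, i) for g, i in records)
    return {"min": lags[0], "median": median(lags), "max": lags[-1]}


# Illustrative sample data only; not measurements from a real workspace.
utc = timezone.utc
records = [
    (datetime(2021, 1, 1, 12, 0, 0, tzinfo=utc), datetime(2021, 1, 1, 12, 0, 45, tzinfo=utc)),
    (datetime(2021, 1, 1, 12, 1, 0, tzinfo=utc), datetime(2021, 1, 1, 12, 3, 0, tzinfo=utc)),
    (datetime(2021, 1, 1, 12, 2, 0, tzinfo=utc), datetime(2021, 1, 1, 12, 7, 0, tzinfo=utc)),
]

summary = summarize_lag(records)
print(summary)
```

Tracking a summary like this per table and per time of day gives a realistic figure for how far behind a detection may be running.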

This naturally combines with the challenges around sliding windows; see "Handling sliding windows in Azure Sentinel rules" by Ofer Shezaf of Microsoft.
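To make the interaction between lag and sliding windows concrete, here is a simplified sketch in Python. It is one possible approach under stated assumptions, not necessarily the method described in that article: a scheduled rule widens its look-back on the event time by the maximum expected lag, and then keeps only the events that were ingested since the previous run so that nothing is missed or counted twice. The function name, the dictionary keys, and the interval values are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def events_for_run(events, run_time, run_interval, max_expected_lag):
    """Select events for one scheduled run (illustrative sketch).

    The look-back on the generated time is widened by max_expected_lag so that
    late arrivals are still visible; the filter on the ingested time restricts
    the result to events that arrived since the previous run, avoiding duplicates.
    """
    lookback_start = run_time - run_interval - max_expected_lag
    ingested_since = run_time - run_interval
    return [
        e for e in events
        if lookback_start <= e["generated"] <= run_time
        and ingested_since < e["ingested"] <= run_time
    ]


utc = timezone.utc
# An event generated at 11:58 that only reached the SIEM at 12:03 (made-up data).
late_event = {
    "generated": datetime(2021, 1, 1, 11, 58, tzinfo=utc),
    "ingested": datetime(2021, 1, 1, 12, 3, tzinfo=utc),
}
interval = timedelta(minutes=5)
max_lag = timedelta(minutes=10)

run_1205 = events_for_run([late_event], datetime(2021, 1, 1, 12, 5, tzinfo=utc), interval, max_lag)
run_1210 = events_for_run([late_event], datetime(2021, 1, 1, 12, 10, tzinfo=utc), interval, max_lag)
print(len(run_1205), len(run_1210))
```

A rule that looked only at the last five minutes of generated time would never see this event: it was not yet ingested at the 12:00 run and was already outside the window at the 12:05 run. The price of the wider look-back is that the rule scans more data on every run.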

At BlueVoyant, we are continuously working on streamlining the detection and response process. Alexandre Teixeira, SIEM engineer extraordinaire, recently joined BlueVoyant and wrote an article on some of the approaches to mitigating potential log ingestion lag for cloud-based security controls. See "Different SIEMs, Same Challenges? Only Time(Generated) will tell…", which covers ingestion vs. generated timestamps, a comparison with other SIEM platforms, and several other considerations.