Automating Alert Resolution in AWS Security Hub

In our AWS Organization, we relied on a central GitLab instance for shared CI/CD tooling. A key feature was the use of ephemeral, dynamically created GitLab Runners to execute build and deployment tasks. While this was efficient, it created a significant operational issue: alert fatigue.

Every time a runner was created and destroyed, AWS Security Hub generated findings, and the alerts for these terminated resources persisted long after the instances were gone. This flooded our Security Operations Center (SOC) with alerts for resources that no longer existed, creating significant noise and wasting valuable analyst time investigating false positives.

Technologies Used

  • AWS Services: AWS Security Hub, AWS IAM
  • Languages: Python (using the boto3 SDK)

The Automated Solution

To solve this, I developed a Python script to automatically identify and archive these specific stale findings. The script uses the boto3 SDK to interact directly with the Security Hub API.

Its logic is precise, designed to only target findings for resources that are confirmed to be terminated:

  1. Secure Authentication: The script takes a command-line argument to select an AWS profile, ensuring it runs with the intended permissions in the correct account.

  2. Targeted Filtering: It queries the Security Hub get_findings endpoint with a specific set of filters to isolate the irrelevant alerts:
    • WorkflowStatus: Must be NEW or NOTIFIED (i.e., not yet actioned).
    • RecordState: Must be ACTIVE (the finding itself is still considered relevant by Security Hub).
    • ComplianceStatus: The key filter, which must be NOT_AVAILABLE. This status explicitly indicates that Security Hub could not find the resource to perform its compliance check, which is a strong signal that the resource has been terminated.
  3. Automated Archiving: For every finding that matches this precise filter combination, the script archives it by updating its workflow status. This removes it from the main dashboard, effectively resolving the alert without deleting the record. A minimal sketch of the full flow follows this list.
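
The sketch below shows the general shape of this flow using only boto3 and the Security Hub API. The argument name, the SUPPRESSED workflow status, and the note text are illustrative assumptions rather than the exact values from the original script; the 100-item chunking reflects the documented limits of get_findings and batch_update_findings.

```python
import argparse

import boto3

# 1. Secure authentication: the AWS profile is passed explicitly on the CLI.
parser = argparse.ArgumentParser(description="Archive stale Security Hub findings")
parser.add_argument("--profile", required=True, help="AWS CLI profile to use")
args = parser.parse_args()

session = boto3.Session(profile_name=args.profile)
securityhub = session.client("securityhub")

# 2. Targeted filtering: only findings that are still active and un-actioned,
#    but whose compliance check could not locate the resource.
filters = {
    "WorkflowStatus": [
        {"Value": "NEW", "Comparison": "EQUALS"},
        {"Value": "NOTIFIED", "Comparison": "EQUALS"},
    ],
    "RecordState": [{"Value": "ACTIVE", "Comparison": "EQUALS"}],
    "ComplianceStatus": [{"Value": "NOT_AVAILABLE", "Comparison": "EQUALS"}],
}

# Collect finding identifiers, following NextToken until every page is read.
finding_ids = []
kwargs = {"Filters": filters, "MaxResults": 100}
while True:
    response = securityhub.get_findings(**kwargs)
    finding_ids.extend(
        {"Id": f["Id"], "ProductArn": f["ProductArn"]} for f in response["Findings"]
    )
    token = response.get("NextToken")
    if not token:
        break
    kwargs["NextToken"] = token

# 3. Automated archiving: batch_update_findings accepts at most 100 finding
#    identifiers per call, so update in chunks. SUPPRESSED is an assumed choice;
#    RESOLVED works the same way.
for i in range(0, len(finding_ids), 100):
    securityhub.batch_update_findings(
        FindingIdentifiers=finding_ids[i : i + 100],
        Workflow={"Status": "SUPPRESSED"},
        Note={"Text": "Resource terminated; auto-archived", "UpdatedBy": "cleanup-script"},
    )
```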

This targeted approach ensured that we could safely and efficiently clean up the alert dashboard without risking the accidental removal of genuine security findings.

Challenges & Lessons Learned

The primary challenge was ensuring the script was safe. Accidentally archiving a valid security threat would be a major risk. This meant spending significant time in a development account, testing filter combinations to guarantee that only alerts for terminated resources were targeted.
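
One lightweight way to do that kind of verification is a dry-run pass that only prints what the filters match, without updating anything. This is a hypothetical sketch reusing the securityhub client and filters dictionary from the example above, not necessarily how the original testing was performed.

```python
# Dry run: list what the filters would archive, without changing any finding.
response = securityhub.get_findings(Filters=filters, MaxResults=100)
for finding in response["Findings"]:
    print(finding["Id"], "|", finding["Title"], "|", finding.get("Compliance", {}).get("Status"))
print(f'{len(response["Findings"])} matching findings on the first page')
```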

Furthermore, our Security Hub had over 10,000 active findings. I learned the importance of handling AWS API pagination correctly, using the NextToken parameter in the boto3 client so that the script processed every finding rather than just the first page of results.
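
For reference, boto3 also generates a paginator for get_findings that walks NextToken automatically; a short sketch is below (the profile name is illustrative, and the filter dictionary is abbreviated to match the one shown earlier).

```python
import boto3

# Sketch: boto3's built-in paginator handles NextToken behind the scenes.
securityhub = boto3.Session(profile_name="example-profile").client("securityhub")
filters = {"ComplianceStatus": [{"Value": "NOT_AVAILABLE", "Comparison": "EQUALS"}]}  # plus the other filters shown earlier

paginator = securityhub.get_paginator("get_findings")
finding_ids = [
    {"Id": f["Id"], "ProductArn": f["ProductArn"]}
    for page in paginator.paginate(Filters=filters, PaginationConfig={"PageSize": 100})
    for f in page["Findings"]
]
print(f"Collected {len(finding_ids)} findings across all pages")
```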

Outcomes & Impact

The script was an immediate success and had a significant, measurable impact on our security operations:

  • Noise Reduction: On its initial run, the script cleared over 4,000 stale findings, instantly improving the signal-to-noise ratio.
  • Increased SOC Efficiency: We reduced the number of daily false-positive alerts by over 90%, saving the SOC an estimated 3-5 hours per week previously spent investigating non-existent resources.
  • Operationalized Automation: Due to its success, the script was later integrated into a weekly scheduled AWS Lambda function, keeping Security Hub clean with zero ongoing manual effort (a minimal handler sketch follows).
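
For completeness, a scheduled Lambda version is essentially the same logic wrapped in a handler. This sketch assumes the function's execution role grants the Security Hub permissions (so no named profile is needed), and the SUPPRESSED status is the same illustrative choice as above, not necessarily the exact production code.

```python
import boto3

securityhub = boto3.client("securityhub")  # credentials come from the Lambda execution role

# Same filters as the standalone script: active, un-actioned findings whose
# compliance check could not locate the resource.
FILTERS = {
    "WorkflowStatus": [
        {"Value": "NEW", "Comparison": "EQUALS"},
        {"Value": "NOTIFIED", "Comparison": "EQUALS"},
    ],
    "RecordState": [{"Value": "ACTIVE", "Comparison": "EQUALS"}],
    "ComplianceStatus": [{"Value": "NOT_AVAILABLE", "Comparison": "EQUALS"}],
}

def lambda_handler(event, context):
    """Weekly cleanup triggered by an EventBridge schedule."""
    paginator = securityhub.get_paginator("get_findings")
    ids = [
        {"Id": f["Id"], "ProductArn": f["ProductArn"]}
        for page in paginator.paginate(Filters=FILTERS)
        for f in page["Findings"]
    ]
    # batch_update_findings accepts at most 100 finding identifiers per call.
    for i in range(0, len(ids), 100):
        securityhub.batch_update_findings(
            FindingIdentifiers=ids[i : i + 100],
            Workflow={"Status": "SUPPRESSED"},
        )
    return {"archived": len(ids)}
```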
