Container security: SOC practitioner’s guide

Daniel C. · Head of Security Operations
Cloud Security Operations · 11 min read

Most container security programs invest in image scanning and call it done, leaving the runtime layer where active threats actually execute underbuilt and unowned. This is the four-layer split, the ownership seams where incidents fall through, and what to press vendors on before you buy.

I've sat through more container security demos than I can count in the last year, and almost every one walked me through the same flow: vulnerability counts, a software bill of materials dashboard, a slide about admission control. Good slides. Then I asked the question I now open every container call with, which is what the product actually sees when a credential gets stolen and someone deploys a privileged pod into a node at 2am. The room usually goes quiet.

That silence exposes the gap. Container security is split across four layers, and most programs operate in only one of them. Runtime, the layer with the active threats, is the one SecOps owns and the one most teams underbuild. The line item I write checks for keeps growing, and I want to know what I'm actually buying across that split.

In Brief:

  • Container security spans four layers, but SecOps owns the one with active threats: runtime. Most programs invest heavily in image scanning and call container security done.
  • Unclear ownership at the seams is the most common failure mode. DevSecOps owns pre-production controls, while SecOps owns runtime detection and IR playbooks including tuning. Gaps appear where each side assumes the other has it covered.
  • Containers get attacked within the first 48 hours of deployment, and the evidence disappears when the pod terminates. Your detection has to be pre-tuned and your IR workflow has to capture runtime state before containment.
  • Kubernetes adds a control plane most SOC teams were never built to monitor. The API server audit log that records every exec and RBAC change is off by default in many managed services.

Container security is four layers, and SOCs operate in one

The textbook definition treats container security as protecting containerized applications across their lifecycle, from build through runtime. That definition is true but too flat for SOC use, because it hides the part a security operations center actually has to operate. Container security has four distinct layers with different owners and telemetry sources, and the threat profile changes by layer.

NIST SP 800-190 and NISTIR 8176 frame container security and define six countermeasure entities that practitioners collapse into four layers: image and supply chain, orchestration, runtime, and network. Container security requires more than an image scanning program. The distinction that decides where SecOps spends its effort is knowing which layer produces evidence of active threats and which only tells you what might be exposed.

Runtime, where active threats live

Attacks execute at runtime, so this layer sits at the center of SecOps responsibility. The runtime anomaly set includes unexpected process execution and system calls, protected-file changes, writes to unusual locations, new listeners, unexpected network destinations, and malware storage.

Runtime coverage has to catch container escape, privilege escalation, and cryptomining. MITRE T1611 (Escape to Host) covers the common escape vectors: privileged container abuse, syscall abuse via unshare and keyctl, and bind mounts to the host filesystem.
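
As a rough illustration of the configuration side of those escape vectors, the Python sketch below flags pod specs that run privileged containers or bind-mount host paths. It assumes the pod manifest has already been parsed into a dict (for example from `kubectl get pod -o json`). The function name `escape_risk_findings` is hypothetical, and the `SYS_ADMIN` capability check is my own assumption about what is worth flagging, not something the T1611 summary above spells out.

```python
from typing import Any, Dict, List


def escape_risk_findings(pod: Dict[str, Any]) -> List[str]:
    """Return findings for pod settings that widen the container escape surface.

    Hypothetical helper: `pod` is a Kubernetes Pod manifest parsed into a dict.
    """
    spec = pod.get("spec") or {}
    findings: List[str] = []

    # Bind mounts to the host filesystem show up as hostPath volumes.
    for volume in spec.get("volumes") or []:
        host_path = volume.get("hostPath")
        if host_path:
            findings.append(
                f"volume {volume.get('name')!r} bind-mounts host path "
                f"{host_path.get('path')!r}"
            )

    containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])
    for container in containers:
        name = container.get("name")
        security_context = container.get("securityContext") or {}
        if security_context.get("privileged"):
            findings.append(f"container {name!r} runs privileged")
        # Assumption: SYS_ADMIN is flagged because it broadens syscall access.
        added = (security_context.get("capabilities") or {}).get("add") or []
        if "SYS_ADMIN" in added:
            findings.append(f"container {name!r} adds the SYS_ADMIN capability")
    return findings
```

A check like this only describes exposure. It says nothing about whether anyone is using the path, which is why it complements runtime detection rather than replacing it.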

A 2025 cryptomining campaign used compromised cloud credentials to create more than 200 containers in a resource group. Static scanning misses this class of attack because the abuse exists only in running workloads, not in anything an image scanner inspects. Runtime detection tells you what is being exploited right now, which is exactly what posture management can't.

Orchestration, where configuration risk concentrates

Kubernetes concentrates misconfiguration risk in its control plane. The orchestrator decides which containers run where, monitors resource consumption, and exposes an API server that, if reachable, lets an attacker view containers, read secrets, and execute commands. Missing role-based access control (RBAC), an exposed API server, and unencrypted communications are the primary control-plane weaknesses.

The relevant MITRE techniques map cleanly. MITRE T1609 covers administration command abuse of the Docker daemon, Kubernetes API server, or kubelet to run commands inside a container, and T1098.006 covers RBAC abuse, where an adversary creates a RoleBinding or ClusterRoleBinding to maintain access. SecOps needs audit log coverage of the API server, RBAC change monitoring, and admission webhook telemetry in the same coverage plan. This layer is frequently undermonitored because Kubernetes audit logging is off by default in managed services like Amazon EKS, where control-plane logging has to be explicitly enabled.
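
To make the audit log coverage concrete, here is a minimal Python sketch that scans JSON-lines Kubernetes audit events for the two behaviors named above: RoleBinding or ClusterRoleBinding writes, and exec into a pod. It relies on the `verb`, `user.username`, and `objectRef` fields of the audit event format. The label strings and function names are hypothetical, and a real pipeline would express this as SIEM detection content rather than a script.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional

RBAC_BINDING_RESOURCES = {"rolebindings", "clusterrolebindings"}
WRITE_VERBS = {"create", "update", "patch"}


def classify(event: Dict[str, Any]) -> Optional[str]:
    """Return a hypothetical detection label for an audit event, or None."""
    ref = event.get("objectRef") or {}
    resource = ref.get("resource")
    if resource in RBAC_BINDING_RESOURCES and event.get("verb") in WRITE_VERBS:
        return "rbac-binding-change"
    # Exec requests are recorded against the pods/exec subresource.
    if resource == "pods" and ref.get("subresource") == "exec":
        return "pod-exec"
    return None


def interesting_events(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield a small summary dict for each audit line worth a SOC look."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        label = classify(event)
        if label is None:
            continue
        ref = event.get("objectRef") or {}
        yield {
            "label": label,
            "user": (event.get("user") or {}).get("username"),
            "namespace": ref.get("namespace"),
            "name": ref.get("name"),
        }
```

One caveat with this sketch: the API server can log the same request at more than one stage, so a production version would likely deduplicate on the event's `auditID`.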

Image and supply chain, where DevSecOps leads

Most programs already invest in image and supply chain controls, and that layer is primarily owned by DevSecOps and platform engineering. Teams use SBOMs, image signing with cosign or Notation, provenance attestations via Supply-chain Levels for Software Artifacts (SLSA), and admission policies that require signed images. Those controls target the risks that concentrate at build time: CVEs, hardcoded secrets, base image poisoning, and dependency tampering.

SecOps doesn't own this layer, but you have real stakes in it. For IR triage you consume image digests and signing or SBOM attestations, and when a runtime container is compromised, tracing it back to the specific build and CI/CD pipeline run depends entirely on the image digest the supply chain layer produced.

Network and service mesh, the lateral movement surface

East-west movement happens through Kubernetes networking and service-mesh paths, which is why the practitioner model separates network as its own layer. Kubernetes is open by default here, since a pod is non-isolated for egress and all outbound connections are allowed unless a NetworkPolicy both selects the pod and defines an egress rule. Flat pod-to-pod communication across namespaces is the common gap.
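
The egress rule is easy to misread, so here is a small Python sketch of it. A pod counts as egress-isolated only when some NetworkPolicy in its namespace selects it and applies to egress. The function names are hypothetical, manifests are assumed to be parsed into dicts, and the selector logic handles `matchLabels` only (it refuses `matchExpressions` rather than guess).

```python
from typing import Any, Dict, List


def selects(pod_labels: Dict[str, str], selector: Dict[str, Any]) -> bool:
    """True if a label selector matches the pod. An empty selector matches all pods."""
    selector = selector or {}
    if selector.get("matchExpressions"):
        raise NotImplementedError("matchExpressions is out of scope for this sketch")
    match_labels = selector.get("matchLabels") or {}
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


def is_egress_isolated(pod: Dict[str, Any], policies: List[Dict[str, Any]]) -> bool:
    """True if at least one NetworkPolicy restricts this pod's outbound traffic."""
    namespace = pod["metadata"].get("namespace", "default")
    labels = pod["metadata"].get("labels") or {}
    for policy in policies:
        if policy["metadata"].get("namespace", "default") != namespace:
            continue
        spec = policy.get("spec") or {}
        policy_types = spec.get("policyTypes")
        if policy_types is None:
            # When policyTypes is omitted, Egress applies only if egress rules exist.
            policy_types = ["Ingress"] + (["Egress"] if spec.get("egress") else [])
        if "Egress" in policy_types and selects(labels, spec.get("podSelector")):
            return True
    return False
```

Run against an inventory of pods and policies, a check like this would list the workloads that can still reach any destination, which is a reasonable starting point for east-west coverage planning.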

Detection changes with the control boundary. NetworkPolicy operates at L3/L4 and controls which pods talk to which based on IP and port, while a service mesh operates at L7, controlling traffic by service identity and providing mTLS. Network-level detection signals for east-west movement are exactly the signals most organizations struggle to see inside their environments.

SecOps owns runtime and IR, DevSecOps owns pre-production

Container security usually fails at ownership seams rather than tools. Responsibility is decentralized: in Red Hat's 2024 survey, only 18% of respondents identified security teams as most responsible for container and Kubernetes security, with DevOps cited most often. That decentralization is fine until an incident crosses a seam nobody owns.

Draw the line by function. DevSecOps owns pre-production: SAST, DAST, SCA, image validation, and IaC scanning. Security operations owns detection content and incident response playbooks, the tuning work to reduce false positives, and the investigation chain from a syscall event back to a pod UID. The handoff is where I've watched incidents fall through: the platform team assumes application teams constrained pod security contexts, while application teams assume the platform team patched the kubelet. The cost is real: 46% of organizations have already lost revenue or customers to a container incident, and these seams are where such incidents slip through.

Runtime detection is the layer most programs underbuild

Active threats concentrate at runtime, but investment usually goes somewhere else. The posture/runtime divide is direct: posture tools show potential exposure, while runtime monitoring shows active exploitation. Traditional tools don't fill the gap either, and NIST warns that intrusion prevention systems (IPS) and web application firewalls (WAF) often lack the scale, rate-of-change handling, and container visibility these environments need.

The tools exist, and controlled testing has shown Falco, Tetragon, and Tracee can all detect the lab scenarios they were built to see. Running an eBPF detection pipeline seriously still takes two to three FTE of detection engineering, which a SOC's normal day-to-day workload can't absorb. Out of the box, Falco generates noise that teams report as 80%+ false positives until the rules have been tuned over several weeks.
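
One way to direct that tuning effort is to measure false-positive rates per rule from your own triage verdicts. The Python sketch below is a minimal version. The rule names, the verdict strings, and the 0.8 threshold (borrowed from the 80% figure above) are all illustrative assumptions, not values from any tool.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Verdict = Tuple[str, str]  # (rule_name, "true_positive" or "false_positive")


def false_positive_rates(triaged: Iterable[Verdict]) -> Dict[str, float]:
    """Compute the share of triaged alerts per rule that analysts closed as false positives."""
    totals: Counter = Counter()
    false_positives: Counter = Counter()
    for rule, verdict in triaged:
        totals[rule] += 1
        if verdict == "false_positive":
            false_positives[rule] += 1
    return {rule: false_positives[rule] / totals[rule] for rule in totals}


def tuning_queue(triaged: Iterable[Verdict], threshold: float = 0.8) -> List[str]:
    """List rules at or above the threshold, noisiest first."""
    rates = false_positive_rates(triaged)
    noisy = [rule for rule, rate in rates.items() if rate >= threshold]
    return sorted(noisy, key=lambda rule: rates[rule], reverse=True)
```

Tracking this number weekly gives you your own production baseline instead of a vendor's lab figure.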

Runtime readiness is more than lab escape detection

NIST recommends security information and event management (SIEM) integration so container events flow through the same processes as the rest of the environment. Beyond that, runtime readiness requires a documented rule set, CI tests for the detection logic, and an alert triage routine in the SOC, because a tool that fires in a lab but has no tuned ruleset or triage path behind it isn't a detection capability yet.
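
CI tests for detection logic can be as plain as unit tests over recorded events. The sketch below is one way to do it in Python, with pytest-style test functions. The event schema, the rule `shell_in_container`, and the allowlisted image name are all hypothetical; the point is only that every rule ships with cases that must fire and cases that must stay silent.

```python
from typing import Any, Dict

SHELLS = {"sh", "bash", "dash", "zsh"}
# Hypothetical allowlist: images where an interactive shell is expected.
ALLOWED_IMAGES = {"registry.example.com/debug-tools"}


def shell_in_container(event: Dict[str, Any]) -> bool:
    """Fire when a shell starts inside a container whose image is not allowlisted."""
    if not event.get("container_id"):
        return False  # host process, not a container
    if event.get("image") in ALLOWED_IMAGES:
        return False
    return event.get("process_name") in SHELLS


def test_fires_on_shell_in_workload_container() -> None:
    event = {"container_id": "abc123", "image": "registry.example.com/web",
             "process_name": "bash"}
    assert shell_in_container(event)


def test_silent_on_host_shell() -> None:
    assert not shell_in_container({"container_id": "", "process_name": "bash"})


def test_silent_on_allowlisted_image() -> None:
    event = {"container_id": "abc123", "image": "registry.example.com/debug-tools",
             "process_name": "sh"}
    assert not shell_in_container(event)


def test_silent_on_non_shell_process() -> None:
    event = {"container_id": "abc123", "image": "registry.example.com/web",
             "process_name": "nginx"}
    assert not shell_in_container(event)
```

When a tuning change adds an exception, the matching silent-case test goes in with it, so the next rule edit can't quietly reintroduce the noise.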

Kubernetes is a second operating system your SOC has to monitor

Operationally, Kubernetes behaves like a second operating system with its own attack surface, and traditional endpoint tools were designed for long-lived VMs, not control planes. Red Hat's 2024 survey found that 89% of organizations had a container or Kubernetes incident in the prior 12 months, with runtime incidents at 45% and misconfiguration detections at 40%. The control-plane exposures are specific: the API server runs on port 6443, anonymous login should be turned off, secrets aren't encrypted by default, and RBAC is additive-only with no deny rules.

The clock is short. Newly deployed AKS clusters can face probing within about 18 minutes of deployment, and Unit 42 reported service account token theft in 22% of cloud environments in 2025, including an exchange intrusion that scraped Kubernetes credentials and pivoted to steal millions in cryptocurrency. Turn on the API server audit log, because without it you can't detect misuse or unauthorized exec, and drift detection breaks too.

Container incident response needs a different workflow than traditional IR

Container IR breaks traditional playbooks because traditional IR assumes static hosts and persistent storage. Container IR failure modes include node-level access requirements, workload disruption, missing memory and runtime state, ephemeral or rescheduled pods, and weak evidence attribution in clusters. The chain-of-evidence requirement is unchanged, but you collect different evidence and you have to move faster, because evidence disappears when the pod terminates.

Capture runtime state before containment. Preserve the container filesystem snapshot and process list first, then capture the Kubernetes audit log events for the affected namespace, and record the image digest separately so you can trace the build. Worker-node evidence is your backstop because the kernel doesn't respect container boundaries: a connect() to an external IP or an execve() of a suspicious binary still appears in the kernel audit log. Correlating that event to a pod requires cross-referencing /proc/<pid>/cgroup.
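
The /proc/<pid>/cgroup cross-reference can be sketched in Python as below. This is a minimal example under stated assumptions: it expects the two cgroup path layouts I would commonly expect on Kubernetes nodes (cgroupfs-style and systemd-style), it assumes 64-character hexadecimal container IDs, and it has to run on the node itself, since a process inside a container may see only its own cgroup root. The function names are hypothetical.

```python
import re
from typing import Dict, Optional

# Pod UID as it appears in the cgroup path; the systemd driver uses "_" instead of "-".
POD_UID = re.compile(r"pod([0-9a-f]{8}(?:[-_][0-9a-f]{4}){3}[-_][0-9a-f]{12})")
# Assumption: container IDs are 64 hexadecimal characters.
CONTAINER_ID = re.compile(r"[0-9a-f]{64}")


def parse_cgroup(text: str) -> Optional[Dict[str, Optional[str]]]:
    """Extract pod UID and container ID from the contents of /proc/<pid>/cgroup."""
    for line in text.splitlines():
        # Each line is "hierarchy-id:controllers:path"; only the path matters here.
        path = line.split(":", 2)[-1]
        pod = POD_UID.search(path)
        if pod is None:
            continue
        container = CONTAINER_ID.search(path)
        return {
            "pod_uid": pod.group(1).replace("_", "-"),
            "container_id": container.group(0) if container else None,
        }
    return None  # not a pod process, or an unrecognized layout


def pod_for_pid(pid: int, proc_root: str = "/proc") -> Optional[Dict[str, Optional[str]]]:
    """Read the cgroup file for a PID on the node and resolve it to a pod."""
    with open(f"{proc_root}/{pid}/cgroup") as handle:
        return parse_cgroup(handle.read())
```

The pod UID recovered here is the join key back to the Kubernetes audit log and to the pod's namespace and name, provided you resolve it before the pod object is gone.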

Put the container forensics commands in the playbook before you need them

Have the practical commands staged before an incident, not improvised during one. Use crictl ps and crictl inspect for runtime inspection, ss -tunap for active connections, and find /var/lib/containerd -type f -mmin -60 for recent file modifications, with eBPF as the forensic backstop that captures process trees and command-line arguments before the container terminates.
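
Staging can be as simple as a script that holds the exact argument lists and writes each command's output to an evidence directory. The Python sketch below does that for the commands above. The function names, the output layout, and the container ID check are my assumptions, and the collector has to run with sufficient privileges on the affected node.

```python
import re
import subprocess
from pathlib import Path
from typing import Dict, List

# Assumption: container IDs (or their prefixes) are 6 to 64 hex characters.
CONTAINER_ID = re.compile(r"[0-9a-f]{6,64}")


def collection_plan(container_id: str) -> Dict[str, List[str]]:
    """Return the staged commands, keyed by the evidence file each one produces."""
    if not CONTAINER_ID.fullmatch(container_id):
        raise ValueError(f"unexpected container ID: {container_id!r}")
    return {
        "crictl_ps": ["crictl", "ps"],
        "crictl_inspect": ["crictl", "inspect", container_id],
        "connections": ["ss", "-tunap"],
        "recent_files": ["find", "/var/lib/containerd", "-type", "f", "-mmin", "-60"],
    }


def collect(plan: Dict[str, List[str]], out_dir: Path) -> None:
    """Run each staged command and save its output; record failures instead of stopping."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, argv in plan.items():
        try:
            result = subprocess.run(argv, capture_output=True, text=True, timeout=120)
            body = result.stdout + result.stderr
        except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
            body = f"collection failed: {exc}\n"
        (out_dir / f"{name}.txt").write_text(body)
```

During an incident the call would look something like `collect(collection_plan(container_id), Path("/evidence/case-001"))`, where the path is a placeholder for wherever your playbook stores evidence.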

What to evaluate when buying container security tooling

Start a tooling evaluation with agentless versus agent-based, because this architectural fork determines whether you get runtime visibility at all. The core limitation of agentless tooling is runtime visibility: a vulnerability scanner can tell you a vulnerability exists but can't tell you whether someone is exploiting it right now. Agentless deploys faster and needs no agent maintenance, while agent-based gives you deep runtime telemetry and inline enforcement. If agents are a hard no, accept the runtime tradeoff explicitly rather than pretending posture coverage closes the gap.

In the proof of concept (PoC), press on these points:

  • Cross-layer correlation: Falco's known weakness is that you see container events but not how they connect to application or cloud activity, so a tool that detects a breach but can't show the post-escape attack path leaves you piecing it together by hand.
  • Detection versus response: many open-source tools are detection-only. Falco needs Talon and still has no reliable inline mitigation, and while Tetragon can enforce via bpf_send_signal(), override return is mostly turned off in production.
  • SIEM and SOAR (Security Orchestration, Automation, and Response) integration, validated in the PoC: cloud-native application protection platforms (CNAPPs) forward enriched signals into the SIEM, and with the average SOC already taking 70 minutes to investigate each alert, an unintegrated tool just adds another console.
  • Production false positives: all three open-source eBPF tools hit 100% detection and 0% false positives in a controlled lab, and those numbers do not hold in production. I don't trust false-positive rates from slideware, so demand measured production numbers from the vendor or run the PoC long enough to generate your own.

The vendor market is real and the differences are structural. Wiz is a posture-first CNAPP strong on attack-path prioritization but with passive runtime capabilities, while Sysdig Secure runs deep eBPF-based runtime telemetry. Prisma Cloud mixes agent and agentless and folds into Cortex XDR, and CrowdStrike Falcon Cloud and SentinelOne Singularity extend their endpoint and broader detection positions into the container layer. Verify pricing directly with each vendor instead of assuming it from market chatter.

At 2am, my team needs to know what it will see and what evidence will still exist when the pod is gone. If the answer stops at vulnerabilities, SBOMs, or posture dashboards, the runtime gap is still sitting with SecOps, and that is the gap to close before the next privileged pod lands on a node nobody is watching.

Frequently asked questions about container security

What does a SOC need to cover in container security?

Container security is the practice of protecting containerized applications across four layers: image and supply chain, orchestration, runtime, and network. Each layer has a different owner and threat profile. For a SOC, the load-bearing layer is runtime, where active threats like container escape, privilege escalation, and cryptomining execute.

How is container security different from VM security?

Containers share the host kernel, which means weaker isolation than VMs and a larger inter-object attack surface. They're also ephemeral and immutable, so a container might live 30 seconds and take its forensic evidence with it when it dies. Traditional endpoint tools built for long-lived VMs can't reliably attribute a suspicious process to one of 200 containers on a shared host, which is why syscall-layer monitoring via eBPF has become the standard alternative.

What does container runtime security detect?

Runtime security detects behaviors that only appear when a container is executing: container escape to the host, unexpected process execution, suspicious system calls, cryptomining, credential harvesting, and fileless malware run from memory. Tools like Falco, Tetragon, and Tracee watch kernel events via eBPF to catch these in real time, which static scanning cannot do because some attacks only exploit running processes.

Where should the SOC draw the line in container security?

The SOC owns runtime detection content and container incident response playbooks, including the tuning work to reduce false positives. The investigation chain from a runtime alert back to the specific pod and namespace belongs with the SOC too, as does the link back to the image digest. DevSecOps owns pre-production controls like image scanning and IaC validation, but runtime alerts and IR are SecOps responsibilities.

How should a SOC evaluate container security tools?

Start with the agentless versus agent-based decision, because agentless gives you no runtime visibility. Then evaluate cross-layer correlation, whether the tool offers response or detection only, SIEM and SOAR integration validated in a PoC, and real-world false positive rates rather than lab numbers. Named platforms worth evaluating include Wiz, Sysdig Secure, Prisma Cloud, CrowdStrike Falcon Cloud, and SentinelOne Singularity, each with different runtime depth.



About the author

Daniel C. is a security operations leader with over a decade of experience building and scaling SOC capabilities for cloud-native companies. He has led security teams through multiple stages of growth, from early-stage environments with minimal tooling to mature organizations operating 24/7 security operations with distributed teams. His experience includes designing SOC architectures, evaluating and managing MDR providers, and building internal detection and response capabilities. Daniel has been responsible for vendor selection across SIEM, EDR, and XDR platforms, as well as defining SLAs, response models, and escalation frameworks. He has also worked closely with executive leadership on budgeting, board reporting, and aligning security operations with broader business risk. He writes about the practical decisions security leaders face, including build vs buy tradeoffs, how to evaluate security vendors, and what it actually takes to run an effective security operations function at scale.

Container security: SOC practitioner’s guide | Future of SecOps