How Shift-Left Extremism is Harming your API Security Strategy
Shift-left security philosophy promotes the notion that organizations should push more security processes earlier, into the design and development phases of the software development lifecycle. This ideal is promoted heavily in DevOps and DevSecOps programs as a way to detect quality or security issues early and remediate them before they make it into production, where impacts become more costly. Understandably, many organizations have gone all-in on shift-left to try to realize these benefits and cost savings.
A specific focus within shift-left is “securing the build pipeline,” which requires that teams get security tooling plugged into CI/CD build pipelines and git-based developer workflows. Securing build pipelines requires a range of security tooling including dependency analysis, static analysis, dynamic analysis, schema validators, fuzzers, and vulnerability scanners. The type of security tooling that is needed varies based on what artifacts are moving through the pipeline, what must be built, and where it must be delivered.
Organizations often struggle with two aspects of build pipeline security:
- “Full” coverage requires sourcing multiple types of security testing tooling - Some organizations try to reduce cost by opting partially or entirely for free open-source tooling. Decisions vary based on awareness, security budgets, and risk tolerance.
- Tooling must be integrated and automated to serve the pipeline - It’s not enough to scan, find issues, and spit out a report. The end result of security scanning must be digestible and fit within CI/CD processes.
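As a minimal illustration of that second point, the sketch below condenses a scanner’s raw output into a short summary a CI job can surface directly. The findings.json format shown is hypothetical; real tools emit their own formats (SARIF, tool-specific JSON, and so on), and adapting to them is where most of the integration work lives.

```python
import json
import sys
from collections import Counter

# Hypothetical report format: a JSON array of findings, each with "severity" and "title".
# Real scanners emit their own formats, so this parsing step is where the real
# integration effort goes.

def summarize(report_path: str) -> None:
    with open(report_path) as fh:
        findings = json.load(fh)

    counts = Counter(f.get("severity", "info").lower() for f in findings)

    # A short, human-readable digest that a CI job can print to its log or post to a
    # merge request, rather than attaching a raw multi-hundred-page report.
    print(f"{len(findings)} findings: " + ", ".join(f"{sev}={n}" for sev, n in counts.most_common()))
    for finding in findings:
        if finding.get("severity", "").lower() in ("critical", "high"):
            print(f"  [{finding['severity']}] {finding.get('title', 'untitled finding')}")

if __name__ == "__main__":
    summarize(sys.argv[1] if len(sys.argv) > 1 else "findings.json")
```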
Even if you succeed at satisfying these requirements, a number of other problems emerge for organizations with the myopic view that securing the build pipeline is the end goal of an API security strategy.
Vulnerability scanning is commoditized and low value
Vulnerability assessment and vulnerability management (VA/VM) tooling rules the roost in many security organizations. This type of network and infrastructure scanning is extremely common, as it’s been promoted as IT and security best practice for decades. Some VA/VM vendors claim they cover applications as well, but it’s typically to identify vulnerabilities in commercial or open-source software packages. These tools can also be helpful for identifying some types of misconfigurations of servers or workloads. However, they do little to assess the security of your custom applications and APIs. It is commonplace to see VA/VM as the standard approach in security programs rather than a formalized application security approach with purpose-built application security testing (AST) tooling.
If VA/VM is the tooling that enables you to scan production infrastructure for known vulnerabilities, then SCA is the tooling that enables you to scan componentry to identify vulnerable external dependencies in application source code or infrastructure-as-code. SCA scans can be triggered as part of code commits, run as part of builds, initiated as part of delivery (e.g., instantiating a containerized workload), or run continuously in production to detect drift. SCA has also become commoditized: it can be found as an integrated capability in AST suites, included in git-based offerings such as GitHub and GitLab, and baked into the myriad workload protection offerings. There are also plentiful open-source alternatives for SCA. Even if SCA functions only as a basic dependency checker, that may provide “good enough” security for many organizations.
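For a Python stack, a build-time SCA step can be as simple as the sketch below, which wraps pip-audit (one open-source option) and lets its exit code decide whether the job fails. The requirements.txt path and fail-the-build behavior are assumptions you would adapt to your own stack and SCA tool of choice.

```python
import subprocess
import sys

# Minimal SCA step for a Python build: run pip-audit against the project's pinned
# dependencies. The "-r requirements.txt" usage reflects pip-audit's documented
# interface at the time of writing; pip-audit returns a nonzero exit code when it
# finds vulnerable packages.

def run_sca(requirements_file: str = "requirements.txt") -> int:
    result = subprocess.run(
        ["pip-audit", "-r", requirements_file],
        capture_output=True,
        text=True,
    )
    print(result.stdout)
    if result.returncode != 0:
        print("Vulnerable dependencies detected; failing the build step.", file=sys.stderr)
    return result.returncode

if __name__ == "__main__":
    sys.exit(run_sca())
```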
Scanning container images or container registries for published, well-known vulnerabilities is trivial in comparison to runtime analysis and anomaly detection. Even workload protection offerings must provide continuous scanning in runtime to be beneficial. Scanning for vulnerable dependencies isn’t just a point-in-time or pipeline activity. Environment drift is common in “day 2 operations,” and it becomes more likely the longer an application or its supporting infrastructure lives. If you go down the rabbit hole of nested or transitive dependencies, verifying whether vulnerable code is truly exploitable takes you into a complex world of extended static analysis, dynamic analysis, fuzzing, runtime analysis, and behavior analysis.
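To make the transitive dependency problem concrete, here is a rough sketch that walks the declared dependencies of an installed Python package using the standard library. Knowing that a vulnerable package appears somewhere in this tree still tells you nothing about whether the vulnerable code path is actually reachable at runtime.

```python
import re
from importlib import metadata

# Walk the declared (transitive) dependencies of an installed package and print them
# as an indented tree. Optional extras are skipped for brevity.

def dependency_tree(package: str, seen=None, depth=0):
    seen = set() if seen is None else seen
    if package.lower() in seen:
        return
    seen.add(package.lower())
    print("  " * depth + package)
    try:
        requires = metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        return  # declared but not installed in this environment
    for req in requires:
        if ";" in req and "extra ==" in req:
            continue
        name = re.match(r"[A-Za-z0-9._-]+", req)
        if name:
            dependency_tree(name.group(0), seen, depth + 1)

if __name__ == "__main__":
    dependency_tree("requests")  # any installed package name works here
```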
There’s more to security issues than CVEs
Common vulnerabilities and exposures (CVE) IDs cover known vulnerabilities in published software packages and hardware. These types of security issues are more easily identified by scanning tools. Common techniques include scanning network IP address ranges for listening services, querying server banner information, fingerprinting running software on hosts, and evaluating headers in server responses.
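The sketch below shows how little is needed for the header-based checks described above; the target URL is a placeholder, and real scanners combine many such signals to fingerprint what is running.

```python
import requests

# Print common response headers that leak information about the server stack.
# Scanners infer a lot about a host from this metadata alone.

def fingerprint(url: str = "https://example.com") -> None:
    response = requests.get(url, timeout=5)
    for header in ("Server", "X-Powered-By", "Via", "X-AspNet-Version"):
        if header in response.headers:
            print(f"{header}: {response.headers[header]}")

if __name__ == "__main__":
    fingerprint()
```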
Common weakness enumeration (CWE) IDs describe broader categories of weaknesses in software, which makes CWE the more appropriate classification system for issues in homegrown applications and APIs. CWEs are lesser known but also more loosely defined, meaning that a given issue in your custom application may map to multiple CWE IDs. Unfortunately, this complicates risk scoring and remediation tracking, but it is the reality of custom development. Even organizations that buy more software than they build still end up creating custom code to integrate systems and exchange data. CWE is the more relevant taxonomy for the types of security issues you may be inadvertently creating as you build or integrate code. If you’re only looking at CVEs, you’re missing the bigger picture.
Known vulnerable dependencies that you reference in code or infrastructure-as-code become part of your running workloads. As a result, most organizations end up with a mashup of CWEs, CVEs, and more. Though these are well-defined taxonomies from MITRE and NIST, they don’t cover everything. There is also inherent latency in reporting and vulnerability disclosure. There isn’t necessarily a “fix” for CVE reporting latency either. It is a byproduct of the self-reporting process, mean-time-to-detect problems, coordinated vulnerability disclosure agreements, and the logistics of coordinating responsible parties.
The dark secrets of AST
There’s an elephant in the room when it comes to the limitations of AST tooling:
Static analysis and dynamic analysis tools can’t detect business logic flaws.
Detecting business logic flaws is simply beyond what these breeds of tools are designed to do. Business logic is unique to your organization and how you design and code APIs. As a result, the code that represents your business logic rarely follows well-defined patterns for which signatures or rules can be built. This limitation is similar to the pitfalls of threat protection mechanisms in API gateways and WAFs.
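A contrived Flask sketch helps illustrate why. The endpoint below is syntactically clean, uses safe lookups, and returns valid JSON, so static and dynamic scanners have nothing to flag, yet any caller can read any other customer’s invoice just by changing the ID in the path (a classic broken object level authorization flaw). The route, data model, and missing ownership check are all illustrative.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Toy data store; in a real API this would be a database query.
INVOICES = {
    1001: {"owner": "alice", "amount": 420.00},
    1002: {"owner": "bob", "amount": 99.50},
}

@app.route("/api/invoices/<int:invoice_id>")
def get_invoice(invoice_id):
    invoice = INVOICES.get(invoice_id)
    if invoice is None:
        return jsonify({"error": "not found"}), 404
    # Business logic flaw: there is no check that the authenticated caller actually
    # owns this invoice. Nothing here matches a scanner signature or taint rule.
    return jsonify(invoice)

if __name__ == "__main__":
    app.run()
```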
AST tools that can instrument and analyze applications or APIs at the code level as they run (e.g., IAST) may be able to uncover subsets of privilege escalation weaknesses. Instrumentation-driven security testing may provide some limited coverage for broken object level authorization and broken authentication weaknesses. However, most vendor offerings don’t go deep in testing authentication or authorization beyond cursory checks, such as flagging weak forms of authentication like basic or digest access. Or the AST tool might analyze how credentials are input, passed, or stored, which again is only one small piece of the puzzle. Technically, testing for privilege escalation weaknesses requires multiple runs of a DAST tool against a given app and its API. Unfortunately, time is already at a premium in many organizations embracing DevOps and pushing tighter release windows. DAST tools are notorious for running for extended periods of time, worse than SAST and slightly better than fuzzing.
DAST tools are also not built for web API testing. They typically require a front-end to initiate application requests, such as JavaScript executing in a web browser engine or a mobile binary running on a mobile device. DAST tools detect issues by intercepting, analyzing, and manipulating traffic. Some of that traffic may be APIs, but DAST tools don’t have the context of how those APIs function. Scanning REST APIs effectively requires a good deal of care and feeding with DAST, similar to training a WAF. Scanner configuration may be informed partially by API documentation or defined manually, but it is necessary so that a DAST tool can analyze an API endpoint somewhat intelligently. Even then, a DAST tool may try to detect vulnerabilities or weaknesses that aren’t relevant in the world of APIs, such as hidden administrator control panels or URL directory brute forcing issues. It’s on the practitioner to tune the DAST scan configuration further and suppress such rules. However, such tuning requires expert knowledge of the system and all its APIs.
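The sketch below shows the kind of API context a scanner needs to be fed: a hypothetical OpenAPI document is parsed to enumerate the paths and methods worth exercising before any payloads are sent. The spec file name and base URL are placeholders, and real scan configuration goes well beyond endpoint enumeration.

```python
import json
import requests  # imported to emphasize these endpoints would then be probed over HTTP

# Enumerate endpoints from a JSON OpenAPI document so a scan has some notion of the
# API surface rather than relying on spidering a front-end.

def enumerate_endpoints(spec_path: str = "openapi.json", base_url: str = "https://api.example.com"):
    with open(spec_path) as fh:
        spec = json.load(fh)

    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            if method.lower() not in ("get", "post", "put", "patch", "delete"):
                continue  # skip non-operation keys such as "parameters"
            url = base_url + path  # path templates like /users/{id} still need real values
            print(f"would probe: {method.upper()} {url}")

if __name__ == "__main__":
    enumerate_endpoints()
```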
Simply stated, organizations must look to runtime analysis and baselining of API behaviors in order to identify the broad spectrum of API attacks and business logic abuse. Static analysis and dynamic analysis have always had their shortcomings. The problem is worsened in the world of APIs.
The shortcomings of schema validators
A form of static analysis, API schema validators are often pitched as the DevOps-friendly solution for build pipeline security. The pitch often goes like this: “give us your schema definitions, and we can scan your APIs, make sure they’re conformant, and check for vulnerabilities.”
A number of issues exist with the schema validation approach:
- Not everything needs to be defined in API schema: API specification formats like OpenAPI specification (OAS) and Swagger don’t require that you define all fields or functions in the API documentation. It is common for developers to forget to document something fully, particularly if they aren’t working within API design tools like Postman.
- Many organizations are lackluster at documenting: Humans are notoriously bad at documentation, especially at documenting everything fully. Lack of documentation is not a problem specific to developers. OAS can help in that it is self-documenting, but manual effort is still required. Some tooling is also better than others at generating OAS definitions.
- API drift: Deviations between the original specification and what is running in production are common. API drift parallels one of the biggest problems organizations run into with secure design review and threat modeling processes. Sometimes the thing you intend to build ends up looking much different than the real-world product.
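A naive drift check can be sketched in a few lines, assuming a JSON OpenAPI document and a reachable endpoint: compare the fields a live response actually returns against the properties the schema documents. Extra fields suggest undocumented behavior or drift. The URLs, path, and schema location below are all placeholders, and production-grade checks need far more nuance (nested objects, arrays, content types, and so on).

```python
import json
import requests

# Compare the keys in a live JSON response against the documented properties of a
# schema component from the OpenAPI document.

def check_drift(spec_path: str, base_url: str, api_path: str, schema_name: str) -> None:
    with open(spec_path) as fh:
        spec = json.load(fh)

    documented = set(
        spec["components"]["schemas"][schema_name].get("properties", {}).keys()
    )
    live = requests.get(base_url + api_path, timeout=5).json()
    actual = set(live.keys()) if isinstance(live, dict) else set()

    undocumented = actual - documented
    if undocumented:
        print(f"{api_path} returns fields not in the schema: {sorted(undocumented)}")

if __name__ == "__main__":
    check_drift("openapi.json", "https://api.example.com", "/users/123", "User")
```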
Schema validation and enforcement is the old paradigm of positive security in new clothes. Instead of security teams having to create rules or signatures, the burden is shifted to development teams. The mashup of CWEs and CVEs will inevitably rear its head again. Just like their AST counterparts, schema validators still can’t detect business logic flaws, and they may also miss sensitive data exposures.
The illusion of production mirrors
Non-production environments that mirror production are required for running certain types of AST effectively, including DAST and IAST. Unfortunately, such testing environments are a luxury for many organizations. There are likely discrepancies between production and non-production if all elements aren’t fully containerized and easily deployable. Container platforms and service meshes can make it easier to spin up environments as needed, but these technologies carry steep learning curves. Organizations that aren’t as far along in their DevOps maturity struggle to deploy and operationalize them. Cost and license constraints also stand in the way of a fully functional, non-production environment that mirrors production. Cloud may seem like an answer since it is easy to spin up ephemeral instances in cloud service providers, but cloud compute expenses add up quickly.
Absent working non-production environments, some organizations opt to run DAST or IAST tools in production. Though vendors will claim that a given tool is reasonably production safe, many organizations sacrifice scan depth or accuracy by disabling certain types of tests and limiting recursive scans that could otherwise result in a service outage. These types of concessions cut further into the limited efficacy of AST tooling when it comes to APIs.
Successful security test automation can’t exist without non-security test automation
Automating your security testing in development toolchains, version control systems, and build pipelines requires discipline in many non-security practices. Good code coverage and test coverage can’t be accomplished without adequate test data management, production-like test environments, defined unit tests, functional test automation scripts (e.g., Appium, Selenium), and more.
The data, tooling, and processes that are necessary to succeed at test automation are a shared responsibility among identity, I&O, QA, and development teams. For some organizations, QA may not even exist; in many cases, this role has been significantly downsized or eliminated entirely. And presuming all of these test automation fundamentals are in place, it is even rarer for security teams to be collaborating with application teams to make use of it all.
Security and non-security collaboration is nonexistent in some organizations, and DevOps maturity may be lower. Siloed IT teams and isolationism may persist unless the organization has embraced a broader digital transformation initiative. These multi-year approaches are about more than just seeking pipeline tooling to solve a problem. Digital transformation involves radically revamping how the organization runs, what kind of culture it encourages, how it recruits talent, and more. Agile methodologies and DevOps practices are typically intertwined with transformation, but there are many pillars that take years to become proficient at. Many security teams continue to operate independently from development teams so they can maintain focus on other security risks that impact their organizations. Friction between development, operations, and security teams is also still more common than we’d like to think.
Wrestling with assembly line thinking
In some circles, these pipeline approaches and DevOps programs are referred to as software factories. The software factory concept promotes assembly line thinking, but we as practitioners of any IT role don’t work in factories, and we’re not machines or factory workers producing things. Many would also argue that factory workers aren’t always happy workers, and if a worker is unhappy, that directly impacts quality (and security). Any form of code should be viewed as less than perfect since humans create it; we haven’t yet reached the tipping point where machines generate the code themselves. Code also evolves, with many hands touching a codebase over its lifetime, and it may pass through many sets of scanners yet still be low quality, exploitable, or abusable. Numerous DevOps role model organizations with secure build pipelines have still suffered API attacks resulting in data loss, privacy impacts, safety issues, brand damage, and more.
Toyota is often referenced as an agile success story, pioneering lean manufacturing and agile thinking. Toyota also empowered assembly line workers to halt the assembly line by pulling an andon cord if they observed a problem. Yet Toyota vehicles were plagued with unintended acceleration issues for years in the 2000s. Experts later reviewed Toyota source code for the numerous embedded systems and testified that unintended acceleration issues were likely caused by spaghetti code, not obtrusive floor mats or stuck accelerator pedals as originally claimed. Even if you dismiss this event as the result of corruption or business goals trumping public safety, those are still factors that led to established safety protocols and development standards being bypassed.
Issues found during security testing must get logged as bugs or defects, inevitably ending up in a backlog, or you must halt the pipeline and fail the build. Failing builds excessively holds up release trains. Other questions around production release also surface. Do we know if issues are truly exploitable or abusable? What severity and priority should be assigned to a detected issue? How far into the defect backlog should the detected issue go? Do you halt a production release for potentially low severity issues which may or may not be exploitable?
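Whatever the answers, they end up encoded as a gating policy somewhere. The hypothetical sketch below fails the build only for findings that are both high severity and believed exploitable, routing everything else to the backlog; the findings structure and thresholds are assumptions, and in practice exploitability is often unknown at scan time.

```python
# Toy triage policy: block the release only for severe, exploitable findings and send
# the rest to the backlog for later prioritization.

def triage(findings: list[dict]) -> bool:
    """Return True if the build should fail."""
    fail_build = False
    for finding in findings:
        severe = finding.get("severity") in ("critical", "high")
        exploitable = finding.get("exploitable", False)  # often unknown at scan time
        if severe and exploitable:
            print(f"BLOCK: {finding['title']}")
            fail_build = True
        else:
            print(f"BACKLOG: {finding['title']}")
    return fail_build

if __name__ == "__main__":
    sample = [
        {"title": "SQL injection in /search", "severity": "high", "exploitable": True},
        {"title": "Verbose error message", "severity": "low", "exploitable": False},
    ]
    raise SystemExit(1 if triage(sample) else 0)
```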
There’s likely more than one pipeline
Most organizations have a mixture of technology stacks that involve different programming languages and development processes. There is likely a blend of monolithic and microservices architecture, legacy and modern technology stacks, and waterfall and agile methodologies. This blending inevitably leads to multiple build pipelines for organizations, as well as reduced visibility or control of those pipelines. Some contributing factors include:
- Some code and the pipeline(s) it moves through may be out of your control.
- A formalized build pipeline may be nonexistent if the owning party isn’t embracing CI/CD.
- Many organizations outsource or offshore development fully or in part.
- Development efforts are often splintered, with separate front-end and back-end teams, or teams devoted to certain subsets of functionality.
- Low-code platforms may not provide formalized workflow and build services, or code artifacts may be an unknown quantity.
The realities of enterprise application development, integration, and systems engineering inevitably result in multiple build pipelines. The more pipelines that exist, the harder it becomes for security to gain visibility, get tooling plugged in, or exercise control. Unifying all builds into one build pipeline is unrealistic for most organizations since there are often legitimate business needs for different technology stacks.
Think about pipelines in a different light
Best practices and tooling for securing the build pipeline itself are also nascent. Organizations sometimes cut corners in the security of CI/CD services, or tooling isn’t properly designed to ensure the integrity of everything that moves through a pipeline. The various VCS, CI, and CD services are also applications themselves, often built on APIs. They too can be attacked in various ways, which puts at risk everything that goes through your pipeline. Teams that operate within the pipeline may suppress checks or results in order to release code. Inherent supply chain risks worsen the more you source code or dependencies from third parties. Partners or suppliers may also provide custom code that isn’t part of your pipeline, yet makes its way into the organization’s compiled code or complete system. All of these factors potentially impact the quality and security of everything that moves through your pipelines.
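One concrete form of integrity checking is verifying that artifacts pulled into a build match a digest pinned from a trusted source, as in the sketch below; the file name and expected hash are placeholders, and many pipelines are moving toward cryptographic signatures rather than bare checksums.

```python
import hashlib

# Verify a downloaded artifact against a pinned SHA-256 digest before the build uses it.

def verify_artifact(path: str, expected_sha256: str) -> bool:
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256

if __name__ == "__main__":
    # Placeholder artifact name and digest; a real pipeline would pin these per release.
    ok = verify_artifact("vendor-library-1.2.3.tar.gz",
                         "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855")
    print("artifact verified" if ok else "checksum mismatch: do not use this artifact")
```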
It’s important to remember that pipeline scanning includes checks not just for all forms of code, but also for controls that can be audited programmatically. DevOps practices expand beyond application source code to also include infrastructure-as-code (IaC) and policy-as-code (PaC). Just as functionality is enabled by more than just the application code, the same holds true for security, and many security controls exist external to code. Items you validate in pipelines may simply involve confirming that a given runtime security control is enabled and configured appropriately in IaC or PaC.
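Such a check can be as small as the sketch below, which reads an infrastructure-as-code document (JSON here for simplicity) and confirms that expected runtime security settings are enabled; the file name, keys, and required values are all placeholders for whatever controls your IaC or PaC actually defines.

```python
import json
import sys

# Placeholder control expectations; map these to the settings your IaC/PaC defines.
REQUIRED_SETTINGS = {
    "enable_tls": True,
    "enable_audit_logging": True,
    "public_access": False,
}

def audit_iac(path: str = "service-config.json") -> int:
    with open(path) as fh:
        config = json.load(fh)

    failures = [
        key for key, expected in REQUIRED_SETTINGS.items()
        if config.get(key) != expected
    ]
    for key in failures:
        print(f"control check failed: {key} should be {REQUIRED_SETTINGS[key]}")
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(audit_iac())
```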
Conclusion
Establishing and gaining adoption of secure build pipeline approaches is a multi-year endeavor for organizations. And no secure pipeline effort can succeed without attaining a certain level of DevOps maturity. They say DevOps is a journey for a reason. It takes years to mature your processes and toolchains. More importantly, it takes time to gain adoption from all the IT personas within and external to the organization. Attackers don’t care about your DevOps journey though, and in the meantime are circumventing access controls, exploiting weaknesses, and abusing business logic.
Shift-left and secure build pipeline approaches have their merits. However, organizations must accept the risk that there are many types of security issues that simply can’t be caught as part of automated design, development, and build-time scans. Organizations that are mature in their API security strategies and DevSecOps programs concede that at best they are catching only a portion of security issues with a given scanning tool. The findable issues are limited to vulnerabilities and weaknesses that follow well-defined patterns. Many security problems only manifest themselves in runtime, as part of the complete system, and within enterprise architecture.
Runtime behavior analysis is a path forward for organizations and a way to protect against the wide spectrum of API issues and attacks. Any offering should provide early detection and prevention, stopping attackers before they succeed at exploiting or abusing your APIs. The Salt Security API Protection Platform was built to avoid the many pitfalls of traditional scanning approaches and runtime mitigations. The Salt API Protection Platform is:
- Full lifecycle, providing capabilities for design-time, build-time, and runtime phases to harden and protect your APIs against the broad spectrum of API attack patterns.
- Integrated, working with your existing technology investments and minimizing impact to the workflows of all the personas that touch APIs.
- Informed by ML, bringing expert intelligence across the multiple security domains critical for API security, expertise that is otherwise difficult to staff for.
- Automated, learning your unique business logic and API design patterns in order to offer tailored security insights automatically.
To learn more about how Salt can help defend your organization from API risks, you can connect with a rep or schedule a personalized demo.