Demystifying AI Governance: Access Control [Part 4]
Overview
AI governance involves providing guardrails for safely using AI systems. This can seem like an opaque topic, but once we approach it systematically, we find that it breaks down into concrete subproblems. Many of these map to well-understood problems in cybersecurity, with a few twists here and there; the rest are novel, but can be framed as clear requirements.
In the first three articles of this series, we broke AI governance down into a set of specific problems and covered the wrapper + controller solution framework. In this final article, we will go over access control.
Why access control
Access control is a foundational component of AI governance. If we cannot control which principals (i.e., users and services) can access which AI systems, it becomes very hard to impose visibility and control measures. Conversely, if we can limit access to AI systems to a small set of trusted, well-understood principals, then we can get away with lighter-weight solutions for the other AI governance problems.
Access control can look very different depending on whether the AI system is internal or external to the enterprise. Internal systems, especially those offered by the cloud platform, can benefit from the native IAM (Identity and Access Management) primitives provided by the platform. This does not solve the problem completely - we still need to make sure that access permissions are not overly broad - but it starts things off from a good baseline. External vendors, on the other hand, almost always rely on API keys for access control.
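To make "not overly broad" concrete, here is a minimal sketch, assuming AWS with Amazon Bedrock as the internal AI system; the role name and model ARN are placeholders. It attaches a least-privilege policy that lets a single service role invoke exactly one approved model and nothing else:

```python
import json
import boto3  # assumes AWS as the cloud platform

iam = boto3.client("iam")

# Least-privilege policy: the role may invoke exactly one approved model.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "bedrock:InvokeModel",
        "Resource": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
    }],
}

iam.put_role_policy(
    RoleName="summarizer-service",           # hypothetical calling service
    PolicyName="invoke-approved-model-only",
    PolicyDocument=json.dumps(policy),
)
```

Scoping the `Resource` to a specific model ARN, rather than `*`, is what keeps the baseline tight.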
API keys: The problems
API keys are simple to start using. The external AI system vendor provides one or more API keys, usually from an admin portal, and the onus is then on the enterprise to manage these keys carefully. That turns out to be a hard problem, for several reasons.
First off, API keys - or, more generally, shared secrets - break the access control model. In the cloud setting, access to resources is captured in IAM, in what is commonly called the identity graph. This graph, which can be constructed from all the IAM policies defined in the enterprise's cloud platform account, captures which principals can access which resources. API keys and shared secrets circumvent this: access to the external AI system in question never appears in the identity graph, and any entity that has seen the API key at some point can access the resource.
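To make the identity graph concrete, here is a minimal sketch of how its edges can be derived from policy documents (the policy format mirrors AWS IAM; the data structures are illustrative):

```python
def iam_edges(policies):
    """Yield (principal, action, resource) edges from attached policy docs.

    `policies` maps each principal to a list of IAM-style policy documents.
    """
    for principal, docs in policies.items():
        for doc in docs:
            for stmt in doc["Statement"]:
                if stmt["Effect"] != "Allow":
                    continue
                actions = stmt["Action"]
                actions = [actions] if isinstance(actions, str) else actions
                resources = stmt["Resource"]
                resources = [resources] if isinstance(resources, str) else resources
                for action in actions:
                    for resource in resources:
                        yield (principal, action, resource)

# Note what's missing: a call to an external vendor authorized by a raw
# API key produces no edge here, which is why it escapes governance.
```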
Second, storing and retrieving API keys is tricky. Some mature enterprises leverage secret management mechanisms to tightly control access to secrets, but more often than not, secrets end up in insecure locations such as configs and code, where they eventually get compromised.
Finally, API keys make it hard to audit access, and attribute actions back to specific principals. All we know is that someone in possession of a given API key performed an action.
Abstracting away API keys
We can avoid the above issues with API keys by abstracting them away and capturing access to external AI systems within the identity graph. The basic idea here is to model the API key as a resource that the IAM system understands, and use access to this resource as a proxy for the external AI system, i.e., a principal can access the external AI system if and only if they can access this resource.
A simple way to do this is to store the key as an object in the controller S3 bucket. The key sits alongside all the configs, logs, and other data in the bucket, but access to it is tightly controlled via IAM policies set on the bucket. Now, to control which principals can call the external AI system, we just add them to the policy corresponding to this object. To make this really effective, we should couple it with a tight rotation mechanism that keeps API key lifetimes short. Using a blue-green rotation strategy avoids downtime during rotation.
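Here is a minimal sketch of this scheme, assuming AWS; the bucket and object names are placeholders. Blue-green rotation flips a pointer between two key slots, so readers always resolve a valid key:

```python
import boto3  # assumes AWS; bucket and object names are placeholders

s3 = boto3.client("s3")
BUCKET = "aigov-controller"          # the controller bucket
ACTIVE = "keys/vendor-x/active"      # pointer naming the live key slot

def put_api_key(slot: str, api_key: str) -> None:
    """Store a vendor API key as an object; bucket IAM gates who can read it."""
    s3.put_object(Bucket=BUCKET, Key=f"keys/vendor-x/{slot}",
                  Body=api_key.encode(), ServerSideEncryption="AES256")

def get_api_key() -> str:
    """Principals allowed by the bucket policy resolve the active key."""
    slot = s3.get_object(Bucket=BUCKET, Key=ACTIVE)["Body"].read().decode()
    obj = s3.get_object(Bucket=BUCKET, Key=f"keys/vendor-x/{slot}")
    return obj["Body"].read().decode()

def rotate(new_key: str) -> None:
    """Blue-green rotation: publish the new key, then flip the pointer.

    Assumes the pointer was initialized (e.g., to "blue"). Callers resolving
    it mid-rotation get a valid key either way, so there is no downtime; the
    old key is revoked at the vendor afterwards.
    """
    current = s3.get_object(Bucket=BUCKET, Key=ACTIVE)["Body"].read().decode()
    next_slot = "green" if current == "blue" else "blue"
    put_api_key(next_slot, new_key)
    s3.put_object(Bucket=BUCKET, Key=ACTIVE, Body=next_slot.encode())
```

The bucket policy would then allow `s3:GetObject` on `keys/vendor-x/*` only for the approved principals, which is exactly what pulls the external AI system back into the identity graph.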
Access Control as a chokepoint
The final piece of the puzzle is using access to the API keys as a way to build an effective chokepoint. This is easier to visualize with a centralized gateway. If we ensure that the API keys to an external AI system are restricted to the gateway, then all calls to this system have to go through the gateway, making the gateway an effective chokepoint. With internal AI systems, the logic is similar, except that here the idea is to ensure that the gateway is the only principal that can access the AI system.
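To illustrate, here is a minimal sketch of such a gateway, assuming the vendor exposes an OpenAI-style HTTP endpoint and using FastAPI and httpx (both arbitrary choices); the vendor URL is a placeholder, and get_api_key() is the controller-backed lookup sketched above:

```python
import httpx
from fastapi import FastAPI, Request

from keystore import get_api_key  # hypothetical module wrapping the S3-backed lookup

app = FastAPI()
VENDOR_URL = "https://api.vendor-x.example/v1/chat/completions"  # placeholder

@app.post("/v1/chat/completions")
async def proxy(request: Request):
    payload = await request.json()
    # The gateway is the only principal that can read the key, so every call
    # is forced through here: the natural place for logging, policy checks,
    # and redaction before anything reaches the vendor.
    headers = {"Authorization": f"Bearer {get_api_key()}"}
    async with httpx.AsyncClient() as client:
        resp = await client.post(VENDOR_URL, json=payload, headers=headers)
    return resp.json()
```

Because no other principal can read the key, clients cannot bypass the gateway even if they know the vendor's URL.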
We can extend this idea to distributed wrappers. With external AI systems, the wrapper obtains the API key from the controller (assuming the calling service has the privilege to do so), and the key should not be made available outside the wrapper. With internal AI systems, one simple approach is a layer of indirection: the AI system is accessible only by a specific role, and this role is assumed only within the wrapper. Note that because the wrapper runs in the same security context as the calling service, the wrapper fundamentally cannot have any privileges beyond those of the calling service; in other words, there is no way to force a secret to stay within the wrapper. However, this limitation is not a real problem in this context, where the goal is to build an effective chokepoint by controlling access to the API key. We are not trying to protect against a malicious developer who would steal and leak the key; if that is the threat, we should employ other mechanisms to guard against it.
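Here is a minimal sketch of the role-indirection idea for internal systems, assuming AWS with Amazon Bedrock as the internal AI system; the role ARN and model id are placeholders:

```python
import boto3  # assumes AWS; role ARN and model id are placeholders

def invoke_internal_model(payload: bytes) -> bytes:
    """Call the internal AI system through a dedicated role.

    Only this role may invoke the model, and the wrapper is where it is
    assumed: a layer of indirection rather than a hard boundary, since the
    wrapper shares the calling service's security context.
    """
    sts = boto3.client("sts")
    creds = sts.assume_role(
        RoleArn="arn:aws:iam::123456789012:role/aigov-model-invoker",
        RoleSessionName="wrapper",
    )["Credentials"]
    bedrock = boto3.client(
        "bedrock-runtime",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    resp = bedrock.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        body=payload,
    )
    return resp["body"].read()
```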
Recap and Next Steps
In this article, we covered why access control is a core part of AI governance, how API keys can be problematic, and how we can turn this around, leveraging API keys - and access control more broadly - to stand up chokepoints from which we can implement various AI governance mechanisms.
In this series of articles on AI governance, we have
- broken AI governance down into distinct subproblems that map to well-understood problems in cybersecurity,
- seen how we can target these problems at a logical chokepoint that intercepts all access to AI systems, and
- built a simple but effective chokepoint with the wrapper + controller design, leveraging access control.
All of this is available in code in the aigov repository. The code is simple and extensible, so clone it and play around as you like!