AKASA AI-powered automation is only possible because of the people behind it. Our team of talented engineers works within AWS infrastructure, which offers powerful cloud computing and security. But because we're in the healthcare space, security is an especially high priority for us. To stay efficient while also staying safe, we needed a unique approach to user management. Enter IAM management as code.
At AKASA, we believe every dollar spent on healthcare matters because healthcare matters to everyone. Our mission is to remedy the financial complexity in United States healthcare and curb wasteful spending. And we aim to do this using automation.
Inspired by the machine learning approaches that made self-driving cars possible, our team created the AKASA platform for the healthcare revenue cycle. The AKASA platform brings together the best of people, data, and technology to address financial complexity in our healthcare system.
AKASA uses this unique approach and a set of proprietary technologies to provide health systems with a solution to efficiently, accurately, and autonomously navigate the complex state of medical reimbursement in the United States.
Working side-by-side with healthcare providers is key to understanding the unique challenges of healthcare revenue cycle processes. We take very seriously the responsibility of ensuring the privacy and security of information related to so many patients.
At AKASA, we impose strong security requirements on ourselves. For example, each client's data is stored in a separate database, and every database is encrypted. All activities that involve protected health information (PHI) are logged. And data is stored in the cloud, meaning the laptops our developers use never store PHI locally and are used only as terminals.
When it comes to security and authentication, each of our engineers has a unique ID, and every action taken under that ID is logged, so we can track who did what and audit these activities. At AKASA, we adopt the principle of least privilege, so you don't have to worry about privilege escalation in our system. These are policies we take very seriously in order to mitigate security challenges. But we believe this alone is not enough, and we strive to go above and beyond in our security practices.
As we strive for cutting-edge security methods, we wanted to implement strong and sophisticated authorization and authentication tools, and we believe Amazon Web Services (AWS) IAM is the best choice. However, using this tool effectively requires more than a simple integration. We faced a variety of challenges during the implementation, and this article explains these challenges, our thought process, and the solution, with source code provided on GitHub.
At AKASA, we have many developers who use resources in our AWS account, and we need to track each action they take. CloudTrail helps us do just that by recording every AWS API call (in simple words, each "action") along with a principalId. IAM users and IAM roles are both principal types, so both appear as the principalId in logged API calls.
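For illustration, a CloudTrail event recorded from a devserver looks roughly like this (heavily trimmed; the account ID, role name, and session name are hypothetical):

{
  "eventSource": "s3.amazonaws.com",
  "eventName": "GetObject",
  "userIdentity": {
    "type": "AssumedRole",
    "principalId": "AROAEXAMPLEROLEID:i-0abc123devserver",
    "arn": "arn:aws:sts::111122223333:assumed-role/PersonalRoleJaneDoe/i-0abc123devserver"
  }
}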
We want our developers to do their magic on devservers (EC2 instances in the cloud) rather than on their own computers; laptops are used only as terminals to connect to the remote devserver. If we created an IAM user for each engineer, nothing would prevent the engineer from using the IAM user's access_key_id on their laptop, which, as discussed above, is against our security standards. IAM roles do not have this problem: they carry no long-term credentials, and via instance profiles they can be attached to EC2 instances but not to laptops. Therefore, we decided to use IAM roles instead of IAM users.
Please note that an assumed IAM role session expires after a configurable amount of time and needs to be refreshed. AWS already implements this auto-refresh for EC2 instances through instance profiles. We just have to wrap the IAM role in an instance profile and attach it to the EC2 instance, and the instance will auto-refresh the role's credentials.
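As a minimal sketch of that wiring (using boto3; the role, profile, and instance names are hypothetical, and in practice we do this through CloudFormation, as shown later):

import boto3

iam = boto3.client("iam")
ec2 = boto3.client("ec2")

ROLE_NAME = "PersonalRoleJaneDoe"                # hypothetical
PROFILE_NAME = "PersonalInstanceProfileJaneDoe"  # hypothetical
INSTANCE_ID = "i-0123456789abcdef0"              # hypothetical

# Wrap the IAM role in an instance profile.
iam.create_instance_profile(InstanceProfileName=PROFILE_NAME)
iam.add_role_to_instance_profile(
    InstanceProfileName=PROFILE_NAME,
    RoleName=ROLE_NAME,
)

# Attach the profile to the devserver; the instance metadata service
# then serves auto-refreshing temporary credentials for the role.
ec2.associate_iam_instance_profile(
    IamInstanceProfile={"Name": PROFILE_NAME},
    InstanceId=INSTANCE_ID,
)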
AWS’s design philosophy is very simple but can pose complications and challenges. IAM roles are like hats that can be worn (“assumed”) by different AWS principals. But we want to have “personal” roles, meaning each role is only taken by a single engineer (more specifically, the engineer’s devserver). Since the engineers need different permissions, managing these IAM roles manually becomes more and more complicated and error-prone. Let’s take a look at a scenario.
So, the question now is: why are duplicated permissions in different IAM policies bad? The answer is simple: they become increasingly difficult to maintain.
Whenever a permission changes, we have to make the same change in every IAM policy that contains it. The quick answer is to define a set of atomic permissions and compose them using a tree data structure with multi-level inheritance. Details are shown in the following section.
Another issue is the lack of version control in AWS IAM. We want to track the history of permission changes, and we want those changes to be peer-reviewed. The solution is to manage these atomic permissions and the tree structure as code in a GitHub repo. That's why this article is named "IAM Management As Code."
The architecture of IAM as code is shown above. It is a directed acyclic graph (DAG). The roots of the DAG are a set of Policies, each of which contains an atomic permission. PolicyGroups inherit from Policies and from other PolicyGroups.
For example, Base PolicyGroup contains three Policies that are needed for all the engineers at AKASA, such as a permission to access an S3 CDN bucket, and a permission to read a secret in SecretsManager. TeamADev PolicyGroup contains Base PolicyGroup and one additional Policy (which allows deploying to the dev environment). TeamAProd PolicyGroup contains TeamADev PolicyGroup and one more Policy (this one allows deploying to the prod environment).
TeamBDev and TeamBProd are analogous to the team A PolicyGroups. The bottom-level PolicyGroups have a one-to-one mapping with individual engineers. As you can see, this design is very flexible: Eng B works cross-functionally and thus has the TeamBDev PolicyGroup in addition to TeamAProd. We use a Python script to render CloudFormation manifests from the bottom-level PolicyGroups.
Each CloudFormation manifest contains a few IAM policies, an IAM role (associated with those policies), and an IAM instance profile. Each manifest is deployed into AWS as a CloudFormation stack. The IAM instance profile is assigned to an individual developer's devserver; no developer can use another person's IAM instance profile.
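Deploying a rendered manifest as a stack can be sketched with boto3 (the stack name and file path are hypothetical):

import boto3

cf = boto3.client("cloudformation")

with open("manifests/jane-doe.yaml") as f:
    template_body = f.read()

# Stacks that create named IAM resources require an explicit
# capability acknowledgement.
cf.create_stack(
    StackName="personal-iam-jane-doe",
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_NAMED_IAM"],
)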
Each instance of Policy contains an atomic permission. Note that "atomic" here is defined by AKASA's engineering needs, and our AWS resource naming had to be designed to accommodate this.
For example, we separate clients' data using S3 prefixes, like s3://ops-data/{client}. The authorization needed to read from s3://ops-data/tcm/* is then an atomic permission, to be used by an IAM user for client tcm. The authorization needed to read from s3://ops-data/* is also an atomic permission, for use by developers. (AKASA developers work across all clients, instead of being assigned to specific clients.)
from typing import List

INDENT = ' '
DEFAULT_INDENT = 5


class Condition():
    def __init__(
        self,
        name: str,
        operator: str,
        condition_key: str,
        values: List[str],
    ):
        """Initialize the Condition object"""
        self.name = name
        self.operator = operator
        self.condition_key = condition_key
        self.values = values

    def to_yaml(self, **kargs) -> str:
        """Convert the condition to a yaml string"""
        base_indent = (DEFAULT_INDENT + 2) * INDENT
        name_str = f"\n{base_indent}# {self.name}"
        operator_str = f"\n{base_indent}{self.operator}:"
        condition_str = f"\n{base_indent}{INDENT}{self.condition_key}:"
        values_str = ''
        for value in self.values:
            values_str += f"\n{base_indent}{2 * INDENT}- \"{value}\""
        yaml = name_str + operator_str + condition_str + values_str
        for k, v in kargs.items():
            yaml = self._template_replace(yaml, k, v)
        return yaml

    def __lt__(self, other):
        return self.name < other.name

    def _template_replace(self, template, key, value):
        return template.replace("{{{}}}".format(str(key)), value)


class Policy():
    def __init__(
        self,
        name: str,
        effect: str,
        actions: List[str],
        resources: List[str],
        conditions: List[Condition] = None,
    ):
        """Initialize the Policy object"""
        self.name = name
        self.effect = effect
        if isinstance(actions, str):
            raise ValueError("Actions must be a list, got %r." % actions)
        self.actions = actions
        self.resources = resources
        self.conditions = conditions

    def to_yaml(self, **kargs) -> str:
        """Convert the policy to a yaml string"""
        base_indent = DEFAULT_INDENT * INDENT
        name_str = f"\n{base_indent}# {self.name}"
        effect_str = f"\n{base_indent}- Effect: \"{self.effect}\""
        action_str = ''
        for action in self.actions:
            action_str += f"\n{base_indent}{2 * INDENT}- \"{action}\""
        action_str = f"\n{base_indent}{INDENT}Action:{action_str}"
        resource_str = ''
        for resource in self.resources:
            resource_str += f"\n{base_indent}{2 * INDENT}- \"{resource}\""
        resource_str = f"\n{base_indent}{INDENT}Resource:{resource_str}"
        condition_str = ''
        if self.conditions is not None:
            condition_str += f"\n{base_indent}{INDENT}Condition:"
            for condition in self.conditions:
                condition_str += condition.to_yaml()
        yaml = name_str + effect_str + action_str + resource_str + condition_str
        for k, v in kargs.items():
            yaml = self._template_replace(yaml, k, v)
        return yaml

    def __lt__(self, other):
        return self.name < other.name

    def _template_replace(self, template, key, value):
        return template.replace("{{{}}}".format(str(key)), value)
An atomic permission needs a name, an effect, a list of actions, a list of resources, and an optional list of conditions. See below for an example Policy named S3AllowListOnBucketMldata that allows s3:ListBucket on the bucket ops-data.
Note that s3:ListBucket grants permission to list some or all objects in a bucket, and its resource must be a bucket (not an object path). If one wants to allow listing only a subset of objects, one can set up a condition; see the example S3AllowListOnBucketMldataPrefixDevserver, which allows s3:ListBucket only on prefixes matching devserver/* (wildcard-style, via the StringLike operator).
POLICY_S3_LIST_MLDATA_BUCKET = Policy(
    'S3AllowListOnBucketMldata',
    'Allow',
    ['s3:ListBucket'],
    [
        'arn:aws:s3:::ops-data',
    ],
)

POLICY_S3_LIST_MLDATA_BUCKET_PREFIX_DEVSERVER = Policy(
    'S3AllowListOnBucketMldataPrefixDevserver',
    'Allow',
    ['s3:ListBucket'],
    [
        'arn:aws:s3:::ops-data',
    ],
    conditions=[
        Condition(
            name="RestrictListBucketToPrefix",
            operator="StringLike",
            condition_key="s3:prefix",
            values=["devserver/*"],
        )
    ],
)
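For reference, printing POLICY_S3_LIST_MLDATA_BUCKET.to_yaml() produces a statement fragment along these lines (indentation normalized here for readability; the actual output is indented so it slots into the {Effects} placeholder of the CloudFormation templates shown later):

# S3AllowListOnBucketMldata
- Effect: "Allow"
  Action:
    - "s3:ListBucket"
  Resource:
    - "arn:aws:s3:::ops-data"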
As shown in the architecture, the middle and leaf nodes are PolicyGroups. Each leaf node is mapped to an engineer; the code for this mapping is in the Generator.
As shown in the code below, a PolicyGroup contains a list of Policies and a list of PolicyGroups. The flatten() function fetches all the Policies recursively through a depth-first search. The to_yaml() function sorts these flattened Policies by name and renders them to YAML. The generated YAML files are managed by git; sorting is mainly for ease of reading file differences in GitHub pull requests.
class PolicyGroup():
    def __init__(self, policies=None, policy_groups=None):
        """Initialize the object with policies and policy_groups.

        Note that PolicyGroup has a nested definition, i.e. a PolicyGroup can
        contain other PolicyGroups. But there is no need to do cycle detection:
        PolicyGroups are immutable, so a parent can never refer to its child
        (the child does not exist yet when the parent is initialized), thus a
        cycle cannot exist."""
        self.policies = policies
        self.policy_groups = policy_groups
        self.policies_dedupped = None

    def flatten(self):
        """Fetch all Policies recursively (depth-first), deduplicated."""
        if self.policies_dedupped is not None:
            return self.policies_dedupped
        self.policies_dedupped = set()
        if self.policies is not None:
            for policy in self.policies:
                self.policies_dedupped.add(policy)
        if self.policy_groups is not None:
            for policy_group in self.policy_groups:
                self.policies_dedupped.update(policy_group.flatten())
        return self.policies_dedupped

    def to_yaml(self, **kargs) -> str:
        """Convert the policyGroup to a yaml string"""
        flattened = self.flatten()
        yaml_str = ''
        for policy in sorted(list(flattened)):
            yaml_str += policy.to_yaml(**kargs)
        return yaml_str
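The Generator itself is in the repo, but its core idea can be sketched roughly like this (the bottom-level PolicyGroup, the mapping, and the fill_instance_profile_template helper are hypothetical):

# Hypothetical bottom-level PolicyGroup, one per engineer.
POLICY_GROUP_JANE_DOE = PolicyGroup(policy_groups=[POLICY_GROUP_TEAM_ML])

ENGINEER_POLICY_GROUPS = {
    "JaneDoe": POLICY_GROUP_JANE_DOE,
}

def render_manifests() -> None:
    """Render one CloudFormation manifest per engineer."""
    for engineer, policy_group in ENGINEER_POLICY_GROUPS.items():
        # Flatten the PolicyGroup into sorted YAML statements.
        effects = policy_group.to_yaml()
        # fill_instance_profile_template is a hypothetical helper that
        # fills the CloudFormation templates shown below.
        manifest = fill_instance_profile_template(engineer, effects)
        with open(f"manifests/{engineer}.yaml", "w") as f:
            f.write(manifest)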
For each engineer, we use CloudFormation to generate an IAM instance profile, which is attached to the engineer's EC2 devserver. The CloudFormation manifest is generated by filling in a template. The template includes a few IAM policies, an IAM role associated with those policies, and an IAM instance profile that "contains" the role.
Theoretically, each PolicyGroup should produce a single IAM policy, so why do we have a few IAM policies in each CloudFormation stack? The reason is that AWS IAM has a hard limit of 6,144 characters per IAM policy (see Ref), and the generated IAM policy for a PolicyGroup easily exceeds that. That's why we need to split it. For now, we use a heuristic rule: all S3-related permissions go into the first IAM policy, all SecretsManager-related permissions go into the second, and the rest goes into the third.
We can always do more splitting if needed.
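A sketch of that heuristic split, assuming the Policy class above (the bucketing rule here is illustrative):

from typing import List, Tuple

def split_policies(
    policies: List[Policy],
) -> Tuple[List[Policy], List[Policy], List[Policy]]:
    """Bucket policies so each rendered IAM policy stays under the size limit."""
    s3, secrets, misc = [], [], []
    for policy in sorted(policies):
        if all(a.startswith("s3:") for a in policy.actions):
            s3.append(policy)
        elif all(a.startswith("secretsmanager:") for a in policy.actions):
            secrets.append(policy)
        else:
            misc.append(policy)
    return s3, secrets, misc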
The CloudFormation template for the IAM instance profile is shown below. {Policies} is a combination of multiple IAM policies, each generated from the IAM policy template that follows it. Using engineer Jane Doe as an example, RoleName will be PersonalRoleJaneDoe, InstanceProfileName will be PersonalInstanceProfileJaneDoe, and the three PolicyNames will be PersonalPolicyS3JaneDoe, PersonalPolicySecretManagerJaneDoe, and PersonalPolicyMiscJaneDoe.
The code used to render the CloudFormation manifest from the template is on GitHub.
AWSTemplateFormatVersion: {TemplateFormatVersion}
Description: {ProfileDescription}
Resources:
{Policies}
  {RoleName}:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: {AssumeRolePolicyDocumentVersion}
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ec2.amazonaws.com
            Action:
              - sts:AssumeRole
          - Effect: Allow
            Principal:
              AWS:
                - arn:aws:iam::025412125743:user/ServiceUserProdKrun
            Action:
              - sts:AssumeRole
      Path: "/"
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
{PolicyReferences}
      RoleName: {RoleName}
  {InstanceProfileName}:
    Type: AWS::IAM::InstanceProfile
    Properties:
      InstanceProfileName: {InstanceProfileName}
      Path: "/"
      Roles:
        - !Ref {RoleName}

IAM policy template:

{PolicyName}:
  Type: AWS::IAM::ManagedPolicy
  Properties:
    Description: {PolicyDescription}
    ManagedPolicyName: {PolicyName}
    Path: /
    PolicyDocument:
      Version: {PolicyDocumentVersion}
      Statement:
{Effects}
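Filling a template is plain string substitution, mirroring _template_replace in the Policy class; a minimal sketch (the file path and the policies_yaml/policy_refs values are hypothetical):

with open("templates/instance_profile.yaml") as f:  # hypothetical path
    INSTANCE_PROFILE_TEMPLATE = f.read()

def fill_template(template: str, **values: str) -> str:
    """Replace each {Key} placeholder with its value."""
    for key, value in values.items():
        template = template.replace("{%s}" % key, value)
    return template

policies_yaml = "..."  # rendered from the IAM policy template
policy_refs = "..."    # e.g. "- !Ref PersonalPolicyS3JaneDoe" lines

manifest = fill_template(
    INSTANCE_PROFILE_TEMPLATE,
    TemplateFormatVersion="2010-09-09",
    ProfileDescription="Personal instance profile for Jane Doe",
    AssumeRolePolicyDocumentVersion="2012-10-17",
    RoleName="PersonalRoleJaneDoe",
    InstanceProfileName="PersonalInstanceProfileJaneDoe",
    Policies=policies_yaml,
    PolicyReferences=policy_refs,
)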
Note that in addition to personal roles, the framework discussed above can be applied to other kinds of IAM needs, as long as appropriate templates are defined.
For example, to use the framework to manage an IAM user, the template is defined as below. We won't go into the details of this template in this article.
AWSTemplateFormatVersion: {TemplateFormatVersion}
Description: {StackDescription}
Resources:
  {PolicyName}:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Description: {PolicyDescription}
      ManagedPolicyName: {PolicyName}
      Path: "/"
      PolicyDocument:
        Version: {PolicyDocumentVersion}
        Statement:
{Effects}
  {GroupName}:
    Type: AWS::IAM::Group
    Properties:
      GroupName: {GroupName}
      ManagedPolicyArns:
        - !Ref {PolicyName}
  {UserName}:
    Type: AWS::IAM::User
    Properties:
      Groups:
        - !Ref {GroupName}
      UserName: {UserName}
Using the IAM-as-code approach discussed above, the procedure for making IAM changes is as follows.
The tech lead of the team needs to create an S3-read Policy, shown below. It allows listing all objects in the bucket and reading the objects under the prefix streaming/*.
The tech lead then adds the Policy to the team's PolicyGroup.
Since each ML team member's PolicyGroup either contains POLICY_GROUP_TEAM_ML or is POLICY_GROUP_TEAM_ML itself, this change propagates to all ML team members.
POLICY_S3_READ_MLDATA_BUCKET_PREFIX_STREAMING = Policy(
    'S3AllowReadOnBucketMldataPrefixStreaming',
    'Allow',
    [
        's3:GetObject',
        's3:GetObjectAcl',
        's3:GetObjectVersion',
        's3:ListBucket',
        's3:ListObjectVersions',
        's3:GetBucketLocation',
    ],
    [
        'arn:aws:s3:::ml-data',
        'arn:aws:s3:::ml-data/streaming/*',
    ],
)

POLICY_GROUP_TEAM_ML = PolicyGroup(policies=[
    ## Existing policies
    ...
    POLICY_S3_READ_MLDATA_BUCKET_PREFIX_STREAMING,
])
As above, the tech lead (John Doe) creates a Policy for the special permission and then adds the new Policy to his personal PolicyGroup, shown below. If the generator previously used the mapping JohnDoe: POLICY_GROUP_TEAM_ML, it can simply be changed to JohnDoe: POLICY_GROUP_JOHN_DOE. You can tell how flexible the system is.
POLICY_ML_SPECIAL = ...

POLICY_GROUP_JOHN_DOE = PolicyGroup(
    policies=[
        POLICY_ML_SPECIAL,
    ],
    policy_groups=[
        POLICY_GROUP_TEAM_ML,
    ],
)
Since we use Kubernetes (AWS-managed EKS), developers need to be authenticated when operating the Kubernetes cluster. We use a ConfigMap to map developers' IAM roles to a ClusterRole (called eng-clusterrole) that has the proper authorizations to operate the Kubernetes cluster. If you are not familiar with ClusterRoles, refer to the Kubernetes RBAC documentation.
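For illustration, the standard EKS aws-auth ConfigMap entry for one personal role might look like this (the account ID and group name are hypothetical; a ClusterRoleBinding then binds the group to eng-clusterrole):

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::111122223333:role/PersonalRoleJaneDoe
      username: jane-doe
      groups:
        - eng-group  # bound to eng-clusterrole via a ClusterRoleBinding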
Since we restrict developers' access to the AWS console, a problem arises: how can developers get CodeBuild artifacts and logs? (Context: at AKASA, we use AWS CodeBuild to build our Docker images and artifacts.) Our solution was to build a tool called cbgm, a script of AWS CLI commands that helps developers fetch CodeBuild logs and artifacts.
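cbgm itself is internal, but the underlying calls can be sketched with boto3 (the build ID is hypothetical):

import boto3

def fetch_build_logs_and_artifact(build_id: str) -> None:
    """Print a CodeBuild build's log events and its artifact location."""
    build = boto3.client("codebuild").batch_get_builds(ids=[build_id])["builds"][0]

    # Build logs live in CloudWatch Logs.
    log_info = build["logs"]
    events = boto3.client("logs").get_log_events(
        logGroupName=log_info["groupName"],
        logStreamName=log_info["streamName"],
    )
    for event in events["events"]:
        print(event["message"], end="")

    # The artifact location (an S3 ARN) tells the developer where to download from.
    print(build["artifacts"]["location"])

fetch_build_logs_and_artifact("my-project:0123abcd-...")  # hypothetical build ID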
It was a long journey, but after much collaboration and trial and error, we found a system that works for our team and allows us to build world-class automation for healthcare operations. To recap:

- IAM roles (wrapped in instance profiles on devservers) instead of IAM users keep long-term credentials off laptops.
- Atomic permissions composed through a DAG of PolicyGroups eliminate duplicated permissions across IAM policies.
- Rendering CloudFormation manifests from code in a GitHub repo gives every permission change version control and peer review.
If you're interested in joining our team and want to help us drive down the cost of healthcare in the U.S., we're always looking for talented engineers. Be sure to check out our open positions today. We can't wait for you to be a part of our growing team.