AKASA
December 05, 2024

The Gist

AKASA's AI-powered automation is only possible because of the people behind it. Our team of talented engineers works within the AWS infrastructure, which offers powerful cloud computing and security. Because we're in the healthcare space, security is an especially big priority for us. To stay efficient while also staying safe, we needed a unique approach to user management. Enter IAM management as code.

At AKASA, we believe every dollar spent on healthcare matters because healthcare matters to everyone. Our mission is to remedy the financial complexity in United States healthcare and curb wasteful spending. And we aim to do this using automation.

Inspired by the machine learning approaches that made self-driving cars possible, our team created the AKASA platform for the healthcare revenue cycle. The AKASA platform brings together the best of people, data, and technology to address financial complexity in our healthcare system.

AKASA uses this unique approach and a set of proprietary technologies to provide health systems with a solution to efficiently, accurately, and autonomously navigate the complex state of medical reimbursement in the United States.

Why IAM?

Working side-by-side with healthcare providers is key to understanding the unique challenges of healthcare revenue cycle processes. We take very seriously our responsibility to protect the privacy and security of information related to so many patients.

At AKASA, we impose strong security requirements on ourselves. For example, each client's data is stored in a separate database, and all of these databases are encrypted. All activities that involve protected health information (PHI) are logged. And data is stored in the cloud, which means the laptops our developers use never store PHI locally and serve only as terminals.

When it comes to security and authentication, each of our engineers has a unique identity that is recorded whenever they use the system, so we can track who did what and audit these activities. We also follow the principle of least privilege, which limits what any single identity can do and reduces the risk of privilege escalation. We take these policies very seriously to mitigate security risks. But we believe this alone is not enough, and we strive to go above and beyond in our security practices.

As we strive for cutting-edge security methods, we wanted strong and sophisticated authentication and authorization tools, and we believe Amazon Web Services (AWS) IAM is the best choice. However, using this tool effectively requires more than a simple integration. We faced a variety of challenges during the implementation, and this article explains these challenges, our thought process, and the solution, with source code provided on GitHub.

Key Terms

  • Software Engineers on Infrastructure (or DevOps): a group of engineers at AKASA who are in charge of managing the AWS resources, Kubernetes clusters, and authentications/authorizations.
  • Developer: AKASA engineers who are not DevOps. They write code to implement the company’s business needs and deploy services.
  • Permission: a logical group of authorizations to operate certain AWS resources. Permissions can have different granularity. For example, if permission 1 allows reading from an S3 bucket and permission 2 allows reading from a particular prefix of that bucket, then permission 2 is strictly more granular than permission 1. However, it is impossible to define a “most-granular” permission for this S3-read scenario: you can always define an S3 read permission more granular than permission 2 by using a longer prefix.
  • Atomic permission: a logical group of authorizations that always work together and are never divided during their full life cycle (as far as AKASA developers are concerned).
  • IAM policy: an AWS entity that is a logical group of permissions (not necessarily atomic permissions).
  • IAM user: an AWS IAM principal that can be assigned multiple IAM policies. The IAM user thus owns the union of the permissions from these policies (conflicts are resolved by the AWS policy evaluation logic). An IAM user can operate AWS resources in two ways, each of which can be turned on and off: the AWS console (through a password and MFA) and the AWS CLI (through an access_key_id and secret_access_key). Each IAM user should be used by a single person (i.e. no password sharing).
  • IAM role: an AWS IAM principal similar to an IAM user, with the following differences: an IAM role has no password or long-term access keys, and it can be “assumed” by different IAM principals (such as IAM users) to obtain temporary credentials. In our setup, roles are used through the AWS CLI and SDKs rather than the AWS console. A role is like a hat that can be worn by different people (or services) as needed.
  • IAM instance profile: an AWS IAM entity that acts as a “container” for one IAM role. The IAM instance profile can be attached to an EC2 instance, and as a result, the contained IAM role can be assumed by the EC2 instance.
  • Policy: A self-defined entity in our framework that contains a single atomic permission.
  • PolicyGroup: A self-defined entity in our framework that contains Policies and other PolicyGroups (thus, it is a recursive definition).

Background, Challenges, and Solutions

Background

At AKASA, many developers use resources in our AWS account, and we need to track every action they take. CloudTrail does just that: it records every AWS API call (in other words, every action) along with the principalId of the caller. IAM users and IAM roles are both principal types, so either one is logged as the principalId of an API call.
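As an illustration of how this attribution can work, here is a minimal sketch that pulls an engineer's name out of a single CloudTrail event record. It assumes the event comes from a personal role assumed through an instance profile and that role names follow the PersonalRole<Name> convention used later in this article; the account ID, role ID, and instance ID are made-up placeholders.

```python
"""Attribute a CloudTrail event to an engineer through their personal role.

Illustrative sketch: assumes role names follow a PersonalRole<Name> convention.
"""
from typing import Optional

PERSONAL_ROLE_PREFIX = "PersonalRole"  # assumed naming convention


def engineer_for_event(event: dict) -> Optional[str]:
    """Return the engineer behind a CloudTrail event, or None if not a personal role."""
    identity = event.get("userIdentity", {})
    if identity.get("type") != "AssumedRole":
        return None
    # Assumed-role ARNs look like
    # arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>
    resource = identity.get("arn", "").split(":", 5)[-1]
    parts = resource.split("/")
    if len(parts) < 3 or parts[0] != "assumed-role":
        return None
    role_name = parts[1]
    if not role_name.startswith(PERSONAL_ROLE_PREFIX):
        return None
    return role_name[len(PERSONAL_ROLE_PREFIX):]


event = {
    "eventName": "ListBuckets",
    "userIdentity": {
        "type": "AssumedRole",
        # For assumed roles, principalId is "<role-id>:<session-name>";
        # for an instance-profile role the session name is the instance ID.
        "principalId": "AROAEXAMPLEROLEID:i-0123456789abcdef0",
        "arn": "arn:aws:sts::111122223333:assumed-role/PersonalRoleJaneDoe/i-0123456789abcdef0",
    },
}
print(engineer_for_event(event))
```

Because every engineer has a dedicated role, the role name alone is enough to identify the owner of an action; no shared credentials need to be disambiguated.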

We want our developers to do their magic on devservers (EC2 instances in the cloud) rather than on their own computers; laptops are only terminals used to connect to a remote devserver. If we created an IAM user for each engineer, nothing would prevent the engineer from using that user's access_key_id on a laptop, which, as discussed above, is against our security standards. IAM roles do not have this problem because they have no long-term credentials such as passwords or access keys, and a role delivered through an instance profile is available on the EC2 instance it is attached to, not on a laptop. Therefore, we decided to use IAM roles instead of IAM users.

Note that an assumed IAM role session expires after a configurable amount of time and must be refreshed. For EC2 instances, AWS already implements this refresh through instance profiles: the instance metadata service supplies temporary credentials for the role and rotates them before they expire. We just have to wrap the IAM role in an instance profile and attach it to the EC2 instance, and the instance refreshes the role credentials automatically.
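The refresh decision that AWS SDKs make on an EC2 instance can be pictured with the sketch below. It is an illustration only: the field names (AccessKeyId, SecretAccessKey, Token, Expiration) follow the document the instance metadata service returns at /latest/meta-data/iam/security-credentials/<role-name>, while the credential values are fake and the five-minute margin is an assumption rather than an AWS constant.

```python
"""Decide when instance-profile credentials should be refreshed (illustration only)."""
import json
from datetime import datetime, timedelta, timezone

# Assumed safety margin; not an AWS-defined constant.
REFRESH_MARGIN = timedelta(minutes=5)


def needs_refresh(creds_json: str, now: datetime) -> bool:
    """True once `now` is within REFRESH_MARGIN of the credentials' Expiration."""
    creds = json.loads(creds_json)
    # Expiration is a UTC timestamp such as "2024-12-05T18:00:00Z".
    expires = datetime.strptime(creds["Expiration"], "%Y-%m-%dT%H:%M:%SZ")
    expires = expires.replace(tzinfo=timezone.utc)
    return now >= expires - REFRESH_MARGIN


# Same field names as the instance metadata document; values are fake.
sample = json.dumps({
    "Code": "Success",
    "Type": "AWS-HMAC",
    "AccessKeyId": "ASIAEXAMPLE",
    "SecretAccessKey": "example-secret",
    "Token": "example-token",
    "Expiration": "2024-12-05T18:00:00Z",
})
print(needs_refresh(sample, datetime(2024, 12, 5, 17, 50, tzinfo=timezone.utc)))  # False
print(needs_refresh(sample, datetime(2024, 12, 5, 17, 56, tzinfo=timezone.utc)))  # True
```

In practice none of this code is needed on a devserver: boto3 and the AWS CLI re-read the metadata endpoint on their own, which is exactly why the instance profile approach is convenient.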

Challenges

AWS's design is simple, but it poses a challenge for us. IAM roles are like hats that can be worn (“assumed”) by different AWS principals. But we want “personal” roles, meaning each role is taken by a single engineer (more specifically, the engineer's devserver). Since engineers need different permissions, managing these IAM roles manually becomes more and more complicated and error-prone. Let's take a look at a scenario.

  • The ML team has five engineers, and they all need a common set of permissions. DevOps creates an IAM policy called ML-policy-base with these permissions and assigns it to the engineers.
  • The ML team's tech lead needs some special permissions. Adding these permissions to ML-policy-base is not acceptable, because doing so would grant them to every member of the ML team, which violates the principle of least privilege. So DevOps creates another IAM policy called ML-policy-lead with these special permissions and assigns it to the tech lead.
  • Later on, the tech lead wants to delegate some work to a team member, who needs a subset of the permissions in ML-policy-lead. We cannot grant ML-policy-lead to this person, since they only need a subset of it. The only option is to create another IAM policy named ML-policy-delegate with this subset of permissions and assign it to the person. Here is the issue: ML-policy-delegate and ML-policy-lead have duplicated permissions.
  • A new team, the DS team, needs a subset of the permissions in ML-policy-base. The same issue arises: we cannot grant ML-policy-base to the DS team's members, since it contains more than they need. DevOps has to create a new IAM policy, named DS-policy-base, for the DS team. Again, ML-policy-base and DS-policy-base have duplicated permissions.
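To make the duplication in this scenario concrete, the sketch below models the four flat IAM policies as sets of permission labels and reports every permission that would have to be edited in more than one place. The labels (read-ml-data, deploy-prod, and so on) are hypothetical placeholders, not real IAM statements.

```python
"""Find permissions duplicated across flat IAM policies (placeholder labels)."""
from collections import defaultdict
from typing import Dict, List, Set

flat_policies: Dict[str, Set[str]] = {
    "ML-policy-base": {"read-ml-data", "read-cdn", "deploy-dev"},
    "ML-policy-lead": {"deploy-prod", "rotate-secrets"},
    "ML-policy-delegate": {"deploy-prod"},        # subset of ML-policy-lead
    "DS-policy-base": {"read-ml-data", "read-cdn"},  # subset of ML-policy-base
}


def duplicated_permissions(policies: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each permission that appears in 2+ policies to the policies holding it."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for policy_name, permissions in policies.items():
        for permission in permissions:
            owners[permission].append(policy_name)
    return {perm: sorted(names) for perm, names in owners.items() if len(names) > 1}


for perm, names in sorted(duplicated_permissions(flat_policies).items()):
    print(perm, "->", names)
```

Every entry in the output is a place where a single logical change must be repeated by hand, which is the maintenance burden discussed next.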

The Solution

So why are duplicated permissions in different IAM policies bad? The answer is simple: they become increasingly difficult to maintain.

Whenever we change a permission, we have to make the same change in every IAM policy that contains it. Our answer is to define a set of atomic permissions, each exactly once, and compose them with multi-level inheritance (strictly speaking, a directed acyclic graph rather than a tree, since a node can have several parents). Details are shown in the following section.

Another issue is that AWS IAM offers no peer review and only limited history: a managed policy keeps at most five versions. We want to track the full history of permission changes, and we want every change to be peer-reviewed. So the solution is to manage these atomic permissions and the inheritance structure as code in a GitHub repo. That's why this article is named “IAM Management As Code.”

IAM Management

[Figure: architecture of IAM as code]

Design

The architecture of IAM as code is shown above. It is a directed acyclic graph (DAG). The roots of the DAG are Policies, each of which contains an atomic permission. PolicyGroups inherit from Policies and from other PolicyGroups.

For example, the Base PolicyGroup contains three Policies needed by all engineers at AKASA, such as permission to access an S3 CDN bucket and permission to read a secret in Secrets Manager. The TeamADev PolicyGroup contains the Base PolicyGroup and one additional Policy (which allows deploying to the dev environment). The TeamAProd PolicyGroup contains the TeamADev PolicyGroup and one more Policy (which allows deploying to the prod environment).

The TeamBDev and TeamBProd PolicyGroups are built the same way as team A's. The bottom-level PolicyGroups map one-to-one to individual engineers. This design is very flexible: Eng B works cross-functionally and thus gets the TeamBDev PolicyGroup in addition to the TeamAProd PolicyGroup. We use a Python script to render CloudFormation manifests from the bottom-level PolicyGroups.
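To see how the DAG resolves for a cross-functional engineer like Eng B, here is a small standalone sketch (it is not the framework's PolicyGroup class, which appears later). The graph mirrors the example above, but Policy names such as S3CdnAccess and DeployDevA are hypothetical labels. Base is reachable from Eng B along two paths, yet its Policies appear only once in the result.

```python
"""Resolve the effective Policies of a PolicyGroup in the example DAG."""
from typing import Dict, List, Set, Tuple

# PolicyGroup name -> (Policies it adds directly, PolicyGroups it inherits from).
# All names are hypothetical stand-ins for the diagram.
GRAPH: Dict[str, Tuple[List[str], List[str]]] = {
    "Base": (["S3CdnAccess", "SecretRead", "BaseMisc"], []),
    "TeamADev": (["DeployDevA"], ["Base"]),
    "TeamAProd": (["DeployProdA"], ["TeamADev"]),
    "TeamBDev": (["DeployDevB"], ["Base"]),
    "TeamBProd": (["DeployProdB"], ["TeamBDev"]),
    "EngB": ([], ["TeamAProd", "TeamBDev"]),  # cross-functional engineer
}


def effective_policies(group: str, cache: Dict[str, Set[str]]) -> Set[str]:
    """Depth-first union of a group's own Policies and everything it inherits."""
    if group in cache:  # memoize: Base is reachable from EngB along two paths
        return cache[group]
    own, inherited = GRAPH[group]
    result = set(own)
    for parent in inherited:
        result |= effective_policies(parent, cache)
    cache[group] = result
    return result


print(sorted(effective_policies("EngB", {})))
```

Using a set makes the diamond harmless: the same atomic Policy reached twice still yields one permission in the rendered manifest.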

Each CloudFormation manifest contains a few IAM policies, an IAM role (associated with the IAM policies), and an IAM instance profile. Each manifest will be deployed into AWS as a CloudFormation stack. The IAM instance profile is assigned to an individual developer’s devserver. No developer can use other people’s IAM instance profile.
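Deployment itself is a DevOps-only step. A minimal sketch of what it could look like with boto3-style clients follows; the stack name pattern personal-iam-<Name> is hypothetical, and DryRunClient is a stand-in that records calls instead of contacting AWS. The CAPABILITY_NAMED_IAM acknowledgement is required by CloudFormation because the template sets explicit IAM resource names.

```python
"""Deploy one engineer's manifest and attach the instance profile (sketch)."""


def deploy_personal_stack(cfn, ec2, engineer: str, template_body: str, instance_id: str) -> str:
    """cfn/ec2 are expected to behave like boto3 CloudFormation and EC2 clients."""
    stack_name = f"personal-iam-{engineer}"  # hypothetical naming scheme
    # CAPABILITY_NAMED_IAM: the template sets custom RoleName, InstanceProfileName
    # and ManagedPolicyName values, which CloudFormation must be told to accept.
    cfn.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
    # Attach the instance profile to the engineer's devserver.
    ec2.associate_iam_instance_profile(
        IamInstanceProfile={"Name": f"PersonalInstanceProfile{engineer}"},
        InstanceId=instance_id,
    )
    return stack_name


class DryRunClient:
    """Stands in for a boto3 client and records calls instead of calling AWS."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def record(*args, **kwargs):
            self.calls.append((name, args, kwargs))
            return self  # lets get_waiter(...).wait(...) chain
        return record


cfn, ec2 = DryRunClient(), DryRunClient()
deploy_personal_stack(cfn, ec2, "JaneDoe", "Resources: {}", "i-0123456789abcdef0")
print([name for name, _, _ in cfn.calls])
```

With real clients (boto3.client("cloudformation") and boto3.client("ec2")), the same function would create the stack and attach the profile; updating an existing stack would need update_stack instead.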

Class of Policy

Each instance of Policy contains one atomic permission. Note that atomic permissions are defined by AKASA's engineering needs, so AWS resource names must be designed to accommodate them.

For example, we separate clients' data using S3 prefixes, like s3://ops-data/{client}. The authorizations needed to read from s3://ops-data/tcm/* then form one atomic permission, used by the IAM user for client tcm. The authorizations needed to read from s3://ops-data/* form another atomic permission, used by developers. (AKASA developers work across all clients instead of being assigned to specific ones.)
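The naming scheme is what makes these permissions expressible. The sketch below turns the S3 URIs above into IAM resource ARNs and statements; limiting "read" to s3:GetObject is an assumption for illustration, since a real Policy would list whatever actions AKASA treats as reading.

```python
"""Map the s3://ops-data/{client} layout onto atomic read statements (sketch)."""

BUCKET = "ops-data"


def s3_uri_to_arn(uri: str) -> str:
    """s3://bucket/key-pattern -> arn:aws:s3:::bucket/key-pattern"""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri}")
    return "arn:aws:s3:::" + uri[len("s3://"):]


def read_statement(uri: str) -> dict:
    # Assumed action list: only s3:GetObject, for illustration.
    return {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": [s3_uri_to_arn(uri)]}


# One atomic permission per client, plus one cross-client permission for developers.
client_tcm = read_statement(f"s3://{BUCKET}/tcm/*")
developers = read_statement(f"s3://{BUCKET}/*")
print(client_tcm["Resource"], developers["Resource"])
```

Because the client name is a path prefix, adding a client never changes an existing statement; it only adds a new atomic permission.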

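from typing import List

INDENT = '  '       # one YAML indentation level (two spaces)
DEFAULT_INDENT = 5  # nesting depth of policy statements inside the CloudFormation template


class Condition():
    def __init__(
        self,
        name: str,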
        operator: str,
        condition_key: str,
        values: List[str],
    ):
        self.name = name
        self.operator = operator
        self.condition_key = condition_key
        self.values = values

    def to_yaml(self, **kargs) -> str:
        """Convert the condition to a yaml string"""
        base_indent = (DEFAULT_INDENT + 2) * INDENT
        name_str = f"\n{base_indent}# {self.name}"
        operator_str = f"\n{base_indent}{self.operator}:"
        condition_str = f"\n{base_indent}{INDENT}{self.condition_key}:"
        values_str = ''
        for value in self.values:
            values_str += f"\n{base_indent}{2*INDENT}- \"{value}\""
        yaml = name_str + operator_str + condition_str + values_str
        for k, v in kargs.items():
            # str.replace returns a new string, so keep the result
            yaml = self._template_replace(yaml, k, v)
        return yaml

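    def __lt__(self, other):
        # Sort by name so the rendered output is deterministic
        return self.name < other.name

    def _template_replace(self, template, key, value):
        # Replace the placeholder "{key}" with value
        return template.replace("{{{}}}".format(str(key)), value)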

class Policy():
    def __init__(
        self,
        name: str,
        effect: str,
        actions: List[str],
        resources: List[str],
        conditions: List[Condition] = None,
    ):
        """Initialize the Policy object"""
        self.name = name
        self.effect = effect
        self.actions = actions
        if isinstance(actions, str):
            raise ValueError("Actions must be a list, got %r." % actions)
        self.resources = resources
        self.conditions = conditions

    def to_yaml(self, **kargs) -> str:
        """Convert the policy to a yaml string"""
        base_indent = DEFAULT_INDENT * INDENT
        name_str = f"\n{base_indent}# {self.name}"
        effect_str = f"\n{base_indent}- Effect: \"{self.effect}\""
        action_str = ''
        for action in self.actions:
            action_str += f"\n{base_indent}{2*INDENT}- \"{action}\""
        action_str = f"\n{base_indent}{INDENT}Action:{action_str}"
        resource_str = ''
        for resource in self.resources:
            resource_str += f"\n{base_indent}{2*INDENT}- \"{resource}\""
        resource_str = f"\n{base_indent}{INDENT}Resource:{resource_str}"
        condition_str = ''
        if self.conditions is not None:
            condition_str += f"\n{base_indent}{INDENT}Condition:"
            for condition in self.conditions:
                condition_str += condition.to_yaml()
        yaml = name_str + effect_str + action_str + resource_str + condition_str
        for k, v in kargs.items():
            yaml = self._template_replace(yaml, k, v)
        return yaml

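    def __lt__(self, other):
        # Sort by name so the rendered output is deterministic
        return self.name < other.name

    def _template_replace(self, template, key, value):
        # Replace the placeholder "{key}" with value
        return template.replace("{{{}}}".format(str(key)), value)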

An atomic permission needs a name, an effect, a list of actions, a list of resources, and an optional list of conditions. Below is an example Policy named S3AllowListOnBucketMldata that allows s3:ListBucket on the bucket ops-data.

Note that s3:ListBucket grants permission to list some or all objects in the bucket, and its resource must be a bucket ARN. To allow listing only a subset of objects, you can add a condition, as in S3AllowListOnBucketMldataPrefixDevserver, which allows s3:ListBucket only on prefixes that match devserver/* (a StringLike wildcard match, not a regular expression).

POLICY_S3_LIST_MLDATA_BUCKET = Policy(
    'S3AllowListOnBucketMldata',
    'Allow',
    ['s3:ListBucket'],
    [
        'arn:aws:s3:::ops-data',
    ],
)
POLICY_S3_LIST_MLDATA_BUCKET_PREFIX_DEVSERVER = Policy(
    'S3AllowListOnBucketMldataPrefixDevserver',
    'Allow',
    ['s3:ListBucket'],
    [
        'arn:aws:s3:::ops-data',
    ],
    conditions=[
        Condition(
            name="RestrictListBucketToPrefix",
            operator="StringLike",
            condition_key="s3:prefix",
            values=["devserver/*"],
        ),
    ],
)
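For reference, the second Policy corresponds to the IAM JSON statement below, which is the form AWS stores. The string_like helper is our own approximation of the StringLike operator, where * matches any run of characters and ? matches exactly one; it is not an AWS API, and it ignores features such as policy variables.

```python
"""The conditioned Policy as an IAM JSON statement, plus an approximate StringLike check."""
import json
import re

statement = {
    "Sid": "S3AllowListOnBucketMldataPrefixDevserver",
    "Effect": "Allow",
    "Action": ["s3:ListBucket"],
    "Resource": ["arn:aws:s3:::ops-data"],
    "Condition": {"StringLike": {"s3:prefix": ["devserver/*"]}},
}


def string_like(pattern: str, value: str) -> bool:
    """Approximate IAM StringLike: '*' matches any run of characters, '?' exactly one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


patterns = statement["Condition"]["StringLike"]["s3:prefix"]
for prefix in ["devserver/jane/", "devserver/", "prod/reports/"]:
    allowed = any(string_like(p, prefix) for p in patterns)
    print(f"{prefix!r}: {'allowed' if allowed else 'denied'}")
print(json.dumps(statement, indent=2))
```

Characters such as "." are escaped, so they match literally; that is the practical difference from treating the pattern as a regular expression.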

As shown in the architecture, the middle and leaf nodes are PolicyGroups, and each leaf node is mapped to an engineer. The code for this mapping lives in the generator.

As shown in the code below, a PolicyGroup contains a list of Policies and a list of PolicyGroups. The flatten() function collects all the Policies recursively through a depth-first search, deduplicating them along the way. The to_yaml() function sorts the flattened Policies by name and renders them in YAML format. The generated YAML files are managed in git, and sorting keeps the output deterministic, which makes file differences easy to read in GitHub pull requests.

class PolicyGroup():
    def __init__(self, policies=None, policy_groups=None):
        """Initialize the object with policies and policy_groups. Note that PolicyGroup has a nested
        definition, i.e. a PolicyGroup can contain other PolicyGroups. There is no need for cycle
        detection: PolicyGroups are treated as immutable, so a group can only contain groups that
        already existed when it was initialized. No group can therefore (directly or indirectly)
        contain itself, and a cycle cannot exist."""

        self.policies = policies
        self.policy_groups = policy_groups
        self.policies_dedupped = None

    def flatten(self):
        if self.policies_dedupped is not None:
            return self.policies_dedupped
        self.policies_dedupped = set()
        if self.policies is not None:
            for policy in self.policies:
                self.policies_dedupped.add(policy)
        if self.policy_groups is not None:
            for policy_group in self.policy_groups:
                self.policies_dedupped.update(policy_group.flatten())
        return self.policies_dedupped

    def to_yaml(self, **kargs) -> str:
        """ Convert the policyGroup to a yaml string"""
        flattened = self.flatten()
        yaml_str = ''
        for policy in sorted(list(flattened)):
            yaml_str += policy.to_yaml(**kargs)
        return yaml_str

Generator

For each engineer, we use CloudFormation to generate an IAM instance profile, which is attached to the engineer’s EC2 devserver. The CloudFormation manifest is generated by filling a template. The template includes a few things: a few IAM policies, an IAM role that is associated with these IAM policies, and an IAM instance profile that “contains” the IAM role.

In principle, each PolicyGroup produces a single IAM policy. So why do we have several IAM policies in each CloudFormation stack? Because AWS IAM limits each managed policy to 6,144 characters (whitespace is not counted; see the AWS IAM quotas documentation), and the generated IAM policy for a PolicyGroup easily exceeds that limit, so we need to split it. Currently we use a simple heuristic: all S3-related permissions go to the first IAM policy, all Secrets Manager-related permissions go to the second, and everything else goes to the third.

We can always do more splitting if needed.
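The splitting heuristic can be sketched as follows. Statements are plain IAM statement dicts, and the bucket labels are chosen to match the S3 / SecretManager / Misc policy names used below; measuring size with compact JSON is our approximation of IAM's rule that whitespace does not count.

```python
"""Split a PolicyGroup's statements into three managed policies (sketch)."""
import json
from typing import Dict, List

MAX_POLICY_CHARS = 6144  # IAM quota per managed policy; whitespace is not counted


def bucket_for(statement: dict) -> str:
    services = {action.split(":", 1)[0] for action in statement["Action"]}
    if services == {"s3"}:
        return "S3"
    if services == {"secretsmanager"}:
        return "SecretManager"
    return "Misc"  # everything else, including statements that mix services


def policy_size(statements: List[dict]) -> int:
    doc = {"Version": "2012-10-17", "Statement": statements}
    # Compact separators drop the whitespace IAM would not count anyway.
    return len(json.dumps(doc, separators=(",", ":")))


def split_statements(statements: List[dict]) -> Dict[str, List[dict]]:
    buckets: Dict[str, List[dict]] = {"S3": [], "SecretManager": [], "Misc": []}
    for statement in statements:
        buckets[bucket_for(statement)].append(statement)
    for label, group in buckets.items():
        size = policy_size(group)
        if size > MAX_POLICY_CHARS:
            # Signal that a further split (e.g. S3 by bucket) is needed.
            raise ValueError(f"{label} policy is {size} chars, over {MAX_POLICY_CHARS}")
    return buckets


statements = [
    {"Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": ["arn:aws:s3:::ops-data"]},
    {"Effect": "Allow", "Action": ["secretsmanager:GetSecretValue"],
     "Resource": ["arn:aws:secretsmanager:us-west-2:111122223333:secret:example-*"]},
    {"Effect": "Allow", "Action": ["eks:DescribeCluster"], "Resource": ["*"]},
]
print({label: len(group) for label, group in split_statements(statements).items()})
```

Failing loudly when a bucket is still too large is what tells us it is time for the "more splitting" mentioned above.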

Template for Instance Profile

The CloudFormation template for IAM instance profile is shown below. {Policies} is a combination of multiple IAM policies, which are generated by the IAM policy template below. Using engineer Jane Doe as an example, RoleName will be PersonalRoleJaneDoe, InstanceProfileName will be PersonalInstanceProfileJaneDoe, and the three PolicyNames will be PersonalPolicyS3JaneDoe, PersonalPolicySecretManagerJaneDoe and PersonalPolicyMiscJaneDoe.

The code used to render the CloudFormation manifest from the template is on GitHub.
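The Jane Doe naming convention can be captured in a small helper, sketched below. The dictionary keys (such as PolicyNameS3) are hypothetical, and the length check reflects IAM's 64-character limit on role names.

```python
"""Derive the per-engineer resource names used in the template (sketch)."""
from typing import Dict

POLICY_KINDS = ("S3", "SecretManager", "Misc")
MAX_ROLE_NAME = 64  # IAM limit on role name length


def personal_names(full_name: str) -> Dict[str, str]:
    # "Jane Doe" -> "JaneDoe"; keep the rest of each word exactly as typed.
    compact = "".join(word[:1].upper() + word[1:] for word in full_name.split())
    names = {
        "RoleName": f"PersonalRole{compact}",
        "InstanceProfileName": f"PersonalInstanceProfile{compact}",
    }
    for kind in POLICY_KINDS:
        names[f"PolicyName{kind}"] = f"PersonalPolicy{kind}{compact}"  # hypothetical keys
    if len(names["RoleName"]) > MAX_ROLE_NAME:
        raise ValueError(f"role name too long: {names['RoleName']}")
    return names


print(personal_names("Jane Doe"))
```

Deriving every name from one function keeps the role, instance profile, and policies consistent, which matters because the role name is also what identifies the engineer in CloudTrail.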

AWSTemplateFormatVersion: {TemplateFormatVersion}
Description: {ProfileDescription}

Resources:
{Policies}
  {RoleName}:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: {AssumeRolePolicyDocumentVersion}
        Statement:
        - Effect: Allow
          Principal:
            Service:
            - ec2.amazonaws.com
          Action:
          - sts:AssumeRole
        - Effect: Allow
          Principal:
            AWS:
            - arn:aws:iam::025412125743:user/ServiceUserProdKrun
          Action:
          - sts:AssumeRole
      Path: "/"
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
{PolicyReferences}
      RoleName: {RoleName}

  {InstanceProfileName}:
    Type: AWS::IAM::InstanceProfile
    Properties:
      InstanceProfileName: {InstanceProfileName}
      Path: "/"
      Roles:
      - !Ref {RoleName}


IAM policy template:
  {PolicyName}:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Description: {PolicyDescription}
      ManagedPolicyName: {PolicyName}
      Path: /
      PolicyDocument:
        Version: {PolicyDocumentVersion}
        Statement: {Effects}
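Filling this template amounts to replacing each {Key} placeholder, the same substitution that _template_replace performs in the Policy class. The sketch below is a standalone illustration: the description text is an example, the Effects string is hand-written in the layout Policy.to_yaml produces, and the leftover-placeholder check is our own addition.

```python
"""Fill the IAM policy template by plain placeholder substitution (sketch)."""
import re

POLICY_TEMPLATE = """\
  {PolicyName}:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Description: {PolicyDescription}
      ManagedPolicyName: {PolicyName}
      Path: /
      PolicyDocument:
        Version: {PolicyDocumentVersion}
        Statement: {Effects}"""


def render(template: str, **values: str) -> str:
    for key, value in values.items():
        template = template.replace("{" + key + "}", value)
    # Fail loudly instead of deploying a manifest with unfilled placeholders.
    leftover = re.findall(r"\{[A-Za-z]+\}", template)
    if leftover:
        raise ValueError(f"unfilled placeholders: {leftover}")
    return template


# Statement text in the same indentation Policy.to_yaml emits (10/12/14 spaces).
effects = (
    '\n          # S3AllowListOnBucketMldata'
    '\n          - Effect: "Allow"'
    '\n            Action:'
    '\n              - "s3:ListBucket"'
    '\n            Resource:'
    '\n              - "arn:aws:s3:::ops-data"'
)
print(render(
    POLICY_TEMPLATE,
    PolicyName="PersonalPolicyS3JaneDoe",
    PolicyDescription="S3 permissions for Jane Doe",
    PolicyDocumentVersion='"2012-10-17"',
    Effects=effects,
))
```

Quoting the version string keeps YAML loaders from interpreting 2012-10-17 as a date.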

Note that in addition to personal roles, the framework discussed above can be applied to other kinds of IAM needs, as long as appropriate templates are defined.

For example, to use the framework to manage an IAM user, the template is defined as below. We won't go into the details of this template in this article.

AWSTemplateFormatVersion: {TemplateFormatVersion}
Description: {StackDescription}

Resources:
  {PolicyName}:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Description: {PolicyDescription}
      ManagedPolicyName: {PolicyName}
      Path: "/"
      PolicyDocument:
          Version: {PolicyDocumentVersion}
          Statement: {Effects}

  {GroupName}:
    Type: AWS::IAM::Group
    Properties:
      GroupName: {GroupName}
      ManagedPolicyArns:
        - !Ref {PolicyName}

  {UserName}:
    Type: AWS::IAM::User
    Properties: 
      Groups: 
        - !Ref {GroupName}
      UserName: {UserName}

Procedure of Changes

Using the IAM-as-code approach discussed above, the procedure for making IAM changes is as follows.

  • An engineer makes code changes in the Python scripts.
  • The engineer runs the CloudFormation generator, which modifies or adds files in the repo.
  • The engineer submits a GitHub PR with the file changes.
  • The DevOps team reviews the PR and comments on it, and any issues are fixed.
  • The engineer (or DevOps) merges the PR.
  • The DevOps team deploys the IAM changes (only the DevOps team has permission to do so).

Example: Add read permission to an S3 bucket (and prefix) to a team

The tech lead of the team creates an S3-read Policy, shown below. It allows listing all objects in the bucket and reading the objects under the prefix streaming/*. The tech lead then adds the Policy to the team's PolicyGroup.

Since each ML team member's PolicyGroup either contains POLICY_GROUP_TEAM_ML or is POLICY_GROUP_TEAM_ML itself, this change is reflected for all ML team members.

POLICY_S3_READ_MLDATA_BUCKET_PREFIX_STREAMING = Policy(
    'S3AllowReadOnBucketMldataPrefixStreaming',
    'Allow',
    [
        's3:GetObject',
        's3:GetObjectAcl',
        's3:GetObjectVersion',
        's3:ListBucket',
        's3:ListBucketVersions',  # IAM action behind the ListObjectVersions API
        's3:GetBucketLocation',
    ],
    [
        'arn:aws:s3:::ml-data',
        'arn:aws:s3:::ml-data/streaming/*',
    ],
)


POLICY_GROUP_TEAM_ML = PolicyGroup(policies=[
    ## Existing policies ...
    POLICY_S3_READ_MLDATA_BUCKET_PREFIX_STREAMING,
])

Example: The tech lead needs a special permission

As above, the tech lead (John Doe) creates a Policy for the special permission and then adds it to a personal PolicyGroup, shown below. If the generator previously mapped JohnDoe: POLICY_GROUP_TEAM_ML, that entry can now be changed to JohnDoe: POLICY_GROUP_JOHN_DOE. This shows how flexible the system is.

POLICY_ML_SPECIAL = ...


POLICY_GROUP_JOHN_DOE = PolicyGroup(
    policies=[
        POLICY_ML_SPECIAL,
    ],
    policy_groups=[
        POLICY_GROUP_TEAM_ML,
    ],
)
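The JohnDoe remapping boils down to editing one entry in the generator's engineer-to-PolicyGroup mapping. The sketch below is a simplified, hypothetical version of that loop: PolicyGroups are modelled as frozensets of Policy names, and render_manifest() stands in for the real template filling. Writing a file only when its content changes means the pull request contains exactly the manifests affected by the edit.

```python
"""Sketch of the generator's engineer -> PolicyGroup mapping (hypothetical)."""
import tempfile
from pathlib import Path
from typing import Dict, FrozenSet, List

POLICY_GROUP_TEAM_ML = frozenset({"S3AllowReadOnBucketMldataPrefixStreaming", "BaseMisc"})
POLICY_GROUP_JOHN_DOE = POLICY_GROUP_TEAM_ML | {"MlSpecial"}

ENGINEER_TO_GROUP: Dict[str, FrozenSet[str]] = {
    "JaneDoe": POLICY_GROUP_TEAM_ML,
    "JohnDoe": POLICY_GROUP_JOHN_DOE,  # previously POLICY_GROUP_TEAM_ML
}


def render_manifest(engineer: str, group: FrozenSet[str]) -> str:
    """Placeholder for the real CloudFormation rendering."""
    lines = [f"# PersonalRole{engineer}"] + [f"#   {name}" for name in sorted(group)]
    return "\n".join(lines) + "\n"


def generate(out_dir: Path) -> List[str]:
    """Write one manifest per engineer; return the files whose content changed."""
    changed = []
    for engineer in sorted(ENGINEER_TO_GROUP):
        path = out_dir / f"personal-{engineer}.yaml"
        content = render_manifest(engineer, ENGINEER_TO_GROUP[engineer])
        if not path.exists() or path.read_text() != content:
            path.write_text(content)
            changed.append(path.name)
    return changed


with tempfile.TemporaryDirectory() as tmp:
    print(generate(Path(tmp)))  # first run writes both manifests
    print(generate(Path(tmp)))  # unchanged mapping: nothing new to commit
```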

Additional Discussion

Since we use Kubernetes (AWS-managed EKS), developers need to be authenticated when operating the Kubernetes cluster. We use a ConfigMap to map the developers' IAM roles to Kubernetes identities that are bound to a ClusterRole (called eng-clusterrole), which has the proper authorizations to operate the cluster. If you are not familiar with ClusterRoles, refer to the Kubernetes RBAC documentation.
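Assuming the ConfigMap is the standard EKS aws-auth ConfigMap, its mapRoles entries can be generated from the same engineer list as the IAM roles. In this sketch the account ID is a placeholder and eng-group is a hypothetical Kubernetes group: aws-auth places each personal role in that group, and a ClusterRoleBinding grants eng-clusterrole to the group.

```python
"""Generate aws-auth mapRoles entries and the ClusterRoleBinding they rely on (sketch)."""
import json
from typing import List

ACCOUNT_ID = "111122223333"  # placeholder
ENG_GROUP = "eng-group"      # hypothetical Kubernetes group name


def map_roles(engineers: List[str]) -> str:
    """YAML text for the mapRoles key of the aws-auth ConfigMap."""
    lines: List[str] = []
    for eng in engineers:
        lines += [
            f"- rolearn: arn:aws:iam::{ACCOUNT_ID}:role/PersonalRole{eng}",
            f"  username: {eng}",
            "  groups:",
            f"    - {ENG_GROUP}",
        ]
    return "\n".join(lines)


# Kubernetes RBAC: grant the ClusterRole to everyone in ENG_GROUP.
CLUSTER_ROLE_BINDING = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "ClusterRoleBinding",
    "metadata": {"name": "eng-clusterrole-binding"},
    "subjects": [{"kind": "Group", "name": ENG_GROUP, "apiGroup": "rbac.authorization.k8s.io"}],
    "roleRef": {"kind": "ClusterRole", "name": "eng-clusterrole",
                "apiGroup": "rbac.authorization.k8s.io"},
}

print(map_roles(["JaneDoe", "JohnDoe"]))
print(json.dumps(CLUSTER_ROLE_BINDING, indent=2))
```

Setting username to the engineer's name keeps Kubernetes audit logs as attributable as CloudTrail.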

Since we restrict the developers' access to the AWS console, a problem arises: how can developers get CodeBuild artifacts and logs? (Context: at AKASA, we use AWS CodeBuild to build Docker images and other artifacts.) Our solution was to build a tool called cbgm, a script of AWS CLI commands that helps developers fetch CodeBuild logs and artifacts.
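The real cbgm is a script of AWS CLI commands. As a hedged illustration of the same idea, the Python sketch below uses clients shaped like boto3's codebuild and logs clients (batch_get_builds and get_log_events). The fake clients at the bottom are hypothetical and exist only so the sketch runs without AWS.

```python
"""What a cbgm-style helper might do, written against boto3-shaped clients (sketch)."""
from typing import List, Optional


def build_log_lines(codebuild, logs, build_id: str) -> List[str]:
    build = codebuild.batch_get_builds(ids=[build_id])["builds"][0]
    group, stream = build["logs"]["groupName"], build["logs"]["streamName"]
    lines: List[str] = []
    token: Optional[str] = None
    while True:
        kwargs = {"logGroupName": group, "logStreamName": stream, "startFromHead": True}
        if token is not None:
            kwargs["nextToken"] = token
        page = logs.get_log_events(**kwargs)
        lines.extend(event["message"] for event in page["events"])
        # CloudWatch Logs signals the end by returning the token that was passed in.
        if page["nextForwardToken"] == token:
            return lines
        token = page["nextForwardToken"]


def artifact_location(codebuild, build_id: str) -> Optional[str]:
    build = codebuild.batch_get_builds(ids=[build_id])["builds"][0]
    return build.get("artifacts", {}).get("location")


class FakeCodeBuild:
    def batch_get_builds(self, ids):
        return {"builds": [{
            "id": ids[0],
            "logs": {"groupName": "/aws/codebuild/example", "streamName": "abc123"},
            "artifacts": {"location": "arn:aws:s3:::example-artifacts/build.zip"},
        }]}


class FakeLogs:
    # nextToken -> (messages, nextForwardToken)
    PAGES = {None: (["step 1", "step 2"], "f/1"), "f/1": (["done"], "f/2"), "f/2": ([], "f/2")}

    def get_log_events(self, **kwargs):
        messages, next_token = self.PAGES[kwargs.get("nextToken")]
        return {"events": [{"message": m} for m in messages], "nextForwardToken": next_token}


print(build_log_lines(FakeCodeBuild(), FakeLogs(), "example:abc123"))
print(artifact_location(FakeCodeBuild(), "example:abc123"))
```

Because the personal role only needs read access to these two services, a helper like this gives developers what the console would show without granting console access.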

Summary

It was a long journey, but after much collaboration and trial and error, we found a system that works for our team and allows us to build world-class automation for healthcare operations. To recap:

  • Each engineer has their own IAM role, so all logged activities have a unique owner and can be easily tracked.
  • Thanks to the multi-level inheritance structure, each atomic permission is defined once, eliminating duplicates.
  • Principle of least privilege: the DevOps team keeps centralized control that adheres to this principle.
  • Changing permissions is a systematic, fast, and secure process.

If you’re interested in joining our talented team and want to help us drive down the cost of healthcare in the U.S., we’re always looking for talented engineers. Be sure to check out our open positions today. We can’t wait for you to be a part of our growing team.

