
How to create a secret rotation function that uses Aurora Data API

Written by Alejandro C De Baca | Feb 18, 2020 7:00:00 AM

TL;DR

  • It absolutely can be done.

  • There are two hurdles to overcome that don’t apply to traditional VPC-attached rotation functions:

    1. Data API can only use the AWSCURRENT version stage of a secret, so you need to create a temporary secret for the testSecret step (depending on your implementation, possibly for other steps as well).

    2. Data API calls require the ARN of the target DB cluster, which you have to construct from the “host” field in a cluster-attached secret.

Prerequisites

This article assumes that you already understand:

  • What Data API is (even if you don’t know how it works under the hood)

  • How to enable Data API on an Aurora Serverless cluster

  • How to create a lambda function in the language of your choice

  • How secret rotation works (particularly the 4 steps to secret rotation)

  • How to create and modify IAM policies for lambda roles

If you are at the proficiency level where you would ask, “Can a secret rotation function use Data API?”, then you’ll be fine.

Introduction

You may find that as you move down the serverless path, the need for VPCs starts to evaporate. Indeed, the AWS Aurora team has suggested that a no-VPC configuration for Aurora Serverless is in the works. If you are using SAM or CloudFormation to deploy VPC-attached lambdas, you also get a nice 45-minute-per-lambda wait when issuing a stack delete, because reasons.

Data API (a.k.a. HTTP endpoint, a.k.a. RDS Data Service) provides an AWS API for issuing SQL commands to an Aurora Serverless cluster, and can be used by a lambda function that is not attached to the VPC that contains the cluster. The lambda function’s execution role requires permission to issue Data API calls and permission to read a secret that contains database credentials (plus its encryption key, if a KMS CMK was specified when the secret was created).
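To make the model concrete, here is a minimal sketch of a single Data API call in Python. It assumes the function is handed a boto3 `rds-data` client (for example `boto3.client("rds-data")`); the helper name `run_query` is hypothetical, while `execute_statement` and its `resourceArn`, `secretArn`, `sql`, and optional `database` parameters belong to that client.

```python
from typing import Any, Optional


def run_query(rds_data: Any, cluster_arn: str, secret_arn: str,
              sql: str, database: Optional[str] = None) -> list:
    """Run one SQL statement through Data API and return its records.

    rds_data is assumed to be a boto3 "rds-data" client; any object with the
    same execute_statement method works, which keeps this testable offline.
    No VPC attachment is needed: the call goes to the AWS API endpoint, and
    Data API reads credentials from the AWSCURRENT version of secret_arn.
    """
    kwargs = {"resourceArn": cluster_arn, "secretArn": secret_arn, "sql": sql}
    if database is not None:
        kwargs["database"] = database  # optional default database
    response = rds_data.execute_statement(**kwargs)
    # "records" may be absent for statements that return no rows.
    return response.get("records", [])
```

Rows come back as lists of typed field dicts, e.g. `[{"longValue": 1}]` for `SELECT 1`.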

This model is great for moving away from VPC-attached lambdas, but there is one missing piece to the puzzle. Security best practices dictate that the secret should be periodically rotated, which can be performed automatically on a schedule by a rotation lambda function. Presently, the AWS-provided rotation functions still have to be attached to the VPC.

Here, we briefly cover the steps involved in authoring your own rotation lambda that uses Data API.

Details

Lifecycle of a secret

A secret is conceptually an IAM-policy-protected, KMS-encrypted AWS resource that wraps a JSON document. Actually, it can hold more than one version of the document at a time, and each version is tagged with one or more staging labels (a “version stage”, such as AWSCURRENT or AWSPENDING). This provides a very lightweight form of multi-version concurrency control that allows a secret to be used by clients while a new version of the secret is being created.

When a new secret is first created, an initial JSON doc containing a randomly generated password is provided to Secrets Manager. The document is in the AWSCURRENT version stage, which makes it available to any client with sufficient permissions. Note that Data API always uses the AWSCURRENT version stage to connect to an Aurora Serverless cluster; there is currently no way to specify a secret version stage via Data API.

After initial creation, a secret gets attached to an Aurora Serverless cluster. Attachment adds keys to the AWSCURRENT JSON document containing the fully-qualified DNS name of the cluster endpoint and the TCP port on which the cluster listens inside of the VPC. Note that these fields are irrelevant to Data API, as Data API actions take the ARN of the database as a resource argument.

Upon rotation, a new JSON doc containing a new password is attached to the secret. That doc is in the AWSPENDING version stage, and thus invisible to Data API. The database login credentials are then updated with the new password, and then the new password is tested. Once these steps complete, the AWSCURRENT version stage is moved from the old version to the new one (Secrets Manager labels the old version AWSPREVIOUS), and rotation is complete. There is no longer an AWSPENDING version stage on the secret, and the AWSCURRENT version stage is used by all subsequent invocations.

The full details of secret rotation are given in the AWS documentation page “Overview of the Lambda Rotation Function”. As a side note, I recommend configuring the secret to alternate between two database logins that have the same privileges, to avoid a race condition between the time that the password on the database role is updated and the time that the AWSPENDING version stage is promoted to AWSCURRENT. This is essentially a form of double-buffering with database roles.
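One way to implement the alternation is to derive the second login's name from the first. The sketch below assumes a hypothetical `_clone` suffix convention, and `next_username` is likewise a hypothetical name; any scheme that flips between exactly two roles with identical privileges would do.

```python
def next_username(current: str, suffix: str = "_clone") -> str:
    """Return the other login in a two-login (double-buffered) rotation.

    Hypothetical convention: the two logins are NAME and NAME + suffix,
    and both must already exist with identical privileges.
    """
    if not suffix:
        raise ValueError("suffix must be non-empty")
    if current.endswith(suffix):
        return current[: -len(suffix)]  # NAME_clone -> NAME
    return current + suffix             # NAME -> NAME_clone
```

Each rotation points the AWSPENDING document at the other login, so the login named in AWSCURRENT keeps working until promotion.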

Embedded Assumptions and Challenges

The first embedded assumption in the rotation framework is that clients want the DNS name for the cluster endpoint. The cluster-secret attachment contains information that is useful for a client that speaks the wire protocol of the cluster engine type (PostgreSQL or MySQL) and is connecting inside of the VPC. Data API actions, however, require the database ARN. There are two options here:

  1. Meh option. Add the cluster ARN to the document. This is trickier than it sounds if you are using Secrets Manager to create the initial master password for a new database cluster, because the secret has to be created before the cluster. Indeed, this is part of the reason that the secret-target attachment resource exists. So this has to be done after both resources have been created, and must use a key in the JSON document that does not collide with those used by the attachment.

  2. Slightly-less-meh option. In the rotation function, parse the hostname (leaf-node) label from the DNS name provided by the attachment resource (key “host”) by splitting on dots and taking the first element. Then combine it with the AWS account ID and region to construct the database ARN:

    arn:aws:rds:REGION:ACCOUNT_ID:cluster:HOSTNAME
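In Python, option 2 might look like the sketch below. It assumes the region comes from the `AWS_REGION` environment variable that Lambda sets and the account ID from the invoked function's ARN (`context.invoked_function_arn`); both helper names are hypothetical, and the `aws` partition default would need changing for China or GovCloud regions.

```python
def cluster_arn_from_host(host: str, region: str, account_id: str,
                          partition: str = "aws") -> str:
    """Build an Aurora cluster ARN from the "host" key of an attached secret.

    Assumes the endpoint's first DNS label is the cluster identifier, e.g.
    "my-cluster.cluster-abc123.us-east-1.rds.amazonaws.com" -> "my-cluster".
    """
    hostname = host.split(".", 1)[0]
    return f"arn:{partition}:rds:{region}:{account_id}:cluster:{hostname}"


def account_id_from_function_arn(function_arn: str) -> str:
    """arn:aws:lambda:REGION:ACCOUNT_ID:function:NAME -> ACCOUNT_ID"""
    return function_arn.split(":")[4]
```

Inside a handler this would be called as `cluster_arn_from_host(secret["host"], os.environ["AWS_REGION"], account_id_from_function_arn(context.invoked_function_arn))`.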


The second embedded assumption in the rotation framework is that rotation functions connect directly to the TCP listener on the database, rather than handing the secret to Data API and letting it connect on their behalf. In other words, there is a network path between your lambda and the target cluster, and you are using a MySQL or PostgreSQL client library to communicate with the database. The only step where this assumption becomes problematic is the testSecret step, in which the rotation function attempts to log in to the database using credentials that have not yet been promoted to AWSCURRENT. You can either skip the step and risk promoting an invalid secret should bad luck befall you, or you can dynamically create an ephemeral secret whose AWSCURRENT version stage holds a copy of the AWSPENDING document of the secret being rotated.

You will want to destroy this secret immediately after testing, using a “finally” or “ensure” block to avoid resource leaks.
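A minimal Python sketch of the ephemeral-secret check, assuming boto3 `secretsmanager` and `rds-data` clients are passed in. `TEMP_PREFIX`, `pending_login_works`, and `step_test_secret` are hypothetical names, and treating `BadRequestException` as a rejected login is an assumption about how Data API reports authentication failures. The IAM policy later in this article only allows `CreateSecret` with an explicit `KmsKeyId`, so a secret on the default key would need a different policy.

```python
import uuid
from typing import Any

# Hypothetical prefix; it must match the name condition in the IAM policy.
TEMP_PREFIX = "/prefix/for/temp/secrets/"


def pending_login_works(sm: Any, rds_data: Any, secret_arn: str,
                        token: str, db_arn: str) -> bool:
    """Try the AWSPENDING credentials through Data API via a throwaway secret.

    Data API only reads AWSCURRENT, so the pending JSON document is copied
    into a new secret, where it *is* AWSCURRENT.
    """
    pending_json = sm.get_secret_value(
        SecretId=secret_arn, VersionId=token, VersionStage="AWSPENDING"
    )["SecretString"]
    # Reuse the rotated secret's CMK (absent if the default key is used).
    kms_key_id = sm.describe_secret(SecretId=secret_arn).get("KmsKeyId")
    create_args = {"Name": TEMP_PREFIX + str(uuid.uuid4()), "SecretString": pending_json}
    if kms_key_id:
        create_args["KmsKeyId"] = kms_key_id
    temp_arn = sm.create_secret(**create_args)["ARN"]
    try:
        rds_data.execute_statement(resourceArn=db_arn, secretArn=temp_arn, sql="SELECT 1")
        return True
    except rds_data.exceptions.BadRequestException:
        # Assumption: a rejected login surfaces as BadRequestException.
        return False
    finally:
        # Always clean up, whether the login worked, failed, or raised.
        sm.delete_secret(SecretId=temp_arn, ForceDeleteWithoutRecovery=True)


def step_test_secret(sm: Any, rds_data: Any, secret_arn: str,
                     token: str, db_arn: str) -> None:
    """testSecret: fail the rotation if the pending credentials are rejected."""
    if not pending_login_works(sm, rds_data, secret_arn, token, db_arn):
        raise RuntimeError("AWSPENDING credentials were rejected by the database")
```

`ForceDeleteWithoutRecovery=True` skips the recovery window, and the random name keeps a retried step from colliding with a secret that is still being deleted.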

Pseudocode for Rotation Function Steps

Note that the rotation function is invoked at least four times per rotation, once for each of the four steps. Any given step, however, can be invoked more than once, so an idempotency token is passed into each of the request events (key: “ClientRequestToken”). This token needs to be passed into the PutSecretValue action, which uses it as the version ID of the new secret version; it therefore appears as a key in the VersionIdsToStages map in the result of a call to DescribeSecret.
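The routing itself is small. A minimal Python sketch: the event keys `SecretId`, `ClientRequestToken`, and `Step` are the ones Secrets Manager sends, while `dispatch` and the `handlers` mapping are hypothetical names.

```python
from typing import Any, Callable, Dict

STEPS = ("createSecret", "setSecret", "testSecret", "finishSecret")


def dispatch(event: Dict[str, Any],
             handlers: Dict[str, Callable[[str, str], Any]]) -> Any:
    """Route one rotation invocation to the handler for its step.

    A step may be retried, so every handler must be idempotent, keyed on
    the ClientRequestToken.
    """
    step = event["Step"]
    if step not in STEPS:
        raise ValueError(f"Unknown rotation step: {step!r}")
    return handlers[step](event["SecretId"], event["ClientRequestToken"])
```

A real entry point would be a thin `lambda_handler(event, context)` that builds the clients and calls `dispatch` with the four step functions.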

createSecret(CRT)

  1. GetSecretValue(SecretArn, “AWSCURRENT”) # Fetch AWSCURRENT to confirm that secret exists in a valid state

  2. DescribeSecret(SecretArn)

    1. If VersionIdsToStages[CRT] is a list containing “AWSPENDING” and GetSecretValue(SecretArn, CRT, “AWSPENDING”) returns a value (rather than raising ResourceNotFoundException), halt # Pending secret already exists

    2. Else # Pending secret has not been stored yet

      1. Create a copy of the AWSCURRENT JSON doc

      2. Populate copy with a new password and choose the next user name (if multi-user rotation is used)

      3. PutSecretValue(SecretArn, NewSecretJsonDocument, CRT, “AWSPENDING”)
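A Python sketch of createSecret, assuming a boto3 `secretsmanager` client; `step_create_secret` and the optional `next_username` callable are hypothetical. It treats `ResourceNotFoundException` on the pending lookup as "not stored yet". The 32-character length and the excluded quote and backslash characters are arbitrary choices that make the password safe to embed in a SQL string literal later.

```python
import json
from typing import Any, Callable, Optional


def step_create_secret(sm: Any, secret_arn: str, token: str,
                       next_username: Optional[Callable[[str], str]] = None) -> None:
    """createSecret: store a new AWSPENDING version keyed by the request token."""
    # Confirms the secret exists and gives us the document to copy.
    current = json.loads(
        sm.get_secret_value(SecretId=secret_arn, VersionStage="AWSCURRENT")["SecretString"]
    )
    stages = sm.describe_secret(SecretId=secret_arn)["VersionIdsToStages"]
    if "AWSPENDING" in stages.get(token, []):
        try:
            sm.get_secret_value(SecretId=secret_arn, VersionId=token, VersionStage="AWSPENDING")
            return  # retry of this step: the pending value already exists
        except sm.exceptions.ResourceNotFoundException:
            pass  # label present but no value stored yet: create it below
    pending = dict(current)  # keeps host, port, etc. from the attachment
    if next_username is not None:
        pending["username"] = next_username(current["username"])
    pending["password"] = sm.get_random_password(
        PasswordLength=32, ExcludeCharacters="'\"\\"
    )["RandomPassword"]
    sm.put_secret_value(
        SecretId=secret_arn,
        ClientRequestToken=token,
        SecretString=json.dumps(pending),
        VersionStages=["AWSPENDING"],
    )
```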

setSecret(CRT)

  1. Try logging into DB with pending secret via Data API

    1. GetSecretValue(SecretArn, CRT, “AWSPENDING”) # Created in createSecret step

    2. DescribeSecret(SecretArn) # Retrieve KMS Key ID of secret, which we use for temp secret

    3. CreateSecret(PendingSecretJson, KmsKeyId)

    4. ExecuteStatement(TempSecretArn, DatabaseArn, Query) # Run simple query, e.g. “SELECT 1”, via Data API

    5. DeleteSecret(TempSecretArn)

  2. If no authentication error raised during ExecuteStatement, halt # Pending secret already in DB

  3. Else

    1. Try logging into DB with current secret via Data API

      1. GetSecretValue(SecretArn, “AWSCURRENT”) # Fetch host, from which the DB ARN is built

      2. ExecuteStatement(SecretArn, DatabaseArn, Query) # Another simple query via Data API

      3. If authentication failure, fail # Invalid state: Neither pending nor current secrets work

      4. Else ExecuteStatement(SecretArn, DatabaseArn, “CREATE OR UPDATE ROLE …”) # Set password for pending DB login
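A Python sketch of setSecret with the same hypothetical naming. To keep it short, the pending-login check is passed in as any zero-argument callable that tries the AWSPENDING credentials through a temporary secret, as described for testSecret. The password-changing SQL is left to a caller-supplied builder because that statement is engine-specific. The sketch also assumes the current login is privileged enough to change the other login's password.

```python
import json
from typing import Any, Callable


def step_set_secret(sm: Any, rds_data: Any, secret_arn: str, token: str,
                    db_arn: str, pending_login_works: Callable[[], bool],
                    password_sql: Callable[[str, str], str]) -> None:
    """setSecret: make the AWSPENDING password valid in the database.

    password_sql is a hypothetical builder: (username, password) -> SQL.
    """
    if pending_login_works():
        return  # retry of this step: the database already has the new password
    pending = json.loads(sm.get_secret_value(
        SecretId=secret_arn, VersionId=token, VersionStage="AWSPENDING"
    )["SecretString"])
    # Data API authenticates with AWSCURRENT, so the current login issues the
    # change. If those credentials are rejected too, the call raises and the
    # step fails: neither pending nor current works.
    rds_data.execute_statement(
        resourceArn=db_arn,
        secretArn=secret_arn,
        sql=password_sql(pending["username"], pending["password"]),
    )
```

For PostgreSQL, a builder might be `lambda user, pw: f"ALTER ROLE \"{user}\" WITH PASSWORD '{pw}'"`. This is only safe if the generated password contains no quote characters.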

testSecret(CRT)

  1. Try logging into DB with pending secret via Data API

    1. GetSecretValue(SecretArn, CRT, “AWSPENDING”) # Created in createSecret step

    2. DescribeSecret(SecretArn) # Retrieve KMS Key ID of secret, which we use for temp secret

    3. CreateSecret(PendingSecretJson, KmsKeyId)

    4. ExecuteStatement(TempSecretArn, DatabaseArn, Query) # Run simple query, e.g. “SELECT 1”, via Data API

    5. DeleteSecret(TempSecretArn)

  2. If no authentication error raised during ExecuteStatement, halt # Pending secret works

  3. Else fail # Pending secret not active in DB

finishSecret(CRT)

  1. DescribeSecret(SecretArn)

  2. Let Current = Key in VersionIdsToStages whose value is a list containing “AWSCURRENT”

  3. If CRT == Current, halt # New secret for this request is already current

  4. Else UpdateSecretVersionStage(SecretArn, RemoveFromVersionId: Current, MoveToVersionId: CRT, VersionStage: “AWSCURRENT”)
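A Python sketch of finishSecret, assuming a boto3 `secretsmanager` client (`step_finish_secret` is a hypothetical name). When AWSCURRENT is moved off a version, Secrets Manager attaches AWSPREVIOUS to that version automatically.

```python
from typing import Any


def step_finish_secret(sm: Any, secret_arn: str, token: str) -> None:
    """finishSecret: move AWSCURRENT to the version created for this token."""
    stages = sm.describe_secret(SecretId=secret_arn)["VersionIdsToStages"]
    current = next((vid for vid, labels in stages.items() if "AWSCURRENT" in labels), None)
    if current == token:
        return  # retry of this step: already promoted
    args = {"SecretId": secret_arn, "VersionStage": "AWSCURRENT", "MoveToVersionId": token}
    if current is not None:
        args["RemoveFromVersionId"] = current
    sm.update_secret_version_stage(**args)
```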

Implementation Notes

In the pseudocode above, the first section of setSecret is nearly identical to the entirety of testSecret, and should probably live in a separate procedure.

Precondition Checks

A common set of step-independent validations on the secret’s state that can be run ahead of each phase:

  1. DescribeSecret(SecretArn)

    1. Fail if rotation is not enabled on this secret

    2. Fail if CRT is not a key present in VersionIdsToStages

    3. Halt if CRT is a key whose value is a list containing “AWSCURRENT”

    4. Fail if CRT is a key whose value does not contain “AWSPENDING” # A version that is neither pending nor current is invalid
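These checks map directly onto a single DescribeSecret response. A Python sketch, assuming a boto3 `secretsmanager` client (`should_run_step` is a hypothetical name):

```python
from typing import Any


def should_run_step(sm: Any, secret_arn: str, token: str) -> bool:
    """Step-independent checks: True = run the step, False = halt quietly.

    Raises ValueError for states the rotation cannot recover from.
    """
    meta = sm.describe_secret(SecretId=secret_arn)
    if not meta.get("RotationEnabled"):
        raise ValueError(f"Rotation is not enabled on {secret_arn}")
    stages = meta.get("VersionIdsToStages", {})
    if token not in stages:
        raise ValueError(f"Version {token} is not a version of {secret_arn}")
    if "AWSCURRENT" in stages[token]:
        return False  # already promoted: nothing left to do for this token
    if "AWSPENDING" not in stages[token]:
        raise ValueError(f"Version {token} is neither AWSCURRENT nor AWSPENDING")
    return True
```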

IAM policy for rotation function role

The policy below can be used, with a few substitutions for the ${variables}, for the role that the rotation lambda runs as. This policy covers all of the pseudocode given above, including the precondition validations. Notes:

  • Never blindly copy IAM policies off the internet.

  • Secrets can also have resource policies, which are beyond the scope of this article.

  • The key policy for the KMS CMK used to encrypt the secret must permit at least the KMS actions shown below to the lambda role.

  • If a key alias is used when creating secrets instead of an ARN, you will need to modify the policy to include aliases as well.

  • The policy assumes that all ephemeral secrets are created with a common prefix in the name.

  • If you want to tighten up the statement that includes GetSecretValue using additional conditions, e.g. secretsmanager:Resource/AllowRotationLambdaArn or aws:ResourceTag, bear in mind that Data API calls may fail with “User is not authorized to perform secretsmanager:GetSecretValue on resource:…”, even if direct calls to GetSecretValue succeed. At present, Data API appears not to work with condition keys. If you want to constrain the scope, you can use a dedicated prefix for your persistent (non-temporary) secrets, and you can move DescribeSecret, PutSecretValue, and UpdateSecretVersionStage into a separate statement that does include the conditions.

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["kms:Decrypt", "kms:GenerateDataKey*"],
    "Resource": "arn:aws:kms:${Region}:${Account}:key/id-of-key-for-secret",
    "Condition": {
      "StringEquals": {
        "kms:ViaService": "secretsmanager.${Region}.amazonaws.com"
      }
    }
  },
  {
    "Effect": "Allow",
    "Action": ["rds-data:ExecuteStatement", "secretsmanager:GetRandomPassword"],
    "Resource": "*"
  },
  {
    "Effect": "Allow",
    "Action": "secretsmanager:CreateSecret",
    "Resource": "*",
    "Condition": {
      "StringLike": {
        "secretsmanager:Name": "/prefix/for/temp/secrets/*",
        "secretsmanager:KmsKeyId": "arn:aws:kms:${Region}:${Account}:key/id-of-key-for-secret"
      }
    }
  },
  {
    "Effect": "Allow",
    "Action": ["secretsmanager:DeleteSecret", "secretsmanager:GetSecretValue"],
    "Resource": "arn:aws:secretsmanager:${Region}:${Account}:secret:/prefix/for/temp/secrets/*"
  },
  {
    "Effect": "Allow",
    "Action": [
      "secretsmanager:DescribeSecret",
      "secretsmanager:GetSecretValue",
      "secretsmanager:PutSecretValue",
      "secretsmanager:UpdateSecretVersionStage"
    ],
    "Resource": "arn:aws:secretsmanager:${Region}:${Account}:secret:id-of-secret-to-be-rotated"
  }]
}