Reading time: 10 minutes
This blog post is based on my speech at the Dach Community Day in Munich. In my daily work, I often encounter a frustrating problem. When I need to debug or understand how an application works, I naturally turn to CloudWatch. However, different parts of a project are often divided into different AWS accounts. So I return to the SSO portal and access this new account.
But very frequently, I have to go back to the first account, which is still open in a tab of my browser, but, oh no, I have been disconnected from the account… Embarrassing, isn’t it?
Now that we’ve discussed the problem, let’s see how to access all of our log data without being logged out.
The solution: CloudWatch OAM
Although third-party solutions such as Datadog and Dynatrace exist, they incur costs and require leaving the AWS environment. I’m going to introduce you to a relatively unknown and underutilized AWS feature: CloudWatch Observability Access Manager (OAM).
Before we begin, I would like to review the basics of observability, whose three pillars are:
- Metrics: These numeric data points tell you how your systems and applications are performing. Examples include CPU usage, memory usage, and query latency.
- Logs: These are records of events in your systems and applications. They provide detailed information about what happened, when, and where it happened.
- Traces: These track the flow of requests through your applications, helping you understand how different components interact and identify performance bottlenecks.
These three pillars are valid for all applications and infrastructures, not just on AWS. However, in addition to these three pillars, there are two other types of data that CloudWatch OAM can manage:
- Application Insights: This enables in-depth monitoring of your applications, collecting data such as request rates, error rates, and response times.
- Internet Monitor: This monitors the availability and performance of your applications from the perspective of your end users, giving you insight into how they perceive your services.
Since this article is also an excellent opportunity to show you another deployment method, I will use HCP Terraform, the version of Terraform managed by HashiCorp, which adds many extras that do not exist in the community version.
Why HCP Terraform?
HCP Terraform is a Terraform offering managed by HashiCorp. Think of it as the AWS of the Terraform world. It offers several advantages compared to the community version, such as:
- Managed infrastructure: No need to manage your own Terraform infrastructure. HCP takes care of it for you.
- Improved collaboration: Features such as shared state, run history, and access controls make it easier for teams to work together.
- Integration with other HashiCorp tools: Seamless integration with Vault, Consul and Nomad, expanding your infrastructure management capabilities.
Understanding the implementation
Before we start the implementation, I’ll walk you through the full setup to help you understand why we’re doing it this way.
We will create a Terraform stack that deploys our AWS organization. We could do this with AWS Account Factory for Terraform (AFT) or Control Tower, but each of these methods has its own pros and cons. Instead, we will manually create our first HCP project for our main account and connect it to a stack hosted in a GitHub repository. This organization stack will create our HCP Terraform projects and our AWS accounts, linking each account to its own HCP project. We could use a single HCP project for everything, but in a real enterprise organization with 100 or even 1,000 AWS accounts, that Terraform stack would quickly become unmanageable. With one HCP project per account, we still keep a single Terraform codebase hosted in a single repository. Here is a diagram:

As you can see, the organizational stack will also create the observability account, but it will not have its own HCP project. The account will be managed directly by the organizational stack.
Before we continue, let’s see how HCP Terraform can access our AWS accounts to create and modify resources. To do this, we will need an AWS IAM role, which Terraform will assume. Since we will need this role on every account, we need to automate its creation, and the easiest solution is a CloudFormation StackSet. Here is a summary:

Once this role is created, each Terraform stack will be able to create CloudWatch links on child accounts. Remember that CloudWatch is regional, so we need to create one per region.

Implementation
Now let’s move on to implementation. We need two GitHub repositories:
- aws-org: This will contain our Terraform code for our AWS organization.
- aws-org-childs-accounts: This will contain our Terraform code for each child account’s landing zone.
Allowing Terraform to deploy AWS resources with an identity provider
We need to manually create the identity provider in our main account to allow Terraform to assume our role:

The AWS documentation describes the identity provider configuration in detail.
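If you prefer the CLI over the console, the same provider can be created in one command (the thumbprint shown matches the one used later in the CloudFormation template; verify it against the current certificate before use):

```shell
aws iam create-open-id-connect-provider \
  --url "https://app.terraform.io" \
  --client-id-list "aws.workload.identity" \
  --thumbprint-list "9e99a48a9960b14926bb7f3b02e22da2b0ab7280"
```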
After that, we move on to creating the role. The policy below is intentionally broad (full administrator access), so scope it down for production use:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*"
    }
  ]
}
Finally, the trust relationship. Again, it’s intentionally quite broad, so feel free to narrow the scope based on your needs.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/app.terraform.io"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "app.terraform.io:aud": "aws.workload.identity"
        },
        "StringLike": {
          "app.terraform.io:sub": "organization:YOUR_ORG:project:*:workspace:*:run_phase:*"
        }
      }
    }
  ]
}
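For example, to pin the role to a single project and workspace and to the apply phase only (the project and workspace names here are hypothetical), the conditions could be tightened to exact matches:

```json
"Condition": {
  "StringEquals": {
    "app.terraform.io:aud": "aws.workload.identity",
    "app.terraform.io:sub": "organization:YOUR_ORG:project:aws-org:workspace:aws-organisation:run_phase:apply"
  }
}
```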
Now we can move on to the HCP Terraform part. You will need an account, an organization, a project, and a workspace. I’ll let you create all that; the user interface guides you well. Once everything is created, go to the project settings to tell HCP Terraform about the role we just created. We do this by creating two environment variables: TFC_AWS_PROVIDER_AUTH, set to “true”, and TFC_AWS_RUN_ROLE_ARN, set to the ARN of the role.
Getting started with our Terraform organization stack
Now let’s create our Terraform stack and code. If you selected the CLI workflow when creating your workspace, you can start by adding the backend. This is similar to what we usually do, but instead of specifying an S3 bucket, we point the cloud block at our HCP Terraform organization and workspace:
terraform {
  cloud {
    organization = "filol-tf-org"
    workspaces {
      name = "aws-organisation"
    }
  }
}
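With a CLI-driven workspace, the first run is just the usual workflow; `terraform login` stores an API token for app.terraform.io before `init` connects the directory to the workspace:

```shell
terraform login   # authenticate against app.terraform.io (stores an API token)
terraform init    # connect this directory to the HCP Terraform workspace
terraform plan    # runs remotely in HCP Terraform, streaming output locally
```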
Once that’s done, let’s just create a few AWS accounts:
resource "aws_organizations_account" "fradex_tgtg_dev" {
  name              = "fradex-tgtg-dev"
  email             = "[email protected]"
  close_on_deletion = true
  parent_id         = aws_organizations_organizational_unit.projects.id
}

resource "aws_organizations_account" "fradex_babar_dev" {
  name              = "fradex-babar-dev"
  email             = "[email protected]"
  close_on_deletion = true
  parent_id         = aws_organizations_organizational_unit.projects.id
}

resource "aws_organizations_account" "observability" {
  name              = "observability"
  email             = "[email protected]"
  close_on_deletion = true
  parent_id         = data.aws_organizations_organization.this.roots[0].id
}
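Later snippets iterate over local.foreach_childs_accounts, which is never shown explicitly; here is a minimal sketch of how it could be derived from the account resources above (the observability account is deliberately excluded, since it has no HCP project of its own):

```hcl
locals {
  # Hypothetical helper map: one entry per child account, exposing the
  # attributes (id, name) consumed by the for_each loops further down.
  foreach_childs_accounts = {
    for account in [
      aws_organizations_account.fradex_tgtg_dev,
      aws_organizations_account.fradex_babar_dev,
    ] : account.name => {
      id   = account.id
      name = account.name
    }
  }
}
```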
Automating IAM role creation with CloudFormation
As we mentioned earlier, we need to automate the creation of a role on each account so that Terraform can create resources on AWS. For this, we chose a CloudFormation StackSet solution.
resource "aws_cloudformation_stack_set_instance" "main" {
  deployment_targets {
    organizational_unit_ids = [
      data.aws_organizations_organization.this.roots[0].id
    ]
  }
  region         = "eu-west-1"
  stack_set_name = aws_cloudformation_stack_set.main.name
}

resource "aws_cloudformation_stack_set" "main" {
  permission_model = "SERVICE_MANAGED"
  name             = "main"
  capabilities     = ["CAPABILITY_NAMED_IAM", "CAPABILITY_IAM"]
  auto_deployment {
    enabled = true
  }
  template_body = file(
    "cf-iam/template.yaml"
  )
}
And here is the template:
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  IAMOIDCProvider00oidcproviderappterraformio00JEAcz:
    Type: 'AWS::IAM::OIDCProvider'
    Properties:
      ClientIdList:
        - 'aws.workload.identity'
      ThumbprintList:
        - '9e99a48a9960b14926bb7f3b02e22da2b0ab7280'
      Url: 'https://app.terraform.io'
  MyIAMRole:
    Type: 'AWS::IAM::Role'
    Properties:
      RoleName: hcp-terraform
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Federated: !Sub 'arn:aws:iam::${AWS::AccountId}:oidc-provider/app.terraform.io'
            Action: 'sts:AssumeRoleWithWebIdentity'
            Condition:
              StringEquals:
                'app.terraform.io:aud': 'aws.workload.identity'
              StringLike:
                'app.terraform.io:sub': 'organization:filol-tf-org:project:*:workspace:*:run_phase:*'
      Policies:
        - PolicyName: AdministratorAccessPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action: '*'
                Resource: '*'
Outputs:
  IAMRoleArn:
    Description: 'The ARN of the created IAM Role'
    Value: !GetAtt MyIAMRole.Arn
For this project, and to show you the diversity of HCP Terraform’s capabilities, I do not use the same deployment mode for child accounts. We will configure a GitHub trigger for HCP Terraform. This means that as soon as we have a change on GitHub, the HCP Terraform stack will be automatically triggered to make the changes. This is similar to what we could have done with a CI/CD pipeline, but there’s almost nothing to do here.
resource "tfe_oauth_client" "filol-tf-org" {
  name             = "my-github-oauth-client"
  organization     = data.tfe_organization.filol-tf-org.name
  api_url          = "https://api.github.com"
  http_url         = "https://github.com"
  oauth_token      = var.github_oauth_token
  service_provider = "github"
}
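The OAuth client reads var.github_oauth_token, which must be declared somewhere in the stack. A minimal declaration is sketched below; the token value itself should live in a sensitive workspace variable, never in code:

```hcl
variable "github_oauth_token" {
  description = "GitHub OAuth token used by the VCS connection"
  type        = string
  sensitive   = true
}
```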
Automating Terraform Workspace Management with Terraform
Now that all that is done, let’s create all our Terraform workspaces in a few lines:
resource "tfe_workspace" "accounts" {
  for_each               = local.foreach_childs_accounts
  organization           = "filol-tf-org"
  project_id             = "prj-fakeidwarning"
  name                   = "${each.value.id}-${each.value.name}"
  auto_apply             = true
  auto_apply_run_trigger = true
  vcs_repo {
    identifier         = "filol/aws-org-childs-accounts"
    ingress_submodules = true
    oauth_token_id     = tfe_oauth_client.filol-tf-org.oauth_token_id
  }
}
When I edit my organization stack, I also want to retrigger all child account stacks, because I will later create dependencies between the organization stack and the child stacks.
resource "tfe_run_trigger" "accounts" {
  for_each     = local.foreach_childs_accounts
  workspace_id = tfe_workspace.accounts[each.key].id
  sourceable_id = data.tfe_workspace.this.id
}
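The run trigger references data.tfe_workspace.this, which is not defined elsewhere in the snippets. A minimal sketch, assuming it points at the organization workspace itself:

```hcl
# Hypothetical: the organization workspace created earlier,
# used as the source of the run triggers.
data "tfe_workspace" "this" {
  name         = "aws-organisation"
  organization = "filol-tf-org"
}
```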
Of course, we must also think about creating our variables in each workspace with the roles that will be created by our StackSet.
resource "tfe_variable" "TFC_AWS_PROVIDER_AUTH" {
  for_each     = local.foreach_childs_accounts
  key          = "TFC_AWS_PROVIDER_AUTH"
  value        = "true"
  category     = "env"
  workspace_id = tfe_workspace.accounts[each.key].id
}

resource "tfe_variable" "TFC_AWS_RUN_ROLE_ARN" {
  for_each     = local.foreach_childs_accounts
  key          = "TFC_AWS_RUN_ROLE_ARN"
  value        = "arn:aws:iam::${each.value.id}:role/hcp-terraform"
  category     = "env"
  workspace_id = tfe_workspace.accounts[each.key].id
}
I’ll let you run Terraform to create our accounts and different projects.
Once this is done, we can run a quick test to verify that everything works. I make a change to my organization stack:

And as soon as it’s finished, I see that all the other stacks are being updated:

Once this is done, let’s move on to configuring our AWS account dedicated to observability. To do this, we need to configure two additional AWS providers:
provider "aws" {
  region = "us-east-1"
  assume_role {
    role_arn = "arn:aws:iam::${aws_organizations_account.observability.id}:role/OrganizationAccountAccessRole"
  }
  alias = "observability_us-east-1"
}

provider "aws" {
  region = "eu-west-1"
  assume_role {
    role_arn = "arn:aws:iam::${aws_organizations_account.observability.id}:role/OrganizationAccountAccessRole"
  }
  alias = "observability_eu-west-1"
}
Implementing our CloudWatch data sink: the sink
Let’s configure our CloudWatch service on this account to allow it to receive data from our entire AWS organization:
resource "aws_oam_sink" "central_logging_sink" {
  provider = aws.observability_us-east-1
  name     = "central-logging-sink-org"
}

resource "aws_oam_sink" "central_logging_sink_eu-west-1" {
  provider = aws.observability_eu-west-1
  name     = "central-logging-sink-org"
}

resource "aws_oam_sink_policy" "central_logging_sink_policy" {
  provider        = aws.observability_us-east-1
  sink_identifier = aws_oam_sink.central_logging_sink.id
  policy          = local.sink_policy
}

resource "aws_oam_sink_policy" "central_logging_sink_policy_eu-west-1" {
  provider        = aws.observability_eu-west-1
  sink_identifier = aws_oam_sink.central_logging_sink_eu-west-1.id
  policy          = local.sink_policy
}
You will find the policy used below:
locals {
  sink_policy = <<-EOT
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": "*",
        "Action": ["oam:CreateLink", "oam:UpdateLink"],
        "Resource": "*",
        "Condition": {
          "ForAllValues:StringEquals": {
            "oam:ResourceTypes": ["AWS::Logs::LogGroup", "AWS::CloudWatch::Metric", "AWS::XRay::Trace", "AWS::ApplicationInsights::Application"]
          },
          "ForAnyValue:StringEquals": {
            "aws:PrincipalOrgID": "${data.aws_organizations_organization.this.id}"
          }
        }
      }
    ]
  }
  EOT
}
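The child stacks in the next section consume the sink identifiers through var.central_logging_sink and var.central_logging_sink_eu_west_1. One hedged way to wire this up is to have the organization stack publish each sink’s ARN as a Terraform variable in every child workspace (the resource names here are hypothetical):

```hcl
# Publish each sink ARN into every child workspace so the child
# stacks can reference them as Terraform variables.
resource "tfe_variable" "central_logging_sink" {
  for_each     = local.foreach_childs_accounts
  key          = "central_logging_sink"
  value        = aws_oam_sink.central_logging_sink.arn
  category     = "terraform"
  workspace_id = tfe_workspace.accounts[each.key].id
}

resource "tfe_variable" "central_logging_sink_eu_west_1" {
  for_each     = local.foreach_childs_accounts
  key          = "central_logging_sink_eu_west_1"
  value        = aws_oam_sink.central_logging_sink_eu-west-1.arn
  category     = "terraform"
  workspace_id = tfe_workspace.accounts[each.key].id
}
```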
Implementing our CloudWatch data receiver: the link
Once this is done, we can go to our Terraform stack that manages all the child accounts and create the resource that will send the data to our observability account:
resource "aws_oam_link" "oam_source_link" {
  sink_identifier = var.central_logging_sink
  label_template  = var.account_name
  resource_types  = ["AWS::Logs::LogGroup", "AWS::CloudWatch::Metric", "AWS::XRay::Trace", "AWS::ApplicationInsights::Application"]
}

resource "aws_oam_link" "oam_source_link_eu-west-1" {
  provider        = aws.aws_eu-west-1
  sink_identifier = var.central_logging_sink_eu_west_1
  label_template  = var.account_name
  resource_types  = ["AWS::Logs::LogGroup", "AWS::CloudWatch::Metric", "AWS::XRay::Trace", "AWS::ApplicationInsights::Application"]
}
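To confirm the wiring, the CloudWatch OAM CLI can list what was created (these commands assume credentials for the respective accounts):

```shell
# From a source (child) account: list the links created above
aws oam list-links --region eu-west-1

# From the observability account: list the source accounts attached to a sink
aws oam list-attached-links --sink-identifier <sink-arn> --region eu-west-1
```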
And here is the result:

We notice that a toast appears at the top right, indicating that this is a global observability account. We receive data from the different source accounts, and thanks to the account label we can tell apart log groups that share the same name.
Cost of this solution
Don’t worry about the cost of monitoring: for logs and metrics, it’s free! You can find the details in the AWS documentation:
“Cross-account observability comes with no additional costs for logs and metrics. CloudWatch provides the first trace copy stored in the first monitoring account at no additional cost. Any trace copies sent to additional monitoring accounts are billed to the source accounts based on AWS X-Ray pricing. Standard CloudWatch pricing applies to features used in the monitoring account, such as CloudWatch Dashboards, Alarms, or Logs Insights queries.”
Source: AWS CloudWatch documentation, October 2024
Conclusion
To conclude, we saw a free way to consolidate our observability data across all of our accounts without an external tool, using CloudWatch OAM. With this, we are able to quickly start enterprise-grade monitoring and integrate some automations based on that.
We just need to create a receiver, called a sink, inside the account that will become the global observability account. As with any shared AWS resource, we need to attach a policy to allow external accounts to send data; a single condition key lets us authorize the whole organization. Finally, we create a link inside each account from which we want to send data. Since the approach is regional, we have to repeat it for each region.