Migrate to AWS EC2 with SQL licensing included

While performing a lift and shift migration of Windows SQL Server using the AWS Application Migration Service I was challenged with wanting the newly migrated instance to have a Windows OS license ‘included’ but additionally the SQL Server Standard license billed to the account. The customer was moving away from their current hosting platform where both licenses were covered under SPLA. Rather then going to a license reseller and purchasing SQL Server it was preferred to have all the Windows OS and SQL Server software licensing to be payed through their AWS account.

In the Application Migration Service, under launch settings > Operating System Licensing. We can see all we have is OS licence options available to toggle between license-included and BYOL.

Choose whether you want to Bring Your Own Licenses (BYOL) from the source server into the Test or Cutover instance. This defines whether the launched test or cutover instance will include the license for the operating system (License-included), or if the licensing will be based on that of the migrated server (BYOL: Bring Your Own License).

If we review a migrated instance where ‘license-included’ was selected during launch, using Powershell on instance itself we see only a singular ‘BillingProduct = bp-6ba54002’ for Windows:

((Invoke-WebRequest http://169.254.169.254/latest/dynamic/instance-identity/document).Content | ConvertFrom-Json).billingProducts

bp-6ba54002 

AWS Preferred Approach

There are a lots of options for migrating SQL Server to AWS, so we weren’t without choices.

  1. Leverage the AWS Database Migration Service (DMS) to migrate on-premises Windows SQL Server to a Relation Database Services (RDS).
  2. Leverage the AWS Database Migration Service (DMS) to migrate on-premises Windows SQL Server to AWS EC2 Instance provisioned from a Marketplace AMI which includes SQL licensing.
  3. Leverage SQL Server native tooling between an on-premises Windows SQL Server to AWS EC2 Instance provisioned from a Marketplace AMI which includes SQL licensing. Use either
    1. Native backup and restore
    2. Log shipping
    3. Database mirroring
    4. Always On availability groups
    5. Basic Always On availability groups
    6. Distributed availability groups
    7. Transactional replication
    8. Detach and attach
    9. Import/export

The only concern our customer had with all the above approaches was that there was technical configuration on the source server that wasn’t well understand. The risk of reimplementation on a new EC2 instance and missing configuration was perceived to be high impact.

Solution

The solution was to create a new EC2 instance from a AWS Marketplace AMI that we would like to be billed for. In my case I chose ‘Microsoft Windows Server 2019 with SQL Server 2017 Standard – ami-09ee4321c0e1218c3’.

The procedure is to detach all the volumes (including root) from the migrated EC2 instance that has all the lovely SQL data and attach it to the newly created instance with the updated BillingProducts of ‘bp-6ba54002′ for Windows and bp-6ba54003′ for SQL Standard assigned to it.

If we review a Marketplace EC2 instance where SQL Server Standard was selected using Powershell on the instance:

((Invoke-WebRequest http://169.254.169.254/latest/dynamic/instance-identity/document).Content | ConvertFrom-Json).billingProducts

bp-6ba54002
bp-6ba54003 

How will it work?

This process will require a little outage as both EC2 Instances will have to be stopped to detach the volumes and re-attach. This all happens pretty fast so only expect it to last a minute.

NOTE: The primary ENI interface cannot be changed so there will be an IP swap, so be aware of any DNS updates you may need to do post to resolve the SQL Server being available via hostname to other servers.

The high level process of the script:

  1. Get Original Instance EBS mappings
  2. Stop the instances
  3. Detach the volumes from both instances
  4. Add the Original Instance’s EBS mappings to the New Instance
  5. Tag the New Instance with the Original Instance’s tags
  6. Tag the New Instance with the tag ‘Key=convertedFrom’ and ‘Value=<Original Instance ID>’
  7. Update the Name tag on the Original Instance with ‘Key=Name’ and ‘Value=<OldValue+.old>
  8. Update the Original Instance tags with its original BlockMapping for reference e.g. ‘Key=xvdc’ and ‘Value=vol-0c2174621f7fc2e4c’
  9. Start the New Instance

After the script completes the Original Instance will have the following information:

The New Instance will have the following information:

The volumes connected on the New Instance:

$orginalInstanceID = "i-0ca332b0b062dbe76"
$newInstanceID = "i-0ce3eeadfa27e2f64"
$AccessKey = ""
$Secret = ""
$Region = "ap-southeast-2"

If (!(get-module -ListAvailable | ? {$_.Name -like "*AWS.Tools.EC2*"}))
{                
    Write-Output "WARNING: EC2 AWS Modules Not Installed Yet..." 
    Exit
}
$getModuleResults = Get-Module "AWS.Tools.EC2"
If (!$getModuleResults) 
{
    Write-Output "INFO: Loading AWS Module..."
    Import-Module AWS.Tools.Common -ErrorAction SilentlyContinue -Force
    Import-Module AWS.Tools.EC2 -ErrorAction SilentlyContinue -Force
}
else{
    Write-Output "INFO: AWS Module Already Loaded"
}

Set-AWSCredential -AccessKey $AccessKey -SecretKey $Secret -ProfileLocation $Region
Write-Output "INFO: Getting details $($orginalInstanceID)"
$originalInstance = (Get-EC2Instance -InstanceId $orginalInstanceID).Instances
$orginalBlockMappings = $originalInstance.BlockDeviceMappings
$originalVolumes = @()
Write-Output "INFO: Getting EBS volumes from $($orginalInstanceID)"
ForEach($device in $orginalBlockMappings){
    $Object = New-Object System.Object
    #Get EBS volumes for the machine
    $Object | Add-Member -type NoteProperty -name "DeviceName" -Value $device.DeviceName
    $Object | Add-Member -type NoteProperty -name "VolumeId" -Value $device.ebs.VolumeId
    $Object | Add-Member -Type NoteProperty -name "Status" -Value $device.ebs.Status
    $volume = Get-EC2Volume -VolumeId $device.ebs.VolumeId
    $Object | Add-Member -Type NoteProperty -name "AvailabilityZone" -Value $volume.AvailabilityZone
    $Object | Add-Member -Type NoteProperty -name "Iops" -Value $volume.Iops
    $Object | Add-Member -Type NoteProperty -name "CreateTime" -Value $volume.CreateTime
    $Object | Add-Member -Type NoteProperty -name "Size" -Value $volume.Size
    $Object | Add-Member -Type NoteProperty -name "VolumeType" -Value $volume.VolumeType
    $originalVolumes += $Object
}
Write-Output $originalVolumes | Format-Table
$tempInstance = (Get-EC2Instance -InstanceId $newInstanceID).Instances
$tempBlockMappings = $tempInstance.BlockDeviceMappings
$tempVolumes = @()
Write-Output "INFO: Getting details $($newInstanceID)"
ForEach($device in $tempBlockMappings){
    $Object = New-Object System.Object
    #Get EBS volumes for the machine
    $Object | Add-Member -type NoteProperty -name "DeviceName" -Value $device.DeviceName
    $Object | Add-Member -type NoteProperty -name "VolumeId" -Value $device.ebs.VolumeId
    $Object | Add-Member -Type NoteProperty -name "Status" -Value $device.ebs.Status
    $volume = Get-EC2Volume -VolumeId $device.ebs.VolumeId
    $Object | Add-Member -Type NoteProperty -name "AvailabilityZone" -Value $volume.AvailabilityZone
    $Object | Add-Member -Type NoteProperty -name "Iops" -Value $volume.Iops
    $Object | Add-Member -Type NoteProperty -name "CreateTime" -Value $volume.CreateTime
    $Object | Add-Member -Type NoteProperty -name "Size" -Value $volume.Size
    $Object | Add-Member -Type NoteProperty -name "VolumeType" -Value $volume.VolumeType
    $tempVolumes += $Object
}
Write-Output $tempVolumes | Format-Table
#Lets do the work
Write-Output "INFO: Stop the instance $($orginalInstanceID)...."
try{
    Stop-EC2Instance -InstanceId $originalInstance -ErrorAction Stop
}catch{
    Write-Output "ERROR: $_"
    exit
}
While((Get-EC2Instance -InstanceId $orginalInstanceID).Instances[0].State.Name -ne 'stopped'){
    Write-Verbose "INFO: Waiting for instance to stop..."
    Start-Sleep -s 10
}
Write-Output "INFO: Stop the instance $($newInstanceID)...."
try{
    Stop-EC2Instance -InstanceId $newInstanceID -Force -ErrorAction Stop
}catch{
    Write-Output "ERROR: $_"
    exit
}
While((Get-EC2Instance -InstanceId $newInstanceID).Instances[0].State.Name -ne 'stopped'){
    Write-Verbose "INFO: Waiting for instance to stop..."
    Start-Sleep -s 10
}

Write-Output "INFO: detaching the EBS volumes from $($orginalInstanceID)...."
ForEach($volume in $originalVolumes){
    try{
        Dismount-EC2Volume -VolumeId $volume.VolumeId -InstanceId $orginalInstanceID -Device $volume.DeviceName -ErrorAction Stop
    }catch{
        Write-Output "ERROR: $_"
        exit
    }
}

Write-Output "INFO: detaching the EBS volumes from $($newInstanceID)...."
ForEach($volume in $tempVolumes){
    try{
        Dismount-EC2Volume -VolumeId $volume.VolumeId -InstanceId $newInstanceID -Device $volume.DeviceName -ErrorAction Stop
    }catch{
        Write-Output "ERROR: $_"
        exit
    }
}

Write-Output "INFO: Migrating $($orginalInstanceID) to $($newInstanceID) with $($originalVolumes.Count) connected volumes"
Write-Output "INFO: attaching the EBS volumes to $($newInstanceID)...."
ForEach($volume in $originalVolumes){
    try{
        Add-EC2Volume -VolumeId $volume.VolumeId -InstanceId $newInstanceID -Device $volume.DeviceName -ErrorAction Stop
    }catch{
        Write-Output "ERROR: $_"
        exit
    }
}

Write-Output "INFO: Tagging the $($newInstanceID) with original instance tags"
$orginalInstanceTags = $originalInstance.tags
ForEach($T in $orginalInstanceTags){
    try{
        $tag = New-Object Amazon.EC2.Model.Tag
        $tag.Key = $T.Key
        $value = $T.Value
        $tag.Value = $value
        New-EC2Tag -Resource $newInstanceID -Tag $tag -ErrorAction Stop
    }catch{
        Write-Output "ERROR: $_"
    }
}

Try{
    $tag = New-Object Amazon.EC2.Model.Tag
    $tag.Key = "convertedFrom"
    $value = $orginalInstanceID
    $tag.Value = $value
    New-EC2Tag -Resource $newInstanceID -Tag $tag -ErrorAction Stop
}catch{
    Write-Output "ERROR: $_"
}

Write-Output "INFO: Marking the $($orginalInstanceID) as old"
$orginalInstanceName = ($originalInstance.tags | ? {$_.Key -like "Name"}).Value
If($orginalInstanceName){
    try{
        $tag = New-Object Amazon.EC2.Model.Tag
        $tag.Key = "Name"
        $value = $orginalInstanceName+".old"
        $tag.Value = $value
        New-EC2Tag -Resource $orginalInstanceID -Tag $tag -ErrorAction Stop
    }catch{
        Write-Output "ERROR: $_"
    }
}

Write-Output "INFO: Tagging the $($orginalInstanceID) with original volumes for failback"
ForEach($device in $orginalBlockMappings){
    try{
        $tag = New-Object Amazon.EC2.Model.Tag
        $tag.Key = $device.DeviceName
        $value = $device.ebs.VolumeId
        $tag.Value = $value
        New-EC2Tag -Resource $orginalInstanceID -Tag $tag -ErrorAction Stop
    }catch{
        Write-Output "ERROR: $_"
    }
}

Write-Output "INFO: Starting the instance $($newInstanceID) with newly attached drives...."
try{
    Start-EC2Instance -InstanceId $newInstanceID -Force -ErrorAction Stop
}catch{
    Write-Output "ERROR: $_"
    exit
}
While((Get-EC2Instance -InstanceId $newInstanceID).Instances[0].State.Name -ne 'Running'){
    Write-Verbose "INFO: Waiting for instance to start..."
    Start-Sleep -s 10
}
$filterENI = New-Object Amazon.EC2.Model.Filter -Property @{Name = "attachment.instance-id"; Values = $newInstanceID}
$newInterface = Get-EC2NetworkInterface -Filter $filterENI
Write-Output "INFO: Conversion complete to $($newInstanceID)"
Write-Output "SUCCESS: Try logging into $($newInterface.PrivateIpAddress)"

Thanks Rene and Evan for passing on the idea.


Cognito authentication integration with Django using authorization code grant.

Note: Assumed knowledge of AWS Cognito backend configuration and underlying concepts, mostly it’s just the setup from an application integration perspective that is talked about here.

Recently we have been working on a Django project where a secure and flexible authentication system was required, as most of our existing structure is on AWS we chose Cognito as the backend.

Below are the steps we took to get this working and some insights learned on the way.

Django Warrant

The first attempt was using django_warrant, this is probably going to be the first thing that comes up when you google ‘how to django and cognito’.

Django_warrant works by injecting an authentication backend into django which does some magic that allows your username/password to be submitted and checked against a configured user pool, on success it authenticates you and if required creates a stub django user.

The basics of this were very easy to get working and integrated but had a few issues such as:

  • We still see username/password requests and have to send them on.
  • By default can only be configured for one user pool.
  • Does not support federated identity provider workflows.
  • Github project did not seem super active or updated.

Ultimately we chose not to use this module, however inspiration was taken from its source code to do some of the user handling stuff we implemented later on.

Custom authorization_code workflow implementation

This involves using the cognito hosted login form, which does both user pool and connected identity provider authentication (O365/Azure, Google, Facebook, Amazon) .

The form can be customised with HTML, CSS, images and put behind a custom URL, other aspects of the process and events can be changed and reacted upon using triggers and lambda.

Once you are authenticated in cognito it redirects you back to the page of your choosing (usually your applications login page or custom endpoint) with a set of tokens, using these tokens you then grab the authenticated users details and authenticate them within the context of your app.

The difference between authorization code grant and implicit grant are:

  • Implicit grant
    • Intended for client side authentication (javascript applications mostly)
    • Sends both the id_token (JWT) and acccess_token in the redirect response
    • Sends the tokens with an #anchor before them so it is not seen by the web server
    • https://your-app/login#id_token=n&auth_token=n
  • Authorization code grant
    • Intended for server side authentication
    • Sends a authorization code in the redirect response
    • Sends this as a normal GET parameter
    • https://your-app/login?code=n
    • Your application holds a preconfigured secret
    • Code + secret get turned into id_token token and access_token via oauth2/token endpoint

We chose to use the authorization code grant workflow, it takes a bit more effort to setup but is generally more secure and alleviates any hacky javascript shenanigans that would be needed to get implicit grant working with a django server based backend.

After these steps you can use boto3 or helpers to turn those tokens into a set of attributes (email, name, other custom attributes) kept by cognito. Then you simply hook this up to your internal user/session logic by matching them with your chosen attributes like email, username etc.

I was unable to find any specific library support to handle some aspects of this, like the token handling in python or the django integration so i have included some code which may be useful.

Code

This can be integrated into a view to get the user details from Cognito based on a token, this will be sitting at the redirect URL that cognito returns from.

import warrant
import cslib.aws

def tokenauth(request):
    authorization_code = request.GET.get("code")
    token_grabber = cslib.aws.CognitoToken(
        <client_id>
        <client_secret>
        <domain>
        <redir>
        <region>?
    )

    id_token, access_token = token_grabber.get(authorization_code)

    if id_token and access_token:
        # This uses warrant (different than django_warrant)
        # A helper lib that wraps cognito
        # Plain boto3 can do this also.  
        cognito = warrant.Cognito(
            <user_pool_id>
            <client_id>
            id_token=id_token,
            access_token=access_token,
        )

        # Their lib is a bit broken, because we dont supply a username it wont
        # build a legit user object for us, so we reach into the cookie jar....
        # {'given_name': 'Joe', 'family_name': 'Smith', 'email': 'joe@jtwo.solutions'}
        data = cognito.get_user()._data
        return data
    else:
        return None

Class that handles the oauth/token2 workflow, this is mysteriously missing from the boto3 library which seems to handle everything else quite well…

from http.client import HTTPSConnection
from base64 import b64encode
import urllib.parse
import json

class CognitoToken(object):
    """
    Why you no do this boto3...
    """
    def __init__(self, client_id, client_secret, domain, redir, region="ap-southeast-2"):
        self.client_id = client_id
        self.client_secret = client_secret
        self.redir = redir
        self.token_endpoint = "{0}.auth.{1}.amazoncognito.com".format(domain, region)
        self.token_path = "/oauth2/token"

    def get(self, authorization_code):
        headers = {
            "Authorization" : "Basic {0}".format(self._encode_auth()),
            "Content-type": "application/x-www-form-urlencoded",
        }

        query = urllib.parse.urlencode({
                "grant_type" : "authorization_code",
                "client_id" : self.client_id,
                "code" : authorization_code,
                "redirect_uri" : self.redir,
            }
        )

        con = HTTPSConnection(self.token_endpoint)
        con.request("POST", self.token_path, body=query, headers=headers)
        response = con.getresponse()

        if response.status == 200:
            respdata = str(response.read().decode('utf-8'))
            data = json.loads(respdata)
            return (data["id_token"], data["access_token"])

        return None, None

    def _encode_auth(self):
        # Auth is a base64 encoded client_id:secret
        string = "{0}:{1}".format(self.client_id, self.client_secret)
        return b64encode(bytes(string, "utf-8")).decode("ascii")

Further reading


Tagging EC2 EBS Volumes in Auto Scaling Groups

Tagging becomes a huge part of your life when in the public cloud. Metadata is thrown around like hotcakes, and why not. At cloudstep.io we preach the ways of the DevOps gods and especially infrastructure as code for repeatable and standardised deployments. This way everything is uniform and everything gets a TAG!

I ran into an issue recently where I would build an EC2 instance and capture the operating system into an AMI as part of a CloudFormation stack. This AMI would then be used as part of a launch configuration and subsequent auto scaling group. The original EC2 instance had every tag needed across all parts that make up the virtual machine including:

  • EBS root volume
  • EBS data volumes
  • Elastic Network Interfaces (ENI)
  • EC2 Instance itself

When deploying my auto scaling group all the user level tags I’d applied had been removed from the volumes and ENI. This caused a few issues:

  1. EBS volumes couldn’t be tagged for billing.
  2. EBS volumes couldn’t be snapped based on tag level policies in Lifecycle Manager.
  3. Objects didn’t have a ‘Name’ tag which made it hard in the console to understand which virtual machine instance the object belonged too.

There are two methods I derived to add my tags back that I’ll share with you. The tags needed to be added upon launch of the instance when the auto scaling group added a server. The methods I used were:

  1. The auto scaling group has a Launch Configuration where the ‘User data’ field runs a script block at startup.
  2. Initiate a Lambda whenever CloudTrail logged an API reference of a launch event of an instance using CloudWatch.

Tagging with the User Data property and PowerShell

User data is simply:

When you launch an instance in Amazon EC2, you have the option of passing user data to the instance that can be used to perform common automated configuration tasks and even run scripts after the instance starts. You can pass two types of user data to Amazon EC2: shell scripts and cloud-init directives.

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html
Try {
 # Use the metadata service to discover which instance the script is running on
 $InstanceId = (Invoke-WebRequest '169.254.169.254/latest/meta-data/instance-id').Content
 $AvailabilityZone = (Invoke-WebRequest '169.254.169.254/latest/meta-data/placement/availability-zone').Content
 $Region = $AvailabilityZone.Substring(0, $AvailabilityZone.Length -1)
 $mac = (Invoke-WebRequest '169.254.169.254/latest/meta-data/network/interfaces/macs/').content
 $URL = "169.254.169.254/latest/meta-data/network/interfaces/macs/"+$mac+"/interface-id"
 $eni = (Invoke-WebRequest $URL).content
# Get the list of volumes attached to this instance
 $BlockDeviceMappings = (Get-EC2Instance -Region $Region -Instance $InstanceId).Instances.BlockDeviceMappings
 $Tags = (Get-EC2Instance -Region $Region -Instance $InstanceId).Instances.tag

 }
Catch{Write-Host "Could not access the AWS API, are your credentials loaded?" -ForegroundColor Yellow}
$BlockDeviceMappings | ForEach-Object -Process {
        $volumeid = $_.ebs.volumeid # Retrieve current volume id for this BDM in the current instance
        # Set the current volume's tags
        $Tags | ForEach-Object -Process {
        If($_.Key -notlike "aws:*"){
        New-EC2Tag -Resources $volumeid -Tags @{ Key = $_.Key ; Value = $_.Value } # Add tag to volume
        }
        }
}
# Set the current nics tag
$Tags | ForEach-Object -Process {
  If($_.Key -notlike "aws:*"){
        New-EC2Tag -Resources $eni -Tags @{ Key = $_.Key ; Value = $_.Value } # Add tag to eni
  }
}


This script block is great and works a treat with newly created instances from an Amazon Marketplace AMI’s e.g. a vanilla Windows Server 2019 template. The launch configuration would apply the script as a part of the cfn-init function at startup. Unfortunately I’d already used the cfn-init function as part of the original image customisation and capture, the cfn-init would not re-run and didn’t execute this script block. So back to the drawing board in my scenario.

Tagging with CloudWatch and Lambda Function

The second solution was to create a Lambda function and trigger it using an Amazon CloudWatch Events rule. The Instance ID is parsed from the CloudWatch event in JSON to the Lambda function.

Here is the Lambda function that is written in python2.7 and leverages the boto3 and JSON modules.

from __future__ import print_function
import json
import boto3
def lambda_handler(event, context):
  print('Received event: ' + json.dumps(event, indent=2))
  ids = []
  try:
      ec2 = boto3.resource('ec2')
      items = event['detail']['responseElements']['instancesSet']['items']
      for item in items:
        ids.append(item['instanceId'])
      base = ec2.instances.filter(InstanceIds=ids)
      for instance in base:
        ec2tags = instance.tags
        tags = [n for n in ec2tags if not n["Key"].startswith("aws:") ]
        print('   original tags:', ec2tags)
        print('   applying tags:', tags)
        for volume in instance.volumes.all():
          print('    volume:', volume)
          if volume.tags != ec2tags:
            volume.create_tags(DryRun=False, Tags=tags)
        for eni in instance.network_interfaces:
          print('    eni:', eni)
          eni.create_tags(DryRun=False, Tags=tags)
      return True
  except Exception as e:
    print('Something went wrong: ' + str(e))
    return False   



The OVF package is invalid and cannot be deployed – In the trenches with the AWS Discovery Connector

I was working with a customer recently who had trouble deploying the AWS Discovery Connector to their VMware environment. AWS offer this appliance as an OVA file. For those who aren’t aware, OVA (Open Virtualisation Archive) is an open standard used to describe virtual infrastructure to be deployed on a hypervisor of your choice. Typically speaking, these files are hashed with an algorithm to ensure that the contents of the files are not changed or modified in transit (prior to being deployed within your own environment.)

At the time of writing, AWS currently offer this appliance hashed in two flavours… MD5 or SHA256. All sounds quite reasonable right?

  • Download the OVA with a hash of your choice
  • Deploy to VMware.
  • Profit???

Wrong! I was surprised to receive an email from my customer stating that their deployment had failed (see below.)

There’s a small clue here…

The Solution

My immediate response was to fire up google and do some reading. Surely someone had blogged about this before? After all…. I am no VMware expert. I finally arrived at the VMware knowledge base, where I began sifting through supported ciphers for ESX/ESXi and vCenter. The findings were quite interesting, you can find them summarised below:

  • If your VMware cluster consists of hosts which run ESX/ESXi 4.1 or less (hopefully no one) – MD5 is supported
  • If your VMware cluster consists of hosts which run ESX/ESXi 5.x or 6.0 – SHA1 is supported
  • If your VMware cluster consists of hosts which run ESX/ESXi 6.5 or greater – SHA256 is supported

In the particular environment I was working in, the customer had multiple environments with a mix of 5.5 and 6.0 physical hosts. As I was short on time, I had no real way of telling if the MD5 hashed image would deploy on a newer environment. I also don’t have a VMware development environment to test this approach on (by design.)

After a few more minutes of googling, I was rewarded with another VMware knowledge base article. VMware provide a small utility called “OVFTool.” This applications sole purpose in life is to convert OVA files (you guessed it) ensuring that they are hashed with supported cipher of your choice. In my scenario, the file was re-written using the supported SHA1 cipher. All of this was triggered from a windows command line by executing:

ovftool.exe –shaAlgorithm=SHA1 <source image.ova> <destination image.ova>

After this I was able to successfully deploy the AWS Discovery Connector OVA as expected using my freshly minted image.

You can grab a copy of the tool – here

You can read more about VMware supported ciphers – here

Finally, I should call out that this solution is not specific to deploying the AWS Discovery Connector. Consider this approach if you are experiencing similar symptoms deploying another OVA based appliance in your VMware environment.


AWS ECS CloudFormation Fails – Unable to assume the service linked role.

I ran into an interesting issue when building a new ECS Cluster using CloudFormation. The CloudFormation stack would fail on Type: AWS::ECS::Service with error:

Unable to assume the service linked role. Please verify that the ECS service linked role exists. (Service: AmazonECS; Status Code: 400; Error Code: InvalidParameterException; Request ID: beadf3d5-3406-11e9-828d-b16cd52796ef)

Okay google, what’s this service linked role thingy?

A service-linked role is a unique type of IAM role that is linked directly to Amazon ECS. Service-linked roles are predefined by Amazon ECS and include all the permissions that the service requires to call other AWS services on your behalf.

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using-service-linked-roles.html

The first few times I ran my stack I assumed that this was for an IAM role that I was needing to assign to the AWS::ECS::Service to perform tasks much like a IamInstanceProfile of Type: AWS::EC2::Instance. When reviewing the available properties for Type: AWS::ECS::Service there was a Role definition:

  • Cluster
  • DeploymentConfiguration
  • DesiredCount
  • HealthCheckGracePeriodSeconds
  • LaunchType
  • LoadBalancers
  • NetworkConfiguration
  • PlacementConstraints
  • PlacementStrategies
  • PlatformVersion
  • Role
  • SchedulingStrategy
  • ServiceName
  • ServiceRegistries
  • TaskDefinition
Role - The name or ARN of an AWS Identity and Access Management (IAM) role that allows your Amazon ECS container agent to make calls to your load balancer.

I had some well defined Type: AWS::IAM::Role objects in my YAML for ECS execution and task roles but none of them were helping me with service linked account issue no matter how far I took the IAM policies.

Solution

To cut a long story and much googling short, the issue was nothing to do with my IAM policies but rather that the very first ECS cluster you create in the console using the getting started wizard creates the linked account in the backend. If your unlike me and read the full article about service linked roles you would have read:

when you create a new cluster (for example, with the Amazon ECS first run, the cluster creation wizard, or the AWS CLI or SDKs), or create or update a service in the AWS Management Console, Amazon ECS creates the service-linked role for you, if it does not already exist.

No mention in the above statement about CloudFormation. As per usual I jumped straight into a CloudFormation template without a test drive of the service and this time my attempt at being clever had given me a few moments of madness.

The easiest fix is to open up AWS CLI and run the following against your account once, then jump back into CloudFormation for YAML fun:

aws iam create-service-linked-role --aws-service-name ecs.amazonaws.com  

Resulting output:

{
    "Role": {
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17", 
            "Statement": [
                {
                    "Action": [
                        "sts:AssumeRole"
                    ], 
                    "Effect": "Allow", 
                    "Principal": {
                        "Service": [
                            "ecs.amazonaws.com"
                        ]
                    }
                }
            ]
        }, 
        "RoleId": "AROAIXGB2WBYGCXSPXY4O", 
        "CreateDate": "2019-02-19T05:55:58Z", 
        "RoleName": "AWSServiceRoleForECS", 
        "Path": "/aws-service-role/ecs.amazonaws.com/", 
        "Arn": "arn:aws:iam::112233445566:role/aws-service-role/ecs.amazonaws.com/AWSServiceRoleForECS"
    }
}

Job done. It all seemed so simple in retrospect.