AWS EC2/ASG: Set instance health as unhealthy using an API call


2 min read | by Jordi Prats

If we are running an application on EC2 using an ASG and its health check fails to properly report its status through an HTTP endpoint, the best fix is to repair that health check. If fixing it is not an option, we can create another custom HTTP endpoint that improves on the existing one. If that cannot be done either, we can set the instance health to unhealthy from within the instance itself (or from a Lambda function if needed).

The Amazon EC2 Auto Scaling API has the SetInstanceHealth action, which allows us to set the health status of an instance. If we mark an instance as Unhealthy using this API call, Amazon EC2 Auto Scaling will terminate and replace it.
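As a minimal sketch of that call, the helper below wraps boto3's set_instance_health. The function name mark_instance_unhealthy and its parameters are our own, not part of boto3, and the client is passed in as an argument so the helper can be tried out with a stub before pointing it at a real account:

```python
def mark_instance_unhealthy(autoscaling_client, instance_id, respect_grace_period=False):
    """Ask EC2 Auto Scaling to treat the given instance as unhealthy.

    autoscaling_client is assumed to be a boto3 'autoscaling' client
    (or any object exposing a compatible set_instance_health method).
    """
    return autoscaling_client.set_instance_health(
        InstanceId=instance_id,
        HealthStatus="Unhealthy",
        # False: act immediately, even inside the health check grace period
        ShouldRespectGracePeriod=respect_grace_period,
    )


# Possible usage (region and instance ID are placeholders):
#   import boto3
#   client = boto3.client("autoscaling", region_name="eu-west-1")
#   mark_instance_unhealthy(client, "i-0123456789abcdef0")
```

By default the API respects the group's health check grace period; passing ShouldRespectGracePeriod=False makes the change take effect right away.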

To be able to call SetInstanceHealth from the instance, we'll have to add the following policy to the IAM role attached to the instance profile:

{
  "Version": "2012-10-17",
  "Statement": [
      {
          "Effect": "Allow",
          "Action": [
              "autoscaling:SetInstanceHealth"
          ],
          "Resource": [
              "*"
          ]
      }
  ]
}
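If we prefer not to grant the action on every resource, the policy can probably be scoped to a single Auto Scaling group. The sketch below builds the same policy document in Python and prints it as JSON. The build_policy helper is hypothetical, and the ARN (region, account ID and group name) is a placeholder that would need to be checked against the real group:

```python
import json


def build_policy(resource="*"):
    """Return a policy document allowing autoscaling:SetInstanceHealth."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["autoscaling:SetInstanceHealth"],
                "Resource": [resource],
            }
        ],
    }


# Placeholder ARN: region, account ID and group name are made up
scoped_policy = build_policy(
    "arn:aws:autoscaling:eu-west-1:123456789012:"
    "autoScalingGroup:*:autoScalingGroupName/my-asg"
)

# json.dumps guarantees the output is valid JSON (no trailing commas)
print(json.dumps(scoped_policy, indent=2))
```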

Then, we can use Python and boto3 to tail a log file and decide whether this instance needs to be terminated. In the following example, the script tails /var/log/messages looking for messages telling us that some process has been killed by the Out Of Memory Killer:

import subprocess
import requests
import select
import boto3
import os

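IMDS_URL = 'http://169.254.169.254/latest'

def get_metadata(path):
  # Request an IMDSv2 session token first, so the lookup also works on
  # instances where IMDSv1 has been disabled
  token = requests.put(IMDS_URL + '/api/token', headers={'X-aws-ec2-metadata-token-ttl-seconds': '60'}, timeout=2).text
  response = requests.get(IMDS_URL + '/meta-data/' + path, headers={'X-aws-ec2-metadata-token': token}, timeout=2)
  response.raise_for_status()
  return response.text

def get_instance_id():
  return get_metadata('instance-id')

def get_region():
  return get_metadata('placement/region')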

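# Any non-empty DEBUG value enables dry-run mode: matches are printed but the API is not called
DEBUG = os.getenv('DEBUG', False)

# tail -F follows the file by name, so it keeps working across log rotation
f = subprocess.Popen(['tail','-F', "/var/log/messages"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
p = select.poll()
p.register(f.stdout)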

is_healthy = True

while True:
  if p.poll(1):
    line = f.stdout.readline().decode("utf-8")

    if DEBUG:
      print(line, end = '')

    # Older kernels log 'Kill process', newer ones 'Killed process': match both
    if 'kernel: Out of memory: Kill' in line:
      print('INFO: Found OOM killer')
      instance_id = get_instance_id()

      if DEBUG:
        print('DEBUG: Instance ' + instance_id + ' is unhealthy')
      else:
        if is_healthy:
          # change instance health status to unhealthy
          autoscaling_client = boto3.client(service_name='autoscaling', region_name=get_region())

          sethealth_response = autoscaling_client.set_instance_health(
                                            InstanceId=instance_id,
                                            HealthStatus='Unhealthy',
                                            ShouldRespectGracePeriod=False
                                          )

          print('INFO: set instance '+instance_id+' as unhealthy: '+str(sethealth_response))

          is_healthy = False
        else:
          # already reported on a previous OOM event, so skip the API call
          print('INFO: Instance ' + instance_id + ' is already unhealthy')
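If we also want to log which process was killed, the matching can be done with a regular expression instead of a plain substring check. This is only a sketch: parse_oom_line is a hypothetical helper, the pattern assumes the "Out of memory: Kill process" / "Out of memory: Killed process" wording used by the kernel, and the sample lines are illustrative rather than taken from a real system:

```python
import re

# Covers both the "Kill process" and the "Killed process" wording
OOM_PATTERN = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def parse_oom_line(line):
    """Return (pid, process_name) if the line reports an OOM kill, else None."""
    match = OOM_PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


# Illustrative sample lines (hostnames, PIDs and process names are made up)
samples = [
    "Aug 29 10:00:00 host kernel: Out of memory: Kill process 1234 (java) score 900 or sacrifice child",
    "Aug 29 10:05:00 host kernel: Out of memory: Killed process 4321 (python3) total-vm:1000kB",
    "Aug 29 10:06:00 host systemd: Started Session 42 of user root.",
]

for sample in samples:
    print(parse_oom_line(sample))
```

In the script above, the result could replace the substring test: a non-None return value means the OOM killer fired, and the tuple tells us which process to mention in the log message.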

Posted on 29/08/2022