0%

ElasticSearch migrated from EC2 to AWS managed ES

Background

Let’s say there’s a legacy platform called XXX, and it still poses an operational cost for SRE to manage the monitoring / alerting / upgrade etc.

At present, XXX’s ElasticSearch architecture consists of 3 hosts (m4.xlarge) acting as master, client and data nodes, for XXX’s searching functionality.

All of them are using Ubuntu Trusty (14.04) image, which is no longer supported and need to be migrated to Xenial (16.04). Its usage seems pretty low and cost wise so we would like to migrate the XXX ES to AWS managed ElasticSearch Service to reduce operational overhead.

This doc will cover the project plan / checklist for migration.

Solution overview

Elasticsearch (ES) indexes can be migrated with following steps:

  • Create baseline indexes

    • Create a snapshot repository and associate it to an AWS S3 Bucket.
    • Create the first snapshot of the indexes to be migrated, which is a full snapshot on EC2 - The snapshot will be automatically stored in the AWS S3 bucket created in the first step.
    • Restore this full snapshot to the AWS ES.
  • Periodic incremental snapshots

    • Repeat serval incremental snapshot and restore.
  • Final snapshot and service switchover

    • Stop services which can modify index data.
    • Create a final incremental snapshot of the EC2.
    • Perform service switchover to the AWS ES.

Checklist

Here’s a list of what will we do in detail:

Assess and analyse current data

The ES data is around 30G, setting number_of_replicas to 1, then according to the simplified version of calculation:

1
Source Data * (1 + Number of Replicas) * 1.45 = Minimum Storage Requirement

So the minimum storage for AWS managed ES is about 90G.

1
2
3
4
5
6
7
8
9
10
11
curl $es/_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open .marvel-2019.10.12 1 1 115081 0 413mb 206.3mb
green open site-1.0 200 1 937559 884 1.6gb 848.7mb
green open site_v1 200 1 324638 33 335.8mb 167.9mb
green open .marvel-2019.10.14 1 1 115009 0 424.5mb 212.2mb
green open .marvel-2019.10.15 1 1 114962 0 424.7mb 212mb
green open .marvel-2019.10.16 1 1 14290 0 56.3mb 28.3mb
green open .marvel-2019.10.13 1 1 115104 0 410.5mb 205mb
green open search 200 1 20829799 6167280 24gb 12gb
green open .marvel-kibana 1 1 1 0 6.4kb 3.2kb

Create an S3 bucket for current Elasticsearch data (on EC2)

1
e.g. s3://es-backup (us-west-2)

Make ES Cluster snapshot and move it to S3

  • check snapshot repository setttings

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    curl localhost:9200/_snapshot?pretty

    {
    "XXX-snapshot-s3-repo" : {
    "type" : "s3",
    "settings" : {
    "bucket" : "es-backup",
    "region" : "us-west-2"
    }
    }
    }
  • set snapshot repo to S3

    1
    2
    3
    4
    5
    6
    7
    curl -XPUT localhost:9200/_snapshot/XXX-snapshot-s3-repo?verify=false -d '{
    "type": "s3",
    "settings": {
    "bucket": "es-backup",
    "region": "us-west-2"
    }
    }'
  • check indices

    1
    curl localhost:9200/_cat/indices?pretty
  • make snapshots

    1
    2
    3
    4
    5
    6
    7
    8
    # specified indices
    curl -XPUT "localhost:9200/_snapshot/XXX-snapshot-s3-repo/marvel-20191101?wait_for_completion=true" -d '{
    "indices": ".marvel-2019.11.01",
    "ignore_unavailable": "true",
    "include_global_state": false
    }'

    {"snapshot":{"snapshot":"marvel-20191024","indices":[".marvel-2019.10.24"],"state":"SUCCESS","start_time":"2019-10-25T00:20:47.625Z","start_time_in_millis":1571962847625,"end_time":"2019-10-25T00:21:06.058Z","end_time_in_millis":1571962866058,"duration_in_millis":18433,"failures":[],"shards":{"total":1,"failed":0,"successful":1}}}
  • check snapshots

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    # check snapshots

    curl localhost:9200/_snapshot/XXX-snapshot-s3-repo/_all?pretty
    {
    "snapshots" : [ {
    "snapshot" : "marvel-20191016_20191018",
    "indices" : [ "site_v1" ],
    "state" : "SUCCESS",
    "start_time" : "2019-10-18T01:56:21.344Z",
    "start_time_in_millis" : 1571363781344,
    "end_time" : "2019-10-18T01:57:56.012Z",
    "end_time_in_millis" : 1571363876012,
    "duration_in_millis" : 94668,
    "failures" : [ ],
    "shards" : {
    "total" : 200,
    "failed" : 0,
    "successful" : 200
    }
    } ]
    }

Create and configure AWS Elasticsearch service (version 1.5)

Migrate the data (Restore cluster from S3 to AWS ES)

  • check current iam instance profile role on XXX ES hosts

    1
    2
    3
    4
    5
    6
    aws sts get-caller-identity
    {
    "Account": "XXXXXXXXXXX",
    "UserId": "AROAISLXH37RO7WDQU7H2:i-f9f6cf20",
    "Arn": "arn:aws:sts::XXXXXXXXXXX:assumed-role/elasticsearch-cloud-aws/i-f9f6cf20"
    }
  • set snapshot repository for AWS managed ES

Must sign your snapshot requests, if your access policies specify IAM users or roles.

p.s. If using curl will get the following error message:

{“Message”:”User: anonymous is not authorized to perform: iam:PassRole on resource: arn:aws:iam::XXXXXXXXXXX:role/test-role”}

Example code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
python register_es_repo.py

import boto3
import requests
import json
from requests_aws4auth import AWS4Auth

host = 'https://vpc-es-XXX-prod-ji6wgswtt3btfue5pwqwgvudry.us-west-2.es.amazonaws.com/'
region = 'us-west-2'
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

# Register repository
path = '_snapshot/XXX-snapshot-s3-repo' # the Elasticsearch API endpoint
url = host + path

payload = {
"type": "s3",
"settings": {
"bucket": "es-backup",
"region": "us-west-2",
"role_arn": "arn:aws:iam::XXXXXXXXXX:role/elasticsearch-cloud-aws"
}
}

headers = {"Content-Type": "application/json"}

r = requests.put(url, auth=awsauth, json=payload, headers=headers)

print(r.status_code)
print(r.text)


# Output
200
{"acknowledged":true}
  • check snapshot repository settings

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    curl -XGET $es/_snapshot?pretty

    {
    "XXX-snapshot-s3-repo" : {
    "type" : "s3",
    "settings" : {
    "bucket" : "es-backup",
    "role_arn" : "arn:aws:iam::XXXXXXXXXXX:role/aws-elasticsearch-backup",
    "region" : "us-west-2"
    }
    }
    }
  • update search index settings to speed up restore process

1
2
3
4
5
6
7
8
9
10
11
12
13
14
# https://aws.amazon.com/premiumsupport/knowledge-center/elasticsearch-indexing-performance/
# For huge index, reduce from 30min to 10min together with incremental snapshot:

# before migration
curl -XPUT "$es/search/_settings" -d' {
"number_of_replicas" : 0,
"refresh_interval" : "-1"
}'

# set it back after migration
curl -XPUT "$es/search/_settings" -d' {
"number_of_replicas" : 1,
"refresh_interval" : "1s"
}'
  • restore snapshots from S3 into AWS managed ES

    1
    2
    3
    4
    5
    6
    7
    8
    # e.g restore from specific snapshot
    curl -XPOST "$es/_snapshot/XXX-snapshot-s3-repo/marvel-20191024/_restore"

    # e.g. restore just one index, ".marvel-2019.10.24", from "marvel-20191024" snapshot in the "XXX-snapshot-s3-repo" snapshot repository:
    curl -XPOST "$es/_snapshot/XXX-snapshot-s3-repo/marvel-20191024/_restore" -d '{"indices": ".marvel-2019.10.24"}' -H 'Content-Type: application/json'

    # e.g restore all indices except for the .kibana index
    curl -XPOST "$es/_snapshot/XXX-snapshot-s3-repo/marvel-20191024/_restore" -d '{"indices": "*,-.kibana"}' -H 'Content-Type: application/json'
  • Run es_migrate.sh script to snapshot / restore

1
2
3
4
5
time bash -x es_migrate.sh

real 10m7.667s
user 0m0.129s
sys 0m0.142s

Switch from self-hosted to AWS managed ES

get ECS task parameter settings and set ES endpoint to AWS managed ES for application

Testing

  • General function testing
  • Rollback to old ELB in CNAME if errors
  • Data Integration Check

Will generate doc_count diff result for old / new ES (take index .marvel-* for example):

1
2
3
4
5
6
7
8
9
10
11
12
13
#index  old_doc_count   new_doc_count   diff_rate

.marvel-2019.11.08 114929 114929 0.0000%
.marvel-2019.11.09 114930 114930 0.0000%
.marvel-2019.11.10 114904 114904 0.0000%
.marvel-2019.11.11 104327 103532 -0.7620%
search 21025214 21025162 -0.0002%
site-1.0 939656 939657 0.0001%
site_v1 324638 324638 0.0000%

real 10m7.303s
user 0m0.156s
sys 0m0.121s

Tidy up resources in Puppet / Terraform

The last step is that clean up old resources in Puppet or Terraform (e.g. EC2 / SG / Route53 etc.)

Reference

knowledge-center/elasticsearch-indexing-performance

AWS - sizing domains

Migrating your self-hosted ElasticSearch to AWS ElasticSearch Service

Welcome to my other publishing channels