Background
Let’s say there is a legacy platform called XXX that still imposes an operational cost on SRE for monitoring, alerting, upgrades, etc.
At present, XXX’s Elasticsearch architecture consists of 3 hosts (m4.xlarge) acting as master, client and data nodes, serving XXX’s search functionality.
All of them run the Ubuntu Trusty (14.04) image, which is no longer supported and needs to be migrated to Xenial (16.04). The cluster’s usage seems pretty low, so cost-wise we would like to migrate XXX’s ES to the AWS managed Elasticsearch Service to reduce operational overhead.
This doc covers the project plan / checklist for the migration.
Solution overview
Elasticsearch (ES) indexes can be migrated with the following steps:
Create baseline indexes
- Create a snapshot repository and associate it with an AWS S3 bucket.
- Create the first snapshot of the indexes to be migrated, which is a full snapshot taken on EC2; it is automatically stored in the AWS S3 bucket created in the first step.
- Restore this full snapshot to the AWS ES.
Periodic incremental snapshots
- Repeat several rounds of incremental snapshot and restore.
Final snapshot and service switchover
- Stop services which can modify index data.
- Create a final incremental snapshot on EC2.
- Perform service switchover to the AWS ES.
Checklist
Here’s a detailed list of what we will do:
Assess and analyse current data
The ES data is around 30 GB. With number_of_replicas set to 1, the simplified sizing calculation is:
```
Source Data * (1 + Number of Replicas) * 1.45 = Minimum Storage Requirement
```
So the minimum storage for the AWS managed ES is about 90 GB (30 GB * 2 * 1.45 ≈ 87 GB).
```
curl $es/_cat/indices?v
```
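As a quick sanity check of the estimate above, here is a minimal Python sketch of the same calculation (the 30 GB figure and replica count come from this doc; the 1.45 overhead factor is from the AWS sizing guide in the references):
```python
# Sizing sanity check: numbers taken from this doc (~30 GB source data,
# number_of_replicas = 1); 1.45 is the overhead factor from the AWS sizing guide.
source_gb = 30
replicas = 1
overhead = 1.45

minimum_storage_gb = source_gb * (1 + replicas) * overhead
print(minimum_storage_gb)  # 87.0 -> round up to roughly 90 GB
```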
Create an S3 bucket for current Elasticsearch data (on EC2)
```
e.g. s3://es-backup (us-west-2)
```
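If you prefer to script this step, a minimal boto3 sketch (bucket name and region taken from the example above) could look like the following; `aws s3 mb s3://es-backup --region us-west-2` does the same thing:
```python
# Create the snapshot bucket used by the ES S3 repository.
# Bucket name and region are the ones used in the example above.
import boto3

s3 = boto3.client("s3", region_name="us-west-2")
s3.create_bucket(
    Bucket="es-backup",
    CreateBucketConfiguration={"LocationConstraint": "us-west-2"},
)
```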
Make ES Cluster snapshot and move it to S3
check snapshot repository settings
```
curl localhost:9200/_snapshot?pretty
{
  "XXX-snapshot-s3-repo" : {
    "type" : "s3",
    "settings" : {
      "bucket" : "es-backup",
      "region" : "us-west-2"
    }
  }
}
```
set snapshot repo to S3
```
curl -XPUT localhost:9200/_snapshot/XXX-snapshot-s3-repo?verify=false -d '{
  "type": "s3",
  "settings": {
    "bucket": "es-backup",
    "region": "us-west-2"
  }
}'
```
check indices
```
curl localhost:9200/_cat/indices?pretty
```
make snapshots
```
# specified indices
curl -XPUT "localhost:9200/_snapshot/XXX-snapshot-s3-repo/marvel-20191101?wait_for_completion=true" -d '{
  "indices": ".marvel-2019.11.01",
  "ignore_unavailable": "true",
  "include_global_state": false
}'

{"snapshot":{"snapshot":"marvel-20191024","indices":[".marvel-2019.10.24"],"state":"SUCCESS","start_time":"2019-10-25T00:20:47.625Z","start_time_in_millis":1571962847625,"end_time":"2019-10-25T00:21:06.058Z","end_time_in_millis":1571962866058,"duration_in_millis":18433,"failures":[],"shards":{"total":1,"failed":0,"successful":1}}}
```
check snapshots
```
# check snapshots
curl localhost:9200/_snapshot/XXX-snapshot-s3-repo/_all?pretty
{
  "snapshots" : [ {
    "snapshot" : "marvel-20191016_20191018",
    "indices" : [ "site_v1" ],
    "state" : "SUCCESS",
    "start_time" : "2019-10-18T01:56:21.344Z",
    "start_time_in_millis" : 1571363781344,
    "end_time" : "2019-10-18T01:57:56.012Z",
    "end_time_in_millis" : 1571363876012,
    "duration_in_millis" : 94668,
    "failures" : [ ],
    "shards" : {
      "total" : 200,
      "failed" : 0,
      "successful" : 200
    }
  } ]
}
```
Create and configure AWS Elasticsearch service (version 1.5)
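A hedged sketch of creating the destination domain with boto3 is shown below; the domain name, instance type/count, and per-node volume size are assumptions (the volume size follows the ~90 GB estimate above), and the same thing can of course be done in Terraform:
```python
# Hypothetical domain creation; names and sizes are assumptions, not the
# values actually used for XXX.
import boto3

es = boto3.client("es", region_name="us-west-2")
es.create_elasticsearch_domain(
    DomainName="xxx-search",                       # hypothetical domain name
    ElasticsearchVersion="1.5",                    # match the source cluster version
    ElasticsearchClusterConfig={
        "InstanceType": "m4.large.elasticsearch",  # assumed instance type
        "InstanceCount": 3,
    },
    EBSOptions={
        "EBSEnabled": True,
        "VolumeType": "gp2",
        "VolumeSize": 30,                          # per node; 3 nodes ≈ 90 GB total
    },
)
```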
Migrate the data (Restore cluster from S3 to AWS ES)
check current IAM instance profile role on XXX ES hosts
```
aws sts get-caller-identity
{
    "Account": "XXXXXXXXXXX",
    "UserId": "AROAISLXH37RO7WDQU7H2:i-f9f6cf20",
    "Arn": "arn:aws:sts::XXXXXXXXXXX:assumed-role/elasticsearch-cloud-aws/i-f9f6cf20"
}
```
set snapshot repository for AWS managed ES
You must sign your snapshot requests if your access policies specify IAM users or roles.
P.S. If you use plain curl, you will get the following error message:
{"Message":"User: anonymous is not authorized to perform: iam:PassRole on resource: arn:aws:iam::XXXXXXXXXXX:role/test-role"}
Example code:
```
python register_es_repo.py
```
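The script itself is not included here; the sketch below is a hypothetical minimal version of what such a registration script can look like, using boto3, requests and requests-aws4auth to SigV4-sign the request with the instance profile credentials. The endpoint placeholder is an assumption; the repository name, bucket and role ARN match the values used elsewhere in this doc.
```python
# Hypothetical minimal register_es_repo.py: registers the S3 snapshot
# repository on the AWS managed ES domain with a SigV4-signed request,
# which plain curl cannot do (hence the anonymous / iam:PassRole error above).
import boto3
import requests
from requests_aws4auth import AWS4Auth

region = "us-west-2"
host = "https://<aws-es-domain-endpoint>"  # placeholder: your AWS ES domain endpoint

credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                   region, "es", session_token=credentials.token)

payload = {
    "type": "s3",
    "settings": {
        "bucket": "es-backup",
        "region": region,
        # role that lets the AWS ES domain read/write the snapshot bucket
        "role_arn": "arn:aws:iam::XXXXXXXXXXX:role/aws-elasticsearch-backup",
    },
}

r = requests.put(f"{host}/_snapshot/XXX-snapshot-s3-repo",
                 auth=awsauth, json=payload)
print(r.status_code, r.text)
```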
check snapshot repository settings
```
curl -XGET $es/_snapshot?pretty
{
  "XXX-snapshot-s3-repo" : {
    "type" : "s3",
    "settings" : {
      "bucket" : "es-backup",
      "role_arn" : "arn:aws:iam::XXXXXXXXXXX:role/aws-elasticsearch-backup",
      "region" : "us-west-2"
    }
  }
}
```
update search index settings to speed up restore process
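The original command for this step did not survive; as a hedged example, a common way to speed up restores (see the indexing-performance article in the references) is to disable refresh and drop replicas while data is loading, then revert afterwards. The endpoint is a placeholder, and the requests may need to be signed as in the registration sketch above:
```python
# Hedged example only: typical index settings to speed up a restore, applied
# to all indices here. Revert them once the restore is done.
import requests

es = "https://<aws-es-domain-endpoint>"  # placeholder AWS ES endpoint

# before the restore: disable refresh and replicas
requests.put(f"{es}/_settings",
             json={"index": {"refresh_interval": "-1", "number_of_replicas": 0}})

# after the restore completes: restore normal settings
requests.put(f"{es}/_settings",
             json={"index": {"refresh_interval": "1s", "number_of_replicas": 1}})
```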
restore snapshots from S3 into AWS managed ES
```
# e.g. restore from a specific snapshot
curl -XPOST "$es/_snapshot/XXX-snapshot-s3-repo/marvel-20191024/_restore"

# e.g. restore just one index, ".marvel-2019.10.24", from the "marvel-20191024" snapshot in the "XXX-snapshot-s3-repo" snapshot repository
curl -XPOST "$es/_snapshot/XXX-snapshot-s3-repo/marvel-20191024/_restore" -d '{"indices": ".marvel-2019.10.24"}' -H 'Content-Type: application/json'

# e.g. restore all indices except the .kibana index
curl -XPOST "$es/_snapshot/XXX-snapshot-s3-repo/marvel-20191024/_restore" -d '{"indices": "*,-.kibana"}' -H 'Content-Type: application/json'
```
run the es_migrate.sh script to snapshot / restore
```
time bash -x es_migrate.sh
```
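es_migrate.sh itself is not shown in this doc; the sketch below is a rough, hypothetical Python equivalent of one snapshot/restore round, just to illustrate the loop. The index list and endpoint are assumptions, and the restore request is signed as in the registration sketch above.
```python
# Hypothetical equivalent of one round of es_migrate.sh: take an incremental
# snapshot on the self-hosted cluster (snapshots into the same S3 repo are
# incremental after the first full one), then restore it on the AWS ES domain.
import time
import boto3
import requests
from requests_aws4auth import AWS4Auth

SRC = "http://localhost:9200"                 # self-hosted cluster
DST = "https://<aws-es-domain-endpoint>"      # placeholder AWS ES endpoint
REPO = "XXX-snapshot-s3-repo"
INDICES = ".marvel-*,site_v1"                 # assumed index list

creds = boto3.Session().get_credentials()
auth = AWS4Auth(creds.access_key, creds.secret_key, "us-west-2", "es",
                session_token=creds.token)

snap = "migrate-" + time.strftime("%Y%m%d%H%M%S")

# incremental snapshot on the source (blocks until the snapshot completes)
requests.put(f"{SRC}/_snapshot/{REPO}/{snap}",
             params={"wait_for_completion": "true"},
             json={"indices": INDICES, "include_global_state": False})

# NOTE: an index that already exists and is open on the destination must be
# deleted or closed there before it can be restored again.
requests.post(f"{DST}/_snapshot/{REPO}/{snap}/_restore",
              auth=auth, json={"indices": INDICES})
```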
Switch from self-hosted to AWS managed ES
Get the ECS task parameter settings and set the application’s ES endpoint to the AWS managed ES.
Testing
- General function testing
- Roll back to the old ELB via CNAME if errors occur
- Data integrity check
We will generate a doc_count diff result for the old / new ES (taking the .marvel-* indices as an example):
```
#index old_doc_count new_doc_count diff_rate
```
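A minimal sketch of how such a diff could be produced is shown below. The endpoints are placeholders, the request to the new domain may need SigV4 signing as shown earlier, and the exact _cat parameters can vary between ES versions:
```python
# Hedged sketch of the data integrity check: compare per-index doc counts
# between the old and new clusters and print a diff rate per index.
import requests

OLD = "http://localhost:9200"                 # self-hosted cluster
NEW = "https://<aws-es-domain-endpoint>"      # placeholder AWS ES endpoint

def doc_counts(es):
    # plain-text cat output, one "<index> <docs.count>" pair per line
    text = requests.get(f"{es}/_cat/indices", params={"h": "index,docs.count"}).text
    counts = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) >= 2:
            counts[fields[0]] = int(fields[1])
    return counts

old, new = doc_counts(OLD), doc_counts(NEW)
print("#index old_doc_count new_doc_count diff_rate")
for index in sorted(old):
    o, n = old[index], new.get(index, 0)
    rate = 0.0 if o == 0 else abs(o - n) / float(o)
    print(f"{index} {o} {n} {rate:.4f}")
```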
Tidy up resources in Puppet / Terraform
The last step is to clean up the old resources in Puppet or Terraform (e.g. EC2 / SG / Route53, etc.).
Reference
knowledge-center/elasticsearch-indexing-performance
AWS - sizing domains
Migrating your self-hosted ElasticSearch to AWS ElasticSearch Service