
Error

Error creating application autoscaling target: ValidationException: ECS service doesn’t exist: service/AAA

I've encountered this type of dependency issue before, and re-running terraform plan/apply usually fixes it, but not this time :-)

After a quick look, I found that module depends_on (dependencies between Terraform modules) is only officially supported from Terraform 0.13 - hashicorp/terraform#10462 - but unfortunately we're on version 0.12.

Workaround

Found a workaround (see Reference below):

The key insight here is that variables are nodes in the dependency graph too, so we can use them as a “hub” for passing dependencies across the module boundary.

Example

In module A (e.g. the module containing the aws_appautoscaling_target resource):

  1. Add the following to variables.tf:
variable "mod_depends_on" {
  type    = any
  default = null
}
  2. Add depends_on to the aws_appautoscaling_target resource (as depends_on can refer directly to variables in Terraform 0.12):
resource "aws_appautoscaling_target" "target" {
  ...
  depends_on = [var.mod_depends_on]
}

In the root module, where module A is instantiated (and module B contains the aws_ecs_service):

Just pass module B in as the dependency:

mod_depends_on = [module.B]

Reference:

https://discuss.hashicorp.com/t/tips-howto-implement-module-depends-on-emulation/2305

TL;DR

  • ElasticSearch Backup (snapshot / restore) on AWS S3

  • Steps / Configurations for ES snapshot / restore

  • Use elastic curator to manage snapshots (create / remove)

  • Docker image for ES Curator to manage Elasticsearch snapshots - davidlu1001/docker-curator

Overview

The purpose of this blog is to investigate possible solutions for backing up and restoring ES indices, so that in the event of a failure the cluster data can be restored quickly and the business impact minimised.
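
As a quick preview, registering an S3 snapshot repository looks roughly like this (a minimal sketch: the repository name and bucket are placeholders, and the repository-s3 plugin is assumed to be installed):

curl -s -XPUT 'http://localhost:9200/_snapshot/<S3_BUCKET_FOR_SNAPSHOT_REPOSITORY>' \
  -H 'Content-Type: application/json' \
  -d '{"type": "s3", "settings": {"bucket": "<S3_BUCKET_FOR_SNAPSHOT_REPOSITORY>"}}'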

Read more »

Steps

1. Check existing terraform version

terraform -v

2. Upgrade to Terraform 0.11 first (if applicable)

If the current version is older than 0.11, upgrade to 0.11.14 first; if you are already on 0.11.x, make sure you are on exactly 0.11.14.
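
If you manage Terraform versions with tfenv (just one option; any installation method works), pinning the version might look like:

tfenv install 0.11.14
tfenv use 0.11.14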

3. Pre-upgrade Checklist

Terraform v0.11.14 introduced a temporary helper command terraform 0.12checklist, which analyses the configuration to detect any required steps that will be easier to perform before upgrading.
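
The command itself is simply:

terraform 0.12checklist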

4. Initialisation in 0.11

terraform init

5. Plan and make sure no errors are shown

terraform plan

6. Apply

Run apply to ensure that your real infrastructure and Terraform state are consistent with the current configuration.

terraform apply
Read more »

What are AWS Savings Plans

AWS Savings Plans were launched in November 2019, and allow customers to save up to 72% on Amazon EC2 / AWS Fargate in exchange for committing to a consistent amount of compute usage (expressed as spend per hour) for a 1- or 3-year term.

Customers can choose how much they wish to commit to (minimum $0.001 per hour) and layer Savings Plans on top of one another.
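
For example, a commitment of $1 per hour on a 1-year plan works out to roughly 8,760 hours × $1 = $8,760 of committed spend over the term, which is billed whether or not matching compute usage actually occurs.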

The major difference between Reserved Instances and AWS Savings Plans is that, rather than committing to a specific instance type in return for a discount, you are committing to a specific spend per hour.

It’s important to note that at this time, AWS doesn’t allow customers to change their Savings Plans contract once purchased or sell unused discounts in the AWS Marketplace. Once customers commit to a Savings Plan price, they are locked in for the one or three years they committed to.

There are two types of AWS Savings Plans:

  • EC2 Instance Savings Plans: similar to Standard RIs

  • Compute Savings Plans: share attributes with Convertible RIs, with the added bonus that discounts can also be applied to the Fargate container service.

These two types provide a choice between maximising the financial benefit while sacrificing flexibility, or maximising flexibility while accepting a smaller discount.

Here’s a quick comparison of the two types:

Read more »

We can send custom metrics to Datadog without any of the DogStatsD client libraries.

Generally, DogStatsD creates a message that contains information about your metric, event, or service check and sends it to a locally installed Agent, which acts as the collector. The destination IP address is 127.0.0.1 and the collector listens on UDP port 8125.

Based on the official doc of Datadog, here’s the raw datagram format for metrics, events, and service checks that DogStatsD accepts:

<METRIC_NAME>:<VALUE>|<TYPE>|@<SAMPLE_RATE>|#<TAG_KEY_1>:<TAG_VALUE_1>,<TAG_2>

So we could use nc or socat on an individual host:

e.g.

# use nc

echo -n "kafka.partition_size:123|g|#topic_name:test,partition_number:1,broker_id:28041,hostname:kafka-12345" | nc -w 1 -cu localhost 8125
Read more »

Slide uploaded on 01/Jul/2020

Updated on 01/May/2020

There are two other areas where it is possible to reduce AWS cost:

  1. Considering migrating from Classic Load Balancer to Application Load Balancer (technical debt)

  2. Using AWS Savings Plans: please refer to my article about AWS Savings Plans Overview


Rather than write a big, manual-style cost optimization guide, I'd like to share a few pitfalls I've encountered during the process.

Tools

Common tools for AWS cost optimization are as follows:

  1. AWS Cost Explorer
  2. Cost Reports in S3
  3. AWS Trusted Advisor - Cost Optimization

Strategies

Based on AWS Cost Optimization best practices, the main measures fall into the following areas:

  1. Right Sizing: Use a more appropriate (convenient) Instance Type / Family (for EC2 / RDS)
  2. Price models: leverage Reserved Instances (RI) and Spot Instances (SI)
  3. Delete / Stop unused resources: e.g. EBS Volume / Snapshot, EC2 / RDS / EIP / ELB, etc. (see the example after this list)
  4. Storage Tier / Backup Policy: Move cold data to cheaper storage tiers like Glacier; Review EBS Snapshot / RDS backup policy
  5. Right Tagging: Enforce allocation tagging, while improving Tag coverage and accuracy
  6. Scheduling On / Off times: Review existing Auto Scaling policies; Stop instances used in Dev and Prod when not in use and start them again when needed
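
As a small illustration of strategy 3, unattached EBS volumes can be listed with the AWS CLI as deletion candidates (the query fields shown are just one possible selection):

aws ec2 describe-volumes --filters Name=status,Values=available \
  --query 'Volumes[*].{ID:VolumeId,SizeGiB:Size,AZ:AvailabilityZone}' \
  --output table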
Read more »

Elasticsearch can be a beast to manage. Knowing the most-used endpoints by heart during outages or routine maintenance can be as challenging as it is time-consuming. Because of this, this How-To article will lay out some handy commands. In theory you could run them from any given node (data, client or master); however, I'd recommend running them from a master node.

ssh into any master node (pro-tip: master instances are the ones within the master autoscaling group)

e.g.

ssh elastic-master-<instance-id>.<aws-region>.x.y.com

Handy aliases

alias tail-es='tail -500f /var/log/elasticsearch/<ES_NAME>/<ES_NAME>.log'
alias cat-nodes='curl -s -XGET http://localhost:9200/_cat/nodes?v'
alias cat-shards='curl -s -XGET http://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason'
alias cat-allocation='curl -s -XGET http://localhost:9200/_cat/allocation?v'
alias cat-indices='curl -s -XGET http://localhost:9200/_cat/indices?v'
alias get-templates='curl -s -XGET http://localhost:9200/_template?pretty'
alias cat-settings='curl -s -XGET http://localhost:9200/_cluster/settings/?pretty'
alias cat-settings-all='curl -s -XGET '\''http://localhost:9200/_cluster/settings?include_defaults=true&pretty'\'' | jq .'
alias cluster-status='curl -s -XGET http://localhost:9200/_cluster/health?pretty'
alias check-status='while true; do sleep 5; cluster-status | grep status; done'
alias disable-allocation='curl -XPUT localhost:9200/_cluster/settings -d '\''{"transient" : {"cluster.routing.allocation.enable" : "none"}}'\'''
alias enable-allocation='curl -XPUT localhost:9200/_cluster/settings -d '\''{"transient" : {"cluster.routing.allocation.enable" : "all"}}'\'''
alias cat-shards-unassigned='curl -s -XGET http://localhost:9200/_cat/shards | grep UNASSIGNED | awk '\''{print $1}'\'''
alias del-shards-unassigned='curl -s -XGET http://localhost:9200/_cat/shards | grep UNASSIGNED | awk '\''{print $1}'\'' | xargs -i curl -s -XDELETE "http://localhost:9200/{}"'
alias cat-snapshot='curl -s -XGET localhost:9200/_snapshot/<S3_BUCKET_FOR_SNAPSHOT_REPOSITORY>/_all?pretty | jq .snapshots[-1]'
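
On Elasticsearch 5.x and newer, the cluster allocation explain API is also worth keeping at hand (not part of the original alias list; shown here as an optional extra) to see why the first unassigned shard isn't being allocated:

curl -s -XGET 'http://localhost:9200/_cluster/allocation/explain?pretty'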
Read more »

Introduction

This blog describes performance testing results of the continuous delivery (CD) workflow. Testing was performed to answer two questions:

  1. Which Docker storage engine is optimal for the continuous delivery workflow?
  2. Which EC2 instance types perform best in terms of price and performance?

Test Conditions

All testing was performed with the following conditions:

  • Ubuntu 15.10
  • Docker 1.10.3
  • Docker Compose 1.6.2
  • All required Docker images cached in a local registry mirror to minimise network variations
  • All required Python wheels cached in a local Devpi mirror to minimise network variations
  • Empty docker data volume (/var/lib/docker) as starting state for each test

Testing was performed for the internal web application. The test stage that was executed includes the following tasks (a rough sketch of the corresponding commands follows the list):

  • Pull base image
  • Build development image - this includes installing OS packages, building Python wheels and installing Python wheels
  • Run unit and integration tests
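
For reference, the test stage boils down to something like the following sketch; the image and service names are hypothetical and will differ per project:

# rough sketch of the test stage (image / service names are hypothetical)
docker pull registry-mirror:5000/base-image:latest   # pull base image via the local registry mirror
docker-compose build app                             # build development image: OS packages + Python wheels
docker-compose run --rm app python -m pytest         # run unit and integration tests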
Read more »

This post documents important settings we shouldn't miss in Elasticsearch clusters.

Overview

Cluster settings for ES

We can use the Kibana Dev Tools console to view and update cluster settings:

{
  "persistent" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "cluster_concurrent_rebalance" : "12",
          "node_concurrent_recoveries" : "6",
          "disk" : {
            "watermark" : {
              "high" : "85%"
            }
          },
          "enable" : "all"
        }
      }
    },
    "indices" : {
      "recovery" : {
        "max_bytes_per_sec" : "640mb"
      }
    }
  },
  "transient" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "cluster_concurrent_rebalance" : "12",
          "include" : {
            "_name" : "elastic-data-*",
            "_ip" : ""
          },
          "node_concurrent_recoveries" : "6",
          "balance" : {
            "threshold" : "1.0f"
          },
          "enable" : "all"
        }
      }
    },
    "indices" : {
      "recovery" : {
        "max_bytes_per_sec" : "640mb"
      }
    }
  }
}
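
The same settings can also be applied with curl instead of Kibana; here is a minimal sketch updating a single transient setting (endpoint and value shown are illustrative):

curl -s -XPUT 'http://localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{"transient": {"cluster.routing.allocation.cluster_concurrent_rebalance": "12"}}'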
Read more »

First we need to create an RDS database account (user) within the database and associate it with one or more IAM identities for IAM database authentication. Below are just sample policies that the module will create behind the scenes. An example of this can be found here.

sample policy granting access to a database instance

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "rds-db:connect"
      ],
      "Resource": [
        "arn:aws:rds-db:us-west-2:123456789012:dbuser:db-12ABC34DEFG5HIJ6KLMNOP78QR/david_lu"
      ]
    }
  ]
}

sample policy granting access to a cluster

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "rds-db:connect"
      ],
      "Resource": [
        "arn:aws:rds-db:us-west-2:123456789012:dbuser:cluster-CO4FHMOYDKJ7CVBEJS2UWDQX7I/david_lu"
      ]
    }
  ]
}
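
Once a policy like this is attached to the IAM identity and the database user is enabled for IAM authentication (e.g. GRANT rds_iam TO david_lu; on PostgreSQL), connecting might look roughly like the sketch below; the endpoint, database name and region are placeholders:

# generate a short-lived IAM auth token and connect (PostgreSQL assumed)
TOKEN=$(aws rds generate-db-auth-token \
  --hostname mydb.123456789012.us-west-2.rds.amazonaws.com \
  --port 5432 \
  --username david_lu \
  --region us-west-2)

psql "host=mydb.123456789012.us-west-2.rds.amazonaws.com port=5432 dbname=mydb user=david_lu password=$TOKEN sslmode=require"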
Read more »