<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Bernie Ops]]></title><description><![CDATA[Bernie Ops]]></description><link>https://bernieops.com</link><generator>RSS for Node</generator><lastBuildDate>Fri, 17 Apr 2026 11:46:00 GMT</lastBuildDate><atom:link href="https://bernieops.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[How to Rescue Your Kubernetes Cluster with etcd Backups]]></title><description><![CDATA[How to perform a backup of the etcd datastore
To back up the cluster store, or etcd, we can create a snapshot file using the CLI tool etcdctl. This lab assumes a successful installation of the etcdctl tool and that prior knowledge of what etcd is and...]]></description><link>https://bernieops.com/how-to-rescue-your-kubernetes-cluster-with-etcd-backups</link><guid isPermaLink="true">https://bernieops.com/how-to-rescue-your-kubernetes-cluster-with-etcd-backups</guid><category><![CDATA[Kubernetes]]></category><dc:creator><![CDATA[Bernie Camejo]]></dc:creator><pubDate>Fri, 25 Apr 2025 10:12:02 GMT</pubDate><content:encoded><![CDATA[<h3 id="heading-how-to-perform-a-backup-of-the-etcd-datastore"><strong>How to perform a backup of the etcd datastore</strong></h3>
<p>To back up the cluster store, or <code>etcd</code>, we can create a snapshot file using the CLI tool <code>etcdctl</code>. This lab assumes a successful installation of the <code>etcdctl</code> tool and prior knowledge of what <code>etcd</code> is and its purpose.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># First perform the backup with snapshot option </span>
$ etcdctl snapshot save etcd-backup --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key

<span class="hljs-comment"># ... Output omitted</span>
Snapshot saved at etcd-backup
</code></pre>
<ul>
<li><code>--cacert</code> verifies the etcd server certificate against the Kubernetes Certificate Authority (CA)</li>
<li><code>--cert</code> identifies our client to etcd using the <code>etcd</code> server certificate</li>
<li><code>--key</code> is the private key file that matches that certificate</li>
</ul>
<h3 id="heading-restoring-the-etcd-backup"><strong>Restoring the etcd backup</strong></h3>
<p>To restore the backup we again use the <code>etcdctl</code> CLI tool and the <code>snapshot</code> command. What's key in this task is that the backup will be <strong>restored to a new</strong> <code>etcd</code> <strong>data directory</strong>. That's why we use the <code>--data-dir</code> option with the command.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Second perform the restore operation</span>
<span class="hljs-comment"># The command here will restore the backup to the /var/lib/etcd-backup directory</span>
$ etcdctl snapshot restore etcd-backup --data-dir /var/lib/etcd-backup

<span class="hljs-comment"># ... Output omitted</span>
</code></pre>
<h3 id="heading-change-the-location-of-the-etcd-data"><strong>Change the location of the etcd data</strong></h3>
<p>Once the backup and restore operations are completed, the next step is to change the location where Kubernetes looks for the <code>etcd</code> data.</p>
<p>To do this, we need to change the YAML file for the <code>etcd.yaml</code> manifest which is located in <code>/etc/kubernetes/manifests/</code>. Why in this directory? Because any manifest placed in this directory is run as a <strong>static pod</strong> by the <code>kubelet</code>, which watches the directory for changes.</p>
<p>The part of the file that needs to be changed is at the bottom.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">volumes:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">hostPath:</span>
    <span class="hljs-attr">path:</span> <span class="hljs-string">/etc/kubernetes/pki/etcd</span>
    <span class="hljs-attr">type:</span> <span class="hljs-string">DirectoryOrCreate</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">etcd-certs</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">hostPath:</span>
    <span class="hljs-attr">path:</span> <span class="hljs-string">/var/lib/etcd-backup</span> <span class="hljs-comment"># &lt;--- This is the directory where we stored the snapshot</span>
    <span class="hljs-attr">type:</span> <span class="hljs-string">DirectoryOrCreate</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">etcd-data</span>
</code></pre>
<p>With that done, the <code>kubelet</code> picks up the modified manifest and restarts the <code>etcd</code> static pod pointing at the restored data directory. Once <code>etcd</code> is back up, the API server serves the restored cluster data.</p>
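<p>For context, only the <code>hostPath</code> entry changes; inside the pod the volume is still mounted at the original path, so the <code>--data-dir</code> flag further up in the manifest stays the same. A sketch of that part of <code>etcd.yaml</code>, with values assumed from a typical <code>kubeadm</code> setup:</p>
<pre><code class="lang-yaml">containers:
- command:
  - etcd
  - --data-dir=/var/lib/etcd     # unchanged: the path inside the pod
  # ... other flags omitted
  volumeMounts:
  - mountPath: /var/lib/etcd     # unchanged: where the etcd-data volume is mounted
    name: etcd-data
</code></pre>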
]]></content:encoded></item><item><title><![CDATA[ReplicaSets in Kubernetes: Core Building Blocks for Application Scaling]]></title><description><![CDATA[ReplicaSets are Kubernetes objects, like everything else in k8s. Their primary function is to ensure that a stable number of replica pods are always running in the cluster in accordance with the spec in the YAML file. For this reason, they are instrumen...]]></description><link>https://bernieops.com/replicasets-in-kubernetes-core-building-blocks-for-application-scaling</link><guid isPermaLink="true">https://bernieops.com/replicasets-in-kubernetes-core-building-blocks-for-application-scaling</guid><category><![CDATA[Kubernetes]]></category><dc:creator><![CDATA[Bernie Camejo]]></dc:creator><pubDate>Fri, 18 Apr 2025 03:06:57 GMT</pubDate><content:encoded><![CDATA[<p>ReplicaSets are Kubernetes objects, like everything else in k8s. Their primary function is to ensure that a stable number of replica pods are always running in the cluster in accordance with the spec in the YAML file. For this reason, they are instrumental in providing high availability for your application. Typically, you don't work directly with ReplicaSets but rather with Deployments, which are a higher-level method of ensuring the current state matches the desired state of the cluster.</p>
<h2 id="heading-obtaining-information"><strong>Obtaining information</strong></h2>
<p>The most basic method to list the number of ReplicaSets is using <code>kubectl</code>.</p>
<pre><code class="lang-bash">$ kubectl get rs
NAME              DESIRED   CURRENT   READY   AGE
new-replica-set   4         4         0       5m12s
</code></pre>
<p>Here we can see that the ReplicaSet spec desires 4 pods and 4 have been created, although none have reached the Ready state yet. Even if we deleted one of the pods with <code>kubectl delete pod &lt;name-of-pod&gt;</code>, the ReplicaSet would ensure another pod is started to match the number of replicas, i.e., 4.</p>
<h2 id="heading-creating-a-replicaset"><strong>Creating a ReplicaSet</strong></h2>
<p>Like most objects in k8s, the easiest way to create a ReplicaSet is using a YAML file.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ReplicaSet</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">replica-set-1</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">3</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">tier:</span> <span class="hljs-string">frontend</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">metadata:</span>
      <span class="hljs-attr">labels:</span>
        <span class="hljs-attr">tier:</span> <span class="hljs-string">frontend</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">nginx</span>
        <span class="hljs-attr">image:</span> <span class="hljs-string">nginx</span>
</code></pre>
<p>Once the YAML has been defined, it's very simple to create the ReplicaSet.</p>
<pre><code class="lang-bash">$ kubectl apply -f replica-set.yaml
replicaset.apps/replica-set-1 created
</code></pre>
<h2 id="heading-scaling-a-replicaset"><strong>Scaling a ReplicaSet</strong></h2>
<p>ReplicaSets can be scaled up or down to fit your operational demands. The ReplicaSet controller will determine which pod(s) to delete in the case of a scale-down event, or where to run a new pod in the case of a scale-up event.</p>
<pre><code class="lang-bash">$ kubectl scale rs replica-set-1 --replicas=5
</code></pre>
<p>The flag <code>--replicas=5</code> tells Kubernetes to scale the ReplicaSet to 5 pods. Whether that is a scale up or a scale down depends on the current count; the command is the same either way, which is useful and efficient.</p>
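<p>The same change can be made declaratively, which keeps the manifest as the source of truth: edit the <code>replicas</code> field in the YAML from the previous section and re-apply it. A sketch, assuming the file is named <code>replica-set.yaml</code>:</p>
<pre><code class="lang-yaml"># replica-set.yaml -- only this field changes
spec:
  replicas: 5   # was 3; then run: kubectl apply -f replica-set.yaml
</code></pre>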
]]></content:encoded></item><item><title><![CDATA[Kubernetes Administration 101: Basic Cluster Tasks Every Admin Should Know]]></title><description><![CDATA[Examining pods
Using kubectl we can examine the pods running in the cluster. The simplest way is to run this command.
# Simple command to examine the pods
$ kubectl get po

# To get information about the pods running in a specific namespace and showi...]]></description><link>https://bernieops.com/kubernetes-administration-101-basic-cluster-tasks-every-admin-should-know</link><guid isPermaLink="true">https://bernieops.com/kubernetes-administration-101-basic-cluster-tasks-every-admin-should-know</guid><category><![CDATA[Kubernetes]]></category><dc:creator><![CDATA[Bernie Camejo]]></dc:creator><pubDate>Wed, 16 Apr 2025 11:30:03 GMT</pubDate><content:encoded><![CDATA[<h2 id="heading-examining-pods"><strong>Examining pods</strong></h2>
<p>Using <code>kubectl</code> we can examine the pods running in the cluster. The simplest way is to run this command.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Simple command to examine the pods</span>
$ kubectl get po

<span class="hljs-comment"># To get information about the pods running in a specific namespace and showing their IP addresses</span>
<span class="hljs-comment"># in this example the kube-system namespace</span>
$ kubectl get po -n kube-system -o wide
</code></pre>
<p>By using the <code>-o wide</code> option we can get additional information, including a column specifying the IP addresses.</p>
<h2 id="heading-performing-a-backup-of-etcd"><strong>Performing a backup of etcd</strong></h2>
<p>The datastore etcd, also known as the cluster store, has all the stateful configuration information about our cluster. One common operation is to back up the etcd datastore using a CLI utility called <code>etcdctl</code>.</p>
<h3 id="heading-pre-requisite"><strong>Pre-requisite</strong></h3>
<ul>
<li>Have <code>etcdctl</code> installed</li>
</ul>
<pre><code class="lang-bash">$ sudo apt update &amp;&amp; sudo apt install -y etcd-client
<span class="hljs-comment"># Following that we can perform a snapshot backup of the etcd datastore</span>
$ etcdctl snapshot save snapshotdb --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key
<span class="hljs-comment"># ... output omitted</span>
Snapshot saved at snapshotdb
</code></pre>
<p>The above command creates a snapshot in the current directory called <code>snapshotdb</code>. To check the status of the snapshot you can use <code>$ etcdctl snapshot status snapshotdb --write-out=table</code>. This will generate a table like the one below.</p>
<pre><code class="lang-bash">+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| e4f4157c |   102366 |        817 |     2.2 MB |
+----------+----------+------------+------------+
</code></pre>
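<p>Because the status output is plain text, the fields can be pulled out with standard tools if you need them in a script. A small sketch using the sample table above (the variable name and field position are my own assumptions); it prints <code>817</code>, the TOTAL KEYS value:</p>
<pre><code class="lang-bash"># Sample output of: etcdctl snapshot status snapshotdb --write-out=table
status_table='
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| e4f4157c |   102366 |        817 |     2.2 MB |
+----------+----------+------------+------------+'

# Extract TOTAL KEYS (4th pipe-delimited column of the data row)
echo "$status_table" | awk -F'|' '/e4f4157c/ { gsub(/ /, "", $4); print $4 }'
</code></pre>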
<h2 id="heading-restoring-etcd-from-snapshot"><strong>Restoring etcd from snapshot</strong></h2>
<p>If needed, we can use etcdctl to restore the backed up version of the datastore. The restore operation writes the data to a <strong>new</strong> directory that etcd is not currently using, for example <code>/var/lib/etcd-restore</code>. Afterwards, etcd is pointed at that directory by editing the <code>etcd.yaml</code> file in the <code>/etc/kubernetes/manifests</code> directory.</p>
<pre><code class="lang-bash">$ etcdctl snapshot restore &lt;snapshot-name&gt; --data-dir /var/lib/etcd-restore
</code></pre>
<h2 id="heading-upgrading-kubernetes-version"><strong>Upgrading Kubernetes version</strong></h2>
<p>The last task we will perform in this lab is upgrading the cluster using <code>kubeadm</code>. For example, if in the exam there's a question about a company needing to upgrade the Kubernetes control plane to version X, then we know that <code>kubeadm</code> is the tool to use. The first thing to do is to view the versions of our control plane components.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># This will show a list of control plane components and their current version</span>
<span class="hljs-comment"># as well as the version to which we can upgrade</span>
$ kubeadm upgrade plan

COMPONENT                 NODE                      CURRENT    TARGET
kube-apiserver            acing-cka-control-plane   v1.32.2    v1.32.3
kube-controller-manager   acing-cka-control-plane   v1.32.2    v1.32.3
kube-scheduler            acing-cka-control-plane   v1.32.2    v1.32.3
kube-proxy                                          1.32.2     v1.32.3
CoreDNS                                             v1.11.3    v1.11.3
etcd                      acing-cka-control-plane   3.5.16-0   3.5.16-0
</code></pre>
<p>The output of the plan includes the exact command needed to perform the upgrade: <code>$ kubeadm upgrade apply v1.32.3</code></p>
]]></content:encoded></item><item><title><![CDATA[How to create Kubernetes YAML files the smart way with kubectl]]></title><description><![CDATA[To save time, and ensure the syntax and formatting are correct, a more efficient way to create a YAML file with the specs for a pod is to use kubectl and the --dry-run flag.
$ kubectl run pod --image nginx --dry-run=client -o yaml > nginx-pod.yaml

T...]]></description><link>https://bernieops.com/how-to-create-kubernetes-yaml-files-the-smart-way-with-kubectl</link><guid isPermaLink="true">https://bernieops.com/how-to-create-kubernetes-yaml-files-the-smart-way-with-kubectl</guid><category><![CDATA[Kubernetes]]></category><dc:creator><![CDATA[Bernie Camejo]]></dc:creator><pubDate>Mon, 14 Apr 2025 10:01:20 GMT</pubDate><content:encoded><![CDATA[<p>To save time, and ensure the syntax and formatting are correct, a more efficient way to create a YAML file with the specs for a pod is to use <code>kubectl</code> and the <code>--dry-run</code> flag.</p>
<pre><code class="lang-bash">$ kubectl run pod --image nginx --dry-run=client -o yaml &gt; nginx-pod.yaml
</code></pre>
<p>This will let <code>kubectl</code> write the YAML file. The main benefits of doing it this way are:</p>
<ol>
<li><p>You can modify the YAML template <em>before</em> having to deploy it to the Kubernetes cluster.</p>
</li>
<li><p>You can modify the template to add volumes, env variables and other configurations.</p>
</li>
<li><p>The formatting of the YAML file will be correct and you reduce the chances of syntax errors.</p>
</li>
</ol>
<p>The <code>-o yaml</code> flag tells <code>kubectl</code> to output the definition in YAML format.</p>
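<p>The workflow in practice is generate, edit, then apply. Below is a runnable sketch in which a hand-written manifest stands in for the <code>kubectl</code> output (abbreviated; the <code>/tmp</code> path and the <code>APP_ENV</code> variable are illustrative):</p>
<pre><code class="lang-bash"># Stand-in for what `kubectl run pod --image nginx --dry-run=client -o yaml`
# would generate (fields abbreviated)
printf '%s\n' \
  'apiVersion: v1' \
  'kind: Pod' \
  'metadata:' \
  '  name: pod' \
  'spec:' \
  '  containers:' \
  '  - image: nginx' \
  '    name: pod' | tee /tmp/nginx-pod.yaml

# Example tweak before deploying: add an environment variable to the container
printf '%s\n' \
  '    env:' \
  '    - name: APP_ENV' \
  '      value: dev' | tee -a /tmp/nginx-pod.yaml

# kubectl apply -f /tmp/nginx-pod.yaml   # then create the pod from the edited file
</code></pre>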
]]></content:encoded></item><item><title><![CDATA[Building a Kubernetes Cluster from Scratch: Part 2 - Installing Container Runtime and Kubernetes Components]]></title><description><![CDATA[This is the second part of my Kubernetes installation series using kubeadm. In Part 1, I covered the environment preparation, including hardware requirements and network configuration, as well as required Linux kernel modules.
In this Part 2, I'll co...]]></description><link>https://bernieops.com/building-a-kubernetes-cluster-from-scratch-part-2-installing-container-runtime-and-kubernetes-components</link><guid isPermaLink="true">https://bernieops.com/building-a-kubernetes-cluster-from-scratch-part-2-installing-container-runtime-and-kubernetes-components</guid><category><![CDATA[Kubernetes]]></category><dc:creator><![CDATA[Bernie Camejo]]></dc:creator><pubDate>Thu, 10 Apr 2025 11:07:55 GMT</pubDate><content:encoded><![CDATA[<p>This is the second part of my Kubernetes installation series using <code>kubeadm</code>. In <a target="_blank" href="https://bernieops.com/building-a-kubernetes-cluster-from-scratch-part-1-environment-setup">Part 1,</a> I covered the environment preparation, including hardware requirements and network configuration, as well as required Linux kernel modules.</p>
<p>In this Part 2, I'll continue by installing and configuring <code>containerd</code> as my container runtime, and by installing the essential Kubernetes components: kubeadm, kubelet, and kubectl.</p>
<h2 id="heading-installing-the-container-runtime">Installing the container runtime</h2>
<p>Kubernetes is a container orchestration system, and as such it needs a <strong>container runtime</strong> responsible for running the containers in the cluster. The container runtime pulls the images, starts and stops the containers, and reports container status back to Kubernetes.</p>
<p>Before installing the container runtime, we need to download the GPG keys to ensure the packages haven't been tampered with, and <code>apt</code> will need this information later as well.</p>
<pre><code class="lang-bash">root@cp:~<span class="hljs-comment"># curl -fsSL https://download.docker.com/linux/ubuntu/gpg \</span>
| gpg --dearmor -o /etc/apt/keyrings/docker.gpg
root@cp:~<span class="hljs-comment">#</span>
root@cp:~<span class="hljs-comment"># echo \</span>
<span class="hljs-string">"deb [arch=<span class="hljs-subst">$(dpkg --print-architecture)</span> signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu \
<span class="hljs-subst">$(lsb_release -cs)</span> stable"</span> | tee /etc/apt/sources.list.d/docker.list &gt; /dev/null
</code></pre>
<p>Now we install containerd</p>
<pre><code class="lang-bash">root@cp:~<span class="hljs-comment"># apt-get update &amp;&amp; apt-get install containerd.io -y</span>
root@cp:~<span class="hljs-comment"># containerd config default | tee /etc/containerd/config.toml</span>
root@cp:~<span class="hljs-comment"># systemctl restart containerd</span>
</code></pre>
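<p>One adjustment worth checking in the generated <code>config.toml</code> (an extra step on my part, not shown above): on <code>kubeadm</code> clusters the kubelet defaults to the systemd cgroup driver, and containerd's <code>runc</code> runtime should match it. In the default config this key is set to <code>false</code>:</p>
<pre><code class="lang-toml"># /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
</code></pre>
<p>Restart <code>containerd</code> after editing the file for the change to take effect.</p>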
<h2 id="heading-installing-the-kubernetes-software">Installing the Kubernetes software</h2>
<p>First, like with containerd, we need to download the public key for the package repositories.</p>
<pre><code class="lang-bash">root@cp:~<span class="hljs-comment"># curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.31/deb/Release.key \</span>
| sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

<span class="hljs-comment"># Add the appropriate k8s repository</span>
root@cp:~<span class="hljs-comment"># echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] \</span>
https://pkgs.k8s.io/core:/stable:/v1.31/deb/ /<span class="hljs-string">" \
| tee /etc/apt/sources.list.d/kubernetes.list</span>
</code></pre>
<p>Finally, update the repo and install the Kubernetes software</p>
<pre><code class="lang-bash">root@cp:~<span class="hljs-comment"># apt-get update</span>
root@cp:~<span class="hljs-comment"># apt-get install -y kubeadm=1.31.1-1.1 kubelet=1.31.1-1.1 kubectl=1.31.1-1.1</span>
</code></pre>
<p>In the next part of this series, I will set a local DNS alias for my <code>cp</code> Kubernetes server, assigning it the name <code>k8scp</code>. Then I will create a configuration file for the cluster.</p>
]]></content:encoded></item><item><title><![CDATA[Building a Kubernetes Cluster from Scratch - Part 1: Environment Setup]]></title><description><![CDATA[There are multiple tools to install Kubernetes. A community supported tool is kubeadm and this lab uses it to install and build a Kubernetes cluster.
Connecting to future control plane and worker nodes
I'm using AWS to launch two instances that will ...]]></description><link>https://bernieops.com/building-a-kubernetes-cluster-from-scratch-part-1-environment-setup</link><guid isPermaLink="true">https://bernieops.com/building-a-kubernetes-cluster-from-scratch-part-1-environment-setup</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[Linux]]></category><dc:creator><![CDATA[Bernie Camejo]]></dc:creator><pubDate>Tue, 08 Apr 2025 11:36:10 GMT</pubDate><content:encoded><![CDATA[<p>There are multiple tools to install Kubernetes. A community supported tool is <code>kubeadm</code> and this lab uses it to install and build a Kubernetes cluster.</p>
<h2 id="heading-connecting-to-future-control-plane-and-worker-nodes">Connecting to future control plane and worker nodes</h2>
<p>I'm using AWS to launch two instances that will be my Kubernetes cluster. Both instances are Ubuntu 24.04 with 2 vCPUs and 8 GB of RAM. Using my AWS key pair, the next step is to connect to both instances using <code>ssh</code>.</p>
<h3 id="heading-installing-kubernetes">Installing Kubernetes</h3>
<p>The first step is to connect to the first EC2 instance to update and upgrade the control plane or <strong>cp</strong>.</p>
<pre><code class="lang-bash">$ ssh -i /path/to/my-key.pem ubuntu@instance-ip
$ sudo -i
root@cp:~<span class="hljs-comment"># apt update &amp;&amp; apt upgrade -y</span>
</code></pre>
<p>With the system updated, the next step is to install a <strong>container runtime</strong>. A commonly used option, easy to deploy and lightweight, is <code>containerd</code>. Kubernetes deprecated its Docker Engine integration (dockershim) in 2020 and removed it in v1.24, and <code>containerd</code> offers simplicity and focus as it doesn't include the extra tooling that Docker bundles. It is purely a container runtime that supports the Container Runtime Interface, or CRI, as expected by Kubernetes.</p>
<p>We need to install some dependencies.</p>
<pre><code class="lang-bash">root@cp:~<span class="hljs-comment"># apt install apt-transport-https software-properties-common ca-certificates socat -y</span>
</code></pre>
<p>Next, two kernel modules need to be loaded due to Kubernetes infrastructure requirements. They enable core container and networking functionality that Kubernetes needs. The <code>overlay</code> module relates to storage management and the <code>br_netfilter</code> module relates to networking and packet routing.</p>
<pre><code class="lang-bash">root@cp:~<span class="hljs-comment"># modprobe overlay</span>
root@cp:~<span class="hljs-comment"># modprobe br_netfilter</span>
</code></pre>
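<p><code>modprobe</code> only loads the modules for the current boot. To make them persist across reboots, a <code>modules-load.d</code> entry is commonly added as well (the filename here is my own choice):</p>
<pre><code class="lang-bash"># /etc/modules-load.d/k8s.conf -- modules to load at every boot
overlay
br_netfilter
</code></pre>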
<p>Finally, we need to tweak the network routing and policies as Kubernetes relies on <code>iptables</code> rules. Without creating the rules below, network packets may bypass <code>iptables</code> rules and inter-pod communication <strong>would fail</strong>. The below command creates a configuration file <code>kubernetes.conf</code> at <code>/etc/sysctl.d</code> and it's loaded when the system starts. The two lines below, starting with <code>net.bridge…</code> ensure that IPv4 and IPv6 packets follow <code>iptables</code> rules. Finally, <code>ip_forward</code> enables IP forwarding in the Linux kernel.</p>
<pre><code class="lang-bash">root@cp:~<span class="hljs-comment"># cat &lt;&lt; EOF | tee /etc/sysctl.d/kubernetes.conf</span>
&gt; net.bridge.bridge-nf-call-ip6tables = 1
&gt; net.bridge.bridge-nf-call-iptables = 1
&gt; net.ipv4.ip_forward = 1
EOF
</code></pre>
<p>Once those rules have been added, we ensure the changes are loaded by the current kernel.</p>
<pre><code class="lang-bash">root@cp:~<span class="hljs-comment"># sysctl --system</span>
</code></pre>
<p>That's part 1 on this series. The next part will deal with the installation of the <code>containerd</code> runtime and the Kubernetes software itself.</p>
]]></content:encoded></item><item><title><![CDATA[Beyond Docker: Setting Up and Managing Linux Containers with LXC]]></title><description><![CDATA[Linux Containers, or LXC, is an interface for Linux kernel virtualisation. You can create Linux containers that enable persistence and system-level functionality. When comparing LXC to Docker, they serve different use cases. Docker is more suit...]]></description><link>https://bernieops.com/beyond-docker-setting-up-and-managing-linux-containers-with-lxc</link><guid isPermaLink="true">https://bernieops.com/beyond-docker-setting-up-and-managing-linux-containers-with-lxc</guid><category><![CDATA[Linux]]></category><category><![CDATA[Docker]]></category><dc:creator><![CDATA[Bernie Camejo]]></dc:creator><pubDate>Sat, 05 Apr 2025 11:13:33 GMT</pubDate><content:encoded><![CDATA[<p>Linux Containers, or LXC, is an interface for Linux kernel virtualisation. You can create Linux containers that enable persistence and system-level functionality. When comparing LXC to Docker, they serve different use cases. Docker is more suitable for <strong>application containerisation</strong> whereas LXC is more relevant for <strong>system-level functionality</strong> and <strong>persistent environments</strong>.</p>
<h2 id="heading-installing-lxc">Installing LXC</h2>
<pre><code class="lang-bash">$ sudo apt update
$ sudo apt install -y lxc
</code></pre>
<p>Since we need to create an <strong>unprivileged container</strong> (for security reasons), the user that will be attached to this container needs to have permissions to create network devices.</p>
<pre><code class="lang-bash">$ sudo bash -c <span class="hljs-string">'echo &lt;username&gt; veth lxcbr0 10 &gt;&gt; /etc/lxc/lxc-usernet'</span>
$ cat /etc/lxc/lxc-usernet

<span class="hljs-comment"># USERNAME TYPE BRIDGE COUNT</span>
&lt;username&gt; veth lxcbr0 10
</code></pre>
<h2 id="heading-setting-up-the-config-file">Setting up the config file</h2>
<p>The configuration file for <code>lxc</code> may not exist yet, so create the directory inside the <code>.config</code> directory and copy the <code>default.conf</code> located in <code>/etc/lxc/</code>.</p>
<pre><code class="lang-bash">$ mkdir -p ~/.config/lxc
$ cp /etc/lxc/default.conf ~/.config/lxc/default.conf
$ chmod 664 ~/.config/lxc/default.conf
</code></pre>
<p>The configuration file has to be updated with the UID and GID of the unprivileged user. They can both be extracted from the <code>/etc/subuid</code> and <code>/etc/subgid</code> files.</p>
<pre><code class="lang-bash">$ cat /etc/subuid
ubuntu:100000:65536
&lt;username&gt;:165536:65536

$ cat /etc/subgid
ubuntu:100000:65536
&lt;username&gt;:165536:65536

$ <span class="hljs-built_in">echo</span> lxc.idmap = u 0 165536 65536 &gt;&gt; ~/.config/lxc/default.conf
$ <span class="hljs-built_in">echo</span> lxc.idmap = g 0 165536 65536 &gt;&gt; ~/.config/lxc/default.conf

$ cat ~/.config/lxc/default.conf
lxc.net.0.type = veth
lxc.net.0.link = lxcbr0
lxc.net.0.flags = up
lxc.net.0.hwaddr = 00:16:3e:xx:xx:xx
lxc.idmap = u 0 165536 65536
lxc.idmap = g 0 165536 65536
</code></pre>
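<p>The idmap values above are hard-coded. As a small convenience, the range start can be read from the file instead of copying it by hand. A sketch (the helper name and sample file are my own; on a real system you would point it at <code>/etc/subuid</code> or <code>/etc/subgid</code>):</p>
<pre><code class="lang-bash"># Print the first subordinate ID range start for a user
# from a file in subuid/subgid format (username:start:count)
subuid_start() {
  awk -F: -v u="$1" '$1 == u { print $2; exit }' "$2"
}

# Demo against a sample file in the same format
printf '%s\n' 'ubuntu:100000:65536' 'alice:165536:65536' | tee /tmp/subuid.sample
subuid_start alice /tmp/subuid.sample   # prints 165536
</code></pre>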
<h2 id="heading-setting-access-control-list">Setting access control list</h2>
<p>To prevent possible permission errors, we need to set up an access control list on our <code>.local</code> directory.</p>
<pre><code class="lang-bash">$ sudo apt update
$ sudo apt install -y acl
$ setfacl -R -m u:165536:x ~/.<span class="hljs-built_in">local</span>
</code></pre>
<p>This command sets an ACL (Access Control List) for a specific user (UID 165536), recursively on the <code>~/.local</code> directory, giving that user execute (<code>x</code>) permission.</p>
<h2 id="heading-creating-an-unprivileged-container">Creating an unprivileged container</h2>
<p>Once the setup is complete, we can create a container using the <code>download</code> template. This gives us all available images designed to work <strong>without privileges</strong>.</p>
<pre><code class="lang-bash">$ lxc-create --template download --name unpriv-cont-user
</code></pre>
<p>Once the image index is downloaded, the CLI tool will display the available images and wait for the user to provide the distribution, release and architecture required. In this case, <strong>ubuntu</strong>, <strong>jammy</strong> and <strong>amd64</strong> will be used.</p>
<pre><code class="lang-bash">---

Distribution:
ubuntu
Release:
jammy
Architecture:
amd64

Downloading the image index
Downloading the rootfs
Downloading the metadata
The image cache is now ready
Unpacking the rootfs

---
You just created an Ubuntu jammy amd64 (20250404_20:34) container.
</code></pre>
<h2 id="heading-starting-the-container">Starting the container</h2>
<p>Now that the container has been created, we can start it.</p>
<pre><code class="lang-bash">$ lxc-start -n unpriv-cont-user -d
</code></pre>
<p>With the container running we can interact with its environment.</p>
<pre><code class="lang-bash">$ lxc-attach -n unpriv-cont-user
<span class="hljs-comment"># hostname</span>
unpriv-cont-user
<span class="hljs-comment"># exit</span>
$ lxc-stop -n unpriv-cont-user
</code></pre>
]]></content:encoded></item><item><title><![CDATA[Getting Started with Linux cgroups: A Practical Guide]]></title><description><![CDATA[What are cgroups?
They're a Linux kernel feature that allows the allocation, limiting and prioritisation of system resources across processes. They are a foundation for container technologies like Docker and Kubernetes.
The first thing is to install ...]]></description><link>https://bernieops.com/getting-started-with-linux-cgroups-a-practical-guide</link><guid isPermaLink="true">https://bernieops.com/getting-started-with-linux-cgroups-a-practical-guide</guid><category><![CDATA[Linux]]></category><category><![CDATA[Docker]]></category><category><![CDATA[containers]]></category><dc:creator><![CDATA[Bernie Camejo]]></dc:creator><pubDate>Fri, 21 Mar 2025 10:40:30 GMT</pubDate><content:encoded><![CDATA[<h2 id="heading-what-are-cgroups">What are <code>cgroups</code>?</h2>
<p>They're a Linux kernel feature that allows the allocation, limiting and prioritisation of system resources across processes. They are a foundation for container technologies like Docker and Kubernetes.</p>
<p>The first thing is to install <code>cgroup-tools</code>. This package is a collection of command-line utilities for managing and interacting with Linux control groups or <code>cgroups</code>.</p>
<pre><code class="lang-bash">$ sudo apt update
$ sudo apt install -y cgroup-tools
</code></pre>
<p>To test things are working we can run a command like <code>lscgroup</code> which lists all the current <code>cgroups</code> configured on the system. The output is in the format</p>
<pre><code class="lang-bash">controller:path
</code></pre>
<p>The cgroup controller, or subsystem, appears before the colon (e.g., cpu, memory), and the path in the cgroup hierarchy comes after it. We can also list cgroups by process ID or PID. For example:</p>
<pre><code class="lang-bash">$ sudo cat /proc/1/cgroup
</code></pre>
<p>This lists the cgroups associated with the process with PID 1, the first process started at boot.</p>
<pre><code class="lang-bash">bernie@ubuntu:~$ sudo cat /proc/1/cgroup
13:memory:/init.scope
12:devices:/init.scope
11:freezer:/
10:cpuset:/
9:rdma:/
8:perf_event:/
7:pids:/init.scope
6:cpu,cpuacct:/init.scope
5:blkio:/init.scope
4:hugetlb:/
3:misc:/
2:net_cls,net_prio:/
1:name=systemd:/init.scope
0::/init.scope
</code></pre>
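<p>Note that the <code>/proc/&lt;pid&gt;/cgroup</code> format has three colon-separated fields rather than two: the hierarchy ID, the controller list, and the path (the controller field is empty on the cgroup v2 line, <code>0::/init.scope</code>). A small sketch that splits such a line (the function name is my own):</p>
<pre><code class="lang-bash"># Split a line from /proc/PID/cgroup into its controller and path fields
parse_cgroup_line() {
  echo "$1" | awk -F: '{ c = ($2 == "" ? "(none)" : $2); print "controller=" c " path=" $3 }'
}

parse_cgroup_line "13:memory:/init.scope"   # prints controller=memory path=/init.scope
</code></pre>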
]]></content:encoded></item><item><title><![CDATA[Secure Cloud Computing: How to Deploy and Connect to a Google Cloud VM]]></title><description><![CDATA[Pre-requisites

Have a Google Cloud (GCP) account

Some knowledge of SSH and how it works


Step 1: Create SSH keys
We will connect to a GCP virtual server using SSH to ensure secure communication between our local environment and the remote virtual ...]]></description><link>https://bernieops.com/secure-cloud-computing-how-to-deploy-and-connect-to-a-google-cloud-vm</link><guid isPermaLink="true">https://bernieops.com/secure-cloud-computing-how-to-deploy-and-connect-to-a-google-cloud-vm</guid><category><![CDATA[Google]]></category><category><![CDATA[Google Cloud Platform]]></category><dc:creator><![CDATA[Bernie Camejo]]></dc:creator><pubDate>Sat, 15 Mar 2025 11:54:43 GMT</pubDate><content:encoded><![CDATA[<h2 id="heading-pre-requisites">Pre-requisites</h2>
<ol>
<li><p>Have a Google Cloud (GCP) account</p>
</li>
<li><p>Some knowledge of SSH and how it works</p>
</li>
</ol>
<h2 id="heading-step-1-create-ssh-keys">Step 1: Create SSH keys</h2>
<p>We will connect to a GCP virtual server using SSH to ensure secure communication between our local environment and the remote virtual server.</p>
<pre><code class="lang-bash">$ ssh-keygen -t ed25519 -C <span class="hljs-string">"containers"</span>
</code></pre>
<p>This command will generate an <strong>SSH key pair</strong>. To learn more, visit <a target="_blank" href="https://www.ssh.com/academy/ssh/keygen">this page</a>.</p>
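<p>As an optional sanity check, the key pair can also be generated non-interactively and its fingerprint printed. This is a sketch only; the file name <code>demo_key</code> and the empty passphrase are for illustration, not a recommendation for real keys.</p>
<pre><code class="lang-bash"># Generate a throwaway ed25519 key pair with no passphrase (demo only)
ssh-keygen -t ed25519 -C "containers" -f ./demo_key -N "" -q

# Print the fingerprint of the new public key
ssh-keygen -lf ./demo_key.pub
</code></pre>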
<h2 id="heading-step-2-create-a-new-gcp-project">Step 2: Create a new GCP project</h2>
<p>In the console, create a new project to host the Google Compute Engine (GCE) virtual server that we will connect to later.</p>
<h2 id="heading-step-3-configure-a-virtual-network">Step 3: Configure a virtual network</h2>
<p>We need to create a Google VPC network. From the console go to the VPC networks page, and click <code>Create VPC network</code>. Name the VPC something like "container-vpc", and select the <strong>Automatic</strong> subnet creation mode, which only supports IPv4. Don't worry about firewall rules (later step), and click <code>Create</code>.</p>
<h2 id="heading-step-4-create-a-firewall">Step 4: Create a firewall</h2>
<p>Once the virtual network has been created, a firewall rule needs to be created and configured to allow inbound SSH traffic.<br />In our case, we can create a firewall rule named <code>allow-inbound-ssh-traffic</code> that allows TCP traffic on port 22 (the SSH port) from anywhere, using <code>0.0.0.0/0</code> as the source IP range.</p>
<h2 id="heading-step-5-start-a-new-vm-instance">Step 5: Start a new VM instance</h2>
<p>When creating a VM on Google Cloud, E2 instances are recommended for standard workloads like web servers, small-to-medium databases, development environments, and microservices that don't require specific hardware features.<br />In terms of settings, it's important to attach the network created in step 3 to the new instance. For this lab, we have created an Ubuntu 20.04 LTS server and configured the SSH key created in step 1. Once the settings are finalised, we start the instance.</p>
<h2 id="heading-step-6-test-connectivity">Step 6: Test connectivity</h2>
<p>We could connect to the new instance using Google's <code>SSH-in-browser</code> feature. However, this in-browser terminal may not offer all the features we would expect when remotely managing a VM server, so we connect with a local SSH client instead.</p>
<pre><code class="lang-bash">$ ssh &lt;username&gt;@&lt;public_ip&gt;
The authenticity of host <span class="hljs-string">'xx.xx.xxx.xx'</span> can<span class="hljs-string">'t be
established.
ECDSA key fingerprint is
SHA256: &lt;some text&gt;
Are you sure you want to continue connecting (yes/no/[fingerprint])?
$ yes</span>
</code></pre>
<p>This last step should connect us to our remote cloud-hosted Ubuntu server.</p>
<pre><code class="lang-bash">user@ubuntu:~$
</code></pre>
]]></content:encoded></item><item><title><![CDATA[Mastering CloudFormation: Hands-on Detection and Remediation of Infrastructure Drift]]></title><description><![CDATA[Overview
This is a demo of how to create a stack using AWS CloudFormation, detect drift in the stack, and perform a stack update. The demo task is creating an environment for a dev team, who asked for an Apache server with HTTP access. The stack cons...]]></description><link>https://bernieops.com/mastering-cloudformation-hands-on-detection-and-remediation-of-infrastructure-drift</link><guid isPermaLink="true">https://bernieops.com/mastering-cloudformation-hands-on-detection-and-remediation-of-infrastructure-drift</guid><category><![CDATA[AWS]]></category><category><![CDATA[#IaC]]></category><category><![CDATA[cloudformation]]></category><dc:creator><![CDATA[Bernie Camejo]]></dc:creator><pubDate>Sat, 08 Mar 2025 07:57:29 GMT</pubDate><content:encoded><![CDATA[<h2 id="heading-overview">Overview</h2>
<p>This is a demo of how to create a stack using AWS CloudFormation, detect drift in the stack, and perform a stack update. The demo task is creating an environment for a dev team, who asked for an Apache server with HTTP access. The stack consists of the following:</p>
<ul>
<li><p>A dedicated VPC</p>
</li>
<li><p>Single public subnet</p>
</li>
<li><p>Amazon EC2 instance</p>
</li>
</ul>
<h2 id="heading-pre-requisites">Pre-requisites</h2>
<ul>
<li><p>Have AWS CLI installed</p>
</li>
<li><p>Have an AWS account</p>
</li>
</ul>
<pre><code class="lang-dockerfile">$ aws --version
$ aws-cli/<span class="hljs-number">2.17</span>.<span class="hljs-number">18</span> Python/<span class="hljs-number">3.9</span>.<span class="hljs-number">20</span> ...
</code></pre>
<h2 id="heading-details">Details</h2>
<h3 id="heading-step-1-create-the-required-parameters">Step 1: Create the required parameters</h3>
<p>Parameters are reusable inputs that add flexibility to IaC templates. They are commonly used to specify property values of stack resources.</p>
<pre><code class="lang-dockerfile">InstanceType:
  Description: Webserver EC2 instance type
  Type: String
  Default: t2.nano
  AllowedValues:
    - t2.nano
    - t2.micro
    - t2.small
  ConstraintDescription: must be a valid EC2 instance type.
</code></pre>
<p>The <code>ConstraintDescription</code> is interesting because it provides the user with details when a constraint is violated. In this case, if a user tries to create an invalid EC2 instance type, then they get a useful error message.</p>
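<p>The effect of <code>AllowedValues</code> can be mimicked locally. Below is a small shell sketch (hypothetical, not part of CloudFormation itself) of the kind of membership check the service performs before accepting a parameter value:</p>
<pre><code class="lang-bash">allowed="t2.nano t2.micro t2.small"

check_instance_type() {
    # Accept the value only if it appears in the allowed list
    case " $allowed " in
        *" $1 "*) echo "$1: ok" ;;
        *)        echo "$1: must be a valid EC2 instance type." ;;
    esac
}

check_instance_type t2.micro
check_instance_type m5.large
</code></pre>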
<h3 id="heading-step-2-adding-resources-to-the-cloudformation-template">Step 2: Adding resources to the CloudFormation template</h3>
<p>Here the AWS resources that CloudFormation will provision are declared. Below is a demonstration of how to specify a route in a route table. The <code>Type</code> of this resource is <code>AWS::EC2::Route</code>.</p>
<pre><code class="lang-dockerfile">Route:
  Type: <span class="hljs-string">'AWS::EC2::Route'</span>
  DependsOn:
    - VPC
    - AttachGateway
  Properties:
    RouteTableId: !Ref RouteTable
    DestinationCidrBlock: <span class="hljs-number">0.0</span>.<span class="hljs-number">0.0</span>/<span class="hljs-number">0</span>
    GatewayId: !Ref InternetGateway
</code></pre>
<p>Note how the <code>Route</code> resource refers to other resources with the <code>!Ref</code> function, in this case both <code>RouteTable</code> and <code>InternetGateway</code>.</p>
<h3 id="heading-step-3-adding-an-output">Step 3: Adding an output</h3>
<p>Outputs capture important details about the resources in the stack. They provide a convenient way to surface that information after deployment, and make later reference easier using the <code>aws</code> CLI utility.</p>
<pre><code class="lang-dockerfile">Outputs:
  AppURL:
    Description: New created application URL
    Value: !Sub <span class="hljs-string">'http://${WebServerInstance.PublicIp}'</span>
</code></pre>
<p>In this case, we are using <code>!Sub</code> to dynamically insert the public IP of the EC2 instance, and get a full HTTP URL when the stack is provisioned. In other words, using the <code>!Sub</code> function will allow us to retrieve the URL to access the web server.</p>
<h3 id="heading-step-4-create-the-stack">Step 4: Create the stack</h3>
<p>Once the <code>YAML</code> file is finished, it's simple to create the stack.</p>
<pre><code class="lang-dockerfile">$ aws cloudformation create-stack --stack-name Demo-Web-Server --parameters ParameterKey=InstanceType,ParameterValue=t2.micro --template-body file://cf_stack.yaml
{
    <span class="hljs-string">"StackId"</span>: <span class="hljs-string">"arn:aws:cloudformation:....."</span>
}
</code></pre>
<p>It might be useful to query the status of the stack process with the following command.</p>
<pre><code class="lang-dockerfile">$ aws cloudformation describe-stacks --stack-name Demo-Web-Server --query <span class="hljs-string">"Stacks[0].StackStatus"</span>
<span class="hljs-string">"CREATE_COMPLETE"</span> <span class="hljs-comment"># This is the desired output</span>
</code></pre>
<h4 id="heading-stack-creation-complete"><strong>Stack creation complete</strong></h4>
<p><a target="_blank" href="https://github.com/bernie-cm/cloudformation_lab/blob/main/assets/20250308_cloudformation_stack_created.png"><img src="https://github.com/bernie-cm/cloudformation_lab/raw/main/assets/20250308_cloudformation_stack_created.png" alt="Link" /></a></p>
<h4 id="heading-using-outputs-during-stack-creation"><strong>Using outputs during stack creation</strong></h4>
<p><a target="_blank" href="https://github.com/bernie-cm/cloudformation_lab/blob/main/assets/20250308_cloudformation_outputs.png"><img src="https://github.com/bernie-cm/cloudformation_lab/raw/main/assets/20250308_cloudformation_outputs.png" alt="Link" /></a></p>
<h3 id="heading-step-5-testing-drift-detection-in-a-cloudformation-stack">Step 5: Testing drift detection in a CloudFormation stack</h3>
<p>CloudFormation is powerful because it can be used to detect stack changes <strong>not initiated within CloudFormation</strong>. In other words, if someone were to make changes to the stack using the AWS Console, CloudFormation can be used to detect and rectify those changes. First, it's necessary to run the 'Detect Drift' stack action, and once that's run, select 'View drift results'.</p>
<h4 id="heading-using-stack-actions-to-detect-drift">Using Stack actions to detect drift</h4>
<p><a target="_blank" href="https://github.com/bernie-cm/cloudformation_lab/blob/main/assets/20250308_detect_drift.png"><img src="https://github.com/bernie-cm/cloudformation_lab/raw/main/assets/20250308_detect_drift.png" alt="Link" /></a></p>
<h4 id="heading-drift-detection-report">Drift detection report</h4>
<p><a target="_blank" href="https://github.com/bernie-cm/cloudformation_lab/blob/main/assets/20250308_drift_detection_report.png"><img src="https://github.com/bernie-cm/cloudformation_lab/raw/main/assets/20250308_drift_detection_report.png" alt="Link" /></a></p>
<p>You can also detect drift via the CLI.</p>
<pre><code class="lang-dockerfile">$ aws cloudformation describe-stack-resource-drifts ...
</code></pre>
<h3 id="heading-step-6-rectify-the-stack-using-a-change-set">Step 6: Rectify the stack using a change set</h3>
<p>Once drift has been detected, the resource can be modified to the expected value in the CloudFormation template. To do this, we can use a Change Set, and then implement the changes to the environment.</p>
<pre><code class="lang-dockerfile">$ aws cloudformation create-change-set --stack-name Demo-Web-Server --change-set-name Demo-Change-Set --parameters ParameterKey=InstanceType,ParameterValue=t2.micro --template-body file://cf_stack-CS.yaml
</code></pre>
<p>Once the Change Set has been created, it will appear under the original stack. The next step is to <strong>Execute change set</strong>; once the execution succeeds, the Change Set is no longer available.</p>
]]></content:encoded></item><item><title><![CDATA[When Ansible Can't See Your EC2 Instances: Resolving AWS Dynamic Inventory Issues]]></title><description><![CDATA[AWS Dynamic Inventory is best practice when using Ansible to configure AWS infrastructure. It allows you to perform automatic discovery of EC2 instances because it'd be difficult to maintain a static inventory of hosts as AWS changes IPs dynamically....]]></description><link>https://bernieops.com/when-ansible-cant-see-your-ec2-instances-resolving-aws-dynamic-inventory-issues</link><guid isPermaLink="true">https://bernieops.com/when-ansible-cant-see-your-ec2-instances-resolving-aws-dynamic-inventory-issues</guid><category><![CDATA[ansible]]></category><category><![CDATA[Terraform]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Bernie Camejo]]></dc:creator><pubDate>Wed, 26 Feb 2025 19:39:51 GMT</pubDate><content:encoded><![CDATA[<p>AWS Dynamic Inventory is best practice when using Ansible to configure AWS infrastructure. It allows you to perform automatic discovery of EC2 instances because it'd be difficult to maintain a static inventory of <code>hosts</code> as AWS changes IPs dynamically. It's in fact <a target="_blank" href="https://docs.ansible.com/ansible/latest/collections/amazon/aws/docsite/aws_ec2_guide.html">recommended by Ansible</a> when working with cloud providers.</p>
<p>The workflow for Dynamic Inventory is to first provision your infrastructure using IaC (e.g., Terraform), then enable the AWS Dynamic Inventory plugin <code>aws_ec2</code> in <code>ansible.cfg</code> and configure it in an <code>aws_ec2.yml</code> inventory file. I ran into problems when testing connectivity between my control node and my AWS infrastructure using Ansible. Here's how I worked through that problem.</p>
<pre><code class="lang-dockerfile"><span class="hljs-comment"># Listing all hosts Ansible can see</span>
$ ansible-inventory --list

<span class="hljs-comment"># Verifying I have connectivity</span>
$ ansible all -m ping
</code></pre>
<p>When listing my inventory, I was getting an empty dictionary, <code>hostvars: {}</code>, so right away I knew there was an issue. This was confirmed when I ran the <code>ping</code> command and received an error that only <code>localhost</code> was available. Using the AWS CLI to investigate further, I confirmed that my EC2 instances did exist and had the expected tags.</p>
<pre><code class="lang-dockerfile">aws ec2 describe-instances --filters <span class="hljs-string">"Name=tag:Environment,Values=development"</span>
</code></pre>
<p>I then investigated my Terraform <code>main.tf</code> file and found the problem. The tag set in my Terraform file was <code>Name = DevOpsInstance</code>, but the tag my Ansible <code>aws_ec2.yml</code> dynamic inventory file filtered on was <code>Environment = development</code>. A single mismatched tag key and value was enough to make the inventory come back empty.</p>
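<p>For reference, a minimal <code>aws_ec2.yml</code> that filters on the same tag Terraform applies might look like the following (a sketch; the region and tag values here are assumptions for this example):</p>
<pre><code class="lang-yaml">plugin: amazon.aws.aws_ec2
regions:
  - ap-southeast-2
filters:
  tag:Name: DevOpsInstance
</code></pre>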
<p>Once the two files contained the same information, I ran the <code>ansible</code> CLI commands again and was able to list my AWS infrastructure. Attention to detail is crucial when working across different tools, and managing large code bases.</p>
]]></content:encoded></item><item><title><![CDATA[Hands-on Infrastructure as Code: AWS Deployment with Terraform]]></title><description><![CDATA[I've been building a DevOps automation project on my GitHub page to showcase how Terraform, Ansible and Docker can be used together to quickly deploy, and automate the configuration of assets using Infrastructure as Code (IaC).
Set up local environme...]]></description><link>https://bernieops.com/hands-on-infrastructure-as-code-aws-deployment-with-terraform</link><guid isPermaLink="true">https://bernieops.com/hands-on-infrastructure-as-code-aws-deployment-with-terraform</guid><category><![CDATA[Terraform]]></category><category><![CDATA[#IaC]]></category><category><![CDATA[Devops]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Bernie Camejo]]></dc:creator><pubDate>Mon, 24 Feb 2025 10:55:12 GMT</pubDate><content:encoded><![CDATA[<p>I've been building a <a target="_blank" href="https://github.com/bernie-cm/devops-automation">DevOps automation project on my GitHub page</a> to showcase how Terraform, Ansible and Docker can be used together to quickly deploy, and automate the configuration of assets using Infrastructure as Code (IaC).</p>
<h3 id="heading-set-up-local-environment-variables">Set up local environment variables</h3>
<p>For this project, I'm building on AWS. In order to use Terraform, I need to set up environment variables to store my <strong>AWS Access Key ID</strong> and my <strong>AWS Secret Access Key</strong>. It's super important not to share these, or store them in a script that later gets pushed to GitHub.</p>
<pre><code class="lang-dockerfile">$ export AWS_ACCESS_KEY_ID=&lt;your_aws_key&gt;
$ export AWS_SECRET_ACCESS_KEY=&lt;your_secret_key&gt;
</code></pre>
<h3 id="heading-main-configuration-file">Main configuration file</h3>
<p>Once the local variables are set, two files will be required to create an EC2 instance on AWS through IaC. First, the <code>main.tf</code> file.</p>
<pre><code class="lang-dockerfile">terraform {
  required_providers {
    aws = {
      source  = <span class="hljs-string">"hashicorp/aws"</span>
      version = <span class="hljs-string">"~&gt; 4.16"</span>
    }
  }

  required_version = <span class="hljs-string">"&gt;= 1.2.0"</span>
}

provider <span class="hljs-string">"aws"</span> {
  region = var.aws_region
}

resource <span class="hljs-string">"aws_instance"</span> <span class="hljs-string">"app_server"</span> {
  ami           = var.ami_id
  instance_type = var.instance_type

  tags = {
    Name = <span class="hljs-string">"DevOpsDemoInstance"</span>
  }
}
</code></pre>
<p>Next, it's time to define a <code>variables.tf</code> file to store the variables used by <code>main.tf</code> during execution. For a simple deployment like this one, it would be enough to declare the variables directly in <code>main.tf</code>. However, for more complex projects, or when the configuration will be reused, it's best practice to keep a separate <code>variables.tf</code> file, so that's the approach used here.</p>
<h3 id="heading-variables-file">Variables file</h3>
<pre><code class="lang-dockerfile">variable <span class="hljs-string">"aws_region"</span> {
  description = <span class="hljs-string">"AWS region"</span>
  type        = string
  default     = <span class="hljs-string">"ap-southeast-2"</span>
}

variable <span class="hljs-string">"instance_type"</span> {
  description = <span class="hljs-string">"EC2 instance type"</span>
  type        = string
  default     = <span class="hljs-string">"t2.micro"</span>
}

variable <span class="hljs-string">"ami_id"</span> {
  description = <span class="hljs-string">"The AMI ID for the EC2 instance"</span>
  type        = string
  default     = <span class="hljs-string">"ami-0b0a3a2350a9877be"</span>
}
</code></pre>
<p>Finally, initialise the directory with <code>terraform init</code>, then create the infrastructure with <code>terraform apply</code>, reviewing the plan of what will be built. Once satisfied, answer <code>yes</code> and your AWS infrastructure will be provisioned. When the proof-of-concept is finished, remember to run <code>terraform destroy</code> to tear down your infrastructure.</p>
]]></content:encoded></item><item><title><![CDATA[Stop Wrestling with UNIX Timestamps: A Clean Pandas Solution]]></title><description><![CDATA[During the initial stages of building my International Space Station (ISS) data engineering project, I quickly realised a format issue with the timestamps broadcast by the ISS Open API. The timestamps used by the API were set in the UNIX format, whic...]]></description><link>https://bernieops.com/stop-wrestling-with-unix-timestamps-a-clean-pandas-solution</link><guid isPermaLink="true">https://bernieops.com/stop-wrestling-with-unix-timestamps-a-clean-pandas-solution</guid><category><![CDATA[Python]]></category><category><![CDATA[Data Science]]></category><dc:creator><![CDATA[Bernie Camejo]]></dc:creator><pubDate>Sun, 23 Feb 2025 10:58:58 GMT</pubDate><content:encoded><![CDATA[<p>During the initial stages of building my <a target="_blank" href="https://github.com/bernie-cm/iss-lambda">International Space Station</a> (ISS) data engineering project, I quickly realised a format issue with the timestamps broadcast by the ISS Open API. The timestamps used by the API were set in the UNIX format, which aren't terribly user friendly for consumers of the data.</p>
<p>My goal was to ensure that my dataframe's timestamps are correctly converted from UNIX to ISO 8601 during ingestion. <a target="_blank" href="https://en.wikipedia.org/wiki/Unix_time">UNIX timestamps</a> represent how many seconds have passed since 00:00:00 UTC on 1 January 1970.</p>
<p>An example UNIX timestamp looks like this:</p>
<blockquote>
<p>1707568200 → Converts to 2024-02-10T12:30:00 UTC</p>
</blockquote>
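<p>That example conversion is easy to verify from the shell with GNU <code>date</code> (assumes GNU coreutils; the <code>-d @...</code> syntax reads a UNIX timestamp):</p>
<pre><code class="lang-bash">$ date -u -d @1707568200 +%Y-%m-%dT%H:%M:%S
2024-02-10T12:30:00
</code></pre>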
<p>While interesting from a historical point of view, I didn't want my ETL application to keep timestamps in this format. After some research, I found that the <code>pandas</code> library has a useful function called <code>to_datetime()</code>. Its result can then be combined with the <code>dt.strftime()</code> accessor method to convert the timestamps to the ISO 8601 format.</p>
<pre><code class="lang-python">import pandas as pd

def convert_unix_to_iso(series: pd.Series) -&gt; pd.Series:
    """
    Convert Unix timestamps to ISO 8601 formatted datetime strings.

    Args:
        series (pd.Series): Series containing Unix timestamps

    Returns:
        pd.Series: Series with timestamps in ISO format (YYYY-MM-DDTHH:MM:SS)
    """
    return pd.to_datetime(series, unit='s').dt.strftime('%Y-%m-%dT%H:%M:%S')
</code></pre>
<p>The above code takes a series right after my ISS ETL script ingests data, and immediately converts the relevant column of timestamps into ISO 8601 format, which is far more useful for end users.</p>
]]></content:encoded></item><item><title><![CDATA[Dockerising an ISS Location Tracker: Lessons from Local Development]]></title><description><![CDATA[I've been building a data engineering project using Python, Docker, and AWS to ingest data from the International Space Station location API. In a previous post, I wrote about troubleshooting the connection to an Amazon RDS Postgres database over the...]]></description><link>https://bernieops.com/dockerising-an-iss-location-tracker-lessons-from-local-development</link><guid isPermaLink="true">https://bernieops.com/dockerising-an-iss-location-tracker-lessons-from-local-development</guid><category><![CDATA[AWS]]></category><category><![CDATA[Docker]]></category><dc:creator><![CDATA[Bernie Camejo]]></dc:creator><pubDate>Fri, 21 Feb 2025 10:45:57 GMT</pubDate><content:encoded><![CDATA[<p>I've been building a <a target="_blank" href="https://github.com/bernie-cm/iss-lambda">data engineering project</a> using Python, Docker, and AWS to ingest data from the International Space Station location API. In a previous post, I wrote about <a target="_blank" href="https://bernieops.com/troubleshooting-a-connection-to-an-amazon-rds-postgres-database-over-the-internet">troubleshooting the connection</a> to an Amazon RDS Postgres database over the internet.</p>
<p>Once the connection could be established over the public internet, I wanted to first containerise the Python processor, and then test on my local dev system to ensure the data was being written correctly to the Amazon RDS Postgres database.</p>
<h2 id="heading-dockerfile">Dockerfile</h2>
<pre><code class="lang-dockerfile"><span class="hljs-keyword">FROM</span> python:<span class="hljs-number">3.13</span>.<span class="hljs-number">2</span>

<span class="hljs-keyword">RUN</span><span class="bash"> pip install pandas sqlalchemy requests psycopg2</span>

<span class="hljs-keyword">WORKDIR</span><span class="bash"> /app</span>
<span class="hljs-keyword">COPY</span><span class="bash"> main.py main.py</span>

<span class="hljs-keyword">ENTRYPOINT</span><span class="bash"> [<span class="hljs-string">"python3"</span>, <span class="hljs-string">"main.py"</span>]</span>
</code></pre>
<p>Unfortunately, while I was able to build a Docker image using the above Dockerfile, and successfully run the container on my local environment, in the end this didn't work when pushing the container to Amazon ECR. I will get into how I fixed this problem in the next post.</p>
]]></content:encoded></item><item><title><![CDATA[Troubleshooting a connection to an Amazon RDS Postgres database over the internet]]></title><description><![CDATA[I'm building a data engineering project, consisting of an ETL pipeline that serves data to a visualisation application (written in Python). One of the first steps to set up the project is to create the database that will hold the data returned from t...]]></description><link>https://bernieops.com/troubleshooting-a-connection-to-an-amazon-rds-postgres-database-over-the-internet</link><guid isPermaLink="true">https://bernieops.com/troubleshooting-a-connection-to-an-amazon-rds-postgres-database-over-the-internet</guid><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Bernie Camejo]]></dc:creator><pubDate>Wed, 19 Feb 2025 10:50:24 GMT</pubDate><content:encoded><![CDATA[<p>I'm building a data engineering project, consisting of an ETL pipeline that serves data to a visualisation application (written in Python). One of the first steps to set up the project is to create the database that will hold the data returned from the API call. For this project, I decided to use an Amazon RDS database running a Postgres implementation as it's a widely implemented open-source database technology.</p>
<h2 id="heading-bluf">BLUF</h2>
<ul>
<li><p><strong>Understanding security groups</strong>: Configure inbound rules to enable public access to an RDS Postgres database</p>
</li>
<li><p><strong>Setting up database connections</strong>: Use <code>psql</code> CLI for rapid database testing and proof-of-concept</p>
</li>
</ul>
<h2 id="heading-challenges">Challenges</h2>
<p>During the initial setup and testing, I couldn't access my Amazon RDS instance over the public internet. This had to be fixed as my application was accessing a public API to obtain the current location of the International Space Station (ISS). I followed these steps to logically eliminate the potential cause of the problem.</p>
<ul>
<li>Look into the security group attached to my RDS database, and inspect the <strong>inbound rules</strong>. Since the initial connection to the RDS instance was failing, this was the first step to identify the problem. I ensured the inbound rules on the security group were set so that TCP traffic on port 5432 (default port for Postgres) was allowed.</li>
</ul>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcAyOoE76P4PbIr06BGVMBIQscOQleJqaFkl1N47bGGSKzdjWag_FsKgaScy0_7msBtNFP1EkqK7DDEVcNt05GrdLy5qIs-eUYADOtbAbQX_u804UiOd5N1658D8Ft9p0wmUMOvPA?key=RyTGcg8X1T3ptgtp402chN98" alt /></p>
<ul>
<li><p>Ensure the RDS is <strong>publicly accessible</strong>. This was another setting I had to review, as the point is to access the RDS instance over the public internet. So, the RDS database <strong><em>must be</em></strong> publicly accessible.</p>
<p>  <img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcJXOuKOpVDn2HggnNLQpqTwMmU82hvFJ7CFcgr3B-gk4ggbV56a3PaeM4nVN37gfvVbaoYJE5JAikYtkvBzTKOKY48dZIeG186TVdBwiD-IH7CyxgHR57nmzLHm-jlCuUG18UqHg?key=RyTGcg8X1T3ptgtp402chN98" alt /></p>
</li>
<li><p>Finally, I needed to make sure the VPC had an <strong>internet gateway</strong> attached.</p>
</li>
</ul>
<h2 id="heading-success">Success</h2>
<p>Once those three troubleshooting steps were done, I was able to establish a connection to my Amazon RDS Postgres database using <code>psql</code>.</p>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcUGdMytgqR7uOt6uQ7X2kINlGNm0_GQNLKDt8LMdBCGxhi0A5ya1st9Vz6wxIqAD5GPzVhTyAsMxhuU4b61_OIW29w_ZB0rsEEDuRd-Ap4VDacH0ahXxcpD4R0lcEpfia5I_Kn5w?key=RyTGcg8X1T3ptgtp402chN98" alt /></p>
]]></content:encoded></item><item><title><![CDATA[Route 53 Routing Policies (2/2)]]></title><description><![CDATA[In part 1, I explored the first four routing policies used by Route 53 when responding to DNS queries. These are the rest of the policies:

Geolocation routing: based on the geographic location of the user, Route 53 will respond accordingly. For exam...]]></description><link>https://bernieops.com/route-53-routing-policies-22</link><guid isPermaLink="true">https://bernieops.com/route-53-routing-policies-22</guid><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Bernie Camejo]]></dc:creator><pubDate>Tue, 18 Feb 2025 02:00:12 GMT</pubDate><content:encoded><![CDATA[<p>In <a target="_blank" href="https://bernieops.com/route-53-routing-policies-1">part 1</a>, I explored the first four routing policies used by Route 53 when responding to DNS queries. These are the rest of the policies:</p>
<ul>
<li><p>Geolocation routing: based on the geographic location of the user, Route 53 will respond accordingly. For example, this is useful for compliance or content distribution restrictions.</p>
</li>
<li><p>Geoproximity routing: not to be confused with geolocation; here, a bias value is applied to AWS resources in specific regions to expand or shrink the geographic area from which traffic is routed to them. In other words, this policy considers both the location of the user <strong>and</strong> the location of the AWS resource being queried.</p>
</li>
<li><p>IP-based routing: CIDR blocks are allocated to different AWS resources, and depending on the user's IP address, different DNS responses will be provided.</p>
</li>
<li><p>Multi-value routing: combined with health checks, Route 53 can return multiple healthy records (up to eight) in response to a single query. For example, a website served from three different IP addresses.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739703783993/7b6ebec9-eb3b-4dea-ab9e-7a6258eee6b9.png" alt class="image--center mx-auto" /></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[How to reduce friction when pushing code to Git]]></title><description><![CDATA[I was getting tired of doing git add ., git commit -m "Message", and git push. So I created a shell function in my .zshrc to do a single command that executes the three steps.
function gpush() {
    local msg="${1:-Auto commit}"
    git add .
    git...]]></description><link>https://bernieops.com/how-to-reduce-friction-when-pushing-code-to-git</link><guid isPermaLink="true">https://bernieops.com/how-to-reduce-friction-when-pushing-code-to-git</guid><category><![CDATA[shell script]]></category><category><![CDATA[Bash]]></category><category><![CDATA[zsh]]></category><dc:creator><![CDATA[Bernie Camejo]]></dc:creator><pubDate>Mon, 17 Feb 2025 02:00:11 GMT</pubDate><content:encoded><![CDATA[<p>I was getting tired of doing <code>git add .</code>, <code>git commit -m "Message"</code>, and <code>git push</code>. So I created a shell function in my <code>.zshrc</code> to do a single command that executes the three steps.</p>
<pre><code class="lang-sh"><span class="hljs-keyword">function</span> <span class="hljs-function"><span class="hljs-title">gpush</span></span>() {
    <span class="hljs-built_in">local</span> msg=<span class="hljs-string">"<span class="hljs-variable">${1:-Auto commit}</span>"</span>
    git add .
    git commit -m <span class="hljs-string">"<span class="hljs-variable">$msg</span>"</span>
    git push
}
</code></pre>
<p>Now, I don't have to repeat the same three steps, but rather call the <code>gpush</code> function.</p>
<pre><code class="lang-sh">gpush <span class="hljs-string">"Commit message"</span>
</code></pre>
<p>And done.</p>
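<p>One refinement worth considering (a sketch, not part of the original function): bail out early when the working tree is clean, so the function never creates empty commits or pointless pushes.</p>
<pre><code class="lang-sh">function gpush() {
    local msg="${1:-Auto commit}"
    # Abort early if there is nothing to commit
    if [ -z "$(git status --porcelain)" ]; then
        echo "Nothing to commit."
        return 0
    fi
    git add .
    git commit -m "$msg"
    git push
}
</code></pre>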
]]></content:encoded></item><item><title><![CDATA[Route 53 Routing Policies (1/2)]]></title><description><![CDATA[Route 53 includes routing policies that determine how it responds to client DNS queries. These policies enable you to configure different ways to route traffic based on conditions or events. There are 8 routing policies that you need to know for the ...]]></description><link>https://bernieops.com/route-53-routing-policies-1</link><guid isPermaLink="true">https://bernieops.com/route-53-routing-policies-1</guid><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Bernie Camejo]]></dc:creator><pubDate>Sat, 15 Feb 2025 22:48:37 GMT</pubDate><content:encoded><![CDATA[<p>Route 53 includes <strong>routing</strong> policies that determine how it responds to client DNS queries. These policies enable you to configure different ways to route traffic based on conditions or events. There are 8 routing policies that you need to know for the Solutions Architect exam, and this post only covers the first four.</p>
<ul>
<li><p>Simple routing: directs traffic to a single resource, such as a web server. If there are multiple IP addresses associated with that resource, Route 53 returns <strong>all</strong> of them, and the client picks one at random.</p>
</li>
<li><p>Weighted routing: different weights are assigned to individual resources, and Route 53 splits traffic based on the assigned proportions. For example, one EC2 instance gets 70% of the traffic, another 20% and the last one 10%.</p>
</li>
<li><p>Latency-based routing: similar to a geolocation policy, but it's actually based on the <strong>latency</strong> Route 53 measures between users and AWS resources like an Elastic Load Balancer. Clients are routed to the Region that gives them the lowest latency.</p>
</li>
<li><p>Failover routing: health checks must be created on the primary resource. If the primary fails those checks, Route 53 stops sending traffic to it and fails over to the secondary/backup resource.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739659650788/d631abc0-e34c-4c29-89f8-016582ee9ac6.png" alt class="image--center mx-auto" /></p>
</li>
</ul>
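<p>As an illustration of weighted routing (a sketch with placeholder values — the record name and IP addresses are hypothetical), each record in the set gets its own <code>Weight</code> and a unique <code>SetIdentifier</code>. A change batch like the one below could be applied with <code>aws route53 change-resource-record-sets --hosted-zone-id ZONE_ID --change-batch file://weighted.json</code>:</p>
<pre><code class="lang-json">{
  "Comment": "Split traffic 70/30 between two instances",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.example.com",
        "Type": "A",
        "SetIdentifier": "primary",
        "Weight": 70,
        "TTL": 60,
        "ResourceRecords": [{ "Value": "192.0.2.10" }]
      }
    },
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.example.com",
        "Type": "A",
        "SetIdentifier": "secondary",
        "Weight": 30,
        "TTL": 60,
        "ResourceRecords": [{ "Value": "192.0.2.20" }]
      }
    }
  ]
}
</code></pre>
<p>Route 53 then answers roughly 70% of queries with the first record and 30% with the second.</p>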
]]></content:encoded></item><item><title><![CDATA[Fixing incorrect KMS Key Policy when deploying Lambda with an IAM Role]]></title><description><![CDATA[Often you'll want to create environment variables to pass as CLI arguments to a Lambda function. Lambda as a service does not support passing CLI arguments using things like argparse, but rather you have to configure individual environment variables ...]]></description><link>https://bernieops.com/fixing-incorrect-kms-key-policy-when-deploying-lambda-with-an-iam-role</link><guid isPermaLink="true">https://bernieops.com/fixing-incorrect-kms-key-policy-when-deploying-lambda-with-an-iam-role</guid><category><![CDATA[AWS]]></category><category><![CDATA[lambda]]></category><dc:creator><![CDATA[Bernie Camejo]]></dc:creator><pubDate>Fri, 14 Feb 2025 23:31:55 GMT</pubDate><content:encoded><![CDATA[<p>Often you'll want to create environment variables to <a target="_blank" href="https://bernieops.com/using-environment-variables-when-deploying-lambda-containers">pass as CLI arguments</a> to a Lambda function. Lambda as a service does not support passing CLI arguments using things like <code>argparse</code>, but rather you have to configure individual environment variables within the AWS console.</p>
<pre><code class="lang-plaintext">Lambda was unable to configure access to your environment variables because the KMS key is invalid for CreateGrant. Please check your KMS key settings. KMS Exception: InvalidArnException KMS Message: ARN does not refer to a valid principal
</code></pre>
<p>The above error message indicates that the KMS key policy does not list the IAM Role your Lambda function is using as a valid principal. After a lot of troubleshooting, and careful reading of the <a target="_blank" href="https://docs.aws.amazon.com/kms/latest/developerguide/key-policy-overview.html#key-policy-elements">AWS documentation</a>, I identified that the problem could be fixed by adding an additional statement to the relevant KMS Key Policy.</p>
<p>The solution:</p>
<ol>
<li><p>Identify the specific IAM Role that your Lambda function is using to encrypt environment variables using KMS.</p>
</li>
<li><p>Edit the KMS Key Policy to include the IAM Role as an AWS <code>Principal</code>.</p>
</li>
<li><p>Add the following statement to your Key Policy.</p>
</li>
</ol>
<pre><code class="lang-json">    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::471112566722:role/service-role/iss_locations_lambda-role-grcyflva"
      },
      "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
      ],
      "Resource": "*"
    }
</code></pre>
<p>And just like that, my Lambda function was able to use a customer-managed KMS key, customised with the correct key policy, to create and encrypt the required environment variables for the application.</p>
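<p>To make the edit in step 3 repeatable, the key policy can be modified locally before uploading it again. The sketch below is my own illustration (the <code>Sid</code> value is hypothetical): it appends the statement to a key policy document, skipping the append if it has already been added.</p>
<pre><code class="lang-python"># The Lambda execution role from the error above
ROLE_ARN = ("arn:aws:iam::471112566722:role/"
            "service-role/iss_locations_lambda-role-grcyflva")


def add_lambda_statement(key_policy, role_arn):
    """Append a statement granting the role use of the key (idempotent)."""
    statement = {
        "Sid": "AllowLambdaEnvVarEncryption",  # illustrative Sid
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": [
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey",
        ],
        "Resource": "*",
    }
    # Only append if a statement with this Sid isn't already present
    if not any(s.get("Sid") == statement["Sid"]
               for s in key_policy["Statement"]):
        key_policy["Statement"].append(statement)
    return key_policy
</code></pre>
<p>Feed it the JSON returned by <code>aws kms get-key-policy --key-id KEY_ID --policy-name default</code>, then upload the result with <code>put-key-policy</code>.</p>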
]]></content:encoded></item><item><title><![CDATA[Using environment variables when deploying Lambda containers]]></title><description><![CDATA[While building a simple application that pulls API location data for the International Space Station, I was running the ingestion and transformation script locally on my system, passing the necessary parameters to the CLI. Below is the code snippet o...]]></description><link>https://bernieops.com/using-environment-variables-when-deploying-lambda-containers</link><guid isPermaLink="true">https://bernieops.com/using-environment-variables-when-deploying-lambda-containers</guid><category><![CDATA[AWS]]></category><category><![CDATA[lambda]]></category><category><![CDATA[Docker]]></category><dc:creator><![CDATA[Bernie Camejo]]></dc:creator><pubDate>Fri, 14 Feb 2025 11:32:39 GMT</pubDate><content:encoded><![CDATA[<p>While building a simple application that pulls API location data for the International Space Station, I was running the ingestion and transformation script locally on my system, passing the necessary parameters to the CLI. Below is the code snippet of how I was going about this.</p>
<pre><code class="lang-python">import argparse


def main(params):
    user = params.u
    password = params.p
    host = params.host
    ...


if __name__ == "__main__":
    # Create the CLI parser
    parser = argparse.ArgumentParser(description="Call ISS API and store current position in Amazon RDS")

    # Create CLI arguments for the username, password, and host
    parser.add_argument("-u", help="username for Postgres")
    parser.add_argument("-p", help="password for Postgres")
    parser.add_argument("--host", help="hostname for RDS Postgres server")

    args = parser.parse_args()
    main(args)
</code></pre>
<p>I realised this probably wasn't going to work when deploying the container behind a Lambda function, simply because Lambda <strong>does not</strong> support passing CLI arguments directly to the container. My research pointed me to a better implementation: using environment variables to pass the <code>-u</code>, <code>-p</code>, and <code>--host</code> parameters to the container.</p>
<pre><code class="lang-python">import os


def main():
    # Read database credentials from environment variables
    user = os.getenv("POSTGRES_USER")
    password = os.getenv("POSTGRES_PASSWORD")
    host = os.getenv("POSTGRES_HOST")

    ...


if __name__ == "__main__":
    main()
</code></pre>
<p>The result: a more elegant solution that works with a Lambda function calling the container stored in ECR.</p>
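<p>One extra safeguard worth considering (my own sketch, not part of the original script): <code>os.getenv</code> silently returns <code>None</code> when a variable is missing, so a misconfigured Lambda would only fail later with a confusing database error. Validating the variables up front makes the failure immediate and obvious in the logs.</p>
<pre><code class="lang-python">import os

REQUIRED_VARS = ("POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_HOST")


def load_db_config():
    """Return the required connection settings, failing fast if any are missing."""
    missing = [name for name in REQUIRED_VARS if not os.getenv(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: os.environ[name] for name in REQUIRED_VARS}
</code></pre>
<p>Call <code>load_db_config()</code> at the top of the handler so a missing variable stops the invocation before any API or database work begins.</p>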
]]></content:encoded></item></channel></rss>