06. How to configure HI GIO Kubernetes cluster autoscale
Overview
Step-by-step guide on how to configure HI GIO Kubernetes cluster autoscale
Install tanzu-cli
Create cluster-autoscaler deployment from tanzu package using tanzu-cli
Enable cluster autoscale for your cluster
Test cluster autoscale
Delete cluster-autoscaler deployment and clean up test resource
Procedure
Prerequisites:
An Ubuntu bastion that can connect to your Kubernetes cluster
Permission to access your Kubernetes cluster
Step 1: Install tanzu-cli
#Install tanzu-cli on Ubuntu
sudo apt update
sudo apt install -y ca-certificates curl gpg
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://storage.googleapis.com/tanzu-cli-installer-packages/keys/TANZU-PACKAGING-GPG-RSA-KEY.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/tanzu-archive-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/tanzu-archive-keyring.gpg] https://storage.googleapis.com/tanzu-cli-installer-packages/apt tanzu-cli-jessie main" | sudo tee /etc/apt/sources.list.d/tanzu.list
sudo apt update
sudo apt install -y tanzu-cli
#Verify tanzu-cli installation
tanzu version

To install tanzu-cli in other environments, please refer to the official Tanzu CLI documentation.

Step 2: Create cluster-autoscaler deployment from tanzu package using tanzu-cli
Switch to your Kubernetes cluster context
kubectl config use-context <your context name>

List the available cluster-autoscaler versions in the tanzu package repository and note the version name
tanzu package available list cluster-autoscaler.tanzu.vmware.com

Create a kubeconfig secret named cluster-autoscaler-mgmt-config-secret in the kube-system namespace
kubectl create secret generic cluster-autoscaler-mgmt-config-secret \
--from-file=value=<path to your kubeconfig file> \
-n kube-system

Please do not change the secret name (cluster-autoscaler-mgmt-config-secret) or the namespace (kube-system).
Create a cluster-autoscaler-values.yaml file
arguments:
  ignoreDaemonsetsUtilization: true
  maxNodeProvisionTime: 15m
  maxNodesTotal: 0 #Leave this value as 0. We will define the max and min number of nodes later.
  metricsPort: 8085
  scaleDownDelayAfterAdd: 10m
  scaleDownDelayAfterDelete: 10s
  scaleDownDelayAfterFailure: 3m
  scaleDownUnneededTime: 10m
clusterConfig:
  clusterName: "demo-autoscale-tkg" #adjust here
  clusterNamespace: "demo-autoscale-tkg-ns" #adjust here
paused: false
Required values:
clusterName: your cluster name
clusterNamespace: your cluster namespace
Install cluster-autoscaler
#Use the version listed above that matches your Kubernetes version
#tkg-system is the default namespace for tanzu packages; please do not change it
tanzu package install cluster-autoscaler \
  --package cluster-autoscaler.tanzu.vmware.com \
  --version <version available> \
  --values-file 'cluster-autoscaler-values.yaml' \
  --namespace tkg-system

The cluster-autoscaler will be deployed into the kube-system namespace.
Run the command below to verify cluster-autoscaler deployment:
kubectl get deployments.apps -n kube-system cluster-autoscaler
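You can also check the autoscaler's logs to confirm it started cleanly (a sketch using standard kubectl commands; the deployment name matches the one created above):

```shell
#Confirm the rollout finished
kubectl rollout status deployment/cluster-autoscaler -n kube-system

#Inspect recent autoscaler logs for startup errors
kubectl logs -n kube-system deployment/cluster-autoscaler --tail=20
```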

Configure the minimum and maximum number of nodes in your cluster
Get the machinedeployments name and namespace
kubectl get machinedeployments.cluster.x-k8s.io -A

Set the cluster-api-autoscaler-node-group-min-size and cluster-api-autoscaler-node-group-max-size annotations
kubectl annotate machinedeployment <machinedeployment name> cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size=<number min> -n <machinedeployment namespace>
kubectl annotate machinedeployment <machinedeployment name> cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size=<number max> -n <machinedeployment namespace>
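To confirm the annotations were applied, one option (a sketch; substitute your machinedeployment name and namespace) is to read them back with kubectl:

```shell
#Print the annotations on the machinedeployment; the two
#cluster-api-autoscaler-node-group-* keys should appear with your values
kubectl get machinedeployments.cluster.x-k8s.io <machinedeployment name> \
  -n <machinedeployment namespace> -o jsonpath='{.metadata.annotations}'
```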
Enable cluster autoscale for your cluster
This step requires provider permissions, so please contact the cloud provider to perform it.
Step 3: Test cluster autoscale
Get the current number of nodes
kubectl get nodes
There is currently only one worker node.
Create a test-autoscale.yaml file
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
      topologySpreadConstraints: #Spreads pods across different nodes (ensures no node has more pods than others)
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: nginx
Apply the test-autoscale.yaml file to deploy 2 replicas of the nginx pod in the default namespace (this will trigger the creation of a new worker node)
kubectl apply -f test-autoscale.yaml

Get nginx deployment
kubectl get pods

kubectl describe pod nginx-589656b9b5-mcm5j | grep -A 10 Events

You can see there is a new nginx pod with a status of Pending, and its events show FailedScheduling and TriggeredScaleUp:
Warning FailedScheduling 2m53s default-scheduler 0/2 nodes are available: 1 node(s) didn't match pod topology spread constraints, 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/2 nodes are available: 1 No preemption victims found for incoming pod, 1 Preemption is not helpful for scheduling.
Normal TriggeredScaleUp 2m43s cluster-autoscaler pod triggered scale-up: [{MachineDeployment/demo-autoscale-tkg-ns/demo-autoscale-tkg-worker-node-pool-1 1->2 (max: 5)}]
Wait for a new node to be provisioned; you will then see that a new worker node has been added and the new nginx pod's status is Running.
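While waiting, you can follow the scale-up as it happens (a sketch using kubectl's standard watch flag):

```shell
#Watch nodes until the new worker node appears and becomes Ready
kubectl get nodes -w

#In another terminal, watch the Pending nginx pod transition to Running
kubectl get pods -w
```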
Clean up test resource
kubectl delete -f test-autoscale.yaml

After deleting the nginx test deployment, the cluster waits a few minutes before deleting the unneeded node (see the scaleDownUnneededTime value in the cluster-autoscaler-values.yaml file).
Delete cluster-autoscaler deployment (Optional)
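If you no longer need autoscaling, the package and the kubeconfig secret created earlier can be removed. A sketch of the cleanup, assuming the names and namespaces used in the steps above:

```shell
#Uninstall the cluster-autoscaler package (installed in the tkg-system namespace)
tanzu package installed delete cluster-autoscaler --namespace tkg-system

#Remove the kubeconfig secret created in Step 2
kubectl delete secret cluster-autoscaler-mgmt-config-secret -n kube-system
```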
