Skip to content

Scaling Class

The Scaling class manages autoscaling configurations in Kubernetes. It provides horizontal pod autoscaling (HPA), vertical pod autoscaling (VPA), and cluster autoscaling capabilities for optimal resource utilization.

Overview

from celestra import Scaling

# Basic horizontal pod autoscaler
scaling = Scaling("app-hpa").target_cpu_utilization(70).min_replicas(2).max_replicas(10)

# Production autoscaling with multiple metrics
scaling = (Scaling("production-hpa")
    .target_cpu_utilization(70)
    .target_memory_utilization(80)
    .min_replicas(3)
    .max_replicas(20)
    .scale_up_delay(60)
    .scale_down_delay(300))

Core API Functions

Horizontal Pod Autoscaler (HPA)

Horizontal Pod Autoscaler

Configure horizontal pod autoscaler.

scaling = Scaling("app-hpa").horizontal_pod_autoscaler()

Target CPU Utilization

Set target CPU utilization percentage.

# 70% CPU utilization target
scaling = Scaling("app-hpa").target_cpu_utilization(70)

# 50% CPU utilization target
scaling = Scaling("app-hpa").target_cpu_utilization(50)

Target Memory Utilization

Set target memory utilization percentage.

# 80% memory utilization target
scaling = Scaling("app-hpa").target_memory_utilization(80)

# 60% memory utilization target
scaling = Scaling("app-hpa").target_memory_utilization(60)

Minimum Replicas

Set minimum number of replicas.

# Minimum 2 replicas
scaling = Scaling("app-hpa").min_replicas(2)

# Minimum 5 replicas for high availability
scaling = Scaling("app-hpa").min_replicas(5)

Maximum Replicas

Set maximum number of replicas.

# Maximum 10 replicas
scaling = Scaling("app-hpa").max_replicas(10)

# Maximum 50 replicas for high traffic
scaling = Scaling("app-hpa").max_replicas(50)

Scale Up Delay

Set scale up delay in seconds.

# 60 second scale up delay
scaling = Scaling("app-hpa").scale_up_delay(60)

# 30 second scale up delay for responsive scaling
scaling = Scaling("app-hpa").scale_up_delay(30)

Scale Down Delay

Set scale down delay in seconds.

# 300 second scale down delay
scaling = Scaling("app-hpa").scale_down_delay(300)

# 600 second scale down delay for stability
scaling = Scaling("app-hpa").scale_down_delay(600)

Vertical Pod Autoscaler (VPA)

Vertical Pod Autoscaler

Configure vertical pod autoscaler.

scaling = Scaling("app-vpa").vertical_pod_autoscaler()

VPA Mode

Set VPA mode (Auto, Initial, Off).

# Auto mode - automatically adjust resources
scaling = Scaling("app-vpa").vpa_mode("Auto")

# Initial mode - only set initial requests
scaling = Scaling("app-vpa").vpa_mode("Initial")

# Off mode - only provide recommendations
scaling = Scaling("app-vpa").vpa_mode("Off")

VPA Update Policy

Set VPA update policy.

# Automatic updates
scaling = Scaling("app-vpa").vpa_update_policy("Auto")

# Manual updates only
scaling = Scaling("app-vpa").vpa_update_policy("Manual")

VPA Minimum Allowed CPU

Set minimum allowed CPU.

scaling = Scaling("app-vpa").vpa_min_allowed_cpu("100m")

VPA Maximum Allowed CPU

Set maximum allowed CPU.

scaling = Scaling("app-vpa").vpa_max_allowed_cpu("2")

VPA Minimum Allowed Memory

Set minimum allowed memory.

scaling = Scaling("app-vpa").vpa_min_allowed_memory("128Mi")

VPA Maximum Allowed Memory

Set maximum allowed memory.

scaling = Scaling("app-vpa").vpa_max_allowed_memory("4Gi")

Custom Metrics

Add Custom Metric

Add custom metric for scaling.

# Custom metric for queue length
scaling = Scaling("app-hpa").add_custom_metric("queue_length", "10", "AverageValue")

# Custom metric for response time
scaling = Scaling("app-hpa").add_custom_metric("response_time", "100ms", "AverageValue")

Add Prometheus Metric

Add Prometheus metric for scaling.

# Prometheus metric for request rate
scaling = Scaling("app-hpa").add_prometheus_metric("http_requests_total", "1000")

Add External Metric

Add external metric for scaling.

# External metric for database connections
scaling = Scaling("app-hpa").add_external_metric("database_connections", "50")

Cluster Autoscaling

Cluster Autoscaler

Configure cluster autoscaler.

scaling = Scaling("cluster-autoscaler").cluster_autoscaler()

Scale Down Enabled

Enable or disable scale down.

# Enable scale down
scaling = Scaling("cluster-autoscaler").scale_down_enabled(True)

# Disable scale down
scaling = Scaling("cluster-autoscaler").scale_down_enabled(False)

Scale Down Delay After Add

Set delay after adding nodes before scaling down.

# 10 minute delay
scaling = Scaling("cluster-autoscaler").scale_down_delay_after_add(600)

Scale Down Unneeded Time

Set time before scaling down unneeded nodes.

# 10 minute unneeded time
scaling = Scaling("cluster-autoscaler").scale_down_unneeded_time(600)

Max Node Provision Time

Set maximum time to provision new nodes.

# 15 minute provision time
scaling = Scaling("cluster-autoscaler").max_node_provision_time(900)

Node Group Configuration

Node Group

Configure node group for autoscaling.

# Node group configuration
scaling = Scaling("cluster-autoscaler").node_group("app-nodes", 3, 10)

Add Node Group

Add node group for autoscaling.

# Add multiple node groups
scaling = (Scaling("cluster-autoscaler")
    .add_node_group("app-nodes", 3, 10)
    .add_node_group("db-nodes", 2, 5))

Node Group Labels

Set labels for node group.

labels = {
    "node-type": "app",
    "environment": "production"
}
scaling = Scaling("cluster-autoscaler").node_group_labels(labels)

Node Group Taints

Set taints for node group.

taints = [{
    "key": "dedicated",
    "value": "app",
    "effect": "NoSchedule"
}]
scaling = Scaling("cluster-autoscaler").node_group_taints(taints)

Advanced Configuration

Namespace

Set the namespace for the scaling configuration.

scaling = Scaling("app-hpa").namespace("production")

Add Label

Add a label to the scaling configuration.

scaling = Scaling("app-hpa").add_label("environment", "production")

Add Labels

Add multiple labels to the scaling configuration.

labels = {
    "environment": "production",
    "team": "platform",
    "tier": "autoscaling"
}
scaling = Scaling("app-hpa").add_labels(labels)

Add Annotation

Add an annotation to the scaling configuration.

scaling = Scaling("app-hpa").add_annotation("description", "Application autoscaler")

Add Annotations

Add multiple annotations to the scaling configuration.

annotations = {
    "description": "Application autoscaler for production",
    "owner": "platform-team",
    "scale-policy": "conservative"
}
scaling = Scaling("app-hpa").add_annotations(annotations)

Behavior

Set scaling behavior configuration.

behavior = {
    "scaleUp": {
        "stabilizationWindowSeconds": 60,
        "policies": [{
            "type": "Pods",
            "value": 2,
            "periodSeconds": 60
        }]
    },
    "scaleDown": {
        "stabilizationWindowSeconds": 300,
        "policies": [{
            "type": "Pods",
            "value": 1,
            "periodSeconds": 60
        }]
    }
}
scaling = Scaling("app-hpa").behavior(behavior)

Metrics Interval

Set metrics collection interval.

# 30 second metrics interval
scaling = Scaling("app-hpa").metrics_interval(30)

Sync Period

Set sync period for autoscaler.

# 15 second sync period
scaling = Scaling("app-hpa").sync_period(15)

Output Generation

Generate

Generate the scaling configuration.

# Generate Kubernetes YAML
scaling.generate().to_yaml("./k8s/")

# Generate Helm values
scaling.generate().to_helm_values("./helm/")

# Generate Terraform
scaling.generate().to_terraform("./terraform/")

Complete Example

Here's a complete example of a production-ready scaling configuration:

from celestra import Scaling

# Create comprehensive scaling configuration
production_scaling = (Scaling("production-hpa")
    .horizontal_pod_autoscaler()
    .target_cpu_utilization(70)
    .target_memory_utilization(80)
    .min_replicas(3)
    .max_replicas(20)
    .scale_up_delay(60)
    .scale_down_delay(300)
    .add_custom_metric("queue_length", "10", "AverageValue")
    .add_prometheus_metric("http_requests_total", "1000")
    .behavior({
        "scaleUp": {
            "stabilizationWindowSeconds": 60,
            "policies": [{
                "type": "Pods",
                "value": 2,
                "periodSeconds": 60
            }]
        },
        "scaleDown": {
            "stabilizationWindowSeconds": 300,
            "policies": [{
                "type": "Pods",
                "value": 1,
                "periodSeconds": 60
            }]
        }
    })
    .metrics_interval(30)
    .sync_period(15)
    .namespace("production")
    .add_labels({
        "environment": "production",
        "team": "platform",
        "tier": "autoscaling"
    })
    .add_annotations({
        "description": "Production autoscaler",
        "owner": "platform-team@company.com",
        "scale-policy": "conservative"
    }))

# Generate manifests
production_scaling.generate().to_yaml("./k8s/")

Scaling Patterns

Conservative Scaling Pattern

# Conservative scaling for stability
conservative_scaling = (Scaling("app-hpa")
    .target_cpu_utilization(70)
    .target_memory_utilization(80)
    .min_replicas(3)
    .max_replicas(10)
    .scale_up_delay(120)
    .scale_down_delay(600))

Aggressive Scaling Pattern

# Aggressive scaling for responsiveness
aggressive_scaling = (Scaling("app-hpa")
    .target_cpu_utilization(50)
    .target_memory_utilization(60)
    .min_replicas(2)
    .max_replicas(50)
    .scale_up_delay(30)
    .scale_down_delay(180))

High Availability Scaling Pattern

# High availability scaling
ha_scaling = (Scaling("app-hpa")
    .target_cpu_utilization(60)
    .target_memory_utilization(70)
    .min_replicas(5)
    .max_replicas(30)
    .scale_up_delay(60)
    .scale_down_delay(300))

VPA Pattern

# Vertical pod autoscaler
vpa_scaling = (Scaling("app-vpa")
    .vertical_pod_autoscaler()
    .vpa_mode("Auto")
    .vpa_update_policy("Auto")
    .vpa_min_allowed_cpu("100m")
    .vpa_max_allowed_cpu("2")
    .vpa_min_allowed_memory("128Mi")
    .vpa_max_allowed_memory("4Gi"))

Cluster Autoscaling Pattern

# Cluster autoscaler
cluster_scaling = (Scaling("cluster-autoscaler")
    .cluster_autoscaler()
    .scale_down_enabled(True)
    .scale_down_delay_after_add(600)
    .scale_down_unneeded_time(600)
    .max_node_provision_time(900)
    .add_node_group("app-nodes", 3, 10)
    .add_node_group("db-nodes", 2, 5))

Best Practices

1. Set Appropriate Target Utilization

# ✅ Good: Conservative target utilization
scaling = Scaling("app-hpa").target_cpu_utilization(70).target_memory_utilization(80)

# ❌ Bad: Too aggressive target utilization
scaling = Scaling("app-hpa").target_cpu_utilization(90).target_memory_utilization(95)

2. Use Scale Down Delays

# ✅ Good: Use scale down delays for stability
scaling = Scaling("app-hpa").scale_down_delay(300)

# ❌ Bad: No scale down delay
scaling = Scaling("app-hpa")  # No delay

3. Set Reasonable Min/Max Replicas

# ✅ Good: Reasonable replica limits
scaling = Scaling("app-hpa").min_replicas(3).max_replicas(20)

# ❌ Bad: Extreme replica limits
scaling = Scaling("app-hpa").min_replicas(1).max_replicas(1000)

4. Use Custom Metrics for Better Scaling

# ✅ Good: Use custom metrics
scaling = Scaling("app-hpa").add_custom_metric("queue_length", "10")

# ❌ Bad: Only CPU/memory metrics
scaling = Scaling("app-hpa")  # No custom metrics

5. Configure Scaling Behavior

# ✅ Good: Configure scaling behavior
scaling = Scaling("app-hpa").behavior({
    "scaleUp": {"stabilizationWindowSeconds": 60},
    "scaleDown": {"stabilizationWindowSeconds": 300}
})

# ❌ Bad: Default behavior
scaling = Scaling("app-hpa")  # Default behavior

6. Use VPA for Resource Optimization

# ✅ Good: Use VPA for resource optimization
scaling = Scaling("app-vpa").vertical_pod_autoscaler().vpa_mode("Auto")

# ❌ Bad: No VPA
scaling = Scaling("app-hpa")  # No VPA

Next Steps