Vertical scaling

Stateless microservices are the gold standard for scaling applications in the cloud: because any replica can serve any request, services can scale up and down freely and absorb massive bursts of traffic. LLM inference engines are stateless in principle, but low-latency serving relies on preserving the KV cache across requests, which makes restarting a process to change its resources expensive. In this post, I look at how Kubernetes 1.35 advances in-place vertical scaling of running pods. The approach applies to any workload whose resources you want to scale without restarting the process, with LLM inference being a particularly good fit.
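
To make the mechanics concrete, here is a minimal sketch of what an in-place resize looks like from a client, assuming a cluster where in-place pod resize is available. It patches the pod's resize subresource via client-go instead of deleting and recreating the pod; the namespace, pod name (`inference-0`), container name (`llm`), and CPU figures are placeholders, not anything from a real deployment.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig from the default location (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Strategic merge patch: containers are merged by name, so only the
	// "llm" container's CPU request/limit is changed.
	patch := []byte(`{"spec":{"containers":[{"name":"llm",` +
		`"resources":{"requests":{"cpu":"8"},"limits":{"cpu":"8"}}}]}}`)

	// Target the pod's "resize" subresource so the kubelet adjusts the
	// running container in place rather than restarting it.
	pod, err := clientset.CoreV1().Pods("default").Patch(
		context.TODO(), "inference-0",
		types.StrategicMergePatchType, patch,
		metav1.PatchOptions{}, "resize",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println("resized:", pod.Name)
}
```

The same change can be made interactively with `kubectl patch --subresource resize` on recent kubectl versions; either way, the key point is that the kubelet adjusts the running container's resource limits while the process, and its KV cache, stays alive.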