Kubernetes
Logging & Monitoring
Plan
- Introduction
- Container technology
- Kubernetes
- Logging architecture
- Monitoring
How Long
from monolith to microservices?
8 fallacies of distributed computing
- The network is reliable.
- Latency is zero.
- Bandwidth is infinite.
- The network is secure.
- Topology doesn't change.
- There is one administrator.
- Transport cost is zero.
- The network is homogeneous.
Source: Wikipedia
RFC 1925 - 12 Networking Truths
Logging & Monitoring:
monolithic app
-vs-
distributed system
Logging | recording events |
Metrics | data combined from measuring events |
Tracing | recording events with causal ordering |
credit @coda
Log Levels for dev
- Info: 🤷‍♂️ don't care
- Debug: 🤔 when necessary
- Warning: 🤷‍♂️ don't care
- Error: 🧐 to investigate
- Fatal: 🤷‍♂️ don't care
- Zombie-Apocalypse: 🤷‍♂️ don't care
- Meteor: 🤷‍♂️ don't care
- Application errors 👉 where to look
- Business metrics 👉 money
- Latency 👉 user experience
What is a container
Not a real thing. An application delivery mechanism with process
isolation based on several Linux kernel features.
Dev 👉 inside container (build)
Ops 👉 outside container (run)
container = common interface for deploying services
cAdvisor
docker run \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:ro \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--volume=/dev/disk/:/dev/disk:ro \
--publish=8080:8080 \
--detach=true \
--name=cadvisor \
google/cadvisor:latest
localhost:8080
Log Levels for dev
- Info: 🤷‍♂️ don't care
- Debug: 🤔 stdout
- Warning: 🤷‍♂️ don't care
- Error: 🧐 stderr
- Fatal: 🤷‍♂️ don't care
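The stdout/stderr split above can be sketched with Python's standard `logging` module. This is a minimal illustration, not the speaker's setup: the names `MaxLevelFilter`, `build_logger`, and `demo`, the log format, and the sample messages are all assumptions made for the example.

```python
import io
import logging
import sys


class MaxLevelFilter(logging.Filter):
    """Pass only records strictly below max_level (hypothetical helper)."""

    def __init__(self, max_level: int) -> None:
        super().__init__()
        self.max_level = max_level

    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno < self.max_level


def build_logger(name, out=sys.stdout, err=sys.stderr, level=logging.DEBUG):
    """Send records below ERROR to `out` and ERROR and above to `err`."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False   # avoid duplicate lines via the root logger
    logger.handlers.clear()    # keep repeated calls idempotent

    out_handler = logging.StreamHandler(out)
    out_handler.addFilter(MaxLevelFilter(logging.ERROR))

    err_handler = logging.StreamHandler(err)
    err_handler.setLevel(logging.ERROR)

    formatter = logging.Formatter("%(levelname)s %(name)s %(message)s")
    for handler in (out_handler, err_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


def demo() -> tuple[str, str]:
    """Log once per stream into in-memory buffers and return their contents."""
    out, err = io.StringIO(), io.StringIO()
    log = build_logger("demo", out=out, err=err)
    log.debug("cache miss for key=42")
    log.error("payment backend unreachable")
    return out.getvalue(), err.getvalue()
```

In a container, calling `build_logger("app")` with the default streams writes to the real stdout and stderr, which is where the container runtime collects logs.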
Node level logging
- JSON (no multiline)
/var/log/
- keep previous pod logs
- pod eviction = ❌ no logs
- logrotate script
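The "JSON (no multiline)" point can be illustrated with a formatter that always emits one JSON object per line, so a stack trace never spills across several log lines. This is a sketch under assumptions: the class name `JsonLineFormatter`, the function `format_example`, and the field names `level`, `logger`, `message`, and `exception` are illustrative choices, not a required schema.

```python
import json
import logging
import sys


class JsonLineFormatter(logging.Formatter):
    """Render each record as a single-line JSON object (field names are assumptions)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # The traceback contains newlines; json.dumps escapes them as \n,
            # so the emitted line stays a single line.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def format_example() -> str:
    """Format one error record carrying a multi-line exception."""
    formatter = JsonLineFormatter()
    try:
        raise ValueError("bad input\nsecond line")
    except ValueError:
        record = logging.LogRecord(
            "demo", logging.ERROR, "demo.py", 0,
            "request failed", None, sys.exc_info(),
        )
    return formatter.format(record)
```

Attaching this formatter to a handler that writes to stdout keeps every event on one line, which is easier for a node-level collector to parse.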
cluster level logging
Logs lifecycle & storage
independent of nodes, pods, or containers
logging with node agent
- per node agent pod (DaemonSet)
- centralized logging
- fluentd
- logs to stdout/stderr
logging with streaming sidecar
- logs to shared volumes
- sidecar streams logs to its own stdout
- separate log streams
- double disk usage
- better to directly write to stdout/stderr
logging with sidecar agent
- per pod agent (resources!)
- no
kubectl logs
logging from application
which logs
CPU & RAM should be enough
not really...
docker stats
kubectl top nodes
kubectl top pods
Why monitoring
- detect/prevent outages (alerting)
- entry price for chaos engineering
- auto-scaling (HPA)
- optimize (cost & performance)
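To show why auto-scaling depends on metrics, here is a sketch of the core scaling rule described in the Kubernetes HPA documentation: desired replicas = ceil(current replicas × current metric / target metric). The function name, the clamping defaults, and the sample numbers are assumptions for illustration; the real controller also applies a tolerance, stabilization windows, and readiness handling that are left out here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Simplified HPA rule: scale in proportion to metric/target, then clamp."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))
```

For example, 4 replicas averaging 200m CPU against a 100m target would be scaled to 8 under this rule, and a bad or missing metric would make the result meaningless.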
different levels of monitoring
- Infrastructure level - U.S.E
- Application level - R.E.D
USE method: for every resource, check:
- utilization
- saturation
- errors
RED method: for every service, check that request:
- rate
- error rate
- duration (distributions)
are within SLO/SLA
What to monitor
- request time/rate (if it's fast, it works)
- connections (health check, DB, pods)
- kubernetes pods (CrashLoopBackOff,...)
- kubernetes internals (control plane, kubelet, ...)
- infrastructure (disk space, CPU, RAM, network,...)
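The request rate, error rate, and duration distribution from the RED method could be derived from raw request records roughly as follows. This is a minimal sketch: the `Request` record, the `red_summary` function, and the nearest-rank percentile are assumptions chosen for the example, and a real setup would normally use a metrics library with histograms instead of keeping every sample.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    duration_s: float  # how long the request took, in seconds
    ok: bool           # False if the request ended in an error


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def red_summary(requests, window_s):
    """Compute Rate, Errors, Duration for requests seen in a time window."""
    total = len(requests)
    errors = sum(1 for r in requests if not r.ok)
    durations = sorted(r.duration_s for r in requests)
    return {
        "rate_per_s": total / window_s,
        "error_rate": errors / total if total else 0.0,
        "p50_s": percentile(durations, 50) if durations else None,
        "p99_s": percentile(durations, 99) if durations else None,
    }
```

Reporting percentiles rather than a mean reflects the "distributions" note: a few very slow requests can hide behind a healthy average.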
Health check
what does "healthy" mean?
Where to monitor
maybe not on the cluster that you are monitoring
don't take my word for it
SLI, SLO, SLA
These measurements describe basic properties of metrics that matter,
what values we want those metrics to have, and how we'll react if we
can't provide the expected service.
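A small worked example may make the three terms concrete: the SLI is the measured value, the SLO is the target for it, and the gap between them shows how much room is left before the SLA reaction kicks in. The "error budget" framing, the function names, and the numbers below are assumptions for illustration and do not come from the slides.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests that were served successfully."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return good_requests / total_requests


def error_budget_remaining(good_requests: int,
                           total_requests: int,
                           slo_target: float) -> float:
    """Fraction of the allowed failures (1 - SLO) that is still unused.

    1.0 means no budget spent, 0.0 means the SLO is exactly met,
    and a negative value means the SLO is violated.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    allowed_bad = (1 - slo_target) * total_requests
    actual_bad = total_requests - good_requests
    return 1 - actual_bad / allowed_bad
```

With 1,000,000 requests, 999,500 of them good, and a 99.9% SLO, the SLI is 99.95% and about half of the allowed failures are still unused.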
THANK YOU
and I'm sorry
If you had to maintain my code
I hope you learned more by maintaining it
than me by writing it
contact
https://bisconti.cloud/
@julienBisconti
Slides made with Reveal.js and hugo-reveal