Scale on

GCP

Julien Bisconti

Software Engineer
specialized in Google Cloud

previous talks

Content

Introduction
Scale code
Scale teams
Conclusions

introduction

UYKWYAD

(unless you know what you are doing)

it means you understand the tradeoffs

What does it mean to scale

vertically
horizontally
deeply

village (100+) -> city (10k+) -> megacity (1M+)

same same but different

source: https://gist.github.com/hellerbarde/2843375

When to scale

Bottleneck ?

If NO, not a scaling problem.

👉 scaling “people” problems.

For a startup with a product,

a serverless architecture

is a really good place to start (UYKWYAD)

It is serverless the same way WiFi is wireless. At some point, it will hit a wire.
— Gojko Adzic

" we should build our own X "

where "X" is anything not related to the business

Example: framework, in-memory database, queue system, new programming language, etc.

We could build it

BUT

spending time on the business

makes more sense financially

repository

tradeoffs

Developer time (build)
Use a product (buy)

On premise: fixed price and capacity
Cloud: pay for what you use
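
A rough way to compare the two pricing models is a break-even calculation. The sketch below is only illustrative: the function names and every price in it are hypothetical, not real hardware or GCP prices.

```python
def monthly_cost_on_prem(fixed_monthly_cost: float) -> float:
    """On premise: the bill is the same whatever the usage."""
    return fixed_monthly_cost


def monthly_cost_cloud(hours_used: float, price_per_hour: float) -> float:
    """Cloud: the bill grows with the hours actually used."""
    return hours_used * price_per_hour


def break_even_hours(fixed_monthly_cost: float, price_per_hour: float) -> float:
    """Monthly usage (in hours) above which on premise becomes cheaper."""
    return fixed_monthly_cost / price_per_hour


# Hypothetical numbers, for illustration only.
FIXED = 300.0   # monthly cost of an owned server
HOURLY = 1.0    # hourly price of a comparable cloud instance

print(break_even_hours(FIXED, HOURLY))                                # 300.0
print(monthly_cost_cloud(100, HOURLY) < monthly_cost_on_prem(FIXED))  # True
```

Below the break-even point the cloud is cheaper; above it, fixed capacity wins, provided the capacity is actually enough.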

latency versus throughput

cold start ?

👉 strategy

Simon Wardley - Mapping

Yevgeniy Brikman - Lessons from 300k+ Lines of Infrastructure Code

Archives of the History of American Psychology, The Center for the
History of Psychology, The University of Akron

scaling code

Should we use kubernetes ?

The question people ask me the most after “can I get a free subscription?”

Deployment

Containers: lightweight VMs

12 factor app
easier deploy
reproducible build

but ...

Deployment concerns

Scaling up and down
Redundancy
Scheduling / Orchestration
Service Discovery

Resiliency
Rolling out and back
Health checks
Secret and config

➡️ kubernetes
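
Health checks, one of the concerns listed above, are typically an HTTP endpoint that the orchestrator polls. A minimal sketch using only the Python standard library, assuming a hypothetical /healthz path and two placeholder dependency checks:

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Callable, Dict, Tuple

# Hypothetical checks; a real service would ping its database, cache, etc.
CHECKS: Dict[str, Callable[[], bool]] = {
    "database": lambda: True,
    "cache": lambda: True,
}


def health_status(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run every check; return 200 if all pass, 503 otherwise."""
    results = {name: check() for name, check in checks.items()}
    code = 200 if all(results.values()) else 503
    return code, json.dumps(results, sort_keys=True)


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /healthz (an assumed path) with the check results."""

    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        code, body = health_status(CHECKS)
        payload = body.encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

To try it locally, serve the handler with http.server.HTTPServer(("", 8080), HealthHandler).serve_forever(); the port is also an assumption.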

but ...

Kubernetes concerns

Logging
Tracing
Metrics
Dependency visualisation
Service identity and Auth

Circuit breaking
Traffic flow and policies
Failover
Fault injection
...

➡️ use code?
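
Handling one of these concerns, circuit breaking, in application code could look like the sketch below. The class name, thresholds and injectable clock are hypothetical choices for illustration, not a reference implementation.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the dependency."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable for testing
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, failing fast")
            # Cool-down elapsed: let one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # A success closes the circuit and resets the counter.
        self.failures = 0
        self.opened_at = None
        return result
```

Every service, in every language, would need its own copy of this logic, which is the cost hinted at by "use code?".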

Logging and microservices

Don’t do it

(UYKWYAD)

In distributed systems, logging is not debugging

💸 : # apps x $ (network + storage) x retention days
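
The formula above can be read literally as a small function. This is a sketch under an assumption: network and storage are priced per app per day, which is one interpretation of the slide, not how any specific provider bills. All numbers are hypothetical.

```python
def logging_cost(apps: int,
                 network_cost_per_app_day: float,
                 storage_cost_per_app_day: float,
                 retention_days: int) -> float:
    """Number of apps x (network + storage) x retention days."""
    daily_cost_per_app = network_cost_per_app_day + storage_cost_per_app_day
    return apps * daily_cost_per_app * retention_days


# Hypothetical figures, for illustration only.
print(logging_cost(apps=50,
                   network_cost_per_app_day=0.5,
                   storage_cost_per_app_day=1.5,
                   retention_days=30))  # 3000.0
```

Every factor multiplies the others, so adding microservices or extending retention scales the whole bill.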

Logging: (immutable) events (selfish traces)
Metrics: just statistics over time
Tracing: traces provide context in the life of a transaction

They help narrow down a problem and guide you to where to investigate.
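
To make the three signals concrete, here is a toy sketch using only the Python standard library. The names (log_event, handle_request, requests_total) are hypothetical; a real system would more likely use a library such as OpenTelemetry rather than hand-rolled structures.

```python
import json
import time
import uuid
from collections import Counter

metrics = Counter()  # metrics: statistics aggregated over time
spans = []           # tracing: timed steps that share one trace id


def log_event(message, **fields):
    """Logging: one immutable event, serialized as a JSON line."""
    event = {"ts": time.time(), "message": message, **fields}
    line = json.dumps(event, sort_keys=True)
    print(line)
    return line


def handle_request(trace_id=None):
    """Handle one fake request and emit all three signals for it."""
    trace_id = trace_id or uuid.uuid4().hex
    start = time.monotonic()
    metrics["requests_total"] += 1
    log_event("request handled", trace_id=trace_id)
    spans.append({
        "trace_id": trace_id,
        "name": "handle_request",
        "duration_s": time.monotonic() - start,
    })
    return trace_id
```

The trace id is what ties the signals together: passing the same id to the next service lets its logs and spans be joined with these.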

source: https://cloud.google.com/trace/docs/viewing-details

Observability

What changes in your system ?

Site Reliability engineering
Chaos/Resilience engineering
FinOps 💸
⚠️ Language proliferation
opentelemetry.io

(see prev talks)

Staging/Test environment

Also called OpsTest
Devs don’t trust it and use it only to test deployments
Never mirrors production (data? can switch traffic?)
Costs money just to blow air (mining bitcoin?)
Most VMs run at 5% usage

Infra as Code

How often do you make changes ?
Is it immutable ?
Used for disaster recovery (chaos eng.)

code containing business logic

VS

all the rest

Total cost of ownership

build OR buy

Restaurants buy, cook and sell food.

Very few do farming and even fewer are good at both.

build OR buy

whole thread

Scaling

Compute
Memory
Storage
Networking

What is missing ?

Security

The real cloud lock-in.

Level of Access

Organization
Folder
Project
Resources

group
user
service account
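
Access granted at a higher level of the hierarchy is inherited by everything below it. The sketch below is a simplified model of that inheritance, not the real IAM API: the Node class and the role names are hypothetical, and deny policies and conditions are ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Node:
    """One level of the hierarchy: organization, folder, project or resource."""
    name: str
    parent: Optional["Node"] = None
    # member -> roles granted directly at this level
    bindings: Dict[str, Set[str]] = field(default_factory=dict)


def effective_roles(node: Optional[Node], member: str) -> Set[str]:
    """Union of the roles granted on the node and on all of its ancestors."""
    roles: Set[str] = set()
    while node is not None:
        roles |= node.bindings.get(member, set())
        node = node.parent
    return roles


# Hypothetical hierarchy and role names, for illustration only.
org = Node("organization", bindings={"group:admins": {"viewer"}})
folder = Node("folder", parent=org)
project = Node("project", parent=folder, bindings={"user:alice": {"editor"}})
bucket = Node("resource", parent=project,
              bindings={"serviceAccount:app": {"reader"}})

print(effective_roles(bucket, "group:admins"))  # {'viewer'}
```

A role granted to a group at the organization level reaches every resource, which is why access is worth designing early.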

Going IPO ?

Compute
Memory
Storage
Networking
Security

What is missing ?

➡️ DATA

Users care about data (information)

Users don’t care about your code

Scaling database

Add more CPU/RAM
Optimize queries
READ replica
Sharding
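
The last step, sharding, means routing each key to one of several databases. A minimal sketch of hash-based routing, assuming a hypothetical shard_for helper and a fixed number of shards:

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index in [0, shard_count) using a stable hash."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer; the built-in hash() is not
    # stable across processes, so it is avoided here.
    return int.from_bytes(digest[:8], "big") % shard_count


# The same key always lands on the same shard.
print(shard_for("user-42", 4) == shard_for("user-42", 4))  # True
```

The tradeoff: with plain modulo routing, changing the number of shards remaps most keys, which is why techniques such as consistent hashing exist.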

Spanner: transactions at scale
BigQuery: analytics at scale
Data Studio: data visualization at scale
AI at scale: AI platform / kubeflow
Cloud Asset Inventory / Forseti

Scaling people

Hiring & culture (10 ppl -> 150)
Learning new skills
Onboarding
Communication tools & process
Management & clear objectives
Finance (release once a year)

Conclusions

A tool won’t fix a bad process!
Automating a bad process makes it automatically worse
Time is money: build vs buy
Document architecture decisions
Microservices are meant to ship your organization chart (Conway’s Law)
Know your bottlenecks (observability)
Secure and manage access from the beginning
Adopt an “always be migrating” mindset
Be careful what you wish for

Resources

Designing Data-Intensive Applications - Martin Kleppmann
https://cloud.google.com/files/shifting-left-on-security.pdf

THANK YOU

and I'm sorry 🙏
If you had to maintain my code
I hope you learned more by maintaining it
than me by writing it

contact

https://bisconti.cloud/

@julienBisconti

Slides made with Reveal.js and hugo-reveal