Building a data platform

for machine learning operations

Content

  • Who am I
  • Problem definition
  • MLOps vs DevOps
  • Containers & Kubernetes
  • KubeFlow
  • Conclusions

Julien Bisconti

Software Engineer
specialized in Google Cloud


  • Google Cloud Professional Data Engineer certification
  • Google Developer Expert badge
  • Kubernetes certifications

previous talks

PART I

problem definition

how to:

  • train
  • build
  • deploy
  • monitor


Machine Learning models

in a reproducible manner

at scale?

Hidden Technical Debt in Machine Learning Systems

Source: D. Sculley et al., Hidden Technical Debt in Machine Learning Systems

cost of context switching

spreadsheet

source link

Mental limitations


  • # decisions / day
  • # things to remember
  • speed of memory / reflexes

Strategy

mapping

Simon Wardley - Mapping

People

mapping

Simon Wardley - Mapping

Assumptions

mapping

Simon Wardley - Mapping

production grade infrastructure

Yevgeniy Brikman - Lessons from 300k+ Lines of Infrastructure Code

build OR buy

Tweet about datacenter

whole thread

We could build it

BUT

spending time on the business

makes more sense financially

no code repository

PART II

MLOPS

vs

DevOps

#thisisdevops

this is devops

Yevgeniy Brikman - Lessons from 300k+ Lines of Infrastructure Code

ML platform assembly kit

Figure: data engineer toolbox. Source: article by Clemens Mewald

how different is ML

  1. Various hardware
  2. Resource-heavy
  3. Various cycles
  4. Many languages
  5. Dependencies
  6. Explainability vs. debugging
  7. Composability of models
  8. Huge amounts of data

And after a while

More models, more requests and more data

Consistency is key

source

Archives of the History of American Psychology, The Center for the
History of Psychology, The University of Akron

army report uniformity

PART III

Containers & Kubernetes

Containers

container image: zip file of app + dependencies
docker: program that runs the image
each container runs in its own namespace

data engineer toolbox

source link

Deployment

Containers: like lightweight VMs, but sharing the host kernel

  • 12 factor app
  • easier deploy
  • reproducible build


but ...

how to orchestrate containers across many computers?

Deployment concerns

  • Scaling up and down
  • Redundancy
  • Scheduling / Orchestration
  • Service Discovery
  • Resiliency
  • Rolling out and back
  • Health checks
  • Secret and config
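
Two of the concerns above, health checks and rolling out and back, can be illustrated with a toy simulation. This is only a sketch of the idea in plain Python, not how Kubernetes implements it: `rolling_update` and the `is_healthy` callback are hypothetical names introduced for illustration, and each replica is reduced to the version string it runs.

```python
from typing import Callable, List


def rolling_update(
    replicas: List[str],
    new_version: str,
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Update replicas one at a time; restore the old versions on failure.

    `replicas` holds the version each replica currently runs.
    `is_healthy` is a hypothetical stand-in for a real health check.
    """
    original = list(replicas)  # kept so we can roll back
    current = list(replicas)
    for index in range(len(current)):
        current[index] = new_version  # update a single replica
        if not is_healthy(current[index]):
            # Roll back: every replica returns to its previous version.
            return original
    return current


# A healthy release reaches every replica.
print(rolling_update(["v1", "v1", "v1"], "v2", lambda version: True))
# → ['v2', 'v2', 'v2']

# A release that fails its health check is rolled back.
print(rolling_update(["v1", "v1", "v1"], "v2", lambda version: version != "v2"))
# → ['v1', 'v1', 'v1']
```

Updating one replica at a time means the other replicas keep serving traffic while the new version is checked, which is where redundancy and health checks meet.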

Kubernetes

kubernetes architecture

KUBEFLOW

The Machine Learning Toolkit for Kubernetes

KF Pipelines goals:

  • End-to-end orchestration
  • Easy experimentation
  • Easy re-use
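
The goals above can be sketched in plain Python. This is not the Kubeflow Pipelines SDK: `run_pipeline` and the step functions are hypothetical names, and the data is made up for illustration. The point is only that small, independent steps sharing one interface are easy to orchestrate end to end, swap during experiments, and re-use.

```python
from typing import Any, Callable, Dict, List

# Every step takes the accumulated context and returns the new values it produced.
Step = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_pipeline(steps: List[Step], context: Dict[str, Any]) -> Dict[str, Any]:
    """Run each step in order, merging its outputs into the context."""
    for step in steps:
        context = {**context, **step(context)}
    return context


def load_data(context: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical data: points on the line y = 2x.
    return {"xs": [1.0, 2.0, 3.0], "ys": [2.0, 4.0, 6.0]}


def train(context: Dict[str, Any]) -> Dict[str, Any]:
    # Least-squares slope through the origin: sum(x*y) / sum(x*x).
    xs, ys = context["xs"], context["ys"]
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return {"slope": slope}


def evaluate(context: Dict[str, Any]) -> Dict[str, Any]:
    # Largest absolute prediction error on the training points.
    xs, ys, slope = context["xs"], context["ys"], context["slope"]
    return {"max_error": max(abs(slope * x - y) for x, y in zip(xs, ys))}


result = run_pipeline([load_data, train, evaluate], {})
print(result["slope"], result["max_error"])
# → 2.0 0.0
```

Because every step has the same signature, `train` can be replaced by another model, or `evaluate` re-used in a different pipeline, without touching the other steps.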

Deploy kubeflow

GCP AI Platform

Local installation

https://microk8s.io/docs/addon-kubeflow

CONCLUSIONS

  • Consistency is key
  • Context switching is expensive
  • Re-use = able to share = caring
  • More models & data tomorrow than today

Resources

THANK YOU

and I'm sorry 🙏
If you had to maintain my code
I hope you learned more by maintaining it
than me by writing it

contact

https://bisconti.cloud/

@julienBisconti

Slides made with Reveal.js and hugo-reveal