Machine learning has two main workflows: first training the model, then deploying it. Going from a Jupyter notebook to a deployed model in production can take months. The tooling around the training workflow is getting better, but deploying is still cumbersome. That is why Kubeflow was created. The Kubeflow project is dedicated to making deployments of machine learning workflows on Kubernetes simple, portable, and scalable. In this presentation, we are going to see why such a project exists and the challenges Machine Learning Operations (MLOps) brings to the table.
My personal definition of “at scale” is a service that handles more than 10,000 RPS (ten thousand requests per second). I arrived at that number when I learned that the C10K problem is solved and hardware is no longer the bottleneck, meaning that a single machine can handle 10,000 connections at the same time. If a single machine can handle that much, why build a distributed system?
This is the feedback that I sent to someone who is trying to make it in the open source world. Today is Sunday and I don’t have a lot of free time, so here it is: Since I know how hard it is to thrive in open source, please allow me to be brutally honest in this feedback. I might use strong words, but that is because I am trying to help you understand how to succeed with your project.