Enterprise companies find MLOps critical for reliability and performance

Image Credits: Peter Cade / Getty Images

Rish Joshi

Contributor
Rish is an entrepreneur and investor. Previously, he was a VC at Gradient Ventures (Google’s AI fund), co-founded a fintech startup building an analytics platform for SEC filings and worked on deep-learning research as a graduate student in computer science at MIT.

Enterprise startups UiPath and Scale have drawn huge attention in recent years from companies looking to automate workflows, from RPA (robotic process automation) to data labeling.

What’s been overlooked in the wake of such workflow-specific tools is the base class of products that enterprises use to build the core of their machine learning (ML) workflows, along with the shift in focus toward automating the deployment and governance aspects of those workflows.

That’s where MLOps comes in, and its popularity has been fueled by the rise of core ML workflow platforms such as Boston-based DataRobot. The company has raised more than $430 million and reached a $1 billion valuation this past fall serving this very need for enterprise customers. DataRobot’s vision has been simple: enabling a range of users within enterprises, from business and IT users to data scientists, to gather data and build, test and deploy ML models quickly.

Founded in 2012, the company has quietly amassed a customer base that boasts more than a third of the Fortune 50, with triple-digit yearly growth since 2015. DataRobot’s top four industries are finance, retail, healthcare and insurance; its customers have deployed over 1.7 billion models through DataRobot’s platform. The company is not alone: competitors such as H2O.ai, which raised a $72.5 million Series D led by Goldman Sachs last August, offer a similar platform.

Why the excitement? As artificial intelligence pushed into the enterprise, the first step was to go from data to a working ML model. Data scientists initially did this manually, but the process is increasingly automated and has become known as “auto ML.” An auto-ML platform like DataRobot’s can let an enterprise user quickly auto-select features based on their data and auto-generate a number of models to see which ones work best.

As auto ML became more popular, improving the deployment phase of the ML workflow became critical for reliability and performance, and so enters MLOps. It’s quite similar to the way that DevOps has improved the deployment of source code for applications. Companies such as DataRobot and H2O.ai, along with other startups and the major cloud providers, are intensifying their efforts to provide MLOps solutions for customers.

We sat down with DataRobot’s team to understand how their platform has been helping enterprises build auto-ML workflows, what MLOps is all about and what’s been driving customers to adopt MLOps practices now.

The rise of MLOps

As enterprises adopt auto-ML workflows, one of the issues they’re commonly seeing is that many of the models built by data scientists never make it into production. There are a number of issues that can stop deployment, including models that underperform in pre-production environments, incompatibilities between production environments and the model-training environment, or inconsistencies with production infrastructure.

This is where MLOps comes in.

The world of MLOps has been shaped a fair bit by the evolution of DevOps, which has rocketed to popularity over the past few years. The role of DevOps is to efficiently integrate and deploy source code, and it’s typically managed by a DevOps engineer who works as a bridge between IT and developers.

MLOps is similar, but focuses on the ML model and data sets as opposed to code. These days, data engineers run MLOps, but it’s likely the specialized role of MLOps engineer will come about soon.

There are four components to the modern MLOps workflow:

  • Continuous Integration: In DevOps, this refers to synchronizing new code with the existing code base, whereas in MLOps, this process refers to synchronizing the data and models. This involves checks such as confirming that a model mathematically converges, making sure it does not result in data-type errors, and running tests on sub-methods within the model to ensure they’re working as expected.
  • Continuous Deployment: In DevOps, this refers to moving code into production, and it’s the same with MLOps, except with models instead of code. This involves checks such as ensuring that the libraries required for a model to run exist in the production environment, testing the model with sample input data to verify it’s producing the expected outputs and testing performance metrics in pre-production.
  • Monitoring: Once a model has been deployed, it needs to be actively evaluated to ensure that it’s working as desired, both in terms of accuracy and runtime speed. MLOps solutions look at metrics such as data drift (assessing whether a model is losing its accuracy as input data changes) and performance around run time and latency.
  • Governance: For an enterprise company that would likely have many algorithms in production at once, issues can crop up requiring a data scientist to look into what’s causing a model to not work as expected. An end-to-end system that tracks, for each model, which data it was trained on, who built it and when, and other such factors can be helpful. Maintaining this record also helps with compliance.
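The data-drift check described under Monitoring can be sketched in a few lines. This is a minimal illustration and not any vendor's implementation: it assumes the Population Stability Index (PSI) as the drift measure, which compares the distribution of one input feature at training time with its distribution in production. The function names `psi` and `drift_alert`, the ten-bucket split and the 0.2 alert threshold (a commonly cited rule of thumb) are all illustrative choices. Note that PSI flags a shift in the inputs; whether accuracy has actually dropped still has to be confirmed against labeled outcomes.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a training sample and a production sample."""
    lo, hi = min(expected), max(expected)
    # Fall back to a width of 1.0 when every training value is identical.
    width = (hi - lo) / bins or 1.0

    def bucket_shares(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the training range are clamped into the first or last bucket.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps empty buckets from producing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bucket_shares(expected), bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_alert(expected: Sequence[float], actual: Sequence[float],
                threshold: float = 0.2) -> bool:
    """Return True when the PSI exceeds the (assumed) alert threshold."""
    return psi(expected, actual) > threshold


if __name__ == "__main__":
    training = [float(x) for x in range(100)]
    production = [x + 50.0 for x in training]  # hypothetical shifted inputs
    print(drift_alert(training, training))    # identical distributions
    print(drift_alert(training, production))  # shifted distribution
```

In practice a check like this would run on a schedule for each monitored feature, alongside the latency and run-time metrics mentioned above.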

How companies like DataRobot have driven the need for MLOps

DataRobot’s enterprise AI platform helps customers streamline the full ML life cycle across data preparation, model building and model deployment. H2O.ai offers a similar solution called H2O Driverless AI, which provides end-to-end automated AI capabilities. One of the key differences between the two platforms lies in their target users: H2O.ai tends to cater to more technical users, whereas DataRobot serves business and IT folks along with data scientists.

Beyond end-to-end AI workflow platforms, the auto-ML market has been flooded with many companies providing tools for various parts of the enterprise AI stack. Cloud providers, including Amazon, Microsoft and Google, have innovated by developing auto-ML capabilities for cloud customers. Specialized platforms such as Domino Data Lab offer solutions for advanced users, and many tools such as TensorFlow and pre-built classifiers are readily accessible to developers for model building.

In the case of end-to-end AI workflow platforms such as DataRobot, some of the key benefits for enterprises have included the automation of various parts of the workflow, particularly around feature engineering and model generation, and the efficiency that comes with consolidating the entire workflow onto a single platform.

That’s perhaps a lot of buzzwords, so let’s consider the case of a security team at a credit card company assessing fraud risk for users. Let’s assume the input data consists of rows pertaining to end customers, with each row containing metadata including the day the customer’s card was activated, the day it expired and the number of fraudulent events identified in that time frame.

To model the fraud risk effectively, the security team would need to take the difference between the card activation and card expiration days and tie that to the number of fraudulent events identified. This is called feature engineering, which involves combining the input features in a way that helps an ML model learn the underlying patterns as well as possible.
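As a rough sketch of that step, the snippet below derives one engineered feature from the three columns in the example: fraudulent events per day the card was active. The record layout (`CardRecord`) and the function name are hypothetical, and dividing the event count by the number of active days is only one plausible way to "tie" the two quantities together.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CardRecord:
    """One input row for an end customer (hypothetical layout)."""
    activated: date
    expired: date
    fraud_events: int


def fraud_events_per_active_day(record: CardRecord) -> float:
    """Engineered feature: fraud events normalized by how long the card was active."""
    active_days = (record.expired - record.activated).days
    if active_days <= 0:
        raise ValueError("expiration must fall after activation")
    return record.fraud_events / active_days


if __name__ == "__main__":
    row = CardRecord(activated=date(2021, 1, 1), expired=date(2021, 1, 31), fraud_events=3)
    print(fraud_events_per_active_day(row))  # 3 events over 30 days -> 0.1
```

The raw columns alone hide this relationship: two cards with the same number of fraud events carry very different risk if one was active for a month and the other for five years.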

This may look simple, but problems often have a large number of input data columns that can greatly increase the number of combinations one has to try — and the relations between different data points may not be easy to discern, either.

Automated feature engineering makes this process simpler by auto-testing many different combinations of input features, quickly and at scale, to help the user pick the best one.
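To make the idea concrete, here is a toy sketch of automated feature search. It is not DataRobot's method: it simply generates pairwise differences and products of the input columns and ranks every candidate by the absolute Pearson correlation with the target. The column names, the choice of candidate transformations and the scoring metric are all assumptions made for illustration.

```python
import math
from itertools import combinations
from typing import Callable

Row = dict[str, float]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_candidate_features(rows: list[Row], columns: list[str],
                            target: str) -> list[tuple[str, float]]:
    """Score raw columns plus pairwise differences and products against the target."""
    candidates: dict[str, Callable[[Row], float]] = {
        c: (lambda r, c=c: r[c]) for c in columns
    }
    for a, b in combinations(columns, 2):
        candidates[f"{b} - {a}"] = lambda r, a=a, b=b: r[b] - r[a]
        candidates[f"{a} * {b}"] = lambda r, a=a, b=b: r[a] * r[b]
    ys = [r[target] for r in rows]
    scored = [(name, abs(pearson([fn(r) for r in rows], ys)))
              for name, fn in candidates.items()]
    # Highest absolute correlation first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical data in which fraud scales with how long the card was active.
    data = [
        {"activation_day": 0, "expiration_day": 100, "fraud_events": 10},
        {"activation_day": 50, "expiration_day": 80, "fraud_events": 3},
        {"activation_day": 10, "expiration_day": 210, "fraud_events": 20},
        {"activation_day": 100, "expiration_day": 150, "fraud_events": 5},
        {"activation_day": 30, "expiration_day": 60, "fraud_events": 3},
    ]
    for name, score in rank_candidate_features(
            data, ["activation_day", "expiration_day"], "fraud_events"):
        print(f"{name}: {score:.3f}")
```

With only two columns there are just four candidates, but the count grows quickly as columns and transformations are added, which is why doing this by hand stops being practical.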

Once a user has finalized the set of features, DataRobot’s automated model generation capability lets them run many different types of models on the data, and see which ones perform best. This saves users the time of building models from scratch, and also gives them the benefit of seeing how different models perform.
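The "run many models and compare" step can be sketched as a small leaderboard. The three candidate models here (a mean baseline, a one-feature linear fit and a nearest-neighbor lookup), the mean-absolute-error metric and all function names are illustrative assumptions; a real auto-ML platform would try far richer model families and tuning.

```python
from typing import Callable

Predictor = Callable[[float], float]
Trainer = Callable[[list[float], list[float]], Predictor]


def train_mean(xs: list[float], ys: list[float]) -> Predictor:
    """Baseline: always predict the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def train_linear(xs: list[float], ys: list[float]) -> Predictor:
    """Ordinary least squares on a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return lambda x: my + slope * (x - mx)


def train_nearest(xs: list[float], ys: list[float]) -> Predictor:
    """1-nearest-neighbor: return the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def leaderboard(trainers: dict[str, Trainer],
                train: tuple[list[float], list[float]],
                valid: tuple[list[float], list[float]]) -> list[tuple[str, float]]:
    """Fit every candidate on the training split and rank by validation MAE."""
    (tx, ty), (vx, vy) = train, valid
    results = []
    for name, trainer in trainers.items():
        model = trainer(tx, ty)
        mae = sum(abs(model(x) - y) for x, y in zip(vx, vy)) / len(vx)
        results.append((name, mae))
    return sorted(results, key=lambda r: r[1])  # lowest error first


if __name__ == "__main__":
    # Hypothetical data: active days -> fraud events.
    train = ([30.0, 60.0, 90.0, 120.0, 200.0], [3.0, 6.0, 9.0, 12.0, 20.0])
    valid = ([45.0, 150.0], [4.5, 15.0])
    trainers = {"mean": train_mean, "linear": train_linear, "nearest": train_nearest}
    for name, mae in leaderboard(trainers, train, valid):
        print(f"{name}: MAE={mae:.3f}")
```

Because the leaderboard is just a function of the data, rerunning it on a fresh split is cheap, which matters when the underlying data changes.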

Moreover, in situations where the data is rapidly changing, it gives users the ability to rerun the full set of models and re-determine which ones work best based on new data. In the case of the security team at the credit card company, consider a model that was developed in a particular region. If the security team is tasked with understanding fraud risk in another region and further receives some new data columns specific to that region, it’s possible the initial models won’t perform as well as new models that take all the available data into account.

The consolidation of the entire workflow into a single platform also provides several benefits for users. On the model building side, the coupling of data to a variety of models can make experimenting easier and help debug any issues that come up with the models much quicker. On the model deployment side, it helps with tracking source data and model attributes for models in deployment, both for any changes that become necessary and for governance.

Though companies like DataRobot and H2O.ai offer end-to-end AI workflow platforms, the drive toward automating these workflows has not been confined to single-vendor solutions. Given the modularity between data prep, feature engineering and model development, enterprises often combine a number of different solutions to satisfy their requirements.

In DataRobot’s case, use of its products alongside Snowflake and Tableau has been a popular ask by customers. Customers also commonly use ML tools offered by cloud providers in conjunction with products from DataRobot and H2O.ai, and both companies provide tight integration with the major cloud providers.

The rapidly expanding MLOps solutions market

The market for MLOps solutions has been growing over the past year as enterprises focused their efforts on model deployment and governance following the widespread adoption of auto-ML tools.

DataRobot recently acquired ParallelM, one of the early entrants in the MLOps space back in 2017. ParallelM enables customers to deploy models to infrastructure such as Kubernetes and Spark, either on-premises or on one of the major cloud providers. H2O.ai also partnered with ParallelM last year on its MLOps solution.

The MLOps space is also seeing open-source solutions crop up. Kubeflow is an open-source tool that enables MLOps capabilities for deploying to Kubernetes, and, similar to TensorFlow, it began as a project based on Google’s internal ML pipelines. Databricks has released an open-source tool called MLflow, which provides full life cycle workflows for ML development, including MLOps with deployment capabilities to Apache Spark.

The major cloud providers have also made their own forays into this category. Amazon SageMaker has introduced MLOps capabilities by helping customers leverage AWS Lambda and Step Functions for deploying models. Microsoft Azure has enabled tight integration between its auto-ML platform Azure Machine Learning and its Azure DevOps platform to enable MLOps functionality. Google Cloud has similarly moved to providing MLOps capabilities by outlining use of TensorFlow and Kubeflow along with Cloud Build.

Enterprises deciding on which MLOps solution to use will likely consider the following two factors: the auto-ML platform they’re using, and the orchestration framework to which they plan to deploy. For enterprises using a cloud auto-ML platform such as Amazon SageMaker, the default choice will likely be to use the associated integrations from the cloud provider and string together an MLOps workflow. The same will likely be true for standalone platforms such as DataRobot, which provide auto-ML tools with an associated MLOps capability.

Kubernetes has become an increasingly popular scalable orchestration platform for ML workloads. MLOps solutions such as Kubeflow, which helps deploy to Kubernetes, and ParallelM’s MCenter product, which also supports Kubernetes, are likely to see growing adoption, given the widespread use of Kubernetes. Another advantage of Kubernetes is its ability to streamline hybrid deployments across on-prem and cloud, which many companies demand. OpenAI, for example, uses Kubernetes across on-prem infrastructure and Microsoft Azure.

The MLOps market is unlikely to be winner-take-all. We’ll likely see continued effort on the part of auto-ML providers to create tight integrations that enable MLOps capabilities for their customers. We’ll also see select deployment practices, such as the use of Kubernetes, continue to grow as developers begin to prioritize deployment possibilities from the outset when they consider different ML workflow platform providers.
