DEVELOPMENT OF MLOPS PIPELINE TO SUPPORT AI SYSTEMS

Authors

  • Volodymyr Rusinov, NTUU "Ihor Sikorski KPI", Ukraine

Abstract

Relevance of the research topic. The scientific and engineering literature shows that applying Machine Learning Operations (MLOps) and machine learning pipeline methods when building AI platforms for embedded systems improves the efficiency of developing and deploying artificial intelligence. MLOps makes it possible to automate the development, training and operation of machine learning models, so that the system can respond quickly to changes in the environment or input data. This helps to reduce the time needed to deploy and maintain the system as a whole.

To assess the overall usability of the system quantitatively, we introduce several metrics that establish the parameters of the system and allow it to be compared with alternative solutions.

Target setting. Deployment of AI software must be accompanied by a thorough analysis of the target system. In this article, a suite of tests is performed to quantify the performance of the system. To identify the aspects of the system that need to be modified, several DORA metrics are compared.

Analysis of current research and issues. Current research suggests that MLOps is a novel way of deploying AI systems that inherits its methods from DevOps, which has shown positive results in various software systems. Various articles propose different combinations of tools and CI/CD pipeline designs that enable AI tasks to be performed at scale and increase the overall fault tolerance of the system. These solutions also propose ways to automate processes that would otherwise require additional human resources, such as monitoring, logging and reconfiguration.

Uninvestigated aspects of the general problem. Contemporary solutions provide minimal insight into the performance of such systems in comparison with other approaches. MLOps principles have not been tested in end-to-end pipelines to quantify their performance and outline areas for improvement.

The research objective. The goal of the article is to investigate MLOps principles and quantify their impact on deployment.

Presentation of the main material. Machine Learning Operations (MLOps) originated as a concept based on DevOps methods. It is an attempt to create a collection of procedures that combine tools, deployment procedures, and workflows to build machine learning models quickly and affordably [1]. MLOps recommends automating and controlling every step of creating and implementing machine learning systems [2]. MLOps practices are understood as the coordination of a set of software components that are used in an integrated manner to perform at least five tasks: data collection; data conversion; continuous training of the machine learning model; continuous deployment of the model; and presentation of results to the end user. This coordination is achieved by following MLOps principles.
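The five tasks listed above can be illustrated with a minimal sketch. All function names, the toy data set and the one-parameter model are hypothetical and chosen only for illustration; they are not taken from the article's implementation, and each stage stands in for what would be a separate component in a real system.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical toy pipeline: every name and value below is an assumption
# made for illustration, not part of the article's system.

def collect_data() -> List[Tuple[float, Optional[float]]]:
    """Task 1, data collection: return raw (x, y) samples, one of them unusable."""
    return [(1.0, 2.0), (2.0, 4.0), (3.0, None), (4.0, 8.0)]

def convert_data(raw: List[Tuple[float, Optional[float]]]) -> List[Tuple[float, float]]:
    """Task 2, data conversion: drop samples with a missing target value."""
    return [(x, y) for x, y in raw if y is not None]

def train_model(samples: List[Tuple[float, float]]) -> float:
    """Task 3, training: least-squares fit of y = k * x through the origin."""
    numerator = sum(x * y for x, y in samples)
    denominator = sum(x * x for x, _ in samples)
    return numerator / denominator

def deploy_model(k: float) -> Callable[[float], float]:
    """Task 4, deployment: expose the trained parameter as a callable predictor."""
    return lambda x: k * x

def present_result(predict: Callable[[float], float], x: float) -> str:
    """Task 5, presentation: format a prediction for the end user."""
    return f"prediction for x={x}: {predict(x):.2f}"

def run_pipeline(x: float) -> str:
    """Run the five stages in order; a real pipeline would rerun this on new data."""
    clean = convert_data(collect_data())
    predictor = deploy_model(train_model(clean))
    return present_result(predictor, x)

if __name__ == "__main__":
    print(run_pipeline(5.0))
```

Keeping each task behind its own function is one way to let an automated pipeline rerun, monitor or replace a single stage without touching the others.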

Conclusion. Machine Learning Operations is a concept derived from DevOps that has been shown to streamline the process of creating machine learning models. It involves automating and monitoring each step of the process, including data collection, transformation, training, deployment, and presentation of results to the end user. To ensure the effectiveness of MLOps, organizations should choose appropriate tools and technologies, such as cloud platforms like AWS or Microsoft Azure.

Continuous Integration and Continuous Delivery (CI/CD) are crucial for efficient deployment and for testing application readiness. They automate deployment by means of a pipeline that continuously tests and builds ML software systems. Machine learning (ML) systems start from an experimental model, whose data is analyzed, cleaned, and pre-processed before the model is trained, tested, and verified. The deployment phase ensures model integration and stability, with continuous monitoring by MLOps engineers.
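The idea of a pipeline that tests and builds before it deploys can be sketched as a sequence of gated stages. This is only an illustration under assumed names: `run_ci` and the stage labels are hypothetical, and a real setup would delegate these steps to a CI/CD service rather than to a Python loop.

```python
from typing import Callable, List, Tuple

# Hypothetical stage runner; the stage names and checks are assumptions.

def run_ci(stages: List[Tuple[str, Callable[[], bool]]]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first one that reports failure.

    Returns (success flag, names of the stages that completed successfully).
    """
    completed: List[str] = []
    for name, step in stages:
        if not step():
            # A failed stage blocks everything after it, including deployment.
            return False, completed
        completed.append(name)
    return True, completed

if __name__ == "__main__":
    pipeline = [
        ("test", lambda: True),
        ("build", lambda: True),
        ("deploy", lambda: True),
    ]
    print(run_ci(pipeline))
```

Stopping at the first failing stage is the property that keeps an untested or unbuildable model from reaching the deployment phase.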

Testing is performed using DORA metrics. The results table reports metrics for the modified approach, focusing on GitHub upload and lead time for changes, so that they can be compared in further testing. A comparison of the solution proposed in this article with other solutions shows that the proposed method is less sophisticated than its alternatives; however, it is more lightweight and therefore requires less time to build and to reconfigure after faults.
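As an illustration of how such metrics can be derived, the sketch below computes two DORA metrics, lead time for changes and change failure rate, from commit and deployment records. The function names and the sample timestamps are hypothetical and are not the article's measurements; taking the median as the summary statistic is also an assumption.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import List, Tuple

# Hypothetical helpers and data; none of the values are results from the article.

def lead_time_for_changes(changes: List[Tuple[datetime, datetime]]) -> timedelta:
    """Median time between a commit and its deployment to production."""
    seconds = [(deployed - committed).total_seconds() for committed, deployed in changes]
    return timedelta(seconds=median(seconds))

def change_failure_rate(deployment_ok: List[bool]) -> float:
    """Fraction of deployments that failed (False marks a failed deployment)."""
    failures = sum(1 for ok in deployment_ok if not ok)
    return failures / len(deployment_ok)

if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 0)
    sample_changes = [
        (start, start + timedelta(minutes=30)),
        (start, start + timedelta(minutes=60)),
        (start, start + timedelta(minutes=90)),
    ]
    print(lead_time_for_changes(sample_changes))
    print(change_failure_rate([True, False, True, True]))
```

A lighter pipeline would be expected to show up here as a shorter lead time, which is the kind of comparison the text describes.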

Uninvestigated aspects. To increase the efficiency of this approach, additional research may be required in the fields of container optimization and AI optimization, as well as networking and administration. This article aims to explore the ways MLOps is used and to propose a lightweight solution that retains the most important properties and adheres to the outlined principles. However, in real-life scenarios additional changes may be required to increase the fault tolerance of the system and its ability to scale.

Published

2024-09-28

Section

Machine learning, Big Data