To achieve this goal, Elder Research meticulously planned the building and migration of ML pipelines using Miro diagrams to map workflows, breaking them into smaller, manageable tasks with defined dependencies.
These tasks were tracked in Jira and executed through monthly sprints, ensuring a systematic and agile approach to implementing a comprehensive MLOps framework.
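As an illustration of how tasks with defined dependencies resolve into an execution order, here is a minimal sketch using Python's standard-library `graphlib`. The task names are hypothetical and are not taken from the project; the diagrams and Jira tickets themselves are not reproduced here.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline tasks mapped to the tasks they depend on.
# These names are illustrative only and are not taken from the project.
TASK_DEPENDENCIES = {
    "ingest_data": set(),
    "validate_data": {"ingest_data"},
    "build_features": {"validate_data"},
    "train_model": {"build_features"},
    "score_model": {"train_model"},
    "publish_results": {"score_model"},
}


def execution_order(dependencies):
    """Return the tasks in an order where every dependency runs first."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, task in enumerate(execution_order(TASK_DEPENDENCIES), start=1):
        print(f"{step}. {task}")
```

A circular dependency makes `static_order()` raise `graphlib.CycleError`, which is a convenient way to catch a planning mistake before any work is scheduled.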
The framework allowed us to deploy end-to-end ML processes on Azure Databricks and Amazon Managed Workflows for Apache Airflow (MWAA). For some of the ML pipelines, Azure Databricks was favored over Apache Airflow because its Spark-based parallel computing enabled faster processing and reduced costs.
We used the tools in the client’s existing environment to implement the MLOps framework. Where tools were lacking, our team recommended cost-effective options such as free open-source tools.
Collaborating closely with the ML infrastructure platform team, Elder Research integrated Amazon CloudWatch for logging and monitoring and Docker for containerization, ensuring consistent environments across development and production.
Amazon S3 was used for data storage and Amazon Redshift for optimized data warehousing. To improve robustness and maintainability, the team supplemented CloudWatch logging with Databricks’ logging functionality and implemented error handling throughout the pipelines.
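The combination of logging and error handling can be sketched as a small wrapper around each pipeline step. This is a minimal example under stated assumptions, not the project's implementation: it uses Python's standard `logging` module as a stand-in for CloudWatch or Databricks logging, and the `run_step` helper, the logger name, and the retry policy are all hypothetical.

```python
import logging
import time

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("ml_pipeline")  # hypothetical logger name


def run_step(name, func, retries=2, delay_seconds=0.0):
    """Run one pipeline step, logging each attempt and retrying on failure.

    The step is attempted up to ``retries + 1`` times. The last failure is
    re-raised so the orchestrator can mark the run as failed.
    """
    for attempt in range(1, retries + 2):
        try:
            logger.info("step=%s attempt=%d started", name, attempt)
            result = func()
            logger.info("step=%s attempt=%d succeeded", name, attempt)
            return result
        except Exception:
            # logger.exception records the full traceback with the message.
            logger.exception("step=%s attempt=%d failed", name, attempt)
            if attempt > retries:
                raise
            time.sleep(delay_seconds)
```

In a real deployment the log records would be routed to the platform's log sink by attaching a handler to the logger, which leaves the step code itself unchanged.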
MLflow facilitated model versioning and lifecycle management, enhancing reproducibility.
Continuous integration and deployment (CI/CD) were established using GitHub Actions, automating the deployment and updates to production environments.
Data validations were conducted with Great Expectations, ensuring data integrity at every pipeline stage.
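To show the kind of check a validation stage performs, the sketch below implements two expectation-style rules (required non-null columns and numeric ranges) in plain Python. It deliberately does not use the Great Expectations API, which differs between versions; the `validate_rows` function, column names, and thresholds are hypothetical.

```python
def validate_rows(rows, required_columns, ranges):
    """Return a list of human-readable failures (empty if the data is clean).

    rows: list of dicts, one per record.
    required_columns: columns that must be present and non-null.
    ranges: mapping of column name to an inclusive (low, high) bound.
    """
    failures = []
    for index, row in enumerate(rows):
        for column in required_columns:
            if row.get(column) is None:
                failures.append(f"row {index}: '{column}' is missing or null")
        for column, (low, high) in ranges.items():
            value = row.get(column)
            if value is not None and not (low <= value <= high):
                failures.append(
                    f"row {index}: '{column}'={value} outside [{low}, {high}]"
                )
    return failures


if __name__ == "__main__":
    # Hypothetical sample records; the second and third each break one rule.
    sample = [
        {"customer_id": 1, "score": 0.42},
        {"customer_id": None, "score": 0.10},
        {"customer_id": 3, "score": 1.70},
    ]
    for failure in validate_rows(sample, ["customer_id"], {"score": (0.0, 1.0)}):
        print(failure)
```

Running the same function at the start of each pipeline stage, and stopping the run when the list is non-empty, is one simple way to keep bad data from propagating downstream.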
Our team worked closely with both our own data scientists and the client’s in-house data scientists on QA testing to confirm consistent outputs and resolve discrepancies. This included fixing issues caused by asynchronous data refreshes and ensuring model reproducibility through version control. Comprehensive documentation and code comments were added to support troubleshooting and long-term maintainability.
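One simple way to test for consistent outputs between two pipeline runs is to compare a fingerprint of their results. The sketch below is an assumption about how such a check could look, not a description of the QA process actually used: `score_batch` is a hypothetical stand-in for a model scoring run, and the seed stands in for whatever pins the inputs and model version.

```python
import hashlib
import json
import random


def fingerprint(records):
    """Hash a list of JSON-serializable records into a stable hex digest."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def score_batch(seed, n=5):
    """Hypothetical stand-in for a scoring run that is deterministic per seed."""
    rng = random.Random(seed)
    return [{"id": i, "score": round(rng.random(), 6)} for i in range(n)]


def outputs_match(run_a, run_b):
    """Return True when two runs produced identical records."""
    return fingerprint(run_a) == fingerprint(run_b)


if __name__ == "__main__":
    print("same seed matches:", outputs_match(score_batch(42), score_batch(42)))
    print("different seed matches:", outputs_match(score_batch(42), score_batch(43)))
```

A mismatch between runs that should be identical points to a non-pinned input, such as a table refreshed between the two runs, which is the kind of discrepancy an asynchronous data refresh would produce.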