Tracking Machine Learning Experiments with MLflow

November 29, 2022

Introduction

Machine learning operations (MLOps), which is similar to DevOps in software engineering, is becoming more common in organizations that want to build and deploy end-to-end machine learning models. Before MLOps became popular, engineers struggled to create reproducible and shareable machine learning solutions. As a result, collaboration between machine learning teams was difficult.

The rise of MLOps was accompanied by a suite of tools that addressed these early problems. These tools enable teams to track, orchestrate, deploy, and monitor machine learning models in production, and to identify and correct problems, which makes machine learning much easier to implement in software applications.

In this article, you will look at experiment tracking with MLflow under the following headings:

  • What is Experiment Tracking?
  • Tools for Experiment Tracking.
  • Code Demo: Tracking Experiments using Python and MLflow.

To follow along with this tutorial, you need:

  • Familiarity with Python.
  • The ability to create a model with Scikit-Learn.

Let’s get started.

What is Experiment Tracking?

Experiment tracking is the process of recording the parameters, metrics, and metadata created or used while developing a model, so that models can be reproduced and teams can collaborate.

Why does experiment tracking matter? Unlike traditional software engineering, which involves writing code to create solutions, machine learning depends on both code and data. Because data is involved, machine learning is an experimental process that takes many iterations to produce good results. While experimenting, scientists and engineers may forget the hyperparameters and data that produced a particular model, which makes it difficult to confirm what went wrong and what went right.

Engineers can use experiment tracking to understand why a model produced a specific metric or result based on its hyperparameters, and then either tweak them to get the desired result or change their approach entirely.

There are several methods for tracking machine learning experiments, including:

  • Using a spreadsheet.
  • Using a Git branch for each experiment.
  • Using a specialized experiment tracking tool.

Using a dedicated experiment tracking tool is the best option because it requires less effort, has a UI for easily visualizing results, and handles the majority of the heavy lifting.
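To make the trade-off concrete, the spreadsheet approach can be sketched in a few lines of standard-library Python. The helper name log_run_to_csv, the file name, and the parameter and metric values are hypothetical placeholders for illustration, not results from this article.

```python
import csv
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_run_to_csv(path, params, metrics):
    """Append one experiment run as a row in a CSV file (hypothetical helper)."""
    row = {"timestamp": datetime.now(timezone.utc).isoformat(), **params, **metrics}
    path = Path(path)
    is_new_file = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if is_new_file:
            writer.writeheader()  # the header is written only once
        writer.writerow(row)
    return row


# Placeholder values for illustration only.
log_path = Path(tempfile.mkdtemp()) / "experiments.csv"
log_run_to_csv(log_path, {"C": 1.0, "penalty": "l2"}, {"accuracy": 0.95})
log_run_to_csv(log_path, {"C": 0.8, "penalty": "l1"}, {"accuracy": 0.97})

# Read the log back to compare runs.
with log_path.open(newline="") as f:
    rows = list(csv.DictReader(f))

for row in rows:
    print(row["C"], row["penalty"], row["accuracy"])
```

This is workable for a handful of runs, but every run has to log the same columns, and there is no UI, artifact storage, or run comparison. That is the work a dedicated tool takes over.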

Here are a few of the many experiment tracking tools available on the market.

Tools for Experiment Tracking

Engineers commonly use the following tools to track machine learning experiments:

  • MLflow: used by Facebook, Databricks, Microsoft, Accenture, Booking.com, among others.
  • Comet.io: used by Uber, Shopify, Etsy, AssemblyAI, Zappos.com, among others.
  • Weights & Biases (W&B): used by Pfizer, Valo Health, Roche, among others.
  • Neptune.io: used by Continuum Industries, Hypefactors, ReSpo.Vision, among others.

Code Demo: Tracking Experiments using Python and MLflow

MLflow is an open-source platform for developing and managing end-to-end machine learning solutions. At the time of writing, it has four components:

  • MLflow Tracking
  • MLflow Projects
  • MLflow Models
  • MLflow Model Registry

The MLflow Tracking component records the following:

  • Metrics: Values that indicate the success of your model, such as mean squared error (MSE).
  • Parameters: Values that control your model’s learning process.
  • Artifacts: Related files used in the development of your models, such as a YAML or pickle file.
  • Start and end time: When the run of your code started and ended.

Let’s jump into code.

First, get the data.

The dataset is the popular Iris dataset, which groups iris flowers into three species based on sepal and petal length and width.
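If you do not have Iris.csv at hand, a similar table can be built from the copy of the dataset bundled with scikit-learn. This is a sketch under assumptions: the Id and Species columns and the Iris- label prefix come from the code in this tutorial, while the four measurement column names are assumed, and a few values in scikit-learn's copy may differ slightly from the CSV.

```python
import pandas as pd
from sklearn.datasets import load_iris


def build_iris_frame():
    """Return the Iris data in a layout assumed to match the tutorial's Iris.csv."""
    iris = load_iris()
    # Measurement column names are an assumption about the CSV layout.
    columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
    df = pd.DataFrame(iris.data, columns=columns)
    df.insert(0, "Id", range(1, len(df) + 1))
    # Map integer targets to the label strings used later in the tutorial.
    df["Species"] = ["Iris-" + iris.target_names[t] for t in iris.target]
    return df


df = build_iris_frame()
print(df.shape)
print(df["Species"].value_counts())
```

Calling df.to_csv('Iris.csv', index=False) afterwards would write the file that the training code reads.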

Next, train the model using LogisticRegression.

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, precision_score
    from sklearn.model_selection import train_test_split

    # Load the data and drop the row identifier column
    df = pd.read_csv('Iris.csv')
    df = df.drop(columns=['Id'])

    # Split into features and labels
    X = df.drop(columns=['Species'])
    y = df['Species']

    # Encode the species names as integers
    y = y.map({
        'Iris-setosa': 0,
        'Iris-versicolor': 1,
        'Iris-virginica': 2,
    })

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

    # Fit a baseline model and predict on the test set
    model = LogisticRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    precision = precision_score(y_test, y_pred, average='weighted')
    accuracy = accuracy_score(y_test, y_pred)

    print(f'Precision Score: {precision}\nAccuracy Score: {accuracy}')

Output.

    Precision Score: 1.0
    Accuracy Score: 1.0

The preceding code follows the standard pattern for creating a baseline model:

  • The required libraries are imported.
  • The data is read using pandas, and the unwanted Id column is removed.
  • The data is split into features and labels, and the labels’ categorical values are converted to numerical values.
  • Training and test sets of features and labels are created.
  • A learning algorithm instance is created and fitted to the training samples. The model then generates predictions for the test set.
  • The model’s performance is evaluated by comparing the true and predicted values.

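According to the results above, the model achieved a precision and accuracy score of 1.0, meaning that it classified every sample in the test set correctly. Because the dataset is small and the test set is only a quarter of it, a perfect score on a single split is not strong evidence that the model will generalize.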
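One common way to check a score like this is cross-validation, which averages the result over several train/test splits. The sketch below is an addition rather than part of the original workflow; it uses the Iris copy bundled with scikit-learn, and the fold count and max_iter value are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

# Five stratified folds keep the class balance the same in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = LogisticRegression(max_iter=1000)  # higher max_iter avoids convergence warnings

scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print("Accuracy per fold:", scores)
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

If the mean stays high and the spread across folds is small, the single-split result is more believable.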

Next, tune the model hyperparameters and re-evaluate the accuracy and precision score.

    # Model hyperparameters
    penalty = 'l1'
    solver = 'liblinear'
    C = 0.8

    # Note: newer scikit-learn releases may warn about liblinear on multiclass targets;
    # wrapping the model in OneVsRestClassifier is one workaround if that happens.
    model = LogisticRegression(penalty=penalty, solver=solver, C=C)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    precision = precision_score(y_test, y_pred, average='weighted')
    accuracy = accuracy_score(y_test, y_pred)

    print(f'Precision Score: {precision}\nAccuracy Score: {accuracy}')

Output.

    Precision Score: 1.0
    Accuracy Score: 1.0

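The code above sets the model’s hyperparameters:

  • Penalty (regularization) is set to l1.
  • Solver is set to liblinear, which supports the l1 penalty and suits small datasets.
  • C is set to 0.8. C is the inverse of the regularization strength, so a value below the default of 1.0 applies slightly stronger regularization.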
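To see what C does under an l1 penalty, the sketch below counts how many coefficients are driven to exactly zero for a few values of C. It is an illustration under assumptions: it uses the Iris copy bundled with scikit-learn, the C values are arbitrary, and the model is wrapped in OneVsRestClassifier (not part of the tutorial code) so that the one-vs-rest scheme is explicit.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)


def count_zero_coefficients(C):
    """Fit one l1-penalized binary model per class and count zeroed coefficients."""
    base = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    model = OneVsRestClassifier(base).fit(X, y)
    return sum(int((est.coef_ == 0).sum()) for est in model.estimators_)


# 3 classes x 4 features = 12 coefficients in total.
zero_counts = {C: count_zero_coefficients(C) for C in (0.001, 0.8, 100.0)}
for C, zeros in zero_counts.items():
    print(f"C={C}: {zeros} of 12 coefficients are exactly zero")
```

Smaller values of C mean stronger regularization, so more coefficients are pushed to zero.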

The result is the same: a precision and accuracy score of 1.0 (100%). Next, track the hyperparameters and metrics using MLflow.

Install MLflow with pip install mlflow and import it into your code to get started.

MLflow needs a tracking URI, a storage location that holds all the experiments, to record your runs locally.

To begin, set the tracking URI and create an experiment.

    import mlflow

    # Set up the experiment
    mlflow.set_tracking_uri('mlruns')
    mlflow.set_experiment('mlflow-tutorial')

The code above sets the storage location for experiment data to mlruns, which is the default location, and creates an experiment named mlflow-tutorial.

Next, log the model’s metrics and parameters.

    with mlflow.start_run():
        # Model hyperparameters
        penalty = 'l1'
        solver = 'liblinear'
        C = 0.8

        model = LogisticRegression(penalty=penalty, solver=solver, C=C)
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)

        precision = precision_score(y_test, y_pred, average='weighted')
        accuracy = accuracy_score(y_test, y_pred)

        # Log the model's metrics
        mlflow.log_metric('precision score', precision)
        mlflow.log_metric('accuracy score', accuracy)

        # Log the model's hyperparameters
        mlflow.log_param('penalty', penalty)
        mlflow.log_param('C', C)
        mlflow.log_param('solver', solver)

To view the results of this run, open a terminal in the project directory that contains the mlruns folder and run mlflow ui. Then open the address it prints (http://127.0.0.1:5000 by default) in a browser.

Output.

The MLflow UI lists two runs, one recorded twenty-two minutes ago and the other two minutes ago. Each run is shown with the metrics and parameters that were logged for it.

That’s it for experiment tracking with MLflow and Python.

Conclusion

Properly recording and tracking machine learning experiments, as demonstrated above with MLflow, helps developers and organizations build and deploy models faster, because models become easy to reproduce and teams can collaborate on them.

If you want to learn more about MLflow, check out the official MLflow documentation.

Also, if you want to learn more about machine learning operations, check out the following resources: