
MLOps - Continuous Delivery for Machine Learning Models using Azure and GitHub Actions

Harshit Singhai

In this article, we're going to take a look at how to automatically package and deliver a machine learning model using Azure and GitHub Actions.

Github Code

Before we start, the entire code is uploaded here for your reference.

https://github.com/harshitsinghai77/machine-learning-model-continuous-delivery-using-azure

Packaging ML Models

We will take an ONNX model and package it within a container that serves a Flask app to perform the prediction. I will use the RoBERTa-SequenceClassification ONNX model, which is very well documented. After creating a new Git repository, the first step is to figure out the dependencies needed. Start by adding the following requirements.txt file:

simpletransformers==0.4.0
tensorboardX==1.9
transformers==2.1.0
flask==1.1.2
torch==1.7.1
onnxruntime==1.8.1

Next, create a Dockerfile that installs everything in the container:

FROM python:3.8

COPY ./requirements.txt /webapp/requirements.txt

WORKDIR /webapp

RUN pip install -r requirements.txt

COPY webapp/* /webapp/

ENTRYPOINT [ "python" ]

CMD [ "app.py" ]

The Dockerfile copies the requirements file, sets /webapp as the working directory, and copies the application code: a single app.py file. Create the webapp/app.py file to perform the sentiment analysis.

from flask import Flask, request, jsonify
import torch
import numpy as np
from transformers import RobertaTokenizer
import onnxruntime

app = Flask(__name__)
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
session = onnxruntime.InferenceSession(
    "roberta-sequence-classification-9.onnx")

def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()


@app.route("/predict", methods=["POST"])
def predict():
    """
    Input sample:
        [ "Containers are good" ]
    Output sample:
        {"postive": True}
    """
    input_ids = torch.tensor(
        tokenizer.encode(request.json[0], add_special_tokens=True)
    ).unsqueeze(0)

    inputs = {session.get_inputs()[0].name: to_numpy(input_ids)}
    out = session.run(None, inputs)

    result = np.argmax(out)

    return jsonify({"positive": bool(result)})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000, debug=True)

The predict() function is a Flask route that enables the /predict URL when the application is running. The function only allows POST HTTP methods. There is no description of the sample inputs and outputs yet because one critical part of the application is missing: the ONNX model does not exist yet.

Download the RoBERTa-SequenceClassification ONNX model (roberta-sequence-classification-9.onnx, available from the ONNX Model Zoo) and place it at the root of the project.

The ONNX model now exists at the root of the project, but the application expects it in the webapp directory, so move it there so that the Flask app doesn't complain:

$ mv roberta-sequence-classification-9.onnx webapp/

Create a new virtual environment, activate it, and install all the dependencies:

$ python3 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt

Now run the application locally by invoking the app.py file with Python:

$ cd webapp
$ python app.py
 * Serving Flask app "app" (lazy loading)
 * Environment: production
   WARNING: This is a development server.
   Use a production WSGI server instead.
 * Debug mode: on
 * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)

Next, the application is ready to consume HTTP requests. The inputs are JSON-formatted requests, and the responses come back as JSON. Use the curl program to send a sample payload and detect sentiment:

curl -X POST -H "Content-Type: application/json" \
--data '["Containers are more or less interesting"]' \
http://0.0.0.0:5000/predict

{ "positive": false }

curl -X POST -H "Content-Type: application/json" \
--data '["MLOps is critical for robustness"]' \
http://0.0.0.0:5000/predict

{ "positive": true }

Optional: Create a Makefile at the root of the project to check model inference (a sketch follows below). https://github.com/harshitsinghai77/machine-learning-model-continuous-delivery-using-azure/blob/master/Makefile
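For reference, here is a minimal sketch of what that Makefile might contain (the actual file in the repository may differ; note that Makefile recipe lines must be indented with a tab):

check_inference:
	curl -X POST -H "Content-Type: application/json" \
	--data '["MLOps is critical for robustness"]' \
	http://0.0.0.0:5000/predict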

Build it locally with Docker

Now that you've verified that the application runs and that live prediction is functioning properly, it's time to build the container locally and verify that everything works there too. Build the container and tag it with something meaningful:

docker build -t deploy/roberta .

Now run the container locally to interact with it in the same way as when running the application directly with Python. Remember to map the ports of the container to the localhost:

$ docker run -it -p 5000:5000 --rm deploy/roberta
 * Serving Flask app "app" (lazy loading)
 * Environment: production
   WARNING: This is a development server.
   Use a production WSGI server instead.
 * Debug mode: on
 * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)

Send an HTTP request in the same way as before. If you created the Makefile, you can use its target instead of typing the curl command by hand:

$ make check_inference

Continuous Delivery of ML Models

The contents of the project should be pushed to a Git repository without the ONNX model.

This is because ML/DL models are usually very large and shouldn't be uploaded to a version control system. Git is not meant to handle versioning of large binary files, and committing them has the side effect of creating huge repositories.

All the heavy lifting to perform the (local) live inferencing is done, so create a new GitHub repository and add the project contents except for the ONNX model. Remember, GitHub has a file size limit, so it isn't possible to add the ONNX model to the repo. Create a .gitignore file at the root of the project to ignore the model and prevent adding it by mistake:

*.onnx

Register/upload the ONNX model in Azure Machine Learning Studio. Once registered, the model appears in the workspace with a name and a version, for example roberta-sequence:2, which the workflow below references by ID.
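If you prefer the CLI over the Studio UI, the model can also be registered with the azure-cli-ml extension. A sketch, reusing the workspace and resource group names that appear later in the workflow:

az extension add --name azure-cli-ml
az ml model register --name "roberta-sequence" \
  --model-path "roberta-sequence-classification-9.onnx" \
  --workspace-name "machine-learning-deployment" \
  --resource-group "cloud-shell-storage-centralindia"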

After pushing the contents of the Git repository without the ONNX model, we are ready to start automating the model creation and delivery.

To automate this, we will use GitHub Actions, which allows us to define a continuous delivery workflow in a YAML file that gets triggered when configurable conditions are met. The idea is that whenever the repository has a change in the master branch, the platform will pull the registered model from Azure, build the container, and lastly push it to a container registry.

Start by creating a .github/workflows/ directory at the root of your project, and then add a main.yml that looks like this:

name: Build and package RoBERTa-sequencing to Dockerhub

on:
  # Triggers the workflow on push events for the master branch (Python files only)
  push:
    branches: [master]
    paths:
      - "**.py"

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Authenticate with Azure
        uses: azure/login@v1
        with:
          creds: ${{secrets.AZURE_CREDENTIALS}}

      - name: set auto-install of extensions
        run: az config set extension.use_dynamic_install=yes_without_prompt

      - name: attach workspace
        run: |
          az extension add --name azure-cli-ml
          az ml folder attach --workspace-name "machine-learning-deployment" --resource-group "cloud-shell-storage-centralindia"
      - name: retrieve the model
        run: az ml model download -t "." --model-id "roberta-sequence:2"

      - name: Authenticate to Docker hub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKER_HUB_USERNAME }}
          password: ${{ secrets.DOCKER_HUB_ACCESS_TOKEN }}

      - name: build flask-app container
        uses: docker/build-push-action@v2
        with:
          context: ./
          file: ./Dockerfile
          tags: harshitsinghai77/flask-roberta:latest
          push: true

Steps to package the RoBERTa-Sequence model:

  1. Checkout the current branch of the repository
  2. Authenticate to Azure Cloud
  3. Configure auto-install of Azure CLI extensions
  4. Attach the folder to interact with the workspace
  5. Download the ONNX model
  6. Build the container for the current repo
  7. Push the container to Docker Hub

Configure GitHub Actions with Azure Machine Learning

Since the ONNX model doesn't exist locally, we need to retrieve it from Azure, so we must authenticate using the Azure login action. After authentication, the az tool is made available, and we attach the folder for the workspace and resource group. Finally, the job retrieves the model by its ID.

Here are the steps to configure GitHub Actions to pull the registered model from Azure.

Generate deployment credentials:

To get the credentials, go to Azure Cloud Shell

az ad sp create-for-rbac --name "myML" --role contributor \
  --scopes /subscriptions/<subscription-id>/resourceGroups/<group-name> \
  --sdk-auth

The output is a JSON object with the role assignment credentials that provide access to your Azure resources. Copy this entire JSON object.
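The JSON has roughly the following shape (values redacted here; the exact set of fields can vary by CLI version):

{
  "clientId": "<GUID>",
  "clientSecret": "<secret>",
  "subscriptionId": "<GUID>",
  "tenantId": "<GUID>",
  ...
}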

In GitHub, browse to your repository and select Settings > Secrets > Add a new secret. Paste the entire JSON output from the Azure CLI command into the secret's value field, and name the secret AZURE_CREDENTIALS.
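Alternatively, if you have the GitHub CLI installed and authenticated, you can set the secret from the terminal (assuming the JSON was saved locally as azure-credentials.json, a hypothetical filename):

gh secret set AZURE_CREDENTIALS < azure-credentials.json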

Commit and push your changes to your repository, then head to the Actions tab. A new run gets scheduled immediately and should start running in a few seconds. After a few minutes, everything should have completed.

There is one final item missing here, though: publishing the container after it builds. Different container registries require different options here, but most do support GitHub Actions, which is refreshing. Docker Hub is straightforward; all it requires is creating a token and then saving it as a GitHub project secret, along with your Docker Hub username. Once that is in place, adapt the workflow file to include the authentication step before building:

- name: Authenticate to Docker hub
  uses: docker/login-action@v1
  with:
    username: ${{ secrets.DOCKER_HUB_USERNAME }}
    password: ${{ secrets.DOCKER_HUB_ACCESS_TOKEN }}

Lastly, update the build step to set push: true:

- name: build flask-app container
  uses: docker/build-push-action@v2
  with:
    context: ./
    file: ./Dockerfile
    tags: harshitsinghai77/flask-roberta:latest
    push: true

Conclusion

Once everything is configured correctly, push the code to your GitHub repository. On every push to the master branch, GitHub Actions will authenticate with Azure, fetch the ML model from Azure Machine Learning Studio, build the container, and push it to Docker Hub.
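To verify the published image, pull it from Docker Hub and run it the same way as the locally built container (the tag comes from the workflow above):

$ docker pull harshitsinghai77/flask-roberta:latest
$ docker run -it -p 5000:5000 --rm harshitsinghai77/flask-roberta:latest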

Example: https://github.com/harshitsinghai77/machine-learning-model-continuous-delivery-using-azure/actions

https://hub.docker.com/repository/docker/harshitsinghai77/flask-roberta

Github Code

https://github.com/harshitsinghai77/machine-learning-model-continuous-delivery-using-azure

That’s it for today, see you soon. :)