TensorFlow Serving ModelServer

This wiki describes how to serve Inception-v3 and SSD MobileNet v1 models with TensorFlow Serving.

Inception-v3 is trained for the ImageNet Large Scale Visual Recognition Challenge using data from 2012. This is a standard task in computer vision, where models classify entire images into 1000 classes. These classes are not really suitable for a security camera, since human-related labels are missing (see http://image-net.org/challenges/LSVRC/2014/browse-synsets).

SSD MobileNet v1 is a small, low-latency, low-power model parameterized to meet the resource constraints of on-device or embedded applications. The model used here is trained on the COCO dataset, which covers 80 common object classes, including person.
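
For a security camera the relevant point is that COCO includes a person class. A minimal sketch of mapping detection class IDs to labels (the IDs are an assumption based on the standard COCO label map shipped with the TensorFlow object detection API, mscoco_label_map.pbtxt; verify against that file):

# Hypothetical excerpt of the COCO label map; IDs are assumptions, check mscoco_label_map.pbtxt.
COCO_LABELS = {1: "person", 2: "bicycle", 3: "car", 17: "cat", 18: "dog"}

def label_for(class_id):
    # Translate a detection class ID into a human-readable label.
    return COCO_LABELS.get(int(class_id), "unknown")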

The models are served from a Docker container running on a desktop computer. I closely followed the instructions in the official documentation: https://www.tensorflow.org/serving/serving_inception

The Docker container is available at https://hub.docker.com/r/salekd/inception_serving/ and it can be deployed automatically using Kubernetes. Therefore, you can skip the initial instructions on how to create such a container and proceed directly to the sections https://github.com/salekd/rpizero_smart_camera2/wiki/TensorFlow-Serving-ModelServer#create-kubernetes-deployment-and-service and https://github.com/salekd/rpizero_smart_camera2/wiki/TensorFlow-Serving-ModelServer#tensorflow-serving-with-mobilenet

I tried running TensorFlow Serving ModelServer on a Raspberry Pi Zero directly, but failed to install it. For more details on this endeavour, see the notes in https://github.com/salekd/rpizero_smart_camera2/wiki/Notes


Get TensorFlow Serving ModelServer via Docker

Install Docker and make sure it is running.

brew install ruby
brew cask install docker

You will need to increase the memory dedicated to Docker from the default 2 GB to something like 8 GB. This avoids the error that otherwise occurs while building the TensorFlow Serving code with Bazel, described in https://github.com/tensorflow/serving/issues/590

The memory usage can be monitored in a separate terminal by running:

docker stats

Copy Dockerfile.devel from https://github.com/tensorflow/serving/blob/master/tensorflow_serving/tools/docker/Dockerfile.devel and build a container:

docker build --pull -t $USER/tensorflow-serving-devel -f Dockerfile.devel .

Run the container:

docker run --name=inception_container -it $USER/tensorflow-serving-devel

Clone and configure TensorFlow Serving in the running container and build the example code:

git clone --recurse-submodules https://github.com/tensorflow/serving
cd serving/tensorflow
./configure
cd ..
bazel build -c opt tensorflow_serving/example/...

Build a ModelServer binary:

bazel build -c opt tensorflow_serving/model_servers:tensorflow_model_server

Export the Inception-v3 model and commit the Docker image for deployment

curl -O http://download.tensorflow.org/models/image/imagenet/inception-v3-2016-03-01.tar.gz
tar xzf inception-v3-2016-03-01.tar.gz
bazel-bin/tensorflow_serving/example/inception_saved_model --checkpoint_dir=inception-v3 --output_dir=/tmp/inception-export
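
The export writes a versioned SavedModel directory, e.g. /tmp/inception-export/1 containing saved_model.pb and variables/. As a quick sanity check before committing the image, the following hedged sketch (run inside the container, assuming TensorFlow 1.x and that the export landed under version 1) loads the SavedModel and lists its signatures:

import tensorflow as tf

# Path and version subdirectory are assumptions based on the export command above.
export_dir = "/tmp/inception-export/1"

with tf.Session(graph=tf.Graph()) as sess:
    # Load the SavedModel tagged for serving and print its signature names.
    meta_graph = tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING], export_dir)
    print(list(meta_graph.signature_def.keys()))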

Detach from the container with Ctrl+p followed by Ctrl+q and commit all changes to a new image $USER/inception_serving.

docker commit inception_container $USER/inception_serving
docker stop inception_container

Push the new image to Docker Cloud.

export DOCKER_ID_USER="salekd"
docker login
docker tag $USER/inception_serving $DOCKER_ID_USER/inception_serving
docker push $DOCKER_ID_USER/inception_serving

Test running in a local Docker container

Test the serving workflow locally using the built image.

docker run -it $USER/inception_serving

Start the server:

cd serving
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=inception --model_base_path=/tmp/inception-export &> inception_log &

Query the server with inception_client.py:

wget https://github.com/salekd/rpizero_smart_camera/raw/master/camera.JPG
bazel-bin/tensorflow_serving/example/inception_client --server=localhost:9000 --image=camera.JPG
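
For reference, inception_client is essentially a small gRPC client. Below is a minimal sketch of such a client in Python, a hedged reconstruction of what the example client does (it assumes the grpc and TensorFlow Serving API Python packages are available and that the Inception export uses the predict_images signature):

from grpc.beta import implementations
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2

# Connect to the ModelServer started above on port 9000.
channel = implementations.insecure_channel("localhost", 9000)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

# The Inception example expects encoded JPEG bytes as input.
with open("camera.JPG", "rb") as f:
    data = f.read()

request = predict_pb2.PredictRequest()
request.model_spec.name = "inception"
request.model_spec.signature_name = "predict_images"
request.inputs["images"].CopyFrom(
    tf.contrib.util.make_tensor_proto(data, shape=[1]))

result = stub.Predict(request, 10.0)  # 10 second timeout
print(result)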

Create Kubernetes Deployment and Service

Install VirtualBox from https://www.virtualbox.org/wiki/Downloads

Since we wish to run a server locally, install Minikube as described here:

curl -Lo minikube https://storage.googleapis.com/minikube/releases/v0.23.0/minikube-darwin-amd64 && chmod +x minikube && sudo mv minikube /usr/local/bin/

Start Minikube:

minikube start

Install the Kubernetes command-line tool, kubectl, to deploy and manage applications on Kubernetes:

brew install kubectl

The deployment consists of 3 replicas of the inception_inference server controlled by a Kubernetes Deployment. The replicas are exposed externally by a Kubernetes Service along with an external load balancer. We create them using the example Kubernetes config in https://github.com/tensorflow/serving/blob/master/tensorflow_serving/example/inception_k8s.yaml

kubectl create -f inception_k8s.yaml

To view the status of the deployment and pods:

kubectl get deployments
kubectl get pods

To view the status of the service:

kubectl get services
kubectl describe service inception-service

The service is running now. However, it is only accessible from your local machine. The next section describes how to enable remote access through port forwarding.

Do not forget to stop Minikube with minikube stop once the server is no longer in use; otherwise it will keep using resources.


Port forwarding

To enable port forwarding to the inception-deployment pod, we need to get its name first.

kubectl get pods
NAME                                    READY     STATUS    RESTARTS   AGE
inception-deployment-5fc48768d6-xctpf   1/1       Running   0          42m

Listen on port 9000 locally, forwarding to 9000 in the pod:

kubectl port-forward inception-deployment-5fc48768d6-xctpf 9000:9000
Forwarding from 127.0.0.1:9000 -> 9000
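
Before pointing a client at the forwarded port, a quick hedged check (run on the machine doing the port forwarding) confirms that something is listening on 127.0.0.1:9000:

import socket

# Open and immediately close a TCP connection to the forwarded port.
sock = socket.create_connection(("127.0.0.1", 9000), timeout=5)
print("ModelServer is reachable through the forwarded port")
sock.close()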

We can now query the service from a Raspberry Pi Zero as described in https://github.com/salekd/rpizero_smart_camera2/wiki/TensorFlow-Serving-API


TensorFlow Serving with MobileNet

This section describes how to prepare a different model for TensorFlow Serving, for example SSD MobileNet v1. The instructions below are based on the discussion here: https://stackoverflow.com/questions/45229165/how-can-i-serve-the-faster-rcnn-with-resnet-101-model-with-tensorflow-serving

wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_11_06_2017.tar.gz
tar -zxvf ssd_mobilenet_v1_coco_11_06_2017.tar.gz

python models/object_detection/export_inference_graph.py \
--pipeline_config_path=models/object_detection/samples/configs/ssd_mobilenet_v1_coco.config \
--trained_checkpoint_prefix=ssd_mobilenet_v1_coco_11_06_2017/model.ckpt \
--output_directory /tmp/mobilenet-export/1 \
--export_as_saved_model --input_type=image_tensor

Run the model server:

tensorflow_model_server --port=9000 --model_name=ssd_mobilenet_v1_coco --model_base_path=/tmp/mobilenet-export --enable_batching=true

You can skip the instructions above, as the exported model can be found in this git repository and is also part of the Docker image at https://hub.docker.com/r/salekd/inception_serving/

Serving of SSD MobileNet v1 can be deployed directly with Kubernetes using this file: https://github.com/salekd/rpizero_smart_camera2/blob/master/mobilenet_k8s.yaml

kubectl create -f mobilenet_k8s.yaml
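
Querying the MobileNet service works much like querying Inception, except that the object detection export takes a decoded image tensor rather than encoded JPEG bytes. A hedged client sketch in Python follows (the serving_default signature name and the inputs/detection_* tensor keys are assumptions based on the standard object_detection export; inspect the exported SavedModel if your version differs):

import numpy
from PIL import Image
from grpc.beta import implementations
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2

channel = implementations.insecure_channel("localhost", 9000)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

# The image_tensor input type expects a batch of uint8 images.
image = numpy.array(Image.open("camera.JPG"))
batch = numpy.expand_dims(image, axis=0)

request = predict_pb2.PredictRequest()
request.model_spec.name = "ssd_mobilenet_v1_coco"  # matches --model_name above
request.model_spec.signature_name = "serving_default"
request.inputs["inputs"].CopyFrom(
    tf.contrib.util.make_tensor_proto(batch, shape=list(batch.shape)))

result = stub.Predict(request, 10.0)
# Assumed output keys from the object_detection export:
# detection_boxes, detection_scores, detection_classes, num_detections.
print(result.outputs["detection_scores"])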