Who’s afraid of Machine Learning? Part 4: Going Mobile! ML-Kit Why & How?

Intro to ML & ML-Kit for mobile developers

Britt Barak
Google Developer Experts

--

Previous posts gave some idea of what machine learning is and how it works. Now it’s time to start applying it as mobile developers!

For some orientation, in this series:

  1. Part 1: What do they all talk about?
  2. Part 2: How to create a machine that learns?
  3. Part 3: About that learning..
  4. Part 4: Going Mobile! ML-Kit why and how? (← you are here 🍓)
  5. Part 5: Using a Local Model (coming soon ✨)
  6. Part 6: Using a Cloud Model (coming soon ✨)
  7. Part 7: Using a Custom Model (coming soon ✨)

Why On Device?

If you read my previous posts, you already know that I love strawberries 🍓. Also, of course, I love mobile apps 📱!!

Next, I will apply a strawberry detector machine learning model (as previously described) in a mobile app. Before we discuss how to do it, there’s an important question:

Why do we need to apply a model on a device? 🤔

I mean, as mobile developers we know that mobile device resources are very limited. Often we tend to ask the server side to do complex computation and heavy lifting for us. So why not do the same here? Why not send an image to the server, let the server do the processing, run the model, and send a result back to the client?

There are a few drawbacks to running the model that way, in the cloud only. To name a few:

  • Latency: even if the processing is extremely fast, sending the data to the server and waiting for the response takes time. What if we need real-time processing? One use case, for example, is detecting face features during a video call. By the time we send one frame to the server and wait for the result, it’s likely to be irrelevant to the user, as that frame is no longer visible.
  • Bandwidth: often bandwidth is expensive. Even more so if the data we send is heavy, such as photos.
  • Offline support: mobile devices roam, network quality varies, and they often get disconnected. An app that is unable to function without an internet connection often provides a bad user experience.
  • Performance: frequent network requests are a very likely culprit for battery drain.
  • Privacy and security: users appreciate keeping private data, such as personal photos, medical information, financial details and more, local to the device. Otherwise, there are many security challenges we would need to consider.

Hopefully, those are enough to convince you that finding an on-device mobile-tailored solution is worthwhile.


Enter ML-Kit!

One of the most exciting announcements of Google I/O 2018, for me, was MLKit. Not too long ago, ML used to sound like a magic word 🧞‍️. What it can be used for, and how, was quite vague for many of us for a while.

MLKit takes some common ML use cases and wraps them up with a nice API. In addition, if you have a specific use case, a custom model can be used as well. We can then do version management and define under which conditions model updates will be downloaded to devices, without needing to roll out a new .apk. Almost needless to add, it’s cross-platform, so it can be used on both Android & iOS.

MLKit is still in beta, but it’s extremely cool and fun to use! I highly recommend starting to play around with it and figuring out which use cases can benefit your product.

Among the out-of-the-box use cases we can find some everyday ones. Currently they all have to do with processing images and drawing conclusions from them. Therefore, we’ll find them all under the FirebaseVision library.
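In practice, that means a single Gradle dependency in the app module. Here’s a minimal sketch using the Kotlin DSL; the artifact name follows the ML Kit beta, and the version placeholder is an assumption — take the current one from the Firebase release notes:

```kotlin
// app/build.gradle.kts — a sketch; a Groovy build.gradle works just as well.
dependencies {
    // ML Kit’s vision APIs ship as part of Firebase.
    // Replace x.y.z with the latest version from the Firebase release notes.
    implementation("com.google.firebase:firebase-ml-vision:x.y.z")
}
```

As with any Firebase library, the project also needs the google-services plugin applied and a google-services.json file in place.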

Each use case is supported on-device, in the cloud, or both.

MLKit use cases:

  • Image labeling [on device & cloud] — pretty much what we did so far: taking an image and detecting entities it contains, such as objects, animals, fruits, activities and more. You can use these labels to perform content moderation, filtering or search. Also, being able to work with metadata rather than an entire photo can help with bandwidth, offline support, privacy and more, as discussed above. For example, in a chat app, being able to send only labels and not an entire photo can help a lot.
  • Text recognition [on device & cloud] — detecting text in an image. It can be used for photos of signs, labels, documents and so on… In addition to the benefits mentioned above of working with metadata rather than images, it is very useful for many use cases. For example: pairing it with text-to-speech services for a11y (accessibility); performing translation for l10n (localization); or using a sentiment analysis service to customize the experience. The API is quite nice, as it gives us the location of each word, so you can ask, for example: what is the second word in the third sentence?
  • Face detection [on device] — not to be confused with face recognition, which can recognize who the person in the image is, or tell that we see the same person in multiple photos. This is about detecting face features: their position on the screen, their angle, whether the mouth is smiling and so on. The processing is on-device and quite fast, so it works nicely for processing frames in real-time video chat or games.
  • Barcode scanning [on device] — extracts from an image the fields that a barcode encapsulates. Once again, working with just the metadata and not a full image can be a great advantage for many use cases.
  • Landmark recognition [cloud] — can tell you details about famous landmarks that are in the photo.
  • Custom models [on device] — whichever TensorFlow Lite model you have. This will be discussed later.
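To make these a bit more concrete, here’s a quick sketch of the entry points for these detectors. The class and method names follow the beta firebase-ml-vision API and may change between releases:

```kotlin
import com.google.firebase.ml.vision.FirebaseVision

// Every out-of-the-box use case has its own detector, all obtained from the
// same FirebaseVision entry point (beta API names, subject to change):
val vision = FirebaseVision.getInstance()

val labelDetector = vision.visionLabelDetector             // image labeling, on device
val cloudLabelDetector = vision.visionCloudLabelDetector   // image labeling, cloud
val textDetector = vision.visionTextDetector               // text recognition, on device
val faceDetector = vision.visionFaceDetector               // face detection, on device
val barcodeDetector = vision.visionBarcodeDetector         // barcode scanning, on device
val landmarkDetector = vision.visionCloudLandmarkDetector  // landmark recognition, cloud
```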

It’s nice to note that “detection” usually means finding some object in the photo, with details about the object as it appears in the photo (like face detection, which finds the features of the face and where they are positioned in the photo). “Recognition”, on the other hand, means finding an object and understanding it in a broader context, outside of the specific photo (like landmark recognition, where a photo is given the context of a spot in the world; or the face recognition that MLKit doesn’t do, of knowing which person a face belongs to, or matching it as the same person in other photos).

Let’s continue with our Strawberry-Or-Not app example and see how to implement it with MLKit, using 3 models:

  • Local model, from the out-of-the-box use case
  • Cloud-based model, from the out-of-the-box use case
  • Custom TensorFlow Lite model

Each use case, whether local or on the cloud, is implemented by a different model. Meaning: it expects a different input, it produces a different output and… it works differently! So expect different results.

It might sound scary… but it’s really not. Using each model basically consists of 4 steps:

  1. Set up the detector
  2. Process the input
  3. Run the model
  4. Process the output
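As a taste of what’s coming in the next posts, here’s how those 4 steps look for the on-device image labeling use case. This is a minimal sketch in Kotlin, assuming the beta firebase-ml-vision API (class and method names may differ in the version you use):

```kotlin
import android.graphics.Bitmap
import com.google.firebase.ml.vision.FirebaseVision
import com.google.firebase.ml.vision.common.FirebaseVisionImage
import com.google.firebase.ml.vision.label.FirebaseVisionLabel

// A sketch of the 4 steps with the on-device label detector (beta API names).
fun classify(bitmap: Bitmap, onResult: (List<FirebaseVisionLabel>) -> Unit) {
    // 1. Set up the detector
    val detector = FirebaseVision.getInstance().visionLabelDetector

    // 2. Process the input: wrap the Bitmap as a FirebaseVisionImage
    val image = FirebaseVisionImage.fromBitmap(bitmap)

    // 3. Run the model (asynchronously)
    detector.detectInImage(image)
        .addOnSuccessListener { labels ->
            // 4. Process the output: each label carries a name and a confidence
            onResult(labels)
        }
        .addOnFailureListener {
            // e.g. the on-device model isn’t available yet
            onResult(emptyList())
        }
}
```

We’ll go through each of these steps in detail in the upcoming posts.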

Our Demo App

In the upcoming posts, we’ll explore how we can classify an image with MLKit in 3 different ways: using a local model, a cloud model, and a custom model. I’ll elaborate on each later, and show how to use these models with a simple demo app.

Our app will hold a list of static images. After choosing an image, clicking a button below will run the corresponding model in order to classify the image.

The model outputs possible labels for the image; we‘ll display the 3 labels with the highest probability.
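Turning the raw output into those top 3 labels could look roughly like this. FirebaseVisionLabel and its fields follow the beta API; the helper itself is just an illustration, not part of ML Kit:

```kotlin
import com.google.firebase.ml.vision.label.FirebaseVisionLabel

// Illustrative helper: keep the 3 labels the model is most confident about,
// formatted as display-ready strings.
fun topThreeLabels(labels: List<FirebaseVisionLabel>): List<String> =
    labels.sortedByDescending { it.confidence }
        .take(3)
        .map { "${it.label}: ${"%.2f".format(it.confidence)}" }
```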

To keep things organized, let’s talk a bit about the structure of the demo app we’ll write in the upcoming posts:

The UI (MainActivity) sends the image (Bitmap) to the ImageClassifier class, which passes it to a specific Classifier for classification (LocalClassifier, CloudClassifier or CustomClassifier). Each Classifier is the only class that knows its respective ML model. It processes the input, runs the model, processes the output if needed, and then sends the result to ImageClassifier, which prepares it so it’s as easy as possible for the UI to display (see the sketch after the note below).

  • Note: in this demo, the user asks for a specific model to run. In a real production use case, ImageClassifier could often infer which model to use, according to network state and other factors.
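Here’s a rough sketch of that structure. The class names come from the description above; the interface and signatures are only an assumption of how it could be wired, not the final code:

```kotlin
import android.graphics.Bitmap

// Each concrete classifier wraps exactly one ML Kit model.
interface Classifier {
    fun classify(bitmap: Bitmap, onResult: (List<String>) -> Unit)
}

class LocalClassifier : Classifier {   // out-of-the-box, on-device model
    override fun classify(bitmap: Bitmap, onResult: (List<String>) -> Unit) = TODO()
}

class CloudClassifier : Classifier {   // out-of-the-box, cloud model
    override fun classify(bitmap: Bitmap, onResult: (List<String>) -> Unit) = TODO()
}

class CustomClassifier : Classifier {  // custom TensorFlow Lite model
    override fun classify(bitmap: Bitmap, onResult: (List<String>) -> Unit) = TODO()
}

// The only class the UI (MainActivity) talks to: it picks the requested
// classifier and hands back display-ready results.
class ImageClassifier(
    private val local: Classifier = LocalClassifier(),
    private val cloud: Classifier = CloudClassifier(),
    private val custom: Classifier = CustomClassifier()
) {
    fun classify(bitmap: Bitmap, type: String, onResult: (List<String>) -> Unit) {
        val classifier = when (type) {
            "cloud" -> cloud
            "custom" -> custom
            else -> local
        }
        classifier.classify(bitmap, onResult)
    }
}
```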

Perfect! This is the structure. In the next posts, we’ll implement the image classification using the different models. Join me there!
