Description
FEATURES:
ML Image Identifier is an educational app that lets your iOS device identify images in real time as you move the camera around your environment. It can scan for 3 categories of images ("Objects", "Cars", and "Food") and recognize "Text" (character boxes, OCR) and "People" (facial landmarks, upper bodies, facial segmentation, depth map).
The app automatically throttles the image processing to work on older devices.
For the categorized images, the app displays the top 5 predicted matches, ranked by the neural network's confidence level and shown as percentages.
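As a rough illustration of the top-5 display, the ranking step can be sketched as below. This is a minimal sketch in Python, not the app's actual code: the `top_matches` helper, the input format, and the sample labels and confidence values are all made up for the example.

```python
def top_matches(predictions, k=5):
    """Return the k highest-confidence (label, percent) pairs.

    `predictions` maps a label to a confidence between 0.0 and 1.0.
    The helper and its input format are assumptions for illustration.
    """
    # Sort by confidence, highest first, then keep only the first k entries.
    ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
    return [(label, round(confidence * 100, 1)) for label, confidence in ranked[:k]]


# Hypothetical classifier output for a single camera frame.
sample = {
    "coffee mug": 0.62,
    "cup": 0.21,
    "teapot": 0.08,
    "pitcher": 0.04,
    "vase": 0.03,
    "bowl": 0.02,
}

for label, percent in top_matches(sample):
    print(f"{label}: {percent}%")
```

With six candidate labels, only the five most confident ones are kept, so the lowest-ranked label ("bowl" here) is dropped.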
BACKGROUND:
Once merely a subject of science fiction, machine learning has permeated our lives in recent decades. We see it in numerous uses, such as handwriting recognition, facial recognition, image tagging, AI in games, targeted advertisements, predictive typing, and many automated tasks. Social networks are free because the data you provide (posts, surveys, photos, etc.) can be valuable for numerous purposes, turning the users into the products to sell. In short: Knowledge is power.
With the release of iOS 11, Apple brought machine learning to the masses with Core ML, making it possible to run neural networks and other ML-related tools via hardware acceleration on any iOS device. Each subsequent iOS version has added to the feature set.
This app is a demonstration of some possibilities - and some deficiencies - of machine learning. Modeling a neural network is only one part of the task. For an ML model to work, it must be trained on massive amounts of data, much as a living creature needs numerous stimuli to learn. Good training data can yield good results; poor training data can yield poor results. Sometimes the biases of those assembling the data come into play, since they may unknowingly weight certain examples over others.
SPECIFICS:
ML Image Identifier makes use of 3 ML models (all MIT- or Apache-licensed) and Apple's own Vision framework to serve as examples:
"MobileNet" - This scans general objects. It works fairly well with household items. It cannot identify people. This ML model is an example of fairly high-quality results in image recognition and is much more compact than similar ML models that can be as large as 500 MB.
"CarRecognition" - This scans for makes and models of vehicles. It is very hit-or-miss and seems to heavily match automobiles from specific regions of the world. Most matches are the right body type but wrong make. This ML model is an example of mixed results in image recognition.
"Food101" - This scans for prepared foods. It rarely works with general food items and seems to focus on foods that most people will not have in their houses, such as caviar and lobster. It also returns many false positives for desserts. This ML model is an example of poor results in image recognition when used outside of very specific cases.
The "Text" mode looks for all potential text in view and highlights the words and individual characters in those words for easy viewing. It displays the top 5 rows of text by descending height.
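The "top 5 rows by descending height" selection can be sketched roughly as follows. This is only an illustrative sketch in Python, not the app's implementation: the `TextRow` type, the `tallest_rows` helper, and the sample rows and heights are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextRow:
    """One detected row of text (hypothetical type for this sketch)."""
    text: str
    height: float  # bounding-box height, in whatever unit the detector reports


def tallest_rows(rows, k=5):
    """Return up to k rows, ordered from tallest to shortest."""
    return sorted(rows, key=lambda row: row.height, reverse=True)[:k]


# Made-up detections from a single frame, e.g. a street sign with small print.
detected = [
    TextRow("fine print", 0.02),
    TextRow("MAIN STREET", 0.20),
    TextRow("No parking", 0.08),
    TextRow("Mon-Fri", 0.05),
    TextRow("8am-6pm", 0.04),
    TextRow("Tow zone", 0.07),
]

for row in tallest_rows(detected):
    print(f"{row.height:.2f}  {row.text}")
```

Ranking by height favors the largest text in view, so small print is the first thing dropped when more than five rows are detected.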
The "People" mode looks for all potential human or human-like faces. Of those found, the app highlights the facial landmarks, such as eyes, nose, jawline, etc. This mode in particular needs a newer device to run at a usable frame rate, because of the hardware required for real-time image processing. It also supports upper torso detection (back camera) and facial segmentation or depth map (front camera).