A paradigm shift in the accessibility of Artificial Intelligence (AI) driven Computer Vision technologies means now is the time for business leaders to consider how these can deliver competitive advantage.
What used to be the domain of expensive dedicated products is fast becoming a standardised workflow, supported by huge bets from Cloud service providers. This, together with access to cheap, powerful hardware, is driving down the cost and complexity of AI implementations, enabling organisations to efficiently deploy computer vision tools targeted at their unique challenges.
This article aims to help tech-oriented business leaders identify applications for AI-driven computer vision, in particular, those real-time applications that are enabled by the new generation of powerful, accelerated AI hardware. Techniques to deal with large data sets and the application of Predictive Analytics will be discussed in a later article.
Paradigm shift – ‘train your own’ AI
Instead of buying ‘off the shelf’ dedicated AI products for a specific problem (people counting, product sorting, etc.), organisations can now think more generally about a problem they’d like to solve and train their own model using ‘machine learning’ services. Where this used to involve specialised development, it can now leverage defined processes supported by all the major Cloud service providers. Amazon, Microsoft Azure, and Google have all been investing heavily in tooling to make the process of training and deploying AI models as easy to access as any other Cloud service. The goal is to provide flexible, adaptable solutions, targeted at your application, at significantly reduced costs.
The machine learning approach to Computer Vision solutions begins with the same set of general requirements. You’ll need:
- Some way of collecting raw data through video/image capture (fixed or mobile camera, or even a drone)
- Local or Cloud processing to support model training
- Tooling and hardware to deploy and run the AI model, and (probably)
- Some support from a provider experienced in integrating AI solutions
What $200 of hardware can do
We’ve been experimenting with Google’s beta Coral accelerator hardware in combination with a Raspberry Pi and camera module for a few months. With the Coral programme coming out of beta recently, it is now possible for anyone to run powerful, hardware-accelerated AI computer vision applications on hardware costing under $200. Smart (‘deep-learning’) camera systems are also available from Amazon (AWS DeepLens) and Microsoft (Vision AI Developer Kit).
Using smart, local ‘edge’ devices means that the need for high-bandwidth connections to servers or cloud infrastructure (and costly server-side processing) is greatly reduced. For instance, if an AI-enabled camera can count pedestrian movements using local hardware, it only needs to send tiny snippets of data (pedestrian movements in a given period) through an inexpensive IoT channel, rather than stream video continuously to a server. Another key advantage of performing machine learning ‘inferences’ locally is keeping user data private, since there is no need to stream or store personally identifying video information. Collectively, this shift to powerful, low-cost field devices adds up to a step-change in the industry.
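To make the bandwidth point concrete, here is a minimal Python sketch of how an edge device might turn its local detections into a small periodic summary. It is an illustration under stated assumptions rather than a reference implementation: the function names, the device name `cam-01`, the five-minute period and the JSON layout are all hypothetical, and the detection timestamps are made-up sample values.

```python
import json
from collections import Counter


def summarise_detections(timestamps, period_seconds=300):
    """Group detection times (in seconds) into fixed periods and count each one."""
    counts = Counter(int(t // period_seconds) * period_seconds for t in timestamps)
    return [{"period_start": start, "count": n} for start, n in sorted(counts.items())]


def build_payload(device_id, timestamps, period_seconds=300):
    """Build the compact JSON message a device could send instead of video."""
    summary = summarise_detections(timestamps, period_seconds)
    # Hypothetical message layout; compact separators keep the message small.
    return json.dumps({"device": device_id, "periods": summary}, separators=(",", ":"))


# Made-up sample: seconds at which a pedestrian was detected by the local model.
detections = [3, 41, 118, 299, 305, 610, 611]
payload = build_payload("cam-01", detections)
print(payload)
print(len(payload), "characters")
```

A message like this is a short line of text covering many minutes of footage, which is why a low-cost IoT channel is enough and no identifiable video needs to leave the device.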
In the video below you can see cars captured and tracked in real-time; note that each individual vehicle is recognised as a unique instance that could be tracked across multiple video feeds. The algorithm can be updated and retrained to add tracking of bicycles, pedestrians, or any other object. Adding functionality such as counting vehicles heading in each direction becomes straightforward once objects can be reliably identified.
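As a rough illustration of why direction counting is simple once tracking works, the Python sketch below assumes the tracker already supplies, for each vehicle, a list of centre positions in frame order. The data layout, the function name and the `min_travel` threshold are assumptions made for this example and do not describe the system shown in the video.

```python
def count_by_direction(tracks, min_travel=20):
    """Count tracked objects by horizontal direction of travel.

    tracks maps a track id to a list of (x, y) centre points in frame order.
    Objects that move less than min_travel pixels horizontally are ignored.
    """
    counts = {"left_to_right": 0, "right_to_left": 0}
    for positions in tracks.values():
        if len(positions) < 2:
            continue  # a single sighting gives no direction
        dx = positions[-1][0] - positions[0][0]
        if dx >= min_travel:
            counts["left_to_right"] += 1
        elif dx <= -min_travel:
            counts["right_to_left"] += 1
    return counts


# Made-up tracks: three moving vehicles and one that is effectively stationary.
tracks = {
    1: [(10, 200), (80, 202), (160, 205)],
    2: [(300, 120), (220, 118), (140, 119)],
    3: [(50, 210), (130, 211)],
    4: [(200, 150), (204, 151)],
}
print(count_by_direction(tracks))
```

The hard part is the reliable identification and tracking; the counting itself is a few lines of ordinary logic layered on top.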
Figuring out potential applications
Imagine a pool of resources, able to observe interactions, conduct quality control, and collect data, as well as raising alerts and providing inputs to other systems. Furthermore, the resources are reliable, fast to train and happy to work 24 hours a day for free. These types of AI-driven computer vision technologies have been successfully deployed for several years, demonstrating that what a human can visually detect and react to in many environments can be automated and, in many cases, improved on. In the past, these have only been available to those with significant budgets. Now the playing field has changed and this is essentially available to anyone.
When evaluating potential use cases in your context, the broad question is:
This could cover a wide range of applications so it’s useful to list a few potential use cases:
- Tracking and counting people, vehicles, or other objects moving through a scene
- Grading product quality, or detecting flaws
- Detecting scenarios that increase the probability of an event or accident occurring
- Estimating physical characteristics such as size, weight or velocity of objects
- Detecting specific events, and pushing a notification to other systems
As well as new applications it’s worth considering options to add intelligence to existing products or processes. For instance, if you build monitored security cameras you might want to add smarts to highlight those feeds where people or vehicles are present.
More specific examples include:
- Detection of traffic accidents at intersections
- Recognising altercations or suspicious activity (e.g. shoplifting) from security video feeds
- Monitoring distraction for drivers or machinery operators
- Face or voice recognition for identity verification – see Microsoft’s Cognitive Services for examples
- Licence plate recognition
- Crop or stock monitoring from drone feeds in agricultural applications
- Species recognition, including weed, disease or pest detection
If you can see potential opportunities, it’s useful to evaluate your current infrastructure and figure out if you can leverage existing models. If you’re training a new model it helps to have plenty of recorded examples of what you want to detect.
- Do you have existing video data or live streams from existing infrastructure?
If you’re already set up with raw data feeds you can leverage these and potentially don’t need to install new devices.
- Can you utilise an existing trained model?
If you want to detect or track something common like people or cars, chances are you can access an existing model. If you want to look for blemishes on a specific type of fruit you may need to train your own model.
- If you decide to build your own model, do you have (or can you capture) plenty of examples of the object you want to track or the event you want to detect in order to build a training data-set?
Note that recognition techniques in machine learning are not limited to video; similar approaches can be applied to audio data or other data sets, recognising any kind of recurring pattern. You can find more application examples at Google’s coral.ai site.
Standard trained models are very good at recognising particular object(s) or object characteristics in video or image data. Sometimes additional image processing techniques are required; for instance, standard object recognition may be good for identifying licence plates in an image, but you may want to apply a different algorithm to read the actual licence plate number. In these cases, it may be necessary to ‘overlay’ different techniques, including conventional image processing, to extract the information that’s most valuable.
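The ‘overlay’ idea can be sketched as a two-stage pipeline: one model finds the regions of interest, and a second technique is applied only to those regions. The Python outline below is a hedged sketch, not a working recogniser. The image is represented as a plain list of pixel rows, and `fake_detector` and `fake_reader` are hypothetical stand-ins for a trained plate detector and a character-reading step.

```python
def crop(image, box):
    """Return the part of the image inside box = (left, top, right, bottom)."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def read_plates(image, detect_plates, read_text):
    """Stage 1 finds plate regions; stage 2 reads the text from each region."""
    results = []
    for box in detect_plates(image):
        region = crop(image, box)
        results.append({"box": box, "text": read_text(region)})
    return results


# Hypothetical stand-ins so the sketch runs without any AI model.
def fake_detector(image):
    return [(1, 1, 4, 3)]  # pretend one plate was found at this position


def fake_reader(region):
    # A real reader would return the plate number; this reports the crop size.
    return f"{len(region)}x{len(region[0])}"


image = [[0] * 6 for _ in range(4)]  # blank 6x4 placeholder image
print(read_plates(image, fake_detector, fake_reader))
```

Keeping the two stages separate means either one can be swapped out, for example replacing the second stage with a conventional image processing routine, without retraining the detector.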
At Intranel, we’re working on generalising not just ‘object’ recognition, but tracking individual instances of moving objects, and ‘event’ recognition – being able to detect defined events in the video. This might include an abnormal event in a production environment, a collision between vehicles, or specific interactions between people.
Integrating AI with other systems
AI vision systems generate data, so you’ll also need a plan for making the best use of it: how will it integrate with your existing business processes and systems? This is an area your software team or Intranel can help with, as ultimately the data is just another system input.
Computer vision AI systems may generate large amounts of data that may require other AI techniques to extract maximum value – e.g. predictive analytics. This topic will be discussed in a later article.
- Resolving challenges around data management and maturity
- Extracting ‘hard’ data from Computer vision AI models – e.g. people or vehicle counting or tagging specific events
- UX, dashboards, and visualisations
- Integrations between Computer Vision, IoT and cloud systems
- Integration with client-facing web and mobile services
AI tooling for computer vision applications is going mainstream, with the costs and complexity of hardware and applications around the technology shrinking rapidly.
With modest budgets, organisations can now train and deploy dedicated computer vision algorithms exactly matched to their application(s). Previously they were limited to off-the-shelf products or major internal product development.
Applications are broad and not always immediately obvious, which means business leaders need to carefully consider potential use cases within their sector.
We’ll discuss in a later article the case for AI services in realising value from large data-sets through predictive analytics and other techniques.