Imagine walking through the streets of Brussels as a tourist…
The first thing you look for in a new city is the tourist office, where you can find more information about things to visit. But what if you could get all that information by talking to a virtual assistant on a smart billboard? That’s what I’m trying to build during my internship at Craftworkz.
When you walk by the smart billboard, a virtual assistant called “Billy” will try to get your attention and give recommendations about events, promotions, restaurants, sights to visit, and more. It uses machine learning and visual recognition to assess your personal profile based on parameters like gender, age category and the color of your clothes.
Communication with the smart billboard will be possible through speech-to-text (STT) and text-to-speech (TTS) technology, so all kinds of questions can be asked about the city.
The image below gives a visual overview of the project.
The project itself is divided into three parts:
- API → Getting all the information
- Chatbot → Interaction
- iOS app → Visual/Voice recognition and interface
API
For my API I built a Kitura app. Kitura is a server-side Swift framework developed by IBM and inspired by express.js. My experience with this framework so far has been great. The only downside I can point out is the lack of documentation online. There is, however, a Slack channel called Swift@IBM where developers of the framework help you almost instantly!
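To give an idea of how little boilerplate that takes, here is a minimal Kitura sketch with a single route; the route and port are illustrative, not my actual API:

```swift
import Kitura

let router = Router()

// A single GET route that returns a plain-text greeting.
router.get("/hello") { request, response, next in
    response.send("Hello from Billy!")
    next()
}

// Start an HTTP server on port 8080 and block until it stops.
Kitura.addHTTPServer(onPort: 8080, with: router)
Kitura.run()
```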
The API makes calls to different external APIs (Google Places, Wolfram Alpha, Google Distance Matrix, Wikipedia and Apixu) and returns useful information about places, the weather, facts, and more.
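As a sketch of what one of those routes could look like, the hypothetical endpoint below proxies the current weather for a city from Apixu. The route path, URL format, query parameters and the APIXU_KEY placeholder are assumptions for illustration, not my actual implementation:

```swift
import Foundation
import Kitura

let router = Router()  // server setup as in the previous sketch

// Hypothetical route: GET /weather/Brussels forwards the request to Apixu
// and returns the raw JSON. Replace APIXU_KEY with a real API key.
router.get("/weather/:city") { request, response, next in
    guard let city = request.parameters["city"],
          let url = URL(string: "https://api.apixu.com/v1/current.json?key=APIXU_KEY&q=\(city)") else {
        response.status(.badRequest)
        return next()
    }
    URLSession.shared.dataTask(with: url) { data, _, _ in
        if let data = data {
            response.send(data: data)
        } else {
            response.status(.internalServerError)
        }
        next()
    }.resume()
}
```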
I ended up deploying my Kitura API on IBM Cloud. And since Kitura apps are recognized by IBM Cloud, you get all the API Management features there, including authentication, which I used to secure my API.
Chatbot “Billy”
For the actual billboard-human interaction I opted for a chatbot approach using Oswald, the conversational framework developed in-house at Craftworkz. “Billy” will talk to my API to fetch data for users and enable the interaction.
For more information about Oswald, go to the Oswald website.
iOS app
Machine Learning
When iOS 11 was announced at WWDC 2017, Apple introduced a new framework called CoreML. This framework lets your application use pre-trained machine learning models. You can find a list of downloadable CoreML models on Apple’s website. These models can be dragged and dropped into your Xcode project and are then ready to be used in your app.
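Using such a model takes only a few lines. The sketch below assumes a hypothetical AgeClassifier.mlmodel has been dropped into the project; Xcode generates a typed AgeClassifier class for it:

```swift
import CoreML
import CoreVideo

// A minimal sketch, assuming a hypothetical AgeClassifier.mlmodel has been
// dragged into the Xcode project; Xcode then generates the AgeClassifier
// class with a typed prediction(...) method. The "image" input name is an
// assumption and depends on how the model was defined.
func classifyAge(from pixelBuffer: CVPixelBuffer) {
    guard let classifier = try? AgeClassifier(configuration: MLModelConfiguration()),
          let output = try? classifier.prediction(image: pixelBuffer) else {
        return
    }
    // classLabel holds the most likely class, e.g. an age category.
    print("Predicted age category: \(output.classLabel)")
}
```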
Since CoreML is fairly new, most of the open source models online are quite limited and not always accurate. So, to get a more accurate model, I had to create one myself using Turi Create, a Python library developed by Apple that makes training machine learning models easier. It supports exporting to a CoreML model by default.
To do this, I first had to download a dataset for my model. Then I followed a tutorial to figure out how to create my first CoreML model. For training my models I used the ResNet-50 neural network, since it keeps the model size down while remaining quite accurate. I ended up with two working models: one for gender and one for age category.
On the 20th of March 2018, IBM announced some good news: IBM Watson Services for Core ML, a collaboration with Apple to make building your own CoreML models easier. Since my CoreML models can still be improved, I will use this service to recreate all of them. The big advantages of this service are the computing power of Watson and the ability to train your models continuously so they improve over time. IBM Watson Services for Core ML then generates updated versions of the CoreML models.
Vision Framework
For detecting faces in my iOS app I’m using Apple’s own Vision framework. Vision makes it easy to detect faces, facial landmarks and objects. It’s also compatible with CoreML, so applying a machine learning model to a face or object is now a piece of cake!
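A minimal face detection sketch with Vision looks like this (the function name is just illustrative):

```swift
import Vision
import CoreGraphics

// Detect face rectangles in a CGImage with the Vision framework.
func detectFaces(in image: CGImage) throws {
    let request = VNDetectFaceRectanglesRequest { request, error in
        guard let faces = request.results as? [VNFaceObservation] else { return }
        print("Detected \(faces.count) face(s)")
        for face in faces {
            // boundingBox is in normalized coordinates (0...1),
            // relative to the lower-left corner of the image.
            print("Face at \(face.boundingBox)")
        }
    }
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
}
```

The detected bounding boxes can then be cropped out and passed to a CoreML model through a VNCoreMLRequest, which is exactly the Vision-CoreML combination mentioned above.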
Speech-to-text & text-to-speech
My iOS app also has STT and TTS implemented to interact with “Billy”. These two technologies are already baked into iOS, so apart from an import and some code, it’s not that difficult to implement them in your own app. And they work well in multiple languages.
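As a rough sketch of how little code that takes (the greeting and locale are just examples):

```swift
import AVFoundation
import Speech

// Text-to-speech: have "Billy" speak a sentence out loud.
let synthesizer = AVSpeechSynthesizer()
let utterance = AVSpeechUtterance(string: "Hi, I'm Billy! How can I help you?")
utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
synthesizer.speak(utterance)

// Speech-to-text: ask for the user's permission before recognizing audio.
SFSpeechRecognizer.requestAuthorization { status in
    guard status == .authorized else { return }
    // From here, feed microphone audio (e.g. from AVAudioEngine) into an
    // SFSpeechAudioBufferRecognitionRequest to get live transcriptions.
}
```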
Conclusion
For everyone familiar with express.js and Swift, I would recommend trying out Kitura; it will help you build a fully featured API in days!
And for those looking for a way to build an iOS app with machine learning without being a machine learning expert, be sure to check out CoreML and IBM Watson Services for Core ML!
I’m already looking forward to continuing to use these new technologies during my internship, and I hope to learn a lot more in the coming months!