Saved video conferences are processed by several machine learning algorithms: an #NLP model transcribes speech to text, a #FaceRecognition algorithm identifies the participants, and another detects speaker activity from lip movement to bind each phrase to the appropriate speaker. The collected information is structured into metadata linked to the video and stored in the database.
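As a rough sketch of how the outputs of those steps could be combined into per-video metadata, the snippet below merges hypothetical face-recognition and transcription results into one record. All names and field layouts here (`Phrase`, `VideoMetadata`, `build_metadata`) are illustrative assumptions, not the product's actual schema, and the real ML models are represented only by their sample outputs.

```python
from dataclasses import dataclass, field

@dataclass
class Phrase:
    speaker: str       # participant bound to the phrase via lip-movement analysis
    start_sec: float   # position of the phrase inside the video
    text: str          # transcribed speech

@dataclass
class VideoMetadata:
    video_name: str
    date: str
    length_sec: float
    participants: list = field(default_factory=list)
    phrases: list = field(default_factory=list)

def build_metadata(video_name, date, length_sec, faces, transcript):
    """Combine the recognition-step outputs into one metadata record.

    `faces` and `transcript` stand in for the results of the
    face-recognition and speech-to-text / lip-movement models.
    """
    meta = VideoMetadata(video_name, date, length_sec)
    meta.participants = sorted(set(faces))          # deduplicated participant list
    meta.phrases = [Phrase(**p) for p in transcript]
    return meta

# Hypothetical model outputs for one saved conference.
faces = ["Alice", "Bob", "Alice"]
transcript = [
    {"speaker": "Alice", "start_sec": 3.2, "text": "Let's review the roadmap."},
    {"speaker": "Bob", "start_sec": 9.7, "text": "I'll share my screen."},
]
meta = build_metadata("weekly-sync.mp4", "2024-05-02", 1800.0, faces, transcript)
```

A record like this, stored in the database, is what the portal's sorting, filtering, and search features would query.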
The user self-service portal allows users to sort and filter videos by participants, topics, video name, date, and length. The portal's internal search engine helps users find specific moments inside videos and jump directly to the required video fragment by clicking an item in the search results.
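The moment-level search can be illustrated with a minimal sketch: given phrase-level metadata, a query returns not just matching videos but the timestamps to jump to. The data layout and function name below are assumptions for illustration; a production system would use a proper search engine rather than a linear scan.

```python
# Each video's metadata is assumed to hold phrases with speaker, time, and text
# (field names are illustrative, not the product's actual schema).
videos = [
    {"video_name": "weekly-sync.mp4",
     "phrases": [
         {"speaker": "Alice", "start_sec": 3.2, "text": "Let's review the roadmap."},
         {"speaker": "Bob", "start_sec": 9.7, "text": "The roadmap slide is ready."},
     ]},
    {"video_name": "retro.mp4",
     "phrases": [
         {"speaker": "Carol", "start_sec": 41.0, "text": "Action items for next sprint."},
     ]},
]

def search_moments(videos, query):
    """Return matching phrases with the video name and timestamp,
    so the portal can navigate directly to that fragment."""
    q = query.lower()
    return [
        {"video": v["video_name"], "jump_to_sec": p["start_sec"], "text": p["text"]}
        for v in videos
        for p in v["phrases"]
        if q in p["text"].lower()
    ]

hits = search_moments(videos, "roadmap")
```

Clicking a hit in the portal would seek the player to `jump_to_sec` within the matching video.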
The system can evolve by adding new features: gesture recognition for bookmarking or other purposes, recognition or detection of specific objects, and more. There are many directions for the system's growth. For example, we could add automatic e-mailing of meeting minutes, with commitments and conclusions, to the meeting participants.