Content-adaptive visual metadata extraction and enrichment

 

In its work on content analysis and search tools, the project analysed the state of the art in semantic retrieval of multimedia (audio-visual) content beyond text resources, i.e., retrieval that takes the nature of the content into account in order to better meet users’ content needs. Accordingly, the characteristics of images, video and audio are exploited to improve the accuracy of retrieval results. The annotation of audio-visual content is viewed from two different yet complementary perspectives: automatic and manual. Automatic annotation techniques include visual analysis, concept detection and speech-to-text methods, among others. Manual techniques range from repurposing existing metadata and manual annotation tools to different crowdsourcing approaches.

Algorithms were designed for recognising events in videos with the help of supporting text. A semantic annotation prototype for scene/shot and face metadata was developed, integrating the results of different visual analysis components. For sports video analysis, an efficient player detection and tracking algorithm was developed. Logo detection optimized for football content was implemented and successfully evaluated in this context. A set of automatic quality analysis tools was developed. The project also worked on near-duplicate detection and linking by combining visual and textual similarity.
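
As an illustration of the last point, the idea of linking near-duplicates by combining visual and textual similarity could be sketched as follows. This is a minimal sketch under stated assumptions, not the project's actual method: the `MediaItem` type, the 64-bit perceptual hash, the Jaccard word overlap, the weight of 0.6 and the threshold of 0.8 are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaItem:
    # Hypothetical record; field names are assumptions for this sketch.
    item_id: str
    visual_hash: int  # assumed precomputed 64-bit perceptual hash of a keyframe
    text: str         # assumed supporting text, e.g. a title or transcript snippet


def visual_similarity(a: int, b: int, bits: int = 64) -> float:
    """Fraction of hash bits that agree (1.0 means identical hashes)."""
    differing = bin((a ^ b) & ((1 << bits) - 1)).count("1")
    return 1.0 - differing / bits


def textual_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the lower-cased word sets of two texts."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def combined_similarity(x: MediaItem, y: MediaItem, visual_weight: float = 0.6) -> float:
    """Weighted sum of visual and textual similarity (weight is an assumption)."""
    visual = visual_similarity(x.visual_hash, y.visual_hash)
    textual = textual_similarity(x.text, y.text)
    return visual_weight * visual + (1.0 - visual_weight) * textual


def link_near_duplicates(items, threshold: float = 0.8):
    """Return (id, id, score) for every pair scoring at or above the threshold."""
    links = []
    for i, x in enumerate(items):
        for y in items[i + 1:]:
            score = combined_similarity(x, y)
            if score >= threshold:
                links.append((x.item_id, y.item_id, score))
    return links
```

The pairwise loop is quadratic in the number of items, which is acceptable for a small illustration; a real archive would need an index or a blocking step to limit the candidate pairs.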

© 2017 TOSCA-MP - Task-Oriented Search and Content Annotation for Media Production
The research leading to the presented results has received funding from the European Union's
Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 287532.