| Name | Description | Size | Format |
|---|---|---|---|
| | | 2.13 MB | Adobe PDF |
Abstract
Over the last decade, noticeable progress has occurred in the automated computer
interpretation of visual information. Computers running artificial intelligence algorithms are
increasingly capable of extracting perceptual and semantic information from images and
registering it as metadata. There is also a growing body of manually produced image
annotation data. All of this data is of great importance for scientific purposes as well as for
commercial applications. Optimizing the usefulness of this information, whether manually or
automatically produced, requires its precise and adequate expression at its different logical
levels, making it easily accessible, manipulable and shareable. It also requires the
development of associated manipulation tools. However, the expression and manipulation of
computer vision results has received less attention than the actual extraction of such results,
and has consequently advanced more slowly. Existing metadata tools are poorly structured in
logical terms, as they intermix the declaration of visual detections with that of the observed
entities, events and encompassing context. This poor structuring renders such tools rigid,
limited and cumbersome to use. Moreover, they are unprepared to deal with more advanced
situations, such as the coherent expression of the information extracted from, or annotated
onto, multi-view video resources. The work presented here comprises the specification of an
advanced XML-based syntax for the expression and processing of Computer Vision relevant
metadata. This proposal takes inspiration from the natural cognition process for the adequate
expression of the information, with a particular focus on scenarios with varying numbers of
sensory devices, notably multi-view video.
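The separation the abstract calls for, keeping low-level visual detections distinct from the observed entities and their context, might be sketched as follows. This is an illustrative assumption only: the element names (`detection`, `entity`, `evidence`) and attributes are hypothetical and do not reproduce the thesis's actual schema.

```xml
<!-- Hypothetical sketch, not the thesis's schema: detections are
     declared separately from the semantic entities they support. -->
<annotation>
  <detections>
    <!-- Low-level results, one per camera view -->
    <detection id="d1" view="cam0" frame="120" confidence="0.92">
      <bbox x="34" y="80" w="120" h="260"/>
    </detection>
    <detection id="d2" view="cam1" frame="120" confidence="0.87">
      <bbox x="410" y="95" w="110" h="250"/>
    </detection>
  </detections>
  <entities>
    <!-- One real-world entity, linked to its supporting detections
         across views rather than redeclared per view -->
    <entity id="person1" type="person">
      <evidence detection="d1"/>
      <evidence detection="d2"/>
    </entity>
  </entities>
</annotation>
```

Keeping detections and entities in separate element trees would let multiple camera views contribute evidence to a single entity without repeating its semantic description, which is the kind of multi-view coherence the abstract argues existing tools lack.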
Keywords
Metadata; Multi-view video; Multimedia annotation; Computer vision; Cognition
Publisher
Springer Verlag