Please use this identifier to cite or link to this item:
Title: Intelligent image cropping and scaling
Authors: Deigmoeller, Joerg
Advisors: Itagaki, T
Keywords: Computer vision;Regions of interest;Visual attention;Camera motion estimation;Production metadata
Issue Date: 2011
Publisher: Brunel University School of Engineering and Design PhD Theses
Abstract: Nowadays, a huge number of end devices with different screen properties exist for watching television content, which is either broadcast or transmitted over the internet. To allow the best viewing conditions on each of these devices, different image formats have to be provided by the broadcaster. Producing content for every single format is, however, not feasible for the broadcaster, as it is far too laborious and costly. The most obvious solution for providing multiple image formats is to produce one high-resolution format and derive formats of lower resolution from it. One possibility is to simply scale video images to the resolution of the target image format. Two significant drawbacks of this are the loss of image detail through downscaling and possibly unused image areas due to letterboxes or pillarboxes. A preferable solution is to first find the contextually most important region in the high-resolution format and then crop this area with the aspect ratio of the target image format. However, defining the contextually most important region manually is very time consuming, and applying this to live productions would be nearly impossible. Therefore, some approaches exist that define cropping areas automatically. To do so, they extract visual features, such as moving areas in a video, and define regions of interest (ROIs) based on those. The ROIs are finally used to define an enclosing cropping area. The extraction of features is done without any knowledge about the type of content. Hence, these approaches are not able to distinguish between features that might be important in a given context and those that are not. The work presented in this thesis tackles the problem of extracting visual features based on prior knowledge about the content. Such knowledge is fed into the system in the form of metadata that is available from TV production environments.
Based on the extracted features, ROIs are then defined and filtered depending on the analysed content. As a proof of concept, this application finally adapts SDTV (Standard Definition Television) sports productions automatically to image formats with lower resolution through intelligent cropping and scaling. If no content information is available, the system can still be applied to any type of content through a default mode. The presented approach is based on the principle of a plug-in system. Each plug-in represents a method for analysing video content information, either on a low level by extracting image features or on a higher level by processing extracted ROIs. The combination of plug-ins is determined by the incoming descriptive production metadata and hence can be adapted to each type of sport individually. The application has been comprehensively evaluated by comparing the results of the system against alternative cropping methods. This evaluation utilised videos that were manually cropped by a professional video editor, statically cropped videos, and simply scaled, non-cropped videos. In addition to purely subjective evaluations, the gaze positions of subjects watching sports videos have been measured and compared to the ROI positions extracted by the system.
Description: This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University, 2011.
Appears in Collections: Electronic and Computer Engineering; Dept of Electronic and Computer Engineering Theses

Files in This Item:
File: FulltextThesis.pdf
Size: 12.27 MB
Format: Adobe PDF

Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.