Compressed Domain Video Zoom Motion Analysis and Saliency Estimation

Pavan, Sandula (2021) Compressed Domain Video Zoom Motion Analysis and Saliency Estimation. PhD thesis.

PDF (Restricted up to 09/09/2024; Repository staff only)



The work presented in this thesis is broadly in the domain of compressed domain video analysis. The thesis investigates the camera zoom motion analysis problem, the mixed camera motion classification problem, and an important video application, namely saliency estimation. The work is motivated by the fact that zoom motion analysis and mixed camera motion classification are relatively less established, since the major focus in the video processing community has been on the translational motions (pan and tilt) of the camera. Additionally, saliency estimation in compressed videos is an open problem that needs attention.

The contributions of the thesis begin with the zoom motion analysis problem, which comprises the camera zoom motion detection and camera zoom motion classification sub-problems. In zoom motion detection, zooming frames are separated from non-zooming frames, while zoom motion classification further separates the zooming frames into zoom-in and zoom-out camera types. Towards this goal, the compressed domain block motion vector orientation is modeled using traditional image texture descriptors. Two methods are proposed: in the first, local ternary patterns are explored for both the zoom motion detection and classification problems, while in the second, local tetra patterns are utilized for the zoom motion detection problem. Such modeling is novel in the sense that image texture descriptors, which previously found applications in face recognition and content-based image retrieval, are here explored for the video zoom analysis problem. Experimental results using block motion vectors extracted from ESME and H.264 compressed videos showed good performance for both methods, with a slight advantage for the local tetra patterns.
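As a rough illustration of the texture-descriptor modeling above, the sketch below computes a local ternary pattern code over an 8-neighborhood of block motion vector orientations. The neighborhood layout, threshold value, and function names are illustrative assumptions; the thesis's exact descriptor construction may differ.

```python
import math

def mv_orientation(dx, dy):
    """Orientation of a block motion vector in degrees, in [0, 360)."""
    return math.degrees(math.atan2(dy, dx)) % 360.0

def ltp_code(center, neighbors, t=10.0):
    """Local ternary pattern over an 8-neighborhood of orientation values.
    Each neighbor is coded +1 / 0 / -1 against the center with tolerance t,
    then split into the conventional 'upper' and 'lower' binary codes.
    NOTE: the threshold t=10 degrees is an assumption for illustration."""
    ternary = []
    for n in neighbors:
        if n >= center + t:
            ternary.append(1)
        elif n <= center - t:
            ternary.append(-1)
        else:
            ternary.append(0)
    upper = sum(1 << i for i, s in enumerate(ternary) if s == 1)
    lower = sum(1 << i for i, s in enumerate(ternary) if s == -1)
    return upper, lower
```

In a detection pipeline of this kind, the (upper, lower) codes would typically be histogrammed over all blocks of a frame and the histogram fed to a classifier.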
However, the texture descriptors under-performed when the input block motion vectors were noisy, calling for other localized methods capable of countering motion vector noise. The zoom motion analysis problem is therefore revisited by partitioning the inter-frame block motion vector field into four representative quadrants, which enables a more localized analysis. Two methods are proposed. The first utilizes histogram-based features: the histogram intersection between quadrant histograms for the zoom motion detection problem and the KL divergence between quadrant cumulative histograms for the zoom motion classification problem. The second explores the vector curl to theoretically model the block motion vector orientation values, followed by extracting features such as the curl magnitude for zoom motion detection and the curl direction for zoom motion classification. Experimental validation showed superior detection accuracy for both methods even in the presence of noise, with the curl method achieving the best results compared to the texture descriptors.

The focus in the latter part of the thesis shifts towards the mixed camera motion problem, which consists of recognizing complex motions, namely panning combined with tilting, which had not been explored earlier in the literature. Inferences drawn from the previous methods suggested that the feature analysis could be improved if a representation scheme were explored for the block motion vectors instead of directly utilizing the motion vector orientation values. This led to modeling both the orientation and the magnitude of the block motion vectors using the HSI color model. The premise was to pose the camera motion classification problem as a color recognition task, carried out by assigning the motion vector orientation to Hue and the motion vector magnitude to Saturation while keeping the Intensity unchanged. The HSI representation was then converted to RGB images.
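To illustrate the curl-based modeling described above, the sketch below computes the discrete curl (z-component) of a block motion vector field via central differences. How the thesis derives its curl magnitude and direction features from this field is not specified in the abstract, so this is a generic numerical sketch with illustrative names.

```python
def curl_field(u, v):
    """Discrete curl (z-component) of a 2D block motion vector field:
    curl = dv/dx - du/dy, via central differences on interior blocks.
    u[r][c] and v[r][c] hold the horizontal / vertical MV components
    of the block at row r, column c."""
    rows, cols = len(u), len(u[0])
    curl = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dv_dx = (v[r][c + 1] - v[r][c - 1]) / 2.0
            du_dy = (u[r + 1][c] - u[r - 1][c]) / 2.0
            curl[r][c] = dv_dx - du_dy
    return curl
```

As a sanity check, a purely rotational field u = -y, v = x has constant curl 2 at every interior block, while a pure translation (constant u, v) has curl 0.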
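The HSI mapping of motion vectors described above can be sketched as follows, using the standard sector-based HSI-to-RGB conversion. The magnitude normalization and the fixed Intensity value are assumptions for illustration; the abstract does not specify them.

```python
import math

def hsi_to_rgb(h, s, i):
    """Standard sector-based HSI -> RGB conversion.
    h in degrees [0, 360); s and i in [0, 1]."""
    h = h % 360.0
    def pair(hh):
        hh_r = math.radians(hh)
        lo = i * (1.0 - s)
        hi = i * (1.0 + s * math.cos(hh_r) / math.cos(math.radians(60.0) - hh_r))
        return lo, hi
    if h < 120.0:
        b, r = pair(h)
        g = 3.0 * i - (r + b)
    elif h < 240.0:
        r, g = pair(h - 120.0)
        b = 3.0 * i - (r + g)
    else:
        g, b = pair(h - 240.0)
        r = 3.0 * i - (g + b)
    return r, g, b

def mv_to_rgb(dx, dy, max_mag, intensity=1.0 / 3.0):
    """Map a block motion vector to an RGB pixel as the abstract describes:
    orientation -> Hue, magnitude -> Saturation, Intensity held fixed.
    The normalization by max_mag and the default intensity are assumptions."""
    hue = math.degrees(math.atan2(dy, dx)) % 360.0
    sat = min(math.hypot(dx, dy) / max_mag, 1.0) if max_mag > 0 else 0.0
    return hsi_to_rgb(hue, sat, intensity)
```

Applying `mv_to_rgb` blockwise over a frame's motion vector field yields the RGB image that would then be fed to the CNN classifier.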
These images were used to train a convolutional neural network to classify eleven camera patterns comprising seven pure patterns and four mixed camera patterns. Experimental validation, along with an ablation study, demonstrated good recognition accuracy for the eleven camera patterns even in the presence of noise.

The last part of the thesis looks into the compressed domain video saliency problem. Since texture descriptors were successfully utilized earlier to model the motion vector orientation for the zoom motion analysis problem, an attempt was made to explore whether such modeling could also aid saliency determination. This premise led to exploring two texture descriptors, dual cross patterns and local derivative patterns, for saliency estimation. Two methods are investigated: the first utilizes dual cross patterns and the second utilizes local derivative patterns for temporal saliency determination. In both methods, the spatial saliency was estimated by modeling the transform residuals using the lifting wavelet transform, and the fusion of the spatial and temporal saliency maps was carried out using the Dempster-Shafer combination rule. Extensive experimental testing using an eye-tracking dataset was carried out to benchmark the two proposed methods against state-of-the-art methods.
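The Dempster-Shafer fusion step above can be sketched with the standard Dempster rule of combination over a two-element frame of discernment {salient, non-salient}. How the thesis assigns mass functions from the spatial and temporal saliency values is not stated in the abstract, so the mass dictionaries here are illustrative.

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination over the frame {'s', 'n'}
    (salient / non-salient), with 'sn' denoting the full set Theta.
    m1, m2: dicts mapping 's', 'n', 'sn' to masses summing to 1."""
    sets = {'s': {'s'}, 'n': {'n'}, 'sn': {'s', 'n'}}
    combined = {'s': 0.0, 'n': 0.0, 'sn': 0.0}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = sets[a] & sets[b]
            if not inter:
                conflict += ma * mb          # mass lost to conflict
            elif inter == {'s'}:
                combined['s'] += ma * mb
            elif inter == {'n'}:
                combined['n'] += ma * mb
            else:
                combined['sn'] += ma * mb
    norm = 1.0 - conflict                    # Dempster normalization
    return {k: v / norm for k, v in combined.items()}
```

Per pixel, the spatial and temporal saliency values would each be turned into such a mass function (e.g., reserving some mass for the uncertain set 'sn'), combined, and the fused mass on 's' taken as the final saliency.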

Item Type: Thesis (PhD)
Uncontrolled Keywords: Compressed Domain; Zoom Motion Analysis; Block Motion Vector; Saliency Estimation; Video Processing
Subjects: Engineering and Technology > Electronics and Communication Engineering > Image Processing
Engineering and Technology > Electronics and Communication Engineering
Divisions: Engineering and Technology > Department of Electronics and Communication Engineering
ID Code: 10290
Deposited By: IR Staff BPCL
Deposited On: 09 Sep 2022 21:43
Last Modified: 12 Sep 2022 10:49
Supervisor(s): Okade, Manish
