Abstract

The volume of video media on the internet and from other sources is increasing rapidly. Existing analysis, storage, retrieval, indexing and searching techniques cannot cope with this volume, and there is no efficient way of extracting information from it. To address this, we propose Generating Action Tags and Videos (ViTAG), a system that generates scene descriptions of simple car and bike videos using object and action recognition. We propose a three-step process for generating these scene descriptions.

Key Frame Extraction

Video data is highly redundant, which makes it a poor candidate for conventional retrieval, indexing and storage techniques. Video summarization is a method to reduce this redundancy. In this paper we present a technique for video summary generation using key frame extraction, called Key Frame Extractor (KFE). KFE uses 2D auto-correlation, color histogram comparison and moment invariants for key frame extraction. An adaptive formula makes KFE partially tolerant to changes in lighting conditions. KFE allows a trade-off among computation, memory complexity and accuracy of key frame extraction. The video summarization results compare favorably with the TRECVID 2007 benchmarks and CMU's submission to TRECVID 2007 [TE-ICPR10].
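The histogram-comparison part of this idea can be illustrated with a minimal sketch. It is not the KFE implementation: it ignores 2D auto-correlation and moment invariants, works on grayscale intensities rather than color, and the adaptive threshold (mean plus `k` standard deviations of the consecutive-frame distances) is an assumption chosen for illustration, since the adaptive formula itself is not given here. All function names are hypothetical.

```python
from statistics import mean, pstdev


def histogram(frame, bins=8, max_value=256):
    """Normalized intensity histogram of a frame given as a flat list of pixel values."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return [c / len(frame) for c in counts]


def hist_distance(h1, h2):
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def select_key_frames(frames, k=1.0):
    """Return indices of frames whose histogram differs strongly from the last key frame.

    The threshold adapts to the video (an assumed rule, not the KFE formula):
    mean + k * standard deviation of the distances between consecutive frames.
    """
    hists = [histogram(f) for f in frames]
    if len(hists) < 2:
        return list(range(len(hists)))
    consecutive = [hist_distance(a, b) for a, b in zip(hists, hists[1:])]
    threshold = mean(consecutive) + k * pstdev(consecutive)
    keys = [0]  # the first frame always starts the summary
    for i in range(1, len(hists)):
        if hist_distance(hists[keys[-1]], hists[i]) > threshold:
            keys.append(i)
    return keys


# Toy video: three dark frames followed by three bright frames (16 pixels each).
dark, dark2 = [10] * 16, [12] * 16
bright, bright2 = [200] * 16, [205] * 16
frames = [dark, dark2, dark, bright, bright2, bright]
print(select_key_frames(frames))
```

Because the threshold is derived from the video's own frame-to-frame statistics, small intensity fluctuations within a shot stay below it, while an abrupt change of content starts a new key frame.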

Object Recognition

Object recognition is one of the most important problems in computer vision, with wide-ranging applications such as content-based search, automated surveillance and action recognition. In this paper we present a framework for object recognition and pose estimation using SURF features. In this framework we make four novel contributions. Our feature-reduction process yields a matching speed-up of 634.8% by using only the most repeatable features for matching. The noise-reduction process further increases matching speed while reducing the false positive rate by 50%. A modified definition of the second neighbor in the nearest-neighbor ratio matching strategy allows matching with increased reliability. We also introduce a hierarchical approach for feature database storage that provides an easy way to estimate the pose of objects [TE-IPCV10].
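For readers unfamiliar with the nearest-neighbor ratio strategy, the following is a minimal sketch of the standard ratio test on plain descriptor vectors. It does not implement the modified second-neighbor definition from [TE-IPCV10], which is not specified here. The function names, the toy two-dimensional descriptors and the ratio value of 0.8 are assumptions for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_match(query, database, ratio=0.8):
    """Standard nearest-neighbor ratio test.

    database is a list of (descriptor, label) pairs. The best match is accepted
    only if it is clearly closer than the second-best one; otherwise the match
    is considered ambiguous and None is returned.
    """
    ranked = sorted((euclidean(query, desc), label) for desc, label in database)
    if len(ranked) < 2:
        return None  # the ratio is undefined without a second neighbor
    (best_dist, best_label), (second_dist, _) = ranked[0], ranked[1]
    return best_label if best_dist < ratio * second_dist else None


# Toy database with one descriptor per object.
database = [((0.0, 0.0), "car"), ((10.0, 10.0), "bike")]
print(ratio_match((1.0, 1.0), database))  # close to "car", far from "bike"
print(ratio_match((5.0, 5.0), database))  # equidistant, so rejected as ambiguous
```

Rejecting ambiguous matches in this way trades a few missed correspondences for fewer false positives.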

Scene and Action Recognition in Videos

In progress

Framework

Figure 1: Learning Framework for ViTAG

Results

This section presents the results of different components of ViTAG.

Key Frame Extraction

Criterion                                  CMU Base 1   CMU Base 2   CMU's Submission   KFE
Inclusion of Significant Events (IN)       0.59         0.58         0.6                0.85
Lack of Redundancy (RE)                    3.52         3.50         3.62               3.83
Target (4%) - Summary Time (XD)            -0.15%       -0.06%       0.12%              0.37%
Fraction time taken to evaluate (TT)       1.7          1.65         1.75               2.71
Duration of Summary (DU)                   4.15%        4.06%        3.88%              3.63%

Table 1: Comparison of KFE's results with TRECVID 2007 benchmarks (CMU Base 1 and CMU Base 2) and CMU's Submission to TRECVID 2007 using the criteria presented in "The TRECVID 2007 BBC Rushes Summarization Evaluation Pilot"
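The XD row in Table 1 follows directly from the DU row: it is the 4% target minus the summary duration. A short check of that arithmetic, using only the values from the table (the function name `xd` is ours):

```python
TARGET = 4.0  # target summary length, in percent of the original video

# Duration of Summary (DU) and reported XD, in percent, copied from Table 1.
duration = {"CMU Base 1": 4.15, "CMU Base 2": 4.06, "CMU's Submission": 3.88, "KFE": 3.63}
reported_xd = {"CMU Base 1": -0.15, "CMU Base 2": -0.06, "CMU's Submission": 0.12, "KFE": 0.37}


def xd(target, du):
    """Difference between the target and the actual summary duration, to 2 decimals."""
    return round(target - du, 2)


for name, du in duration.items():
    print(f"{name}: XD = {xd(TARGET, du):+.2f}% (reported {reported_xd[name]:+.2f}%)")
```

A positive XD means the summary is shorter than the 4% target, so KFE produces the shortest summary of the four systems.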

Figure 2: Key Frames extracted from Office Tour Video
See Original Video here

Key frames extracted from the office tour video are shown above. There are two main reasons for testing on this video: (1) it is easy to assess the quality of the key frame extraction results, and (2) the lighting conditions vary greatly in the video, which tests the system to its limits. The system handles mild changes in lighting conditions (such as the bottom left frame). However, extreme changes in lighting conditions (such as those in the third row) exceed its tolerance and mark the upper limit of its insensitivity to lighting changes.

Object Recognition

Table 2: Object Recognition Results on UK Benchmark Standard Dataset

References

[TE-ICPR10] T. Tariq, N. Ejaz. Video Summarization using Key Frame Extraction. Submitted to ICPR 2010.
[TE-IPCV10] T. Tariq, N. Ejaz. Speeded-Up Object Recognition and Pose-Estimation using SURF. Submitted to IPCV 2010.

About Us

1. Ahsun Taquveem Chohan
2. Ehtasham-ul-Haq
3. Junaid Shafiq
4. Tayyab Bin Tariq
