When Gigapixel Videography Meets Computer Vision

Background and Relevance

The proposed GigaVision workshop is associated with PANDA, the world's first gigaPixel-level humAN-centric viDeo dAtaset, built for large-scale, long-term, multi-object visual analysis. The videos in PANDA were captured by a gigapixel camera and cover real-world large-scale scenes with both a wide field of view (~1 km^2 area) and high-resolution details (~gigapixel-level per frame). A single scene may contain up to ~4k people with over 100× scale variation, as shown in Fig. 2. PANDA provides rich, hierarchical ground-truth annotations, including 15,974.6k bounding boxes, 111.8k fine-grained attribute labels, 12.7k trajectories, 2.2k groups, and 2.9k interactions. Owing to the large variation in pedestrian pose, scale, occlusion, and trajectory, existing computer vision methods are strongly challenged in terms of both accuracy and efficiency. The GigaVision workshop therefore aims to draw the community's attention to the visual analysis of complicated crowd behaviors and interactions in large-scale real-world scenes. The workshop will consist of a half day of keynote presentations and a half day of challenge sessions.
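To make the hierarchical nature of these annotations concrete, the sketch below models one plausible way such records could nest (per-pedestrian fine-grained boxes and attributes, grouped with intra-group interaction labels). The field names and schema are purely illustrative assumptions and do not reflect PANDA's actual file format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical sketch of a hierarchical annotation record; all field
# names below are assumptions, not PANDA's actual schema.

@dataclass
class BoundingBox:
    kind: str                                 # e.g. "head", "visible body", "full body"
    rect: Tuple[float, float, float, float]   # (x, y, w, h) in pixels

@dataclass
class Pedestrian:
    pid: int
    boxes: List[BoundingBox]                  # fine-grained boxes for this person
    attributes: List[str] = field(default_factory=list)  # e.g. pose labels

@dataclass
class Group:
    gid: int
    members: List[int]                        # pedestrian ids in this group
    # (pid_a, pid_b, label), e.g. "TK" for talking as in Fig. 2(d)
    interactions: List[Tuple[int, int, str]] = field(default_factory=list)

def group_size(g: Group) -> int:
    """Number of annotated pedestrians in a group."""
    return len(g.members)

# Tiny usage example with made-up values
p1 = Pedestrian(pid=1, boxes=[BoundingBox("head", (10, 20, 5, 5))], attributes=["walking"])
p2 = Pedestrian(pid=2, boxes=[BoundingBox("full body", (30, 15, 8, 20))])
g = Group(gid=7, members=[1, 2], interactions=[(1, 2, "TK")])
print(group_size(g))  # 2
```

Nesting group-level labels above per-pedestrian records mirrors the dataset's hierarchy: trajectory- and attribute-level analysis can ignore groups, while interaction analysis can traverse group membership down to individual boxes.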

Figure 1: A representative video, Marathon, from the PANDA dataset.

Figure 2: Visualization of key features of the PANDA dataset. (a) The scale variation of pedestrians in a large-scale scene. (b) Three fine-grained bounding-box annotations of the human body. (c) Annotations of five types of human poses. (d) Group information along with intra-group interactions (TK = talking, PC = physical contact), where the circle and short line denote a pedestrian and their face orientation, respectively.