
GigaVision


When Gigapixel Videography Meets Computer Vision

Background and Relevance

Figure 1: Illustration of representative imaging systems. (a) A single-camera imaging system faces the contradiction between wide FOV and high resolution; (b) single-scale camera-array imaging [1, 2] relies on image stitching [3]; (c) the structured multi-scale camera array (AWARE2 [4]) adopts a two-stage optical imaging design; (d) the unstructured multi-scale camera array [5] (denoted UnstructuredCam).

With the development of deep learning theory and technology, the performance of computer vision algorithms, including object detection and tracking, face recognition, and 3D reconstruction, has improved tremendously. Deep-learning-based computer vision algorithms have surpassed human-level performance on many tasks, such as object recognition [6] and face verification [7]. However, computer vision relies on valid information from the input image or video, and algorithm performance is essentially constrained by the quality of the source imagery. For example, it has been widely observed in object detection systems that the resolution of input images has a significant impact on detection accuracy, especially for faraway objects [8]. Achieving satisfactory performance in real-world applications therefore demands high-quality visual information: images/videos with high resolution and high dynamic range across the spatial, temporal, angular, and spectral dimensions.

Recent gigapixel videography, which goes beyond the resolution of a single camera and of human visual perception, aims to capture large-scale dynamic scenes at extremely high resolution. Restricted by the spatial-temporal bandwidth product of the optical system, size, weight, power, and cost are the central challenges in gigapixel video. More explicitly, as shown in Fig. 1(a), the most popular single-lens camera consists of a one-stage optical imaging system and suffers from the inherent contradiction between high resolution and a wide field-of-view (FOV). The single-scale multi-camera/camera-array systems in Fig. 1(b) resolve this contradiction through a panoramic stitching pipeline, e.g., Microsoft ICE [9], Autopano Giga [10], GigaPan [11], and the Point Grey Ladybug 360 camera. Such stitching-based schemes always require a certain overlapping region between nearby images/cameras, leading to redundant usage of the CCD/CMOS sensors in the camera array.

The recent multiscale optical design [4, 12] adopts a spherical objective lens as the first-stage optical imaging system, while the secondary imaging system uses multiple identical micro-optics to divide the whole FOV into small, circular, overlapping regions, as shown in Fig. 1(c). Although this substantially reduces the size and weight of gigapixel-scale optics, the volume and weight of the camera electronics in video operation are still more than 10× larger than those of the optics [4, 13]. More importantly, such systems usually adopt a delicately structured camera-array design, which faces the challenges of complicated optical, electronic, and mechanical design, laborious calibration, massive data processing, etc.

Aiming for scalable, efficient, and economical gigapixel videography, Yuan et al. present a novel gigapixel videography system with an unstructured multi-scale camera-array design, denoted ‘UnstructuredCam’ in Fig. 1(d) [5]. Here ‘unstructured’ indicates that the overall structure of the camera array does not follow a fixed or particular design, and thus requires neither precise assembly nor careful calibration in advance. ‘Multi-scale’ means not only that the parameters of the global-view camera differ from those of the local-view cameras, but also that the parameters of the local-view cameras can differ from one another. For example, in UnstructuredCam, the reference/global camera (with a wide-angle lens to capture the global scene) works together with local cameras (with telephoto lenses to capture local details). This setting enables gigapixel videography by warping each local-view video to the reference video independently and in parallel, without troublesome calibration among local-view cameras, which further allows flexible, compressible, adaptive, and movable local-view camera configurations during data capture.

Figure 2: (a) The prototype of UnstructuredCam; (b) the corresponding multi-scale videos captured by UnstructuredCam.
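To make the per-camera warping step concrete, the sketch below registers one local telephoto frame into the global reference frame via feature matching and a RANSAC-estimated homography. This is only a minimal illustration using standard OpenCV calls, not the actual cross-resolution matching pipeline of [5]; the function name and thresholds are our own.

```python
import cv2
import numpy as np

def warp_local_to_global(local_bgr, global_bgr, ratio=0.75, ransac_thresh=5.0):
    """Warp a local (telephoto) frame into the global (wide-angle) reference.

    Minimal sketch: SIFT matches + RANSAC homography. A real gigapixel
    pipeline must additionally cope with the large scale gap between the
    two views, parallax, and seam blending.
    """
    gray_l = cv2.cvtColor(local_bgr, cv2.COLOR_BGR2GRAY)
    gray_g = cv2.cvtColor(global_bgr, cv2.COLOR_BGR2GRAY)

    sift = cv2.SIFT_create()
    kp_l, des_l = sift.detectAndCompute(gray_l, None)
    kp_g, des_g = sift.detectAndCompute(gray_g, None)

    # Lowe's ratio test keeps only distinctive matches.
    good = []
    for pair in cv2.BFMatcher(cv2.NORM_L2).knnMatch(des_l, des_g, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    if len(good) < 4:
        raise RuntimeError("not enough matches to estimate a homography")

    src = np.float32([kp_l[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_g[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, ransac_thresh)
    if H is None:
        raise RuntimeError("homography estimation failed")

    h, w = global_bgr.shape[:2]
    return cv2.warpPerspective(local_bgr, H, (w, h))
```

Because each local view is registered to the shared reference independently, one such warp can run per camera in parallel, which is precisely what removes the need for pairwise calibration among local-view cameras.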

In addition to existing gigapixel camera arrays capturing outdoor large-scale dynamic scenes, large-scale imaging of biological dynamics at high spatiotemporal resolution is indispensable in the study of systems biology. However, with conventional microscopes one has to compromise between a large field-of-view (FOV) and high spatial resolution, owing to the inherently limited space-bandwidth product (SBP). In addition, no imaging system yet offers sufficient data throughput to record such huge information. Dai et al. break these bottlenecks with a flat-curved-flat strategy, in which the sample plane is magnified onto a large spherical image surface and then seamlessly conjugated to multiple planar sensors through a relay-lens array. Accordingly, they developed a customized objective with globally uniform 0.92 μm resolution across a 10 mm × 12 mm FOV, together with an accompanying camera array for high-throughput recording at 5.1 gigapixels per second. They demonstrate the first reported video-rate gigapixel imaging of biological dynamics at centimeter scale and micron resolution, including brain-wide structural and functional imaging in awake, behaving mice. Given such gigapixel images/videos, the corresponding data-processing tasks in the microscopy domain, such as image segmentation, tumor detection, and cell tracking (illustrated in Fig. 4), remain tough problems, as simply applying existing computer vision algorithms cannot handle such high-resolution, large-scale, high-throughput imaging results.
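As a back-of-envelope consistency check on these numbers (assuming, on our part, Nyquist sampling at a pixel pitch of half the stated 0.92 μm resolution, i.e., 0.46 μm):

\[
N \approx \frac{10\,\mathrm{mm}}{0.46\,\mu\mathrm{m}} \times \frac{12\,\mathrm{mm}}{0.46\,\mu\mathrm{m}} \approx 5.7 \times 10^{8}\ \text{pixels per frame},
\]

so the stated throughput of 5.1 gigapixels per second corresponds to roughly \(5.1 \times 10^{9} / 5.7 \times 10^{8} \approx 9\) frames per second, i.e., video rate.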

Along with the emergence of novel camera-array designs for extremely high-resolution gigapixel video capture, the corresponding processing, such as compression, transmission, and understanding, is urgently demanded. In particular, understanding gigapixel video via classical computer vision tasks such as detection, recognition, tracking, and segmentation remains an open question, despite the extensive progress made in the computer vision community over the past few years. More specifically, the opportunities and challenges raised when computer vision meets gigapixel videography are summarized as follows.

  • Huge data throughput: A gigapixel camera system usually captures billions of pixels every second; such a huge mass of data brings great challenges in compression, transmission, and processing. In particular, unlike traditional videos, gigapixel videography may exhibit spatially varying resolution, quality, and importance, so smarter video coding and streaming schemes need to be designed for it.
  • High resolution: The extremely high resolution of gigapixel videography poses many problems for existing computer vision applications. For example, images/videos with gigapixel-level resolution can hardly be fed into existing neural networks directly. Simple down-sampling leads to severe loss of resolution and information, which significantly degrades tasks such as face detection/recognition and semantic segmentation, while simply dividing the gigapixel image into blocks does not keep the computational cost bounded (see the tiled-inference sketch after this list).
  • Large scale: Benefiting from both the wide FOV and the high resolution of gigapixel videography, large-scale dynamic scenes can be captured well, containing enough objects and activities to yield more potentially useful information for video surveillance. However, more objects mean more occlusions and more complex scenes, which brings great challenges to computer vision algorithms such as multi-target tracking and anomaly detection.
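To illustrate the block-wise processing mentioned in the high-resolution item above, the sketch below runs an arbitrary detector tile by tile over a gigapixel frame and maps the detections back to global coordinates. It is a minimal illustration under our own assumptions; `detect_fn` is a hypothetical stand-in for any detector, and cross-tile non-maximum suppression is left out for brevity.

```python
import numpy as np

def detect_tiled(image, detect_fn, tile=2048, overlap=256):
    """Run a detector over a huge frame in overlapping tiles.

    image     : H x W x 3 uint8 array, far too large for one network pass.
    detect_fn : hypothetical callable mapping a tile to an (N, 5) float array
                of [x1, y1, x2, y2, score] boxes in tile coordinates.
    Returns all boxes in global image coordinates.
    """
    h, w = image.shape[:2]
    step = tile - overlap  # overlap so objects on tile borders are still seen
    boxes = []
    for y in range(0, max(h - overlap, 1), step):
        for x in range(0, max(w - overlap, 1), step):
            det = detect_fn(image[y:y + tile, x:x + tile])
            if len(det):
                det = np.asarray(det, dtype=np.float32)
                det[:, [0, 2]] += x  # shift tile x-coords to global
                det[:, [1, 3]] += y  # shift tile y-coords to global
                boxes.append(det)
    if not boxes:
        return np.zeros((0, 5), dtype=np.float32)
    # Overlapping tiles produce duplicates near borders; a cross-tile
    # non-maximum suppression pass is still required downstream.
    return np.concatenate(boxes, axis=0)
```

Even this simple scheme makes the cost problem explicit: with these illustrative settings, a 32768 × 32768 (≈ 1.07-gigapixel) frame decomposes into 19 × 19 = 361 detector passes, so block-wise processing trades the memory problem for a compute problem rather than solving either.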

Figure 3: A representative dynamic scene on the Tsinghua University campus captured by UnstructuredCam.

References

1. B. Wilburn, N. Joshi, V. Vaish, E.-V. Talvala, E. Antunez, A. Barth, A. Adams, M. Horowitz, and M. Levoy, “High performance imaging using large camera arrays,” ACM TOG, vol.24, no.3, pp. 765–776, 2005.
2. F. Perazzi, A. Sorkine-Hornung, H. Zimmer, P. Kaufmann, O. Wang, S. Watson, and M. Gross, “Panoramic video from unstructured camera arrays,” in CGF, vol. 34, no. 2, 2015, pp. 57–68.
3. M. Brown and D. G. Lowe, “Automatic panoramic image stitching using invariant features,” International journal of computer vision, vol. 74, no. 1, pp. 59–73, 2007.
4. D. Brady, M. Gehm, R. Stack, D. Marks, D. Kittle, D. Golish, E. Vera, and S. Feller, “Multiscale gigapixel photography,” Nature, vol. 486, no. 7403, pp. 386–389, 2012.
5. X. Yuan, L. Fang, Q. Dai, D. J. Brady, and Y. Liu, “Multiscale gigapixel video: A cross resolution image matching and warping approach,” in Computational Photography (ICCP), 2017 IEEE International Conference on. IEEE, 2017, pp. 1–9.
6. K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in 2015 IEEE International Conference on Computer Vision, 2015, pp. 1026–1034.
7. F. Schroff, D. Kalenichenko, and J. Philbin, “Facenet: A unified embedding for face recognition and clustering,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 815–823.
8. T.-Y. Lin, P. Dollár, R. B. Girshick, K. He, B. Hariharan, and S. J. Belongie, “Feature pyramid networks for object detection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117–2125.
9. Microsoft Research, “Image Composite Editor: An advanced panoramic image stitcher.”
10. Kolor, “Autopano giga.”
11. “Gigapan,” http://www.gigapan.com/.
12. O. S. Cossairt, D. Miau, and S. K. Nayar, “Gigapixel computational imaging,” in IEEE ICCP, 2011, pp. 1–8.
13. J. Nichols, K. Judd, C. Olson, K. Novak, J. Waterman, S. Feller, S. McCain, J. Anderson, and D. Brady, “Range performance of the DARPA AWARE wide field-of-view visible imager,” Applied Optics, vol. 55, no. 16, pp. 4478–4484, 2016.
