Visual Coding for Humans and Machines

Visual content is increasingly being used for more than human viewing. For example, traffic video is automatically analyzed to count vehicles, detect traffic violations, estimate traffic intensity, and recognize license plates; images uploaded to social media are automatically analyzed to detect and recognize people, organize images into thematic collections, and so on; visual sensors on autonomous vehicles analyze captured signals to help the vehicle navigate, avoid obstacles and collisions, and optimize its movement. These applications require continuous machine-based analysis of visual signals, with only occasional human viewing, which necessitates rethinking the traditional approaches to image and video compression. This talk is about coding visual information in ways that enable efficient use by machine learning models, in addition to human viewing. It touches upon recent rate-distortion results in this field, describes several designs for human-machine image and video coding, and briefly reviews related standardization efforts.

This was a keynote talk at the 3rd Workshop on Image/Video/Audio Quality in Computer Vision and Generative AI at the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) in Waikoloa, HI, January 2024.

2024_WACVW_keynote.pdf
