Visual Coding for Humans and Machines
Visual content is increasingly used for more than human viewing. For example, traffic video is automatically analyzed to count vehicles, detect traffic violations, estimate traffic intensity, and recognize license plates; images uploaded to social media are automatically analyzed to detect and recognize people, organize images into thematic collections, and so on; visual sensors on autonomous vehicles analyze captured signals to help the vehicles navigate, avoid obstacles and collisions, and optimize their movement. These applications require continuous machine-based analysis of visual signals, with only occasional human viewing, which necessitates rethinking the traditional approaches to image and video compression. This talk is about coding visual information in ways that enable efficient use by machine learning models, in addition to human viewing. We will touch upon recent rate-distortion results in this field, describe several designs for human-machine image and video coding, and briefly review related standardization efforts.