Spatial Keyframe Extraction of Mobile Videos for Efficient Object Detection at the Edge

Advances in federated learning and edge computing advocate running deep learning models on edge devices for video analysis. However, the frame rate of captured video is too high for a typical model, such as a CNN, to process every frame at the edge in real time. Any approach that simply feeds consecutive frames to the model compromises both the quality of the analysis (by missing important frames) and its efficiency (by redundantly processing similar frames). Focusing on outdoor urban videos, we use the spatial metadata of frames to select an optimal subset of frames that maximizes the area covered by the footage. We formulate spatial keyframe extraction as an optimization problem whose objective is to maximize coverage subject to a limit on the number of selected frames. We prove that this problem is NP-hard and devise several heuristics to solve it efficiently. Our approach is shown to yield a much higher hit ratio than conventional approaches.
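The abstract does not spell out the heuristics, so the following is only a minimal sketch of the problem setting under stated assumptions. It assumes each frame's spatial metadata is a pie-shaped field of view (camera position, heading, viewable angle, visible distance) and that coverage is measured by rasterizing each field of view onto a grid of cells. It then applies the textbook greedy rule for maximum coverage under a cardinality constraint, which is known to achieve a (1 - 1/e) approximation. The names `FrameFOV`, `covered_cells` and `greedy_keyframes` are hypothetical, and the greedy rule is not necessarily one of the paper's heuristics.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameFOV:
    """Hypothetical spatial metadata of one frame: camera position (metres,
    local planar coordinates), heading (degrees clockwise from north),
    viewable angle (degrees) and visible distance (metres)."""
    x: float
    y: float
    heading: float
    angle: float
    radius: float


def covered_cells(fov: FrameFOV, cell: float = 1.0) -> set:
    """Approximate the frame's pie-shaped field of view by the set of grid
    cells whose centres fall inside the circular sector."""
    cells = set()
    r = fov.radius
    i_min, i_max = math.floor((fov.x - r) / cell), math.floor((fov.x + r) / cell)
    j_min, j_max = math.floor((fov.y - r) / cell), math.floor((fov.y + r) / cell)
    for i in range(i_min, i_max + 1):
        for j in range(j_min, j_max + 1):
            cx, cy = (i + 0.5) * cell, (j + 0.5) * cell
            dx, dy = cx - fov.x, cy - fov.y
            if dx * dx + dy * dy > r * r:
                continue  # cell centre is beyond the visible distance
            # Bearing of the cell centre, measured clockwise from north (+y).
            bearing = math.degrees(math.atan2(dx, dy)) % 360.0
            # Smallest angular difference between bearing and heading.
            diff = abs((bearing - fov.heading + 180.0) % 360.0 - 180.0)
            if diff <= fov.angle / 2.0:
                cells.add((i, j))
    return cells


def greedy_keyframes(fovs, k, cell=1.0):
    """Greedy max-coverage heuristic: repeatedly pick the frame that adds the
    most not-yet-covered cells, until k frames are chosen or no remaining
    frame adds new area. Returns (selected indices, covered area)."""
    cover = [covered_cells(f, cell) for f in fovs]
    selected, union = [], set()
    for _ in range(min(k, len(fovs))):
        best, gain = None, 0
        for idx, cells in enumerate(cover):
            if idx in selected:
                continue
            g = len(cells - union)  # marginal coverage gain of this frame
            if g > gain:
                best, gain = idx, g
        if best is None:
            break  # every remaining frame is fully redundant
        selected.append(best)
        union |= cover[best]
    return selected, len(union) * cell * cell
```

For example, given two near-duplicate frames facing north (headings 0 and 5 degrees) and one facing south, a budget of k = 2 keeps the south-facing frame and only one of the near-duplicates, because the second near-duplicate adds almost no new area. A grid is used here for simplicity; exact polygon unions would avoid discretization error at a higher computational cost.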

ICIP2020.pdf
