Semantic Segmentation in Compressed Videos

Existing approaches for semantic segmentation in
videos usually extract each frame as an RGB image, then apply
standard image-based semantic segmentation models on each
frame. Decoding and segmenting every frame this way is time-consuming.
In this paper, we tackle this problem by exploiting the structure of
compressed video.
A compressed video contains three types of frames: I-frames,
P-frames, and B-frames. I-frames are represented as regular
images, P-frames are represented as motion vectors and residual
errors, and B-frames are bidirectionally predicted frames that can be
regarded as a special case of P-frames. We propose a method
that directly operates on I-frames (as RGB images) and P-frames
(motion vectors and residual errors) in a video. Our
proposed model uses a ConvLSTM model to capture the temporal
information in the video required for producing the semantic
segmentation on P-frames. Our experimental results show that
our method runs much faster than the alternatives while
achieving comparable segmentation accuracy.
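
To make the P-frame representation concrete, the following is a minimal sketch of how motion vectors and residual errors relate a P-frame to its reference frame. It is not the proposed ConvLSTM model. It assumes integer, per-pixel motion vectors (real codecs typically use block-based, sub-pixel vectors), and the function names `warp_with_motion_vectors` and `reconstruct_p_frame` are hypothetical, chosen for illustration only.

```python
import numpy as np

def warp_with_motion_vectors(ref, mv):
    """Motion-compensate `ref` using integer per-pixel motion vectors.

    ref: array of shape (H, W) or (H, W, C), e.g. an RGB image or a label map.
    mv:  integer array of shape (H, W, 2); mv[y, x] = (dy, dx) means that
         pixel (y, x) is predicted from pixel (y + dy, x + dx) in `ref`.
    Source coordinates outside the frame are clamped to the border.
    """
    h, w = ref.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(ys + mv[..., 0], 0, h - 1)
    src_x = np.clip(xs + mv[..., 1], 0, w - 1)
    return ref[src_y, src_x]

def reconstruct_p_frame(ref, mv, residual):
    """Assumed P-frame model: motion-compensated prediction plus residual."""
    return warp_with_motion_vectors(ref, mv) + residual

if __name__ == "__main__":
    ref = np.arange(16).reshape(4, 4)       # toy single-channel reference frame
    mv = np.zeros((4, 4, 2), dtype=int)
    mv[..., 1] = 1                          # every pixel copies its right neighbour
    residual = np.zeros_like(ref)
    print(reconstruct_p_frame(ref, mv, residual))
```

The same warping function can be applied to a segmentation label map from an I-frame to obtain a rough motion-compensated estimate for a following P-frame. A learned temporal model, such as the ConvLSTM described above, would be expected to refine such an estimate using the residual information.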

