Semantic Segmentation in Compressed Videos
- Submitted by:
- Yiwei Lu
- Last updated:
- 23 September 2019 - 9:26pm
- Document Type:
- Poster
- Document Year:
- 2019
- Presenters:
- Yiwei Lu
Existing approaches to semantic segmentation in videos usually decode each frame into an RGB image and then apply a standard image-based semantic segmentation model to every frame, which is time-consuming. In this paper, we tackle this problem by exploiting the nature of video compression. A compressed video contains three types of frames: I-frames, P-frames, and B-frames. I-frames are stored as regular images, P-frames are stored as motion vectors and residual errors, and B-frames are bidirectionally predicted frames that can be regarded as a special case of P-frames. We propose a method that operates directly on the I-frames (as RGB images) and P-frames (as motion vectors and residual errors) of a video. Our model uses a ConvLSTM to capture the temporal information needed to produce semantic segmentations on P-frames. Our experimental results show that our method runs much faster than the alternatives while achieving comparable accuracy.
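
The sketch below illustrates this idea in PyTorch. It is only a minimal illustration under assumptions of our own: the `CompressedVideoSegmenter` module, the layer sizes, the warping scheme, and the ConvLSTM cell are hypothetical stand-ins, not the authors' exact architecture. The pattern it shows is the one described above: a heavy encoder runs only on I-frames, while P-frames reuse the recurrent state by warping it with the motion vectors and correcting it with the encoded residual errors.

```python
# Minimal sketch (assumed architecture, not the authors' exact model).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvLSTMCell(nn.Module):
    """Single ConvLSTM cell: all four gates computed by one shared convolution."""

    def __init__(self, in_ch, hid_ch, kernel=3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel, padding=kernel // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


def warp(feat, motion):
    """Warp features (B, C, H, W) with a per-pixel motion-vector field (B, 2, H, W)."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=feat.dtype, device=feat.device),
        torch.arange(w, dtype=feat.dtype, device=feat.device),
        indexing="ij",
    )
    grid_x = (xs + motion[:, 0]) / (w - 1) * 2 - 1   # normalise to [-1, 1]
    grid_y = (ys + motion[:, 1]) / (h - 1) * 2 - 1
    grid = torch.stack([grid_x, grid_y], dim=-1)     # (B, H, W, 2), (x, y) order
    return F.grid_sample(feat, grid, align_corners=True)


class CompressedVideoSegmenter(nn.Module):
    """Hypothetical pipeline: full features on I-frames, cheap updates on P-frames."""

    def __init__(self, feat_ch=64, num_classes=19):
        super().__init__()
        self.i_encoder = nn.Sequential(              # stand-in for a heavy RGB backbone
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
        )
        self.res_encoder = nn.Conv2d(3, feat_ch, 3, padding=1)  # residual-error branch
        self.lstm = ConvLSTMCell(feat_ch, feat_ch)
        self.head = nn.Conv2d(feat_ch, num_classes, 1)

    def forward(self, frames):
        """frames: list of dicts with 'type' ('I' or 'P'), and 'rgb' or ('mv', 'residual').

        Assumes the sequence starts with an I-frame, as a GOP does.
        """
        h = c = None
        outputs = []
        for frame in frames:
            if frame["type"] == "I":
                feat = self.i_encoder(frame["rgb"])  # heavy pass only on I-frames
                if h is None:
                    h = torch.zeros_like(feat)
                    c = torch.zeros_like(feat)
            else:
                # Cheap pass on P-frames: warp the recurrent state with the motion
                # vectors, then feed the encoded residual errors as the new input.
                h = warp(h, frame["mv"])
                c = warp(c, frame["mv"])
                feat = self.res_encoder(frame["residual"])
            h, c = self.lstm(feat, (h, c))
            outputs.append(self.head(h))             # per-frame segmentation logits
        return outputs
```

In a sketch like this, the expensive RGB encoder is skipped on P-frames, so the per-frame cost within a GOP is dominated by the light residual branch and the ConvLSTM update, which is where the speedup over frame-by-frame segmentation would come from.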