
Sparse Directed Graph Learning for Head Movement Prediction in 360 Video Streaming

Citation Author(s):
Gene Cheung, Patrick Le Callet, Jack Z. G. Tan
Submitted by:
Xue Zhang
Last updated:
20 May 2020 - 7:49pm
Document Type:
Presentation Slides

High-definition 360 videos encoded at fine quality are typically too large to stream in their entirety over bandwidth (BW)-constrained networks. One popular remedy is to interactively extract and send only the spatial sub-region corresponding to a viewer's current field-of-view (FoV) in a head-mounted display (HMD), enabling more BW-efficient streaming. Because of the non-negligible round-trip-time (RTT) delay between server and client, accurate head movement prediction that foretells a viewer's future FoVs is essential. Existing approaches are either overly simplistic in their modelling and predict poorly when RTT is large, or over-reliant on data-driven learning, resulting in inflexible models that are not robust to RTT heterogeneity. In this paper, we cast the head movement prediction task as a sparse directed graph learning problem, where three sources of relevant information---a 360 image saliency map, collected viewers' head movement traces, and a biological head rotation model---are aggregated into a unified Markov model. Specifically, we formulate a constrained optimization problem that minimizes an $\ell_2$-norm fidelity term and a sparsity term, corresponding to trace data / saliency consistency and a sparse graph model prior, respectively. We solve the problem with a hybrid strategy that alternates between iterative reweighted least squares (IRLS) and Frank-Wolfe optimization. Extensive experiments show that our head movement prediction scheme noticeably outperforms existing proposals across a wide range of RTTs.
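The exact objective appears in the slides; as a hedged reconstruction consistent with the abstract, one could learn a column-stochastic Markov transition matrix $\mathbf{P}$ over discretized FoV states by solving

$$ \min_{\mathbf{P}\,\ge\,0,\ \mathbf{1}^{\!\top}\mathbf{P}\,=\,\mathbf{1}^{\!\top}} \ \|\mathbf{Y}-\mathbf{P}\mathbf{X}\|_2^2 \;+\; \mu\sum_{i,j}\frac{P_{ij}^2}{P_{ij}^2+\epsilon}, $$

where the columns of $\mathbf{X}$ and $\mathbf{Y}$ would be saliency-weighted FoV occupancy vectors at consecutive time instants taken from the traces, and the second term is a smooth surrogate for an $\ell_0$-like sparsity prior; the symbols $\mathbf{X}$, $\mathbf{Y}$, $\mu$, and $\epsilon$ are illustrative, not taken from the paper. Fixing the reweighting $w_{ij} = 1/(P_{ij}^2+\epsilon)$ turns the surrogate into a weighted quadratic (the IRLS step), and the remaining quadratic program over a product of probability simplices is exactly the setting where Frank-Wolfe is attractive: its linear minimization oracle over a simplex simply selects a vertex.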

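The alternating strategy is short enough to sketch in code. The Python below is a minimal illustration under the assumptions stated above; the function name `learn_sparse_markov`, the variables `X`, `Y`, `mu`, `eps`, and the loop counts are all hypothetical, so this is a sketch of the technique, not the authors' implementation.

```python
# Hedged sketch of a hybrid IRLS + Frank-Wolfe solver for a sparse
# column-stochastic transition matrix P, as a plausible instantiation of the
# abstract's description. Names and constants are assumptions.
import numpy as np

def learn_sparse_markov(X, Y, mu=0.1, eps=1e-3, outer=10, inner=50, seed=0):
    """Minimize ||Y - P X||_F^2 + mu * sum_ij P_ij^2 / (P_ij^2 + eps)
    over {P >= 0, each column of P on the probability simplex}."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    P = rng.random((n, n))
    P /= P.sum(axis=0, keepdims=True)           # feasible start: column-stochastic
    for _ in range(outer):
        # IRLS step: freeze the reweighting so the sparsity term is quadratic.
        w = 1.0 / (P ** 2 + eps)
        for k in range(inner):
            # Gradient of the (now quadratic) objective in P.
            grad = 2.0 * (P @ X - Y) @ X.T + 2.0 * mu * w * P
            # Frank-Wolfe linear oracle: per column, the minimizing simplex
            # vertex is the one-hot vector at the smallest gradient entry.
            S = np.zeros_like(P)
            S[grad.argmin(axis=0), np.arange(n)] = 1.0
            gamma = 2.0 / (k + 2.0)              # standard FW step size
            P = (1.0 - gamma) * P + gamma * S    # convex combo stays feasible
    return P
```

A toy call on synthetic occupancy snapshots, e.g. `P = learn_sparse_markov(X, Y)` with each column of `X` and `Y` normalized to sum to one, returns a matrix whose columns sum to one by construction, since every Frank-Wolfe iterate is a convex combination of feasible points.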