Measure of the Quality of Experience

Motivation

Quality of Experience (QoE) is a subjective measure of the overall quality of a service from the user’s point of view. Its importance has grown in recent years because of the increasing need to provide a good user experience in many services, especially video streaming.

There are two families of tests for video quality assessment: subjective tests, in which test subjects evaluate the QoE of video sequences, and objective tests, in which algorithms estimate the quality of the displayed video.

Regarding subjective tests, the International Telecommunication Union (ITU) has published several recommendations that provide a formal methodology for conducting subjective evaluations (such as [1]-[4]). One of the most widely used techniques to measure QoE is the Mean Opinion Score (MOS), in which users rate specific aspects of their experience of a video playback on a scale from 1 (lowest satisfaction) to 5 (highest satisfaction). The MOS is then computed as the average of the subjective ratings provided by the test audience.
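As a minimal sketch of the averaging step described above, the following Python snippet computes a MOS from a list of ratings. The function name and the sample ratings are hypothetical and serve only as illustration; the ITU recommendations [1]-[4] additionally specify test conditions and subject screening that are not modeled here.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Return the MOS for a list of ratings on the 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating {rating} is outside the 1-5 scale")
    return mean(ratings)


# Hypothetical ratings from eight test subjects for one video sequence.
ratings = [4, 5, 3, 4, 4, 2, 5, 4]
mos = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f}")
```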

In general, the main drawback of subjective tests is the time and resources (in terms of number of people) required to carry out the measurements. This motivates objective tests, in which algorithms estimate the opinion users would give if they were asked.

Overview

Regarding objective estimation of the QoE in HTTP adaptive streaming (HAS), such as DASH, the key factors for evaluating the quality of this kind of service [5] are: the encoding process, the initial loading delay, the ability of HAS to change the quality for each segment, and the possibility of running out of buffer during playback.

The literature repeatedly uses these factors to formulate different QoE models (for example, [6]), although some works consider other parameters, such as [7], which studies the impairments related to the frequency and duration of stalls. These works share the approach of defining the QoE as a formula in which the impairments related to initial delay, playback stalls, and quality changes penalize the QoE.

It is worth highlighting the proposals from the ITU to estimate the QoE, specifically ITU-T P.1203 [8] and its evolution ITU-T P.1204 [9], which describe a set of objective quality assessment modules that help predict the quality experienced by end users in multimedia streaming applications.

In this regard, the Multimedia Communications Group (COMM) has extensive experience in the use of the ITU recommendations. The COMM has developed a framework for the evaluation of QoE in adaptive video streaming scenarios, which makes it possible to analyze the impact of different bandwidth variation patterns (switching frequency, range and type of variation) on the user’s QoE, among other aspects [12].

Figure 1. Framework architecture for measuring QoE

The proposed framework allows performance measurements to be carried out in an automated and systematic way for the evaluation of DASH systems in 2D and 3D video streaming services. It uses Puppeteer, a Node.js library developed by Google that provides a high-level API over the Chrome DevTools Protocol, to automate actions such as starting playback, causing bandwidth changes, and saving the results (quality changes, timestamps, stalls, and so on). This data is then processed to reconstruct the video as it was displayed, extract quality metrics, and assess the users’ QoE using the ITU-T P.1203 recommendation.
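To illustrate the kind of post-processing involved, the following Python sketch summarizes a playback log into initial delay, stalls, and quality switches. It is only an assumption of how such data might look: the event tuple layout, the event names ("request", "play", "stall_start", "stall_end", "quality"), and the sample values are hypothetical and are not the log format of the framework itself.

```python
def summarize_playback(events):
    """Summarize chronologically ordered (time_s, kind, value) events.

    The event kinds used here are hypothetical names chosen for this sketch.
    """
    summary = {"initial_delay": None, "stall_count": 0,
               "stall_time": 0.0, "quality_switches": 0}
    request_time = None
    stall_started = None
    current_quality = None
    for time_s, kind, value in events:
        if kind == "request":
            request_time = time_s
        elif kind == "play":
            # Only the first "play" after a request defines the initial delay.
            if summary["initial_delay"] is None and request_time is not None:
                summary["initial_delay"] = time_s - request_time
        elif kind == "stall_start":
            stall_started = time_s
            summary["stall_count"] += 1
        elif kind == "stall_end":
            if stall_started is not None:
                summary["stall_time"] += time_s - stall_started
                stall_started = None
        elif kind == "quality":
            # A switch is counted only when the representation changes.
            if current_quality is not None and value != current_quality:
                summary["quality_switches"] += 1
            current_quality = value
    return summary


# Hypothetical log: times in seconds, quality given as bitrate in kbit/s.
events = [
    (0.0, "request", None),
    (1.2, "play", None),
    (1.2, "quality", 1000),
    (5.2, "quality", 3000),
    (9.0, "stall_start", None),
    (11.5, "stall_end", None),
    (13.2, "quality", 1500),
    (17.2, "quality", 1500),
]
summary = summarize_playback(events)
print(summary)
```

A summary of this kind contains the per-session quantities (initial delay, stalling, quality changes) that objective QoE models take as input.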

Another of our objectives is to study asymmetric coding, which aims to exploit the binocular suppression of the Human Visual System (HVS) to achieve more efficient video compression by representing one of the two views with lower quality. Regarding 3D video subjective assessment, previous studies conducted within our group [12] and others in the literature (e.g., [10] and [11]) suggest that averaging the quality of the 2D left and right views predicts the quality of symmetrically distorted stereoscopic videos well, but generates substantial prediction bias when applied to asymmetrically distorted stereoscopic videos. Accordingly, we are currently focused on how to obtain a good prediction of the MOS of symmetric and asymmetric stereoscopic sequences from the MOS values of the 2D sequences (left view, right view) provided by the implementation of the ITU-T P.1203 recommendation and from the objective assessment results.
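The averaging baseline mentioned above can be written in a few lines. The Python sketch below is only that baseline, with a hypothetical function name and hypothetical MOS values; it is not the prediction method under study, and for asymmetric inputs the studies cited above report that its output is biased.

```python
def predict_stereo_mos_by_averaging(mos_left, mos_right):
    """Baseline: predict the stereoscopic MOS as the mean of the 2D view MOS."""
    return (mos_left + mos_right) / 2.0


# Symmetric case: both views have the same (hypothetical) quality.
symmetric = predict_stereo_mos_by_averaging(4.0, 4.0)

# Asymmetric case: one view is encoded at lower quality. The plain average
# is the estimate that the cited studies report as biased in this situation.
asymmetric = predict_stereo_mos_by_averaging(4.5, 2.5)

print(symmetric, asymmetric)
```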

On the other hand, many objective QoE models are based on the bitrate. However, PSNR (Peak Signal-to-Noise Ratio) and VMAF (Video Multimethod Assessment Fusion) have been shown to be more closely related to the QoE than the bitrate. For this reason, the COMM has proposed three new models to measure the QoE analytically in DASH video services. The first is based on the bitrate of the displayed video segments, whereas the second and the third are based on the PSNR and VMAF of each video segment, respectively. All three are simple QoE models that consider the main parameters that affect QoE (encoding quality, rebuffering, quality switches, and initial delay).

Conceptually, in the proposed models, the QoE increases as the bitrate/PSNR/VMAF increases, whereas it decreases as the duration of stalls, the quality switches, and the initial delay increase. A complete description of these QoE models can be found in [15], where the studies carried out show that, although the three proposed models offer more coherent results than other models in the literature, the VMAF-based QoE model is the one that most closely matches human perception.
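The qualitative behavior described above can be sketched with a generic linear-penalty score in Python. This is an illustrative assumption, not the formulation in [15]: the function name, the linear form, the default weights, and the sample values are all hypothetical. The per-segment quality can be a bitrate, PSNR, or VMAF value, and the weights would absorb the difference in units.

```python
def qoe_score(segment_quality, stall_time, initial_delay,
              w_switch=1.0, w_stall=1.0, w_delay=1.0):
    """Generic linear-penalty QoE sketch (hypothetical form and weights).

    segment_quality: per-segment quality values (bitrate, PSNR or VMAF).
    stall_time:      total stall duration in seconds.
    initial_delay:   start-up delay in seconds.
    """
    if not segment_quality:
        raise ValueError("at least one segment is required")
    avg_quality = sum(segment_quality) / len(segment_quality)
    # Total magnitude of quality changes between consecutive segments.
    switching = sum(abs(nxt - cur)
                    for cur, nxt in zip(segment_quality, segment_quality[1:]))
    return (avg_quality
            - w_switch * switching
            - w_stall * stall_time
            - w_delay * initial_delay)


# Hypothetical VMAF values per segment for two playback sessions.
smooth = qoe_score([80, 80, 80, 80], stall_time=0.0, initial_delay=1.0)
unstable = qoe_score([80, 40, 80, 40], stall_time=2.0, initial_delay=1.0)
print(smooth, unstable)
```

With these inputs the steady session scores higher than the one with quality oscillations and a stall, which matches the conceptual behavior described above. Actual models calibrate the weights against subjective data.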

References

[1] International Telecommunication Union, Recommendation ITU-R BT.500-13: Methodology for the subjective assessment of the quality of television pictures. BT Series, Broadcasting service, 2012.
[2] International Telecommunication Union, Recommendation ITU-T P.913: Methods for the subjective assessment of video quality, audio quality and audiovisual quality of Internet video and distribution quality television in any environment, 2016.
[3] International Telecommunication Union, Recommendation ITU-T P.910: Subjective Video Quality Assessment methods for multimedia applications, 2008.
[4] International Telecommunication Union, Recommendation ITU-T P.911: Subjective audiovisual quality assessment methods for multimedia applications, 1998.
[5] L. Skorin-Kapov, M. Varela, T. Hoßfeld, and K.-T. Chen, “A survey of emerging concepts and challenges for QoE management of multimedia services,” ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 14, no. 2s, article no. 29, 2018.
[6] X. Yin, V. Sekar, and B. Sinopoli, “Toward a principled framework to design dynamic adaptive streaming algorithms over HTTP,” in Proc. of the 13th ACM Workshop on Hot Topics in Networks (HotNets), Los Angeles, CA, USA, pp. 1-7, Oct. 2014.
[7] Y. Liu, S. Dey, F. Ulupinar, M. Luby, and Y. Mao, “Deriving and validating user experience model for DASH video streaming,” IEEE Transactions on Broadcasting, vol. 61, no. 4, pp. 651-665, 2015.
[8] International Telecommunication Union (ITU-T), Parametric bitstream-based quality assessment of progressive download and adaptive audiovisual streaming services over reliable transport, Recommendation ITU-T P.1203, 2017.
[9] International Telecommunication Union (ITU-T), Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to full bitstream information, Recommendation ITU-T P.1204.3, 2020.
[10] J. Wang, S. Wang, and Z. Wang, “Asymmetrically Compressed Stereoscopic 3D Videos: Quality Assessment and Rate-Distortion Performance Evaluation,” IEEE Transactions on Image Processing, vol. 26, no. 3, pp. 1330–1343, 2017.
[11] F. Battisti, M. Carli, P. Le Callet, and P. Paudyal, “Toward the assessment of quality of experience for asymmetric encoding in immersive media,” IEEE Transactions on Broadcasting, vol. 64, no. 2, pp. 392–406, 2018.

Related publications

[12] P. Guzmán, P. Arce, and J. C. Guerri, “Automatic QoE evaluation for asymmetric encoding of 3D videos for DASH streaming service,” Ad Hoc Networks, vol. 106, article 102184, 2020.
[13] P. Guzmán, P. Arce, and J. C. Guerri, “Automatic QoE Evaluation of DASH Streaming using ITU-T Standard P.1203 and Google Puppeteer,” in Proc. of Int. Symposium on Performance Evaluation of Wireless Ad Hoc, Sensor, & Ubiquitous Networks (PE-WASUN), Miami Beach, FL (USA), Nov. 2019, pp. 79-86.
[14] P. Guzmán, P. Arce, and J. C. Guerri, “Evaluación automática de la QoE del streaming DASH utilizando el estándar ITU-T P.1203 y Google Puppeteer,” in Proc. of Jornadas de Ingeniería Telemática (JITEL), Zaragoza (Spain), Oct. 2019.
[15] I. de Fez, R. Belda, and J. C. Guerri, “New objective QoE models for evaluating ABR algorithms in DASH,” Computer Communications, vol. 158, pp. 126-140, 2020.