
Depth Enhancement Methods for Centralized Texture-Depth Packing Formats


Abstract

To deliver three-dimensional (3D) videos through current two-dimensional (2D) broadcasting systems, frame-compatible packing formats that properly include one texture frame and one depth map at various down-sampling ratios have been proposed as the simplest and most effective solution. In this paper, we introduce two depth enhancement algorithms to further improve the quality of the centralized texture-depth packing (CTDP) formats for delivering 3D video services. To compensate for the loss caused by the YCbCr 444-to-420 conversion of the colored-depth frame, two efficient depth reconstruction processes based on texture and depth consistency are proposed. Experimental results show that the proposed enhanced CTDP depacking process outperforms the 2D-plus-depth packing (2DDP) format and the original CTDP depacking procedure in synthesizing virtual views. With the help of the proposed efficient depth reconstruction processes, more accurate reconstructed depth maps and better synthesized quality can be achieved. Until 3D broadcasting systems that adopt truly depth- and texture-dependent coding procedures become available, we believe that the proposed CTDP formats with depth enhancement can help deliver 3D videos over current 2D broadcasting systems simply and efficiently.

Keywords

3D videos; frame-compatible; 2D-plus-depth; CTDP

1 Introduction

Over the past decades, more and more three-dimensional (3D) videos have been produced in the formats of stereo or multiple views with their corresponding depth maps. People desire a more truthful and exciting experience through true 3D visualization. In order to fit traditional two-dimensional (2D) television (TV) programs, we need to modify 3D videos to accommodate certain constraints. Frame packing is one possible solution for introducing 3D services in the current cable and terrestrial 2D TV systems. There are several well-known formats for packing stereo views into a 2D frame, such as the side-by-side (SbS), top-and-bottom (TaB), and checkerboard frame-compatible formats [1]-[4]. However, the existing frame-packing methods suffer from two major problems, which slow down the development of 3D TV services. First, frame-compatible packing of stereo views means that two texture images are gathered in one frame, which causes seriously annoying effects on traditional 2D displays. Second, stereo packing formats cannot support multiview naked-eye 3D displays unless the stereo videos are further processed by real-time stereo matching methods [5], [6] and depth image-based rendering (DIBR) algorithms [7], [8]. To support multiview 3D displays, the 2D-plus-depth packing (2DDP) frame-compatible format, which arranges the texture on the left and the depth on the right, has been suggested [9]. Once the color texture and depth are arranged in the SbS fashion, the 2DDP format brings even worse visualization on 2D displays than the stereo packing formats. Recently, the MPEG JCT-3V team proposed the latest coding standard for 3D video with depth [9]. However, it still needs some time to be deployed in current digital video broadcasting systems with 2D and 3D capabilities.

To deal with the above problems, novel frame-compatible centralized texture-depth packing (CTDP) formats for delivering 3D video services have been proposed [10]. With AVS2 and HEVC video coders, the CTDP formats [10] show better objective and subjective visual quality on 2D and 3D displays than the 2DDP format. In the CTDP format, subpixels are utilized to store the depth information, while the texture information is arranged in the center of the frame to raise the 2D-compatible visual quality. However, the rearrangement degrades the quality of the reconstructed depth map, especially when the YCbCr video format is the 420 format, with four Y components, one Cb component, and one Cr component for every four color pixels. To further increase the visual quality, an efficient depth reconstruction process is also proposed in this paper. The frame structure of the CTDP method in cooperation with the current broadcasting system is shown in Fig. 1. Without any extra hardware, 2D TV displays can exhibit acceptable 2D visual quality. For glasses-based or naked-eye 3D displays, we only need a simple CTDP depacking circuit followed by a DIBR kernel to synthesize stereo or multiple views, provided that the view-related subpixel formation of a naked-eye 3D display is given.
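
Where the CTDP depacking circuit hands off to the DIBR kernel, the following minimal Python sketch (our illustration, not the implementation used in the paper) shows the basic idea: each texture pixel is shifted horizontally by a disparity proportional to its 8-bit depth value, and the resulting holes are filled by simple background extension as mentioned in Section 4. The depth-to-disparity mapping and max_disp are illustrative assumptions, and z-buffered occlusion handling is omitted.

```python
import numpy as np

def dibr_shift_view(texture, depth, max_disp=16):
    """Minimal DIBR sketch: warp pixels horizontally by a disparity
    proportional to their 8-bit depth value, then fill holes by
    simple background extension.  max_disp is an illustrative
    assumption; z-buffered occlusion handling is omitted."""
    h, w, _ = texture.shape
    view = np.zeros_like(texture)
    filled = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            d = int(round(depth[y, x] / 255.0 * max_disp))  # near = large shift
            if 0 <= x - d < w:
                view[y, x - d] = texture[y, x]
                filled[y, x - d] = True
        last = texture[y, 0]          # background extension for holes
        for x in range(w):
            if filled[y, x]:
                last = view[y, x]
            else:
                view[y, x] = last
    return view
```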

The rest of the paper is organized as follows. The CTDP formats are reviewed in Section 2. The proposed depth reconstruction process is described in Section 3. Experimental results demonstrating the effectiveness of the proposed system are shown in Section 4. Finally, we conclude this paper in Section 5.

2 Centralized Texture-Depth Packing Formats

To achieve system compatibility, the CTDP method [10] follows the frame-compatible concept: texture and depth information are packed together while the resolution of the 2D video is kept unchanged. To solve the 2D visualization issue, the texture is arranged in the center and the depth in the two sides of the packed frame.

2.1 Colored-Depth Frame

The depth frame is a gray image with only Y components. To pack the depth frame, the colored-depth representation is suggested [10]. The colored-depth frame can then be treated as a normal color texture frame, which can be directly encoded by any 2D video encoder with three times the packing efficiency. As shown in Fig. 2, three horizontal depth lines are treated as the horizontal R, G, and B subpixel lines of the RGB colored-depth frame. Since nearby depth values are very close, the RGB colored-depth frame exhibits a nearly gray visual sensation. After the color subpixels are packed in the vertical direction, the vertical resolution of the RGB colored-depth frame becomes one third of the original resolution. In Fig. 2, for example, nine depth lines have been packed into three RGB colored-depth lines. For most video coders, the coding and decoding processes are conducted in the YCbCr color space. Therefore, we apply the RGB to YCbCr color space conversion

\[
\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix} =
\begin{bmatrix}
0.2568 & 0.5041 & 0.0979 \\
-0.1482 & -0.2910 & 0.4392 \\
0.4392 & -0.3678 & -0.0714
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix} +
\begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}
\tag{1}
\]

to transfer it to the YCbCr colored-depth frame [11]. It is noted that the subpixels in the RGB space are at the full (4:4:4) resolution. If the YCbCr space also uses the (4:4:4) format, the color space transformation changes the depth values only by about ±0.5 due to round-off errors in the color space conversions. However, for most video coders, the subpixels in the YCbCr space could be in the (4:2:0) or (4:2:2) format, where the Cb and Cr components are further downsampled. Even without coding errors, the YCbCr colored-depth frame might then exhibit slight translation errors.
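
To make the colored-depth construction concrete, the following Python sketch (the function names are ours, not part of the CTDP specification) packs three depth rows into one RGB colored-depth row as in Fig. 2 and applies the BT.601 conversion of (1):

```python
import numpy as np

def pack_colored_depth(depth):
    """Pack an 8-bit depth map (H x W) into an RGB colored-depth
    frame (H/3 x W x 3): rows 3k, 3k+1, and 3k+2 become the R, G,
    and B channels of output row k, as in Fig. 2."""
    h, w = depth.shape
    assert h % 3 == 0, "height must be a multiple of 3"
    return depth.reshape(h // 3, 3, w).transpose(0, 2, 1)

def rgb_to_ycbcr_bt601(rgb):
    """Apply the BT.601 RGB-to-YCbCr conversion of (1) to 8-bit data."""
    m = np.array([[ 0.2568,  0.5041,  0.0979],
                  [-0.1482, -0.2910,  0.4392],
                  [ 0.4392, -0.3678, -0.0714]])
    offset = np.array([16.0, 128.0, 128.0])
    ycbcr = rgb.astype(np.float64) @ m.T + offset
    return np.clip(np.rint(ycbcr), 0, 255).astype(np.uint8)
```

At the depacking side, the inverse of (1) followed by the reverse reshape recovers the depth rows, subject to the round-off and chroma-subsampling errors discussed above.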

2.2 Centralized Texture-Depth Packing

Without loss of generality for frame-compatible packing, we assume that the vertical CTDP packing formats are desired. Then, we need to reduce the vertical resolutions of the texture and depth separately such that the total packed resolution remains the same, where the original vertical resolution is H. If the reduction factors for the texture and depth resolutions are α and β, we should choose them to satisfy [α + (1/3)β = 1] to achieve the frame-compatible requirement [10]. For example, the reduction factors (α = 3/4, β = 3/4), (α = 5/6, β = 1/2), (α = 7/8, β = 3/8), (α = 11/12, β = 1/4), and (α = 15/16, β = 3/16) all satisfy this requirement. Fig. 3 shows the flowchart of generating the texture-5/6 CTDP format. First, we downscale the vertical resolutions of the texture and depth frames to five-sixths and one-half of the original resolution, respectively. By using the colored-depth concept, the resized depth frame of height (1/2)H can be further packed into RGB subpixels as suggested in Section 2.1, reducing its vertical size to (1/6)H. Then, we split the colored-depth frame evenly into two separate parts of height (1/12)H. For better coding efficiency and better 2D visualization, these two split colored-depth parts are flipped vertically; the flipped depth parts align better with the texture frame and appear on 2D displays as a mild shadow-like border. Finally, we obtain the texture-5/6 CTDP frame by stacking the first flipped depth part ((1/12)H), the resized texture frame ((5/6)H), and the other flipped depth part ((1/12)H) from top to bottom.
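
Under those definitions, the texture-5/6 packing flow of Fig. 3 can be sketched as follows, reusing pack_colored_depth() from the previous sketch; the nearest-neighbor row resampler is only a stand-in for whatever scaler an encoder front end would actually use:

```python
import numpy as np

def resize_rows(img, new_h):
    """Nearest-neighbor vertical resampler (a stand-in for a proper
    down-scaling filter; it keeps the sketch self-contained)."""
    return img[np.arange(new_h) * img.shape[0] // new_h]

def pack_ctdp_texture_5_6(texture, depth):
    """Assemble a vertical texture-5/6 CTDP frame as in Fig. 3.
    texture: (H, W, 3) uint8, depth: (H, W) uint8, H divisible by 12."""
    h = texture.shape[0]
    tex = resize_rows(texture, h * 5 // 6)                # 5/6 H rows
    cd = pack_colored_depth(resize_rows(depth, h // 2))   # 1/6 H rows
    top, bottom = cd[: h // 12], cd[h // 12:]             # two 1/12 H parts
    return np.vstack([top[::-1], tex, bottom[::-1]])      # flip, then stack
```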

The ratio of downscaling can also be changed to generate the other CTDP formats [12]-[15]. For example, the reduction ratio of the texture frame could be 7/8 or 15/16. For the texture-7/8 and texture-15/16 reduction ratios, the vertical resolutions of the depth frames are downscaled to 3/8 and 3/16, respectively, to satisfy the frame-compatible constraint. Except for the resizing factor, the packing procedures for texture-7/8 and texture-15/16 are similar to that of texture-5/6. To attain horizontal CTDP formats, all of the resizing of the texture and depth frames, the color-packed depth representation, the splitting, and the flipping procedures should be performed in the horizontal direction, and the packed frame is obtained by combining the first flipped depth part, the resized texture frame, and the other flipped depth part from left to right. The appearances of the original texture, depth, and CTDP frames with different ratios and orientations are shown in Fig. 4. It is noted that in the proposed horizontal/vertical CTDP formats, the width/height of the flipped depth parts always matches that of the packed frame, which helps avoid compression artifacts at the texture-depth boundary. Please refer to [13] for more details of the arrangement.

3 Proposed Depth Reconstruction Process

For the depth down-sampling in the proposed depth reconstruction process, the depth map is sampled along a slant line pattern:

\[
D_{\mathrm{down}}(x, y) = D_{\mathrm{origin}}\big(\Delta_{\mathrm{hor}} \times x - (\Delta_{\mathrm{hor}} - y),\, y\big),
\tag{14}
\]

or

\[
D_{\mathrm{down}}(x, y) = D_{\mathrm{origin}}\big(x,\, \Delta_{\mathrm{ver}} \times y - (\Delta_{\mathrm{ver}} - x)\big).
\tag{15}
\]

Equation (14) is utilized to down-sample the depth image in the horizontal direction, while down-sampling in the vertical direction follows (15), where Δ_hor and Δ_ver denote the horizontal and vertical down-sampling ratios. The slant line sampling pattern is suitable for down-sampling the depth image in both the vertical and horizontal directions, as shown in Fig. 11 for down-sampling ratios of 2, 4, and 8.
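
A possible NumPy reading of the horizontal case (14) is sketched below. Since this excerpt does not state how the row index is kept within one sampling period, we assume it enters modulo the down-sampling ratio; the function name is ours:

```python
import numpy as np

def slant_downsample_horizontal(depth, ratio):
    """Down-sample a depth map horizontally by `ratio` along the
    slant line pattern of (14): the sampled column is shifted by the
    row index, so the kept samples lie on slanted lines (Fig. 11).
    Taking the row term modulo `ratio` is our assumption."""
    h, w = depth.shape
    out = np.empty((h, w // ratio), dtype=depth.dtype)
    for y in range(h):
        shift = y % ratio                     # assumed cyclic row offset
        out[y] = depth[y, shift::ratio][: w // ratio]
    return out
```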

With the down-sampling by the direct line pattern, the up-sampling function in the de-packing procedure needs to be modified accordingly.

Because of the pattern-based sampling strategy, the pixels of the up-sampled depth are directly copied from the low-resolution (LR) depth if they are located at positions of the direct line pattern.
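
The copy step can be sketched as follows. The fill of the off-pattern pixels shown here is only a nearest-sample placeholder, because the paper's texture-similarity-based interpolation of those pixels is not reproduced in this excerpt:

```python
import numpy as np

def slant_upsample_horizontal(lr_depth, ratio, width):
    """Pattern-based up-sampling sketch: pixels on the slant-line
    pattern are copied directly from the low-resolution depth; the
    off-pattern pixels are filled with the nearest pattern sample in
    the same row, purely as a placeholder for the paper's
    texture-similarity-based interpolation."""
    h, lw = lr_depth.shape
    up = np.zeros((h, width), dtype=lr_depth.dtype)
    for y in range(h):
        cols = np.arange(y % ratio, width, ratio)[:lw]
        up[y, cols] = lr_depth[y, : cols.size]    # direct copy on pattern
        for x in range(width):                    # placeholder hold-fill
            up[y, x] = up[y, cols[np.argmin(np.abs(cols - x))]]
    return up
```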

4 Experimental Results

4.1 Performance Evaluation of CTDP Format with Respect to 2DDP Format

In order to verify the coding performance of the proposed CTDP formats with respect to the 2DDP format, we conducted a set of experiments that evaluate the packing methods in cooperation with a specific video coder (AVS2) in terms of the peak signal-to-noise ratio (PSNR) and bitrate of the depacked texture and depth frames and of their synthesized virtual views. In the simulations, we use five MPEG 3D video sequences: Poznan Hall, Poznan Street, Kendo, Balloons, and Newspaper, as shown in Figs. 12a-12e, respectively.

The AVS2 coding conditions follow the instructions suggested by the AVS workgroup, with the QPs set to 27, 32, 38, and 45 for intra frames [17]. Under the All Intra (AI), Low Delay P (LDP), and Random Access (RA) test conditions, Tables 1 and 2 show the average BDPSNR and BDBR [18] performance of the different CTDP formats with respect to the 2DDP format achieved by AVS2. To calculate the PSNR of the 2DDP format, we first separate the texture and depth frames from the 2DDP frame and upsample them to the original image size W×H; the PSNR is then calculated between the recovered texture and depth frames and the original uncompressed ones. Similarly, the PSNR of a CTDP format is calculated between the texture and depth frames recovered from the CTDP frame and the original uncompressed frames. From Tables 1 and 2, we can see that the proposed texture-5/6, 7/8, and 15/16 CTDP formats offer much better PSNR and bitrate savings in texture than the 2DDP format, which means our CTDP formats can achieve better visual quality on 2D displays when only texture frames are viewed. In addition, the depth quality of the CTDP formats becomes worse as the resizing factor gets bigger. Besides the comparisons of depacked texture and depth achieved by the different packing formats, we also compare the quality of the synthesized virtual views with respect to the 2DDP format. It is noted that the reference synthesized virtual view for calculating the PSNR is obtained from the original uncompressed texture and depth frames. The DIBR settings for virtual view synthesis are shown in Table 3. As to the quality of the synthesized virtual views, the texture-5/6 and 7/8 CTDP formats after the DIBR process show better BDPSNR and BDBR performance than the 2DDP format. It is noted that none of the synthesized views uses any depth enhancement or depth preprocessing, and the hole filling used in the DIBR process is simple background extension.
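
For reference, the PSNR used in these comparisons is the standard definition sketched below, computed after the depacked frames are upsampled back to W×H; the BDPSNR and BDBR figures are then derived from the resulting rate-distortion points following [18]:

```python
import numpy as np

def psnr(original, recovered, peak=255.0):
    """PSNR between an original uncompressed frame and the frame
    recovered from a packed format (depacked and upsampled back to
    the original W x H before comparison)."""
    err = original.astype(np.float64) - recovered.astype(np.float64)
    mse = np.mean(err ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak * peak / mse)
```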

In summary, the texture quality indices (BDPSNR and BDBR) in Tables 1 and 2 can be treated as objective quality indices for 2D displays, while the virtual view qualities serve as objective quality indices for 3D displays. The results show that the proposed texture-5/6 and 7/8 CTDP formats are the better choices for broadcasters: the texture-5/6 CTDP format has better 3D performance, while the texture-7/8 CTDP format achieves better 2D performance.

4.2 Performance Evaluation of Depth Enhancement for CTDP Format

To verify the proposed depth enhancement mechanism, we first show the depth reconstructed from the original and the depth-enhanced CTDP formats. The RD curves for different ratios of CTDP formats are shown in Fig. 13. It can be seen that the proposed refined CTDP format always achieves better performance, and the gain of the depth-enhanced CTDP format over the original CTDP format increases as the texture ratio increases.

For the subjective evaluation, partial portions of the reconstructed depth for the Shark sequence are shown in Fig. 14. It can be seen that with the depth enhancements, the depth can be reconstructed well, especially in the edge regions.

In the following, we compare the synthesis results. Partial portions of the generated views are shown in Fig. 15. From the results, the proposed CTDP format successfully preserves the edges of the synthesized views without jaggy noise.

4.3 Comparison with Different Depth Interpolation Methods

The comparison results of different depth interpolation methods are shown in Table 4 for the Shark sequence under the all-intra (AI) coding condition with QP = 32. The symbols Bi and BC denote the bilinear and bicubic convolution interpolation methods, respectively. JBU [19] and FEU [20] are texture-similarity-based depth interpolation methods. The proposed depth up-sampling method yields better PSNR and SSIM results for the reconstructed depth images in the vertical-11/12 and vertical-23/24 CTDP formats. For the vertical-5/6 CTDP format, the proposed depth up-sampling method also provides better reconstructed depth images.
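
For context, a direct (unoptimized) sketch of the JBU baseline [19] is given below, assuming a grayscale guide image; the kernel parameters are illustrative and are not the values used in the experiments:

```python
import numpy as np

def joint_bilateral_upsample(lr_depth, hr_guide, ratio,
                             sigma_s=1.0, sigma_r=10.0, radius=2):
    """JBU sketch: each high-resolution depth pixel is a weighted
    average of nearby low-resolution depth samples, with spatial
    weights on the LR-grid distance and range weights on the
    high-resolution guide (texture) intensities."""
    gh, gw = hr_guide.shape
    out = np.zeros((gh, gw))
    for py in range(gh):
        for px in range(gw):
            ly, lx = py / ratio, px / ratio       # position on the LR grid
            acc = wsum = 0.0
            for qy in range(int(ly) - radius, int(ly) + radius + 1):
                for qx in range(int(lx) - radius, int(lx) + radius + 1):
                    if 0 <= qy < lr_depth.shape[0] and 0 <= qx < lr_depth.shape[1]:
                        ws = np.exp(-((ly - qy) ** 2 + (lx - qx) ** 2)
                                    / (2 * sigma_s ** 2))
                        gy = min(qy * ratio, gh - 1)   # q's HR position
                        gx = min(qx * ratio, gw - 1)
                        dg = float(hr_guide[py, px]) - float(hr_guide[gy, gx])
                        wr = np.exp(-dg * dg / (2 * sigma_r ** 2))
                        acc += ws * wr * lr_depth[qy, qx]
                        wsum += ws * wr
            out[py, px] = acc / wsum if wsum > 0 else 0.0
    return out
```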

The partial reconstructed depth maps obtained with the different depth interpolation methods are compared in Fig. 16. The depth images reconstructed by the bilinear and bicubic convolution interpolation methods have serious jaggy noise along the edges. It can be seen that the proposed depth up-sampling method outperforms the other methods with better-preserved edges.

5 Conclusions

In this paper, we proposed depth enhancement processes for the CTDP formats [10]. The CTDP formats can be comfortably and directly viewed on 2D TV displays without any extra computation. However, the CTDP formats slightly suffer from depth discontinuities at high texture ratios. Compared with the 2DDP format under the same video coding systems, such as AVS2 (RD 6.0) and HEVC [10], the CTDP formats show better coding performance for the texture and depth frames and the synthesized virtual views. To further increase the visual quality, the depth enhancement methods proposed in this paper include YCbCr calibration and texture-similarity-based depth up-sampling. Experimental results reveal that the proposed depth enhancement efficiently improves the depacking performance of the CTDP formats, achieving better reconstructed depth images as well as better synthesized views. Based on the aforementioned simulation results, we believe that the proposed depth-enhanced CTDP depacking methods can greatly advance current 2D video coding systems by providing 3D video services effectively and simply.

References

[1] J.-F. Yang, H.-M. Wang, K.-I. Liao, L. Yu, and J.-R. Ohm, “Centralized texture-depth packing formats for effective 3D video transmission over current video broadcasting systems,” IEEE Transactions on Circuits and Systems for Video Technology, submitted for publication.

[2] Dolby Laboratories, Inc. (2015). Dolby Open Specification for Frame-Compatible 3D Systems [Online]. Available:

[3] ITU. (2015). Advanced Video Coding for Generic Audio-Visual Services [Online]. Available: www.itu.int

[4] G. Sullivan, T. Wiegand, D. Marpe, and A. Luthra, “Text of ISO/IEC 14496-10 advanced video coding (third edition),” ISO/IEC JTC 1/SC 29/WG 11, Redmond, USA, Doc. N6540, Jul. 2004.

[5] G. J. Sullivan, A. M. Tourapis, T. Yamakage, and C. S. Lim, “ISO/IEC 14496-10:200X/FPDAM 1,” ISO/IEC JTC 1/SC 29/WG 11, Apr. 2009.

[6] T. Kanade and M. Okutomi, “A stereo matching algorithm with an adaptive window: theory and experiment,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 9, pp. 920-932, Sept. 1994. doi: 10.1109/34.310690.

[7] K. Zhang, J. Lu, and G. Lafruit, “Cross-based local stereo matching using orthogonal integral images,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 7, pp. 1073-1079, Jul. 2009. doi: 10.1109/TCSVT.2009.2020478.

[8] S.-C. Chan, H.-Y. Shum, and K.-T. Ng, “Image-based rendering and synthesis,” IEEE Signal Processing Magazine, vol. 24, no. 6, pp. 22-33, Nov. 2007. doi: 10.1109/MSP.2007.905702.

[9] T.-C. Yang, P.-C. Kuo, B.-D. Liu, and J.-F. Yang, “Depth image-based rendering with edge-oriented hole filling for multiview synthesis,” in Proc. International Conference on Communications, Circuits and Systems, Chengdu, China, Nov. 2013, vol. 1, pp. 50-53. doi: 10.1109/ICCCAS.2013.6765184.

[10] Philips 3D Solutions, “3D interface specifications, white paper,” Eindhoven, The Netherlands, Dec. 2006.

[11] Studio Encoding Parameters of Digital Television for Standard 4:3 and Wide-Screen 16:9 Aspect Ratios, ITU-R BT.601-5, 1995.

[12] J.-F. Yang, K.-Y. Liao, H.-M. Wang, and Y.-H. Hu, “Centralized texture-depth packing (CTDP) SEI message syntax,” Joint Collaborative Team on 3D Video Coding Extensions of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, Strasbourg, France, Doc. JCT3V-J0108, Oct. 2014.

[13] J.-F. Yang, K.-Y. Liao, H.-M. Wang, and C.-Y. Chen, “Centralized texture-depth packing (CTDP) SEI message,” Joint Collaborative Team on 3D Video Coding Extensions of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, Geneva, Switzerland, Doc. JCT3V-K0027, Feb. 2015.

[14] J.-F. Yang, H.-M. Wang, Y.-A. Chiang, and K.-Y. Liao, “2D frame compatible centralized color depth packing format (translated from Chinese),” AVS 47th Meeting, Beijing, China, Doc. AVS-M3225, Dec. 2013.

[15] J.-F. Yang, H.-M. Wang, K.-Y. Liao, and Y.-A. Chiang, “AVS2 syntax message for 2D frame compatible centralized color depth packing formats (translated from Chinese),” AVS 50th Meeting, Nanjing, China, Doc. AVS-M3472, Oct. 2014.

[16] H. C. Andrews and C. L. Patterson, “Digital interpolation of discrete images,” IEEE Transactions on Computers, vol. 25, no. 2, 1976.

[17] X.-Z. Zheng, “AVS2-P2 common test conditions (translated from Chinese),” AVS 46th Meeting, Shenyang, China, Doc. AVS-N2001, Sep. 2013.

[18] G. Bjontegaard, “Calculation of average PSNR differences between RD-curves,” ITU-T Q6/16, Austin, USA, Doc. VCEG-M33, Apr. 2001.

[19] J. Kopf, M. F. Cohen, D. Lischinski, and M. Uyttendaele, “Joint bilateral upsampling,” ACM Transactions on Graphics, vol. 26, no. 3, Article 96, Jul. 2007. doi: 10.1145/1275808.1276497.

[20] S.-Y. Kim and Y.-S. Ho, “Fast edge-preserving depth image upsampler,” IEEE Transactions on Consumer Electronics, vol. 58, no. 3, pp. 971-977, Aug. 2012. doi: 10.1109/TCE.2012.6311344.

Manuscript received: 2015-11-12

Biographies

YANG Jar-Ferr (jefyang@mail.ncku.edu.tw) received his PhD degree from the University of Minnesota, USA in 1988. He joined National Cheng Kung University (NCKU) as an associate professor in 1988 and became a full professor in 1995 and a distinguished professor in 2007. He was the chairperson of the Graduate Institute of Computer and Communication Engineering during 2004-2008 and the director of the Electrical and Information Technology Center of NCKU during 2006-2008. He was also the associate vice president for Research and Development of NCKU. Currently, he is a distinguished professor and the director of the Technologies of Ubiquitous Computing and Humanity (TOUCH) Center supported by the National Science Council (NSC), Taiwan, China, as well as the director of the Tomorrow Ubiquitous Cloud and Hypermedia (TOUCH) Service Center. During 2004-2005, he was selected as a speaker in the Distinguished Lecturer Program of the IEEE Circuits and Systems Society. He was the secretary and the chair of the IEEE Multimedia Systems and Applications Technical Committee and an associate editor of IEEE Transactions on Circuits and Systems for Video Technology. In 2008, he received the NSC Excellent Research Award. In 2010, he received the Outstanding Electrical Engineering Professor Award of the Chinese Institute of Electrical Engineering, Taiwan, China. He was the chairman of the IEEE Tainan Section during 2009-2011. Currently, he is an associate editor of EURASIP Journal on Advances in Signal Processing and an editorial board member of IET Signal Processing. He has published 104 journal papers and 167 conference papers. He is a fellow of the IEEE.

WANG Hung-Ming (ming@video5.ee.ncku.edu.tw) received the BS and PhD degrees in electrical engineering from National Cheng Kung University (NCKU), Taiwan, China in 2003 and 2009, respectively. He is currently a senior engineer at Novatek Microelectronics Corp., Taiwan, China. His major research interests include 2D/3D image processing, video coding, and multimedia communication.

LIAO Wei-Chen received the BS and MS degrees in electrical engineering from National Cheng Kung University (NCKU), Taiwan, China in 2013 and 2015, respectively. His major research interests include image processing, video coding, and multimedia communication.