Segmentation of Text-Lines and Words from JPEG Compressed Printed Text Documents Using DCT Coefficients

Abstract: 

Segmenting a document image into text-lines and words has applications in many areas of Document Image Analysis (DIA), such as OCR, word spotting, and document retrieval. However, carrying out segmentation directly on compressed document images is still an unexplored and challenging research area. Since JPEG is the most widely accepted compression algorithm, this paper attempts to segment a JPEG compressed printed text document image into text-lines and words without fully decompressing the image. During JPEG compression, a non-overlapping 8x8 DCT block can encode the text contents of two adjacent text-lines or words, leaving no visible clue for segmentation. This paper proposes two-stage algorithms for segmenting text-lines and words. The first stage analyzes the DC coefficients to locate approximate text-line and word boundaries. The second stage uses the AC coefficients of selected DCT blocks to extract the exact line and word boundaries. Experimental results on a JPEG compressed document dataset (with variable spacing between lines and words, and different font sizes and styles) show good computational performance.
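The first stage described above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the DC coefficients have already been read into a grid with one value per 8x8 block, that a blank block has a known DC value (the hypothetical `background` parameter), and that any block deviating from it by more than a hypothetical `margin` contains some text. The function names are likewise illustrative. Runs of block rows containing text give approximate text-line positions; because a single block may straddle two lines, refining these to exact boundaries (the AC-coefficient stage) is not shown.

```python
from typing import List, Tuple

BLOCK = 8  # JPEG block height/width in pixels


def approximate_text_lines(
    dc: List[List[float]], background: float, margin: float = 1.0
) -> List[Tuple[int, int]]:
    """Return (first_block_row, last_block_row) for each run of block rows
    in which at least one DC value differs from the assumed background."""
    lines = []
    start = None
    for r, row in enumerate(dc):
        has_text = any(abs(v - background) > margin for v in row)
        if has_text and start is None:
            start = r                      # a text run begins
        elif not has_text and start is not None:
            lines.append((start, r - 1))   # a text run ends
            start = None
    if start is not None:                  # run reaches the last block row
        lines.append((start, len(dc) - 1))
    return lines


def to_pixel_rows(lines: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Convert block-row ranges to inclusive pixel-row ranges."""
    return [(a * BLOCK, (b + 1) * BLOCK - 1) for a, b in lines]


# Hypothetical DC grid: 6 block rows x 3 block columns, blank blocks = 1016.
dc_grid = [
    [1016, 1016, 1016],
    [900, 1016, 950],
    [1016, 800, 1016],
    [1016, 1016, 1016],
    [700, 1016, 1016],
    [1016, 1016, 1016],
]
block_lines = approximate_text_lines(dc_grid, background=1016)
pixel_lines = to_pixel_rows(block_lines)
```

Applying the same scan to the columns of the block rows inside one detected line would give approximate word gaps in the same way, with a resolution of one block (8 pixels).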

Paper Details

Authors: Mohammed Javed, P Nagabhushan, Watanabe Osamu
Submitted On: 7 April 2020 - 5:04am
Short Link:
Type: Poster
Event:
Presenter's Name: Mohammed Javed
Paper Code: DCC2020-181
Session: Posters
Document Year: 2020

Document Files

DCC2020 Paper ID 181


[1] Mohammed Javed, P Nagabhushan, Watanabe Osamu, "Segmentation of Text-Lines and Words from JPEG Compressed Printed Text Documents Using DCT Coefficients", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5001. Accessed: Aug. 13, 2020.
@article{5001-20,
  url = {http://sigport.org/5001},
  author = {Mohammed Javed and P Nagabhushan and Watanabe Osamu},
  publisher = {IEEE SigPort},
  title = {Segmentation of Text-Lines and Words from JPEG Compressed Printed Text Documents Using DCT Coefficients},
  year = {2020}
}