OCR-Driven Writer Identification and Adaptation in an HMM Handwriting Recognition System (foreign-literature translation material)

 2023-04-05 09:04

OCR-Driven Writer Identification and Adaptation in an HMM Handwriting Recognition System

Abstract

We present an OCR-driven writer identification algorithm in this paper. Our algorithm learns writer-specific characteristics more precisely from explicit character alignment using the Viterbi algorithm and shows a significant reduction in closed-set writer identification error rates compared with the GMM-based method. With the writers' identities retrieved, we improve the performance of handwriting recognition using the HMM adapted on the training data of the identified writer. In our system, writer identification and OCR are highly interactive. They improve the performance of each other and thus closely approximate supervised text-dependent writer identification and writer-dependent HMM handwriting recognition.

Keywords: handwriting recognition, writer identification, hidden Markov model [1][2]

Ⅰ. INTRODUCTION

We present an OCR-driven approach to writer identification in handwritten document images. Writer identification and HMM handwriting recognition each have their own applications: writer identification techniques can be applied to forensic signature verification, whereas handwriting recognition remains a challenging problem in the OCR community, with limited applications in constrained conditions and the potential to be applied to automatic transcription of unconstrained handwritten documents in the future. They are nevertheless two highly interactive problems. On the one hand, writer-dependent (WD) HMM systems are known to perform significantly better than writer-independent (WI) HMM systems. On the other hand, when labeled references are provided, grouping and aligning instances of the same character becomes more reliable than in unsupervised writer identification methods, and the writer identification error rate can be reduced.

Our idea is motivated by this interactivity between writer identification and handwriting recognition. Our HMM handwriting recognition system deploys multiple passes of OCR decoding for this purpose (Fig. 1). The input (handwritten documents) is first decoded with the WI HMM to create preliminary transcriptions, with character boundaries indicated by the optimal HMM state sequence. For each character, a writer label is given by a component classifier trained on all writers in the training data. Character-level decisions are further fused at each line or page of text, if applicable. Finally, the input is decoded with the WD HMM of the identified writer.

The following is an overview of related work in writer identification and OCR-related applications. In [1], texture features and k-nearest-neighbor classifiers are applied to writer identification in skew-corrected document images. In [1], image features extracted at macro and micro scales are investigated. The writer similarity score is computed using distance-based measures. With the success of HMMs in handwriting recognition, speaker recognition techniques such as GMM [3][4] and GMM-SVM [5] can also be applied to writer identification. In [6], an error rate of 1.5% is obtained for closed-set identification of over 650 writers from 1500 pages of the IAM data set [7]. Only a few efforts have been made to use writer identification to improve handwriting recognition, or to use handwriting recognition to improve writer identification. In [8], a GMM-based writer identification algorithm is applied to selecting WD models for keyword spotting. MAP adaptation is used to build the WD models.
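The multi-pass flow described above (WI decoding, character-level writer decisions, fusion per line or page, WD decoding) can be illustrated with a minimal sketch. The text does not say how character-level decisions are fused, so a simple plurality vote is assumed here. The function names `fuse_writer_votes` and `multipass_decode`, and the three callables passed in, are hypothetical placeholders rather than parts of the authors' system.

```python
from collections import Counter


def fuse_writer_votes(char_labels):
    """Fuse character-level writer labels into one decision by plurality vote.

    Assumption: plurality voting; the paper excerpt does not name the fusion rule.
    Ties go to the label that occurs first in the input (an arbitrary choice).
    """
    if not char_labels:
        raise ValueError("need at least one character-level decision")
    counts = Counter(char_labels)
    # max() returns the first maximal key; Counter keeps first-seen order
    return max(counts, key=counts.get)


def multipass_decode(line_image, decode_wi, classify_char, decode_wd):
    """Hypothetical pipeline; all three callables are placeholders."""
    # Pass 1: WI decoding gives (character, character_image) segments
    segments = decode_wi(line_image)
    # Character-level writer decisions, one per aligned character
    votes = [classify_char(ch, img) for ch, img in segments]
    writer = fuse_writer_votes(votes)
    # Final pass: decode again with the identified writer's WD model
    return writer, decode_wd(line_image, writer)


# Toy stand-ins so the sketch runs end to end (not real recognizers)
def toy_decode_wi(line_image):
    return [(ch, ch) for ch in line_image]


def toy_classify_char(ch, img):
    return "writer_A" if ch in "aeiou" else "writer_B"


def toy_decode_wd(line_image, writer):
    return f"{line_image} (decoded with {writer} model)"


writer, text = multipass_decode(
    "handwriting", toy_decode_wi, toy_classify_char, toy_decode_wd
)
print(writer, "|", text)
```

A weighted scheme, for example summing classifier confidences instead of counting labels, would fit the same interface; the vote is used here only because it is the simplest rule consistent with the description.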

In our prior work [9], we presented Arabic handwriting recognition using WD HMMs created by MAP adaptation, and described a text-independent writer identification algorithm to select WD codebooks. In this paper, we tested the GMM-based writer identification method using the handwriting recognition system based on WD HMM selection [9] and obtained a significant improvement in both writer identification and handwriting recognition performance, compared with the text-independent writer identification method described in [9]. We implemented the GMM-based writer identification method and showed its advantage over the global-features-based writer identification method [9]. However, the OCR-driven text-dependent writer identification algorithm described in this paper shows substantially lower identification error rates than both text-independent methods. The impact of incorrectly identified writers on writer-dependent OCR performance is investigated and shown by our experiments to be negligible.

Ⅱ. OCR-DRIVEN WRITER IDENTIFICATION AND WRITER-DEPENDENT OCR

A. OCR-driven Writer Identification

Our OCR-driven writer identification method is text-dependent, i.e., a reference transcription of the handwriting is required to perform identification. Since the transcribed reference is available only for our training data, we decode our test set and take the OCR hypothesis as the reference.

During training, given a line image from the training data and the corresponding reference transcription of the line, we first obtain the boundary of every character in the reference. This is done by finding the optimal state sequence for the WI HMM built from the reference using the Viterbi algorithm [12]. Then, we create a component writer classifier for each distinct character. Each classifier uses directional element features [13], computed from 16 non-overlapping bins (4 × 4) of each character image with the white space at the top and bottom removed, and a Support Vector Machine (SVM) with a radial basis function kernel. An SVM classifier solves only a two-class problem. For n classes, we need to build n(n-1)/2 binary classifiers and take the label that these classifiers output most often. Thus, the component classifier of a character …
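The one-against-one voting scheme mentioned above, with n(n-1)/2 binary classifiers and the most frequent label taken as the decision, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the binary decision rule is a toy nearest-centroid comparison on a single number standing in for an RBF-kernel SVM on directional element features, and the names `num_pairwise_classifiers`, `one_vs_one_predict`, and `toy_pairwise_decide` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def num_pairwise_classifiers(n):
    """Number of binary classifiers needed for n classes: n(n-1)/2."""
    return n * (n - 1) // 2


def one_vs_one_predict(classes, pairwise_decide, x):
    """Run every pairwise classifier on x and return the most-voted label.

    pairwise_decide(a, b, x) must return either a or b.
    Ties go to the label that received its first vote earliest.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_decide(a, b, x)] += 1
    return max(votes, key=votes.get)


# Toy stand-in for a trained binary SVM: pick the writer whose
# (made-up) one-dimensional centroid is closer to the sample.
CENTROIDS = {"w1": 0.0, "w2": 5.0, "w3": 10.0}


def toy_pairwise_decide(a, b, x):
    return a if abs(x - CENTROIDS[a]) <= abs(x - CENTROIDS[b]) else b


writers = list(CENTROIDS)
print(num_pairwise_classifiers(len(writers)))  # 3 pairwise classifiers
print(one_vs_one_predict(writers, toy_pairwise_decide, 4.2))  # w2
```

Because the number of binary classifiers grows quadratically with the number of writers, a component classifier over many writers requires many pairwise models per character, which is worth keeping in mind when reading the per-character design described above.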



