Analysing the Methods of Dzongkha Word Segmentation

Parshu Ram Dhungyel; Jānis Grundspeņķis

Applied Computer Systems 2017

In both Chinese and Dzongkha, the greatest challenge is identifying word boundaries, because neither language uses word delimiters as English and other Western languages do. Word segmentation is therefore the first preprocessing step in Dzongkha language processing tasks such as translation, spell-checking, and information retrieval. Chinese word segmentation has been studied for a long time and is relatively mature, whereas Dzongkha word segmentation has received far less attention from researchers. In this paper, we investigate this major problem in Dzongkha language processing using a probabilistic approach that selects valid segmentations, with probabilities computed from a corpus.
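
The abstract does not spell out the model, so the following is only a minimal sketch of what corpus-based probabilistic selection of a segmentation could look like, not the authors' implementation. It assumes a word lexicon, a small pre-segmented corpus, and a bigram model with add-one smoothing; the Latin-letter syllables and all names (`LEXICON`, `CORPUS`, `best_segmentation`, and so on) are hypothetical placeholders rather than real Dzongkha data.

```python
from collections import Counter
from math import log

START = ("<s>",)  # hypothetical sentence-start marker


def candidate_segmentations(syllables, lexicon, max_len=4):
    """Yield every split of the syllable sequence into lexicon words."""
    if not syllables:
        yield []
        return
    for k in range(1, min(max_len, len(syllables)) + 1):
        word = tuple(syllables[:k])
        if word in lexicon:
            for rest in candidate_segmentations(syllables[k:], lexicon, max_len):
                yield [word] + rest


def train_bigrams(corpus):
    """Count bigrams and their left contexts in a pre-segmented corpus."""
    contexts, bigrams = Counter(), Counter()
    for sentence in corpus:
        padded = [START] + sentence
        contexts.update(padded[:-1])
        bigrams.update(zip(padded, padded[1:]))
    return contexts, bigrams


def log_prob(segmentation, contexts, bigrams, vocab_size):
    """Log-probability of a segmentation under an add-one smoothed bigram model."""
    padded = [START] + segmentation
    return sum(
        log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )


def best_segmentation(syllables, lexicon, contexts, bigrams):
    """Return the highest-scoring candidate, or None if no split is possible."""
    candidates = list(candidate_segmentations(syllables, lexicon))
    if not candidates:
        return None
    vocab_size = len(lexicon)
    return max(candidates, key=lambda s: log_prob(s, contexts, bigrams, vocab_size))


# Toy data (assumption): words are tuples of placeholder syllables.
LEXICON = {("ka",), ("la",), ("mi",), ("ka", "la")}
CORPUS = [
    [("ka", "la"), ("mi",)],
    [("ka", "la"), ("mi",)],
    [("ka",), ("mi",)],
]
CONTEXTS, BIGRAMS = train_bigrams(CORPUS)
RESULT = best_segmentation(["ka", "la", "mi"], LEXICON, CONTEXTS, BIGRAMS)
```

On this toy corpus the two-syllable word followed by `("mi",)` is preferred over the three single-syllable split, because that sequence is better attested. Enumerating every candidate is exponential in the worst case, so a realistic system would more likely use dynamic programming over the same scores.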

Keywords
Dzongkha word segmentation, maximal matching, n-gram, natural language processing
DOI
10.1515/acss-2017-0008

Dhungyel, P., Grundspeņķis, J. Analysing the Methods of Dzongkha Word Segmentation. Applied Computer Systems, 2017, 21, pp.61-65. ISSN 2255-8683. e-ISSN 2255-8691. Available from: doi:10.1515/acss-2017-0008

Publication language
English (en)

Publication Type
Scientific article indexed in SCOPUS or WOS database
Funding for basic activity
Unknown
Field of research
2. Engineering and technology
Sub-field of research
2.2 Electrical engineering, Electronic engineering, Information and communication engineering
ID: 25601