From CLS Wiki

Main: Dictionaries

I need help here. I hope someone can fill this in.

Existing dictionaries

In production


English       Citumbuka             Abbrev.
Noun          Zina                  Zin.
Pronoun       munjiramumalo         Munjir.
Adjective     Mulongosoli           Mulos.
Verb          Muyowoyi              Muy.
Adverb        Msazgirapo            Msa.
Conjunction   Mgumannyiski          Mgu.
Preposition   Mulinda               Mul.
Affix         mubatikira            Mub.
Ideophone     mupulikikwiro         Mup.
Interjection  Nchemerezgo           Nchem.
Prefix        mubatikiranyuma       Mubny.
Suffix        Mubatikiranthazi      Mubnth
Vowel         Lemba la lizgo        Lizg.
Consonants    Lemba lambula lizgo   Lamb.
Letter        Lemba                 Lemb.
Word          Mazgo                 Mazg.
Idiom         Nthalika              Nthal.


Style sheet

Below the surface, a dictionary is a complicated, carefully crafted piece of work. A user expects to find certain kinds of information in a certain place, and if the editors are inconsistent, this immediately becomes a source of irritation for the user.

Dictionary structure

The style sheet of a dictionary can be likened to a phrase structure grammar. Things occur in a certain order and context:

dictionary -> frontmatter body backmatter
frontmatter -> prose
body -> article+
backmatter -> tables, charts and illustrations
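
The rewrite rules above can be read almost directly as data types. The following is only a minimal sketch in Python, not a proposal for the actual software: the class names `Dictionary` and `Article` are made up for illustration, and the front matter text is a placeholder. The point is that `article+` (one or more) becomes a check that the body is not empty.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Article:
    # Placeholder: the fields of an article are discussed under "Article structure".
    headword: str


@dataclass
class Dictionary:
    frontmatter: str                                      # frontmatter -> prose
    body: List[Article]                                   # body -> article+
    backmatter: List[str] = field(default_factory=list)   # tables, charts and illustrations

    def __post_init__(self):
        # The "+" in "article+" means one or more, so an empty body is rejected.
        if not self.body:
            raise ValueError("body must contain at least one article")


d = Dictionary(
    frontmatter="(placeholder prose)",
    body=[Article("gondola"), Article("gondolosi")],
)
print(len(d.body))  # 2
```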

In the near future we will focus mainly on the article structure. I am assuming that the intended audience (the future readers) has already been identified and that we know what kind of dictionary we want to make: one for elementary, secondary or college use. I have assumed all along that the dictionaries will be for schools.

Article structure

The basic article consists of:

article -> headword/lemma pronunciation part-of-speech definition+

There are variants depending on which part of speech we are creating an article for. If it is a verb, it tends to look like this:

article -> headword/lemma pronunciation Ku-, pos/mne. definition+ ??

gondola /ttk/ Ku-, mne. onani tanthauzo la godomala. kunekwi.: gundula

That is, there is a Ku- after the pronunciation.

I wrote ?? at the end of the formalized representation of the structure because I do not know what that last element is. It almost looks like a spelling variant.

If the article presents a noun, we have:

article -> headword/lemma loan-word? pronunciation pos/dzi. Noun-Class (possible plural information)? (definition, synonym?, example*)+ spelling-variant? language-2*

gondolosi /kt/ dzi. U-Ma- (zambi. gondolosi) 1 chitsamba choyanga chokhala ndi mtsitsi wa mnofunofu wotsekemera ndi wonunkhira 2 mankhwala wopangidwa kuchokera ku chitsamba cha gondolosi wothandiza amuna kuti azikhala ndi ukala wambiri.

This time we have nothing between the pronunciation and the part of speech (pos) information, but we have grammatical information specific to nouns between the part of speech and the definition section. We also see that the plus-sign (+) after the definition accounts for one or more possible definitions.

The question mark (?) after the (possible plural information) signifies that the field is optional (zero or one). Note that a star (*) would be less appropriate since it means "zero or more" and we don't want more than one.
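
The three markers map naturally onto field types: `?` is a value that may be missing, `*` is a list that may be empty, and `+` is a list that must have at least one item. Here is a rough sketch in Python of the noun article, filled in with the gondolosi example. The class and field names (`NounArticle`, `Sense`, `noun_class` and so on) are my own invention for illustration and not an agreed set of categories.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sense:
    definition: str
    synonym: Optional[str] = None                          # synonym?
    examples: List[str] = field(default_factory=list)      # example*


@dataclass
class NounArticle:
    headword: str
    pronunciation: str
    noun_class: str
    senses: List[Sense]                                    # (...)+  one or more
    loan_word: Optional[str] = None                        # loan-word?
    plural: Optional[str] = None                           # (possible plural information)?
    spelling_variant: Optional[str] = None                 # spelling-variant?
    language_2: List[str] = field(default_factory=list)    # language-2*
    pos: str = "dzi."                                      # part-of-speech label used for nouns

    def __post_init__(self):
        # "+" means at least one sense; "?" and "*" fields need no check.
        if not self.senses:
            raise ValueError("a noun article needs at least one sense")


article = NounArticle(
    headword="gondolosi",
    pronunciation="kt",
    noun_class="U-Ma-",
    plural="gondolosi",
    senses=[
        Sense("chitsamba choyanga chokhala ndi mtsitsi wa mnofunofu wotsekemera ndi wonunkhira"),
        Sense("mankhwala wopangidwa kuchokera ku chitsamba cha gondolosi wothandiza amuna kuti azikhala ndi ukala wambiri"),
    ],
)
print(len(article.senses), article.spelling_variant)  # 2 None
```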

The structure can get a bit more complicated since we have homographs: words that are spelled the same but are regarded as separate headwords/lemmas. For instance:

guda1 /kt/ dzi. Li-Ma- (zambi. maguda) 1 chipanda kapena chikho chomwe anthu amatengera madzi paulendo wautali ndipo chimakhala ndi pakamwa papang'ono potseka ndi chivimbo. 2 onani tanthauzo 2 la gonero
guda2 mve. onani tanthauzo la gonkhonono.

There are two things I want to draw attention to here:

The index number first. This implies that you have a set of criteria that you can use to separate homonyms (homographs). It seems that the principle being used here is part of speech. That is a sound principle and can be applied objectively in most cases. If one were to use semantic criteria, it could get messy.

The second point is simply an illustration of why we want to use a specialized piece of editing software for dictionaries. It is tedious to count the senses by hand, and it is error prone: if you have four senses and you take out the second one, the numbering for 3 and 4 has to be bumped up to 2 and 3.

Those kinds of mechanical operations are best left to computer applications. That guarantees greater consistency.
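
To make both points concrete, here is a small sketch in Python of how a program could hand out the homograph index and the sense numbers at output time, so that nobody has to count by hand. The function names are hypothetical and the four sense texts are placeholders. I am also assuming that a single sense gets no number, which is what the sample articles appear to do.

```python
from collections import defaultdict


def number_homographs(entries):
    """Attach an index (1, 2, ...) to headwords that occur more than once.

    Each entry is a (headword, pos) pair. The entries are assumed to be
    split by part of speech already, as in the guda example.
    """
    totals = defaultdict(int)
    for headword, _pos in entries:
        totals[headword] += 1

    seen = defaultdict(int)
    labelled = []
    for headword, pos in entries:
        if totals[headword] > 1:
            seen[headword] += 1
            labelled.append((f"{headword}{seen[headword]}", pos))
        else:
            labelled.append((headword, pos))
    return labelled


def number_senses(definitions):
    """Prefix sense numbers at output time; a single sense gets no number."""
    if len(definitions) == 1:
        return list(definitions)
    return [f"{i} {text}" for i, text in enumerate(definitions, start=1)]


entries = [("gondolosi", "dzi."), ("guda", "dzi."), ("guda", "mve.")]
print(number_homographs(entries))
# [('gondolosi', 'dzi.'), ('guda1', 'dzi.'), ('guda2', 'mve.')]

senses = ["first", "second", "third", "fourth"]   # placeholder sense texts
del senses[1]                                     # take out the second sense
print(number_senses(senses))
# ['1 first', '2 third', '3 fourth']
```

Because the numbers are never stored with the text, removing or reordering a sense cannot leave a gap in the numbering.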

What we want to do is to define the categories. I have set out some of them above. Then we define how they should be represented in the final product separately. All we have to do is to place the right information in the right category and leave the formatting to a computer application.
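
As a rough illustration of that separation, the sketch below keeps the content of an article in named categories and keeps the presentation in a separate style definition. It is written in Python under my own assumptions: the category names and the `STYLE` list are hypothetical, and the homograph index is left out for brevity.

```python
# Hypothetical style definition: the order of the categories and how each is printed.
STYLE = [
    ("headword", "{}"),
    ("pronunciation", "/{}/"),
    ("pos", "{}"),
    ("noun_class", "{}"),
    ("plural", "(zambi. {})"),
]


def render(entry, style=STYLE):
    """Print only the categories that are filled in, in the order the style gives."""
    parts = [
        template.format(entry[category])
        for category, template in style
        if entry.get(category)
    ]
    return " ".join(parts)


# The editor only fills in categories; slashes and brackets come from STYLE.
entry = {
    "headword": "guda",
    "pronunciation": "kt",
    "pos": "dzi.",
    "noun_class": "Li-Ma-",
    "plural": "maguda",
}
print(render(entry))  # guda /kt/ dzi. Li-Ma- (zambi. maguda)
```

If we later decide that the plural should be printed differently, only the style definition changes and every article follows.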

I will get started on that as soon as we have our set of categories. This is top priority so that we can go forward in an orderly fashion.


Abbreviations are more than just that, short forms for words. They reflect the grammatical structure that you will be assigning to words. They also reflect the set of stylistic and usage labels that you can use. It is important to be consistent, and a table with the abbreviations that you have used will usually be found in the front matter of the dictionary as a reading aid for the user.

I have tried to assemble a "rough and ready" list of abbreviations that have been used in the Chichewa dictionary.
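
Such a list can also be used to check the articles mechanically. A minimal sketch in Python, assuming a hypothetical table and function name: only the two labels whose meaning is given on this page (dzi. for nouns, mne. for verbs) are filled in, so the check flags mve. as a label that still has to be added to the table.

```python
# Abbreviation table; only the two glosses stated on this page are filled in.
ABBREVIATIONS = {
    "dzi.": "noun",
    "mne.": "verb",
}


def unknown_labels(labels, table=ABBREVIATIONS):
    """Return the labels that are missing from the abbreviation table, sorted."""
    return sorted(set(labels) - set(table))


used = ["mne.", "dzi.", "dzi.", "mve."]   # pos labels from the sample articles
print(unknown_labels(used))               # ['mve.']
```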

Character and lemma styles

You will have to make a decision about how to enter the entries when it comes to capitalization.

Page last modified on November 28, 2007, at 12:35 PM