Difference between revisions of "User:GregF"

From Rangjung Yeshe Wiki - Dharma Dictionary
{{UserWorkbench}}
Microstructure of the Lotsawa Workbench (i.e., the internal structure of each of its entries)
  
A. Entry identity
 
 
# Entry identity
 
Each entry of a dictionary must be identifiable, both by references from within the dictionary and by references from outside (e.g., from the accompanying grammar). If the lexical database is relational, each entry has an ID in the technical sense. However, such an ID is an arbitrary number and is normally of no use to human users. Instead, human users identify an entry by the following pieces of information:
 
 
<br /><br />1. Lemma
 
The lemma is given in standard orthographic representation. Sometimes, the lemma representation itself is used for additional purposes, e.g. to indicate stress, syllable boundaries or word-break points for hyphenation. However, the present comprehensive microstructure provides dedicated fields for such purposes.
The lemma is given in standard Wylie transliteration. The corresponding form in Unicode Tibetan script is generated automatically.
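How the Tibetan-script form is generated is not specified here. Purely as an illustration, the following minimal Python sketch converts simple Wylie syllables into Unicode Tibetan. It rests on strong assumptions: the function names are hypothetical, and it deliberately ignores stacked consonants (superscript and subscript letters, as in the "rgy" of rgyas), Sanskrit transliteration and punctuation, all of which a real converter must handle.

```python
# Minimal, hypothetical Wylie-to-Tibetan converter for UNSTACKED syllables only.
CONSONANTS = {
    "k": "\u0f40", "kh": "\u0f41", "g": "\u0f42", "ng": "\u0f44",
    "c": "\u0f45", "ch": "\u0f46", "j": "\u0f47", "ny": "\u0f49",
    "t": "\u0f4f", "th": "\u0f50", "d": "\u0f51", "n": "\u0f53",
    "p": "\u0f54", "ph": "\u0f55", "b": "\u0f56", "m": "\u0f58",
    "ts": "\u0f59", "tsh": "\u0f5a", "dz": "\u0f5b", "w": "\u0f5d",
    "zh": "\u0f5e", "z": "\u0f5f", "'": "\u0f60", "y": "\u0f61",
    "r": "\u0f62", "l": "\u0f63", "sh": "\u0f64", "s": "\u0f66",
    "h": "\u0f67",
}
VOWEL_SIGNS = {"a": "", "i": "\u0f72", "u": "\u0f74", "e": "\u0f7a", "o": "\u0f7c"}
A_CHEN = "\u0f68"  # carrier letter for a vowel with no consonant before it
TSHEG = "\u0f0b"   # intersyllabic dot


def convert_syllable(syllable: str) -> str:
    out = []
    i = 0
    while i < len(syllable):
        for size in (3, 2, 1):  # greedy longest match: tsh before ts before t
            chunk = syllable[i:i + size]
            if chunk in CONSONANTS:
                out.append(CONSONANTS[chunk])
                i += len(chunk)
                break
        else:  # no consonant matched, so this must be a vowel
            vowel = syllable[i]
            if vowel not in VOWEL_SIGNS:
                raise ValueError(f"unsupported character {vowel!r} in {syllable!r}")
            if not out:
                out.append(A_CHEN)
            out.append(VOWEL_SIGNS[vowel])  # the inherent "a" adds no sign
            i += 1
    return "".join(out)


def wylie_to_tibetan(wylie: str) -> str:
    """Convert space-separated Wylie syllables, joining them with a tsheg."""
    return TSHEG.join(convert_syllable(s) for s in wylie.split())
```

For example, `wylie_to_tibetan("chos")` returns ཆོས and `wylie_to_tibetan("dam pa")` returns དམ་པ. Input such as "gya" would wrongly yield two unstacked letters, which is exactly the limitation stated above.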
If the lemma is a segmental sign, but not an independent word form, it is flanked by a hyphen (or two hyphens) at the side where it is bound, like English cran- and -ize. If it is a morphonological process or a suprasegmental sign, such as German metaphony marking plural or Yucatec Maya high tone marking deagentive formation, a suitable notation has to be developed. Such conventions are explained in the appropriate main section of the dictionary.
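The hyphen convention for segmental bound forms can be checked mechanically, for instance to separate bound from free forms during data entry. A minimal sketch with hypothetical helper names; morphonological processes and suprasegmental signs are not covered, since their notation is still to be developed.

```python
def boundness(lemma: str) -> str:
    """Classify a lemma by the hyphen convention (hypothetical helper name)."""
    bound_left = lemma.startswith("-")   # e.g. -ize attaches to a preceding stem
    bound_right = lemma.endswith("-")    # e.g. cran- attaches to a following stem
    if bound_left and bound_right:
        return "bound on both sides"
    if bound_left:
        return "bound on the left"
    if bound_right:
        return "bound on the right"
    return "free"


def bare_form(lemma: str) -> str:
    """Strip the boundary hyphens, e.g. for sorting or display."""
    return lemma.strip("-")
```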
<br /><br />2. Homonym/Homophone number
##2. Homonym number

Homonyms and homophones are separate entries distinguished by numbers. Homonyms have the same spelling and the same pronunciation but different meanings. Homophones have the same pronunciation and different meanings, and may also differ in spelling. The distinction is useful wherever disambiguation matters, and it is particularly helpful for interpreting misspelled words.
Homonyms are, of course, separate entries distinguished by numbers; see the separate section for details. The same goes for the readings of a polysemous entry; see the next item.
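Homonym numbers can be assigned mechanically once entries sharing a spelling (or, for homophones, a pronunciation) are grouped. The following Python sketch assumes that entries are plain dictionaries with hypothetical field names. It numbers entries in the order given and leaves unique forms unnumbered.

```python
from collections import Counter, defaultdict


def number_homonyms(entries, key="lemma"):
    """Return copies of the entry dicts with a 'homonym_no' field added.

    Entries sharing the same value under `key` are numbered 1, 2, ... in the
    order given; unique forms get None. Use key="lemma" for homonyms, or a
    pronunciation field for homophones (field names are hypothetical).
    """
    totals = Counter(entry[key] for entry in entries)
    running = defaultdict(int)
    numbered = []
    for entry in entries:
        entry = dict(entry)  # leave the caller's records untouched
        form = entry[key]
        if totals[form] > 1:
            running[form] += 1
            entry["homonym_no"] = running[form]
        else:
            entry["homonym_no"] = None
        numbered.append(entry)
    return numbered
```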
<br /><br />3. Sense number
##3. Sense number

This field documents each reading of a polysemous item. These readings are numbered from 1 (mother entry) to n.
As shown in the section on lexical relations, a relational lexical database provides an elegant way both of keeping the distinction between homonymy and polysemy flexible and of providing its own partial microstructure for each reading of a polysemous item. These readings are numbered from 0 (mother entry) to n.

<br /><br />4. Citation form
##4. Citation form

The lemma is usually given in a generic form that conforms to the usage of the textual corpus from which it is drawn. Most of the time, the content of field 4 is identical to the content of field 1. This field is used to cite the generic form of inflected lemmas or of dependent forms (prefixes, suffixes, roots, stems, generic grammatical particles).
Normally, the lemma should be given in the citation form that is traditional in the speech community. In that case, the content of this field is identical to the content of field #1. However, the dictionary also contains lemmas, especially dependent forms (roots, stems, affixes ...), that non-linguists would not cite in their naked form. In such cases, the field ‘citation form’ must be filled in. For instance, Yucatec Maya verb roots are never cited as such, but with a default stem extension: lemma kul- (‘sit’), citation form kutal. This field is also the reference point for field #6 below.
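The relation between fields 1 and 4 amounts to a simple fallback: field 4 is used when it is filled in, and the lemma is used otherwise. A sketch with a hypothetical class name, showing only these two fields:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Only the two fields involved here; the class name is hypothetical."""
    lemma: str                           # field 1
    citation_form: Optional[str] = None  # field 4, filled in only when it differs

    def cited_as(self) -> str:
        """The form shown to users: field 4 if filled in, otherwise the lemma."""
        return self.citation_form or self.lemma
```

With the Yucatec Maya example, `Entry("kul-", "kutal").cited_as()` gives kutal, while an entry with an empty field 4 is cited by its lemma.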
<br /><br />B. Expression
#B. Expression

The expression of a lemma can be a phonological or a graphemic representation.
In every natural language that is at all represented in a dictionary, an expression has at least two significantia, a phonological and a graphemic one.

<br /><br />5. Phonological representation
##5. Phonological representation

The phonological representation is given in IPA. This field provides valuable information when the phonological representation is not derivable by rule from the orthographic representation.
The phonological representation is given in IPA. If the citation form differs from the lemma, this field may refer to either of them. In practice, filling in this field may be limited to such cases where the phonological representation is not derivable by rule from the orthographic representation.
<br /><br />6. Sound
##6. Sound

This field is meant to provide a link to a sound file.
This is a link to a sound file. There are at least two possibilities here:

<br /><br />7. Phonological variants
###1. There is a separate sound file for each lemma. This presupposes that there is a representation – a citation form – of the lemma that may be pronounced naturally in isolation. Consequently, what is pronounced here is the content of field #4.

This field accounts for idiosyncratic phonological variation and possible allophones.
###2. If there is no such sound file for each lemma, this field may contain one or more pointers to text recordings. Technically, such a pointer identifies a sound file and specifies a start point and an end point contained in that file, in milliseconds. It will typically be a sound file associated with one of the examples (see #31 below).
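The second option describes a pointer into a recording. One possible representation, assuming millisecond offsets as stated above (the class and field names are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoundPointer:
    """A segment of a text recording; class and field names are hypothetical."""
    sound_file: str
    start_ms: int  # start point, in milliseconds
    end_ms: int    # end point, in milliseconds

    def __post_init__(self):
        if not 0 <= self.start_ms < self.end_ms:
            raise ValueError("expected 0 <= start_ms < end_ms")

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms

    def as_seconds(self) -> tuple:
        """(start, end) in seconds, a unit many audio players expect."""
        return (self.start_ms / 1000, self.end_ms / 1000)
```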
<br /><br />8. Orthographic variants
##7. Phonological variants

The standard graphemic representation serves as lemma. This field is used to document alternative spellings resulting from earlier or unstandardized usage.
This field accounts for idiosyncratic phonological variation. For instance, the first phoneme of economic may be /ɛ/ or /i/.

<br /><br />C. Language variety
Other variants are derivable by phonological rule. For instance, the rule of syncope in German predicts that if there is a lemma Wanderer, there will be a variant Wandrer. This rule is stated in the phonology section of the dictionary and thus makes it superfluous to enumerate all the variants it generates.
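As an illustration of a variant-generating rule, the following sketch encodes only the Wanderer/Wandrer case as a regular expression. It is a deliberate simplification and not a full statement of German syncope, which also depends on stress and morphology. The function name is hypothetical.

```python
import re
from typing import Optional

# Simplified pattern: drop the "e" of a word-final "-erer" (as in Wanderer).
# Stress and morphological boundaries are NOT checked.
SYNCOPE = re.compile(r"e(r)(er)$")


def syncope_variant(lemma: str) -> Optional[str]:
    """Return the syncopated variant, or None if the pattern does not apply."""
    variant = SYNCOPE.sub(r"\1\2", lemma)
    return variant if variant != lemma else None
```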
<br /><br />9. Dialect
##8. Orthographic variants

This field is used to record forms that do not belong to standard Tibetan as found in the corpus of Buddhist literature or Central Tibetan literature. By default, the value of this field is ‘standard’.
The standard graphemic representation already serves as lemma. This field is needed for alternative spellings found in the corpus. For example, the lemma encyclopedia contains encyclopaedia in this field. The less standardized the language, the more variation there is in the available texts (e.g. of earlier periods); and the fewer dictionaries there are for the language, the more important it becomes to display the variants in this field.
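One practical use of recorded orthographic variants is redirecting a lookup by a variant spelling to its lemma. A minimal sketch, assuming a hypothetical in-memory layout in place of the database:

```python
def build_variant_index(variants_by_lemma):
    """Map each lemma and each of its orthographic variants to the lemma.

    `variants_by_lemma` maps a lemma to a list of variant spellings; this
    layout is a hypothetical stand-in for the database.
    """
    index = {}
    for lemma, variants in variants_by_lemma.items():
        index[lemma] = lemma  # a lemma always points to itself
        for variant in variants:
            index.setdefault(variant, lemma)  # first lemma wins if a variant recurs
    return index
```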
<br /><br />10. Sociolect
Again, orthographic variants that are based on regular phonological variants need not be noted individually.

This field can be used to indicate that the lemma is used by specific groups (e.g., age groups), Buddhist traditions, or professions.
#C. Language variety

<br /><br />11. Style
The language that constitutes the object of the description, of which the dictionary is part, determines the corpus set up for the research. It must be defined in the introduction to the dictionary. The definition will necessarily, and explicitly, exclude certain kinds of linguistic variation (cf. the section on variation). For instance, diachronic variation may be limited to one of the stages traditionally defined for the language. All variation not explicitly excluded will be represented in the corpus and must be categorized. This information is also called diasystematic marking.

This field documents style, register, connotations and any kind of pragmatic information. Relevant values include ‘honorific’, ‘formal’, ‘vulgar’.
The possible contents of the following fields may be defined as range sets.
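Range sets can be enforced at data entry. The following Python sketch uses enumerations. The style values are those listed above. The class names, and the restriction of the dialect set to its default value ‘standard’, are assumptions for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Dialect(Enum):
    STANDARD = "standard"  # default; further values to be defined for the corpus


class Style(Enum):
    HONORIFIC = "honorific"
    FORMAL = "formal"
    VULGAR = "vulgar"


@dataclass
class VarietyMarking:
    """Dialect and style fields only; names are hypothetical."""
    dialect: Dialect = Dialect.STANDARD
    style: Optional[Style] = None


def parse_value(range_set, value: str):
    """Accept only values from the range set; Enum raises ValueError otherwise."""
    return range_set(value)
```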
<br /><br />12. Stage – Time period
##9. Dialect

This field is mainly used for archaic language.
Assuming that the lexicon is not confined to one dialect, the dialect to which the lemma belongs is indicated here. One possible value of this field is ‘standard’.

<br /><br />D. Structure
##10. Sociolect

<br /><br />13. Tibetan syntactic category
Again, this will usually be marked only if the lexical item is special. Relevant values include particular age groups or professions.

This field contains information about the Tibetan syntactic category under which the lemma is classified.
##11. Style

<br /><br />14. Syntactic and morphological category
This concerns style, register, connotations and any kind of pragmatic information. The same restrictions as in the previous fields apply. Relevant values include ‘ritual’, ‘formal’, ‘vulgar’. Cf. Wahrig 1973, ch. 4.
This field specifies the part of speech or a subcategory of it, e.g. noun, proper noun, transitive verb, etc. The taxonomy comprises the standard syntactic categories.
##12. Stage

N.B. A lemma that belongs to several syntactic categories is considered polysemous. Each category then constitutes a separate record.
A lexicon is usually confined to one stage of a language. Other stages may enter in two ways:

The morphological categories to be specified here include noun class, gender, possessive class, verbal voice, inflection class. An inflected word (usually a verb) may fall into several morphological categories at once, e.g. voice x, conjugation class y. Some may be syntactically relevant lexical classes such as the gender of a noun, others may be purely morphological classes such as inflection classes.
• If the text corpus includes older texts, it may feature obsolete items.

<br /><br />15. Morphological structure
• Because of the presence of diachrony in synchrony, some elements of the inventory of a given state are archaic, others are current, others are fashionable.

This field contains the constituents of the lemma. In the case of a nominal compound, they are the component stems. In the case of a derivative (e.g., verbal forms), they are a stem and a derived form. Since the items listed here can be identical to lemmas of the database, they should preferably be given as hyperlinks.
#D. Structure

<br /><br />16. Word formation
This section of the microstructure concerns both the internal structure of the stem representing the lexeme (its inflectional and derivational morphology) and its distribution in syntax and phraseology. The concepts used are those introduced systematically in the grammar coupled with the lexicon; they will appear in print in the grammar section of the published dictionary. Whenever such a term appears in a lexical entry, reference to that grammar (section) is implied. Cf. Hausmann 1977, ch. 6, Wahrig 1973, ch. 2.
This field documents the word-formation process at the origin of the lemma stem, e.g. reduplication, bahuvrīhi, causative, denominal, intensive, etc.
##13. Proper name

<br /><br />17. Derivatives
What is meant here is the proper name of a grammatical morpheme. For instance, the proper name of the English suffix -ize is ‘verbalizer’, and the proper name of English 's is ‘(Saxon) genitive’. Consequently, the possible contents of this field are unique (i.e. there is no range set), and only a portion of the entries of the lexical database will be specified for this field, viz. the grammatical formatives.

In this field, lemmas that have the current lemma in their field 15 are automatically referenced.
##14. Syntactic category

<br /><br />18. Construction
From among the grammatical categories of a lexical item, this field is dedicated to its syntactic category qua distributional category (for morphological categories see #18). This is understood as a narrow subcategory of a part of speech, e.g. ‘proper noun’, 'transitive verb with additional prepositional complement'. The taxonomy implied here will be explained in the grammar.

This field contains information about the syntactic and semantic construction frame. For example, the different complement constructions of a verb can be documented here. This field specifies the information contained in field 14 in more detail.
A lemma which belongs to diverse syntactic categories is considered polysemous. Each category then constitutes a record.

<br /><br />19. Phraseology
##15. Morphological structure

This field lists relevant collocations in which the lemma is found. These include fixed expressions, such as technical terms, common phrases, idioms, and proverbs. If these phrases have lemma status, a link will be automatically generated.
This field contains the immediate constituents of the lemma stem; as long as binarism obtains, there are two of them. In the case of a compound, they are two stems; in the case of a derivative, they are a stem and some derivational operator which may or may not be segmental. The items listed there are identical to certain lemmas of the database.

<br /><br />E. Meaning
Database solution

<br /><br />20. Meaning definitions
relational: set up a cross-table for derivational relations (column 1: ID of complex stem; column 2: ID of first constituent; column 3: ID of second constituent)

This field contains the lexicographer's definition of the lemma.
free field structure: hyperlinks from the constituents to their target records
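The relational solution for fields 15 and 17 can be sketched with SQLite. SQLite is chosen here only as an example DBMS, and the table and column names are hypothetical. The same three-column cross-table answers both questions: which constituents a complex stem has (field 15), and which complex stems a lemma enters into (field 17). The sample data reuse the Yucatec Maya items kim, -s and kims discussed under the gloss field.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE entry (id INTEGER PRIMARY KEY, lemma TEXT NOT NULL);
-- column 1: complex stem; columns 2 and 3: its immediate constituents
CREATE TABLE derivation (
    complex_id INTEGER NOT NULL REFERENCES entry(id),
    first_id   INTEGER NOT NULL REFERENCES entry(id),
    second_id  INTEGER NOT NULL REFERENCES entry(id)
);
INSERT INTO entry VALUES (1, 'kim'), (2, '-s'), (3, 'kims');
INSERT INTO derivation VALUES (3, 1, 2);
""")


def constituents(entry_id):
    """Field 15: the immediate constituents of a complex stem (None if simple)."""
    return con.execute("""
        SELECT f.lemma, s.lemma
        FROM derivation d
        JOIN entry f ON f.id = d.first_id
        JOIN entry s ON s.id = d.second_id
        WHERE d.complex_id = ?""", (entry_id,)).fetchone()


def derivatives(entry_id):
    """Field 17: the complex stems that have this entry among their constituents."""
    rows = con.execute("""
        SELECT c.lemma
        FROM derivation d
        JOIN entry c ON c.id = d.complex_id
        WHERE ? IN (d.first_id, d.second_id)
        ORDER BY c.lemma""", (entry_id,))
    return [lemma for (lemma,) in rows]
```

Because both fields are read from one table, the mutual references between fields 15 and 17 cannot drift apart.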
N.B. In the case of polysemous lexemes, each sense of the lemma has only one definition. Each sense must therefore be documented separately, following the procedure described here.
##16. Word formation

<br /><br />21. Semantic classes
This field contains the technical term for the word-formation process that formed the lemma stem, e.g. bahuvrihi, causative, denominal, deverbal, intensive etc. Possible entries in this field are taken from a range set defined in the grammar, where the word-formation processes of the language are dealt with systematically.

Each lexical item (at least each one with a lexical meaning) belongs to one or more semantic classes. For instance, bear is an animal, anger is an emotion, bsang mchod is a ritual, etc. Even a simple word with a single sense may belong to several semantic classes.
In this field, the last word formation process applied is indicated, i.e. the process which was applied to the components of field #15 to form the stem of the lemma. In the case of a derivationally complex lemma, other word formation processes may have created stems that are part of it, in particular those of field #15. Such processes are not indicated here, since they may be seen by following the links of the latter field.

<br /><br />22. Semantic relations
##17. Derivatives

This field informs the user about mutual lexical relations to other lemmas that have the current lemma in the corresponding field.
This field references the set of lemmas that have the current lemma in their field #15. Thus, references between the current field and that field are mutual.

Common relations include:
Database solution

*synonymy,
relational: the cross-table already mentioned for field 15 provides the content both for that field and for the present field

*hyponymy/hyperonymy,
free field structure: a series of hyperlinks (converse to the ones of #15) to each of the derivative lemmas

*cohyponymy: antonymy, converse relation, minimal contrast,
An alternative would be not to have entries for productively derived stems in the dictionary. Then the present field would contain the derivation schemata (of #16) which are applicable to the lemma stem.

*part-whole relation.
##18. Morphological categories

<br /><br />23. Encyclopedic information
The nature of the inflectional categories to be specified here depends on the language. Examples are noun class, gender, possessive class, verbal voice, inflection class. An inflecting word of a language may fall into diverse morphological categories at once, e.g. voice x, conjugation class y. Some may be syntactically relevant lexical classes such as the gender of a noun, others may be purely morphological classes such as inflection classes. It is practical to set up a separate field for each of these categories.

This field contains information on the concept communicated by the lemma, for example, in relation to a doctrinal, ritual, or cultural background.
##19. Irregular inflection

<br /><br />24. Picture
If the stem has inflected forms not derivable by rules pertaining to its inflection class, those forms are listed here. They may be both stem allomorphs such as worse, appearing in this field of the lemma bad, and irregular forms of the inflection paradigm, e.g. oxen appearing in this field of the lemma ox.
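The division of labour between the inflection class and this field can be sketched as a lookup that consults the listed irregular forms before applying the regular rule. The regular English plural rule below is simplified, and the names are hypothetical:

```python
# Hypothetical extract of the irregular-inflection field: forms not derivable by rule.
IRREGULAR_PLURALS = {"ox": "oxen"}


def plural(noun: str) -> str:
    """Return a listed irregular plural if there is one, else apply the regular rule.

    The regular rule here is a simplified English one, for illustration only.
    """
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    return noun + "s"
```

Without the listed form, ox would fall under the same rule as fox and yield oxes. This is why the field is needed.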
This field includes visual information about the lemma.
##20. Construction

<br /><br />F. Genetic-historical information
This field contains the syntactic and semantic construction frame (for a verb: its valency frame), including selection restrictions. A case in point are different constructions of complement for complement-taking verbs. This is a specification of the information contained in #14. It should be represented by a formal notation, e.g. [ ~ X ]Y, where ~ indicates the position of the lemma, X represents relevant syntactic constituents or properties of the context, and Y is the syntactic category of the construction.

<br /><br />25. Origin and cognates
##21. Phraseology

This field is used to document loans from other languages as well as morphologically or semantically related words from genetically related languages.
This field lists collocations in which the lemma is involved. These may be any kind of fixed expressions, including phrases, idioms and proverbs (cf. Bergenholtz & Tarp, ch. 7.2). If one has decided to confer lemma status on such complex expressions, then this field contains links to such lemmas.

<br /><br />26. Etymology
#E. Meaning

<br /><br />27. Examples
##22. Meaning definitions

A dictionary example illustrates a specific sense or construction. Ideally, each word sense should be illustrated with a relevant example, particularly in the case of non-generic lemmas, such as technical terms, idiomatic usage, etc.
Semantic information on the lemma is provided in different languages and from different points of view. The sum of the information contained in this subset of fields is highly redundant. It is, however, useful for different kinds of dictionaries to be output from the database.
A dictionary example has two main functions:
The basis of the methodology of dictionary definitions is the logic and methodology of the definition in general. See the website devoted to this topic. For the present purpose, the meaning is specified in plain prose. What is easily formalizable about it is relegated to other fields of the lexical entry, in particular 24 and 25.

#It provides referenced evidence for the dictionary entry with regard to the semantic definition, the grammatical categorization, the stylistic marking, etc.
In specifying properties of an argument of a relational lexeme, care must be taken to distinguish selection restrictions for a possible argument from semantic features of the lexeme itself. For instance:

#It facilitates the understanding of the described lemma in the user’s texts.
a. anziehen (tr.vb.) put on [clothes]

As a consequence, a dictionary example should be typical:
b. sich anziehen (intr.vb.) put on clothes

*It illustrates exactly the described sense or construction of a lemma,
In the German example a), ‘[clothes]’ represents a selection restriction concerning the direct object of a transitive verb. In example b) instead, ‘clothes’ represents part of the meaning of an intransitive verb. If the brackets were missing in a), it would not be clear that ‘clothes’ does represent a selection restriction on a direct object.

*It represents a common collocation,
Native definition

*It is as simple and short as possible, without any unnecessary grammatical/semantic/stylistic complications.
This field contains a specification of the meaning of the lemma in plain prose of the same language, as would be the case in a monolingual dictionary.
<br /><br />G. Methodology
User language definitions

<br /><br />28. Bibliographical references
The following fields specify the meaning in a set of languages that may be relevant in the working context. These may include the following languages:

Information on the lemma may be included in published sources, primary and secondary. In some specific cases (e.g., technical terms, specialized terminology, etc.), it can be useful to list bibliographical references to further document a specific sense of the lemma.
1. English, because that is the general lingua franca in which either the entire dictionary or, at least, extracts from it may be published;

<br /><br />29. Comment
2. the regional lingua franca, i.e. the language in which the linguist begins his fieldwork and in which the dictionary may be published, too;

This field contains any additional information about the lemma.
3. the native language of the linguist, because that is the language he fully controls.

<br /><br />30. Problems
To the second of these fields may be added another one, called ‘native translation’, which contains literally the explanation that the informant gave. This differs from field #22.2, which generally contains the lexicographer's definition.

This field is used to list questions that need to be addressed in future lexicographic work. The information provided by answers and subsequent investigations can be integrated into the knowledge base or noted in the Comment field.
Since polysemous lexemes are split up over so many database records (see the relevant discussion), each lemma has only one sense or meaning. There is, thus, no necessity to provide for a special substructure of these fields.

<br /><br />31. Date
##23. Gloss
 
Whenever a token of the lemma stem appears in a text provided with an interlinear morphological gloss, the gloss is the same for all tokens of that type. This is achieved by retrieving it from the lexicon.¹ In principle, the gloss is provided in the same user languages as before. However, since only linguists are interested in interlinear glosses, an English gloss may suffice.
 
Care must be taken concerning the relation of the gloss to the lexical item. An item of an interlinear gloss corresponds to an item in the text that is taken as a whole, i.e. not analyzed morphologically; minimally, this is a morph. Whenever the lemma of a lexical entry consists of a morpheme, no problem arises for the gloss field. The same holds, in principle, if the lemma is a complex stem (which is not analyzed in the lemma field itself; see below). Problems do arise if the lemma is an inflected (citation) form, because it then contains inflectional morphemes besides the stem. The gloss, however, must render the stem.
 
• If the lexicon makes the distinction between lemma (field 1) and citation form (#4), this problem does not arise, since lemmata then are stems, and citation forms are taken care of in field 4.
 
• If, for reasons of traditional lexicographic conventions for that language, there are inflected lemmata, then there should be an additional field (‘stem’ or ‘base’) which represents that morphological entity that the gloss corresponds to. To illustrate from a German lexicon:
 
{| class="wikitable"
| lemma || werden
|-
| stem || werd
|-
| gloss || become
|}
 
Moreover, the morphological gloss does not need to have a morphological substructure. This is important for morphologically complex stems. For instance, the Yucatec Maya lexicon contains the following three items:
 
{| class="wikitable"
! lemma !! gloss
|-
| kim || die
|-
| -s || -CAUS
|-
| kims || kill
|}
 
The granularity of the glosses in a text is adjusted to the granularity of the morphological analysis executed there. That means:
 
• If the form of the verb kims occurring in the text is parsed as follows: kim-s, then a gloss will be retrieved from the lexicon for each of the morphs shown, i.e. the gloss will be ‘die-CAUS’.
 
• If the form kims is not parsed in the text, it will be looked up as a whole in the lexicon, and its gloss will be ‘kill’.
 
Thus, the granularity of the gloss is not decided separately, but is a consequence of the granularity of the morphological analysis applied to the text. This, in turn, depends on the specific purpose pursued with the gloss (see Lehmann 2004, section 3.7). It is therefore inappropriate to provide a gloss ‘die:CAUS’ for the lexical entry kims.
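The glossing principle just described can be sketched as follows. Glosses are always retrieved from the lexicon, and their granularity follows the segmentation found in the text. The lexicon content repeats the Yucatec Maya example. The function name is hypothetical, and only suffixes (not prefixes) are handled.

```python
# Lexicon extract from the Yucatec Maya example above: lemma -> gloss.
LEXICON = {"kim": "die", "-s": "-CAUS", "kims": "kill"}


def gloss(text_form: str) -> str:
    """Gloss a word form at the granularity of its segmentation in the text.

    'kim-s' (segmented) is glossed morph by morph; 'kims' (unsegmented) is
    looked up as a whole. Only suffixes are handled; unknown forms raise KeyError.
    """
    morphs = text_form.split("-")
    # Suffixes are stored with their leading hyphen, so restore it for the lookup.
    keys = [morphs[0]] + ["-" + m for m in morphs[1:]]
    return "".join(LEXICON[k] for k in keys)
```

Thus `gloss("kim-s")` yields die-CAUS and `gloss("kims")` yields kill. No gloss such as die:CAUS is ever stored for kims.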
 
##24. Semantic classes
 
Each lexical item – at least those with a lexical meaning – belongs to one or more semantic classes. For instance, leg is a body part, spider is an arachnid, laugh is an expression of emotion. This classificatory information is implicit or even explicit in a good definition (field #22). However, it is highly useful to specify it in a separate field:
 
• While working on the lexical database, one may select all the entries of a given semantic class and compare them. This is the most efficient way to check the lexical database for consistency.
 
• One may output partial or comprehensive onomasiological dictionaries from the database.
 
• Depending on the language, some of these classes may be grammatically relevant and, thus, reappear in the grammar.
 
The semantic classes form a range set. A proposal for a practical set of semantic classes applicable to many languages is on a separate page.
 
Even monosemous lemmas may belong to more than one semantic class. For instance, apple is both a plant part and a food (and so are all the fruits).
 
Database solution
 
relational: a table of semantic categories and a cross-table connecting lemmas with such categories

free field structure: optionally more than one instance of this field per record
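A sketch of the relational solution, using SQLite only as an example DBMS. The table names are hypothetical. The sample data come from the examples above, plus bread as an assumed extra food item. The second query supports the consistency check described above, namely selecting all entries of one semantic class.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE entry (id INTEGER PRIMARY KEY, lemma TEXT NOT NULL);
CREATE TABLE semantic_class (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
-- cross-table: one row per (lemma, class) pair, so a lemma may have several classes
CREATE TABLE entry_class (
    entry_id INTEGER REFERENCES entry(id),
    class_id INTEGER REFERENCES semantic_class(id),
    PRIMARY KEY (entry_id, class_id)
);
INSERT INTO entry VALUES (1, 'apple'), (2, 'leg'), (3, 'bread');
INSERT INTO semantic_class VALUES (1, 'plant part'), (2, 'food'), (3, 'body part');
INSERT INTO entry_class VALUES (1, 1), (1, 2), (2, 3), (3, 2);
""")


def classes_of(lemma):
    """All semantic classes of a lemma."""
    rows = con.execute("""
        SELECT c.name FROM entry e
        JOIN entry_class ec ON ec.entry_id = e.id
        JOIN semantic_class c ON c.id = ec.class_id
        WHERE e.lemma = ? ORDER BY c.name""", (lemma,))
    return [name for (name,) in rows]


def members_of(class_name):
    """All entries of one class, e.g. to compare them for consistency."""
    rows = con.execute("""
        SELECT e.lemma FROM semantic_class c
        JOIN entry_class ec ON ec.class_id = c.id
        JOIN entry e ON e.id = ec.entry_id
        WHERE c.name = ? ORDER BY e.lemma""", (class_name,))
    return [lemma for (lemma,) in rows]
```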
 
##25. Semantic relations
 
This field contains paradigmatic lexical relations to other lemmas which have the current lemma in the corresponding field. These relations are, therefore, mutual. The technical implementation is as above for the derivational relations (field #17).
 
Relevant relations include:
 
• synonymy,
 
• hyponymy/hyperonymy,
 
• cohyponymy: antonymy, complementarity (contradictory contrast), converse relation, minimal contrast,
 
• part-whole relation.
 
See the detailed treatment of semantic relations in lexicography.
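Because these relations are mutual, one convenient implementation stores each relation together with its converse. A minimal in-memory sketch; the names are hypothetical, and only some of the relations listed above are covered.

```python
# Converse of each relation; symmetric relations are their own converse.
CONVERSE = {
    "synonym": "synonym",
    "antonym": "antonym",
    "hyponym": "hyperonym",
    "hyperonym": "hyponym",
    "part": "whole",
    "whole": "part",
}

relations = set()  # triples (a, relation, b), read as "a is a <relation> of b"


def relate(a: str, relation: str, b: str) -> None:
    """Record a relation together with its converse, so that it is mutual."""
    relations.add((a, relation, b))
    relations.add((b, CONVERSE[relation], a))


def related(lemma: str, relation: str) -> list:
    """All lemmas b such that `lemma` is a <relation> of b."""
    return sorted(b for a, r, b in relations if a == lemma and r == relation)
```

After `relate("apple", "hyponym", "fruit")`, the entry fruit lists apple among its hyponyms without a second manual entry.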
 
##26. Encyclopedic information
 
The content of this field goes beyond linguistic semantics, giving information on real-world, especially culture-specific properties of concepts designated. This field may refer to the pertinent section of the 'Situation of the language', esp. the ethnographic situation, for background information.
 
##27. Picture
 
The field contains a link to an illustrative image file, which may be shown automatically or upon a mouse click. Naturally, this will be relevant in connection with the previous field.
 
#F. Genetic-historical information
 
The following set of fields contains information on historical and genetic relations of the lemma.
 
##28. Origin
 
This field is relevant for loans. Its content is the name of a language. This should be taken from a range set.
 
##29. Etymology
 
This field contains information on the etymology of the lemma, like an etymological dictionary. Naturally, this is subservient to information on word formation, which is contained in field #15.
 
##30. Cognates
 
This field contains formally or semantically related words from genetically related languages. Apart from their intrinsic interest, they are often methodologically useful since they may help identify the basic meaning of a lexeme.
 
##31. Examples
 
An example in a dictionary entry illustrates a specific sense or construction. This local relevance of the example contributes to the reasons for apportioning a polysemous or syntactically heterogeneous item to several records.
 
An example in a dictionary entry has a double functionality:
 
1. It serves as documentary evidence for the descriptive statements (the semantic definition, the grammatical categorization, the stylistic marking etc.).
 
2. It helps the dictionary user in employing/understanding the lemma in his texts.
 
For both purposes, it is convenient to select typical examples. A typical example:
 
• illustrates exactly the local sense or construction in an entry,
 
• represents a frequent collocation,
 
• is as simple as possible, i.e. contains no unnecessary grammatical/semantic/stylistic complications.
 
In their function #1, examples must be drawn from the corpus. However, the corpus does not always contain suitable (typical) examples. Examples that only have function #2 may be concocted by the lexicographer. The safest way of doing this is by simplifying a corpus example.
 
The structure of a dictionary example is as follows:
 
an expression containing some form of the lemma, represented by a tilde (~), followed by its translation into a background language and by the source of the example in the corpus (in parentheses)
 
In a monolingual dictionary, the translation is missing.
 
Each lexical entry (record) has a set of examples. The examples are not given literally, but represented by record IDs of the text corpus.
 
Database solution
 
relational: a cross-table linking lemma IDs with IDs of records of the corpus

free field structure: a set of instances of the field ‘examples’ per record, each containing a hyperlink to a record of the corpus
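The example structure and the reference by record ID can be combined as follows. The record layout, the ID format and the rendering function are hypothetical. Forms of the lemma are matched as whole words and replaced by the tilde.

```python
import re
from dataclasses import dataclass


@dataclass
class CorpusRecord:
    """A corpus record referenced by ID; the layout is hypothetical."""
    record_id: str
    text: str
    translation: str


def render_example(record, lemma_forms, monolingual=False):
    """Replace each form of the lemma by '~' and append translation and source."""
    # Longest forms first, matched as whole words only.
    forms = sorted(lemma_forms, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(map(re.escape, forms)) + r")\b"
    expression = re.sub(pattern, "~", record.text)
    if monolingual:  # a monolingual dictionary omits the translation
        return f"{expression} ({record.record_id})"
    return f"{expression} {record.translation} ({record.record_id})"
```

For a record "Er wird müde." with the translation "He is getting tired." and the forms of werden, this yields: Er ~ müde. He is getting tired. (C-017), where C-017 is an invented record ID.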
 
References
 
Sinclair (ed.) 1987, ch. 7
 
#G. Methodology
 
The following set of fields contains information relevant to the researcher working on the database. Most of it is not intended for publication.
 
##32. Bibliographical references
 
Information on lexical properties, including whole lexical items, may come from published sources. In principle, it may relate to any particular field of the entry. For that reason, it might seem appropriate to accompany each field of a record with its own bibliographical reference field. However, that would be overdoing it. It suffices to have one field for such references, which contains free text specifying which information comes from which source. The source itself is indicated by the conventional bibliographical short form (like ‘Hale 1985’), implying reference to a bibliographical database that resolves such short forms.
 
Such lexical information can generally only be published after counterchecking it with one's text corpus and/or informants. After that, the bibliographical reference no longer needs to appear in the published dictionary entry and may pass to the general bibliography.
 
##33. Comment
 
This field contains any additional information, esp. of a methodological, stylistic, sociolinguistic nature, including the status of the lemma and ungrammatical examples. In contrast to the following field, its content could, in principle, be published (although it seldom will be).
 
##34. Problems
 
This field contains questions to be investigated and problems to be solved in future lexicographic work, especially fieldwork. This field is directly related to the previous one: a problem is formulated in the present field. Once it is solved, its solution is noted in the field ‘comment’ (so that it may not be forgotten), and the problem is deleted. Thus, the content of this field is destined exclusively for the researcher and never published.
 
##35. Date
 
 
 
This is the date of last modification, which the DBMS will update automatically.
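Automatic updating of this field can be delegated to the DBMS. A sketch with SQLite, chosen only as an example. The table and trigger names are hypothetical. The trigger stamps the current time after any update that did not itself change the date.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE entry (
    id       INTEGER PRIMARY KEY,
    lemma    TEXT NOT NULL,
    modified TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP  -- date of last modification
);
-- After an update that left the date untouched, stamp the current time.
CREATE TRIGGER entry_touch AFTER UPDATE ON entry
FOR EACH ROW WHEN NEW.modified = OLD.modified
BEGIN
    UPDATE entry SET modified = CURRENT_TIMESTAMP WHERE id = NEW.id;
END;
""")
```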
Minimum and expanded microstructure
Reference: [https://www.christianlehmann.eu/ling/ling_meth/ling_description/lexicography/index.html?https://www.christianlehmann.eu/ling/ling_meth/ling_description/lexicography/microstructure.html Lexicography: Microstructure - Christian Lehmann (University of Erfurt)]
Only a dictionary that boils down to a word list has no microstructure (see the retrograde dictionary as an example). Traditionally, the minimum microstructure for a general dictionary is:
 
lemma – definition – examples.
 
At the other extreme, additional kinds of information beyond those listed above are easily imaginable: paronomasias, concept history, etc.
 
If there is a team of lexicographers, it may be necessary to add a field ‘editor’ to the methodology section, which holds the initials of the researchers who have worked on the entry, in chronological order.
 
Mono- vs. bilingual dictionary
 
The microstructures of a monolingual dictionary and of the L2–L1 volume of a bilingual general dictionary differ only in a few fields:
 
• In field 22, the monolingual dictionary provides a definition in the language of the dictionary, while the bilingual dictionary lists the equivalents of the lemma in L1.
 
• In field 31, the bilingual dictionary couples each example with its L1 translation, which the monolingual dictionary does not.
 
________________________________________
 
1 Toolbox comes with an automatic interlinear glossing feature which works rather economically for agglutinative morphology.
 

Revision as of 08:46, 7 March 2022

Microstructure of the Lotsawa Workbench (i.e., the internal structure of any of its entries)

A. Entry identity

1. Lemma
The lemma is given in standard Wylie representation. A transliteration into Unicode Tibetan script will be generated automatically.

2. Homonym/Homophone number
Homonyms and homophones are separate entries distinguished by numbers. Homonyms have the same spelling and the same pronunciation but different meanings. Homophones have the same pronunciation but different meanings, and may have different spellings. This field is useful wherever disambiguation is important. It can be particularly helpful for interpreting misspelled words.

3. Sense number
This field documents each reading of a polysemous item. These readings are numbered from 1 (mother entry) to n.

4. Citation form
The lemma is usually given in a generic form that conforms to the usage of the textual corpus it is drawn from. Most of the time, the content of field 4 is identical to that of field 1. This field is used to cite the generic form of inflected lemmas or of dependent forms (prefixes, suffixes, roots, stems, generic grammatical particles).
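As a rough illustration of how fields 1 to 4 might combine into a human-readable identifier, the Python sketch below numbers entries that share a Wylie spelling and prints labels of the form "lemma homonym.sense". The class `Entry`, the functions `number_homonyms` and `label`, and the label format are hypothetical, not part of the Workbench. The sketch also assumes that homonyms are grouped by identical spelling only, so homophones with different spellings would need a separate pronunciation key.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical record holding the identity fields 1-4."""
    lemma: str               # field 1: Wylie spelling
    sense: int = 1           # field 3: 1 = mother entry, then 2..n
    homonym: int = 0         # field 2: filled in by number_homonyms()
    citation_form: str = ""  # field 4: defaults to the lemma

    def __post_init__(self) -> None:
        # Field 4 is usually identical to field 1, so copy it when left empty.
        if not self.citation_form:
            self.citation_form = self.lemma


def number_homonyms(lexemes: list[Entry]) -> None:
    """Assign 1, 2, ... to lexemes sharing a spelling, in list order."""
    counts: Counter = Counter()
    for entry in lexemes:
        counts[entry.lemma] += 1
        entry.homonym = counts[entry.lemma]


def label(entry: Entry) -> str:
    """Human-readable identifier: '<lemma> <homonym>.<sense>'."""
    return f"{entry.lemma} {entry.homonym}.{entry.sense}"


# Illustrative data: two lexemes spelled "mdo" treated as homonyms, plus one other lemma.
lexemes = [Entry("mdo"), Entry("mdo"), Entry("bla ma")]
number_homonyms(lexemes)
print([label(e) for e in lexemes])  # ['mdo 1.1', 'mdo 2.1', 'bla ma 1.1']
```

Numbering in list order is only one possible policy. A real database would more likely store the homonym number explicitly, so that it stays stable when new entries are added.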

B. Expression
The expression of a lemma can be a phonological or graphemic representation.

5. Phonological representation
The phonological representation is given in IPA. This field provides valuable information when the phonological representation cannot be derived by rule from the orthographic representation.

6. Sound
This field is meant to provide a link to a sound file.

7. Phonological variants
This field accounts for idiosyncratic phonological variation and possible allophones.

8. Orthographic variants
The standard graphemic representation serves as lemma. This field is used to document alternative spellings resulting from earlier or unstandardized usage.

C. Language variety

9. Dialect
This field is used to record forms that do not belong to standard Tibetan as found in the corpus of Buddhist literature or Central Tibetan literature. By default, the value of this field is ‘standard’.

10. Sociolect
This field can be used to indicate that the lemma is used by specific groups (e.g., age group), Buddhist traditions, or professions.

11. Style
This field documents style, register, connotations and any kind of pragmatic information. Relevant values include ‘honorific’, ‘formal’, ‘vulgar’.

12. Stage – Time period
This field is mainly used for archaic language.

D. Structure

13. Tibetan syntactic category
This field contains information about the Tibetan syntactic category under which the lemma is classified.

14. Syntactic and morphological category
This field refers to subcategories of a part of speech, e.g. noun, proper noun, transitive verb, etc. The taxonomy of these categories comprises the standard syntactic categories. N.B. A lemma that belongs to several syntactic categories is considered polysemous; each category then constitutes a separate record. The morphological categories to be specified here include noun class, gender, possessive class, verbal voice, and inflection class. An inflected word (usually a verb) may fall into several morphological categories at once, e.g. voice x, conjugation class y. Some of these may be syntactically relevant lexical classes, such as the gender of a noun; others may be purely morphological classes, such as inflection classes.

15. Morphological structure
This field contains the constituents of the lemma. In the case of a nominal compound, they represent its various stems. In the case of a derivative (e.g., verbal forms), they represent a stem and a derived form. Since the items listed here may be identical to other lemmas in the database, linking them with hyperlinks is encouraged.

16. Word formation
This field documents the word-formation process at the origin of the lemma stem, e.g. reduplication, bahuvrīhi, causative, denominal, intensive, etc.

17. Derivatives
This field automatically lists the lemmas that contain the current lemma in their field 15 (Morphological structure).
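Because field 17 is filled automatically from field 15, it amounts to a reverse index. The sketch below shows the idea over an in-memory Python dictionary; in the actual database this would more likely be a query or a view. The function name `derivatives` and the sample analyses of chos sku and chos skyong into their syllables are illustrative assumptions.

```python
from collections import defaultdict


def derivatives(morph_structure: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert field 15: map each constituent to the lemmas that contain it.

    morph_structure maps a lemma to the constituents listed in its field 15.
    """
    index = defaultdict(list)
    for lemma, constituents in morph_structure.items():
        for part in constituents:
            # Guard against listing a lemma twice if a constituent repeats.
            if lemma not in index[part]:
                index[part].append(lemma)
    return dict(index)


# Illustrative field-15 analyses of two compounds.
field_15 = {
    "chos sku": ["chos", "sku"],
    "chos skyong": ["chos", "skyong"],
}
print(derivatives(field_15)["chos"])  # ['chos sku', 'chos skyong']
```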

18. Construction
This field contains information about the syntactic and semantic construction frame. For example, the different syntactic constructions a verb allows with regard to its complements can be documented here. This field is used to further specify the information contained in field 14.

19. Phraseology
This field lists relevant collocations in which the lemma is found. These include fixed expressions, such as technical terms, common phrases, idioms, and proverbs. If these phrases have lemma status, a link will be automatically generated.
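The automatic linking of phrases that have lemma status can be sketched as a lookup against the set of existing lemmas. This Python sketch assumes that phrases and lemmas are compared in Wylie after collapsing spaces and dropping a trailing shad ("/"), and that links use MediaWiki's [[...]] syntax. The function names and the normalization rule are hypothetical.

```python
def normalize(wylie: str) -> str:
    """Assumed rule: drop trailing shad marks ("/") and spaces, collapse inner whitespace."""
    return " ".join(wylie.rstrip(" /").split())


def link_phrases(phrases: list[str], lemmas: list[str]) -> list[str]:
    """Wrap each phrase that is itself a lemma in MediaWiki [[...]] link syntax."""
    known = {normalize(lemma) for lemma in lemmas}
    linked = []
    for phrase in phrases:
        key = normalize(phrase)
        # Phrases without lemma status are kept exactly as written.
        linked.append(f"[[{key}]]" if key in known else phrase)
    return linked


print(link_phrases(["chos  skyong /", "bkra shis bde legs"], ["chos skyong"]))
# ['[[chos skyong]]', 'bkra shis bde legs']
```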

E. Meaning

20. Meaning definitions
This field contains the lexicographer's definition of the lemma. N.B. In the case of polysemous lexemes, each sense of the lemma has exactly one definition. Each sense must therefore be documented separately, following the procedure described here.

21. Semantic classes
Each lexical item – at least those with a lexical meaning – belongs to one or more semantic classes. For instance, bear is an animal, anger is an emotion, bsang mchod is a ritual, etc. Even a simple word with a single sense may belong to several semantic classes.

22. Semantic relations
This field informs the user about mutual lexical relations with other lemmas, which in turn have the current lemma in their corresponding field. Common relations include:

  • synonymy,
  • hyponymy/hyperonymy,
  • cohyponymy: antonymy, converse relation, minimal contrast,
  • part-whole relation.
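Since these relations are mutual, recording one direction implies the other. The sketch below shows one way to derive the missing direction from triples of the form (source, relation, target), read as "target is a <relation> of source". The relation labels, the `INVERSE` table, and the English examples (echoing the bear/animal example in field 21) are assumptions for illustration, not the Workbench's actual vocabulary.

```python
# Assumed labels: each relation mapped to the relation holding in the opposite direction.
INVERSE = {
    "synonym": "synonym",
    "hyponym": "hyperonym",
    "hyperonym": "hyponym",
    "cohyponym": "cohyponym",
    "antonym": "antonym",
    "converse": "converse",
    "part": "whole",
    "whole": "part",
}

Triple = tuple[str, str, str]  # (source, relation, target)


def complete_relations(relations: set[Triple]) -> set[Triple]:
    """Return the given relations plus the inverse of every triple."""
    completed = set(relations)
    for source, relation, target in relations:
        completed.add((target, INVERSE[relation], source))
    return completed


# "animal is a hyperonym of bear" implies "bear is a hyponym of animal",
# so the result also contains ('animal', 'hyponym', 'bear').
print(complete_relations({("bear", "hyperonym", "animal")}))
```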

23. Encyclopedic information
This field contains information on the concept communicated by the lemma, for example, in relation to a doctrinal, ritual, or cultural background.

24. Picture
This field includes visual information about the lemma.

F. Genetic-historical information

25. Origin and cognates
This field is used to document loans from other languages as well as morphologically or semantically related words from genetically related languages.

26. Etymology

27. Examples
A dictionary example aims to illustrate a specific sense or construction. Ideally, each word sense should be illustrated with a relevant example, particularly in the case of non-generic lemmas, such as technical terms, idiomatic usage, etc. A dictionary example therefore has two main functions:

  1. It produces referenced evidence for the dictionary entry with regard to the semantic definition, the grammatical categorization, the stylistic marking, etc.
  2. It facilitates the understanding of the described lemma in the user’s texts.

As a consequence, a dictionary example should be typical:

  • It illustrates exactly the described sense or construction of a lemma,
  • It represents a common collocation,
  • It is as simple and short as possible without any unnecessary grammatical/semantic/stylistic complications.

G. Methodology

28. Bibliographical references
Information on the lemma may be included in published sources, primary and secondary. In some specific cases (e.g., technical terms, specialized terminology, etc.), it can be useful to list bibliographical references to further document a specific sense of the lemma.

29. Comment
This field contains any additional information about the lemma.

30. Problems
This field is used to list questions that need to be addressed in future lexicographic work. The information provided by answers and subsequent investigations can be integrated into the knowledge base or noted in the Comment field.

31. Date
This is the date of last modification, which the DBMS will update automatically.
Reference: Lexicography: Microstructure - Christian Lehmann (University of Erfurt)