L2/15-014 Proposal to Change Annotations on Some Cyrillic Characters Aleksandr Andreev* Yuri Shardt Nikita Simmons Ponomar Project Slavonic Computing Initiative 1 Introduction Church Slavic (also known as Church Slavonic or Old Slavonic) is a historical literary language of the Slavs. Presently it is used as a liturgical language by the Russian Orthodox Church, various other local Orthodox Churches, and Byzantine-rite Catholic and Old Ritualist communities. As considered in this document, Church Slavic is wrien in the Cyrillic script. e encoding of Cyrillic characters required for writing Church Slavic has had a long history and a large number of proposals by various authors representing diﬀerent institutions have been considered by the Unicode Technical Commiee. As a result of the piecemeal approach, several inconsistencies in the encoding model have emerged. Understanding that the Unicode Stability Policy prevents us from making changes to the existing implementation, we do, however, request the UTC to update some annotations in order to clarify certain aspects of the encoding model. Our proposed changes to Annotations and the Documentation are presented in this document. 2 U+0479 Digraph Uk is character has been discussed by the UTC before (see Everson et al. (2006) and Everson et al. (2007)). e authors of Everson et al. (2006, p. 3) state that “What we have regarding CYRILLIC LETTER UK is quite a mess.” Since then, the character U+A64B MONOGRAPH UK has been encoded and the annotations on U+0479 and U+0478 have been changed in an aempt to ﬁx the problems with these characters. However, recently a decision was made to encode also U+1C82 CYRILLIC SMALL LETTER NARROW O (see Andreev et al. (2014)) in the Cyrillic Extended-C block. In fact, the digraph character Onik (which is the character that was supposed to be encoded at U+0479), most oen appears using the Narrow O as the ﬁrst character (ᲂу), and not U+043E CYRILLIC SMALL LETTER O (see Figure 1). In the past, the narrow O character was not available in Unicode, but it has now been accepted for encoding in a future version of the standard. What we have now is three diﬀerent potential spellings for the Onik digraph, which have the representations given in Table 1. * Corresponding author: [email protected] 1 Table 1: Encoding of the Onik digraph U+1C82 U+0443 U+043E U+0443 U+0479 ᲂу о у ambiguous Figure 1: Orthography of the digraph onik in Church Slavic. Note the usage of Narrow O (encoded at U+1C82) in writing the digraph onik (boxed in red) vs. the usage of Cyrillic Leer O (encoded at U+043E) in writing the leer O (boxed in black). Source: Gamanovitch (1991, p. 13) Clearly, the character U+0479 has been problematic all along, though the UTC has hesitated to deprecate it completely. Now with the encoding of U+1C82, the problematic nature of this character is compounded. us, we propose that the annotation on U+0479 be changed to: this character has ambiguous glyph representation and should not be used for " digraph onik" use U+1 C82 U+0443 or U+043E U +0443 for " monograph uk" use U+A64B Furthermore, the annotation on U+0478, “may be rendered as either monograph or digraph form” seems to reﬂect the state of the standard before the UTC acted on Everson et al. (2007), where it was proposed to clarify the appearance of U+0478. In light of the past discussion about this character and the proposed changes to U+0479, we propose that the annotations on U+0478 be changed to: this character should not be used for " digraph onik" use U+041E U +0443 for " monograph uk" use U+A64A 2.1 or U+041E U +0423 U+047C and U+047D ese characters were originally encoded in the Unicode standard with an erroneous name and representation. Aer the UTC ruling on Everson et al. (2006), the representation was corrected and an annotation was added to U+047C, reading “despite its name, this character does not have a titlo, nor is it composed of an omega plus a diacritic”. However, no annotation was added to the lowercase form U+047D. e character that is encoded here is a ligature of the Cyrillic broad (or wide) Omega (encoded at U+A64C and U+A64D) and the ‘great apostro’, a stylized diacritical mark consisting of the so breathing (encoded at U+0486) and the Cyrillic kamora (encoded at U+0311). e broad Omega (U+A64D) can occur by itself, without this diacritical mark, in pre-1700 printed Church Slavic books, though not in modern liturgical texts. Functionally, the character with 2 the diacritical mark is analogous to the Greek character ὦ, which also consists of an Omega, a so breathing mark and a Perispomene. Both the Greek and Church Slavic characters have identical functions: to record the exclamation ‘Oh!’ Since U+047C and U+047D were encoded without a canonical decomposition, though they are linguistically decomposable, they should not be decomposed to avoid an encoding ambiguity. However, in our opinion, the annotation as wrien does not make this clear. We propose to change the annotation on U+047C to read: Alias name: Cyrillic " beautiful omega" Used for exclamation Oh! in Church Slavic Despite its name , this character does not have a titlo . It should not be decomposed into an omega plus diacritics . See also: U+A64C CYRILLIC CAPITAL LETTER BROAD OMEGA We further propose to add an analogous annotation for U+047D, reading: Alias name: Cyrillic " beautiful omega" Used for exclamation Oh! in Church Slavic Despite its name , this character does not have a titlo . It should not be decomposed into an omega plus diacritics . See also: U+A64D CYRILLIC SMALL LETTER BROAD OMEGA 3 U+0484: Combining Palatalization is character is used in ancient manuscripts and in academic work to indicate that a vowel is soened, a phenomenon called ‘palatalization’ (Karsky, 1979, p. 230). However, many users incorrectly use this character to encode the Cyrillic Kamora (circumﬂex accent), which in Unicode has been encoded as U+0311 COMBINING INVERTED BREVE. Furthermore, this character has a cross-reference to U+033E COMBINING VERTICAL TILDE. However, U+033E is not used for palatalization, but is rather used to indicate an omied yer (so sign and, in later sources, hard sign) and, very rarely in some manuscripts, an omied [j] (Karsky, 1979, pp. 228). e cross-reference probably arises because of a confusion between the function of U+033E and the function of U+02BC MODIFIER LETTER APOSTROPHE, which is used in some manuscripts and in academic publications to indicate palatalization in Cyrillic (see Golyshenko (1987, pp. 52) and Yelkina (1960, p. 26)). In any case, the cross-reference is providing further misleading information about the purpose of this character. We propose that the cross-reference to U+033E be removed and that the following annotations be made to U+0484: Not used for kamora . See also: U +0311 COMBINING INVERTED BREVE See also: U+02 BC MODIFIER LETTER APOSTROPHE In addition, because this character is used both in Cyrillic and Glagolitic scripts to indicate palatalization (Golyshenko, 1987, p. 44), we propose that the Script property on U+0484 be changed to Inherited. 4 U+04A4 and U+04A5 Cyrillic Ligature En Ghe e Unicode standard includes a number of characters for writing the palatalized (so) consonants used in some early manuscripts and in academic work: U+A663 CYRILLIC SMALL 3 Figure 2: Examples of the Cyrillic Leer So En (boxed in red) used in academic literature. Source: Golyshenko (1987, p. 42) Figure 3: Examples of the Cyrillic Leer So En (boxed in red) from Church Slavic texts. Source: Typografsky Ustav, a manuscript wrien c. 1300. Source: Uspensky (2006, p. 152). LETTER SOFT DE (and its capitalized analog U+A662), U+A665 CYRILLIC SMALL LETTER SOFT EL (and its capitalized analog U+A664), and U+A667 CYRILLIC SMALL LETTER SOFT EM (and its capitalized analog U+A666). e Cyrillic writing system uses one additional so character: Cyrillic Small Leer So En (see examples from academic literature in Figure 2 and examples from reproductions of ancient sources in Figure 3). Instead of encoding this leer as a standalone character, we propose to merge it with U+04A5 (U+04A4 for the capitalized form), since the Ligature En Ghe used in Altay, Mari and Yakut is almost identical visually to the So En. us, we propose that the following annotation be added to U+04A4: Also used for Cyrillic Capital Letter Soft En and the following annotation be added to U+04A5: Also used for Cyrillic Small Letter Soft En 5 U+2DF5 Combining Cyrillic Letter Es-Te e authors of the present document also submied to the UTC a proposal to clarify the encoding of Cyrillic composite combining characters (see L2/15-002). us, we propose that the appropriate annotations be made to this character. For example: This character should not be used. Correct spelling is: U+2 DED U+200D U+2 DEE 4 6 U+0483 Combining Cyrillic Titlo One ﬁnds contradictory information in the Unicode documentation regarding the usage of the characters U+0483 COMBINING CYRILLIC TITLO, U+A66F COMBINING CYRILLIC VZMET and U+0487 COMBINING CYRILLIC POKRYTIE. For example, the Unicode Standard version 7.0 (Section 7.4) reads: e [Cyrillic Extended-A] block contains a set of superscripted (wrien above), or titlo leers, used in manuscript Old Church Slavonic texts, usually to indicate abbreviations of words in the text. ese may occur with or without the generic titlo character, U+0483 COMBINING CYRILLIC TITLO, or with U+A66F COMBINING CYRILLIC VZMET. On the other hand, the annotation on U+0483 reads not used with letter titlos. e problem with the documentation as wrien is that it is aempting two describe two diﬀerent practices dating to two diﬀerent eras. Contrary to the statement in Section 7.4 of the documentation, superscripted (titlo) leers are used not only in manuscript Old Church Slavonic texts, but also in modern Church Slavic. As used in modern Church Slavic, the superscripted leers are usually “covered” by a bow shaped symbol called the “pokrytie” (Slavic for “cover”) and encoded in Unicode at U+0487 COMBINING CYRILLIC POKRYTIE. e character “titlo” encoded at U+0483 COMBINING CYRILLIC TITLO is also used in modern Church Slavic to indicate that a leer or leers have been omied from the spelling of a word; it is also used to indicate numerals, and it is never used to “cover” a superscripted leer. e character U+A66F COMBINING CYRILLIC VZMET is not used in modern Church Slavic. In Old Church Slavonic manuscripts, the use of titlo, vzmet and pokrytie is more or less interchangeable. Superscript leers usually occur either by themselves or covered by a pokrytie or vzmet (more rarely, titlo) and either the titlo or the vzmet can occur as an indication that leers have been omied (either in nomina sacra or in abbreviations) and in numerals. In fact, the distinction between titlo and vzmet is purely typographical, and the two characters ought to be viewed as two glyphs depicting the same character. For examples, we refer the reader to Karsky (1979, pp. 230). We therefore propose the following changes to the Unicode documentation. Section 7.4 should be amended to read: e [Cyrillic Extended-A] block contains a set of superscripted (wrien above), or titlo leers, used in Church Slavic and Old Church Slavic texts, usually to indicate abbreviations of words in the text. ese characters may be followed by U+0487 COMBINING CYRILLIC POKRYTIE, and in Old Church Slavic texts, also by U+0483 COMBINING CYRILLIC TITLO, or by U+A66F COMBINING CYRILLIC VZMET. We also propose the following changes to Annotations. For U+0483 COMBINING CYRILLIC TITLO, remove the annotation not used with letter titlos. Add the annotation: used in Cyrillic or Glagolitic to indicate abbreviation or numeral. Under “See also”, add a reference to U+0487 COMBINING CYRILLIC POKRYTIE. e Script property of this character should be changed to Inherited, since it can be used both for Cyrillic and Glagolitic. 5 For U+0487 COMBINING CYRILLIC POKRYTIE, remove the annotation: used only with letter titlos. Add the annotation: used with combining Cyrillic or Glagolitic letters. Under “See also”, remove the reference to U+0311 COMBINING INVERTED BREVE. e character U+0311 is used to encode a Cyrillic Kamora (circumﬂex accent), not for supralineation. Add a reference to U+0483 COMBINING CYRILLIC TITLO. e Script property of this character should be changed to Inherited, since it can be used both for Cyrillic and Glagolitic. For U+A66F COMBINING CYRILLIC VZMET, remove the annotation: used with Cyrillic letters to indicate abbreviation. Add the annotation: used in Cyrillic or Glagolitic to indicate abbreviation or numeral. Again, the Script property of this character should be changed to Inherited, since it can be used both for Cyrillic and Glagolitic. 7 Naming Conventions Because in its listing of the language in the CLDR, the term “Church Slavic” has been adopted, we propose that the Unicode documentation be made to conform with the CLDR terminology and that the words “Church Slavonic” in the documentation be replaced everywhere by “Church Slavic”. e terms “Church Slavic” and “Church Slavonic” are used interchangeably in English, and this naming change is only for the sake of consistency between the two sets of documentation. References Andreev, A., Y. Shardt, and N. Simmons (2014). Proposal to Encode Additional Cyrillic Characters Used in Early Church Slavonic Printed Books. ISO/IEC JTC1/SC2/WG2 N4607. Everson, M., D. Birnbaum, R. Cleminson, I. Derzhanski, V. Dorosh, A. Kryukov, and S. Paliga (2006). On CYRILLIC LETTER OMEGA WITH TITLO and on CYRILLIC LETTER UK. ISO/IEC JTC1 SC2 WG2 N3184. Everson, M., D. Birnbaum, R. Cleminson, I. Derzhanski, V. Dorosh, A. Kryukov, S. Paliga, and K. Ruppel (2007). Proposal to encode additional cyrillic characters in the BMP of the UCS. ISO/IEC JTC1 SC2 WG2 N3194. Gamanovitch, A. (1991). Грамматика Церковно-славянского Языка. Moscow: Khudozhestvennaya Literatura Press. Golyshenko, V. S. (1987). Мягкость согласных в языке восточных славян XI-XII вв. Moscow: Nauka. Karsky, E. F. (1979). Славянская Кирилловская Палеография. Moscow: Nauka Press. Uspensky, B. A. (Ed.) (2006). Типографский устав: Устав с кондакарем конца XI – начала XII века, Volume 2. Moscow: Yazyki Slavyanskikh Kulʹtur. Yelkina, N. M. (1960). Старославянский язык. Moscow, Russia: Ministry of Education of the 6 RSFSR.
© Copyright 2016 ExploreDoc