Unicode Character Encoding of Archived Linguistic Data
Part 1: Tutorial on Key Unicode Concepts

Peter Constable
SIL International
peter_constable@sil.org
www.sil.org

©NRSI/SIL CTC 2000

Introduction

Published data archive requirements:
- Documented protocols / encodings for data representation
- Common protocols / encodings
- Durable protocols / encodings

Character encoding:
- Past: multiple encodings, including non-standard
  - Meaning documented only by font
- Current Best Practice: Unicode
  - Universal character set
  - Becoming dominant IT standard
  - Widely documented
  - Assumed by many other IT standards
    - XML

Overview:
- Key Unicode concepts to understand
- Specific issues

Key Concepts

Unicode-related concepts to understand:
- Encoding forms
- Character-glyph model & “smart font” rendering
- Abstract character repertoire
- Character properties
- Alternate representations & normalization
- Compatibility characters
- Private-Use characters

Unicode Encoding Forms

The Unicode Standard does not provide just one encoding form but three.

Multi-tiered character model:
- Universal character repertoire
- Coded character set: Unicode scalar values U+0000 to U+10FFFF
- Encoding form: specific computer ...
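
The three encoding forms referred to above are UTF-8, UTF-16, and UTF-32. As a rough illustration of how one Unicode scalar value maps to different code units in each form, here is a minimal Python sketch. The helper name encoding_forms is hypothetical (it does not come from the slides), and the sketch relies only on Python's built-in codecs.

```python
def encoding_forms(ch: str) -> dict:
    """Return the code units of one character in UTF-8, UTF-16, and UTF-32.

    Hypothetical helper for illustration only; not part of the original tutorial.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")

    # UTF-8: 8-bit code units, one to four per scalar value.
    utf8 = list(ch.encode("utf-8"))

    # UTF-16: 16-bit code units, one or two (a surrogate pair) per scalar value.
    b16 = ch.encode("utf-16-be")
    utf16 = [int.from_bytes(b16[i:i + 2], "big") for i in range(0, len(b16), 2)]

    # UTF-32: a single 32-bit code unit equal to the scalar value itself.
    b32 = ch.encode("utf-32-be")
    utf32 = [int.from_bytes(b32[i:i + 4], "big") for i in range(0, len(b32), 4)]

    return {"scalar": ord(ch), "utf8": utf8, "utf16": utf16, "utf32": utf32}


if __name__ == "__main__":
    for ch in ("A", "\u00e9", "\U0001d11e"):
        forms = encoding_forms(ch)
        print(
            f"U+{forms['scalar']:04X}",
            "UTF-8:", " ".join(f"{u:02X}" for u in forms["utf8"]),
            "| UTF-16:", " ".join(f"{u:04X}" for u in forms["utf16"]),
            "| UTF-32:", " ".join(f"{u:08X}" for u in forms["utf32"]),
        )
```

For U+1D11E, which lies above U+FFFF, the sketch yields four UTF-8 code units, a UTF-16 surrogate pair (D834 DD1E), and one UTF-32 code unit, while U+0041 needs a single code unit in every form.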