Skip to content

Language Tagging

What is a language tag? See docs in langtags repo

SIL’s langtags.json file resides in the a langtags repo. Information on making a maximal and minimal language tag is derived from various sources.

  • langtags.csv is manually maintained. We have close cooperation with the Ethnologue script editor. Any new information on writing systems for a particular language in the Ethnologue is passed on to be included in the langtags.csv file and vice-versa.
  • IANA Language Subtag Registry - this file is placed in sldrtool repo and is available here. (LE:This makes no sense to me why it would not be in the langtags repo.)
    • ISO 639-1
    • ISO 639-3 (no files are directly used, IANA incorporates the changes)
  • Ethnologue (LanguageIndex.tab is renamed langindex.tab and placed in langtags/source
    • autonyms.csv is from the Ethnologue, but it is not publicly available for download from the Ethnologue.
  • ISO 15924 script codes are used. However, no downloaded files are used, the information is manually added to langtags.csv.
    • A few of the script codes may need to be described.
      • Zyyy is used for when the script is unknown
      • Zxxx is used for Sign languages
      • Zzzz is used for unencoded scripts. In those cases a subtag indicates the script name
  • ISO 3166-1 alpha-2 country codes are used. However, these are derived from the Ethnologue LanguageIndex.tab file which includes all the regions.
  • We are in the process of incorporating ROLV (Registry of Language Varieties) codes. At this point these would be processed as private subtags.
  • During the build process of langtags.json, the sldr files are also used for testing purposes. Any language tags that are used in the sldr files should be added to langtags.csv. If the sldr file is incorrect, it should be corrected.