

A JavaScript library that converts scription text files to the Data Format for Digital Linguistics



A JavaScript library that converts linguistic texts in scription format to the Data Format for Digital Linguistics (DaFoDiL). It is useful for language researchers who want to work with their data in a text format that is simple to type and read (scription), but who also need to convert that data for use in other Digital Linguistics tools.


Basic Usage

  1. Install the library using npm or yarn:
  npm i @digitallinguistics/scription2dlx
  yarn add @digitallinguistics/scription2dlx

Or download the latest release from the releases page.

  2. Import the library into your project:

  Node:

  import scription2dlx from '@digitallinguistics/scription2dlx';

  HTML:

  <script src=scription2dlx.js type=module></script>

  3. The library exports a single function which accepts a string and returns a DaFoDiL Text Object.


  title: How the world began
  waxdungu qasi
  one day a man


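  const response = await fetch(`data.txt`); // fetch() resolves to a Response, not a string
  const data = await response.text();       // read the file contents as a string
  const text = scription2dlx(data);

  console.log(text.utterances[0].transcription); // "waxdungu qasi"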

You may also pass an options hash as the second argument. See the Options section below.

  const text = scription2dlx(data, { /* options */ });
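As a rough illustration of what such an options hash might look like, the sketch below spells out the defaults documented in this README and overrides two of them. The `withDefaults` helper and the `defaultOptions` object are hypothetical and are not part of scription2dlx; the library presumably applies its own defaults, so the helper only serves to make the full configuration visible.

```javascript
// Defaults as documented in this README's option list.
const defaultOptions = {
  alignmentError:    'warn',
  codes:             {},
  orthography:       'default',
  parser:            undefined,
  utteranceMetadata: true,
};

// Hypothetical helper (NOT part of scription2dlx): fills in every option
// the caller did not set, without mutating the defaults.
function withDefaults(options = {}) {
  return { ...defaultOptions, ...options };
}

const options = withDefaults({
  alignmentError: true,         // throw instead of warning on misaligned lines
  codes:          { txn: 't' }, // allow \t in place of \txn
});

// The resulting hash would be passed as the second argument, e.g.:
// const text = scription2dlx(data, options);
console.log(options);
```

Passing only the keys you want to change, as in the `withDefaults` call, should be enough in practice; the remaining options keep their default values.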



Options

| Option | Default | Description |
|---|---|---|
| alignmentError | "warn" | Specifies what the library should do when it encounters an utterance whose lines have different numbers of words, or whose words have different numbers of morphemes. The Leipzig glossing rules state that every line in an interlinear example must have the same number of words, and each word the same number of morphemes. By default, this option is set to "warn", which displays a warning if an alignment problem is encountered. To turn off warnings entirely, set this option to false; to throw an error, set this option to true. |
| codes | {} | Allows you to use custom backslash codes in your interlinear glosses. It should be a hash with the scription code as the key (without a leading backslash) and the custom code as the value; ex: "txn": "t" will allow you to write \t instead of \txn for transcription lines. |
| orthography | "default" | An abbreviation for the default orthography to use for transcriptions when one is not specified. |
| parser | undefined | A YAML parser to use in parsing the header of a scription document. If none is provided, the header will be returned as a string in the header property of the returned object. |
| utteranceMetadata | true | Whether to parse the utterance metadata line (the first line of an utterance, when it begins with #). If set to true, a metadata property will be added to each utterance that has one. |
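To make the three alignmentError settings concrete, here is a minimal sketch of an alignment check. It is an illustration only, not the library's implementation: the `shapeOf` and `checkAlignment` functions are hypothetical, the input lines are placeholder tokens rather than real language data, and the sketch assumes that words are separated by whitespace and morphemes by hyphens.

```javascript
// Hypothetical helper: describes a line as a list of morpheme counts,
// one per word, e.g. "a-b c" -> [2, 1].
function shapeOf(line) {
  return line
    .trim()
    .split(/\s+/u)
    .filter(Boolean)
    .map(word => word.split('-').length);
}

// Hypothetical check mirroring the three documented settings:
//   false  -> stay silent, just report the result
//   "warn" -> print a warning and carry on
//   true   -> throw an error
function checkAlignment(lines, alignmentError = 'warn') {
  const shapes  = lines.map(line => JSON.stringify(shapeOf(line)));
  const aligned = shapes.every(shape => shape === shapes[0]);

  if (aligned || alignmentError === false) return aligned;

  const message = `Misaligned utterance: word/morpheme counts per line are ${shapes.join(' vs. ')}`;
  if (alignmentError === true) throw new Error(message);

  console.warn(message);
  return false;
}

// Placeholder interlinear lines (not real data).
console.log(checkAlignment(['w1-m1 w2', 'g1-g2 g3'], 'warn')); // lines have the same shape
console.log(checkAlignment(['w1-m1 w2', 'g1 g2 g3'], false));  // shapes differ, no warning
```

Only the interlinear lines (such as the morpheme and gloss lines) would be compared this way; a free translation line normally has a different number of words and would not be part of the check.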