
avlukanin/normatex

Normatex - Russian text normalization

This is a set of Finite-State Transducers (FSTs) for normalization of Russian texts for speech synthesis, machine translation and other natural language processing tasks.

The FSTs are developed using Unitex, a corpus processor.
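Unitex applies a graph to a text in one of two modes, MERGE or REPLACE, and both appear in the steps below. The Python below is a rough, simplified sketch of the difference. A regular expression stands in for a compiled graph. The sentence-final punctuation pattern and the `apply_merge`/`apply_replace` helpers are hypothetical. Only the `{S}` sentence delimiter comes from Unitex itself.

```python
import re

# Hypothetical stand-in for a Unitex graph: match sentence-final punctuation
# and emit "{S}", the sentence delimiter that Unitex uses.
PATTERN = re.compile(r"[.!?](?=\s|$)")
OUTPUT = "{S}"

def apply_replace(text: str) -> str:
    """REPLACE mode: each recognized sequence is replaced by the output."""
    return PATTERN.sub(OUTPUT, text)

def apply_merge(text: str) -> str:
    """MERGE mode (simplified): the recognized sequence is kept and the output is added after it."""
    return PATTERN.sub(lambda m: m.group(0) + OUTPUT, text)

text = "Привет. Как дела?"
print(apply_merge(text))    # Привет.{S} Как дела?{S}
print(apply_replace(text))  # Привет{S} Как дела{S}
```

In a real graph, MERGE mode places each output at the point on the recognized path where it is attached. Appending it after the whole match is a simplification.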

To normalize a Russian text:

  1. Copy your text (e.g. example.txt) to the Corpus folder, open it in Unitex and preprocess it with the following resources:
  • apply Graphs/Preprocessing/Sentence/SentenceUniver.grf in MERGE mode
  • apply Graphs/Preprocessing/Replace/replace.grf in REPLACE mode
  2. Apply lexical resources:
  3. Create a cascade (Text\Apply CasSys Cascade... menu, New) to sequentially apply the following FSTs to your text in REPLACE mode:
  • Graphs/numbers.fst2
  • Graphs/abbr/abbr_w.fst2
  • Graphs/abbr/acronyms_w.fst2
  • Graphs/Postprocessing/replace.fst2
  4. Launch the cascade of FSTs.
  5. The normalized text is in Corpus/example_csc/example_4_0.snt.
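A cascade feeds each transducer the output of the previous one. The Python below is a toy sketch of that idea and not the project's FSTs. The four stages mirror the order of the graphs listed above, but every rule in them is a hypothetical regex example. The `output_path` helper assumes that the file name in the last step (4 FSTs giving `example_4_0.snt`) generalizes to N stages.

```python
import re
from pathlib import PurePosixPath

# Hypothetical toy rules standing in for the four compiled graphs, in cascade order.
# Each stage is a list of (regex, replacement) pairs applied in REPLACE mode.
STAGES = [
    ("numbers", [(r"\b2\b", "два"), (r"\b3\b", "три")]),
    ("abbr_w", [(r"\bт\.\s?е\.", "то есть")]),
    ("acronyms_w", [(r"\bМГУ\b", "эм-гэ-у")]),
    ("postprocessing", [(r" {2,}", " ")]),
]

def run_cascade(text, stages=STAGES):
    """Apply each stage to the output of the previous one, as a cascade does."""
    for _name, rules in stages:
        for pattern, repl in rules:
            text = re.sub(pattern, repl, text)
    return text

def output_path(text_name, n_stages):
    """Path of the final result, assuming the naming above generalizes to N stages."""
    return PurePosixPath("Corpus") / f"{text_name}_csc" / f"{text_name}_{n_stages}_0.snt"

print(run_cascade("В МГУ 2  корпуса, т.е. немного."))
# В эм-гэ-у два корпуса, то есть немного.
print(output_path("example", len(STAGES)))
# Corpus/example_csc/example_4_0.snt
```

Order matters here. Each stage sees only what the previous stages produced. This is why a clean-up pass such as the postprocessing graph naturally goes last.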

Slides