
Releases: kuhumcst/gml

GML-resources

25 Mar 13:30

The files toklemSort.tab.ph and toklemposSort.tab.ph can be used to train a lemmatiser. For CSTlemma, the training program is affixtrain.

The file toklemSort.tab.ph.trigramFrequencies.tab can be used by CSTlemma to improve disambiguation when more than one lemma candidate is available.

The archive PosTrainFiles.zip contains tokenized and segmented texts. Each token is followed by a slash and its part-of-speech tag. These files can be concatenated and used as a training data set for a part-of-speech tagger, e.g. Brill or Lapos.
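
As a rough illustration of the token/tag format and the concatenation step, here is a minimal Python sketch. It rests on several assumptions that are not stated above: that tagged items are separated by whitespace, that the tag follows the last slash in each item, and that the files are UTF-8 encoded. The function names and the tags in the usage example are hypothetical and not taken from the archive.

```python
from pathlib import Path


def split_tagged_token(item):
    """Split one 'token/TAG' item at its LAST slash (assumption), so that
    tokens which themselves contain a slash stay intact."""
    token, sep, tag = item.rpartition("/")
    if not sep:
        # No slash found: return the item unchanged, with no tag.
        return item, None
    return token, tag


def parse_tagged_line(line):
    """Parse a line of whitespace-separated 'token/TAG' items (assumption)."""
    return [split_tagged_token(item) for item in line.split()]


def concatenate(paths, out_path, encoding="utf-8"):
    """Write the given files to out_path one after another.

    A newline is added after any non-empty file that lacks a trailing one,
    so the last line of one file is not fused with the first line of the
    next. The UTF-8 default is an assumption.
    """
    with open(out_path, "w", encoding=encoding) as out:
        for path in paths:
            text = Path(path).read_text(encoding=encoding)
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)
```

For example, `parse_tagged_line("a/X 1/2/Y")` yields `[("a", "X"), ("1/2", "Y")]`, where `X` and `Y` are placeholder tags. Check the input format expected by the chosen tagger before training, since the resulting file may need further conversion.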