Releases: kuhumcst/gml
GML-resources
The files toklemSort.tab.ph and toklemposSort.tab.ph can be used to train a lemmatiser. For CSTlemma, the training program is affixtrain.
The file toklemSort.tab.ph.trigramFrequencies.tab can be used by CSTlemma to improve disambiguation when more than one lemma candidate is available.
The archive PosTrainFiles.zip contains tokenized and segmented texts. Each token is followed by a slash and its part-of-speech tag. These files can be concatenated and used as a training data set for a part-of-speech tagger, e.g. Brill or Lapos.
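The token/tag layout described above can be read and concatenated with a few lines of Python. This is only a minimal sketch: it assumes that items are separated by whitespace, that the tag follows the last slash of each item, and that the files are UTF-8 encoded, none of which is confirmed here. The function names `parse_tagged_line` and `concatenate` are hypothetical helpers, not part of CSTlemma, Brill or Lapos.

```python
from pathlib import Path


def parse_tagged_line(line):
    """Split one line of 'token/TAG' items into (token, tag) pairs.

    Assumption: items are whitespace-separated and the tag follows
    the LAST slash, so a token that itself contains '/' stays intact.
    """
    pairs = []
    for item in line.split():
        token, sep, tag = item.rpartition("/")
        if not sep:
            # No slash found: keep the item as the token, tag unknown.
            token, tag = item, None
        pairs.append((token, tag))
    return pairs


def concatenate(paths, out_path, encoding="utf-8"):
    """Write the given files one after another into a single file.

    Assumption: the files are plain text in the given encoding.
    A newline is added after any file that does not end with one,
    so the last line of one file never merges with the next file.
    """
    with open(out_path, "w", encoding=encoding) as out:
        for path in paths:
            text = Path(path).read_text(encoding=encoding)
            if text:
                out.write(text if text.endswith("\n") else text + "\n")
```

After unpacking the archive, something like `concatenate(sorted(Path("PosTrainFiles").glob("*")), "train.txt")` would produce a single training file; the directory and output names are placeholders. Check the input format expected by the chosen tagger before training.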