Subcorpus | docs |
---|---|
edeposcorpus.split00.docs.token | 297,478 |
flashcorpus.split00.docs.token | 273,794 |
news_2.split00.docs.token | 3,093,543 |
news_2.split01.docs.token | 2,976,780 |
news_2.split02.docs.token | 3,100,877 |
news_2.split03.docs.token | 2,934,816 |
news_2.split04.docs.token | 2,811,136 |
news_2.split05.docs.token | 2,671,540 |
news_2.split06.docs.token | 2,167,264 |
news_2.split07.docs.token | 2,823,473 |
news_2.split08.docs.token | 281,078 |
news_2.split09.docs.token | 2,979,623 |
news_2.split10.docs.token | 3,105,495 |
news_2.split11.docs.token | 2,745,166 |
news_2.split12.docs.token | 3,105,369 |
news_2.split13.docs.token | 2,888,914 |
news_2.split14.docs.token | 3,044,163 |
news_2.split15.docs.token | 2,243,778 |
news_2.split16.docs.token | 2,652,094 |
news_2.split17.docs.token | 3,073,893 |
news_2.split18.docs.token | 2,926,715 |
nok.split00.docs.token | 1,622,992 |
offentligt.split00.docs.token | 3,306,624 |
offentligt.split01.docs.token | 2,198,892 |
oscar.split00.docs.token | 859,790 |
oscar.split01.docs.token | 800,000 |
oscar.split02.docs.token | 881,928 |
oscar.split03.docs.token | 873,169 |
oscar.split04.docs.token | 892,183 |
oscar.split05.docs.token | 882,298 |
oscar.split06.docs.token | 882,881 |
oscar.split07.docs.token | 888,328 |
oscar.split08.docs.token | 904,712 |
oscar.split09.docs.token | 893,941 |
oscar.split10.docs.token | 865,024 |
oscar.split11.docs.token | 903,509 |
oscar.split12.docs.token | 415,618 |
runeberg.split00.docs.token | 902,768 |
tweets.split00.docs.token | 10,442,046 |
wiki.split00.docs.token | 3,421,795 |
total | 85,035,487 |

Total tokens in corpus: 15,151,843,671

Subcorpus | Too-long docs (> 1022 tokens) |
---|---|
edeposcorpus.split00.docs.token | 76 |
flashcorpus.split00.docs.token | 135 |
news_2.split00.docs.token | 668 |
news_2.split01.docs.token | 547 |
news_2.split02.docs.token | 117 |
news_2.split03.docs.token | 742 |
news_2.split04.docs.token | 571 |
news_2.split05.docs.token | 805 |
news_2.split06.docs.token | 2,269 |
news_2.split07.docs.token | 557 |
news_2.split08.docs.token | 35 |
news_2.split09.docs.token | 283 |
news_2.split10.docs.token | 443 |
news_2.split11.docs.token | 3,459 |
news_2.split12.docs.token | 218 |
news_2.split13.docs.token | 151 |
news_2.split14.docs.token | 171 |
news_2.split15.docs.token | 1,461 |
news_2.split16.docs.token | 1,232 |
news_2.split17.docs.token | 314 |
news_2.split18.docs.token | 100 |
nok.split00.docs.token | 122 |
offentligt.split00.docs.token | 8 |
offentligt.split01.docs.token | 1 |
oscar.split00.docs.token | 87,546 |
oscar.split01.docs.token | 80,124 |
oscar.split02.docs.token | 86,818 |
oscar.split03.docs.token | 88,286 |
oscar.split04.docs.token | 86,809 |
oscar.split05.docs.token | 86,787 |
oscar.split06.docs.token | 86,828 |
oscar.split07.docs.token | 86,707 |
oscar.split08.docs.token | 87,348 |
oscar.split09.docs.token | 87,151 |
oscar.split10.docs.token | 85,381 |
oscar.split11.docs.token | 88,898 |
oscar.split12.docs.token | 39,836 |
runeberg.split00.docs.token | 2,738 |
tweets.split00.docs.token | 0 |
wiki.split00.docs.token | 24,200 |
total | 1,119,942 |

Total tokens in too-long docs: 2,999,450,123

Tokens left if too-long docs are dropped instead of split: 12,152,393,548
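
The remaining-token figure is the corpus total minus the tokens held in too-long documents. The short sketch below rechecks that subtraction and also derives two shares that the tables imply but do not state. The constants are copied from the totals above. The helper names `remaining_tokens` and `share` are illustrative and not part of any existing tooling, and the code assumes that "not splitting" means discarding too-long documents entirely.

```python
# Figures copied from the tables and summary lines above.
TOTAL_DOCS = 85_035_487
TOO_LONG_DOCS = 1_119_942
TOTAL_TOKENS = 15_151_843_671
TOO_LONG_TOKENS = 2_999_450_123


def remaining_tokens(total: int, too_long: int) -> int:
    """Tokens left if too-long documents are discarded (assumption) rather than split."""
    return total - too_long


def share(part: int, whole: int) -> float:
    """Return part / whole as a percentage."""
    return 100.0 * part / whole


tokens_left = remaining_tokens(TOTAL_TOKENS, TOO_LONG_TOKENS)
doc_share = share(TOO_LONG_DOCS, TOTAL_DOCS)
token_share = share(TOO_LONG_TOKENS, TOTAL_TOKENS)

print(f"tokens left:         {tokens_left:,}")
print(f"too-long documents:  {doc_share:.2f}% of all documents")
print(f"tokens in those:     {token_share:.2f}% of all tokens")
```

Under these figures, too-long documents make up roughly 1.3% of all documents but hold just under 20% of all tokens, which is why the choice between splitting and discarding matters for the final corpus size.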