Commit

Added randomized partitioning
christofs committed Mar 10, 2017
1 parent d8d2512 commit 11b68cc
Showing 1,185 changed files with 412 additions and 47,163 deletions.
3 changes: 2 additions & 1 deletion HOWTO.md
@@ -55,8 +55,9 @@ Currently, the following standard processes are supported:
 - Visualize the most distinctive words as a horizontal bar chart. (pyzeta.plot_scores)
 - Visualize the feature distribution as a scatterplot (pyzeta.plot_types)
 
-The following experimental functions are present but not really supported:
+The following experimental functions are present (but not really supported):
 
+- If you use ["random", "0", "1"] as the value for the `contrast` parameter, the partitions will be built randomly, splitting the collection in equal-sized parts. This is interesting if you want to see how strong your zeta scores really are relative to a random partitioning. (Expanding on this principle could be the basis for some type of significance test for zeta scores.)
 - Visualize the relation between three partitions based on type proportions in two partitions (pyzeta.threeway)
 - PCA for three partitions using distinctive features.
 
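One way to read the HOWTO's suggestion of comparing real zeta scores against a random partitioning is as a permutation-style baseline: score a word under the real partition, then under many random half/half splits, and see how often chance does as well. The sketch below is only an illustration under stated assumptions. Each text is treated as a set of words, and the score is a simplified stand-in (share of texts containing the word in part one minus the share in part two), not pyzeta's own zeta implementation. `docprop_diff` and `random_baseline` are hypothetical names.

```python
import random


def docprop_diff(word, group_one, group_two):
    """Share of texts in group_one containing word minus the share in group_two.

    Simplified stand-in for a zeta-like score (an assumption, not pyzeta's
    formula); each text is represented as a set of words.
    """
    p1 = sum(word in text for text in group_one) / len(group_one)
    p2 = sum(word in text for text in group_two) / len(group_two)
    return p1 - p2


def random_baseline(word, texts, n_runs=1000, seed=None):
    """Score word under n_runs random half/half splits of texts."""
    rng = random.Random(seed)  # seeding is an addition, for reproducibility
    scores = []
    for _ in range(n_runs):
        # random.sample(x, len(x)) returns a shuffled copy, as in the commit
        shuffled = rng.sample(texts, len(texts))
        cut = len(shuffled) // 2
        scores.append(docprop_diff(word, shuffled[:cut], shuffled[cut:]))
    return scores


one = [{"sea", "ship"}, {"sea"}, {"sea", "war"}]
two = [{"war"}, {"ship", "war"}, {"war"}]
observed = docprop_diff("sea", one, two)  # 3/3 - 0/3 = 1.0
baseline = random_baseline("sea", one + two, n_runs=1000, seed=1)
# Share of random splits scoring at least as high as the real partition.
share = sum(score >= observed for score in baseline) / len(baseline)
print(observed, share)
```

In this toy data a random split reaches 1.0 only when all three "sea" texts land in the first half, which has probability 1/C(6,3) = 1/20, so `share` should come out near 0.05. A small share suggests the real score is unlikely to arise from random partitioning alone.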
Binary file modified __pycache__/pyzeta.cpython-35.pyc
Binary file not shown.
17 changes: 12 additions & 5 deletions pyzeta.py
@@ -27,6 +27,7 @@
 import itertools
 import shutil
 from sklearn.decomposition import PCA
+import random
 
 
 # =================================
@@ -104,11 +105,17 @@ def make_filelist(metadatafile, contrast):
 with open(metadatafile, "r") as infile:
 metadata = pd.DataFrame.from_csv(infile, sep=";")
 # print(metadata.head())
-onemetadata = metadata[metadata[contrast[0]].isin([contrast[1]])]
-twometadata = metadata[metadata[contrast[0]].isin([contrast[2]])]
-onelist = list(onemetadata.loc[:, "idno"])
-twolist = list(twometadata.loc[:, "idno"])
-# print(onelist, twolist)
+if contrast[0] != "random":
+onemetadata = metadata[metadata[contrast[0]].isin([contrast[1]])]
+twometadata = metadata[metadata[contrast[0]].isin([contrast[2]])]
+onelist = list(onemetadata.loc[:, "idno"])
+twolist = list(twometadata.loc[:, "idno"])
+elif contrast[0] == "random":
+idnolist = list(metadata.loc[:, "idno"])
+newidnolist = random.sample(idnolist, len(idnolist))
+onelist = newidnolist[:int(len(idnolist)/2)]
+twolist = newidnolist[int(len(idnolist)/2):]
+print(onelist, twolist)
 print("----number of texts: " + str(len(onelist)) + " and " + str(len(twolist)))
 return onelist, twolist
 
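The two branches of `make_filelist` can be tried outside pyzeta with a small standalone sketch. It is a hypothetical stand-in, not the committed function: it swaps pandas for the standard-library `csv` module so it runs without extra dependencies, adds an optional `seed` for reproducible splits, and uses a made-up `subgenre` column in the usage example. (Separately, note that `pd.DataFrame.from_csv` was deprecated in pandas 0.21 and removed in 1.0; `pd.read_csv` with `sep=";"` is the usual replacement.)

```python
import csv
import io
import random


def make_filelist_sketch(metadata_file, contrast, seed=None):
    """Return two lists of "idno" values, mirroring the commit's two branches.

    Hypothetical stdlib stand-in for pyzeta.make_filelist. metadata_file is a
    text stream of ";"-separated metadata with an "idno" column; contrast is
    [column, value_one, value_two], or ["random", ...] for a random split.
    """
    rows = list(csv.DictReader(metadata_file, delimiter=";"))
    if contrast[0] != "random":
        # Select texts whose metadata column matches each of the two labels.
        onelist = [r["idno"] for r in rows if r[contrast[0]] == contrast[1]]
        twolist = [r["idno"] for r in rows if r[contrast[0]] == contrast[2]]
    else:
        # Shuffle all IDs and cut in half; with an odd count the second
        # list gets the extra text, as with int(len(idnolist)/2) above.
        idnolist = [r["idno"] for r in rows]
        shuffled = random.Random(seed).sample(idnolist, len(idnolist))
        cut = len(shuffled) // 2
        onelist, twolist = shuffled[:cut], shuffled[cut:]
    print("----number of texts: " + str(len(onelist)) + " and " + str(len(twolist)))
    return onelist, twolist


metadata = "idno;subgenre\nt1;yes\nt2;no\nt3;yes\nt4;no\nt5;yes\n"
print(make_filelist_sketch(io.StringIO(metadata), ["subgenre", "yes", "no"]))
# → ----number of texts: 3 and 2
# → (['t1', 't3', 't5'], ['t2', 't4'])
print(make_filelist_sketch(io.StringIO(metadata), ["random", "0", "1"], seed=7))
```

With `["random", "0", "1"]` the two labels are ignored, as in the commit; only the first element selects the branch.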
9,073 changes: 0 additions & 9,073 deletions sample-output/data/features_no_2000-lemmata-NN.csv

This file was deleted.

22,570 changes: 0 additions & 22,570 deletions sample-output/data/features_no_2000-words-all.csv

This file was deleted.

3,949 changes: 0 additions & 3,949 deletions sample-output/data/features_yes_2000-lemmata-NN.csv

This file was deleted.

10,383 changes: 0 additions & 10,383 deletions sample-output/data/features_yes_2000-words-all.csv

This file was deleted.

1 change: 0 additions & 1 deletion sample-output/data/yes-no_2000-lemmata-NN/acd001-0000.txt

This file was deleted.

1 change: 0 additions & 1 deletion sample-output/data/yes-no_2000-lemmata-NN/acd001-0001.txt

This file was deleted.

1 change: 0 additions & 1 deletion sample-output/data/yes-no_2000-lemmata-NN/acd001-0002.txt

This file was deleted.

1 change: 0 additions & 1 deletion sample-output/data/yes-no_2000-lemmata-NN/acd001-0003.txt

This file was deleted.

1 change: 0 additions & 1 deletion sample-output/data/yes-no_2000-lemmata-NN/acd001-0004.txt

This file was deleted.

1 change: 0 additions & 1 deletion sample-output/data/yes-no_2000-lemmata-NN/acd001-0005.txt

This file was deleted.

1 change: 0 additions & 1 deletion sample-output/data/yes-no_2000-lemmata-NN/acd001-0006.txt

This file was deleted.

1 change: 0 additions & 1 deletion sample-output/data/yes-no_2000-lemmata-NN/acd001-0007.txt

This file was deleted.

1 change: 0 additions & 1 deletion sample-output/data/yes-no_2000-lemmata-NN/acd001-0008.txt

This file was deleted.

1 change: 0 additions & 1 deletion sample-output/data/yes-no_2000-lemmata-NN/acd001-0009.txt

This file was deleted.

1 change: 0 additions & 1 deletion sample-output/data/yes-no_2000-lemmata-NN/acd001-0010.txt

This file was deleted.

1 change: 0 additions & 1 deletion sample-output/data/yes-no_2000-lemmata-NN/acd001-0011.txt

This file was deleted.
