
Commit 91e8ec7

update README for the new Myket dataset

1 parent 20eb15e

File tree: 4 files changed (+16 -10 lines)

- README.md
- preprocess_data/data_statistics.py
- preprocess_data/preprocess_all_data.py
- preprocess_data/preprocess_data.py

README.md (+11 -4)
````diff
@@ -13,15 +13,22 @@ US Legis., UN Trade, UN Vote, and Contact. The first five datasets are bipartite
 
 Most of the used original dynamic graph datasets come from [Towards Better Evaluation for Dynamic Link Prediction](https://openreview.net/forum?id=1GVpwr2Tfdg),
 which can be downloaded [here](https://zenodo.org/record/7213796#.Y1cO6y8r30o).
-Please first download them and put them in ```DG_data``` folder.
-Then, please run ```preprocess_data/preprocess_data.py``` for pre-processing the datasets.
+Please download them and put them in ```DG_data``` folder.
+The Myket dataset comes from [Effect of Choosing Loss Function when Using T-batching for Representation Learning on Dynamic Networks](https://arxiv.org/abs/2308.06862) and
+can be accessed from [here](https://github.com/erfanloghmani/myket-android-application-market-dataset).
+The original and preprocessed files for Myket dataset are included in this repository.
+
+We can run ```preprocess_data/preprocess_data.py``` for pre-processing the datasets.
 For example, to preprocess the *Wikipedia* dataset, we can run the following commands:
 ```{bash}
 cd preprocess_data/
 python preprocess_data.py --dataset_name wikipedia
 ```
-
-The Myket dataset comes from [Effect of Choosing Loss Function when Using T-batching for Representation Learning on Dynamic Networks](https://arxiv.org/abs/2308.06862) and can be accessed from [here](https://github.com/erfanloghmani/myket-android-application-market-dataset). The preprocessed files for this dataset are included in the repository and are located at `processed_data/myket`.
+We can also run the following commands to preprocess all the original datasets at once:
+```{bash}
+cd preprocess_data/
+python preprocess_all_data.py
+```
 
 ## Dynamic Graph Learning Models
 Eight popular continuous-time dynamic graph learning methods are included in DyGLib, including
````
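Since the preprocessed Myket files ship with the repository (the removed README line placed them at `processed_data/myket`), a quick load can confirm they are intact. A minimal sketch, assuming the `ml_<name>.npy` layout that `data_statistics.py` below reads, where the `- 1` counts suggest a padding row at index 0:

```python
# Sanity-check the bundled Myket files; paths follow the removed README
# note (processed_data/myket) and the layout read by data_statistics.py.
# The "- 1" mirrors that script and assumes row 0 is padding.
import numpy as np

edge_raw_features = np.load('processed_data/myket/ml_myket.npy')
node_raw_features = np.load('processed_data/myket/ml_myket_node.npy')

print('num_nodes:', node_raw_features.shape[0] - 1)
print('num_edges:', edge_raw_features.shape[0] - 1)
print('node_feat_dim:', node_raw_features.shape[-1])
print('edge_feat_dim:', edge_raw_features.shape[-1])
```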

preprocess_data/data_statistics.py (+3 -3)
```diff
@@ -14,11 +14,11 @@ def pprint_df(df, tablefmt='psql'):
 for dataset_name in sorted(all_datasets, key=lambda v: v.upper()):
     edge_raw_features = np.load('../processed_data/{}/ml_{}.npy'.format(dataset_name, dataset_name))
     node_raw_features = np.load('../processed_data/{}/ml_{}_node.npy'.format(dataset_name, dataset_name))
-    info = {'name': dataset_name,
+    info = {'dataset_name': dataset_name,
             'num_nodes': node_raw_features.shape[0] - 1,
-            'node_fea_dim': node_raw_features.shape[-1],
+            'node_feat_dim': node_raw_features.shape[-1],
             'num_edges': edge_raw_features.shape[0] - 1,
-            'edge_fea_dim': edge_raw_features.shape[-1]}
+            'edge_feat_dim': edge_raw_features.shape[-1]}
     records.append(info)
 
 info_df = pd.DataFrame.from_records(records)
```
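The rename makes the printed column headers `dataset_name`, `num_nodes`, `node_feat_dim`, `num_edges`, and `edge_feat_dim`, presumably matching the `feat` abbreviation used elsewhere in the repository. For context, a sketch of the `pprint_df` helper named in the hunk header, assuming it wraps `tabulate` (its `tablefmt='psql'` default suggests this, but the body is not shown in this diff):

```python
# Sketch only: assumes pprint_df wraps tabulate, as the tablefmt='psql'
# default in its signature suggests; the real body is not in this diff.
import pandas as pd
from tabulate import tabulate

def pprint_df(df: pd.DataFrame, tablefmt: str = 'psql') -> None:
    # Render the statistics table with psql-style borders.
    print(tabulate(df, headers='keys', tablefmt=tablefmt, showindex=False))
```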
preprocess_data/preprocess_all_data.py (+1 -1)
```diff
@@ -1,5 +1,5 @@
 import os
 
-for name in ['wikipedia', 'reddit', 'mooc', 'lastfm', 'enron', 'SocialEvo', 'myket',
+for name in ['wikipedia', 'reddit', 'mooc', 'lastfm', 'myket', 'enron', 'SocialEvo',
              'uci', 'Flights', 'CanParl', 'USLegis', 'UNtrade', 'UNvote', 'Contacts']:
     os.system(f'python preprocess_data.py --dataset_name {name}')
```
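The reordering groups `myket` with the other bipartite datasets, mirroring the bipartite check in `preprocess_data.py` below. Since `os.system` ignores failures, a variant using `subprocess.run` with `check=True` would stop the batch at the first dataset that fails to preprocess; a sketch, not part of the commit:

```python
# Alternative batch runner (sketch, not part of the commit): check=True
# raises CalledProcessError and stops the loop if any dataset fails.
import subprocess

DATASETS = ['wikipedia', 'reddit', 'mooc', 'lastfm', 'myket', 'enron', 'SocialEvo',
            'uci', 'Flights', 'CanParl', 'USLegis', 'UNtrade', 'UNvote', 'Contacts']

for name in DATASETS:
    subprocess.run(['python', 'preprocess_data.py', '--dataset_name', name], check=True)
```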

preprocess_data/preprocess_data.py (+1 -2)
```diff
@@ -163,8 +163,7 @@ def check_data(dataset_name: str):
     if args.dataset_name in ['enron', 'SocialEvo', 'uci']:
         Path("../processed_data/{}/".format(args.dataset_name)).mkdir(parents=True, exist_ok=True)
         copy_tree("../DG_data/{}/".format(args.dataset_name), "../processed_data/{}/".format(args.dataset_name))
-        print(
-            f'the original dataset of {args.dataset_name} is unavailable, directly use the processed dataset by previous works.')
+        print(f'the original dataset of {args.dataset_name} is unavailable, directly use the processed dataset by previous works.')
     else:
         # bipartite dataset
         if args.dataset_name in ['wikipedia', 'reddit', 'mooc', 'lastfm', 'myket']:
```
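This hunk only collapses the two-line `print` into one; the surrounding fallback copies previously processed files when no public original exists. As an aside, `copy_tree` here presumably comes from `distutils.dir_util`, and `distutils` was removed in Python 3.12; an equivalent fallback with the standard library's `shutil`, as a sketch:

```python
# Sketch of the same fallback using shutil instead of copy_tree
# (assuming copy_tree is distutils.dir_util.copy_tree, which no longer
# exists on Python 3.12+). dirs_exist_ok=True merges into an existing
# destination, matching copy_tree's behavior.
import shutil
from pathlib import Path

dataset_name = 'enron'  # one of the datasets whose original is unavailable
dst = Path('../processed_data/{}/'.format(dataset_name))
dst.mkdir(parents=True, exist_ok=True)
shutil.copytree('../DG_data/{}/'.format(dataset_name), dst, dirs_exist_ok=True)
```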
