[master < ] Add batched and parallel import #43

gitbuda · 2023-01-25T12:36:39Z

Resolves #42 -> mgconsole imports only ~1000 lines/s (VERY SLOW)

DATASET
D1) Playground Cora Scientific Publications, N: 2708, E: 5278, T: 7986
D2) Playground Marvel Cinematic Universe, N: 21732, E: 682943, T: 704675

HARDWARE
H1) Ubuntu 20.04, Ryzen 7, 8 cores, 64GB RAM
H2) MacBook Pro M1, 16GB RAM

MEASUREMENTS

context	nodes	edges	serial (n+e)/s	parallel (n+e)/s	batch	workers
D1 + H1	2708	5278	1198.37	2642.62	1000	32
D2 + H1	21732	682943	5655.13	43820.34	1000	32
D1 + H2	2708	5278	736.51	2252.75	1000	16
D2 + H2	21732	682943	1060.32	7939.91	1000	16
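
To make the table easier to compare, here is a small sketch that derives the parallel-over-serial speedup for each row. The throughput numbers are copied from the table above; the names `measurements` and `speedup` are made up for this illustration and are not part of mgconsole.

```python
# Throughput figures copied from the MEASUREMENTS table:
# context -> (serial (n+e)/s, parallel (n+e)/s).
measurements = {
    "D1 + H1": (1198.37, 2642.62),
    "D2 + H1": (5655.13, 43820.34),
    "D1 + H2": (736.51, 2252.75),
    "D2 + H2": (1060.32, 7939.91),
}


def speedup(serial: float, parallel: float) -> float:
    """Ratio of parallel to serial throughput (hypothetical helper)."""
    return parallel / serial


speedups = {ctx: speedup(s, p) for ctx, (s, p) in measurements.items()}

for ctx, ratio in speedups.items():
    print(f"{ctx}: {ratio:.2f}x")
```

On these numbers the larger dataset (D2) gains roughly 7-8x on both machines, while the small one (D1) gains roughly 2-3x.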

NOTE
a) Parsing speed depends on the size of each line/node/edge. For smaller nodes/edges, query parsing on H2) runs at >10k/s, but for bigger nodes (e.g. with large properties) it can be much slower, e.g. ~3k/s.

TASKS

gitbuda · 2023-01-27T15:56:50Z

gitbuda · 2023-01-28T14:39:36Z

On M1, with execution completely removed, only the "single thread batching" overhead remains -> a lot of room for parallel execution
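
A minimal sketch of the idea, assuming batching is kept separate from execution so that batches can later be handed to a pool of workers. This is not the mgconsole implementation (which is C++); `make_batches`, `execute_batch` and `run_parallel` are hypothetical names, and `execute_batch` is only a stand-in that counts lines instead of talking to a database. The defaults of 1000 and 16 mirror the batch size and worker count used on H2 in the table.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def make_batches(lines, batch_size):
    """Yield consecutive lists of at most batch_size lines."""
    it = iter(lines)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def execute_batch(batch):
    # Hypothetical stand-in: a real importer would send the whole batch
    # to the database here. This sketch only reports how many lines it got.
    return len(batch)


def run_parallel(lines, batch_size=1000, workers=16):
    """Batch on the calling thread, execute batches on a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(execute_batch, make_batches(lines, batch_size)))


# Illustrative input: as many lines as D1 has nodes.
lines = [f"CREATE (:Node {{id: {i}}});" for i in range(2708)]
executed = run_parallel(lines)
```

Swapping `execute_batch` for a no-op gives the "batching only" measurement, and swapping in a real call gives the parallel one, without touching the batching code.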

gitbuda · 2023-01-28T20:46:16Z

Single-threaded batched execution with the same calls, but without index creation and __vertex_id property removal, because it's impossible to create indexes in a multi-query transaction -> very similar results.
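
Since index statements cannot go into a multi-query transaction, one way to handle them is to pull them out of the stream and batch only the rest. The sketch below assumes that a simple prefix check is enough to spot index statements, which is a simplification; `split_queries` and `chunk` are hypothetical helpers, and the `:Node` label in the sample queries is illustrative only.

```python
def split_queries(queries):
    """Separate index statements (run one by one) from batchable queries."""
    serial, batchable = [], []
    for query in queries:
        normalized = query.strip().upper()
        # Assumption: index statements are recognizable by their prefix.
        if normalized.startswith(("CREATE INDEX", "DROP INDEX")):
            serial.append(query)
        else:
            batchable.append(query)
    return serial, batchable


def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


queries = [
    "CREATE INDEX ON :Node(__vertex_id);",
    "CREATE (:Node {__vertex_id: 1});",
    "CREATE (:Node {__vertex_id: 2});",
    "CREATE (:Node {__vertex_id: 3});",
    "DROP INDEX ON :Node(__vertex_id);",
]

serial, batchable = split_queries(queries)
batches = chunk(batchable, 2)  # each batch could run as one transaction
```

Here `serial` holds the two index statements and `batches` holds the three CREATE queries split into groups of two and one.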

gitbuda · 2023-02-05T14:06:43Z

Still just a bit faster 🏃

Add batched and parallel import

3e16ce2

gitbuda added the enhancement label Jan 25, 2023

gitbuda self-assigned this Jan 25, 2023

gitbuda and others added 2 commits January 27, 2023 17:12

Add Line struct

12fd963

Add only some batching code without execution to measure

6627064

Add single thread batched execution test

c2f7fed

gitbuda added 7 commits January 29, 2023 13:09

Add first attempt in correct batched parallel execution

65192f0

Added a few more issues with parallel batched execution

5e57555

Try to add serial execution (doesn't solve the problem yet)

6e6d9a2

Add thread pool utils + fix basic batching bug

03d2733

Update some small stuff

3471de8

Decouple different execution modes

4587b0c

Implement batching window

d731cd7

Add parsing exection

47d088a

gitbuda changed the title ~~Add batched and parallel import~~ [master < ] Add batched and parallel import Feb 14, 2023

gitbuda added this to the mgconsole-v1.4 milestone Feb 14, 2023

gitbuda added 10 commits April 15, 2023 15:58

Add cpp impl files for modes

8c05b92

Add ParseLineResult struct

28467e5

Upgrade Ubuntu 22.04 and MacOS Latest

415049b

Add functional header

33cef62

Update sys deps and Memgraph to 2.7

4249132

Upgrade mgclient to 1.4.1

eadedb2

Merge master

9743f3f

Add the experimental README placeholder

0444c22

Add hacked version of create vertex state machine detection

6cd8ae8

Move the clause clause deduction to a seprated file

8c669cd

gitbuda and others added 15 commits April 23, 2023 17:36

Fix the order of fields in the QueryInfo

be34d11

Add --import-mode flag

9683a08

Implement query line number and index

23f35cc

Add part of the ordered execution

50f8c85

Split execution to pure_vertices and others

cb2dd31

Add full list of states, does NOT fully work yet

829966a

Fix parsing

5b324ee

Fix batching

892671b

Add DROP_INDEX and REMOVE

14dd0d0

Add pre and post serial part

49adaa9

Move the input_output tests under a new directory

6b33d80

Add batch_size, parsing options and majority of the benchmarking script

8327923

Add parametrization of the number of workers

5a0823f

Finish dataset benchmark script

8feb2fb

Remove TODOs (most of them), improve stuff

bcd4f6c

gitbuda requested a review from antoniofilipovic May 16, 2023 06:29

gitbuda and others added 5 commits May 19, 2023 08:55

Fix merging of ParseLineInfo but in a dummy way

b89be37

Add STORAGE_MODE and improve collected clauses merge

eec32ee

Extened README

1817fba

Improve README

a2c7792

Fix type

54ad6a3

gitbuda removed the request for review from antoniofilipovic May 20, 2023 13:08

gitbuda marked this pull request as ready for review May 20, 2023 13:08

gitbuda merged commit c1d60b7 into master May 20, 2023

gitbuda deleted the add-batching-parallelization branch May 20, 2023 13:11
