training/inference time as a function of number of scans used #25

Open
radu-diaconescu13 opened this issue Apr 17, 2025 · 3 comments

@radu-diaconescu13

Hello,

I have noticed a substantial increase in both training time and inference time
when increasing the number of scans used from 1 at a time to 5 at a time.

For example, training takes around 10 hours per epoch when using 1 scan
vs 40 hours when using 5 scans (I am running on CPU for the moment, hence the large absolute training times).
Similarly, when using the test_inference script I get around 1.1 seconds versus 3.5 seconds on my Mac M1 laptop.

I have looked into the code and a large chunk of the time increase comes from the forward method of the DRSPAAM object
in https://github.com/VisualComputingInstitute/2D_lidar_person_detection/blob/master/dr_spaam/dr_spaam/model/dr_spaam.py,
at the for loop at line 102.

Is there a reason this is done sequentially and not parallelized/vectorized using torch's capabilities?

Thank you

@Pandoro
Member

Pandoro commented Apr 17, 2025

Hi there,

so it's been a while since I looked at this, but the obvious reason that comes to mind would be the auto-regressive nature of DR-SPAAM. Line 112 of the same file actually updates a state, which can't be parallelized in a trivial fashion.

The feature extraction itself should be parallelizable though, given that these ops don't have a state. I guess you could easily flatten the batch and number of scan dimensions, extract the features and then reshape them back prior to running only line 112 in a loop. I guess you can even try that without re-training given that it should in essence be exactly the same during inference.

Do pay attention though that during training this would behave slightly differently (no idea if it's good or bad). Right now batchnorm is performed on batches of a single scan, whereas doing what I proposed above would run batchnorm collectively on all B*N scans. This might even be better, but I guess it's hard to tell without trying.
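
To make the batchnorm difference concrete, here is a minimal sketch with toy shapes (not the actual DR-SPAAM modules): in training mode the two schemes normalise with different batch statistics, while with running statistics (eval mode) they coincide.

import torch
import torch.nn as nn

# toy shapes just for illustration (batch, scans, channels, cutout length)
B, N, C, L = 8, 5, 16, 48
bn = nn.BatchNorm1d(C).train()
x = torch.randn(B, N, C, L)

# current scheme: normalise one scan at a time (batch statistics over B samples)
per_scan = torch.stack([bn(x[:, i]) for i in range(N)], dim=1)

# proposed scheme: fold the scans into the batch (batch statistics over B*N samples)
flattened = bn(x.view(B * N, C, L)).view(B, N, C, L)

print(torch.allclose(per_scan, flattened))  # typically False while in train mode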

@radu-diaconescu13
Author

Thank you @Pandoro

By "I guess you can even try that without re-training given that it should in essence be exactly the same during inference",
do you mean manually changing the structure of the weights that are already obtained after training? Do you have any suggestion on how to do this and/or a link to an example, please?

@Pandoro
Member

Pandoro commented Apr 17, 2025

No, you don't need to change the weights. My pytorch syntax is a bit rusty and I can't test this right now, but I'm suggesting something along the following lines:

B, CT, N, L = x.shape

# extract features from all scans at once
out = x.view(B * CT * N, 1, L) # Not sure if that works, but I wouldn't see why not. You could also use Einops to be more explicit.
out = self._conv_and_pool(out, self.conv_block_1)  # /2   <-- cutouts are processed as usual.
out = self._conv_and_pool(out, self.conv_block_2)  # /4
features_all = out.view(B, CT, N, out.shape[-2], out.shape[-1])  # Again this might need some testing

for i in range(N):
    features_i = features_all[:, :, i, :, :]  # (B, CT, C, L)
    # combine current feature with memory
    out, sim = self.gate(features_i)  # (B, CT, C, L)
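
If you want the reshapes to be more explicit, the flattening could be written with einops along these lines (again untested, and it requires the einops package):

from einops import rearrange

out = rearrange(x, 'b ct n l -> (b ct n) 1 l')  # flatten batch, cutout and scan dims
out = self._conv_and_pool(out, self.conv_block_1)  # /2
out = self._conv_and_pool(out, self.conv_block_2)  # /4
features_all = rearrange(out, '(b ct n) c l -> b ct n c l', b=B, ct=CT, n=N)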

Each cutout from each scan, from each batch entry, is processed independently; the only place they interact is in the batchnorm in the _conv_and_pool blocks. That's what I mentioned before. I don't think this is a huge issue though, and post-training it should give you the same results, up to GPU non-determinism and the like.

Take all of this with a grain of salt though and test it for sure. I might be overlooking something stupid here and I'm only 95% sure this will work.
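
One quick sanity check before trusting the vectorised version: run both forwards on the same input in eval mode and compare the outputs. A rough sketch, assuming model is the original network, model_vec is a copy with the vectorised feature extraction sharing the same weights, and the forward returns a tuple of tensors:

import torch

model.eval()
model_vec.eval()

x = torch.randn(2, 56, 5, 48)  # (B, CT, N, L); toy values, not the real config

with torch.no_grad():
    ref = model(x)      # outputs of the original, sequential forward
    new = model_vec(x)  # outputs of the vectorised forward

for a, b in zip(ref, new):
    print(torch.allclose(a, b, atol=1e-5))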
