Pin stack overflow occurs when running SPEC CPU2017 621.wrf_s. #7

Open
icyclv opened this issue Apr 3, 2024 · 5 comments

icyclv commented Apr 3, 2024

Hi, thank you for LoopPoint. I've recently been trying to use it to collect representative regions on SPEC CPU. Following the README, I successfully ran SPEC CPU2017 603.bwaves (command: ./speed_bwaves_base.icc bwaves_1 < bwaves_1.in). However, when I run SPEC CPU2017 621.wrf_s, it fails with a Pin stack overflow error. Do I need to adjust any settings?

Here is my cfg file:

[Parameters]
program_name: loopiccgo2
input_name: 1
command: ./wrf_s_base.icc
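
For anyone wanting to double-check such a file before a long run, a minimal sketch follows. It assumes only that the file is INI-like, which Python's standard `configparser` accepts with `key: value` pairs; it is not necessarily how run-looppoint.py reads it, and `read_parameters` is a hypothetical helper name.

```python
import configparser

# The cfg contents exactly as posted above.
CFG_TEXT = """\
[Parameters]
program_name: loopiccgo2
input_name: 1
command: ./wrf_s_base.icc
"""


def read_parameters(text):
    """Return the [Parameters] section as a plain dict (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["Parameters"])


params = read_parameters(CFG_TEXT)
for key, value in params.items():
    print(f"{key} = {value}")
```

Note that every value comes back as a string, so `input_name` is `"1"` rather than an integer.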

The part of the log containing the exception is as follows:

***  Finished generating whole program pinballs [log_whole]  ***    April 03, 2024 17:28:13

+++  Using whole program pinballs in dir: whole_program.1

***  TRACING: END  ***    April 03, 2024 17:28:13
Running commands:
/mnt/hdd/users/ycchang/code/performance/looppoint/tools/sde-external-9.14.0-2022-10-25-lin/pinplay-scripts/replay.py --pintool=sde-global-looppoint.so  --pintool_options -dcfg -replay:deadlock_timeout 0 -replay:strace -dcfg:out_base_name /mnt/hdd/users/ycchang/code/performance/looppoint/apps/wrf_icc_o2/custom-loopiccgo2-1-test-passive-8-20240403172006/whole_program.1/loopiccgo2.1_2882515 /mnt/hdd/users/ycchang/code/performance/looppoint/apps/wrf_icc_o2/custom-loopiccgo2-1-test-passive-8-20240403172006/whole_program.1/loopiccgo2.1_2882515

......
WRF NUMBER OF TILES =   8
......
wrf: SUCCESS COMPLETE WRF
In: 
Thread: 0
PID: 2889645
SYSTEM TID: 2889645
Exception code: ACCESS_DENIED
Exception Class: 2
Faulty AccessType : 0
Exception address: 0x14923732b008
E: Pin stack overflow in thread 2889645
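
To make excerpts like this easier to compare across runs, here is a rough sketch that pulls the key fields out of the exception block. The field names are taken from the log above; `summarize_crash` and the regular expressions are illustrative assumptions, not part of LoopPoint, Pin, or SDE.

```python
import re

# Excerpt copied from the log above.
LOG_EXCERPT = """\
wrf: SUCCESS COMPLETE WRF
In:
Thread: 0
PID: 2889645
SYSTEM TID: 2889645
Exception code: ACCESS_DENIED
Exception Class: 2
Faulty AccessType : 0
Exception address: 0x14923732b008
E: Pin stack overflow in thread 2889645
"""

# "Name: value" lines of interest, and the final Pin error line.
FIELD_RE = re.compile(
    r"^(PID|SYSTEM TID|Exception code|Exception address)\s*:\s*(\S+)",
    re.MULTILINE,
)
OVERFLOW_RE = re.compile(r"^E: Pin stack overflow in thread (\d+)", re.MULTILINE)


def summarize_crash(log_text):
    """Collect the exception fields from a log excerpt (hypothetical helper)."""
    fields = dict(FIELD_RE.findall(log_text))
    overflow = OVERFLOW_RE.search(log_text)
    return {
        "app_finished": "SUCCESS COMPLETE WRF" in log_text,
        "pid": fields.get("PID"),
        "exception_code": fields.get("Exception code"),
        "exception_address": fields.get("Exception address"),
        "overflow_thread": overflow.group(1) if overflow else None,
    }


summary = summarize_crash(LOG_EXCERPT)
print(summary)
```

On this excerpt the summary shows that WRF had already printed its success message and that the overflowing thread ID equals the PID, which suggests the failure is reported on the main thread after the application itself has finished.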

It also causes subsequent tasks to fail. The complete log file is attached.

Thank you for your project, and I look forward to your response.

looppoint.log.txt

alenks (Collaborator) commented Apr 3, 2024

I'm not quite sure what is causing the failure. Can you rerun with flow control turned off (--no-flowcontrol)?
Also, it'd be helpful to know if the problem repeats with other Pin/SDE tools using the same binary. Can you try running a simple SDE tool, like the mix tool ($SDE_BUILD_KIT/sde -mix -- ./wrf_s_base.icc), to verify that?
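
If it helps to script that check, the sketch below assembles the same command line. The `-mix` option and the `--` separator come from the command above; `mix_command` and the `/path/to/sde-kit` directory are placeholders to replace with your own setup.

```python
import os
import shlex


def mix_command(binary, sde_build_kit=None):
    """Build the argv for `sde -mix -- <binary>` (hypothetical helper)."""
    # Fall back to the SDE_BUILD_KIT environment variable when no kit is given.
    kit = sde_build_kit or os.environ.get("SDE_BUILD_KIT", "")
    return [os.path.join(kit, "sde"), "-mix", "--", binary]


# "/path/to/sde-kit" is a placeholder, not a real install location.
cmd = mix_command("./wrf_s_base.icc", sde_build_kit="/path/to/sde-kit")
print(shlex.join(cmd))

# To actually run it from the benchmark's run directory:
#   import subprocess
#   subprocess.run(cmd, check=False)
```

Passing the command as a list avoids shell quoting problems if the binary path contains spaces.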

icyclv (Author) commented Apr 5, 2024

With --no-flowcontrol, the error still occurs. 'sde -mix', on the other hand, appears to run normally.

I also tested other SPEC CPU workloads (such as pop2), and they run without error, so I suspect the issue is specific to WRF.

alenks (Collaborator) commented Apr 7, 2024

I have introduced a new flag --binary-profile to the run-looppoint.py script to enable binary profiling of the application instead of relying on a pinball. Could you test this option with wrf to see if it resolves the issue?

icyclv (Author) commented Apr 8, 2024

Sorry, I tested WRF with the --binary-profile flag, but it still fails with a stack overflow error. Here's the log:

Running commands:
/mnt/hdd/users/ycchang/code/performance/looppoint/tools/sde-external-9.14.0-2022-10-25-lin/sde64 -t sde-global-looppoint.so -dcfg -dcfg:out_base_name /mnt/hdd/users/ycchang/code/performance/looppoint/apps/wrf_icc_o2/custom-loopiccgo2-1-test-passive-8-20240408170440/whole_program.1/dcfg-out -- ./wrf_s_base.icc

......
WRF NUMBER OF TILES FROM OMP_GET_MAX_THREADS =   8
......
wrf: SUCCESS COMPLETE WRF
In: 
Thread: 0
PID: 2891854
SYSTEM TID: 2891854
Exception code: ACCESS_DENIED
Exception Class: 2
Faulty AccessType : 0
Exception address: 0x1515921a5008
E: Pin stack overflow in thread 2891854

alenks (Collaborator) commented Apr 9, 2024

@hgpatil, have you seen this problem before?
@icyclv, meanwhile, could you find out exactly what is causing the issue by enabling debugging (-pause_tool 20)? See the link for details.
