-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Psa crypto dcache #20260
base: main
Are you sure you want to change the base?
Psa crypto dcache #20260
Conversation
CI InformationTo view the history of this post, clich the 'edited' button above Inputs:Sources:sdk-nrf: PR head: b57d288b1d33763dbbf8d042d518edcf4eba8350 more detailssdk-nrf:
Github labels
List of changed files detected by CI (22)
Outputs:ToolchainVersion: 4ffa2202d5 Test Spec & Results: ✅ Success; ❌ Failure; 🟠 Queued; 🟡 Progress; ◻️ Skipped;
|
You can find the documentation preview for this PR at this link. Note: This comment is automatically posted by the Documentation Publish GitHub Action. |
9db5297
to
4ab0fa8
Compare
6ae8aa0
to
51a1eff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥵
51a1eff
to
e32263e
Compare
You can find the documentation preview for this PR here. |
d4c3de6
to
854256e
Compare
@jonathannilsen @tomi-font thanks for the previous review, can you please have another look? I tried making it fully correct now, that means we need to create an extra buffer for pointers to data getting written from the remote, as that could lead to a situation where the local domain and the remote domain both could write to the same dataunit, e.g. a static uint8_t buffer[5] and bool in the same struct. It tries avoiding the copy by checking for alignment. I tried having it in a way that it gets optimized away if the referenced structure is already aligned, but couldn't have it look acceptable so figured this way it is at least consistent. |
854256e
to
b80100e
Compare
b80100e
to
4b6d8bd
Compare
63c8f9a
to
56d2a3a
Compare
47a0d0c
to
137c871
Compare
|
@tomi-font @jonathannilsen hei, please check again now, CI is green after the testcases were adjusted to not pass in full buffer sizes. The bounce buffer handling can be optionally disabled. |
We can skip remote psa init, so remove the call but leave the handler intact to not break the interface. Also add new functions returning not supported. Signed-off-by: Karsten Koenig <karsten.koenig@nordicsemi.no>
Removed pointless whitespaces all over the service. Signed-off-by: Karsten Koenig <karsten.koenig@nordicsemi.no>
Need to writeback and invalidate all buffers that get sent/received from the remote to make sure we have consistent view of the data for all involved parties. Signed-off-by: Karsten Koenig <karsten.koenig@nordicsemi.no>
Reduced the cache handling calls based on pointer usage, data only going in to the psa call only needs to be flushed, data only getting read back just needs to be invalidated and data going in and out needs both. Signed-off-by: Karsten Koenig <karsten.koenig@nordicsemi.no>
2fc85d6
to
890e103
Compare
int "Size of the heap buffer used for out buffer" | ||
default 4096 | ||
help | ||
Size of the heap buffer used for out buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should document to the user that with SSF_PSA_CRYPTO, performance will improve when PSA calls use output buffers that are aligned to the DataUnit size.
And specify that we consider a buffer to be aligned when it both starts and ends at an address aligned with the DataUnit size. As opposed to the C concept of aligned, that just deals with the start address.
And also that PSA calls to unaligned buffers larger than SSF_PSA_CRYPTO_SERVICE_OUT_HEAP_SIZE will return PSA_ERROR_INSUFFICIENT_MEMORY.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be a good place:
Although we don't need to do this in this PR...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nrf54h20 doesn't seem to be covered much anyways there, I'll ask around if there is a plan for this.
|
||
if ((IS_ALIGNED(original_buffer, CACHE_DATA_UNIT_SIZE)) && | ||
(IS_ALIGNED(size, CACHE_DATA_UNIT_SIZE))) { | ||
out_buffer = original_buffer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not buffer pointers to uncached memory areas.
Use-case:
Output buffer is too large to practically buffer it.
The protocol has determined the size, and it's not word-aligned.
Maybe this use-case doesn't exist, but if it did, the user would have no choice except to configure the MPU to not cache the buffer. And we would need to check if original_buffer is pointing to an uncached area at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this use-case is rare.
So I'm OK with doing nothing here wrt. the review.
But our team should at least be aware that we could be unnecessarily buffering uncached memory :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only happens on output buffers, and those can be allocated to any size and the server will just fill what is needed and report the length it actually filled, so it is easy for the client to just pass in 4B aligned buffers for output, even if they expect an unaligned size of data back. That'd be a lot easier than adding an MPU region at least.
I guess we could try harder here to figure out if it is uncached, but to my mind that'd already be an active part on the client side and then they might just as well turn off the bounce buffers if they are confident they aren't using unaligned buffers. That also means they get to save 4k of RAM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are some PSA operations where the user cannot pad the output buffer themselves.
i.e.
https://arm-software.github.io/psa-api/crypto/1.1/api/ops/kdf.html#c.psa_key_derivation_output_bytes
For this operation, the output_length must match exactly the amount of bytes that is actually written by PSA. It cannot be an oversized/padded buffer.
Because of:
I hope I am mistaken though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"That'd be a lot easier than adding an MPU region at least."
I am hoping that, if not supported by NCS already, then at least eventually, the user can just do:
__nocache my_buffer;
And Zephyr will aggregate all the uncached buffers and put them in the same memory region.
See __nocache in https://docs.zephyrproject.org/latest/hardware/cache/index.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but you'll have to add that memory region into the already fairly convoluted memory layout. But maybe it makes sense for users to do this out of principle anyways for crypto operations as those will always cause an extra copy of the data that potentially also needs to be erased when trying to handle those safely.
I'd still leave that up for a potential follow up performance improvement/simplification in use
psa_status_t ssf_psa_crypto_init(void) | ||
#if defined(CONFIG_SSF_PSA_CRYPTO_SERVICE_OUT_BOUNCE_BUFFERS_ENABLED) | ||
|
||
#define CACHE_DATA_UNIT_SIZE 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get this from DT?
Joking aside, we subtly rely on CACHE_DATA_UNIT_SIZE == sizeof(uintptr) as k_heap_alloc returns addresses aligned to the pointer.
I'm worried this requirement will subtly fail if they bump the cache line size on the next platform. Adding an assert and a comment would ease portability I think.
/* k_heap_alloc allocated memory is aligned on a multiple of pointer sizes. The HW's DataUnit size must match this Zephyr behaviour. */
BUILD_ASSERT(DCACHEDATA_DATAWIDTH * 4 == sizeof(uintptr))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah nice catch, yes I meant to check for this at build time but forgot. Also good catch that DCACHEDATA_DATAWIDTH is in the MDK, I didn't realize we had that DataUnit size information in there. I'll use this instead of harcoding it to 4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reviewed all but psa_crypto_service.c
psa_status_t ssf_psa_crypto_init(void) | ||
{ | ||
int err; | ||
struct psa_crypto_req req = { 0 }; | ||
struct psa_crypto_rsp rsp = { 0 }; | ||
|
||
req.psa_crypto_req_msg_choice = psa_crypto_req_msg_psa_crypto_init_req_m_c; | ||
|
||
err = ssf_client_send_request(&psa_crypto_srvc, &req, &rsp, NULL); | ||
if (err != 0) { | ||
return err; | ||
} | ||
|
||
return rsp.psa_crypto_rsp_status; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation of this function is added in the previous commit, and removed in this commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was to align different versions of the generator script that I wasn't the author of, I just tried to have a synchronized history between the script and the commits here
help | ||
Size of the heap buffer used for out buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Useless help message?
45f3548
to
0c286d0
Compare
Added bounce buffers for any field written from secure domain to local domain, as we can't prevent the local domain to have dirty data in the DataUnit that the secure domain writes to. Signed-off-by: Karsten Koenig <karsten.koenig@nordicsemi.no>
Enabled DCache is now fully supported by the PSA API and hence not disabled anymore in the respective crypto samples. Signed-off-by: Karsten Koenig <karsten.koenig@nordicsemi.no>
Update the parameter names to follow the change in the header file. Also used latest zcbor which introduced a style change. Signed-off-by: Karsten Koenig <karsten.koenig@nordicsemi.no>
0c286d0
to
b57d288
Compare
@tejlmand @nordicjm Could you have a look please? The Jenkins failure passed on a rerun and the sonarcloud failure is certainly correct in the duplication of code but that's because it is a generated file. We have a plan to migrate to a different solution eventually but for now we need to fix the DCache handling when using cryptography |
you need to re-run the whole job, not an individual sub-job, it will only re-run the sub-jobs that have failed and update the github status. The sonar one is advisory, it is not mandatory |
Enable dcache on the crypto using code for nrf54h20
test_crypto: PR-762