
Commit f939277

[card-cache] optimizations to avoid deadlocks
- Deadlocks happened when cleanups coincided with heavy load.
- Tests simulating a cleanup (disk + shared objects) together with heavy load successfully reproduced the deadlock.
- Locking at the `context` level:
  - Each time we want to clean up, we take a lock to create a new context.
  - All directories/processes are written within that new context.
  - Switching the context ensures that all new processes are created under the new context, so the cleanup process can safely remove everything else.
  - The context also sets the read/write directory for the cache object used in the API endpoint.
- All locking on the API side is now always time-bound. The code times out if it can't acquire a lock.
  - This ensures that no operation holds the lock indefinitely.
- Changed the default minimum amount of time to wait for cards in the cache process to 20 seconds (helps make things snappier).
- Added a `timings` dict to the card cache to optimize loading cycles (set on a per-card basis).
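The context-level locking described above can be sketched roughly as follows. This is a minimal illustration and not the code from this commit: the `ContextSwitcher` class, its method names, and the use of a UUID as the context id are all assumptions made for the example.

```python
import os
import threading
import uuid


class ContextSwitcher:
    """Hypothetical sketch: tracks the active context and swaps it on cleanup."""

    def __init__(self, root):
        self._root = root
        self._lock = threading.Lock()
        self._context = uuid.uuid4().hex  # assumed id scheme, for illustration

    def current_dir(self):
        # New readers/writers always resolve paths through the active context.
        with self._lock:
            return os.path.join(self._root, self._context)

    def switch(self, timeout=1.0):
        """Start a new context and return the old directory for cleanup.

        Returns None if the lock cannot be acquired within `timeout`,
        so the caller never blocks indefinitely.
        """
        if not self._lock.acquire(timeout=timeout):
            return None
        try:
            old = self._context
            self._context = uuid.uuid4().hex
            return os.path.join(self._root, old)
        finally:
            self._lock.release()
```

In this sketch the cleanup routine would call `switch()` first and only then delete the returned directory, since nothing new is created under the old context after the switch.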
1 parent d24ee48 commit f939277

4 files changed

+193
-136
lines changed

services/ui_backend_service/api/card.py

+3 -2
@@ -153,11 +153,12 @@ async def get_card_content_by_hash(self, request):
             task,
             hash,
         )
+        html_reload_script = "<script>window.needsReload = true;</script><body></body>"
         if cards is None:
             return web.Response(
                 content_type="text/html",
                 status=404,
-                body="Card not found for task. Possibly still being processed. Please refresh page to check again.",
+                body=html_reload_script,
             )
 
         if cards and hash in cards:
@@ -275,7 +276,7 @@ async def get_card_data_for_task_async(
         step_name=task.get("step_name"),
         task_id=task.get("task_name") or task.get("task_id"),
     )
-    await cache_client.cache_manager.register(pathspec)
+    await cache_client.cache_manager.register(pathspec, lock_timeout=0.2)
     _local_cache = cache_client.cache_manager.get_local_cache(pathspec, card_hash)
     if not _local_cache.read_ready():
         # Since this is a data update call we can return a 404 and the client
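
The `lock_timeout=0.2` argument in the hunk above suggests a time-bound lock acquisition. The internals of `cache_manager.register` are not shown in this diff, so the following is only a sketch of one way such a timeout could work with `asyncio`; the `acquire_with_timeout` helper and this stand-in `register` function are hypothetical.

```python
import asyncio


async def acquire_with_timeout(lock, timeout):
    """Try to take `lock`; give up after `timeout` seconds."""
    try:
        await asyncio.wait_for(lock.acquire(), timeout)
        return True
    except asyncio.TimeoutError:
        return False


async def register(lock, registry, pathspec, lock_timeout=0.2):
    """Hypothetical stand-in for a register call, not the service's code."""
    if not await acquire_with_timeout(lock, lock_timeout):
        # The caller can answer with a 404 and let the client retry.
        return False
    try:
        registry.add(pathspec)
        return True
    finally:
        lock.release()
```

With this shape, a request that cannot get the lock in time fails fast instead of queueing behind a long-running cleanup.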
