-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor file management system #1543
Labels
Backend
Backend related issue
Comments
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
[Re #1543] * Visibility will determine in which bucket the file will be stored.
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
[Re #1543] * This only take into consideration those files that are already stored in GCS. Other files stored in the database in base64 format or public URLs pointing to other sources are going to be migrated by code. We'll develop an admin API for that use case.
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
[Re #1543] * We'll need to sign the URLs for files stored in private buckets so the FE can download the files without the need ask the user to login into GC.
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
[Re #1543] * With this we can specify a vector containing N fns to execute in sequence. Each fn needs to signal either success or failure using a map like so: `{:success? true|false}` Extra details in the map are allowed and recommended as they are passed as the context for the next function in the sequence. Each fn node can have a rollback function associated with it in case the next fn in the sequence fails. Rollback functions are not mandatory and you can use them when needed. In the following commits there will be a full example of this mechanism in the new file service.
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
[Re #1543] * Important to note the with-constraint-violation-check. This macro basically allows to execute a JDBC call specifying which constraint violation checks to look for in case it happens and return the desired error's reason. If it's not a PSQLException it catches anything else with Throwable.
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
[Re #1543] * Also added a generic function in the gpml.domain.types namespace to generate the Malli schema for a given type. Which comes in handy as we always endup writing the same code for enums.
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
[Re #1543] Notable functions: - create-file - delete-file - get-file - get-file-url
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
[Re #1543] * From now on GPML will store files in different buckets depending on its visibility. - stakeholders and organisations entities will save their files in a private bucket. - the rest of entities will save their files in a public bucket.
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
[Re #1543] * Misc: fixed clj-kondo warnings.
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
[Re #1543] * Fix get-public-file-url in file service to use the correct configuration setting.
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
[Re #1543] * Using another random UUID can be a lot of noise. If the file doesn't have a filename which is the case for raw base64 strings. * Perphas an improvement for this would be for the client to send more information about the image file. But for this specific use case the filenames don't matter much since we'll not be listing them in the client side. * The filename will be useful for future developments where we'll have to store other files other than files such as workspace documents.
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
[Re #1543] * Useful if we want to convert a file to a domain when pulled from the DB using join and JSON aggregations.
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
lucassousaf
added a commit
that referenced
this issue
Aug 9, 2023
lucassousaf
added a commit
that referenced
this issue
Aug 10, 2023
[Re #1543] * To be used at the API level since due to historic reasons the API handler's run inside a transaction and to signal failure and abort the transaction we have to throw an exception. This should be changed in the future and use a more ideal approach like we do in the gpml.service.file namespace.
lucassousaf
added a commit
that referenced
this issue
Aug 10, 2023
lucassousaf
added a commit
that referenced
this issue
Aug 15, 2023
[Re #1543] * Wrong destructuring when getting the old CV file id. * Also remove :cv and :photo keys from response as they are not needed anymore.
lucassousaf
added a commit
that referenced
this issue
Aug 15, 2023
[Re #1543] * Correct instances where we use image uploads.
lucassousaf
added a commit
that referenced
this issue
Aug 15, 2023
[Re #1543] * Mocks an object storage in the container's local file system.
lucassousaf
added a commit
that referenced
this issue
Aug 16, 2023
lucassousaf
added a commit
that referenced
this issue
Aug 16, 2023
[Re #1543] * We should always use the same name for properties in both BE and FE. In this case BE and DB uses picture while FE photo in some ocasions. So we should stick to a single name. * Frontend will do the necessary changes to update this as well.
lucassousaf
added a commit
that referenced
this issue
Aug 16, 2023
lucassousaf
added a commit
that referenced
this issue
Aug 16, 2023
[Re #1543] * The conditional was wrong. It should be a cond instead of cond->. We can only apply a single update not many in this case.
lucassousaf
added a commit
that referenced
this issue
Aug 16, 2023
joseAyudarte91
pushed a commit
that referenced
this issue
Aug 21, 2023
* Add migration to setup File model [Re #1543] * Visibility will determine in which bucket the file will be stored. * Add migration to adapt old files data to new file model [Re #1543] * This only take into consideration those files that are already stored in GCS. Other files stored in the database in base64 format or public URLs pointing to other sources are going to be migrated by code. We'll develop an admin API for that use case. * Refactor StorageClient protocol extension to support signing URLs [Re #1543] * We'll need to sign the URLs for files stored in private buckets so the FE can download the files without the need ask the user to login into GC. * Add thread transactions mechanism for executing seq of actions [Re #1543] * With this we can specify a vector containing N fns to execute in sequence. Each fn needs to signal either success or failure using a map like so: `{:success? true|false}` Extra details in the map are allowed and recommended as they are passed as the context for the next function in the sequence. Each fn node can have a rollback function associated with it in case the next fn in the sequence fails. Rollback functions are not mandatory and you can use them when needed. In the following commits there will be a full example of this mechanism in the new file service. * Add db/jdbc_util.clj namespace with JDBC utilities [Re #1543] * Important to note the with-constraint-violation-check. This macro basically allows to execute a JDBC call specifying which constraint violation checks to look for in case it happens and return the desired error's reason. If it's not a PSQLException it catches anything else with Throwable. * Impl. file persistence layer [Re #1543] * Impl. file domain layer [Re #1543] * Also added a generic function in the gpml.domain.types namespace to generate the Malli schema for a given type. Which comes in handy as we always endup writing the same code for enums. * Impl. file service for file management [Re #1543] Notable functions: - create-file - delete-file - get-file - get-file-url * Add private and public GCS bucket environment variables [Re #1543] * From now on GPML will store files in different buckets depending on its visibility. - stakeholders and organisations entities will save their files in a private bucket. - the rest of entities will save their files in a public bucket. * Add missing caml-snake-kebab dependency [Re #1543] * Improve get-file to also include the file URL [Re #1543] * Misc: fixed clj-kondo warnings. * Remove silly docstring on get-files* hugsql function [Re #1543] * Add storage API host name environment variable [Re #1543] * Fix get-public-file-url in file service to use the correct configuration setting. * Use file-id as part of filename [Re #1543] * Using another random UUID can be a lot of noise. If the file doesn't have a filename which is the case for raw base64 strings. * Perphas an improvement for this would be for the client to send more information about the image file. But for this specific use case the filenames don't matter much since we'll not be listing them in the client side. * The filename will be useful for future developments where we'll have to store other files other than files such as workspace documents. * Fix get-public-file-url missing bucket-name [Re #1543] * Fix get-file result map [Re #1543] * Instrument submission API to build image URLs [Re #1543] * Instrument event API to create and upload files [Re #1543] * Instrument browse API to build file URLs [Re #1543] * Add utility function to add URLs to files [Re #1543] * Refactor browse API to use file utility fn to add urls [Re #1543] * Add file domain utility function to decode obj with schema [Re #1543] * Useful if we want to convert a file to a domain when pulled from the DB using join and JSON aggregations. * Instrument detail GET API to add files URLs [Re #1543] * Instrument cummunity API to add files URLs [Re #1543] * Add missing environment variable [Re #1543] * Fix clj-kondo warnings on browse API [Re #1543] * Add file handler fn to create files [Re #1543] * To be used at the API level since due to historic reasons the API handler's run inside a transaction and to signal failure and abort the transaction we have to throw an exception. This should be changed in the future and use a more ideal approach like we do in the gpml.service.file namespace. * Update policy API to create files [Re #1543] * Update resource API to create files [Re #1543] * Update case_study API to create files [Re #1543] * Update initiative API to create files [Re #1543] * Update technology API to create files [Re #1543] * Update event API to create files [Re #1543] * Add get-files service function [Re #1543] * Clean up unnused stakeholder hugsql functions [Re #1543] * Update stakeholder hugsql function to support picture_id/cv_id [Re #1543] * Add delete-stakeholder* hugsql function [Re #1543] * We'll need this to rollback stakeholder creation in case of an error. * Add persistence layer delete-stakeholder and get-stakeholder fns [Re #1543] * Also declared the other existing function as it helps track which are used and which are not. * The new functions `delete-stakeholder` and `get-stakeholder` are intermediare functions to handle errors. In fact all other HugSQL functions should work this way. But this change can be done organically. * Extract download-image to gpml.util.image namespace [Re #1543] * We'll use it from other places too. * Impl stakeholder service [Re #1543] * In order to refactor the files creation and retrieval it needs a more controlled way to handle errors and rollbacks. So I decided to refactor the code to create, update and get stakeholder profile to better handle errors and rollbacks. * Refactor check-approved to get-user-info in auth-middleware [Re #1543] * In some endpoints we need the user information and we can inject it here when running the auth-middleware. User approval check is now done per endpoint using RBAC to identify if the users has the necessary permissions to consume the endpoint. This also take into consideration user approval. * Refactor stakeholder handler to use new stakeholder service [Re #1543] * Fix clj-kondo warnings [Re #1543] * Add mapping from mime-types to common-extensions [Re #1543] * We weren't storing the right extension for the file in some cases. The mime-type's suffix is not always the same as the extension name. for the most common mime-types such as `application/pdf`, `image/jpeg` and `image/png`, the extension of the file can be easily inferred by take the suffix of the mime-type (the string after the slash), but for other like docx files it's completely different. So, we are mapping those that are used in GPML to make sure we store the right extension name. * Add missing images files ids to initiative topic query [Re #1543] * We need this information later to process the files correctly. * Fix detail GET API using wrong resource type name [Re #1543] * Refactor detail PUT API to use new file management system [Re #1543] * Fix community API organisation picture key [Re #1543] * Fix stakeholder GET API using the wrong user-id [Re #1543] * Fix stakeholder PUT restricted API not adding user-id to body [Re #1543] * Fix community API not handling files URLs correctly [Re #1543] * Now we bring the files using the query and the add the URLs for each of them. * Since the query is of UNION type the image is placed in a common key that for historical reasons is `picture`. * Update organisation API to use new file manager [Re #1543] * Some functions didn't have the necessary information to do the file handling so I had to modify the references as well. * Update export API to add files URLs to exported entities [Re #1543] * Fix invitation API ig keys to use global common config [Re #1543] * Fix 192 migration to make organisation images private [Re #1543] * Impl case_study hugsql functions for file migration [Re #1543] * Impl stakeholder hugsql function for file migration [Re #1543] * Impl organisation hugsql function for file migration [Re #1543] * Impl event hugsql function for file migration [Re #1543] * Impl files-migrator API to programmatically migrate files [Re #1543] * This API endpoint will serve to trigger file migration for certain entities. These entities are `event`, `case_study`, `stakeholder` and `organisation. These entities' files can't be migrated using a database migration because some of records store files in a different. Some store files as base64 or public URLs (not pointing to GCS). So we have to go row by row and handle each case and finally upload the file to GCS. * This API will lose its purpose once the migration is done as well as the associated hugsql function. In which case we can safely delete everything. * Update brs_api_importer to use new file manager [Re #1543] * We'll have to test this properly when we resume BRS integration. * Fix wrong param destructuring in files-migrator API [Re #1543] * Add get-blob-signed-url method to file-system storage mock [Re #1543] * Remove deprecated handler/image_test.clj namespace [Re #1543] * We are going to remove its associatiated namespace in the following commits as well. * Remove deprecated handler/image usage in handler/resource_test [Re #1543] * Remove deprecated handler/image related code [Re #1543] * Fix stakeholder CV file creation in service layer [Re #1543] * Wrong destructuring when getting the old CV file id. * Also remove :cv and :photo keys from response as they are not needed anymore. * Update tests affected by new file management system [Re #1543] * Correct instances where we use image uploads. * Add storage-client mock adapter for tests [Re #1543] * Mocks an object storage in the container's local file system. * Bump google-cloud-storage version to 2.26.0 [Re #1543] * Update stakeholder API to use picture key instead of photo [Re #1543] * We should always use the same name for properties in both BE and FE. In this case BE and DB uses picture while FE photo in some ocasions. So we should stick to a single name. * Frontend will do the necessary changes to update this as well. * Fix clj-kondo warnings [Re #1543] * Fix conditional to add files URLs in detail API [Re #1543] * The conditional was wrong. It should be a cond instead of cond->. We can only apply a single update not many in this case. * Add exclusion for org.checkerframework/checker-qual [Re #1543]
joseAyudarte91
added a commit
that referenced
this issue
Aug 22, 2023
[Re #1543] We needed to take into account dash characters among the allowed/expected file names when building the temporal image tables for the "image" and "thumbnail" properties during the DB migration related to already stored GCS images for the different resource types. Otherwise, the file name string generation ends up with an empty value, which violates the NOT NULL constraint in the File table's Name column. After further checking the code and both Prod and Test Env DBs, we are confident that these are the only changes needed to fix the issue and avoid any other surprises (since we know how the code is building the image urls when uploading them to GCS).
joseAyudarte91
added a commit
that referenced
this issue
Aug 22, 2023
* Fix file management refactor DB migration [Re #1543] We needed to take into account dash characters among the allowed/expected file names when building the temporal image tables for the "image" and "thumbnail" properties during the DB migration related to already stored GCS images for the different resource types. Otherwise, the file name string generation ends up with an empty value, which violates the NOT NULL constraint in the File table's Name column. After further checking the code and both Prod and Test Env DBs, we are confident that these are the only changes needed to fix the issue and avoid any other surprises (since we know how the code is building the image urls when uploading them to GCS). * Fix formatting of misc. files
joseAyudarte91
added a commit
that referenced
this issue
Aug 22, 2023
[Re #1543] The DB query used to update the case studies with the migrated images was missing the actual `update table set` prefiex before the dynamic HugSQL-related part. After this the API should also work for migrating images that were not stored previously in GCS.
joseAyudarte91
added a commit
that referenced
this issue
Aug 22, 2023
[Re #1543] The DB query used to update the case studies with the migrated images was missing the actual `update table set` prefiex before the dynamic HugSQL-related part. After this the API should also work for migrating images that were not stored previously in GCS.
joseAyudarte91
added a commit
that referenced
this issue
Aug 24, 2023
[Re #1543] We were not including the namespace that extends the GCSStorageClient protocol with the methods to get blob signed URL and delete blob-relate functionality, that was preventing the reading of private image files from GCS. This was not reproduced locally, since there we load always all the namespaces, which is not the case with the Uberjar. So now we require the mentioned file as part of the File service, that is required already, fixing the issue.
joseAyudarte91
added a commit
that referenced
this issue
Aug 24, 2023
[Re #1543] We were not including the namespace that extends the GCSStorageClient protocol with the methods to get blob signed URL and delete blob-relate functionality, that was preventing the reading of private image files from GCS. This was not reproduced locally, since there we load always all the namespaces, which is not the case with the Uberjar. So now we require the mentioned file as part of the File service, that is required already, fixing the issue.
joseAyudarte91
added a commit
that referenced
this issue
Aug 24, 2023
[Re #1543] Removed forgotten print statement that shouldn't be there.
joseAyudarte91
added a commit
that referenced
this issue
Aug 24, 2023
[Re #1543] Removed forgotten print statement that shouldn't be there.
joseAyudarte91
added a commit
that referenced
this issue
Aug 28, 2023
[Re #1543] We were using the inverse logic to handle the result of deleting the image of a resource that was going to be replaced by another image as part of the update process. The main problem is that we were throwing an exception to rollback the DB-related updates on the resource data when the old image delete process was successful. However, we want the opposite and throw the exception only when the process goes wrong and continue with the DB-related updates otherwise. Now we have fixed that logic in all the affected places. Tests have been updated as well to avoid passing explicit nil `image` property when updating resources, since in that case we are expecting a previous old image when we do not have it for those cases.
joseAyudarte91
added a commit
that referenced
this issue
Aug 28, 2023
[Re #1543] We were not checking if we had a previous image before trying to delete it in the case of replacing the image of a resource by a nil value. Now we do it.
joseAyudarte91
added a commit
that referenced
this issue
Aug 28, 2023
[Re #1543] In the case of failing the process of updating a resource, when updating images, we could end-up keeping deleted image referemces from Object Storage (GCS), in the DB, as a result of rolling back the DB-related operation. Hence, if we end-up in that case, we try to delete those files from the DB, so at least it's clearer for the user and no broken links are displayed. This is the simples approach, since rollbacking fully the changes to re-create the files would imply more complexity and resources consumption for the server and it is not worth it, since the user wanted to replace those files anyway, so he can just try it again by uploading the new files in newer updates. This process has to be improved since we are doing the mentioned operations as part of a Catch exception block, so this code is not catched in another block.
Leaving this issue opened to perform some post-migration clean-up tasks. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
The current management of files (images, CVs, etc) have a few problems that can be a problem from both maintenance and database performance in the future:
Other entities store their images directly in Google Cloud Storage (GCS).
The idea is to centralize the images management in a way that:
The text was updated successfully, but these errors were encountered: