Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1543 refactor file management #1546

Merged
merged 75 commits into from
Aug 21, 2023
Merged

Conversation

lucassousaf
Copy link
Contributor

@lucassousaf lucassousaf commented Aug 16, 2023

  • All the file management now happens in the gpml.service.file namespace.
  • All files are now stored in the File table which is agnostic about the file type that is stored. It stores all the metadata related to the file. Other tables that need store files have a foreign key pointing to this table.
  • All files are stored in GCS in either a public or private bucket:
    • Organisation and stakeholder files are stored in a private bucket.
    • All other entities files are stored in the private bucket.
    • Beware that these rules are only true for images files and CVs in the case of stakeholder. Any other new documents would need to follow their own rules.
  • Update resources APIs to handle file management with the new system and also build the URLs for those files accordingly.
  • Improve stakeholder image management and as a side-effect the whole API was refactored as the process of managing images needs a more complex rollback system then the previous one. So there is a new service for stakeholders in the namespace gpml.service.stakeholder which handles create/update/get.

For more details on the task please refer to issue #1543.

[Re #1543]

* Visibility will determine in which bucket the file will be stored.
[Re #1543]

* This only take into consideration those files that are already
stored in GCS. Other files stored in the database in base64 format or
public URLs pointing to other sources are going to be migrated by
code. We'll develop an admin API for that use case.
[Re #1543]

* We'll need to sign the URLs for files stored in private buckets so
the FE can download the files without the need ask the user to login
into GC.
[Re #1543]

* With this we can specify a vector containing N fns to execute in
sequence. Each fn needs to signal either success or failure using a
map like so:

`{:success? true|false}`

Extra details in the map are allowed and recommended as they are passed
as the context for the next function in the sequence. Each fn node can
have a rollback function associated with it in case the next fn in the
sequence fails. Rollback functions are not mandatory and you can use
them when needed. In the following commits there will be a full
example of this mechanism in the new file service.
[Re #1543]

* Important to note the with-constraint-violation-check. This macro
basically allows to execute a JDBC call specifying which
constraint violation checks to look for in case it happens and return the
desired error's reason. If it's not a PSQLException it catches
anything else with Throwable.
[Re #1543]

* Also added a generic function in the gpml.domain.types namespace to
generate the Malli schema for a given type. Which comes in handy as we
always endup writing the same code for enums.
[Re #1543]

Notable functions:
- create-file
- delete-file
- get-file
- get-file-url
[Re #1543]

* From now on GPML will store files in different buckets depending on
its visibility.
- stakeholders and organisations entities will save their files in a
private bucket.
- the rest of entities will save their files in a public bucket.
[Re #1543]

* Misc: fixed clj-kondo warnings.
[Re #1543]

* Fix get-public-file-url in file service to use the correct
configuration setting.
[Re #1543]

* Using another random UUID can be a lot of noise. If the file doesn't
have a filename which is the case for raw base64 strings.
* Perphas an improvement for this would be for the client to send more
information about the image file. But for this specific use case the
filenames don't matter much since we'll not be listing them in the
client side.
* The filename will be useful for future developments where we'll have
to store other files other than files such as workspace documents.
[Re #1543]

* Useful if we want to convert a file to a domain when pulled from the DB using join
and JSON aggregations.
[Re #1543]

* To be used at the API level since due to historic reasons the API
handler's run inside a transaction and to signal failure and abort the
transaction we have to throw an exception. This should be changed in
the future and use a more ideal approach like we do in the
gpml.service.file namespace.
[Re #1543]

* Now we bring the files using the query and the add the URLs for each
of them.
* Since the query is of UNION type the image is placed in a common key
that for historical reasons is `picture`.
[Re #1543]

* Some functions didn't have the necessary information to do the file
handling so I had to modify the references as well.
[Re #1543]

* This API endpoint will serve to trigger file migration for certain
entities. These entities are `event`, `case_study`, `stakeholder` and
`organisation. These entities' files can't be migrated using a
database migration because some of records store files in a
different. Some store files as base64 or public URLs (not pointing to
GCS). So we have to go row by row and handle each case and finally
upload the file to GCS.
* This API will lose its purpose once the migration is done as well as
the associated hugsql function. In which case we can safely delete
everything.
[Re #1543]

* We'll have to test this properly when we resume BRS integration.
[Re #1543]

* We are going to remove its associatiated namespace in the following
commits as well.
[Re #1543]

* Wrong destructuring when getting the old CV file id.
* Also remove :cv and :photo keys from response as they are not
needed anymore.
[Re #1543]

* Correct instances where we use image uploads.
[Re #1543]

* Mocks an object storage in the container's local file system.
[Re #1543]

* We should always use the same name for properties in both BE and
FE. In this case BE and DB uses picture while FE photo in some
ocasions. So we should stick to a single name.
* Frontend will do the necessary changes to update this as well.
[Re #1543]

* The conditional was wrong. It should be a cond instead of cond->. We
can only apply a single update not many in this case.
@joseAyudarte91 joseAyudarte91 merged commit 17b7ff2 into main Aug 21, 2023
@joseAyudarte91
Copy link
Contributor

The PR has been merged as the first step to migrate the files from old GCS bucket to the test-env-related buckets has been performed (all the files were copied, but just the right ones will be used and the rest can be removed in another moment, since it would too much effort for little benefit to only copy the ones related to test env).

@joseAyudarte91 joseAyudarte91 deleted the 1543-refactor-file-management branch August 22, 2023 11:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants