Commit d1a8096

Address cosmetic changes
1 parent e31d712 commit d1a8096

11 files changed: +83 -82 lines changed

docs/commands/cp.md

Lines changed: 10 additions & 31 deletions
@@ -14,6 +14,12 @@ usage: datachain cp [-h] [-v] [-q] [-r] [--team TEAM]

 This command copies files and directories between local and/or remote storage. This uses the credentials in your system by default or can use the cloud authentication from Studio.

+The command supports two main modes of operation:
+
+- By default, the command operates directly with clouds using credentials in your system, supporting various copy scenarios between local and remote storage.
+- When using `-s` or `--studio-cloud-auth` flag, the command uses credentials from Studio for cloud operations. This mode provides enhanced authentication and access control for cloud storage operations.
+
+
 ## Arguments

 * `source_path` - Path to the source file or directory to copy
@@ -22,35 +28,20 @@ This command copies files and directories between local and/or remote storage. T
 ## Options

 * `-r`, `-R`, `--recursive` - Copy directories recursively
-* `--team TEAM` - Team name to use the credentials from.
+* `--team TEAM` - Team name to use the credentials from. (Default: from config)
 * `-s`, `--studio-cloud-auth` - Use credentials from Studio for cloud operations (Default: False)
 * `--update` - Update cached list of files for the source when downloading from cloud using local credentials.
 * `-h`, `--help` - Show the help message and exit
 * `-v`, `--verbose` - Be verbose
 * `-q`, `--quiet` - Be quiet

-## Copy Operations
-
-The command supports two main modes of operation:

-### Default Mode
-By default, the command operates directly with clouds using credentials in ypur system, supporting various copy scenarios between local and remote storage.
-
-### Studio Cloud Auth Mode
-When using `-s` or `--studio-cloud-auth` flag, the command uses credentials from Studio for cloud operations. This mode provides enhanced authentication and access control for cloud storage operations.
-
-## Supported Storage Protocols
+## Notes
+* When using Studio cloud auth mode, you must be authenticated with `datachain auth login` before using it
+* The default mode operates directly with storage providers

-The command supports the following storage protocols:
-- **Local file system**: Direct paths (e.g., `/path/to/directory` or `./relative/path`)
-- **AWS S3**: `s3://bucket-name/path`
-- **Google Cloud Storage**: `gs://bucket-name/path`
-- **Azure Blob Storage**: `az://container-name/path`

 ## Examples
-
-The command automatically determines the operation type based on the source and destination protocols:
-
 ### Local to Local

 **Operation**: Direct local file system copy
@@ -115,15 +106,3 @@ datachain cp gs://my-bucket/data/file.py gs://my-bucket/archive/file.py
 # Copy within same bucket with Studio cloud auth
 datachain cp gs://my-bucket/data/file.py gs://my-bucket/archive/file.py --studio-cloud-auth
 ```
-
-### Additional Examples
-
-```bash
-# Copy with specific team:
-datachain cp -s --team other-team /path/to/file.txt s3://my-bucket/data/file.txt
-```
-
-
-## Notes
-* When using Studio cloud auth mode, you must be authenticated with `datachain auth login` before using it
-* The default mode operates directly with storage providers

docs/commands/mv.md

Lines changed: 8 additions & 3 deletions
@@ -1,6 +1,6 @@
 # mv

-Move storage files and directories through Studio.
+Move storage files and directories in clouds or local filesystem.

 ## Synopsis

@@ -11,7 +11,7 @@ usage: datachain mv [-h] [-v] [-q] [--recursive]

 ## Description

-This command moves files and directories within storage. The move operation is performed within the same bucket - you cannot move files between different buckets. The command supports both individual files and directories, with the `--recursive` flag required for moving directories.
+This command moves files and directories within storage. The command supports both individual files and directories, with the `--recursive` flag required for moving directories.

 ## Arguments

@@ -21,7 +21,7 @@ This command moves files and directories within storage. The move operation is p
 ## Options

 * `--recursive` - Move recursively
-* `--team TEAM` - Team name to move storage contents from
+* `--team TEAM` - Team name to use the credentials from. (Default: from config)
 * `-s`, `--studio-cloud-auth` - Use credentials from Studio for cloud operations (Default: False)
 * `-h`, `--help` - Show the help message and exit
 * `-v`, `--verbose` - Be verbose
@@ -31,6 +31,11 @@ This command moves files and directories within storage. The move operation is p

 The command supports moving files and directories within the same bucket:

+## Notes
+* When using Studio cloud auth mode, you must be authenticated with `datachain auth login` before using it
+* The default mode operates directly with storage providers
+* **Warning**: This is a destructive operation. Always double-check the path before executing the command
+
 ### Move Single File

 ```bash

docs/commands/rm.md

Lines changed: 9 additions & 8 deletions
@@ -1,6 +1,6 @@
 # rm

-Delete storage files and directories through Studio.
+Delete storage files and directories from cloud or local system.

 ## Synopsis

@@ -19,12 +19,19 @@ This command deletes files and directories within storage. The command supports
 ## Options

 * `--recursive` - Delete recursively
-* `--team TEAM` - Team name to delete storage contents from
+* `--team TEAM` - Team name to use the credentials from. (Default: from config)
 * `-s`, `--studio-cloud-auth` - Use credentials from Studio for cloud operations (Default: False)
 * `-h`, `--help` - Show the help message and exit
 * `-v`, `--verbose` - Be verbose
 * `-q`, `--quiet` - Be quiet

+
+## Notes
+* When using Studio cloud auth mode, you must be authenticated with `datachain auth login` before using it
+* The default mode operates directly with storage providers
+* **Warning**: This is a destructive operation. Always double-check the path before executing the command
+
+
 ## Examples

 The command supports deleting files and directories:
@@ -48,9 +55,3 @@ datachain rm gs://my-bucket/data/directory --recursive
 # Delete directory with Studio cloud auth
 datachain rm gs://my-bucket/data/directory --recursive --studio-cloud-auth
 ```
-
-
-## Notes
-* When using Studio cloud auth mode, you must be authenticated with `datachain auth login` before using it
-* The default mode operates directly with storage providers
-* **Warning**: This is a destructive operation. Always double-check the path before executing the command

src/datachain/cli/__init__.py

Lines changed: 9 additions & 9 deletions
@@ -115,36 +115,36 @@ def handle_command(args, catalog, client_config) -> int:
         return 1


-def _get_storage_implementation(args: "Namespace", catalog: "Catalog"):
+def _get_file_handler(args: "Namespace", catalog: "Catalog"):
     from datachain.cli.commands.storage import (
         LocalCredentialsBasedFileHandler,
-        StorageCredentialFileHandler,
+        StudioAuthenticatedFileHandler,
     )
     from datachain.config import Config

     config = Config().read().get("studio", {})
     token = config.get("token")
     studio = False if not token else args.studio_cloud_auth
     return (
-        StorageCredentialFileHandler(args, catalog)
+        StudioAuthenticatedFileHandler(args, catalog)
         if studio
         else LocalCredentialsBasedFileHandler(args, catalog)
     )


 def handle_cp_command(args, catalog):
-    storage_implementation = _get_storage_implementation(args, catalog)
-    return storage_implementation.cp()
+    file_handler = _get_file_handler(args, catalog)
+    return file_handler.cp()


 def handle_mv_command(args, catalog):
-    storage_implementation = _get_storage_implementation(args, catalog)
-    return storage_implementation.mv()
+    file_handler = _get_file_handler(args, catalog)
+    return file_handler.mv()


 def handle_rm_command(args, catalog):
-    storage_implementation = _get_storage_implementation(args, catalog)
-    return storage_implementation.rm()
+    file_handler = _get_file_handler(args, catalog)
+    return file_handler.rm()


 def handle_clone_command(args, catalog):
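
The selection rule in `_get_file_handler` can be exercised on its own. The following is a minimal sketch, not the project's code: `choose_handler_name` is a hypothetical helper that returns a label instead of constructing a handler, and the token is passed in directly rather than read through `Config`. It illustrates that, per the expression in the diff, the Studio handler is chosen only when a token is present and `--studio-cloud-auth` is set.

```python
from argparse import Namespace
from typing import Optional


def choose_handler_name(args: Namespace, token: Optional[str]) -> str:
    """Hypothetical stand-in for _get_file_handler that returns a label.

    Mirrors the expression from the diff: without a Studio token the
    flag has no effect and the local-credentials handler is selected.
    """
    studio = False if not token else args.studio_cloud_auth
    return "studio" if studio else "local"


if __name__ == "__main__":
    flag_on = Namespace(studio_cloud_auth=True)
    flag_off = Namespace(studio_cloud_auth=False)
    print(choose_handler_name(flag_on, None))     # flag set, no token
    print(choose_handler_name(flag_on, "token"))  # flag set, token present
    print(choose_handler_name(flag_off, "token")) # flag not set
```

One consequence worth noting: with this expression, passing `-s` without a configured token falls back to local credentials rather than raising an error.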
Lines changed: 2 additions & 2 deletions
@@ -1,10 +1,10 @@
 from .local import LocalCredentialsBasedFileHandler
-from .studio import StorageCredentialFileHandler
+from .studio import StudioAuthenticatedFileHandler
 from .utils import build_file_paths, validate_upload_args

 __all__ = [
     "LocalCredentialsBasedFileHandler",
-    "StorageCredentialFileHandler",
+    "StudioAuthenticatedFileHandler",
     "build_file_paths",
     "validate_upload_args",
 ]

src/datachain/cli/commands/storage/local.py

Lines changed: 8 additions & 2 deletions
@@ -12,8 +12,10 @@ def cp(self):
         source_path = self.args.source_path
         destination_path = self.args.destination_path
         destination_cls = Client.get_implementation(destination_path)
+
         source_cls = Client.get_implementation(source_path)
         source_fs = source_cls.create_fs()
+        _, src_subpath = source_cls.split_url(source_path)

         update = self.args.update
         if source_cls.protocol == "file":
@@ -29,7 +31,9 @@ def cp(self):
         file_paths = {}

         if not is_file:
-            chain.to_storage(destination_path, placement="normpath")
+            chain.to_storage(
+                destination_path, placement="normpath", relative_to=src_subpath
+            )

         for (file,) in chain.to_iter("file"):
             if is_file:
@@ -40,7 +44,9 @@ def cp(self):
                 )
                 file.save(dst)
             else:
-                dst = file.get_destination_path(destination_path, "normpath")
+                dst = file.get_destination_path(
+                    destination_path, "normpath", relative_to=src_subpath
+                )
             _, dst_path = destination_cls.split_url(str(dst))
             file_paths[dst_path] = file.size

4652

src/datachain/cli/commands/storage/studio.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 from datachain.client.fsspec import Client


-class StorageCredentialFileHandler(CredentialBasedFileHandler):
+class StudioAuthenticatedFileHandler(CredentialBasedFileHandler):
     def cp(self):
         from datachain.client.fsspec import Client

1313

src/datachain/cli/parser/studio.py

Lines changed: 3 additions & 3 deletions
@@ -169,7 +169,7 @@ def add_storage_parser(subparsers, parent_parser) -> None:
     storage_cp_parser.add_argument(
         "--team",
         action="store",
-        help="Team name to copy storage contents to",
+        help="Team name to use the credentials from.",
     )

     storage_cp_parser.add_argument(
@@ -211,7 +211,7 @@ def add_storage_parser(subparsers, parent_parser) -> None:
     mv_parser.add_argument(
         "--team",
         action="store",
-        help="Team name to move storage contents from",
+        help="Team name to use the credentials from.",
     )

     mv_parser.add_argument(
@@ -242,7 +242,7 @@ def add_storage_parser(subparsers, parent_parser) -> None:
     rm_parser.add_argument(
         "--team",
         action="store",
-        help="Team name to delete storage contents from",
+        help="Team name to use the credentials from.",
     )
     rm_parser.add_argument(
         "-s",
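
To see how the unified `--team` option behaves alongside `-s`, here is a small standalone `argparse` sketch. It is not the project's parser: `build_parser` and `resolve_team` are hypothetical names, `action="store_true"` for `--studio-cloud-auth` is an assumption (the docs only state that the default is False), and the `"team"` config key used for the "Default: from config" fallback is also assumed.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented cp options."""
    parser = argparse.ArgumentParser(prog="cp-sketch")
    parser.add_argument("source_path")
    parser.add_argument("destination_path")
    parser.add_argument(
        "--team",
        action="store",
        help="Team name to use the credentials from.",
    )
    parser.add_argument(
        "-s",
        "--studio-cloud-auth",
        action="store_true",  # assumption: a boolean switch, default False
        default=False,
        help="Use credentials from Studio for cloud operations",
    )
    return parser


def resolve_team(args: argparse.Namespace, config: dict) -> Optional[str]:
    """Assumed fallback: an explicit --team wins, otherwise use the config."""
    return args.team or config.get("team")


if __name__ == "__main__":
    config = {"team": "default-team"}  # assumed config shape
    args = build_parser().parse_args(["/path/to/file.txt", "s3://my-bucket/data/file.txt", "-s"])
    print(args.studio_cloud_auth, resolve_team(args, config))
```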

src/datachain/lib/dc/datachain.py

Lines changed: 2 additions & 0 deletions
@@ -2413,6 +2413,7 @@ def to_storage(
         num_threads: Optional[int] = EXPORT_FILES_MAX_THREADS,
         anon: Optional[bool] = None,
         client_config: Optional[dict] = None,
+        relative_to: Optional[str] = None,
     ) -> None:
         """Export files from a specified signal to a directory. Files can be
         exported to a local or cloud directory.
@@ -2468,6 +2469,7 @@ def to_storage(
             link_type,
             max_threads=num_threads or 1,
             client_config=client_config,
+            relative_to=relative_to,
         )
         file_exporter.run(
             (rows[0] for rows in chain.to_iter(signal)),

src/datachain/lib/file.py

Lines changed: 11 additions & 2 deletions
@@ -58,13 +58,15 @@ def __init__(
         link_type: Literal["copy", "symlink"],
         max_threads: int = EXPORT_FILES_MAX_THREADS,
         client_config: Optional[dict] = None,
+        relative_to: Optional[str] = None,
     ):
         super().__init__(max_threads)
         self.output = output
         self.placement = placement
         self.use_cache = use_cache
         self.link_type = link_type
         self.client_config = client_config
+        self.relative_to = relative_to

     def done_task(self, done):
         for task in done:
@@ -77,6 +79,7 @@ def do_task(self, file: "File"):
             self.use_cache,
             link_type=self.link_type,
             client_config=self.client_config,
+            relative_to=self.relative_to,
         )
         self.increase_counter(1)

@@ -422,10 +425,11 @@ def export(
         use_cache: bool = True,
         link_type: Literal["copy", "symlink"] = "copy",
         client_config: Optional[dict] = None,
+        relative_to: Optional[str] = None,
     ) -> None:
         """Export file to new location."""
         self._caching_enabled = use_cache
-        dst = self.get_destination_path(output, placement)
+        dst = self.get_destination_path(output, placement, relative_to)
         dst_dir = os.path.dirname(dst)
         client: Client = self._catalog.get_client(dst_dir, **(client_config or {}))
         client.fs.makedirs(dst_dir, exist_ok=True)
@@ -549,7 +553,10 @@ def get_fs_path(self) -> str:
         return path

     def get_destination_path(
-        self, output: Union[str, os.PathLike[str]], placement: ExportPlacement
+        self,
+        output: Union[str, os.PathLike[str]],
+        placement: ExportPlacement,
+        relative_to: Optional[str] = None,
     ) -> str:
         """
         Returns full destination path of a file for exporting to some output
@@ -568,6 +575,8 @@ def get_destination_path(
             raise NotImplementedError("Checksum placement not implemented yet")
         elif placement == "normpath":
             path = unquote(self.get_path_normalized())
+            if relative_to:
+                path = posixpath.relpath(path, relative_to)
         else:
             raise ValueError(f"Unsupported file export placement: {placement}")
         return posixpath.join(output, path)  # type: ignore[union-attr]
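
The effect of the new `relative_to` argument on `"normpath"` placement can be shown with the standard library alone. This is a sketch under stated assumptions: `destination_for` is a hypothetical free function that takes the already-normalized file path as a string, standing in for `File.get_destination_path`, and the bucket and directory names are illustrative.

```python
import posixpath
from typing import Optional


def destination_for(
    output: str, file_path: str, relative_to: Optional[str] = None
) -> str:
    """Hypothetical stand-in for the "normpath" branch of get_destination_path.

    `file_path` plays the role of the normalized path of the file inside
    its source bucket; `relative_to` is the source sub-path being copied.
    """
    path = file_path
    if relative_to:
        # Strip the copied directory's own prefix from the file path.
        path = posixpath.relpath(path, relative_to)
    return posixpath.join(output, path)


if __name__ == "__main__":
    out = "s3://my-bucket/archive"
    # Without relative_to, the full source path is recreated under the output.
    print(destination_for(out, "data/dir/a.txt"))
    # With relative_to, only the part below the copied directory is kept.
    print(destination_for(out, "data/dir/a.txt", "data/dir"))
    print(destination_for(out, "data/dir/sub/b.txt", "data/dir"))
```

In this sketch, copying `data/dir` places `a.txt` directly under the destination instead of under `archive/data/dir/`. A file path that lies outside `relative_to` would produce `..` segments from `posixpath.relpath`, so the approach presumes every exported file sits below the source sub-path.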
