Dir
A generic directory class representing a directory whose files share a specified format. It provides both async and sync interfaces for directory operations. Users are responsible for handling all I/O: the type transformer for Dir does not upload or download files automatically. The generic type T represents the format of the files in the directory.
Attributes
-
path: str
- Represents either a local or remote path.
-
name: Optional[str] = None
- The name of the directory.
-
format: str = ""
- The format of the files in the directory.
-
hash: Optional[str] = None
- An optional hash value for the directory.
Constructors
- Initializes a Dir instance.
-
Parameters
- path: str
- The path to the directory, which can be local or remote.
- name: Optional[str]
- An optional name for the directory. If not provided, it is derived from the path. Defaults to None.
- format: str
- The format of the files within the directory. Defaults to "".
- hash: Optional[str]
- An optional hash value for the directory, often used for caching. Defaults to None.
Methods
def pre_init(data: object) -> object
-
Validator to set the directory name if not provided.
-
Parameters
- data: object
- The input data dictionary.
-
Return Value: object
- The data dictionary with the name field populated.
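A minimal sketch of the kind of name-filling step described above, using only the standard library. The function name `pre_init_sketch` and the rule "use the last path component" are assumptions made for illustration; the reference only states that the name is derived from the path.

```python
import posixpath
from typing import Any


def pre_init_sketch(data: Any) -> Any:
    """Hypothetical stand-in: fill in "name" from "path" when it is missing."""
    # Leave non-dict input untouched; a real validator may receive other objects.
    if not isinstance(data, dict):
        return data
    if data.get("name") is None and data.get("path"):
        # Drop trailing slashes so "s3://bucket/train/" still yields "train".
        trimmed = str(data["path"]).rstrip("/")
        # Assumption: the name is the last component of the path.
        data = {**data, "name": posixpath.basename(trimmed)}
    return data


example = pre_init_sketch({"path": "s3://bucket/datasets/train/"})
print(example["name"])  # train
```

An explicitly supplied name is left as it is; only a missing name is filled in.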
def schema_match(incoming: dict) -> bool
-
Checks if the schema of an incoming dictionary matches the schema of this Dir class.
-
Parameters
- incoming: dict
- The incoming dictionary to compare schemas with.
-
Return Value: bool
- True if the schemas match, False otherwise.
def walk(recursive: bool = True, max_depth: Optional[int]) -> AsyncIterator[File[T]]
-
Asynchronously walks through the directory and yields File objects.
-
Parameters
- recursive: bool
- If True, recursively walk subdirectories.
- max_depth: Optional[int]
- Maximum depth for recursive walking.
-
Return Value: AsyncIterator[File[T]]
- An asynchronous iterator yielding File objects.
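Because walk returns an AsyncIterator, its results are consumed with `async for` rather than a plain loop. The sketch below shows that consumption pattern with a hypothetical stand-in generator, `fake_walk`, that yields strings instead of File objects; it does not call Dir itself.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_walk(names: List[str]) -> AsyncIterator[str]:
    """Hypothetical stand-in for an async walk: yields items one at a time."""
    for name in names:
        # Hand control back to the event loop, as a remote listing would.
        await asyncio.sleep(0)
        yield name


async def collect(names: List[str]) -> List[str]:
    # An AsyncIterator is consumed with "async for", here in a comprehension.
    return [name async for name in fake_walk(names)]


found = asyncio.run(collect(["a.csv", "b.csv"]))
print(found)  # ['a.csv', 'b.csv']
```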
def walk_sync(recursive: bool = True, file_pattern: str = "*", max_depth: Optional[int]) -> Iterator[File[T]]
-
Synchronously walks through the directory and yields File objects.
-
Parameters
- recursive: bool
- If True, recursively walk subdirectories.
- file_pattern: str
- Glob pattern to filter files.
- max_depth: Optional[int]
- Maximum depth for recursive walking.
-
Return Value: Iterator[File[T]]
- An iterator yielding File objects.
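To illustrate how the three parameters interact, here is a local-only sketch built on `os.walk` and `fnmatch`. The helper `walk_sketch` is hypothetical and yields path strings rather than File objects; the depth convention (the root counts as depth 0) is an assumption, since the reference does not define it.

```python
import fnmatch
import os
import tempfile
from typing import Iterator, Optional


def walk_sketch(root: str, recursive: bool = True, file_pattern: str = "*",
                max_depth: Optional[int] = None) -> Iterator[str]:
    """Hypothetical local-only stand-in for the walk_sync parameters."""
    for current, subdirs, files in os.walk(root):
        rel = os.path.relpath(current, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        # Stop descending when recursion is off or the depth limit is reached
        # (assumption: the root itself counts as depth 0).
        if not recursive or (max_depth is not None and depth >= max_depth):
            subdirs[:] = []
        for name in sorted(files):
            # Keep only file names that match the glob pattern.
            if fnmatch.fnmatch(name, file_pattern):
                yield os.path.join(current, name)


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for rel_path in ("a.csv", "b.txt", os.path.join("sub", "c.csv")):
        open(os.path.join(root, rel_path), "w").close()

    top_only = [os.path.relpath(p, root) for p in walk_sketch(root, recursive=False)]
    csv_all = [os.path.relpath(p, root) for p in walk_sketch(root, file_pattern="*.csv")]
```

With `recursive=False` only the two top-level files are returned; with `file_pattern="*.csv"` the walk descends into `sub` but skips `b.txt`.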
def list_files() -> List[File[T]]
-
Asynchronously gets a list of all files in the directory (non-recursive).
-
Return Value: List[File[T]]
- A list of File objects.
def list_files_sync() -> List[File[T]]
-
Synchronously gets a list of all files in the directory (non-recursive).
-
Return Value: List[File[T]]
- A list of File objects.
def download(local_path: Optional[Union[str, Path]]) -> str
-
Asynchronously downloads the entire directory to a local path.
-
Parameters
- local_path: Optional[Union[str, Path]]
- The local path to download the directory to. If None, a temporary directory will be used.
-
Return Value: str
- The path to the downloaded directory.
def download_sync(local_path: Optional[Union[str, Path]]) -> str
-
Synchronously downloads the entire directory to a local path.
-
Parameters
- local_path: Optional[Union[str, Path]]
- The local path to download the directory to. If None, a temporary directory will be used.
-
Return Value: str
- The path to the downloaded directory.
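The "temporary directory when local_path is None" behaviour can be pictured with a local copy. This is only a sketch: `download_sketch` is a hypothetical helper that copies one local directory to another with `shutil.copytree`, whereas the real methods fetch from wherever the Dir's path points.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional, Union


def download_sketch(source: str, local_path: Optional[Union[str, Path]] = None) -> str:
    """Hypothetical stand-in: copy a directory tree and return the target path."""
    if local_path is None:
        # No destination given: fall back to a fresh temporary directory.
        local_path = tempfile.mkdtemp()
    target = str(local_path)
    # dirs_exist_ok (Python 3.8+) lets the copy land in an existing directory.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target


with tempfile.TemporaryDirectory() as src:
    Path(src, "data.txt").write_text("hello")
    downloaded = download_sketch(src)
    copied_text = Path(downloaded, "data.txt").read_text()
    shutil.rmtree(downloaded)  # clean up the temporary copy
```

The return value is a plain string path, matching the documented `str` return type, so callers can pass it straight to `open` or `Path`.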
def from_local(local_path: Union[str, Path], remote_path: Optional[str], dir_cache_key: Optional[str]) -> Dir[T]
-
Asynchronously creates a new Dir by uploading a local directory to the configured remote store.
-
Parameters
- local_path: Union[str, Path]
- Path to the local directory.
- remote_path: Optional[str]
- Optional path to store the directory remotely. If None, a path will be generated.
- dir_cache_key: Optional[str]
- A precomputed hash value to use when computing cache keys for discoverable tasks that take this Dir as an input.
-
Return Value: Dir[T]
- A new Dir instance pointing to the uploaded directory.
def from_existing_remote(remote_path: str, dir_cache_key: Optional[str]) -> Dir[T]
-
Creates a Dir reference from an existing remote directory.
-
Parameters
- remote_path: str
- The remote path to the existing directory.
- dir_cache_key: Optional[str]
- Optional hash value to use for cache key computation. If not specified, the cache key will be computed based on this object's attributes.
-
Return Value: Dir[T]
- A Dir instance referencing the existing remote directory.
def from_local_sync(local_path: Union[str, Path], remote_path: Optional[str]) -> Dir[T]
-
Synchronously creates a new Dir by uploading a local directory to the configured remote store.
-
Parameters
- local_path: Union[str, Path]
- Path to the local directory.
- remote_path: Optional[str]
- Optional path to store the directory remotely. If None, a path will be generated.
-
Return Value: Dir[T]
- A new Dir instance pointing to the uploaded directory.
def exists() -> bool
-
Asynchronously checks if the directory exists.
-
Return Value: bool
- True if the directory exists, False otherwise.
def exists_sync() -> bool
-
Synchronously checks if the directory exists.
-
Return Value: bool
- True if the directory exists, False otherwise.
def get_file(file_name: str) -> Optional[File[T]]
-
Asynchronously gets a specific file from the directory.
-
Parameters
- file_name: str
- The name of the file to get.
-
Return Value: Optional[File[T]]
- A File instance if the file exists, None otherwise.
def get_file_sync(file_name: str) -> Optional[File[T]]
-
Synchronously gets a specific file from the directory.
-
Parameters
- file_name: str
- The name of the file to get.
-
Return Value: Optional[File[T]]
- A File instance if the file exists, None otherwise.
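Since both lookup methods return an Optional value, callers should check for None before using the result. The sketch below shows that pattern with a hypothetical local helper, `get_file_sketch`, which returns a path string instead of a File instance.

```python
import os
import tempfile
from typing import Optional


def get_file_sketch(directory: str, file_name: str) -> Optional[str]:
    """Hypothetical stand-in: return the file's path, or None if it is absent."""
    candidate = os.path.join(directory, file_name)
    return candidate if os.path.isfile(candidate) else None


with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "model.pkl"), "w").close()
    present = get_file_sketch(root, "model.pkl")
    missing = get_file_sketch(root, "absent.pkl")

# Check for None before use instead of assuming the file exists.
status = "found" if present is not None else "not found"
print(status)  # found
```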