Dir

A generic class representing a directory that contains files of a specified format. It provides both async and sync interfaces for directory operations. Users are responsible for all I/O: the type transformer for Dir does not upload or download files automatically. The generic type T represents the format of the files in the directory.

Attributes

  • path: str

    • Represents either a local or remote path.
  • name: Optional[str] = None

    • The name of the directory.
  • format: str = ""

    • The format of the files in the directory.
  • hash: Optional[str] = None

    • An optional hash value for the directory.

Constructors

  • Initializes a Dir instance.

  • Parameters

    • path: str
      • The path to the directory, which can be local or remote.
    • name: Optional[str] = None
      • An optional name for the directory. If not provided, it is derived from the path.
    • format: str = ""
      • The format of the files within the directory.
    • hash: Optional[str] = None
      • An optional hash value for the directory, often used for caching.

Methods

def pre_init(data: object) -> object
  • Validator to set the directory name if not provided.

  • Parameters

    • data: object
      • The input data dictionary.
  • Return Value: object

    • The data dictionary with the name field populated.
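
The reference says only that the name is derived from the path when it is omitted; it does not state the rule. The following is a minimal sketch, assuming the name is the last component of the path. `derive_dir_name` and `fill_name` are hypothetical helpers written for illustration and are not part of the library.

```python
from typing import Any, Dict


def derive_dir_name(path: str) -> str:
    """Return the last non-empty component of a local or remote path."""
    # Strip trailing slashes so "s3://bucket/data/" still yields "data".
    trimmed = path.rstrip("/")
    return trimmed.rsplit("/", 1)[-1]


def fill_name(data: Dict[str, Any]) -> Dict[str, Any]:
    """Populate "name" from "path" when the caller did not supply one.

    Assumption: the name defaults to the last path component.
    """
    if data.get("name") is None:
        # Build a new dictionary so the caller's input is left untouched.
        data = {**data, "name": derive_dir_name(data["path"])}
    return data
```

An explicitly supplied name is kept as-is; only a missing or None name is filled in.
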
def schema_match(incoming: dict) -> bool
  • Checks if the schema of an incoming dictionary matches the schema of this Dir class.

  • Parameters

    • incoming: dict
      • The incoming dictionary to compare schemas with.
  • Return Value: bool

    • True if the schemas match, False otherwise.
def walk(recursive: bool = True, max_depth: Optional[int]) -> AsyncIterator[File[T]]
  • Asynchronously walks through the directory and yields File objects.

  • Parameters

    • recursive: bool
      • If True, recursively walk subdirectories.
    • max_depth: Optional[int]
      • Maximum depth for recursive walking.
  • Return Value: AsyncIterator[File[T]]

    • An asynchronous iterator yielding File objects.
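
Because walk returns an asynchronous iterator rather than a list, callers consume it with `async for` and can stop early. The sketch below shows that consumption pattern with a standard-library stand-in; `iter_file_names` and `first_n` are hypothetical names, and the stand-in yields plain file names rather than File objects.

```python
import asyncio
import os
import tempfile
from typing import AsyncIterator, List


async def iter_file_names(root: str) -> AsyncIterator[str]:
    """Stand-in async generator: yield file names directly under root."""
    for name in sorted(os.listdir(root)):
        if os.path.isfile(os.path.join(root, name)):
            yield name
            # Hand control back to the event loop between items.
            await asyncio.sleep(0)


async def first_n(root: str, n: int) -> List[str]:
    """Consume an async iterator with `async for`, stopping after n items (n >= 1)."""
    found: List[str] = []
    async for name in iter_file_names(root):
        found.append(name)
        if len(found) >= n:
            break  # The generator is not resumed after this point.
    return found


def demo() -> List[str]:
    """Create three files in a temporary directory and fetch the first two."""
    with tempfile.TemporaryDirectory() as root:
        for name in ("a.txt", "b.txt", "c.txt"):
            open(os.path.join(root, name), "w").close()
        return asyncio.run(first_n(root, 2))
```
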
def walk_sync(recursive: bool = True, file_pattern: str = "*", max_depth: Optional[int]) -> Iterator[File[T]]
  • Synchronously walks through the directory and yields File objects.

  • Parameters

    • recursive: bool
      • If True, recursively walk subdirectories.
    • file_pattern: str
      • Glob pattern to filter files.
    • max_depth: Optional[int]
      • Maximum depth for recursive walking.
  • Return Value: Iterator[File[T]]

    • An iterator yielding File objects.
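
To make the interaction of recursive, file_pattern, and max_depth concrete, here is a sketch of the same three knobs over a local directory using only the standard library. It is not the library's implementation. `walk_local` is a hypothetical function, and two details are assumptions: max_depth counts levels below the root, and None means no limit.

```python
import fnmatch
import os
import tempfile
from typing import Dict, Iterator, List, Optional


def walk_local(root: str, recursive: bool = True, file_pattern: str = "*",
               max_depth: Optional[int] = None, _depth: int = 0) -> Iterator[str]:
    """Yield paths of files under root whose names match file_pattern."""
    entries = sorted(os.scandir(root), key=lambda e: e.name)
    for entry in entries:
        if entry.is_file() and fnmatch.fnmatch(entry.name, file_pattern):
            yield entry.path
    # Assumption: max_depth counts levels below root; None means unlimited.
    if recursive and (max_depth is None or _depth < max_depth):
        for entry in entries:
            if entry.is_dir():
                yield from walk_local(entry.path, recursive, file_pattern,
                                      max_depth, _depth + 1)


def demo() -> Dict[str, List[str]]:
    """Build a small tree and walk it three different ways."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sub", "deep"))
        for rel in ("a.csv", "b.txt", "sub/c.csv", "sub/deep/d.csv"):
            open(os.path.join(root, *rel.split("/")), "w").close()

        def rel_paths(paths: Iterator[str]) -> List[str]:
            # Normalize to forward slashes relative to the root.
            return [os.path.relpath(p, root).replace(os.sep, "/") for p in paths]

        return {
            "csv_only": rel_paths(walk_local(root, file_pattern="*.csv")),
            "flat": rel_paths(walk_local(root, recursive=False)),
            "one_level": rel_paths(walk_local(root, max_depth=1)),
        }
```

In this sketch the pattern is matched against the file name only, not the full path, so "*.csv" matches CSV files at any depth.
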
def list_files() -> List[File[T]]
  • Asynchronously gets a list of all files in the directory (non-recursive).

  • Return Value: List[File[T]]

    • A list of File objects.
def list_files_sync() -> List[File[T]]
  • Synchronously gets a list of all files in the directory (non-recursive).

  • Return Value: List[File[T]]

    • A list of File objects.
def download(local_path: Optional[Union[str, Path]]) -> str
  • Asynchronously downloads the entire directory to a local path.

  • Parameters

    • local_path: Optional[Union[str, Path]]
      • The local path to download the directory to. If None, a temporary directory will be used.
  • Return Value: str

    • The path to the downloaded directory.
def download_sync(local_path: Optional[Union[str, Path]]) -> str
  • Synchronously downloads the entire directory to a local path.

  • Parameters

    • local_path: Optional[Union[str, Path]]
      • The local path to download the directory to. If None, a temporary directory will be used.
  • Return Value: str

    • The path to the downloaded directory.
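
The "temporary directory when local_path is None" behavior can be pictured with a local stand-in. This sketch copies one local directory into another and is not the library's download logic; `copy_dir` is a hypothetical function, and the `= None` default on its parameter is an assumption made for the sketch.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional, Union


def copy_dir(source: str, local_path: Optional[Union[str, Path]] = None) -> str:
    """Copy source into local_path, or into a fresh temporary directory."""
    if local_path is None:
        # No destination given: create a unique temporary directory.
        target = tempfile.mkdtemp()
    else:
        target = str(local_path)
    # dirs_exist_ok (Python 3.8+) lets the copy land in an existing directory.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target


def demo() -> bool:
    """Copy a one-file directory without naming a destination."""
    source = tempfile.mkdtemp()
    Path(source, "data.txt").write_text("hello")
    target = copy_dir(source)  # Falls back to a temporary directory.
    try:
        return Path(target, "data.txt").read_text() == "hello"
    finally:
        shutil.rmtree(source)
        shutil.rmtree(target)
```

As in the documented methods, the function returns the destination as a string so the caller can locate the copied files whichever branch was taken.
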
def from_local(local_path: Union[str, Path], remote_path: Optional[str], dir_cache_key: Optional[str]) -> Dir[T]
  • Asynchronously creates a new Dir by uploading a local directory to the configured remote store.

  • Parameters

    • local_path: Union[str, Path]
      • Path to the local directory.
    • remote_path: Optional[str]
      • Optional path to store the directory remotely. If None, a path will be generated.
    • dir_cache_key: Optional[str]
      • An optional precomputed hash value to use when computing cache keys for discoverable tasks that take this Dir as an input.
  • Return Value: Dir[T]

    • A new Dir instance pointing to the uploaded directory.
def from_existing_remote(remote_path: str, dir_cache_key: Optional[str]) -> Dir[T]
  • Creates a Dir reference from an existing remote directory.

  • Parameters

    • remote_path: str
      • The remote path to the existing directory.
    • dir_cache_key: Optional[str]
      • Optional hash value to use for cache key computation. If not specified, the cache key will be computed based on this object's attributes.
  • Return Value: Dir[T]

    • A Dir instance referencing the existing remote directory.
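
Both from_local and from_existing_remote accept a precomputed dir_cache_key, but the reference does not say how such a value should be produced. One possible approach, shown here as a sketch, is a content hash over the directory's relative file paths and bytes. `directory_digest` is a hypothetical helper and SHA-256 is an arbitrary choice; the library does not prescribe either.

```python
import hashlib
import os
import tempfile


def directory_digest(root: str) -> str:
    """Return a SHA-256 hex digest over relative file paths and contents."""
    digest = hashlib.sha256()
    for current, dirs, files in os.walk(root):
        dirs.sort()  # Fix the traversal order so the digest is reproducible.
        for name in sorted(files):
            full = os.path.join(current, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            digest.update(rel.encode("utf-8"))
            digest.update(b"\0")  # Separator between path and contents.
            with open(full, "rb") as handle:
                # Read in chunks so large files are not loaded whole.
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            digest.update(b"\0")
    return digest.hexdigest()


def demo() -> bool:
    """Equal trees hash equally; changing a file changes the digest."""
    with tempfile.TemporaryDirectory() as first, \
            tempfile.TemporaryDirectory() as second:
        for root in (first, second):
            with open(os.path.join(root, "a.txt"), "w") as handle:
                handle.write("same")
        same = directory_digest(first) == directory_digest(second)
        with open(os.path.join(second, "a.txt"), "w") as handle:
            handle.write("changed")
        differs = directory_digest(first) != directory_digest(second)
        return same and differs
```

The resulting string could then be passed as dir_cache_key, so that the cache key tracks the directory's contents rather than its location.
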
def from_local_sync(local_path: Union[str, Path], remote_path: Optional[str]) -> Dir[T]
  • Synchronously creates a new Dir by uploading a local directory to the configured remote store.

  • Parameters

    • local_path: Union[str, Path]
      • Path to the local directory.
    • remote_path: Optional[str]
      • Optional path to store the directory remotely. If None, a path will be generated.
  • Return Value: Dir[T]

    • A new Dir instance pointing to the uploaded directory.
def exists() -> bool
  • Asynchronously checks if the directory exists.

  • Return Value: bool

    • True if the directory exists, False otherwise.
def exists_sync() -> bool
  • Synchronously checks if the directory exists.

  • Return Value: bool

    • True if the directory exists, False otherwise.
def get_file(file_name: str) -> Optional[File[T]]
  • Asynchronously gets a specific file from the directory.

  • Parameters

    • file_name: str
      • The name of the file to get.
  • Return Value: Optional[File[T]]

    • A File instance if the file exists, None otherwise.
def get_file_sync(file_name: str) -> Optional[File[T]]
  • Synchronously gets a specific file from the directory.

  • Parameters

    • file_name: str
      • The name of the file to get.
  • Return Value: Optional[File[T]]

    • A File instance if the file exists, None otherwise.