
Resource Allocation

Resource allocation defines the computational requirements of a task, ensuring it receives adequate CPU, memory, GPU, disk, and shared memory to execute efficiently. By specifying these resources explicitly, developers can optimize task scheduling, manage infrastructure costs, and prevent resource contention.

Defining Task Resources

Tasks declare their resource needs using the Resources object, typically passed to a task decorator. This object allows for granular control over various resource types.

# Example: attaching Resources to a task via the task decorator
# (task and Resources are assumed to be imported from your workflow SDK)
@task(resources=Resources(cpu=1, memory="1GiB"))
def my_task():
    # Task implementation
    pass

The Resources object supports the following parameters:

  • cpu: Specifies the CPU cores. This can be an integer (e.g., 1 for one core), a float (e.g., 0.5 for half a core), or a string representing millicores (e.g., "500m"). To define a minimum request and a maximum limit, use a tuple (e.g., (1, 2) for 1 to 2 CPU cores).
  • memory: Defines the memory allocation. This is a string specifying the amount and unit (e.g., "1GiB", "512MiB"). Similar to CPU, a tuple can specify a memory range (e.g., ("1GiB", "2GiB")).
  • gpu: Allocates GPU resources. This can be an integer for the number of generic GPUs (e.g., 1), a string specifying the GPU type and quantity (e.g., "T4:1", "A100:8"), or a Device object for more detailed specifications.
  • disk: Sets the disk space required, specified as a string with amount and unit (e.g., "10GiB").
  • shm: Configures shared memory. This is a string specifying the size (e.g., "2GiB") or "auto" to let the system determine an appropriate size.
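The parameter shapes listed above can be sketched as a plain dataclass. This is an illustrative stand-in, not the real Resources class; only the field names and accepted value shapes mirror the description:

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple, Union

@dataclass
class ResourcesSketch:
    """Illustrative mirror of the Resources parameters described above."""
    # int/float core count, "500m"-style millicores, or a (request, limit) tuple
    cpu: Union[int, float, str, Tuple, None] = None
    # "1GiB"-style string, or a ("1GiB", "2GiB") request/limit tuple
    memory: Union[str, Tuple, None] = None
    # GPU count, "TYPE:QUANTITY" string, or a Device object
    gpu: Any = None
    # Disk size string, e.g. "10GiB"
    disk: Optional[str] = None
    # Shared-memory size string, or "auto"
    shm: Optional[str] = None

r = ResourcesSketch(cpu=(1, 2), memory="1GiB", gpu="T4:1", shm="auto")
```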

CPU Allocation

Specify CPU requirements as an integer or float for a fixed amount, or a tuple for a request/limit range.

# Request 1 CPU core
resources_fixed_cpu = Resources(cpu=1)

# Request 0.5 CPU cores (e.g., 500 millicores)
resources_fractional_cpu = Resources(cpu=0.5)

# Request 1 CPU core, with a limit up to 2 CPU cores
resources_cpu_range = Resources(cpu=(1, 2))
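The "500m" millicore notation maps directly to fractional cores. A hypothetical helper (not part of the Resources API) shows the conversion between the notations above:

```python
def cpu_to_cores(cpu) -> float:
    """Normalize an int, float, or 'NNNm' millicore string to a core count.

    Hypothetical helper illustrating the CPU notations; the real
    Resources class accepts these values as-is.
    """
    if isinstance(cpu, str) and cpu.endswith("m"):
        return int(cpu[:-1]) / 1000.0  # "500m" -> 0.5 cores
    return float(cpu)

assert cpu_to_cores("500m") == 0.5  # millicore string
assert cpu_to_cores(0.5) == 0.5     # float
assert cpu_to_cores(1) == 1.0       # int
```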

Memory Allocation

Memory is specified using a string that includes the value and unit (e.g., GiB, MiB). A tuple can define a memory range.

# Request 1 Gigabyte of memory
resources_fixed_memory = Resources(memory="1GiB")

# Request 2 Gigabytes of memory, with a limit up to 4 Gigabytes
resources_memory_range = Resources(memory=("2GiB", "4GiB"))
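GiB and MiB are binary (power-of-two) units. A small hypothetical parser (not part of the Resources API) shows how such strings map to byte counts:

```python
_UNITS = {"KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}

def memory_to_bytes(spec: str) -> int:
    """Convert a '512MiB'/'1GiB'-style string to bytes (illustrative only)."""
    for unit, factor in _UNITS.items():
        if spec.endswith(unit):
            return int(float(spec[: -len(unit)]) * factor)
    raise ValueError(f"Unrecognized memory spec: {spec!r}")

assert memory_to_bytes("1GiB") == 1024**3
assert memory_to_bytes("512MiB") == 512 * 1024**2
```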

GPU Allocation

GPU allocation offers flexibility, from requesting a generic number of GPUs to specifying exact models and partitions.

Generic GPU Request: Provide an integer to request a specific number of GPUs without specifying their type. The system will allocate available GPUs.

# Request 1 generic GPU
resources_generic_gpu = Resources(gpu=1)

Specific GPU Type and Quantity: Use a string in the format "TYPE:QUANTITY" to request a specific GPU model and count.

# Request 1 NVIDIA T4 GPU
resources_t4_gpu = Resources(gpu="T4:1")

# Request 8 NVIDIA A100 GPUs
resources_a100_gpu = Resources(gpu="A100:8")

Advanced GPU Allocation with Device: For more granular control, especially when dealing with partitioned GPUs or specific accelerator types, use the Device object. The Device object allows specifying the device type, quantity, and an optional partition.

The Device object has the following attributes:

  • quantity: The number of devices requested (must be at least 1).
  • device: The specific type of device (e.g., "T4", "A100" for GPUs, or other accelerator types like TPUs).
  • partition: An optional string to specify a device partition. This is common for cloud provider GPUs (e.g., "1g.5gb", "2g.10gb") or TPUs (e.g., "1x1").

# Device is assumed to be imported from the same module as Resources
# Request 8 A100 GPUs without a specific partition
resources_device_a100 = Resources(gpu=Device(device="A100", quantity=8))

# Request 1 T4 GPU with a specific partition
resources_device_t4_partition = Resources(gpu=Device(device="T4", quantity=1, partition="1g.5gb"))

When a Resources object is initialized with a string like "T4:1", its get_device() method internally parses this into a Device object, ensuring consistent handling of accelerator requests.
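That normalization can be sketched as follows. DeviceSketch and parse_gpu are hypothetical stand-ins for Device and get_device(), showing only the "TYPE:QUANTITY" parsing described above:

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class DeviceSketch:
    """Simplified mirror of the Device attributes described above."""
    device: str
    quantity: int = 1
    partition: Optional[str] = None

def parse_gpu(spec: Union[int, str, DeviceSketch]) -> DeviceSketch:
    """Normalize a gpu argument to a Device-like object (illustrative only)."""
    if isinstance(spec, DeviceSketch):
        return spec                      # already a Device: pass through
    if isinstance(spec, int):
        return DeviceSketch(device="GPU", quantity=spec)  # generic GPUs
    device, _, quantity = spec.partition(":")             # "T4:1" -> ("T4", "1")
    return DeviceSketch(device=device, quantity=int(quantity or 1))

assert parse_gpu("T4:1") == DeviceSketch(device="T4", quantity=1)
assert parse_gpu("A100:8").quantity == 8
```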

Disk Allocation

Specify disk space as a string with the value and unit.

# Request 10 Gigabytes of disk space
resources_disk = Resources(disk="10GiB")

Shared Memory (SHM) Allocation

Shared memory can be explicitly sized or set to "auto".

# Request 2 Gigabytes of shared memory
resources_shm_fixed = Resources(shm="2GiB")

# Request automatic shared memory allocation
resources_shm_auto = Resources(shm="auto")

Common Use Cases and Best Practices

  • Minimal Resource Allocation: For lightweight tasks, specify only the essential resources to minimize overhead and improve scheduling latency.
    # A simple task requiring minimal resources
    @task(resources=Resources(cpu=0.5, memory="512MiB"))
    def lightweight_job():
        # Task implementation
        pass
  • High-Performance Computing (HPC) Tasks: For compute-intensive workloads, allocate sufficient CPU, memory, and high-performance GPUs.
    # A machine learning training task
    @task(resources=Resources(cpu=(4, 8), memory="32GiB", gpu="A100:4", disk="100GiB"))
    def ml_training_job():
        # Task implementation
        pass
  • Resource Ranges for Flexibility: Using tuples for CPU and memory allows the scheduler more flexibility to place tasks while ensuring they have enough resources to operate and a cap to prevent runaway consumption.
  • Understanding Cloud Provider GPU Labels: When using Device with partition, consult your cloud provider's documentation for the exact partition labels (e.g., 1g.5gb for NVIDIA A100 GPUs on Google Cloud). Incorrect partition strings will lead to allocation failures.
  • Validation: The Resources object performs basic validation during initialization, such as ensuring CPU and GPU quantities are non-negative and tuple lengths are correct. Invalid resource specifications will raise ValueError.
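The validation behavior in the last bullet can be sketched as follows. These are illustrative checks in the spirit of the description, not the actual implementation:

```python
def validate_resources(cpu=None, gpu=None) -> None:
    """Raise ValueError for invalid CPU/GPU specs (illustrative only)."""
    for name, value in (("cpu", cpu), ("gpu", gpu)):
        if isinstance(value, (int, float)) and value < 0:
            raise ValueError(f"{name} must be non-negative, got {value}")
        if isinstance(value, tuple) and len(value) != 2:
            raise ValueError(f"{name} tuple must be (request, limit)")

validate_resources(cpu=(1, 2), gpu=1)  # valid: no exception
try:
    validate_resources(cpu=-1)
except ValueError as e:
    print(e)  # cpu must be non-negative, got -1
```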

Considerations

  • Over-allocation: Requesting more resources than a task genuinely needs wastes capacity, increases costs, and can delay scheduling while the system finds a node large enough to satisfy the request.
  • Under-allocation: Insufficient resources can cause tasks to fail, run slowly, or be preempted.
  • Shared Memory for IPC: Explicitly setting shm is crucial for tasks that rely heavily on inter-process communication (IPC) via shared memory, such as certain data processing or machine learning frameworks.