
PysparkFunctionTask

A plugin that transforms local Python code for execution within a Spark context.

Attributes

  • plugin_config: Spark

    • The Spark plugin configuration, carrying the Spark and Hadoop settings applied when the task runs.
  • task_type: string = "spark"

    • The task type identifier used to route this task to the Spark plugin.
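To make the shape of plugin_config concrete, here is a minimal stand-in sketch. The field names (spark_conf, hadoop_conf) are assumptions drawn from the configuration described below, not the plugin's exact interface; the real Spark class is provided by the plugin package.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical stand-in for the plugin's Spark configuration object.
# Field names are illustrative; consult the plugin's Spark class for
# the real interface.
@dataclass
class Spark:
    spark_conf: Dict[str, str] = field(default_factory=dict)
    hadoop_conf: Dict[str, str] = field(default_factory=dict)

# A task would receive something like this as its plugin_config:
config = Spark(spark_conf={"spark.executor.memory": "2g"})
print(config.spark_conf["spark.executor.memory"])  # prints "2g"
```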

Methods

def pre(args: Any = None, kwargs: Any = None) -> Dict[str, Any]
  • This method is executed before the main task logic. It initializes a SparkSession and, if running in a cluster, adds the current working directory as a Python file to the Spark context. It returns the SparkSession object.

  • Parameters

    • args: Any
      • Positional arguments.
    • kwargs: Any
      • Keyword arguments.
  • Return Value: Dict[str, Any]

    • A dictionary containing the SparkSession object.
def custom_config(sctx: [SerializationContext](src_flyte_models_serializationcontext) = None) -> Dict[str, Any]
  • This method generates a SparkJob configuration based on the plugin's configuration and the serialization context. It includes settings for the driver and executor pods, Spark and Hadoop configurations, and application details. It returns the configuration as a dictionary.

  • Parameters

    • sctx: [SerializationContext](src_flyte_models_serializationcontext)
      • The serialization context used when building the SparkJob configuration.

  • Return Value: Dict[str, Any]

    • A dictionary representing the SparkJob configuration.
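The dictionary returned by custom_config resembles a SparkJob specification. A rough sketch of how such a configuration might be assembled from the plugin's settings (the key names below are assumptions for illustration, not the plugin's exact schema):

```python
from typing import Any, Dict

def build_spark_job_config(
    spark_conf: Dict[str, str],
    hadoop_conf: Dict[str, str],
    application_file: str,
) -> Dict[str, Any]:
    # Assemble a SparkJob-style dictionary. The real plugin derives
    # these values from plugin_config and the SerializationContext,
    # including driver/executor pod settings.
    return {
        "sparkConf": dict(spark_conf),
        "hadoopConf": dict(hadoop_conf),
        "mainApplicationFile": application_file,
    }

job = build_spark_job_config(
    spark_conf={"spark.executor.instances": "2"},
    hadoop_conf={},
    application_file="local:///app/main.py",
)
```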
def post(return_vals: Any = None) -> Any
  • This method is executed after the main task logic. It retrieves the SparkSession and stops it, releasing resources. It returns the value passed to it.

  • Parameters

    • return_vals: Any
      • The return values from the executed task.
  • Return Value: Any

    • The return value from the main task logic.
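Taken together, pre() and post() bracket the task body: pre() creates the SparkSession and exposes it to the task, and post() stops it and passes the task's result through unchanged. A plain-Python sketch of that lifecycle, using a stand-in object in place of a real SparkSession:

```python
from typing import Any, Dict

class StubSession:
    """Stand-in for a SparkSession; only models start/stop state."""
    def __init__(self) -> None:
        self.stopped = False

    def stop(self) -> None:
        self.stopped = True

_state: Dict[str, StubSession] = {}

def pre(*args: Any, **kwargs: Any) -> Dict[str, Any]:
    # Runs before the task body: create the session and return it by name.
    _state["spark"] = StubSession()
    return {"spark": _state["spark"]}

def post(return_vals: Any = None) -> Any:
    # Runs after the task body: stop the session, releasing resources,
    # and pass the task's return value through unchanged.
    _state["spark"].stop()
    return return_vals

ctx = pre()
result = post("task-result")  # returns "task-result"; session is now stopped
```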