Class GraphExecution

Object
org.apache.spark.sql.pipelines.graph.GraphExecution
All Implemented Interfaces:
org.apache.spark.internal.Logging
Direct Known Subclasses:
TriggeredGraphExecution

public abstract class GraphExecution extends Object implements org.apache.spark.internal.Logging
  • Constructor Details

  • Method Details

    • determineFlowExecutionActionFromError

      public static GraphExecution.FlowExecutionAction determineFlowExecutionActionFromError(scala.Function0<Throwable> ex, scala.Function0<String> flowDisplayName, scala.Function0<Object> currentNumTries, scala.Function0<Object> maxAllowedRetries)
      Analyze the exception thrown by flow execution and figure out if we should retry the execution, or we need to reanalyze the flow entirely to resolve issues like schema changes. This should be the narrow waist for all exception analysis in flow execution. TODO: currently it only handles schema change and max retries, we should aim to extend this to include other non-retryable exception as well so we can have a single SoT for all these error matching logic.
      Parameters:
      ex - Exception to analyze.
      flowDisplayName - The user facing flow name with the error.
      currentNumTries - Number of times the flow has been tried.
      maxAllowedRetries - Maximum number of retries allowed for the flow.
      Returns:
      (undocumented)
    • org$apache$spark$internal$Logging$$log_

      public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
    • org$apache$spark$internal$Logging$$log__$eq

      public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)
    • LogStringContext

      public static org.apache.spark.internal.Logging.LogStringContext LogStringContext(scala.StringContext sc)
    • graphForExecution

      public DataflowGraph graphForExecution()
    • streamTrigger

      public abstract Trigger streamTrigger(Flow flow)
      The `Trigger` configuration for a streaming flow.
    • flowExecutions

      public scala.collection.concurrent.TrieMap<org.apache.spark.sql.catalyst.TableIdentifier,FlowExecution> flowExecutions()
      FlowExecutions currently being executed and tracked by the graph execution.
      Returns:
      (undocumented)
    • planAndStartFlow

      public scala.Option<FlowExecution> planAndStartFlow(ResolvedFlow flow)
      Plans the logical ResolvedFlow into a FlowExecution and then starts executing it. Implementation note: Thread safe

      Parameters:
      flow - (undocumented)
      Returns:
      None if the flow planner decided that there is no actual update required here. Otherwise returns the corresponding physical flow.
    • start

      public void start()
      Starts the execution of flows in graphForExecution. Does not block.
    • stop

      public void stop()
      Stops this execution by stopping all streams and terminating any other resources.

      This method may be called multiple times due to race conditions and must be idempotent.

    • stopFlow

      public void stopFlow(FlowExecution pf)
      Stops execution of a `FlowExecution`.
    • awaitCompletion

      public abstract void awaitCompletion()
      Blocks the current thread while any flows are queued or running. Returns when all flows that could be run have completed. When this returns, all flows are either SUCCESSFUL, TERMINATED_WITH_ERROR, SKIPPED, CANCELED, or EXCLUDED.
    • getRunTerminationReason

      public abstract RunTerminationReason getRunTerminationReason()
      Returns the reason why this flow execution has terminated. If the function is called before the flow has not terminated yet, the behavior is undefined, and may return UnexpectedRunFailure.
      Returns:
      (undocumented)
    • maxRetryAttemptsForFlow

      public int maxRetryAttemptsForFlow(org.apache.spark.sql.catalyst.TableIdentifier flowName)
    • stopThread

      public void stopThread(Thread thread)
      Stop a thread timeout.
      Parameters:
      thread - (undocumented)