Class BroadcastableJobInfo

  • All Implemented Interfaces:
    java.io.Serializable

    public final class BroadcastableJobInfo
    extends java.lang.Object
    implements java.io.Serializable
    Broadcastable wrapper for job information with ZERO transient fields to optimize Spark broadcasting.

    Only essential fields are broadcast; executors reconstruct CassandraJobInfo to rebuild TokenPartitioner.

    Why ZERO transient fields matters:
    Spark's SizeEstimator uses reflection to estimate an object's size before it is broadcast. It walks every non-static field, transient ones included, so a transient field still forces SizeEstimator to inspect the field's type hierarchy and whatever the field references, even though the field itself is never serialized. Logger references are particularly costly because of their deep object graphs (appenders, layouts, contexts). By eliminating ALL transient fields and Logger references, we:

    • Minimize SizeEstimator reflection overhead during broadcast preparation
    • Reduce broadcast variable serialization size
    • Avoid accidental serialization of non-serializable objects
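
    The zero-transient-field property listed above can be checked mechanically. The following is a minimal sketch, not code from this library: PlainJobData, HeavyJobData and TransientFieldCheck are hypothetical stand-ins, and the check uses only java.lang.reflect to count transient instance fields across a class hierarchy.

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

public class TransientFieldCheck {

    // Hypothetical stand-in for a broadcast-friendly holder: plain data, no transient fields.
    static final class PlainJobData implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String jobId;
        private final HashMap<Integer, String> partitionOwners;

        PlainJobData(String jobId, Map<Integer, String> partitionOwners) {
            this.jobId = jobId;
            this.partitionOwners = new HashMap<>(partitionOwners);
        }
    }

    // Hypothetical counter-example: the transient field is skipped by Java serialization,
    // but it is still an instance field that reflection-based tools can see and follow.
    static final class HeavyJobData implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String jobId = "job";
        private transient Object cachedHelper = new Object();
    }

    // Counts non-static transient fields declared by the class and its superclasses.
    static int countTransientInstanceFields(Class<?> type) {
        int count = 0;
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                int mods = field.getModifiers();
                if (!Modifier.isStatic(mods) && Modifier.isTransient(mods)) {
                    count++;
                }
            }
        }
        return count;
    }

    // Guard that a unit test could call for every class intended for broadcasting.
    static void requireNoTransientFields(Class<?> type) {
        int found = countTransientInstanceFields(type);
        if (found != 0) {
            throw new IllegalStateException(type.getName() + " declares " + found + " transient field(s)");
        }
    }

    public static void main(String[] args) {
        System.out.println("PlainJobData: " + countTransientInstanceFields(PlainJobData.class));
        System.out.println("HeavyJobData: " + countTransientInstanceFields(HeavyJobData.class));
        requireNoTransientFields(PlainJobData.class); // passes silently
    }
}
```

    A guard like requireNoTransientFields is one way to keep the property from regressing when fields are added later; whether this project tests it that way is not stated here.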
    See Also:
    Serialized Form
    • Method Detail

      • from

        public static BroadcastableJobInfo from(@NotNull JobInfo source,
                                                @NotNull BulkSparkConf conf)
        Creates a BroadcastableJobInfo from a source JobInfo. Extracts the partition mappings from the source's TokenPartitioner instead of broadcasting the partitioner itself, which avoids broadcasting its Logger.
        Parameters:
        source - the source JobInfo (typically CassandraJobInfo)
        conf - the BulkSparkConf that executors need
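
        The factory pattern described for from can be illustrated without Spark or this library. The sketch below is an assumption-laden stand-in: SourceJob, SketchConf and BroadcastableSketch are hypothetical types, not the real JobInfo, BulkSparkConf or BroadcastableJobInfo. It copies only plain data out of a driver-side object, then uses a Java serialization round trip to mimic what an executor would receive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class BroadcastFactorySketch {

    // Hypothetical driver-side object; deliberately NOT Serializable.
    interface SourceJob {
        String jobId();
        Map<Integer, String> partitionOwners();
    }

    // Hypothetical stand-in for the configuration that travels with the job data.
    static final class SketchConf implements Serializable {
        private static final long serialVersionUID = 1L;
        final String keyspace;

        SketchConf(String keyspace) {
            this.keyspace = keyspace;
        }
    }

    // Hypothetical broadcastable holder: final class, plain fields, nothing transient.
    static final class BroadcastableSketch implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String jobId;
        private final HashMap<Integer, String> partitionOwners;
        private final SketchConf conf;

        private BroadcastableSketch(String jobId, HashMap<Integer, String> owners, SketchConf conf) {
            this.jobId = jobId;
            this.partitionOwners = owners;
            this.conf = conf;
        }

        // Copies the essential data out of the source instead of holding a reference to it.
        static BroadcastableSketch from(SourceJob source, SketchConf conf) {
            Objects.requireNonNull(source, "source");
            Objects.requireNonNull(conf, "conf");
            return new BroadcastableSketch(source.jobId(), new HashMap<>(source.partitionOwners()), conf);
        }

        String jobId() { return jobId; }
        Map<Integer, String> partitionOwners() { return partitionOwners; }
        SketchConf conf() { return conf; }
    }

    // Sample driver-side source used by main and by tests.
    static SourceJob sampleSource() {
        Map<Integer, String> owners = new HashMap<>();
        owners.put(0, "node-a");
        owners.put(1, "node-b");
        return new SourceJob() {
            @Override public String jobId() { return "job-1"; }
            @Override public Map<Integer, String> partitionOwners() { return owners; }
        };
    }

    // Serializes and deserializes the holder, mimicking the trip to an executor.
    static BroadcastableSketch roundTrip(BroadcastableSketch value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (BroadcastableSketch) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        BroadcastableSketch copy = roundTrip(BroadcastableSketch.from(sampleSource(), new SketchConf("ks")));
        System.out.println(copy.jobId() + " " + copy.partitionOwners() + " " + copy.conf().keyspace);
    }
}
```

        Because the holder copies the map rather than keeping the source, the non-serializable source object never enters the serialized graph; the real class presumably relies on the same idea when it extracts partition mappings.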