Class BroadcastableTableSchema

  • All Implemented Interfaces:
    java.io.Serializable

    public final class BroadcastableTableSchema
    extends java.lang.Object
    implements java.io.Serializable
    Broadcastable wrapper for TableSchema that declares ZERO transient fields, to reduce the cost of Spark broadcasting.

    Contains all essential fields from TableSchema needed on executors, but without the Logger reference. Executors will reconstruct TableSchema from these fields.

    Why having ZERO transient fields matters:
    Spark's SizeEstimator uses reflection to estimate an object's in-memory size before it is broadcast. It skips static fields but not transient ones, so every transient reference field is still followed and its type hierarchy and object graph inspected, which is expensive. Logger references are particularly costly because of their deep object graphs (appenders, layouts, contexts). By eliminating ALL transient fields and Logger references, we:

    • Minimize SizeEstimator reflection overhead during broadcast preparation
    • Reduce broadcast variable serialization size
    • Avoid accidental serialization of non-serializable objects
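A minimal sketch of how the ZERO-transient-fields rule could be enforced in a unit test, using only java.lang.reflect. SchemaSnapshot, WithTransientLogger, and countTransientInstanceFields are hypothetical names for illustration; they are not part of this API, and the sketch does not call Spark's SizeEstimator itself.

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.List;
import java.util.logging.Logger;

public class TransientFieldGuard {

    // Hypothetical broadcast-style holder: final data fields only, no Logger, nothing transient.
    static final class SchemaSnapshot implements Serializable {
        private static final long serialVersionUID = 1L; // static, so not counted below
        private final String createStatement;
        private final List<Integer> keyFieldPositions;

        SchemaSnapshot(String createStatement, List<Integer> keyFieldPositions) {
            this.createStatement = createStatement;
            this.keyFieldPositions = keyFieldPositions;
        }
    }

    // Hypothetical counter-example: the transient Logger is skipped by Java serialization,
    // but it is still an instance field that reflection-based size estimation can reach.
    static final class WithTransientLogger implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String createStatement = "CREATE TABLE ks.t (id int PRIMARY KEY)";
        private transient Logger logger = Logger.getLogger("schema");
    }

    // Counts instance (non-static) fields declared transient directly on cls.
    static int countTransientInstanceFields(Class<?> cls) {
        int count = 0;
        for (Field f : cls.getDeclaredFields()) {
            int mods = f.getModifiers();
            if (Modifier.isTransient(mods) && !Modifier.isStatic(mods)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countTransientInstanceFields(SchemaSnapshot.class));
        System.out.println(countTransientInstanceFields(WithTransientLogger.class));
    }
}
```

getDeclaredFields only sees fields declared on the class itself; a stricter guard would also walk getSuperclass().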
    See Also:
    Serialized Form
    • Method Detail

      • from

        public static BroadcastableTableSchema from(@NotNull TableSchema source)
        Creates a BroadcastableTableSchema from a source TableSchema. Extracts all essential fields but excludes the Logger.
        Parameters:
        source - the source TableSchema (driver-only)
        Returns:
        a broadcastable copy of source, without the Logger
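A sketch of the copy-without-Logger pattern that from describes, assuming a simplified schema with only two fields. DriverSchema stands in for the driver-only TableSchema, and BroadcastSafeSchema and serializedSize are hypothetical names; the real from copies more fields than shown here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.logging.Logger;

public class FromSketch {

    // Hypothetical stand-in for the driver-only TableSchema: it holds a Logger and is not Serializable.
    static final class DriverSchema {
        final Logger logger = Logger.getLogger("DriverSchema");
        final String createStatement;
        final List<String> partitionKeyColumns;

        DriverSchema(String createStatement, List<String> partitionKeyColumns) {
            this.createStatement = createStatement;
            this.partitionKeyColumns = partitionKeyColumns;
        }
    }

    // Hypothetical broadcast-safe copy: data fields only, no Logger, no transient fields.
    static final class BroadcastSafeSchema implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String createStatement;
        private final List<String> partitionKeyColumns;

        private BroadcastSafeSchema(String createStatement, List<String> partitionKeyColumns) {
            this.createStatement = createStatement;
            this.partitionKeyColumns = partitionKeyColumns;
        }

        // Copies the essential fields and deliberately leaves source.logger behind.
        static BroadcastSafeSchema from(DriverSchema source) {
            // Defensive, read-only copy; both ArrayList and the unmodifiable wrapper are Serializable.
            List<String> pk = Collections.unmodifiableList(new ArrayList<>(source.partitionKeyColumns));
            return new BroadcastSafeSchema(source.createStatement, pk);
        }

        String getCreateStatement() { return createStatement; }
        List<String> getPartitionKeyColumns() { return partitionKeyColumns; }
    }

    // Java-serializes o and returns the byte count; fails if anything reachable is not Serializable.
    static int serializedSize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

Serializing the copy succeeds, whereas serializing DriverSchema directly would throw NotSerializableException because it does not implement Serializable.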
      • getCreateStatement

        public java.lang.String getCreateStatement()
      • getModificationStatement

        public java.lang.String getModificationStatement()
      • getPartitionKeyColumns

        public java.util.List<java.lang.String> getPartitionKeyColumns()
      • getPartitionKeyColumnTypes

        public java.util.List<ColumnType<?>> getPartitionKeyColumnTypes()
      • getKeyFieldPositions

        public java.util.List<java.lang.Integer> getKeyFieldPositions()
      • getWriteMode

        public WriteMode getWriteMode()
      • getTtlOption

        public TTLOption getTtlOption()
      • getLowestCassandraVersion

        public java.lang.String getLowestCassandraVersion()
      • isQuoteIdentifiers

        public boolean isQuoteIdentifiers()
      • normalize

        public java.lang.Object[] normalize(java.lang.Object[] row)
        Normalizes a row by applying type converters to each field. This mirrors the normalize method in TableSchema but uses the broadcast-safe converters list.
        Parameters:
        row - the row data to normalize
        Returns:
        the normalized row (same array instance, mutated in place)
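A minimal sketch of in-place normalization, assuming one converter per column and a converter type that is both a Function and Serializable. The Converter interface and the length check are assumptions for illustration; the actual converter type and mismatch handling in BroadcastableTableSchema are not documented here.

```java
import java.io.Serializable;
import java.util.List;
import java.util.function.Function;

public class NormalizeSketch {

    // Hypothetical converter type: a Function that is also Serializable, so lambdas
    // assigned to it can travel inside a broadcast value.
    interface Converter extends Function<Object, Object>, Serializable { }

    private final List<Converter> converters;

    NormalizeSketch(List<Converter> converters) {
        this.converters = converters;
    }

    // Applies converters.get(i) to row[i] and writes the result back into the same array.
    Object[] normalize(Object[] row) {
        // Assumption: one converter per column; the real class may handle mismatches differently.
        if (row.length != converters.size()) {
            throw new IllegalArgumentException(
                    "expected " + converters.size() + " fields, got " + row.length);
        }
        for (int i = 0; i < row.length; i++) {
            row[i] = converters.get(i).apply(row[i]);
        }
        return row; // the caller's array, now mutated in place
    }
}
```

Mutating in place avoids allocating a new array per row, at the cost that callers must not reuse the original values after the call.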
      • getKeyColumns

        public java.lang.Object[] getKeyColumns(java.lang.Object[] allColumns)
        Extracts key columns from all columns based on key field positions. This mirrors the getKeyColumns method in TableSchema but uses the broadcast-safe keyFieldPositions list.
        Parameters:
        allColumns - all columns in the row
        Returns:
        array containing only the key columns
      • getKeyColumns

        @NotNull
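        public static java.lang.Object[] getKeyColumns(java.lang.Object[] allColumns, java.util.List<java.lang.Integer> keyFieldPositions)
        Static form of getKeyColumns: extracts the columns at the given keyFieldPositions from allColumns without requiring an instance. The returned array is never null.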
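A sketch of how the two getKeyColumns forms could relate, assuming the instance method delegates to the static one and that the output order follows the order of keyFieldPositions. KeyColumnsSketch is a hypothetical class name; the delegation is an assumption, not confirmed by this page.

```java
import java.util.List;

public class KeyColumnsSketch {

    private final List<Integer> keyFieldPositions;

    KeyColumnsSketch(List<Integer> keyFieldPositions) {
        this.keyFieldPositions = keyFieldPositions;
    }

    // Instance form: uses the positions captured at construction time.
    Object[] getKeyColumns(Object[] allColumns) {
        return getKeyColumns(allColumns, keyFieldPositions);
    }

    // Static form: result[i] = allColumns[keyFieldPositions.get(i)]; never returns null.
    static Object[] getKeyColumns(Object[] allColumns, List<Integer> keyFieldPositions) {
        Object[] result = new Object[keyFieldPositions.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = allColumns[keyFieldPositions.get(i)];
        }
        return result;
    }
}
```

Keeping the extraction logic in a static method lets it be reused or tested with arbitrary positions, while the instance method supplies the broadcast-safe positions list.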