Category: Spark tuning

  • Tuning Spark code

    Why do some teams run lean Spark workloads while others doing similar work use 2-3x more compute and see their jobs run 5-10x longer? There isn’t any rocket science behind this. Follow these basic steps consistently. Minimize data shuffling: you can achieve this by partitioning data on the columns that are frequently used for joins. Avoid…
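
To make the shuffle point above concrete, here is a minimal sketch in plain Python (not the Spark API) of why partitioning both sides of a join on the join column helps. The function names `hash_partition` and `co_partitioned_join`, and the `orders` / `customers` sample data, are hypothetical and chosen only for illustration. In Spark itself the comparable tools would be repartitioning a DataFrame on the join column or writing bucketed tables; the sketch only models the idea.

```python
from collections import defaultdict


def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition by hashing its join key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions


def co_partitioned_join(left, right, key, num_partitions=4):
    """Inner-join two datasets one partition at a time.

    Both sides use the same partitioning function, so rows with equal
    keys always land in the same partition index. Each partition can be
    joined locally, and no row has to move between partitions.
    """
    left_parts = hash_partition(left, key, num_partitions)
    right_parts = hash_partition(right, key, num_partitions)

    joined = []
    for idx, left_rows in left_parts.items():
        # Build a lookup from the matching right-hand partition only.
        lookup = defaultdict(list)
        for r in right_parts.get(idx, []):
            lookup[r[key]].append(r)
        for l in left_rows:
            for r in lookup.get(l[key], []):
                joined.append({**l, **r})
    return joined


# Hypothetical sample data, for illustration only.
orders = [
    {"customer_id": 1, "amount": 30},
    {"customer_id": 2, "amount": 15},
    {"customer_id": 1, "amount": 5},
]
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Lin"},
]

result = co_partitioned_join(orders, customers, "customer_id")
```

If the two sides were partitioned differently (or not at all), a row on the left could match a row in any right-hand partition, and that cross-partition movement is what a shuffle pays for.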