📄️ Data Generation
This Spark job generates synthetic retail data for benchmarking Spark RAPIDS performance on Kubernetes. The generated data includes sales, customer, and product information in CSV format and is designed for large-scale testing scenarios. By default, the script generates:
📄️ Benchmarks
This guide walks through running Spark RAPIDS benchmarks on both CPUs and GPUs, step by step. It covers how to configure, run, and monitor the benchmarks and how to compare the results. The benchmarks help you assess the performance gains from using GPUs for data processing and machine learning predictions.