Benchmarks | Spark RAPIDS on Kubernetes

📄️ Data Generation

This Spark job generates synthetic retail data for benchmarking Spark RAPIDS performance on Kubernetes. The generated data includes sales, customer, and product information in CSV format and is designed for large-scale testing scenarios. By default, the script generates:

📄️ Benchmarks

This guide walks through running Spark RAPIDS benchmarks on both CPUs and GPUs, step by step. It covers how to configure, execute, and monitor the runs, and how to compare their results. The benchmarks help you assess the performance gains from using GPUs for data processing and machine learning predictions.