The second edition of the MLPerf Storage benchmark shows tested systems serving roughly twice as many accelerators as in the v1.0 benchmark round.
MLCommons said this round of the benchmark saw dramatically increased participation, more geographic representation from submitting organizations, and greater diversity of the systems submitted for testing. The benchmark tests how storage systems perform on the Unet3D, Cosmoflow, and Resnet50 AI training tasks along with checkpoint performance on Llama 3 training runs.

Curtis Anderson, MLPerf Storage working group co-chair and Hammerspace field CTO, stated: “At the scale of computation being implemented for training large AI models, regular component failures are simply a fact of life. Checkpointing is now a standard practice in these systems to mitigate failures, and we are proud to be providing critical benchmark data on storage systems to allow stakeholders to optimize their training performance.”
As AI training clusters grow to include more GPUs, the chance of a GPU failure increases, forcing the training run to be halted and restarted from the last checkpoint, the point at which intermediate results were last written to storage. MLCommons says that if the mean time to failure for an accelerator is 50,000 hours, then a 100,000-accelerator cluster running for extended periods at full utilization will likely experience a failure every half-hour, and a cluster with one million accelerators can expect a failure every three minutes. The faster checkpoints can be written and read, the quicker a failed job can be restarted and the shorter the overall run time.
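MLCommons' failure-rate figures follow from simple arithmetic, assuming accelerator failures are independent so the cluster's expected time between failures is the per-accelerator MTTF divided by the cluster size. A minimal sketch:

```python
def cluster_mtbf_minutes(per_accel_mttf_hours: float, num_accels: int) -> float:
    """Expected minutes between failures across a cluster of num_accels
    accelerators, assuming independent failures."""
    return per_accel_mttf_hours / num_accels * 60

# 50,000-hour MTTF per accelerator:
print(cluster_mtbf_minutes(50_000, 100_000))    # one failure every ~30 minutes
print(cluster_mtbf_minutes(50_000, 1_000_000))  # one failure every ~3 minutes
```

Both results match the figures MLCommons quotes for 100,000- and one-million-accelerator clusters.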
MLPerf Storage v2.0 includes more than 200 performance results from 26 submitting organizations: Alluxio, Argonne National Lab, DDN, ExponTech, FarmGPU, H3C, Hammerspace, HPE, JNIST/Huawei, Juicedata, Kingston, Kioxia, Lightbits Labs, MangoBoost, Micron, Nutanix, Oracle, Quanta Computer, Samsung, Sandisk, Simplyblock, TTA, UBIX, IBM, Western Digital, and YanRong.

Western Digital, which makes disk drives, also supplies its OpenFlex Data24 2RU EBOF (Ethernet Box of Flash). It tested 24-drive and 48-drive versions of that product, fitted with Kioxia CM7-V Series NVMe SSDs, in collaboration with high-performance storage software provider PEAK:AIO.
David Kanter, head of MLPerf at MLCommons, said: “This level of participation is a game-changer for benchmarking. It enables us to openly publish more accurate and more representative data on real-world systems. That, in turn, gives the stakeholders on the front lines the information and tools they need to succeed at their jobs. The checkpoint benchmark results are an excellent case in point: now that we can measure checkpoint performance, we can think about optimizing it.”
The v2.0 submissions included a more diverse set of technical approaches to delivering high-performance storage for AI training than v1.0, including:
- 6 local storage solutions
- 2 solutions using in-storage accelerators
- 13 software-defined solutions
- 12 block systems
- 16 on-prem shared storage solutions
- 2 object stores

Oana Balmau, MLPerf Storage working group co-chair and an Assistant Professor at McGill University, said: “Everything is scaling up: models, parameters, training datasets, clusters, and accelerators. It’s no surprise to see that storage system providers are innovating to support ever larger-scale systems.”
DDN issued a statement about its results, saying it “has set a new industry benchmark with its AI400X3 storage appliance” that “delivered record-breaking throughput and unmatched performance density, saturating hundreds of Nvidia H100 GPUs from a compact, energy-efficient 2RU system.” That claim concerns performance density rather than absolute numbers such as total accelerators (GPUs) served or aggregate throughput in GiB/sec.
DDN says that in single-node benchmarking, the DDN AI400X3 achieved:
- The highest performance density on Cosmoflow and Resnet50 training, serving 52 and 208 simulated H100 GPUs with only a 2RU 2400 W appliance
- I/O performance of 30.6 GBps reads and 15.3 GBps writes, loading and saving Llama3-8b checkpoints in only 3.4 and 5.7 seconds respectively
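Those checkpoint times follow directly from size-over-throughput arithmetic. A quick sketch of the relationship, where the implied data volumes are back-calculated from DDN's figures rather than stated in the source:

```python
def checkpoint_time_s(size_gb: float, throughput_gbps: float) -> float:
    """Seconds to move a checkpoint of size_gb gigabytes at a sustained
    throughput in GB/s."""
    return size_gb / throughput_gbps

# Working backwards from the reported times and throughputs, the implied
# data volumes (not figures from the source) are roughly:
load_gb = 30.6 * 3.4   # ~104 GB read during a checkpoint load
save_gb = 15.3 * 5.7   # ~87 GB written during a checkpoint save
```

Halving either the checkpoint size or doubling the sustained throughput halves the stall a training job sees around each checkpoint, which is why the benchmark measures both read and write paths.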
In multi-node benchmarking, it achieved:
- 120.68 GBps sustained read throughput and 45 simulated accelerators for Unet3D H100 training, up from 99.02 GBps and 36 accelerators in the v1 benchmark
- Support for up to 640 simulated H100 GPUs on ResNet50
- Up to 135 simulated H100 GPUs on Cosmoflow with the new AI400X3, a 2x improvement over last year’s results
DDN claimed that since 2016, Nvidia has relied exclusively on DDN to power its internal AI clusters.
Western Digital said that in the Unet3D workload, its OpenFlex Data24 achieved sustained read throughput of 106.5 GBps (99.2 GiB/s), saturating 36 simulated H100 GPUs across three physical client nodes. With the PEAK:AIO AI Data Server, OpenFlex Data24 was able to deliver 64.9 GBps (59.6 GiB/s), saturating 22 simulated H100 GPUs from a single head server and single client node. Kurt Chan, VP and GM, Western Digital Platforms Business, said: “The OpenFlex Data24 4000 Series NVMe-oF Storage Platform delivers near-saturation performance across demanding AI benchmarks, both standalone and with a single PEAK:AIO AI Data Server appliance, translating to faster time-to-results and reduced infrastructure sprawl.”
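The paired GBps/GiB-per-second figures quoted above are conversions between decimal gigabytes (10^9 bytes) and binary gibibytes (2^30 bytes), a distinction that matters when comparing vendor throughput claims. A minimal sketch:

```python
def gbps_to_gibs(gbps: float) -> float:
    """Convert decimal gigabytes/sec (10**9 bytes) to gibibytes/sec (2**30 bytes)."""
    return gbps * 1e9 / 2**30

print(round(gbps_to_gibs(106.5), 1))  # 99.2, matching Western Digital's quoted figure
```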
Full MLPerf Storage v2.0 benchmark results are available on the MLCommons website.
MLPerf invites stakeholders to join the MLPerf Storage working group and help it continue to evolve the benchmark suite. A deeper understanding of the issues around storage systems and checkpointing, and the design of the checkpointing benchmarks, can be found in a post from Wes Vaske, an MLPerf Storage working group member.