Understanding and leveraging the I/O patterns of emerging machine learning analytics


Gainaru, A.; Ganyushin, D.; Xie, B.; Kurc, T.; Saltz, J.; Oral, S.; Podhorszki, N.; Pöschel, F.; Hübl, A.; Klasky, S.

The scientific community is currently experiencing unprecedented amounts of data generated by cutting-edge science facilities. Soon, facilities will be producing up to 1 PB/s, which will force scientists to use more autonomous techniques to learn from the data. The adoption of machine learning methods, such as deep learning techniques, in large-scale workflows comes with a shift in the workflows’ computational and I/O patterns. These changes often include iterative processes and model architecture searches, in which datasets are analyzed multiple times in different formats and with different model configurations in order to find accurate, reliable, and efficient learning models. This shift in behavior changes I/O patterns at the application level as well as at the system level. It also brings new challenges for HPC I/O teams, since these patterns contain more complex I/O workloads. In this paper, we discuss the I/O patterns exhibited by emerging analytical codes that rely on machine learning algorithms and highlight the challenges in designing efficient I/O transfers for such workflows. We comment on how to leverage the data access patterns to fetch the required input data more efficiently, in the format and order dictated by the needs of the application, and on how to optimize the data path between collaborating processes. We motivate our work and show performance gains with a case study of medical applications.

Keywords: emerging HPC applications; deep learning methods; I/O patterns; I/O optimization; data management

  • Contribution to proceedings
    Smoky Mountains Computational Sciences & Engineering Conference (SMC2021), October 18-20, 2021, Oak Ridge, USA

Permalink: https://www.hzdr.de/publications/Publ-33788