
News...

  • PNNL: Research Highlights, April 2008

      Code Improvements Enhance Analysis of Global Cloud Resolving Model Data

        Pacific Northwest National Laboratory (PNNL) researchers, in collaboration with the developers of the open-source NetCDF Operators (NCO), recently added capabilities for processing geodesic grid data and optimized performance to support efficient manipulation of data sets consisting of many files of tens to hundreds of gigabytes in size. The improved and optimized tools enhance the ability to analyze data generated by the Global Cloud Resolving Model (GCRM).

Community Access to Global Cloud Resolving Model and Data

Tools to subset and visualize the petabyte-scale data sets that will be produced by the Global Cloud Resolving Model

The vision of the Global Cloud Resolving Model (GCRM) project, led by Professor David Randall, is to develop a model capable of simulating climate at a resolution of 2 km over the entire globe, at a speed that allows at least a 1:1 ratio of simulated time to wall clock time. The GCRM, based on geodesic grids, is computationally demanding to a degree that prohibits rerunning the model to save just the output required for a particular analysis. Therefore, it is necessary to store model results throughout the computation and to provide tools that flexibly extract the subsets of data required for a wide range of analyses. Model outputs are expected to be on the order of 1 terabyte per hourly snapshot, or 8.6 petabytes per year of continuous simulation time.
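The yearly figure follows from the hourly one by simple arithmetic. The sketch below is illustrative only: the function name and the `binary` flag are hypothetical, and the page does not say which prefix convention it uses. Assuming hourly snapshots over a 365-day year, 1 terabyte per snapshot rounds to 8.6 petabytes with binary prefixes (1 PB = 1024 TB) and to 8.8 petabytes with decimal prefixes (1 PB = 1000 TB), so the quoted number appears consistent with the binary convention.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hourly snapshots in a non-leap year


def annual_volume_pb(tb_per_snapshot: float, binary: bool = True) -> float:
    """Petabytes written per simulated year, given one snapshot per hour.

    Assumption: `binary` selects 1 PB = 1024 TB; otherwise 1 PB = 1000 TB.
    """
    tb_per_pb = 1024 if binary else 1000
    return tb_per_snapshot * HOURS_PER_YEAR / tb_per_pb


print(round(annual_volume_pb(1.0), 1))                # binary prefixes
print(round(annual_volume_pb(1.0, binary=False), 1))  # decimal prefixes
```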

The high spatial and temporal resolution of the data will result in data volumes that present significant new challenges to all aspects of generating, managing, and accessing data for model validation, analysis, and visualization, and more generally to data dissemination. In particular, the current process of moving output files to local storage, where data are extracted and subsequently analyzed with primarily serial tools, breaks down. An alternative approach is necessary, in which high-performance file systems are combined with sophisticated parallel data extraction and analysis tools co-located with the data. Additionally, a user environment that provides flexible mechanisms to browse data and metadata, request extracted subsets, or request derived data products that can be moved to local storage and analyzed will increase the value of these new models to the scientific community, ultimately enabling the highest rate of scientific understanding. As illustrated in the figure, paradigm-changing models such as a GCRM require coupled compute, storage, and analysis resources. The software services that provide data access to the broad community are a vital link in the flow of information.

[Figure: architecture]

This project is a SciDAC Scientific Application Partnership (SAP) dedicated to providing efficient, flexible access to logical subsets of this data, as well as to selected derived products and visualizations, thus enabling a range of analyses by the broader climate research community. The proposed work focuses on four main tasks: parallel tools that efficiently access and extract subsets of data, performing averaging and binning if requested; visualization services; user services that enable researchers to browse, search, and make specific data requests; and development of the necessary metadata models and services.
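To make the first task concrete, here is a minimal serial sketch of subsetting followed by binning and averaging. It is not the project's implementation: the `Cell` record, the function names, and the sample values are all hypothetical. It assumes each geodesic grid cell can be described by its center latitude and longitude, and it takes an unweighted mean, ignoring cell area. The tools described above would do this work in parallel, next to the data.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Cell(NamedTuple):
    """Hypothetical record for one grid cell."""
    lat: float    # cell-center latitude, degrees
    lon: float    # cell-center longitude, degrees
    value: float  # a model field sampled at this cell


def subset(cells: Iterable[Cell],
           lat_range: Tuple[float, float],
           lon_range: Tuple[float, float]) -> List[Cell]:
    """Keep only cells whose centers fall inside a lat/lon bounding box."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    return [c for c in cells
            if lat_min <= c.lat <= lat_max and lon_min <= c.lon <= lon_max]


def bin_mean_by_latitude(cells: Iterable[Cell],
                         bin_width: float) -> Dict[float, float]:
    """Group cells into latitude bands and average the field in each band.

    Keys are the lower edge of each band. The mean is unweighted.
    """
    sums: Dict[float, float] = defaultdict(float)
    counts: Dict[float, int] = defaultdict(int)
    for c in cells:
        lower_edge = math.floor(c.lat / bin_width) * bin_width
        sums[lower_edge] += c.value
        counts[lower_edge] += 1
    return {edge: sums[edge] / counts[edge] for edge in sorted(sums)}


# Made-up sample data, for illustration only.
cells = [
    Cell(1.0, 10.0, 2.0),
    Cell(3.0, 12.0, 4.0),
    Cell(7.0, 11.0, 6.0),
    Cell(40.0, 10.0, 100.0),  # outside the requested box
]

region = subset(cells, lat_range=(0.0, 10.0), lon_range=(0.0, 20.0))
print(bin_mean_by_latitude(region, bin_width=5.0))  # {0.0: 3.0, 5.0: 6.0}
```

Subsetting before averaging reduces the amount of data that has to be touched. At the volumes quoted for the GCRM, that ordering matters far more than it does in this small example.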