PNNL: Research Highlights, April 2008
Code Improvements Enhance Analysis of Global Cloud Resolving Model Data
Pacific Northwest National Laboratory (PNNL) researchers, in collaboration with the developers of the open-source NetCDF Operators (NCO), recently added capabilities for processing geodesic grid data and optimized performance to support efficient manipulation of data sets consisting of many files of tens to hundreds of gigabytes in size. The improved and optimized tools enhance the ability to analyze data generated by the Global Cloud Resolving Model (GCRM).
Community Access to Global Cloud Resolving Model and Data
Tools to subset and visualize the petabyte-scale data sets that will be produced by the Global Cloud Resolving Model
The high spatial and temporal resolution of the data will result in
volumes of data that present significant new challenges to all aspects
of generating, managing and accessing data for model validation,
analysis and visualization, and more generally to data dissemination.
In particular, the current process of moving output files to local
storage, where data can be extracted and subsequently analyzed by
primarily serial tools, breaks down. An alternative approach, where
high performance file systems are combined with sophisticated parallel
data extraction and analysis tools co-located with the data, is
necessary. Additionally, a user environment that provides flexible
mechanisms to browse data and metadata, request extracted subsets, or
request derived data products that can be moved to local storage and
analyzed will increase the value of these new models to the scientific
community, ultimately enabling the highest rate of scientific
understanding. As illustrated in the figure, paradigm-changing models
such as a GCRM require coupled compute, storage, and analysis
resources. The software services that provide data access to the broad
community are a vital link in the flow of information.
This project is a SciDAC Scientific Application Partnership (SAP) dedicated to providing efficient, flexible access to logical subsets of this data, as well as selected derived products and visualizations, thus enabling a range of analyses by the broader climate research community. The proposed work focuses on four main tasks: parallel tools that efficiently access and extract subsets of data, performing averaging and binning if requested; visualization services; user services that enable researchers to browse, search, and make specific data requests; and development of the necessary metadata models and services.
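To make the first task concrete, the following is a minimal serial sketch of what "extract a subset, then average and bin" could mean for data on an unstructured (for example, geodesic) grid, where each cell is described by its center coordinates rather than by a regular latitude/longitude index. It is an illustration only, not the project's tools or the NCO interface: the function names `subset_cells` and `bin_average`, the dictionary layout of a cell, and the sample values are all hypothetical, and a production tool would read NetCDF files and run in parallel.

```python
from collections import defaultdict


def subset_cells(cells, lat_range, lon_range):
    """Keep cells whose centers fall inside a latitude/longitude box.

    `cells` is a hypothetical in-memory layout: a list of dicts with
    "lat", "lon" (degrees) and "value" keys, one dict per grid cell.
    """
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    return [
        c for c in cells
        if lat_min <= c["lat"] <= lat_max and lon_min <= c["lon"] <= lon_max
    ]


def bin_average(cells, bin_width_deg):
    """Average cell values within latitude bands of the given width.

    Returns a dict mapping each band's lower edge (degrees) to the
    unweighted mean of the values of the cells in that band.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c in cells:
        # Lower edge of the latitude band containing this cell center.
        band = int(c["lat"] // bin_width_deg) * bin_width_deg
        sums[band] += c["value"]
        counts[band] += 1
    return {band: sums[band] / counts[band] for band in sorted(sums)}


# Made-up sample data: four cell centers with a temperature-like value.
cells = [
    {"lat": 2.0, "lon": 10.0, "value": 290.0},
    {"lat": 7.0, "lon": 15.0, "value": 292.0},
    {"lat": 12.0, "lon": 20.0, "value": 288.0},
    {"lat": 45.0, "lon": 200.0, "value": 270.0},
]

tropics = subset_cells(cells, (0.0, 20.0), (0.0, 30.0))
band_means = bin_average(tropics, 10)
print(band_means)
```

The averaging here is unweighted for simplicity; cells of unequal area would normally call for area weights, which this sketch omits.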


