Overview: Climate Data Processing

 

The first rule of data processing is to look at your data!

What data format is being used? What variables are available and what are the units? What type of grid: rectilinear, curvilinear or unstructured? What is the orientation of the grid: north-to-south or south-to-north? What longitude order? What temporal and vertical coordinate(s) are used? Are there missing values present? (If so, how are they identified?) Do the file contents adhere to any convention? Do not assume that the data are without problems.

Unfortunately, processing climate data can be rather intimidating due to the large sizes of the data sets. Some climate model components (e.g., ocean) can produce individual monthly mean files that exceed 17 GB, while some observationally based reanalysis datasets exceed several terabytes. Because climate is, by definition, the statistics of one or more variables over a period of time, there are often many files to process. Further, some climate projection scenarios may span 50-100 years into the future, and there may be multiple model runs (ensembles) to assess model sensitivity. Different groups may also use assorted data formats and a variety of processing tools to archive and process the data.
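Some of the questions above (grid orientation, longitude order, missing values) can be answered with a few lines of code once the coordinates and data have been read. The following is a minimal sketch in plain Python, assuming a rectilinear grid whose latitude and longitude are available as 1-D sequences. The function name describe_grid, its arguments, and the fill value -999.0 are illustrative assumptions, not part of any particular library or data set.

```python
import math


def describe_grid(lat, lon, values, fill_value=None):
    """Summarize grid orientation, longitude convention, and missing data.

    lat, lon   : 1-D sequences of coordinate values (rectilinear grid assumed)
    values     : flat sequence of data values
    fill_value : hypothetical missing-value flag, e.g. taken from file metadata
    """
    # Orientation: compare the first and last latitude.
    lat_order = "south-to-north" if lat[0] < lat[-1] else "north-to-south"

    # Longitude convention: negative values imply -180..180,
    # values above 180 imply 0..360; otherwise it cannot be decided.
    if min(lon) < 0:
        lon_convention = "-180 to 180"
    elif max(lon) > 180:
        lon_convention = "0 to 360"
    else:
        lon_convention = "ambiguous"

    # Missing values: count NaNs and, if given, entries equal to fill_value.
    n_missing = 0
    for v in values:
        is_nan = isinstance(v, float) and math.isnan(v)
        is_fill = fill_value is not None and v == fill_value
        if is_nan or is_fill:
            n_missing += 1

    return {
        "lat_order": lat_order,
        "lon_convention": lon_convention,
        "n_missing": n_missing,
        "n_values": len(values),
    }


# Illustrative input only; real coordinates would come from the data file.
summary = describe_grid(
    lat=[90.0, 0.0, -90.0],
    lon=[0.0, 120.0, 240.0],
    values=[1.0, -999.0, float("nan"), 2.0],
    fill_value=-999.0,
)
print(summary)
```

A check like this does not replace looking at the file metadata and plotting the data, but it makes silent problems such as a flipped latitude axis or an undocumented fill value easier to catch before any statistics are computed.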

Experts contributing to this page: