Data and Subsetting
ClimateTools uses YAXArrays and DimensionalData selectors for reading and manipulating gridded climate datasets.
Opening Datasets
The usual entry point is open_dataset, followed by Cube when you want a single variable.
using YAXArrays
cube = Cube(open_dataset("data.nc"))When a file contains several variables, keep the dataset object and select explicitly.
ds = open_dataset("data.nc")
tasmax = Cube(ds[:tasmax])
pr = Cube(ds[:pr])This is often clearer in climate-scenario workflows where the observational and simulation files contain several candidate variables.
Inspecting Axes
Before computing on a cube, inspect the axes and dimension names.
axes(tasmax)
size(tasmax)ClimateTools functions usually expect a time dimension and one or more spatial dimensions. Common names are:
timeorTilongitude/latitudelon/latrlon/rlatfor rotated grids
Many workflows fail not because of the data values themselves, but because time or spatial coordinates are missing or named unexpectedly.
Selecting Time Ranges
Use DimensionalData selectors directly.
using Dates
using DimensionalData
hist = tasmax[time=DateTime(1980, 1, 1)..DateTime(2009, 12, 31)]If the cube uses a different time dimension name, inspect axes(cube) and adapt the selector.
Selecting Spatial Ranges
For regular lon-lat grids, you can subset the same way.
regional = tasmax[
longitude=-80.0..-60.0,
latitude=45.0..55.0,
]This is usually a good first step before regridding or bias correction when you only need a study region.
Coordinate Conventions
Climate datasets often mix longitude conventions:
-180 .. 1800 .. 360
ClimateTools spatial utilities handle both conventions in many cases, but you should still verify that:
- the polygon or bounding box is expressed in the same convention as the grid
- the observational reference and model grid are geographically compatible before regridding
Polygon Subsetting with spatialsubset
Use spatialsubset to crop and mask a cube with a polygon in lon-lat coordinates.
The polygon can be provided as 2xN or Nx2, with NaN separating polygon segments.
using ClimateTools
using YAXArrays
cube = Cube(open_dataset("data.nc"))
poly = [NaN -80.0 -72.0 -72.0 -80.0 -80.0;
NaN 45.0 45.0 50.0 50.0 45.0]
sub = spatialsubset(cube, poly)With shapefiles:
poly = ClimateTools.extractpoly("region.shp", n=1)
sub = spatialsubset(cube, poly)Notes:
spatialsubsetreturns a lazily-evaluatedYAXArraywith a bounding-box crop and polygon mask applied.- The input cube is not fully loaded into memory during the subsetting step.
- If the polygon does not overlap the grid, the function raises an error.
Missing Data and Masks
ClimateTools functions usually propagate missing or NaN values rather than guessing replacements. Before bias correction or index calculation, inspect whether the domain has:
- permanent land-sea masks
- missing months or days
- irregular missingness in the calibration period
Large masked areas can affect regridding and bias-correction robustness.
Rotated and Curvilinear Grids
If your simulation uses rlon and rlat dimensions or has a grid_mapping attribute, prefer dataset-aware regridding workflows rather than manual slicing of a plain variable cube. See Interpolation and Regridding.
Role in Scenario Workflows
Dataset preparation is the first stage of climate-scenario construction. A good rule is:
- open the observation and model datasets
- inspect the dimensions and calendars
- subset the study period and region
- only then regrid and bias-correct
See Building Climate Scenarios for the full workflow.