NetCDF IO

ClimateBase.jl has support for "file.nc" ⇆ ClimArray. Usually this is done using NCDatasets.jl, but see below for a function that translates a loaded xarray (from Python) into ClimArray.

Read

To load a ClimArray directly from an .nc file do:

ClimateBase.ncread — Function

ncread(file, var [, selection]; kwargs...) → A

Load the variable var from the file and convert it into a ClimArray with proper dimension mapping and also containing the variable attributes as a dictionary. Dimension attributes are also given to the dimensions of A, if any exist. See keywords below for specifications for unstructured grids.

file can be a String with the path to a .nc file, or an NCDataset. The latter allows you to lazily combine several .nc files (typically split by time), e.g.

using ClimateBase, NCDatasets
using Glob # for getting all files
alldata = glob("toa_fluxes_*.nc")
# lazily aggregate all files along the "time" dimension
file = NCDataset(alldata; aggdim = "time")
A = ncread(file, "tow_sw_all")

var is a String denoting which variable to load. For .nc data containing groups var can also be a tuple ("group_name", "var_name") that loads a specific variable from a specific group. In this case, the attributes of both the group and the CF-variable are attributed to the created ClimArray.

Optionally you can provide a selection to load only part of the full array. The selection must be a tuple of indices with exactly one entry per dimension of the array, in the correct order. For example, if var corresponds to an array with three dimensions, the following selections are valid:

(:, :, 1:3)
(1:5:100, 1:1, [1,5,6])

The function ncsize can be useful for selection.
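
To see what shape such a selection produces, the same tuples can be applied to an in-memory array. This is only an analogy using plain Julia indexing on a stand-in array (the array `A` and its size are made up for illustration); it does not call ncread itself.

```julia
# Stand-in for a variable with three dimensions of sizes 100, 4 and 6.
A = reshape(1:(100 * 4 * 6), 100, 4, 6)

# One index per dimension, in the order of the dimensions.
sel1 = (:, :, 1:3)
sel2 = (1:5:100, 1:1, [1, 5, 6])

B1 = A[sel1...]
B2 = A[sel2...]

size(B1)  # (100, 4, 3)
size(B2)  # (20, 1, 3): the range 1:1 keeps the second dimension
```

Note that a range such as 1:1 keeps the dimension (with length 1), whereas a plain integer would drop it in ordinary Julia indexing.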

See also ncdetails, nckeys and ncwrite.

Smart loading

The following features make loading data with ncread smarter than using NCDatasets.jl directly and then converting to some kind of dimensional container.

Data are directly transformed into ClimArray, conserving metadata and dimension names.
If there are no missing values in the data (according to CF standards), the returned array is automatically converted to a concrete type (i.e. Union{Float32, Missing} becomes Float32).
Dimensions that are ranges (i.e. sampled with constant step size) are automatically transformed to a standard Julia Range type (which makes sub-selecting faster).
It is automatically deduced whether the spatial information is on an orthogonal grid or not, and a single Coord dimension is created if not.
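
Two of these conversions can be illustrated with plain Julia. The sketch below uses stand-in data and is not how ClimateBase implements them internally; the variable names are made up for illustration.

```julia
# Sketch only: plain-Julia stand-ins, not ClimateBase internals.

# 1. Drop `Missing` from the element type when no value is actually missing.
raw = Union{Float32, Missing}[1.0f0, 2.0f0, 3.0f0]
clean = any(ismissing, raw) ? raw : convert(Vector{Float32}, raw)

# 2. Replace an evenly spaced coordinate vector with a range.
lons = [0.0, 2.5, 5.0, 7.5]
steps = diff(lons)
is_even = all(s -> isapprox(s, steps[1]), steps)
lon_dim = is_even ? range(lons[1]; step = steps[1], length = length(lons)) : lons

eltype(clean)             # Float32
lon_dim isa AbstractRange # true
```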

Keywords

name: optionally rename the loaded array.
grid = nothing: optionally specify whether the underlying grid is grid = OrthogonalSpace() or grid = CoordinateSpace(), see Types of spatial information. If nothing, the grid type is deduced automatically from the names of the dimensions and other keys of the NCDataset.
lon, lat: these two keywords are useful for unstructured grid data where the grid information is provided in a separate .nc file. The user must provide vectors with the central longitude and central latitude of each grid point. This is done e.g. by
```
ds = NCDataset("path/to/grid.nc");
lon = Array(ds["clon"]);
lat = Array(ds["clat"]);
```
If lon, lat are given, grid is automatically assumed CoordinateSpace().
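
Putting this together, a minimal sketch of loading unstructured data with a separate grid file could look as follows. The data path and the variable name "var_name" are hypothetical placeholders, and the block only does something if both files exist.

```julia
using ClimateBase, NCDatasets

gridfile = "path/to/grid.nc"
datafile = "path/to/data.nc"   # hypothetical path
var = "var_name"               # hypothetical variable name

if isfile(gridfile) && isfile(datafile)
    # Read the cell-center coordinates and close the grid file afterwards.
    lon, lat = NCDataset(gridfile) do ds
        Array(ds["clon"]), Array(ds["clat"])
    end
    # Passing lon, lat means the grid is treated as CoordinateSpace().
    A = ncread(datafile, var; lon = lon, lat = lat)
end
```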

Notice that (at the moment) we use a pre-defined mapping of common names to proper dimensions - please feel free to extend the following via a Pull Request:

ClimateBase.COMMONNAMES

Dict{String, UnionAll} with 16 entries:
  "lat"       => Lat
  "altitude"  => Hei
  "time"      => Ti
  "pressure"  => Pre
  "xc"        => Lon
  "x"         => Lon
  "lon"       => Lon
  "age"       => Ti
  "level"     => Pre
  "latitude"  => Lat
  "height"    => Hei
  "long"      => Lon
  "longitude" => Lon
  "t"         => Ti
  "yc"        => Lat
  "y"         => Lat

Also, the following convenience functions are provided for examining the content of on-disk .nc files without loading all data into memory.

ClimateBase.nckeys — Function

nckeys(file::String)

Return all keys of the .nc file in file.

ClimateBase.ncdetails — Function

ncdetails(file, io = stdout)

Print details about the .nc file in file on io.

ClimateBase.ncsize — Function

ncsize(file, var)

Return the size of the variable of the .nc file without actually loading any data.
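
A typical workflow is to inspect a file first and then load only part of a variable. The sketch below assumes a hypothetical file path and variable name, and assumes that ncsize returns something that behaves like the tuple returned by `size`; the block only does something if the file exists.

```julia
using ClimateBase

file = "path/to/data.nc"   # hypothetical path
var = "var_name"           # hypothetical variable name

if isfile(file)
    println(nckeys(file))  # which variables exist?
    ncdetails(file)        # print details about the file to stdout
    sz = ncsize(file, var) # size of the variable, without loading data
    # Keep every dimension whole except the last, of which at most 10 entries are taken.
    sel = (ntuple(_ -> Colon(), length(sz) - 1)..., 1:min(10, sz[end]))
    A = ncread(file, var, sel)
end
```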

Write

You can also write a bunch of ClimArrays directly into an .nc file with

ClimateBase.ncwrite — Function

ncwrite(file::String, Xs; globalattr = Dict())

Write the given ClimArray instances (any iterable of ClimArrays or a single ClimArray) to a .nc file following CF standard conventions using NCDatasets.jl. Optionally specify global attributes for the .nc file.

The metadata of the arrays in Xs, as well as their dimensions, are properly written in the .nc file and any necessary type conversions happen automatically.

WARNING: We assume that any dimensions shared between the Xs are identical.

xarray

You can use the following functions (which are not defined or exported in ClimateBase, to avoid a dependency on PyCall.jl) to load data using Python's xarray.

using ClimateBase, Dates
# This needs numpy, xarray and dask installed from Conda
using PyCall
xr = pyimport("xarray")
np = pyimport("numpy")

function climarray_from_xarray(xa, fieldname, name = fieldname)
    w = getproperty(xa, Symbol(fieldname))
    raw_data = Array(w.values)
    dnames = collect(w.dims) # dimensions in string name
    dim_values, dim_attrs = extract_dimension_values_xarray(xa, dnames)
    @assert collect(size(raw_data)) == length.(dim_values)
    actual_dims = create_dims_xarray(dnames, dim_values, dim_attrs)
    ca = ClimArray(raw_data, actual_dims, name; attrib = w.attrs)
end

function extract_dimension_values_xarray(xa, dnames = collect(xa.dims))
    dim_values = []
    dim_attrs = Vector{Any}(fill(nothing, length(dnames)))
    for (i, d) in enumerate(dnames)
        dim_attrs[i] = getproperty(xa, d).attrs
        x = getproperty(xa, d).values
        if d ≠ "time"
            push!(dim_values, x)
        else
            # Dates need special handling to be transformed into `DateTime`.
            dates = [np.datetime_as_string(y)[1:19] for y in x]
            dates = DateTime.(dates)
            push!(dim_values, dates)
        end
    end
    return dim_values, dim_attrs
end

function create_dims_xarray(dnames, dim_values, dim_attrs)
    true_dims = ClimateBase.to_proper_dimensions(dnames)
    optimal_values = ClimateBase.vector2range.(dim_values)
    out = []
    for i in 1:length(true_dims)
        push!(out, true_dims[i](optimal_values[i]; metadata = dim_attrs[i]))
    end
    return (out...,)
end

# Load some data
xa = xr.open_mfdataset(ERA5_files_path)
X = climarray_from_xarray(xa, "w", "optional name")

Ensemble types

A dedicated type representing ensembles has no reason to exist in ClimateBase.jl. Because the package takes advantage of standard Julia data structures and syntax, those can be used to represent "ensembles". For example, to compute an "ensemble global mean" you can just do:

using ClimateBase, Statistics
# load all data
E = [ncread("ensemble_$i.nc", "x") for i in 1:10]
# mean from all data
global_mean = mean(spacemean(X) for X in E)

where you see that the "ensemble" is represented simply as a Vector{ClimArray}. Of course, this requires that all data fit into memory, but this is so far the only way ClimateBase.jl operates anyway.
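
The reduction pattern itself can be tried without any data files. The sketch below replaces each `spacemean(X)` with a made-up vector (member i is a constant time series with value i), so it only illustrates how `mean` over a generator averages the members element by element.

```julia
using Statistics

# Stand-ins for `spacemean(X)`: member i is a constant series with value i.
members = [fill(Float64(i), 4) for i in 1:10]

# Mean across members, element by element, as in `mean(spacemean(X) for X in E)`.
ensemble_mean = mean(m for m in members)

ensemble_mean  # 4-element vector with every entry equal to 5.5
```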