NetCDF IO
ClimateBase.jl has support for "file.nc" ⇆ ClimArray. Usually this is done using NCDatasets.jl, but see below for a function that translates a loaded xarray (from Python) into ClimArray.
Read
To load a ClimArray directly from an .nc file do:
ClimateBase.ncread — Function
ncread(file, var [, selection]; kwargs...) → A
Load the variable var from the file and convert it into a ClimArray with proper dimension mapping and also containing the variable attributes as a dictionary. Dimension attributes are also given to the dimensions of A, if any exist. See keywords below for specifications for unstructured grids.
file can be a String with the path to a .nc file. It can also be an NCDataset, which allows you to lazily combine different .nc data (typically split by time), e.g.
using ClimateBase, NCDatasets
using Glob # for getting all files
alldata = glob("toa_fluxes_*.nc")
file = NCDataset(alldata; aggdim = "time") # lazily aggregate the files along "time"
A = ncread(file, "tow_sw_all")
var is a String denoting which variable to load. For .nc data containing groups, var can also be a tuple ("group_name", "var_name") that loads a specific variable from a specific group. In this case, the attributes of both the group and the CF-variable are attributed to the created ClimArray.
Optionally you can provide a selection to load only a part of the full array. The selection must be a tuple of indices with exactly one entry per dimension of the array, given in the order of the array's dimensions. For example, if var corresponds to an array with three dimensions, the following selections are possible:
(:, :, 1:3)
(1:5:100, 1:1, [1,5,6])
The function ncsize can be useful for choosing a selection, since it returns the size of a variable without loading any data.
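To see what such selection tuples pick out, the sketch below applies the two example selections to a plain in-memory Julia array instead of a file. It assumes that a selection behaves like ordinary Julia indexing along each dimension. The array `data` and its size are made up for illustration, and no ClimateBase function is called.

```julia
# Stand-in for a 3-dimensional variable of size (100, 4, 6); hypothetical data.
data = reshape(collect(1.0:2400.0), 100, 4, 6)

# One index per dimension, in the order of the array's dimensions.
sel_a = (:, :, 1:3)
sel_b = (1:5:100, 1:1, [1, 5, 6])

sub_a = data[sel_a...]
sub_b = data[sel_b...]

println(size(sub_a)) # (100, 4, 3)
println(size(sub_b)) # (20, 1, 3)
```

Note that the range 1:1 keeps the second dimension (with length 1), whereas a plain integer index would drop it in ordinary Julia indexing.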
See also ncdetails, nckeys and ncwrite.
Smart loading
The following features make loading data with ncread smarter than using NCDatasets.jl directly and then converting the result to some kind of dimensional container.
- Data are directly transformed into ClimArray, conserving metadata and dimension names.
- If there are no missing values in the data (according to CF standards), the returned array is automatically converted to a concrete type (e.g. Union{Float32, Missing} becomes Float32).
- Dimensions that are ranges (i.e. sampled with constant step size) are automatically transformed to a standard Julia Range type (which makes sub-selecting faster).
- Whether the spatial information is on an orthogonal grid is deduced automatically; if it is not, a single Coord dimension is created.
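Two of the points above, narrowing the element type and turning evenly spaced coordinates into a range, can be illustrated with plain Julia. This is a minimal sketch of the idea only, not the code ClimateBase uses internally; the helpers `narrow_eltype` and `to_range_if_regular` are hypothetical names introduced here.

```julia
# Hypothetical helper: drop `Missing` from the element type when no value is missing.
function narrow_eltype(x::AbstractArray)
    any(ismissing, x) && return x
    return convert(Array{nonmissingtype(eltype(x))}, x)
end

# Hypothetical helper: replace an evenly spaced vector by an equivalent range.
function to_range_if_regular(v::AbstractVector{<:Real})
    length(v) < 3 && return v
    steps = diff(v)
    all(s -> isapprox(s, steps[1]), steps) || return v
    return range(first(v), last(v); length = length(v))
end

raw = Union{Float32, Missing}[1.0f0, 2.5f0, 4.0f0]
clean = narrow_eltype(raw)
println(eltype(clean)) # Float32

lons = [0.0, 2.5, 5.0, 7.5]
println(to_range_if_regular(lons) isa AbstractRange) # true
println(to_range_if_regular([0.0, 1.0, 5.0]) isa AbstractRange) # false
```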
Keywords
- name: optionally rename the loaded array.
- grid = nothing: optionally specify whether the underlying grid is grid = OrthogonalSpace() or grid = CoordinateSpace(), see Types of spatial information. If nothing, we try to deduce it automatically based on the names of dimensions and other keys of the NCDataset.
- lon, lat: these two keywords are useful for unstructured grid data where the grid information is provided in a separate .nc file. The user needs to provide vectors of the central longitude and central latitude of each grid point. This is done e.g. by
  ds = NCDataset("path/to/grid.nc"); lon = Array(ds["clon"]); lat = Array(ds["clat"]);
  If lon, lat are given, grid is automatically assumed to be CoordinateSpace().
Notice that (at the moment) we use a pre-defined mapping of common names to proper dimensions - please feel free to extend the following via a Pull Request:
ClimateBase.COMMONNAMES
Dict{String, UnionAll} with 16 entries:
"lat" => Lat
"altitude" => Hei
"time" => Ti
"pressure" => Pre
"xc" => Lon
"x" => Lon
"lon" => Lon
"age" => Ti
"level" => Pre
"latitude" => Lat
"height" => Hei
"long" => Lon
"longitude" => Lon
"t" => Ti
"yc" => Lat
"y" => Lat
Also, the following convenience functions are provided for examining the content of on-disk .nc files without loading all data into memory.
ClimateBase.nckeys — Function
nckeys(file::String)
Return all keys of the .nc file in file.
ClimateBase.ncdetails — Function
ncdetails(file, io = stdout)
Print details about the .nc file in file on io.
ClimateBase.ncsize — Function
ncsize(file, var)
Return the size of the variable var of the .nc file without actually loading any data.
Write
You can also write several ClimArrays directly into an .nc file with
ClimateBase.ncwrite — Function
ncwrite(file::String, Xs; globalattr = Dict())
Write the given ClimArray instances (any iterable of ClimArrays or a single ClimArray) to a .nc file following CF standard conventions using NCDatasets.jl. Optionally specify global attributes for the .nc file.
The metadata of the arrays in Xs, as well as their dimensions, are properly written in the .nc file and any necessary type conversions happen automatically.
WARNING: We assume that any dimensions shared between the Xs are identical.
See also ncread.
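Because ncwrite assumes that shared dimensions are identical, it can be worth checking this yourself before writing. The sketch below shows one way such a check could look, using plain NamedTuples as stand-ins for the dimensions of each array. The helper `shared_dims_identical` and the example coordinates are hypothetical and not part of ClimateBase.

```julia
# Each NamedTuple maps dimension names to coordinate values (stand-ins for ClimArray dimensions).
dims_A = (lon = 0.0:10.0:350.0, lat = -90.0:5.0:90.0)
dims_B = (lon = 0.0:10.0:350.0, time = 1:12)
dims_C = (lon = 0.0:5.0:355.0, lat = -90.0:5.0:90.0)

# Hypothetical helper: true if every dimension name that appears in more than
# one entry has the same coordinate values everywhere it appears.
function shared_dims_identical(all_dims)
    seen = Dict{Symbol, Any}()
    for dims in all_dims, (name, coords) in pairs(dims)
        if haskey(seen, name)
            seen[name] == coords || return false
        else
            seen[name] = coords
        end
    end
    return true
end

println(shared_dims_identical((dims_A, dims_B))) # true
println(shared_dims_identical((dims_A, dims_C))) # false
```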
xarray
You can use the following functions (which are not defined or exported in ClimateBase, to avoid a dependency on PyCall.jl) to load data using Python's xarray.
using ClimateBase, Dates
# This needs numpy, xarray and dask installed from Conda
using PyCall
xr = pyimport("xarray")
np = pyimport("numpy")

function climarray_from_xarray(xa, fieldname, name = fieldname)
    w = getproperty(xa, Symbol(fieldname))
    raw_data = Array(w.values)
    dnames = collect(w.dims) # dimension names as strings
    dim_values, dim_attrs = extract_dimension_values_xarray(xa, dnames)
    # The data size must match the lengths of the dimensions, in order.
    @assert collect(size(raw_data)) == length.(dim_values)
    actual_dims = create_dims_xarray(dnames, dim_values, dim_attrs)
    return ClimArray(raw_data, actual_dims, name; attrib = w.attrs)
end

function extract_dimension_values_xarray(xa, dnames = collect(xa.dims))
    dim_values = []
    dim_attrs = Vector{Any}(fill(nothing, length(dnames)))
    for (i, d) in enumerate(dnames)
        dim_attrs[i] = getproperty(xa, d).attrs
        x = getproperty(xa, d).values
        if d ≠ "time"
            push!(dim_values, x)
        else
            # Dates need special handling to be transformed into `DateTime`.
            # Keep the first 19 characters, i.e. "yyyy-mm-ddTHH:MM:SS".
            dates = [np.datetime_as_string(y)[1:19] for y in x]
            dates = DateTime.(dates)
            push!(dim_values, dates)
        end
    end
    return dim_values, dim_attrs
end

function create_dims_xarray(dnames, dim_values, dim_attrs)
    true_dims = ClimateBase.to_proper_dimensions(dnames)
    optimal_values = ClimateBase.vector2range.(dim_values)
    out = []
    for i in 1:length(true_dims)
        push!(out, true_dims[i](optimal_values[i]; metadata = dim_attrs[i]))
    end
    return (out...,)
end

# Load some data (ERA5_files_path is a placeholder for the path to your files)
xa = xr.open_mfdataset(ERA5_files_path)
X = climarray_from_xarray(xa, "w", "optional name")
Ensemble types
A dedicated type representing ensembles has no reason to exist in ClimateBase.jl. Because the package takes advantage of standard Julia data structures and syntax, those can be used to represent "ensembles". For example, to compute an "ensemble global mean" you can just do:
using ClimateBase, Statistics
# load all data
E = [ncread("ensemble_$i.nc", "x") for i in 1:10]
# mean from all data
global_mean = mean(spacemean(X) for X in E)
where you see that the "ensemble" was represented just as a Vector{ClimArray}. Of course, this requires that all data fit into memory, but so far this is the only way ClimateBase.jl operates anyway.
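The same pattern can be tried without any data files. The sketch below uses plain matrices as stand-ins for the ensemble members and an unweighted `mean` in place of `spacemean`; both substitutions are assumptions made only to keep the example self-contained.

```julia
using Statistics

# Stand-ins for loaded ensemble members: ten hypothetical 4×3 fields.
E = [fill(Float64(i), 4, 3) for i in 1:10]

# Unweighted mean over each field, then the mean across members.
ensemble_mean = mean(mean(X) for X in E)
println(ensemble_mean) # 5.5

# Member-to-member spread of the per-member means.
ensemble_std = std(mean(X) for X in E)
```

Since the ensemble is an ordinary Vector, any function that accepts an iterable (here `mean` and `std`) works on it without further machinery.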