Introduction

ClimateBase is a Julia package offering basic functionality for analyzing data that are typically in the form used by climate sciences. These data are dimensional & spatiotemporal but the corresponding dimensions all need special handling. For example the most common dimensions are longitude, latitude and time.

  • longitude is by definition a periodic dimension
  • latitude is a linear dimension. However because the coordinate system most often used in climate sciences is a grid of longitude × latitude (in equal degrees) the area element of space depends on latitude and this needs to be taken into account.
  • time is a linear dimension in principle, but its values are <: AbstractDateTime instead of <: Real. The human calendar (where these values come from) is periodic but each period may not correspond to the same physical time, and this also needs to be taken into account.

ClimateBase is structured to deal with these intricacies, and in addition offer several functionalities commonly used, and sought after, by climate scientists. It also serves as the base building block for ClimateTools, which offers more advanced functionalities.

The focus of ClimateBase is not loading data, nor operating on data on disk. It is designed for in-memory climate data exploration and manipulation. That being said, basic data loading functionality is offered in terms of NCDatasets, see below.

Installation

This package is registered and you can install it with

using Pkg; Pkg.add("ClimateBase")

Make sure your installed version coincides with the one in this docs (see bottom left corner of this page).

ClimArray: the core data structure

This project treats "climate data" as an instance of ClimArray. At the moment ClimArray is a subtype of DimensionalArray from DimensionalData.jl. A brief introduction to DimensionalData.jl is copied here from its docs, because basic handling of a ClimArray comes from DimensionalData.jl. DimensionalData.jl allows to dimensionally-index data by their values.

E.g. you can create an array with

using ClimateBase, Dates
Time = ClimateBase.Ti # `Time` is more intuitive than `Ti`
lats = -90:5:90
lons = 0:10:359
t = Date(2000, 3, 15):Month(1):Date(2020, 3, 15)
# Here we wrap all dimension data into proper dimensions:
dimensions = (Lon(lons), Lat(lats), Time(t))
# where `Lon, Lat, Time` are `Dimension`s exported by ClimateBase
# combining the array data with dimensions makes a `ClimArray`:
data = rand(36, 37, 241) # same size as `dimensions`
A = ClimArray(data, dimensions)
ClimArray with dimensions:
 Longitude (type Lon): 0:10:350 (Sampled: Ordered Regular Points)
 Latitude (type Lat): -90:5:90 (Sampled: Ordered Regular Points)
 Time (type Ti): Dates.Date("2000-03-15"):Dates.Month(1):Dates.Date("2020-03-15") (Sampled: Ordered Regular Points)
and data: 36×37×241 Array{Float64, 3}
[:, :, 1]
 0.372289   0.761795   0.64888   …  0.563392   0.17818    0.562557
 0.0210326  0.275287   0.89695      0.88629    0.720891   0.543605
 0.2401     0.367919   0.922973     0.314936   0.18386    0.208239
 0.632349   0.742304   0.431288     0.693043   0.702006   0.900397
 ⋮                               ⋱             ⋮          
 0.530805   0.39691    0.836993     0.837572   0.938766   0.198736
 0.65629    0.0297031  0.429784     0.149964   0.0134807  0.341748
 0.647691   0.489742   0.278577     0.910197   0.106604   0.487852
 0.587226   0.982668   0.737898  …  0.0920825  0.734158   0.803751
[and 240 more slices...]

You can then select a specific time slice at Date(2011,5,15) and a longitude interval between 0 and 30 degrees like so:

B = A[Lon(Between(0, 30)), Time(At(Date(2011,5,15)))]
ClimArray with dimensions:
 Longitude (type Lon): 0:10:30 (Sampled: Ordered Regular Points)
 Latitude (type Lat): -90:5:90 (Sampled: Ordered Regular Points)
and data: 4×37 Matrix{Float64}
 0.889908   0.948054   0.970697  0.909238  …  0.923429   0.907899  0.291852
 0.769786   0.634448   0.869712  0.686247     0.4899     0.318876  0.165462
 0.983074   0.76535    0.610077  0.613931     0.279377   0.7599    0.912461
 0.0127156  0.0953284  0.796113  0.58147      0.0823305  0.278491  0.523512

With ClimArray you can use convenience, physically-inspired functions that do automatic (and correct) weighting. For example the latitudinal mean of B is simply

C = latmean(B)
ClimArray with dimensions:
 Longitude (type Lon): 0:10:30 (Sampled: Ordered Regular Points)
and data: 4-element Vector{Float64}
[0.585146, 0.480966, 0.479628, 0.467821]

where in this averaging process each data point is weighted by the cosine of its latitude.

Making a ClimArray

You can create a ClimArray yourself, or you can load data from an .nc file with CF-conventions, see NetCDF IO.

ClimateBase.ClimArrayMethod
ClimArray(A::Array, dims::Tuple; name = "", attrib = nothing)

ClimArray is a structure that contains numerical array data bundled with dimensional information, a name and an attrib field (typically a dictionary) that holds general attributes. You can think of ClimArray as a in-memory representation of a CFVariable.

At the moment, a ClimArray is using DimensionalArray from DimensionalData.jl, and all basic handling of ClimArray is offered by DimensionalData (see below).

ClimArray is created by passing in standard array data A and a tuple of dimensions dims.

Example

using ClimateBase, Dates
Time = ClimateBase.Ti # more intuitive name for time dimension
lats = -90:5:90
lons = 0:10:359
t = Date(2000, 3, 15):Month(1):Date(2020, 3, 15)
# dimensional information:
dimensions = (Lon(lons), Lat(lats), Time(t))
data = rand(36, 37, 241) # numeric data
A = ClimArray(data, dimensions)
source

It is strongly recommended to use the dimensions we export (because we dispatch on them and use their information):

for D in ClimateBase.STANDARD_DIMS
    println(D, " (full name = $(DimensionalData.name(D)))")
end
Lon (full name = Longitude)
Lat (full name = Latitude)
Ti (full name = Time)
Hei (full name = Height)
Pre (full name = Pressure)
Coord (full name = Spatial Coordinates)

We explicitly assume that Lon, Lat are measured in degrees and not radians or meters (extremely important for spatial averaging processes).

NetCDF IO

ClimateBase.jl has support for file.nc ⇆ ClimArray. Usually this is done using NCDatasets.jl, but see below for a function that translates a loaded xarray (from Python) into ClimArray.

Read

To load a ClimArray directly from an .nc file do:

ClimateBase.ClimArrayMethod
ClimArray(file::Union{String,NCDataset}, var::String, name = var) -> A

Load the variable var from the file and convert it into a ClimArray which also contains the variable attributes as a dictionary. Dimension attributes are also given to the dimensions of A, if any exist.

Notice that file can be an NCDataset, which allows you to lazily combine different .nc data (typically split by time), e.g.

alldata = ["toa_fluxes_2020_$(i).nc" for i in 1:12]
file = NCDataset(alldata; aggdim = "time")
A = ClimArray(file, "tow_sw_all")

(but you can also directly give the string to a single file "file.nc" in ClimArray if data are contained in a single file for single files).

We do two performance improvements while loading the data:

  1. If there are no missing values in the data (according to CF standards), the returned array is automatically converted to a concrete type (i.e. Union{Float32, Missing} becomes Float32).
  2. Dimensions that are ranges (i.e. sampled with constant step size) are automatically transformed to a standard Julia Range type (which makes sub-selecting faster).
source

Notice that (at the moment) we use a pre-defined mapping of common names to proper dimensions - please feel free to extend the following via a Pull Request:

ClimateBase.COMMONNAMES
Dict{String, UnionAll} with 15 entries:
  "lat"       => Lat
  "altitude"  => Hei
  "time"      => Ti
  "pressure"  => Pre
  "xc"        => Lon
  "x"         => Lon
  "lon"       => Lon
  "level"     => Pre
  "latitude"  => Lat
  "height"    => Hei
  "long"      => Lon
  "longitude" => Lon
  "t"         => Ti
  "yc"        => Lat
  "y"         => Lat

Also, the following convenience functions are provided for examining the content of on-disk .nc files without loading all data on memory.

Write

You can also write a bunch of ClimArrays directly into an .nc file with

ClimateBase.climarrays_to_ncFunction
climarrays_to_nc(file::String, Xs; globalattr = Dict())

Write the given ClimArray instances (any iterable of ClimArrays or a single ClimArray) to a .nc file following CF standard conventions using NCDatasets.jl. Optionally specify global attributes for the .nc file.

The metadata of the arrays in Xs, as well as their dimensions, are properly written in the .nc file and any necessary type convertions happen automatically.

WARNING: We assume that any dimensions shared between the Xs are identical.

source

xarray

You can use the following functions (which are not defined and exported in ClimateBase to avoid dependency on PyCall.jl) to load data using Python's xarray.

using ClimateBase, Dates
# This needs to numpy, xarray and dask installed from Conda
using PyCall
xr = pyimport("xarray")
np = pyimport("numpy")

function climarray_from_xarray(xa, fieldname, name = fieldname)
    w = getproperty(xa, Symbol(fieldname))
    raw_data = Array(w.values)
    dnames = collect(w.dims) # dimensions in string name
    dim_values, dim_attrs = extract_dimension_values_xarray(xa, dnames)
    @assert collect(size(raw_data)) == length.(dim_values)
    actual_dims = create_dims_xarray(dnames, dim_values, dim_attrs)
    ca = ClimArray(raw_data, actual_dims, name; attrib = w.attrs)
end

function extract_dimension_values_xarray(xa, dnames = collect(xa.dims))
    dim_values = []
    dim_attrs = Vector{Any}(fill(nothing, length(dnames)))
    for (i, d) in enumerate(dnames)
        dim_attrs[i] = getproperty(xa, d).attrs
        x = getproperty(xa, d).values
        if d ≠ "time"
            push!(dim_values, x)
        else
            dates = [np.datetime_as_string(y)[1:19] for y in x]
            dates = DateTime.(dates)
            push!(dim_values, dates)
        end
    end
    return dim_values, dim_attrs
end

function create_dims_xarray(dnames, dim_values, dim_attrs)
    true_dims = ClimateBase.to_proper_dimensions(dnames)
    optimal_values = ClimateBase.vector2range.(dim_values)
    out = []
    for i in 1:length(true_dims)
        push!(out, true_dims[i](optimal_values[i]; metadata = dim_attrs[i]))
    end
    return (out...,)
end

# Load some data
xa = xr.open_mfdataset(ERA5_files_path)
X = climarray_from_xarray(xa, "w", "optional name")

Temporal

Functions related with the Time dimension.

ClimateBase.timeaggFunction
timeagg(f, A::ClimArray, W = nothing)

Perform a proper temporal aggregation of the function f (e.g. mean, std) on A where:

  • Only full year spans of A are included, see maxyearspan (because most processes are affected by yearly cycle, and averaging over an uneven number of cycles typically results in artifacts)
  • Each month in A is weighted with its length in days (for monthly sampled data)

If you don't want these features, just do dropagg(f, A, Time, W).

W are possible statistical weights that are used in conjuction to the temporal weighting, to weight each time point differently. If they are not a vector (a weight for each time point), then they have to be a dimensional array of same dimensional layout as A (a weight for each data point).

See also monthlyagg, yearlyagg, seasonalyagg.

timeagg(f, t::Vector, x::Vector, w = nothing)

Same as above, but for arbitrary vector x accompanied by time vector t.

source
ClimateBase.monthlyaggFunction
monthlyagg(A::ClimArray, f = mean; mday = 15) -> B

Create a new array where the temporal information has been aggregated into months using the function f. The dates of the new array always have day number of mday.

source
ClimateBase.yearlyaggFunction
yearlyagg(A::ClimArray, f = mean) -> B

Create a new array where the temporal information has been aggregated into years using the function f. By convention, the dates of the new array always have month and day number of 1.

source
ClimateBase.seasonalyaggFunction
seasonalyagg(A::ClimArray, f = mean) -> B

Create a new array where the temporal information has been aggregated into seasons using the function f. By convention, seasons are represented as Dates spaced 3-months apart, where only the months December, March, June and September are used to specify the date, with day 1.

source
ClimateBase.temporalrangeFunction
temporalrange(A::ClimArray, f = Dates.month) → r
temporalrange(t::AbstractVector{<:TimeType}}, f = Dates.month) → r

Return a vector of ranges so that each range of indices are values of t that belong in either the same month, year, day, or season, depending on f. f can take the values: Dates.year, Dates.month, Dates.day or season (all are functions).

Used in e.g. monthlyagg, yearlyagg or seasonalyagg.

source
ClimateBase.maxyearspanFunction
maxyearspan(A::ClimArray) = maxyearspan(dims(A, Time))
maxyearspan(t::Vector{<:DateTime}) → i

Find the maximum index i of t so that t[1:i] covers exact(*) multiples of years.

(*) For monthly spaced data i is a multiple of 12 while for daily data we find the largest possible multiple of DAYS_IN_ORBIT.

source
ClimateBase.temporal_samplingFunction
temporal_sampling(x) → symbol

Return the temporal sampling type of x, which is either an array of Dates or a dimensional array (with Time dimension).

Possible return values are:

  • :hourly, where the temporal difference between entries is exactly 1 hour.
  • :daily, where the temporal difference between dates is exactly 1 day.
  • :monthly, where all dates have the same day, but different month.
  • :yearly, where all dates have the same month+day, but different year.
  • :other, which means that x doesn't fall to any of the above categories.
source
ClimateBase.time_in_daysFunction
time_in_days(t::AbstractArray{<:TimeType}, T = Float32)

Convert a given date time array into measurement units of days: a real-valued array which counts time in days, always increasing (cumulative).

source

Spatial

Spatial aggregation

All functions in this section work for both types of space, see Types of spatial coordinates.

ClimateBase.latmeanFunction
latmean(A::ClimArray)

Return the latitude-mean A (mean across dimension Lat). This function properly weights by the cosine of the latitude.

source
ClimateBase.spacemeanFunction
spacemean(A::ClimArray [, W]) = spaceagg(mean, A, W)

Average given A over its spatial coordinates. Optionally provide statistical weights in W.

source
ClimateBase.spaceaggFunction
spaceagg(f, A::ClimArray, W = nothing)

Aggregate A using function f (e.g. mean, std) over all available space (i.e. longitude and latitude) of A, weighting every part of A by its spatial area.

W can be extra weights, to weight each spatial point with. W can either be just a ClimArray with same space as A, or of exactly same shape as A.

source
ClimateBase.hemispheric_meansFunction
hemispheric_means(A) → nh, sh

Return the (proper) averages of A over the northern and southern hemispheres. Notice that this function explicitly does both zonal as well as meridional averaging. Use hemispheric_functions to just split A into two hemispheres.

source
ClimateBase.hemispheric_functionsFunction
hemispheric_functions(A::ClimArray) → north, south

Return two arrays north, south, by splitting A to its northern and southern hemispheres, appropriately translating the latitudes of south so that both arrays have the same latitudinal dimension (and thus can be compared and do opperations between them).

source
ClimateBase.lonlatfirstFunction
lonlatfirst(A::ClimArray, args...) → B

Permute the dimensions of A to make a new array B that has first dimension longitude, second dimension latitude, with the remaining dimensions of A following (useful for most plotting functions). Optional extra dimensions can be given as args..., specifying a specific order for the remaining dimensions.

Example:

B = lonlatfirst(A)
C = lonlatfirst(A, Time)
source

Types of spatial coordinates

Most of the time the spatial information of your data is in the form of a Longitude × Latitude grid. This is simply achieved via the existence of two dimensions (Lon, Lat) in your dimensional data array. This type of space is called LonLatGrid. It is assumed throughout that longitude and latitude are measured in degrees. Height, although representing physical space as well, is not considered part of the "spatial dimensions", and is treated as any other additional dimension.

Another type of spatial coordinates is supported, and that is of equal-area. Currently only a single type, GaussianEqualArea, exists for this purpose, which represents coordinates in a Gaussian grid as shown here: https://en.wikipedia.org/wiki/Gaussian_grid. In GaussianEqualArea the spatial information is instead given by single dimension whose elements are coordinate locations, i.e. 2-element SVector(longitude, latitude). The dimension name is Coord. Each point in this dimension corresponds to a polygon (for GaussianEqualArea a trapezoid) that covers equal amount of spatial area as any other point. The actual limits of each polygon are not included in the dimension for performance reasons.

ClimateBase.jl works with either type of spatial coordinate system. Therefore, physically inspired averaging functions, like spacemean or zonalmean, work for both types of spatial coordinates. In addition, the function spatialidxs returns an iterator over the spatial coordinates of the data, and works for both types (grid or equal-area):

ClimateBase.spatialidxsFunction
spatialidxs(A::ClimArray) → idxs

Return an iterable that can be used to access all spatial points of A with the syntax

idxs = spatialidxs(A)
for i in idxs
    slice_at_give_space_point = A[i...]
end

Works for all types of space (... is necessary because i is a Tuple).

source

Equal area creation

Warn

Equal area functionality is currently in an experimental phase! You can try using the function ClimArray_eqarea to load a ClimArray with Gaussian grid directly from a .nc file. This function assumes that this grid was created with CDO, using e.g. cdo remapbil,gea250 IN.nc OUT.nc.

To manually make a Gaussian grid ClimArray, try the following approach:

file = NCDataset("some_file_with_eqarea.nc")
# the following lines make the coordinates of the equal area, which depends on
# how your .nc file is structured
lons = Array(file["lon"])
lats = Array(file["lat"])
coords = [SVector(lo, la) for (lo, la) in zip(lons, lats)]
# Sort points by their latitude (important!)
si = sortperm(coords, by = reverse)
# Load some remaining dimensions and create the proper `Dimension` tuple:
t = Array(file["time"])
dimensions = (Coord(coords), Time(t))
# Finally load the array data and make a ClimArray
data = Array(file["actual_data_like_radiation"])
data = data[si, :] # permute like the coordinate
A = ClimArray(data, dimensions)

General aggregation

The physical averages of the previous section are done by taking advantage of a general aggregation syntax, which works with any aggregating function like mean, sum, std, etc.

ClimateBase.dropaggFunction
dropagg(f, A, dims)

Apply statistics/aggregating function f (e.g. sum or mean) on array A across dimension(s) dims and drop the corresponding dimension(s) from the result (Julia inherently keeps singleton dimensions).

If A is one dimensional, dropagg will return the single number of applying f(A).

source
ClimateBase.collapseFunction
collapse(f, A, dim)

Reduce A towards dimension dim using the collapsing function f (e.g. mean). This means that f is applied across all other dimensions of A, each of which are subsequently dropped, leaving only the collapsed result of A vs. the remaining dimension.

source

Timeseries Analysis

ClimateBase.sinusoidal_continuationFunction
sinusoidal_continuation(T::ClimArray, fs = [1, 2]; Tmin = -Inf, Tmax = Inf)

Fill-in the missing values of spatiotemporal field T, by fitting sinusoidals to the non-missing values, and then using the fitted sinusoidals for the missing values.

Frequencies are given per year (frequency 2 means 1/2 of a year).

Tmin, Tmax limits are used to clamp the result into this range (no clamping in the default case).

source
ClimateBase.seasonal_decompositionFunction
seasonal_decomposition(A::ClimArray, fs = [1, 2]) → seasonal, residual

Decompose A into a seasonal and residual components, where the seasonal contains the periodic parts of A, with frequencies given in fs, and residual contains what's left.

fs is measured in 1/year. This function works even for non-equispaced time axis (e.g. monthly averages) and uses LPVSpectral.jl and SignalDecomposition.jl.

source

Climate quantities

Functions that calculate climate-related quantities.

ClimateBase.insolationFunction
insolation(t, ϕ; kwargs...)

Calculate daily averaged insolation in W/m² at given time and latitude t, φ. φ is given in degrees, and t in days (real number or date).

Keywords:

Ya = DAYS_IN_ORBIT # = 365.26 # days
t_VE = 76.0 # days of vernal equinox
S_0 = 1362.0 # W/m^2
γ=23.44
ϖ=282.95
e=0.017 # eccentricity
source
ClimateBase.surface_atmosphere_contributionsFunction
surface_atmosphere_contributions(I, F_toa_⬆, F_s_⬆, F_s_⬇) → α_ATM, α_SFC

Calculate the atmospheric and surface contributions of the planetary albedo, so that the TOA albedo is α = α_ATM + α_SFC, using the simple 1-layer radiative transfer model by Donohoe & Battisti (2011) or G. Stephens (2015). Stephens' formulas are incorrect and I have corrected them!

source
ClimateBase.total_toa_albedoFunction
total_toa_albedo(a, s, t) = a + s*t^2/(1-a*s)

Combine given atmosphere albedo a, surface albedo s and atmosphere transmittance t into a total top-of-the-atmosphere albedo α according to the model of Donohoe & Battisti (2011).

source

Plotting

Currently ClimateBase.jl does not have integrated plotting support. In the near future it will have this based on the upcoming GeoMakie.jl.

For now, you can use PyCall.jl, matplotlib, and the Python library cartopy. In the file ClimateBase/plotting/python.jl we provide two functions that plot maps of ClimArray in arbitrary projections: earthsurface for LonLatGrid and earthscatter for GaussianEqualArea. You can incorporate these in your source code as a temporary solution.

Ensemble types

A dedicated type representing ensembles has no reason to exist in ClimateBase.jl. As the package takes advantage of standard Julia datastructures and syntax, those can be used to represent "ensembles". For example to do an "ensemble global mean" you can just do:

E = [ClimArray("ensemble_$i.nc", "x") for i in 1:10]
global_mean = mean(spacemean(X) for X in E)

where you see that the "ensemble" was represented just as a Vector{ClimArray}.

Crash-course to DimensionalData.jl

DimensionalDataModule

DimensionalData

CI Codecov Aqua.jl Quality Assurance

DimensionalData.jl provides tools and abstractions for working with datasets that have named dimensions. It's a pluggable, generalised version of AxisArrays.jl with a cleaner syntax, and additional functionality found in NamedDims.jl. It has similar goals to pythons xarray, and is primarily written for use with spatial data in GeoData.jl.

Dimensions

Dimensions are just wrapper types. They store the dimension index and define details about the grid and other metadata, and are also used to index into the array, wrapping a value or a Selector. X, Y, Z and Ti are the exported defaults.

A generalised Dim type is available to use arbitrary symbols to name dimensions. Custom dimensions can be defined using the @dim macro.

We can use dim wrappers for indexing, so that the dimension order in the underlying array does not need to be known:

julia> using DimensionalData

julia> A = DimArray(rand(40, 50), (X, Y));

julia> A[Y(1), X(1:10)]
DimArray with dimensions:
 X: 1:10 (NoIndex)
and referenced dimensions:
 Y: 1 (NoIndex)
and data: 10-element Array{Float64,1}
[0.515774, 0.575247, 0.429075, 0.234041, 0.4484, 0.302562, 0.911098, 0.541537, 0.267234, 0.370663]

And this has no runtime cost:

julia> A = DimArray(rand(40, 50), (X, Y));

julia> @btime $A[X(1), Y(2)]
  2.092 ns (0 allocations: 0 bytes)
0.27317596504655417

julia> @btime parent($A)[1, 2]
  2.092 ns (0 allocations: 0 bytes)
0.27317596504655417

Dims can be used for indexing and views without knowing dimension order:

julia> A[X(10)]
DimArray with dimensions:
 Y (type Y): Base.OneTo(50) (NoIndex)
and referenced dimensions:
 X (type X): 10 (NoIndex)
and data: 50-element Array{Float64,1}
[0.0850249, 0.313408, 0.0762157, 0.549103, 0.297763, 0.309075, 0.854535, 0.659537, 0.392969, 0.89998  …  0.63791, 0.875881, 0.437688, 0.925918, 0.291636, 0.358024, 0.692283, 0.606932, 0.629122, 0.284592]

julia> view(A, Y(30:40), X(1:5))
DimArray with dimensions:
 X (type X): 1:5 (NoIndex)
 Y (type Y): 30:40 (NoIndex)
and data: 5×11 view(::Array{Float64,2}, 1:5, 30:40) with eltype Float64
 0.508793   0.721117  0.558849  …  0.505518   0.532322
 0.869126   0.754219  0.328315     0.0148934  0.778308
 0.0596468  0.458492  0.250458     0.980508   0.524938
 0.446838   0.659638  0.632399     0.33478    0.549402
 0.292962   0.995038  0.26026      0.526124   0.589176

And for indicating dimensions to reduce or permute in julia Base and Statistics functions that have dims arguments:

julia> using Statistics

julia> A = DimArray(rand(3, 4, 5), (X, Y, Ti));

julia> mean(A, dims=Ti)
DimArray with dimensions:
 X (type X): Base.OneTo(3) (NoIndex)
 Y (type Y): Base.OneTo(4) (NoIndex)
 Time (type Ti): 1 (NoIndex)
and data: 3×4×1 Array{Float64,3}
[:, :, 1]
 0.495295  0.650432  0.787521  0.502066
 0.576573  0.568132  0.770812  0.504983
 0.39432   0.5919    0.498638  0.337065
[and 0 more slices...]

julia> permutedims(A, [Ti, Y, X])
DimArray with dimensions:
 Time (type Ti): Base.OneTo(5) (NoIndex)
 Y (type Y): Base.OneTo(4) (NoIndex)
 X (type X): Base.OneTo(3) (NoIndex)
and data: 5×4×3 Array{Float64,3}
[:, :, 1]
 0.401374  0.469474  0.999326  0.265688
 0.439387  0.57274   0.493883  0.88678
 0.425845  0.617372  0.998552  0.650999
 0.852777  0.954702  0.928367  0.0045136
 0.357095  0.637873  0.517476  0.702351
[and 2 more slices...]

You can also use symbols to create Dim{X} dimensions:

julia> A = DimArray(rand(10, 20, 30), (:a, :b, :c));

julia> A[a=2:5, c=9]

DimArray with dimensions:
 Dim{:a}: 2:5 (NoIndex)
 Dim{:b}: Base.OneTo(20) (NoIndex)
and referenced dimensions:
 Dim{:c}: 9 (NoIndex)
and data: 4×20 Array{Float64,2}
 0.868237   0.528297   0.32389   …  0.89322   0.6776    0.604891
 0.635544   0.0526766  0.965727     0.50829   0.661853  0.410173
 0.732377   0.990363   0.728461     0.610426  0.283663  0.00224321
 0.0849853  0.554705   0.594263     0.217618  0.198165  0.661853

Selectors

Selectors find indices in the dimension based on values At, Near, or Between the index value(s). They can be used in getindex, setindex! and view to select indices matching the passed in value(s)

  • At(x): get indices exactly matching the passed in value(s)
  • Near(x): get the closest indices to the passed in value(s)
  • Where(f::Function): filter the array axis by a function of dimension index values.
  • Between(a, b): get all indices between two values (inclusive)
  • Contains(x): get indices where the value x falls in the interval. Only used for Sampled Intervals, for Points us At.

We can use selectors with dim wrappers:

using Dates, DimensionalData
timespan = DateTime(2001,1):Month(1):DateTime(2001,12)
A = DimArray(rand(12,10), (Ti(timespan), X(10:10:100)))

julia> A[X(Near(35)), Ti(At(DateTime(2001,5)))]
0.658404535807791

julia> A[Near(DateTime(2001, 5, 4)), Between(20, 50)]
DimArray with dimensions:
 X: 20:10:50
and referenced dimensions:
 Time (type Ti): 2001-05-01T00:00:00
and data: 4-element Array{Float64,1}
[0.456175, 0.737336, 0.658405, 0.520152]

Without dim wrappers selectors must be in the right order:

using Unitful

julia> A = DimArray(rand(10, 20), (X((1:10:100)u"m"), Ti((1:5:100)u"s")))

julia> A[Between(10.5u"m", 50.5u"m"), Near(23u"s")]
DimArray with dimensions:
 X: (11:10:41) m (Sampled: Ordered Regular Points)
and referenced dimensions:
 Time (type Ti): 21 s (Sampled: Ordered Regular Points)
and data: 4-element Array{Float64,1}
[0.819172, 0.418113, 0.461722, 0.379877]

For values other than Int/AbstractArray/Colon (which are set aside for regular indexing) the At selector is assumed, and can be dropped completely:

julia> A = DimArray(rand(3, 3), (X(Val((:a, :b, :c))), Y([25.6, 25.7, 25.8])));

julia> A[:b, 25.8]
0.61839141062599

Compile-time selectors

Using all Val indexes (only recommended for small arrays) you can index with named dimensions At arbitrary values with no runtime cost:

julia> A = DimArray(rand(3, 3), (cat=Val((:a, :b, :c)),
                                 val=Val((5.0, 6.0, 7.0))));

julia> @btime $A[:a, 7.0]
  2.094 ns (0 allocations: 0 bytes)
0.25620608873275397

julia> @btime $A[cat=:a, val=7.0]
  2.091 ns (0 allocations: 0 bytes)
0.25620608873275397

Methods where dims can be used containing indices or Selectors

getindex, setindex! view

Methods where dims can be used

  • size, axes, firstindex, lastindex
  • cat
  • reverse
  • dropdims
  • reduce, mapreduce
  • sum, prod, maximum, minimum,
  • mean, median, extrema, std, var, cor, cov
  • permutedims, adjoint, transpose, Transpose
  • mapslices, eachslice
  • fill

Warnings

Indexing with unordered or reverse order arrays has undefined behaviour. It will trash the dimension index, break searchsorted and nothing will make sense any more. So do it at you own risk. However, indexing with sorted vectors of Int can be useful. So it's allowed. But it will still do strange things to your interval sizes if the dimension span is Irregular.

Alternate Packages

There are a lot of similar Julia packages in this space. AxisArrays.jl, NamedDims.jl, NamedArrays.jl are registered alternative that each cover some of the functionality provided by DimensionalData.jl. DimensionalData.jl should be able to replicate most of their syntax and functionality.

AxisKeys.jl and AbstractIndices.jl are some other interesting developments. For more detail on why there are so many similar options and where things are headed, read this thread.