Introduction
ClimateBase
is a Julia package offering basic functionality for analyzing data that are typically in the form used by climate sciences. These data are dimensional & spatiotemporal but the corresponding dimensions all need special handling. For example the most common dimensions are longitude, latitude and time.
- longitude is by definition a periodic dimension
- latitude is a linear dimension. However because the coordinate system most often used in climate sciences is a grid of longitude × latitude (in equal degrees) the area element of space depends on latitude and this needs to be taken into account.
- time is a linear dimension in principle, but its values are
<: AbstractDateTime
instead of<: Real
. The human calendar (where these values come from) is periodic but each period may not correspond to the same physical time, and this also needs to be taken into account.
ClimateBase
is structured to deal with these intricacies, and in addition offer several functionalities commonly used, and sought after, by climate scientists. It also serves as the base building block for ClimateTools
, which offers more advanced functionalities.
At the moment the focus of ClimateBase
is not operating on data on disk. It is designed for in-memory climate data exploration and manipulation.
Installation
This package is registered and you can install it with
using Pkg; Pkg.add("ClimateBase")
Make sure your installed version coincides with the one in this docs (see bottom left corner of this page).
ClimArray
: the core data structure
This project treats "climate data" as an instance of ClimArray
. At the moment ClimArray
is a subtype of DimensionalArray
from DimensionalData.jl. A brief introduction to DimensionalData.jl is copied here from its docs, because basic handling of a ClimArray
comes from DimensionalData.jl. DimensionalData.jl allows to dimensionally-index data by their values.
E.g. you can create an array with
using ClimateBase, Dates
Time = ClimateBase.Ti # `Time` is more intuitive than `Ti`
lats = -90:5:90
lons = 0:10:359
t = Date(2000, 3, 15):Month(1):Date(2020, 3, 15)
# Here we wrap all dimension data into proper dimensions:
dimensions = (Lon(lons), Lat(lats), Time(t))
# where `Lon, Lat, Time` are `Dimension`s exported by ClimateBase
# combining the array data with dimensions makes a `ClimArray`:
data = rand(36, 37, 241) # same size as `dimensions`
A = ClimArray(data, dimensions)
ClimArray with dimensions: Lon (Longitude): 0:10:350 (Sampled - Ordered Regular Points) Lat (Latitude): -90:5:90 (Sampled - Ordered Regular Points) Ti (Time): Dates.Date("2000-03-15"):Dates.Month(1):Dates.Date("2020-03-15") (Sampled - Ordered Regular Points) and data: 36×37×241 Array{Float64, 3} [:, :, 1] 0.682785 0.196441 0.482166 0.107196 … 0.883372 0.149045 0.200619 0.225177 0.157344 0.83398 0.385147 0.288433 0.472002 0.0709818 0.107591 0.955497 0.194281 0.262429 0.187629 0.172732 0.669288 0.0727175 0.784876 0.847719 0.454173 0.637603 0.857081 0.133102 ⋮ ⋱ ⋮ 0.94931 0.892815 0.205435 0.185668 0.509361 0.158738 0.0690689 0.281482 0.252278 0.951422 0.748526 0.941638 0.018185 0.662576 0.406237 0.525258 0.359254 0.515256 0.144243 0.740783 0.303651 0.340329 0.762549 0.31811 0.966986 … 0.972916 0.617025 0.822253 [and 240 more slices...]
You can then select a specific time slice at Date(2011,5,15)
and a longitude interval between 0 and 30 degrees like so:
B = A[Lon(Between(0, 30)), Time(At(Date(2011,5,15)))]
ClimArray with dimensions: Lon (Longitude): 0:10:30 (Sampled - Ordered Regular Points) Lat (Latitude): -90:5:90 (Sampled - Ordered Regular Points) and data: 4×37 Matrix{Float64} 0.946979 0.0395582 0.548299 0.567949 … 0.508963 0.210096 0.362661 0.544981 0.619442 0.724581 0.479423 0.508748 0.769238 0.96043 0.323452 0.389164 0.174521 0.614649 0.14908 0.918143 0.68552 0.950714 0.885875 0.319967 0.648986 0.350994 0.0109536 0.278759
With ClimArray
you can use convenience, physically-inspired functions that do automatic (and correct) weighting. For example the latitudinal mean of B
is simply
C = latmean(B)
ClimArray with dimensions: Lon (Longitude): 0:10:30 (Sampled - Ordered Regular Points) and data: 4-element Vector{Float64} [0.54119, 0.458217, 0.526482, 0.487998]
where in this averaging process each data point is weighted by the cosine of its latitude.
Making a ClimArray
You can create a ClimArray
yourself, or you can load data from an .nc
file with CF-conventions, see NetCDF IO.
ClimateBase.ClimArray
— MethodClimArray(A::Array, dims::Tuple; name = "", attrib = nothing)
ClimArray
is a structure that contains numerical array data bundled with dimensional information, a name and an attrib
field (typically a dictionary) that holds general attributes. You can think of ClimArray
as a in-memory representation of a CFVariable.
At the moment, a ClimArray
is using DimensionalArray
from DimensionalData.jl, and all basic handling of ClimArray
is offered by DimensionalData
(see below).
ClimArray
is created by passing in standard array data A
and a tuple of dimensions dims
. See ncread
to automatically create a ClimArray
from a .nc file.
Example
using ClimateBase, Dates
Time = ClimateBase.Ti # more intuitive name for time dimension
lats = -90:5:90
lons = 0:10:359
t = Date(2000, 3, 15):Month(1):Date(2020, 3, 15)
# dimensional information:
dimensions = (Lon(lons), Lat(lats), Time(t))
data = rand(36, 37, 241) # numeric data
A = ClimArray(data, dimensions)
It is strongly recommended to use the dimensions we export (because we dispatch on them and use their information):
for D in ClimateBase.STANDARD_DIMS
println(D, " (full name = $(DimensionalData.name(D)))")
end
Lon (full name = Longitude) Lat (full name = Latitude) Ti (full name = Time) Hei (full name = Height) Pre (full name = Pressure) Coord (full name = )
We explicitly assume that Lon, Lat
are measured in degrees and not radians or meters (extremely important for spatial averaging processes).
NetCDF IO
ClimateBase.jl has support for file.nc ⇆ ClimArray
. Usually this is done using NCDatasets.jl, but see below for a function that translates a loaded xarray
(from Python) into ClimArray
.
Read
To load a ClimArray
directly from an .nc
file do:
ClimateBase.ncread
— Functionncread(file, var; name, kwargs...) → A
Load the variable var
from the file
and convert it into a ClimArray
with proper dimension mapping and also containing the variable attributes as a dictionary. Dimension attributes are also given to the dimensions of A
, if any exist. See keywords below for specifications for unstructured grids.
file
can be a string to a .nc
file. Or, it can be an NCDataset
, which allows you to lazily combine different .nc
data (typically split by time), e.g.
using Glob # for getting all files
alldata = glob("toa_fluxes_2020_*.nc")
file = NCDataset(alldata; aggdim = "time")
A = ClimArray(file, "tow_sw_all")
var
is a String
denoting which variable to load. For .nc
data containing groups var
can also be a tuple ("group_name", "var_name")
that loads a specific variable from a specific group. In this case, the attributes of both the group and the CF-variable are attributed to the created ClimArray
.
See also ncdetails
, nckeys
and ncwrite
.
We do two performance improvements while loading the data:
- If there are no missing values in the data (according to CF standards), the returned array is automatically converted to a concrete type (i.e.
Union{Float32, Missing}
becomesFloat32
). - Dimensions that are ranges (i.e. sampled with constant step size) are automatically transformed to a standard Julia
Range
type (which makes sub-selecting faster).
Keywords
name
optionally rename loaded array.grid = nothing
optionally specify whether the underlying grid isgrid = LonLatGrid()
orgrid = UnstructuredGrid()
, see Types of spatial coordinates. Ifnothing
, we try to deduce automatically based on the names of dimensions and other keys of theNCDataset
.lon, lat
. These two keywords are useful in unstructured grid data where the grid information is provided in a separate .nc file. What we need is the user to provide vectors of the central longitude and central latitude of each grid point. This is done e.g. by
Ifds = NCDataset("path/to/grid.nc"); lon = Array(ds["clon"]); lat = Array(ds["clat"]);
lon, lat
are given,grid
is automatically assumedUnstructuredGrid()
.
Notice that (at the moment) we use a pre-defined mapping of common names to proper dimensions - please feel free to extend the following via a Pull Request:
ClimateBase.COMMONNAMES
Dict{String, UnionAll} with 15 entries: "lat" => Lat "altitude" => Hei "time" => Ti "pressure" => Pre "xc" => Lon "x" => Lon "lon" => Lon "level" => Pre "latitude" => Lat "height" => Hei "long" => Lon "longitude" => Lon "t" => Ti "yc" => Lat "y" => Lat
Also, the following convenience functions are provided for examining the content of on-disk .nc
files without loading all data on memory.
ClimateBase.nckeys
— Functionnckeys(file::String)
Return all keys of the .nc
file in file
.
ClimateBase.ncdetails
— Functionncdetails(file::String, io = stdout)
Print details about the .nc
file in file
on io
.
ClimateBase.globalattr
— Functionglobalattr(file::String) → Dict
Return the global attributes of the .nc file.
Write
You can also write a bunch of ClimArray
s directly into an .nc
file with
ClimateBase.ncwrite
— Functionncwrite(file::String, Xs; globalattr = Dict())
Write the given ClimArray
instances (any iterable of ClimArray
s or a single ClimArray
) to a .nc
file following CF standard conventions using NCDatasets.jl. Optionally specify global attributes for the .nc
file.
The metadata of the arrays in Xs
, as well as their dimensions, are properly written in the .nc
file and any necessary type convertions happen automatically.
WARNING: We assume that any dimensions shared between the Xs
are identical.
See also ncread
.
xarray
You can use the following functions (which are not defined and exported in ClimateBase
to avoid dependency on PyCall.jl) to load data using Python's xarray
.
using ClimateBase, Dates
# This needs to numpy, xarray and dask installed from Conda
using PyCall
xr = pyimport("xarray")
np = pyimport("numpy")
function climarray_from_xarray(xa, fieldname, name = fieldname)
w = getproperty(xa, Symbol(fieldname))
raw_data = Array(w.values)
dnames = collect(w.dims) # dimensions in string name
dim_values, dim_attrs = extract_dimension_values_xarray(xa, dnames)
@assert collect(size(raw_data)) == length.(dim_values)
actual_dims = create_dims_xarray(dnames, dim_values, dim_attrs)
ca = ClimArray(raw_data, actual_dims, name; attrib = w.attrs)
end
function extract_dimension_values_xarray(xa, dnames = collect(xa.dims))
dim_values = []
dim_attrs = Vector{Any}(fill(nothing, length(dnames)))
for (i, d) in enumerate(dnames)
dim_attrs[i] = getproperty(xa, d).attrs
x = getproperty(xa, d).values
if d ≠ "time"
push!(dim_values, x)
else
dates = [np.datetime_as_string(y)[1:19] for y in x]
dates = DateTime.(dates)
push!(dim_values, dates)
end
end
return dim_values, dim_attrs
end
function create_dims_xarray(dnames, dim_values, dim_attrs)
true_dims = ClimateBase.to_proper_dimensions(dnames)
optimal_values = ClimateBase.vector2range.(dim_values)
out = []
for i in 1:length(true_dims)
push!(out, true_dims[i](optimal_values[i]; metadata = dim_attrs[i]))
end
return (out...,)
end
# Load some data
xa = xr.open_mfdataset(ERA5_files_path)
X = climarray_from_xarray(xa, "w", "optional name")
Temporal
Functions related with the Time
dimension.
ClimateBase.timemean
— Functiontimemean(A::ClimArray [, w]) = timeagg(mean, A, w)
Temporal average of A
, see timeagg
.
ClimateBase.timeagg
— Functiontimeagg(f, A::ClimArray, W = nothing)
Perform a proper temporal aggregation of the function f
(e.g. mean, std
) on A
where:
- Only full year spans of
A
are included, seemaxyearspan
(because most processes are affected by yearly cycle, and averaging over an uneven number of cycles typically results in artifacts) - Each month in
A
is weighted with its length in days (for monthly sampled data)
If you don't want these features, just do dropagg
(f, A, Time, W)
. This is also done in the case where the time sampling is unknown.
W
are possible statistical weights that are used in conjuction to the temporal weighting, to weight each time point differently. If they are not a vector (a weight for each time point), then they have to be a dimensional array of same dimensional layout as A
(a weight for each data point).
See also monthlyagg
, yearlyagg
, seasonalyagg
.
timeagg(f, t::Vector, x::Vector, w = nothing)
Same as above, but for arbitrary vector x
accompanied by time vector t
.
ClimateBase.monthlyagg
— Functionmonthlyagg(A::ClimArray, f = mean; mday = 15) -> B
Create a new array where the temporal information has been aggregated into months using the function f
. The dates of the new array always have day number of mday
.
ClimateBase.yearlyagg
— Functionyearlyagg(A::ClimArray, f = mean) -> B
Create a new array where the temporal information has been aggregated into years using the function f
. By convention, the dates of the new array always have month and day number of 1
.
ClimateBase.seasonalyagg
— Functionseasonalyagg(A::ClimArray, f = mean) -> B
Create a new array where the temporal information has been aggregated into seasons using the function f
. By convention, seasons are represented as Dates spaced 3-months apart, where only the months December, March, June and September are used to specify the date, with day 1.
ClimateBase.temporalrange
— Functiontemporalrange(A::ClimArray, f = Dates.month) → r
temporalrange(t::AbstractVector{<:TimeType}}, f = Dates.month) → r
Return a vector of ranges so that each range of indices are values of t
that belong in either the same month, year, day, or season, depending on f
. f
can take the values: Dates.year, Dates.month, Dates.day
or season
(all are functions).
Used in e.g. monthlyagg
, yearlyagg
or seasonalyagg
.
ClimateBase.maxyearspan
— Functionmaxyearspan(A::ClimArray) = maxyearspan(dims(A, Time))
maxyearspan(t::Vector{<:DateTime}) → i
Find the maximum index i
of t
so that t[1:i]
covers exact(*) multiples of years.
(*) For monthly spaced data i
is a multiple of 12
while for daily data we find the largest possible multiple of DAYS_IN_ORBIT
.
ClimateBase.temporal_sampling
— Functiontemporal_sampling(x) → symbol
Return the temporal sampling type of x
, which is either an array of Date
s or a dimensional array (with Time
dimension).
Possible return values are:
:hourly
, where the temporal difference between successive entries is exactly 1 hour.:daily
, where the temporal difference between successive entries is exactly 1 day.:monthly
, where all dates have the same day, but different month.:yearly
, where all dates have the same month and day, but different year.:other
, which means thatx
doesn't fall to any of the above categories.
ClimateBase.time_in_days
— Functiontime_in_days(t::AbstractArray{<:TimeType}, T = Float32)
Convert a given date time array into measurement units of days: a real-valued array which counts time in days, always increasing (cumulative).
Spatial
Spatial aggregation
All functions in this section work for both types of space, see Types of spatial coordinates.
ClimateBase.zonalmean
— Functionzonalmean(A::ClimArray)
Return the zonal mean of A
.
ClimateBase.latmean
— Functionlatmean(A::ClimArray)
Return the latitude-mean A
(mean across dimension Lat
). This function properly weights by the cosine of the latitude.
ClimateBase.spacemean
— Functionspacemean(A::ClimArray [, W]) = spaceagg(mean, A, W)
Average given A
over its spatial coordinates. Optionally provide statistical weights in W
.
ClimateBase.spaceagg
— Functionspaceagg(f, A::ClimArray, W = nothing)
Aggregate A
using function f
(e.g. mean, std
) over all available space (i.e. longitude and latitude) of A
, weighting every part of A
by its spatial area.
W
can be extra weights, to weight each spatial point with. W
can either be just a ClimArray
with same space as A
, or of exactly same shape as A
.
ClimateBase.hemispheric_means
— Functionhemispheric_means(A) → nh, sh
Return the (proper) averages of A
over the northern and southern hemispheres. Notice that this function explicitly does both zonal as well as meridional averaging. Use hemispheric_functions
to just split A
into two hemispheres.
ClimateBase.hemispheric_functions
— Functionhemispheric_functions(A::ClimArray) → north, south
Return two arrays north, south
, by splitting A
to its northern and southern hemispheres, appropriately translating the latitudes of south
so that both arrays have the same latitudinal dimension (and thus can be compared and do opperations between them).
ClimateBase.lonlatfirst
— Functionlonlatfirst(A::ClimArray, args...) → B
Permute the dimensions of A
to make a new array B
that has first dimension longitude, second dimension latitude, with the remaining dimensions of A
following (useful for most plotting functions). Optional extra dimensions can be given as args...
, specifying a specific order for the remaining dimensions.
Example:
B = lonlatfirst(A)
C = lonlatfirst(A, Time)
Types of spatial coordinates
At the moment the following type of spatial coordinates are supported:
ClimateBase.LonLatGrid
— TypeSpace coordinates are represented by two orthogonal dimensions Lon, Lat
, one being longitude and the other being latitude.
ClimateBase.UnstructuredGrid
— TypeSpace coordinates are represented by a single dimension Coord
, whose elements are coordinate locations, i.e. 2-element SVector(longitude, latitude)
. Each coordinate represents an equal area polygon corresponding to the point in space. The actual limits of each polygon are not included in the dimension for performance reasons.
This dimension allows indexing according to the underlying Lon, Lat
representation, e.g. you can do
A # some `ClimArray` with unstructured grid type.
A[Coord(Lon(Between(0, 30)), Lat(Between(-30, 30)))]
To use functions such as zonalmean
or hemispheric_means
with this grid, you must first sort the ClimArray
so that the latitudes of its coordinates are sorted in ascending order. I.e.
A # some `ClimArray` with unstructured grid type.
coords = dims(A, Coord).val
si = sortperm(coords, by = reverse)
A = A[Coord(si)]
This is done automatically by ncread
.
UnstructuredGrid
functionality is currently in an experimental phase! Notice that non-equal area unstructured grids are not supported yet.
ClimateBase.jl works with either type of spatial coordinate system. Therefore, physically inspired averaging functions, like spacemean
or zonalmean
, work for both types of spatial coordinates. In addition, the function spatialidxs
returns an iterator over the spatial coordinates of the data, and works for both types (grid or equal-area):
ClimateBase.spatialidxs
— Functionspatialidxs(A::ClimArray) → idxs
Return an iterable that can be used to access all spatial points of A
with the syntax
idxs = spatialidxs(A)
for i in idxs
slice_at_give_space_point = A[i...]
end
Works for all types of space (...
is necessary because i
is a Tuple
).
ncread
tries to automatically deduce the correct space type and create the appropriate dimension.
General aggregation
The physical averages of the previous section are done by taking advantage of a general aggregation syntax, which works with any aggregating function like mean, sum, std
, etc.
ClimateBase.dropagg
— Functiondropagg(f, A, dims)
Apply statistics/aggregating function f
(e.g. sum
or mean
) on array A
across dimension(s) dims
and drop the corresponding dimension(s) from the result (Julia inherently keeps singleton dimensions).
If A
is one dimensional, dropagg
will return the single number of applying f(A)
.
ClimateBase.collapse
— Functioncollapse(f, A, dim)
Reduce A
towards dimension dim
using the collapsing function f
(e.g. mean
). This means that f
is applied across all other dimensions of A
, each of which are subsequently dropped, leaving only the collapsed result of A
vs. the remaining dimension.
Timeseries Analysis
ClimateBase.sinusoidal_continuation
— Functionsinusoidal_continuation(T::ClimArray, fs = [1, 2]; Tmin = -Inf, Tmax = Inf)
Fill-in the missing values of spatiotemporal field T
, by fitting sinusoidals to the non-missing values, and then using the fitted sinusoidals for the missing values.
Frequencies are given per year (frequency 2 means 1/2 of a year).
Tmin, Tmax
limits are used to clamp the result into this range (no clamping in the default case).
ClimateBase.seasonal_decomposition
— Functionseasonal_decomposition(A::ClimArray, fs = [1, 2]) → seasonal, residual
Decompose A
into a seasonal and residual components, where the seasonal contains the periodic parts of A
, with frequencies given in fs
, and residual contains what's left.
fs
is measured in 1/year. This function works even for non-equispaced time axis (e.g. monthly averages) and uses LPVSpectral.jl and SignalDecomposition.jl.
Climate quantities
Functions that calculate climate-related quantities.
ClimateBase.insolation
— Functioninsolation(t, ϕ; kwargs...)
Calculate daily averaged insolation in W/m² at given time and latitude t, φ
. φ
is given in degrees, and t
in days (real number or date).
Keywords:
Ya = DAYS_IN_ORBIT # = 365.26 # days
t_VE = 76.0 # days of vernal equinox
S_0 = 1362.0 # W/m^2
γ=23.44
ϖ=282.95
e=0.017 # eccentricity
ClimateBase.surface_atmosphere_contributions
— Functionsurface_atmosphere_contributions(I, F_toa_⬆, F_s_⬆, F_s_⬇) → α_ATM, α_SFC
Calculate the atmospheric and surface contributions of the planetary albedo, so that the TOA albedo is α = α_ATM + α_SFC
, using the simple 1-layer radiative transfer model by Donohoe & Battisti (2011) or G. Stephens (2015). Stephens' formulas are incorrect and I have corrected them!
ClimateBase.total_toa_albedo
— Functiontotal_toa_albedo(a, s, t) = a + s*t^2/(1-a*s)
Combine given atmosphere albedo a
, surface albedo s
and atmosphere transmittance t
into a total top-of-the-atmosphere albedo α
according to the model of Donohoe & Battisti (2011).
Plotting
Currently ClimateBase.jl does not have integrated plotting support. In the near future it will have this based on the upcoming GeoMakie.jl.
For now, you can use PyCall.jl, matplotlib, and the Python library cartopy. In the file ClimateBase/plotting/python.jl
we provide two functions that plot maps of ClimArray
in arbitrary projections: earthsurface
for LonLatGrid
and earthscatter
for UnstructuredGrid
. You can incorporate these in your source code as a temporary solution.
Ensemble types
A dedicated type representing ensembles has no reason to exist in ClimateBase.jl. As the package takes advantage of standard Julia datastructures and syntax, those can be used to represent "ensembles". For example to do an "ensemble global mean" you can just do:
E = [ClimArray("ensemble_$i.nc", "x") for i in 1:10]
global_mean = mean(spacemean(X) for X in E)
where you see that the "ensemble" was represented just as a Vector{ClimArray}
.
Crash-course to DimensionalData.jl
DimensionalData
— ModuleDimensionalData
DimensionalData.jl provides tools and abstractions for working with datasets that have named dimensions, and optionally a lookup index. It's a pluggable, generalised version of AxisArrays.jl with a cleaner syntax, and additional functionality found in NamedDims.jl. It has similar goals to pythons xarray, and is primarily written for use with spatial data in GeoData.jl.
Broadcasting and most Base methods maintain and sync dimension context.
DimensionalData.jl also implements:
- comprehensive plot recipes for Plots.jl.
- a Tables.jl interface with
DimTable
- multi-layered
DimStack
s that can be indexed together, and have base methods applied to all layers. - the Adapt.jl interface for use on GPUs, even as GPU kernel arguments.
- traits for handling a wide range of spatial data types accurately.
Dimensions
Dimensions are wrapper types. They hold the lookup index, details about the grid, and other metadata. They are also used to index into the array. X
, Y
, Z
and Ti
are the exported defaults. A generalised Dim
type is available to use arbitrary symbols to name dimensions. Custom dimension types can also be defined using the @dim
macro.
Dimensions can be used to construct arrays in rand
, ones
, zeros
and fill
with either a range for a lookup index or a number for the dimension length:
julia> using DimensionalData
julia> A = rand(X(1:40), Y(50))
40×50 DimArray{Float64,2} with dimensions:
X: 1:40 (Sampled - Ordered Regular Points)
Y
0.929006 0.116946 0.750017 … 0.172604 0.678835 0.495294
0.0550038 0.100739 0.427026 0.778067 0.309657 0.831754
⋮ ⋱
0.647768 0.965682 0.049315 0.220338 0.0326206 0.36705
0.851769 0.164914 0.555637 0.771508 0.964596 0.30265
We can also use dim wrappers for indexing, so that the dimension order in the underlying array does not need to be known:
julia> A[Y(1), X(1:10)]
10-element DimArray{Float64,1} with dimensions:
X: 1:10 (Sampled - Ordered Regular Points)
and reference dimensions: Y(1)
0.929006
0.0550038
0.641773
⋮
0.846251
0.506362
0.0492866
And this has no runtime cost:
julia> A = ones(X(3), Y(3))
3×3 DimArray{Float64,2} with dimensions: X, Y
1.0 1.0 1.0
1.0 1.0 1.0
1.0 1.0 1.0
julia> @btime $A[X(1), Y(2)]
1.077 ns (0 allocations: 0 bytes)
1.0
julia> @btime parent($A)[1, 2]
1.078 ns (0 allocations: 0 bytes)
1.0
Dims can be used for indexing and views without knowing dimension order:
julia> A = rand(X(40), Y(50))
40×50 DimArray{Float64,2} with dimensions: X, Y
0.377696 0.105445 0.543156 … 0.844973 0.163758 0.849367
⋮ ⋱
0.431454 0.108927 0.137541 0.531587 0.592512 0.598927
julia> A[Y=3]
40-element DimArray{Float64,1} with dimensions: X
and reference dimensions: Y(3)
0.543156
⋮
0.137541
julia> view(A, Y(), X(1:5))
5×50 DimArray{Float64,2} with dimensions: X, Y
0.377696 0.105445 0.543156 … 0.844973 0.163758 0.849367
⋮ ⋱
0.875279 0.133032 0.925045 0.156768 0.736917 0.444683
And for specifying dimension number in all Base
and Statistics
functions that have a dims
argument:
julia> using Statistics
julia> A = rand(X(3), Y(4), Ti(5));
julia> mean(A; dims=Ti)
3×4×1 DimArray{Float64,3} with dimensions: X, Y, Ti (Time)
[:, :, 1]
0.168058 0.52353 0.563065 0.347025
0.472786 0.395884 0.307846 0.518926
0.365028 0.381367 0.423553 0.369339
You can also use symbols to create Dim{X}
dimensions, although we can't use the rand
method directly with Symbols, and insteadd use the regular DimArray
constructor:
julia> A = DimArray(rand(10, 20, 30), (:a, :b, :c));
julia> A[a=2:5, c=9]
4×20 DimArray{Float64,2} with dimensions: Dim{:a}, Dim{:b}
and reference dimensions: Dim{:c}(9)
0.134354 0.581673 0.422615 … 0.410222 0.687915 0.753441
0.573664 0.547341 0.835962 0.0353398 0.794341 0.490831
0.166643 0.133217 0.879084 0.695685 0.956644 0.698638
0.325034 0.147461 0.149673 0.560843 0.889962 0.75733
Selectors
Selectors find indices in the lookup index for each dimension:
At(x)
: get the index exactly matching the passed in value(s)Near(x)
: get the closest index to the passed in value(s)Where(f::Function)
: filter the array axis by a function of the dimension index values.Between(a, b)
: get all indices between two values, excluding the high value.Contains(x)
: get indices where the value x falls within the interval, exluding the upper value. Only used forSampled
Intervals
, forPoints
, useAt
.
(Between
and Contains
exlude the upper boundary so that adjacent selections never contain the same index)
Selectors can be used in getindex
, setindex!
and view
to select indices matching the passed in value(s)
We can use selectors inside dim wrappers:
julia> using Dates
julia> timespan = DateTime(2001,1):Month(1):DateTime(2001,12)
DateTime("2001-01-01T00:00:00"):Month(1):DateTime("2001-12-01T00:00:00")
julia> A = DimArray(rand(12,10), (Ti(timespan), X(10:10:100)))
12×10 DimArray{Float64,2} with dimensions:
Ti (Time): DateTime("2001-01-01T00:00:00"):Month(1):DateTime("2001-12-01T00:00:00") (Sampled - Ordered Regular Points)
X: 10:10:100 (Sampled - Ordered Regular Points)
0.14106 0.476176 0.311356 0.454908 … 0.464364 0.973193 0.535004
⋮ ⋱
0.522759 0.390414 0.797637 0.686718 0.901123 0.704603 0.0740788
julia> @btime A[X(Near(35)), Ti(At(DateTime(2001,5)))]
0.3133109280208961
Without dim wrappers selectors must be in the right order:
using Unitful
julia> A = rand(X((1:10:100)u"m"), Ti((1:5:100)u"s"));
julia> A[Between(10.5u"m", 50.5u"m"), Near(23u"s")]
4-element DimArray{Float64,1} with dimensions:
X: (11:10:41) m (Sampled - Ordered Regular Points)
and reference dimensions:
Ti(21 s) (Time): 21 s (Sampled - Ordered Regular Points)
0.584028
⋮
0.716715
For values other than Int
/AbstractArray
/Colon
(which are set aside for regular indexing) the At
selector is assumed, and can be dropped completely:
julia> A = rand(X([:a, :b, :c]), Y([25.6, 25.7, 25.8]));
julia> A[:b, 25.8]
0.61839141062599
Compile-time selectors
Using all Val
indexes (only recommended for small arrays) you can index with named dimensions At
arbitrary values with no runtime cost:
julia> A = rand(X(Val((:a, :b, :c))), Y(Val((5.0, 6.0, 7.0))))
3×3 DimArray{Float64,2} with dimensions:
X: Val{(:a, :b, :c)}() (Categorical - Unordered)
Y: Val{(5.0, 6.0, 7.0)}() (Categorical - Unordered)
0.5808 0.835037 0.528461
0.8924 0.431394 0.506915
0.66386 0.955305 0.774132
julia> @btime $A[:c, 6.0]
2.777 ns (0 allocations: 0 bytes)
0.9553052910459472
julia> @btime $A[Val(:c), Val(6.0)]
1.288 ns (0 allocations: 0 bytes)
0.9553052910459472
Methods where dims can be used containing indices or Selectors
getindex
, setindex!
view
Methods where dims, dim types, or Symbol
s can be used to indicate the array dimension:
size
,axes
,firstindex
,lastindex
cat
,reverse
,dropdims
reduce
,mapreduce
sum
,prod
,maximum
,minimum
,mean
,median
,extrema
,std
,var
,cor
,cov
permutedims
,adjoint
,transpose
,Transpose
mapslices
,eachslice
Methods where dims can be used to construct DimArray
s:
fill
,ones
,zeros
,rand
Warnings
Indexing with unordered or reverse order arrays has undefined behaviour. It will trash the dimension index, break searchsorted
and nothing will make sense any more. So do it at you own risk. However, indexing with sorted vectors of Int can be useful. So it's allowed. But it will still do strange things to your interval sizes if the dimension span is Irregular
.
Alternate Packages
There are a lot of similar Julia packages in this space. AxisArrays.jl, NamedDims.jl, NamedArrays.jl are registered alternative that each cover some of the functionality provided by DimensionalData.jl. DimensionalData.jl should be able to replicate most of their syntax and functionality.
AxisKeys.jl and AbstractIndices.jl are some other interesting developments. For more detail on why there are so many similar options and where things are headed, read this thread.