Introduction

ClimateBase is a Julia package offering basic functionality for analyzing data that are typically in the form used by climate sciences. These data are dimensional & spatiotemporal but the corresponding dimensions all need special handling. For example the most common dimensions are longitude, latitude and time.

  • longitude is by definition a periodic dimension
  • latitude is a linear dimension. However because the coordinate system most often used in climate sciences is a grid of longitude × latitude (in equal degrees) the area element of space depends on latitude and this needs to be taken into account.
  • time is a linear dimension in principle, but its values are <: AbstractDateTime instead of <: Real. The human calendar (where these values come from) is periodic but each period may not correspond to the same physical time, and this also needs to be taken into account.

ClimateBase is structured to deal with these intricacies, and in addition offer several functionalities commonly used, and sought after, by climate scientists. It also serves as the base building block for ClimateTools, which offers more advanced functionalities.

At the moment the focus of ClimateBase is not on operating on data on disk. It is designed for in-memory climate data exploration and manipulation.

Installation

This package is registered and you can install it with

using Pkg; Pkg.add("ClimateBase")

Make sure your installed version coincides with the one in this docs (see bottom left corner of this page).

ClimArray: the core data structure

This project treats "climate data" as an instance of ClimArray. At the moment ClimArray is a subtype of DimArray from DimensionalData.jl. A brief introduction to DimensionalData.jl is copied here from its docs, because basic handling of a ClimArray comes from DimensionalData.jl. DimensionalData.jl allows to dimensionally-index data by their values.

E.g. you can create an array with

using ClimateBase, Dates
Time = ClimateBase.Ti # `Time` is more intuitive than `Ti`
lats = -90:5:90
lons = 0:10:359
t = Date(2000, 3, 15):Month(1):Date(2020, 3, 15)
# Here we wrap all dimension data into proper dimensions:
dimensions = (Lon(lons), Lat(lats), Time(t))
# where `Lon, Lat, Time` are `Dimension`s exported by ClimateBase
# combining the array data with dimensions makes a `ClimArray`:
data = rand(36, 37, 241) # same size as `dimensions`
A = ClimArray(data, dimensions)
ClimArray with dimensions: 
  Lon Sampled{Int64} 0:10:350 ForwardOrdered Regular Points,
  Lat Sampled{Int64} -90:5:90 ForwardOrdered Regular Points,
  Ti Sampled{Dates.Date} Dates.Date("2000-03-15"):Dates.Month(1):Dates.Date("2020-03-15") ForwardOrdered Regular Points
and data: 36×37×241 Array{Float64, 3}
[:, :, 1]
 0.12513    0.00334507  0.390391  …  0.418868   0.34221   0.0856435
 0.0439363  0.358537    0.483008     0.0894804  0.926388  0.848983
 0.574948   0.649255    0.23747      0.206712   0.3111    0.748428
 0.63448    0.761706    0.352222     0.209859   0.755624  0.0484568
 ⋮                                ⋱             ⋮         
 0.547074   0.781831    0.90923      0.44488    0.524651  0.907125
 0.175488   0.409018    0.732659     0.951737   0.731203  0.326083
 0.507028   0.385159    0.863656     0.634589   0.274263  0.0660601
 0.0300957  0.760518    0.759446  …  0.799404   0.304929  0.995428
[and 240 more slices...]

You can then select a specific time slice at Date(2011,5,15) and a longitude interval between 0 and 30 degrees like so:

B = A[Lon(Between(0, 30)), Time(At(Date(2011,5,15)))]
ClimArray with dimensions: 
  Lon Sampled{Int64} 0:10:30 ForwardOrdered Regular Points,
  Lat Sampled{Int64} -90:5:90 ForwardOrdered Regular Points
and data: 4×37 Matrix{Float64}
 0.155311   0.316027   0.876127  0.474311   …  0.11738   0.152006   0.749453
 0.209277   0.470143   0.950087  0.372802      0.942857  0.247625   0.274171
 0.0487065  0.0757588  0.568458  0.0581099     0.864305  0.927821   0.259457
 0.299593   0.79507    0.56487   0.53909       0.434198  0.0738453  0.318599

With ClimArray you can use convenience, physically-inspired functions that do automatic (and correct) weighting. For example the latitudinal mean of B is simply

C = latmean(B)
ClimArray with dimensions: 
  Lon Sampled{Int64} 0:10:30 ForwardOrdered Regular Points
attributes: DimensionalData.Dimensions.LookupArrays.NoMetadata()
and data: 4-element Vector{Float64}
[0.442424, 0.540568, 0.509642, 0.49338]

where in this averaging process each data point is weighted by the cosine of its latitude.

Making a ClimArray

You can create a ClimArray yourself, or you can load data from an .nc file with CF-conventions, see NetCDF IO.

ClimateBase.ClimArrayMethod
ClimArray(A::Array, dims::Tuple; name = "", attrib = nothing)

ClimArray is a structure that contains numerical array data bundled with dimensional information, a name and an attrib field (typically a dictionary) that holds general attributes. You can think of ClimArray as a in-memory representation of a CFVariable.

At the moment, a ClimArray is using DimArray from DimensionalData.jl, and all basic handling of ClimArray is offered by DimensionalData (see below).

ClimArray is created by passing in standard array data A and a tuple of dimensions dims. See ncread to automatically create a ClimArray from a .nc file. For obtaining the raw numeric values of a ClimArray or any of its dimensions, use the function gnv.

Example

using ClimateBase, Dates
Time = ClimateBase.Ti # more intuitive name for time dimension
lats = -90:5:90
lons = 0:10:359
t = Date(2000, 3, 15):Month(1):Date(2020, 3, 15)
# dimensional information:
dimensions = (Lon(lons), Lat(lats), Time(t))
data = rand(36, 37, 241) # numeric data
A = ClimArray(data, dimensions)
source
ClimateBase.gnvFunction
gnv(object) → x

Short for "get numeric value", this function will return the pure numeric value(s) of the given object. Convenience function for quickly getting the numeric data of dimensional arrays or dimensions.

source

It is strongly recommended to use the dimensions we export (because we dispatch on them and use their information):

for D in ClimateBase.STANDARD_DIMS
    println(D, " (full name = $(DimensionalData.name(D)))")
end
Lon (full name = Longitude)
Lat (full name = Latitude)
Ti (full name = Time)
Hei (full name = Height)
Pre (full name = Pressure)
Coord (full name = )

We explicitly assume that Lon, Lat are measured in degrees and not radians or meters (extremely important for spatial averaging processes).

Crash-course to DimensionalData.jl

DimensionalDataModule

DimensionalData

CI Codecov Aqua.jl Quality Assurance

DimensionalData.jl provides tools and abstractions for working with datasets that have named dimensions, and optionally a lookup index. It provides no-cost abstractions for named indexing, and fast index lookups.

DimensionalData is a pluggable, generalised version of AxisArrays.jl with a cleaner syntax, and additional functionality found in NamedDims.jl. It has similar goals to pythons xarray, and is primarily written for use with spatial data in Rasters.jl.

The basic syntax is:

julia> using DimensionalData

julia> A = rand(X(50), Y(10.0:40.0))
50×31 DimArray{Float64,2} with dimensions: 
  X,
  Y Sampled{Float64} 10.0:1.0:40.0 ForwardOrdered Regular Points
 10.0         11.0       12.0       13.0       14.0        15.0       16.0        17.0       …  32.0       33.0        34.0       35.0       36.0        37.0       38.0        39.0       40.0
  0.293347     0.737456   0.986853   0.780584   0.707698    0.804148   0.632667    0.780715      0.767575   0.555214    0.872922   0.808766   0.880933    0.624759   0.803766    0.796118   0.696768
  0.199599     0.290297   0.791926   0.564099   0.0241986   0.239102   0.0169679   0.186455      0.644238   0.467091    0.524335   0.42627    0.982347    0.324083   0.0356058   0.306446   0.117187
  ⋮                                                         ⋮                                ⋱                                     ⋮                                                        ⋮
  0.720404     0.388392   0.635609   0.430277   0.943823    0.661993   0.650442    0.91391   …   0.299713   0.518607    0.411973   0.410308   0.438817    0.580232   0.751231    0.519257   0.598583
  0.00602102   0.270036   0.696129   0.139551   0.924883    0.190963   0.164888    0.13436       0.717962   0.0452556   0.230943   0.848782   0.0362465   0.363868   0.709489    0.644131   0.801824

julia> A[Y=1:10, X=1]
10-element DimArray{Float64,1} with dimensions: 
  Y Sampled{Float64} 10.0:1.0:19.0 ForwardOrdered Regular Points
and reference dimensions: X
 10.0  0.293347
 11.0  0.737456
 12.0  0.986853
 13.0  0.780584
  ⋮    
 17.0  0.780715
 18.0  0.472306
 19.0  0.20442

See the docs for more details

Some properties of DimensionalData.jl objects:

  • broadcasting and most Base methods maintain and sync dimension context.
  • comprehensive plot recipes for Plots.jl.
  • a Tables.jl interface with DimTable
  • multi-layered DimStacks that can be indexed together, and have base methods applied to all layers.
  • the Adapt.jl interface for use on GPUs, even as GPU kernel arguments.
  • traits for handling a wide range of spatial data types accurately.

Methods where dims can be used containing indices or Selectors

getindex, setindex! view

Methods where dims, dim types, or Symbols can be used to indicate the array dimension:

  • size, axes, firstindex, lastindex
  • cat, reverse, dropdims
  • reduce, mapreduce
  • sum, prod, maximum, minimum,
  • mean, median, extrema, std, var, cor, cov
  • permutedims, adjoint, transpose, Transpose
  • mapslices, eachslice

Methods where dims can be used to construct DimArrays:

  • fill, ones, zeros, falses, trues, rand

Note: recent changes have greatly reduced the exported API

Previously exported methods can me brought into global scope by using the sub-modules they have been moved to - LookupArrays and Dimensions:

using DimensionalData
using DimensionalData.LookupArrays, DimensionalData.Dimensions

Alternate Packages

There are a lot of similar Julia packages in this space. AxisArrays.jl, NamedDims.jl, NamedArrays.jl are registered alternative that each cover some of the functionality provided by DimensionalData.jl. DimensionalData.jl should be able to replicate most of their syntax and functionality.

AxisKeys.jl and AbstractIndices.jl are some other interesting developments. For more detail on why there are so many similar options and where things are headed, read this thread.

The main functionality is explained here, but the full list of features is listed at the API page.

Available selectors

SelectorDescription
[At(x)]get the index exactly matching the passed in value(s)
[Near(x)]get the closest index to the passed in value(s)
[Contains(x)]get indices where the value x falls within an interval
[Where(f)]filter the array axis by a function of the dimension index values.
[a..b]get all indices between two values, inclusively.
[OpenInterval(a, b)]get all indices between a and b, exclusively.
[Interval{A,B}(a, b)]get all indices between a and b, as :closed or :open.