Statistics

Temporal

Functions related to the Time dimension.

ClimateBase.timeagg - Function
timeagg(f, A::ClimArray, W = nothing)

Perform a proper temporal aggregation of A using the function f (e.g. mean, std), where:

  • Only full year spans of A are included, see maxyearspan (because most processes are affected by the yearly cycle, and averaging over an uneven number of cycles typically results in artifacts)
  • Each month in A is weighted with its length in days (for monthly sampled data)

If you don't want these features, just do dropagg(f, A, Time, W). This is also done in the case where the time sampling is unknown.

W are optional statistical weights that are used in conjunction with the temporal weighting, to weight each time point differently. If they are not a vector (a weight for each time point), then they have to be a dimensional array with the same dimensional layout as A (a weight for each data point).

See also monthlyagg, yearlyagg, seasonalyagg.

timeagg(f, t::Vector, x::Vector, w = nothing)

Same as above, but for arbitrary vector x accompanied by time vector t.
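
The two features listed above (whole-year truncation and day-length weighting) can be illustrated with a minimal sketch for monthly data that uses only the Dates standard library. The helper name sketch_monthly_timemean is hypothetical and this is not the ClimateBase.jl implementation; it only handles the mean and assumes t is sorted and monthly spaced.

```julia
using Dates

# Hypothetical helper (not part of ClimateBase.jl): day-weighted mean of
# monthly sampled data, restricted to whole years.
function sketch_monthly_timemean(t::AbstractVector{<:TimeType}, x::AbstractVector{<:Real})
    n = 12 * (length(t) ÷ 12)      # keep only full 12-month spans
    n == 0 && error("need at least 12 monthly values")
    w = daysinmonth.(t[1:n])       # weight of each month = its length in days
    return sum(w .* x[1:n]) / sum(w)
end

t = collect(Date(2001, 1, 1):Month(1):Date(2002, 3, 1))  # 15 months, only 12 are used
x = Float64.(month.(t))                                   # toy data: value = month number
m = sketch_monthly_timemean(t, x)                         # 2382 / 365 ≈ 6.526
```

The unweighted mean of the first 12 values would be 6.5; the small difference comes from the unequal month lengths.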

ClimateBase.monthlyagg - Function
monthlyagg(A::ClimArray, f = mean; mday = 15) -> B

Create a new array where the temporal information has been aggregated into months using the function f. The dates of the new array always have day number of mday.
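
As an illustration of the idea, here is a minimal sketch operating on plain vectors rather than a ClimArray. The helper sketch_monthlyagg is hypothetical and is not how ClimateBase.jl implements monthlyagg; it simply groups values by (year, month) and stamps each group with day mday.

```julia
using Dates, Statistics

# Hypothetical helper: aggregate x into months with f, dating each month at day `mday`.
function sketch_monthlyagg(t, x, f = mean; mday = 15)
    ym = [(year(d), month(d)) for d in t]        # group key of every time point
    u = unique(ym)                               # groups, in order of appearance
    dates = [Date(y, m, mday) for (y, m) in u]
    vals = [f(x[findall(==(k), ym)]) for k in u]
    return dates, vals
end

t = collect(Date(2000, 1, 1):Day(1):Date(2000, 2, 29))  # daily data, two months
x = Float64.(day.(t))                                    # toy data: value = day of month
dates, vals = sketch_monthlyagg(t, x)
# dates == [Date(2000, 1, 15), Date(2000, 2, 15)], vals ≈ [16.0, 15.0]
```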

ClimateBase.yearlyagg - Function
yearlyagg(A::ClimArray, f = mean) -> B

Create a new array where the temporal information has been aggregated into years using the function f. By convention, the dates of the new array always have month and day number of 1.

ClimateBase.seasonalyagg - Function
seasonalyagg(A::ClimArray, f = mean) -> B

Create a new array where the temporal information has been aggregated into seasons using the function f. By convention, seasons are represented as Dates spaced 3-months apart, where only the months December, March, June and September are used to specify the date, with day 1.
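
The date convention can be sketched as a mapping from any date to the first day of its season. The helper sketch_season_date is hypothetical, and assigning January and February to the December of the previous year is an assumption of this sketch, not something stated above.

```julia
using Dates

# Hypothetical helper: date that represents the season containing `d`.
# Assumption: January and February belong to the season starting the previous December.
function sketch_season_date(d::TimeType)
    s = 3 * (month(d) ÷ 3)     # 0 for Jan/Feb, otherwise 3, 6, 9 or 12
    return s == 0 ? Date(year(d) - 1, 12, 1) : Date(year(d), s, 1)
end

sketch_season_date(Date(2001, 1, 20))   # Date(2000, 12, 1)
sketch_season_date(Date(2001, 7, 4))    # Date(2001, 6, 1)
sketch_season_date(Date(2001, 12, 5))   # Date(2001, 12, 1)
```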

ClimateBase.temporalranges - Function
temporalranges(A::ClimArray, f = Dates.month) → r
temporalranges(t::AbstractVector{<:TimeType}, f = Dates.month) → r

Return a vector of ranges r, where each range contains the indices of the values of t that belong to the same month, year, day, or season, depending on f. f can take the values Dates.year, Dates.month, Dates.day or season (all are functions).

Used in e.g. monthlyagg, yearlyagg or seasonalyagg.
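
A minimal sketch of the idea, assuming t is sorted in time: a new range starts whenever f changes between successive entries. The name sketch_ranges is hypothetical and the actual implementation may differ.

```julia
using Dates

# Hypothetical helper: consecutive index ranges over which f(t[i]) stays constant.
function sketch_ranges(t::AbstractVector{<:TimeType}, f = Dates.month)
    r = UnitRange{Int}[]
    start = 1
    for i in 2:length(t)
        if f(t[i]) != f(t[i-1])      # f changed: close the current range
            push!(r, start:i-1)
            start = i
        end
    end
    isempty(t) || push!(r, start:length(t))
    return r
end

t = collect(Date(2000, 1, 30):Day(1):Date(2000, 2, 2))  # Jan 30, Jan 31, Feb 1, Feb 2
r = sketch_ranges(t)                                     # [1:2, 3:4]
```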

ClimateBase.maxyearspan - Function
maxyearspan(A::ClimArray) = maxyearspan(dims(A, Time))
maxyearspan(t::Vector{<:DateTime}) → i

Find the maximum index i of t so that t[1:i] covers exact(*) multiples of years.

(*) For monthly spaced data i is a multiple of 12 while for daily data we find the largest possible multiple of DAYS_IN_ORBIT.

ClimateBase.temporal_sampling - Function
temporal_sampling(x) → symbol

Return the temporal sampling type of x, which is either an array of Dates or a dimensional array (with Time dimension).

Possible return values are:

  • :hourly, where the temporal difference between successive entries is exactly 1 hour.
  • :daily, where the temporal difference between successive entries is exactly 1 day.
  • :monthly, where all dates have the same day, but different month.
  • :yearly, where all dates have the same month and day, but different year.
  • :other, which means that x doesn't fall into any of the above categories.
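
The categories above can be sketched as a small classifier built on the Dates standard library. The function sketch_sampling is hypothetical and only mirrors the listed definitions; it assumes t has at least two entries and is not the library's implementation.

```julia
using Dates

# Hypothetical classifier mirroring the categories listed above.
function sketch_sampling(t::AbstractVector{<:TimeType})
    d = diff(t)                                    # successive differences (fixed periods)
    sameday = all(==(day(t[1])), day.(t))
    samemonth = all(==(month(t[1])), month.(t))
    if all(==(Hour(1)), d)
        return :hourly
    elseif all(==(Day(1)), d)
        return :daily
    elseif sameday && all(i -> month(t[i]) != month(t[i-1]), 2:length(t))
        return :monthly
    elseif sameday && samemonth && all(i -> year(t[i]) != year(t[i-1]), 2:length(t))
        return :yearly
    else
        return :other
    end
end

sketch_sampling(collect(Date(2000):Month(1):Date(2000, 6)))   # :monthly
```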

ClimateBase.realtime_days - Function
realtime_days(t::AbstractVector{<:TimeType}, T = Float32)

Convert the given sequential date time vector t into a vector in "real time" format, where time is represented by real numbers that increase cumulatively, as is the case when representing a timeseries x(t). As only differences matter in this form, the returned vector always starts from 0. The measurement unit of time here is days.

For temporal sampling finer than daily, return realtime_milliseconds(t) ./ (24*60*60*1000).

Example:

julia> t = Date(2004):Month(1):Date(2004, 6)
Date("2004-01-01"):Month(1):Date("2004-06-01")

julia> realtime_days(t)
6-element Vector{Float32}:
   0.0
  29.0
  60.0
  90.0
 121.0
 151.0

ClimateBase.seasonality - Function
seasonality(t, x; y0 = year(t[1])) → dates, vals

Calculate the "seasonality" of a vector x defined with respect to a datetime vector t and return dates, vals. dates are all unique dates present in t disregarding the year (so only the month and day are compared). The dates have as year entry y0. vals is a vector of vectors, where vals[i] are all the values of x that have day and month same as dates[i]. The elements of vals are sorted as encountered in x.

Typically one is interested in mean.(vals), which actually is the seasonality, and std.(vals) which is the interannual variability at each date.

seasonality(A::ClimArray) → dates, vals

If given a ClimArray, then the array must have only one dimension (time).
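
A minimal sketch of the grouping on plain vectors, under the assumption that t contains no 29 February (otherwise constructing the date in a non-leap y0 would fail). The helper sketch_seasonality is hypothetical, and sorting the unique dates is a choice of this sketch.

```julia
using Dates, Statistics

# Hypothetical helper: group the values of x by (month, day), ignoring the year.
function sketch_seasonality(t, x; y0 = year(t[1]))
    md = [(month(d), day(d)) for d in t]          # key of every time point
    u = sort(unique(md))                          # unique (month, day) pairs
    dates = [Date(y0, m, d) for (m, d) in u]
    vals = [x[findall(==(k), md)] for k in u]     # values kept in order of appearance
    return dates, vals
end

t = collect(Date(2001, 1, 1):Month(1):Date(2002, 12, 1))   # two years, monthly
x = [Float64(month(d)) + (year(d) - 2001) for d in t]      # toy data
dates, vals = sketch_seasonality(t, x)
mean.(vals)   # the seasonality: 1.5, 2.5, ..., 12.5
std.(vals)    # the interannual variability at each date
```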

ClimateBase.sametimespan - Function
sametimespan(Xs; mintime = nothing, maxtime = nothing) → Ys

Given a container of ClimArrays, return the same ClimArrays but now accessed in the Time dimension so that they all span the same time interval. Also works for dictionaries with values ClimArrays.

Optionally you can provide Date or DateTime values for the keywords mintime, maxtime that can further limit the minimum/maximum time span accessed.

sametimespan takes into consideration the temporal sampling of the arrays for better accuracy.

Spatial

All functions in this section work for both types of space, see Types of spatial information.

ClimateBase.latmean - Function
latmean(A::ClimArray)

Return the latitude-mean of A (mean across dimension Lat). This function properly weights by the cosine of the latitude.
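
The cosine weighting can be illustrated on plain vectors. The helper sketch_latmean is hypothetical and only shows the weighting itself, with latitudes given in degrees.

```julia
# Hypothetical helper: mean of x over latitudes `lats` (degrees), weighted by cos(latitude).
sketch_latmean(lats, x) = sum(cosd.(lats) .* x) / sum(cosd.(lats))

lats = [-60.0, 0.0, 60.0]
x = [1.0, 4.0, 1.0]
sketch_latmean(lats, x)   # 2.5, whereas the unweighted mean is 2.0
```

The equatorial value counts twice as much as each value at ±60°, because a latitude band there covers twice the area.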

ClimateBase.spacemean - Function
spacemean(A::ClimArray [, W]) = spaceagg(mean, A, W)

Average A over its spatial coordinates. Optionally provide statistical weights in W.

ClimateBase.spaceagg - Function
spaceagg(f, A::ClimArray, W = nothing)

Aggregate A using function f (e.g. mean, std) over all available space (i.e. longitude and latitude) of A, weighting every part of A by its spatial area.

W can be extra weights with which to weight each spatial point. W can either be a ClimArray with the same spatial information as A, or have exactly the same dimensions as A.

ClimateBase.hemispheric_means - Function
hemispheric_means(A [,W]) → nh, sh

Return the (proper) averages of A over the northern and southern hemispheres. Notice that this function explicitly does both zonal as well as meridional averaging. Use hemispheric_functions to just split A into two hemispheres.

Optionally provide weights W that need to have the same structure as in spaceagg.

ClimateBase.hemispheric_functions - Function
hemispheric_functions(A::ClimArray) → north, south

Return two arrays north, south, by splitting A into its northern and southern hemispheres, appropriately translating the latitudes of south so that both arrays have the same latitudinal dimension (and thus can be compared, and operations can be performed between them).
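
The latitude translation can be sketched on plain vectors, assuming latitudes that are symmetric around the equator and sorted in ascending order. The variable names are illustrative only and this is not the library's implementation.

```julia
lats = [-75.0, -45.0, -15.0, 15.0, 45.0, 75.0]
x = [1, 2, 3, 4, 5, 6]                  # toy data, one value per latitude

ni = findall(>(0), lats)                # northern indices, equator to pole
si = reverse(findall(<(0), lats))       # southern indices, reordered equator to pole

north = x[ni]                           # [4, 5, 6]
south = x[si]                           # [3, 2, 1]
south_lats = -lats[si]                  # [15.0, 45.0, 75.0], identical to lats[ni]

north .- south                          # hemispheric difference at matching latitudes
```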

ClimateBase.tropics_extratropics - Function
tropics_extratropics(A::ClimArray; lower_lat=30, higher_lat=90) → tropics, extratropics

Separate the given array into two arrays: one having latitudes ℓ ∈ [-lower_lat, +lower_lat], and one having [-higher_lat:-lower_lat, lower_lat:higher_lat].

ClimateBase.lonlatfirst - Function
lonlatfirst(A::ClimArray, args...) → B

Permute the dimensions of A to make a new array B that has first dimension longitude, second dimension latitude, with the remaining dimensions of A following (useful for most plotting functions). Optional extra dimensions can be given as args..., specifying a specific order for the remaining dimensions.

Example:

B = lonlatfirst(A)
C = lonlatfirst(A, Time)

ClimateBase.longitude_circshift - Function
longitude_circshift(X::ClimArray [, l]; wrap = true) → Y::ClimArray

Perform the same action as Base.circshift, but only for the longitudinal dimension of X with shift l. If wrap = true the longitudes are wrapped to (-180, 180) degrees using the modulo operation.

If l is not given, the shift is as large as necessary so that all longitudes > 180 are wrapped.
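
A minimal sketch of the shift-and-wrap idea on plain vectors, assuming longitudes sorted in ascending order within 0 to 360 degrees. The helper wrap_lon and the way the default shift is computed here are assumptions of this sketch, not the library's code.

```julia
# Hypothetical helper: wrap a longitude from [0, 360) into [-180, 180) with the modulo operation.
wrap_lon(l) = mod(l + 180, 360) - 180

lons = collect(30.0:60.0:330.0)     # 30, 90, 150, 210, 270, 330
x = [1, 2, 3, 4, 5, 6]              # toy data, one value per longitude

shift = count(>(180), lons)         # number of longitudes that need wrapping (3 here)
new_lons = circshift(wrap_lon.(lons), shift)   # [-150, -90, -30, 30, 90, 150]
new_x = circshift(x, shift)                    # [4, 5, 6, 1, 2, 3]
```

After the shift the longitudes are again sorted, and each data value still sits at its own (now wrapped) longitude.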

Types of spatial information

Spatial information (excluding height/pressure dimensions) in ClimateBase.jl exists in one of two forms:

ClimateBase.CoordinateSpace - Type

Space information is represented by a single dimension Coord, whose elements are coordinates, i.e. 2-element SVector(longitude, latitude). Each coordinate represents the center of an arbitrary polygon in space. The actual limits of each polygon are not included in the dimension for performance reasons.

In statistical functions such as spaceagg, it is assumed that each entry of the coordinates covers an equal amount of area. If this is not the case, you can simply provide an additional weights vector which would correspond to the area covered.

This dimension also allows indexing by latitude ranges, e.g. you can do

A # some `ClimArray` with a `Coord` dimension
A[Coord(Lat(-30..30))]

Most functions of ClimateBase.jl implicitly assume that the coordinates are sorted by latitude. You can achieve this with the following code:

A # some `ClimArray` with a `Coord` dimension
coords = gnv(dims(A, Coord))
si = sortperm(coords; by = reverse)
A = A[Coord(si)]

This is done automatically by ncread.
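
To see why by = reverse sorts by latitude, consider this small sketch, which uses plain tuples in place of SVector coordinates (an assumption made to keep it free of dependencies). Reversing (longitude, latitude) yields (latitude, longitude), so the comparison is done on latitude first and longitude second.

```julia
# Coordinates as (longitude, latitude) tuples; stand-ins for the SVectors of a Coord dimension.
coords = [(10.0, 30.0), (0.0, -30.0), (20.0, 0.0), (5.0, -30.0)]

si = sortperm(coords; by = reverse)   # compare by (latitude, longitude)
sorted = coords[si]
# si == [2, 4, 3, 1]
# sorted == [(0.0, -30.0), (5.0, -30.0), (20.0, 0.0), (10.0, 30.0)]
```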

ClimateBase.jl works with either type of spatial coordinate system. Therefore, physically inspired averaging functions, like spacemean or zonalmean, work for both types of spatial coordinates. In addition, the function spatialidxs returns an iterator over the spatial coordinates of the data, and works for both types (grid or equal-area):

ClimateBase.spatialidxs - Function
spatialidxs(A::ClimArray) → idxs

Return an iterable that can be used to access all spatial points of A with the syntax

idxs = spatialidxs(A)
for i in idxs
    slice_at_given_space_point = A[i...]
end

Works for all types of space (... is necessary because i is a Tuple).

ncread tries to automatically deduce the correct space type and create the appropriate dimension.

General aggregation

The physical averages of the previous section are done by taking advantage of a general aggregation syntax, which works with any aggregating function like mean, sum, std, etc.

ClimateBase.dropagg - Function
dropagg(f, A::ClimArray, d [, W])

Apply statistics/aggregating function f (e.g. sum or mean) on array A across dimension(s) d and drop the corresponding dimension(s) from the result (Julia inherently keeps singleton dimensions).

If A is one dimensional, dropagg will return the single number that results from applying f to A.

Optionally you can provide statistical weights in the form of a W::ClimArray. W must either have same size as A or have only one dimension and satisfy length(W) == length(dims(A, d)) (i.e., a weight for each value of the dimension d). The latter case can only work when d is a single dimension. See also missing_weights for (properly) dealing with data that have missing values.
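
On plain arrays, the behaviour described above can be sketched with Base functions. Both helpers below are hypothetical; sketch_weighted_mean covers only the case of a mean with one weight per value of a single dimension d.

```julia
using Statistics

# Hypothetical helper: aggregate over dimension(s) d and drop them from the result.
sketch_dropagg(f, A, d) = dropdims(f(A; dims = d); dims = d)

# Hypothetical helper: weighted mean along one dimension d, with length(w) == size(A, d).
function sketch_weighted_mean(A, d::Int, w::AbstractVector)
    shape = ntuple(i -> i == d ? length(w) : 1, ndims(A))
    W = reshape(w, shape)                       # align the weights with dimension d
    return dropdims(sum(A .* W; dims = d) ./ sum(W); dims = d)
end

A = reshape(collect(1.0:6.0), 2, 3)             # [1 3 5; 2 4 6]
a = sketch_dropagg(mean, A, 2)                  # [3.0, 4.0], a plain 2-element vector
b = sketch_weighted_mean(A, 2, [1, 1, 2])       # [3.5, 4.5]
```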

ClimateBase.collapse - Function
collapse(f, A, dim)

Reduce A towards dimension dim using the collapsing function f (e.g. mean). This means that f is applied across all other dimensions of A, each of which is subsequently dropped, leaving only the collapsed result of A vs. the remaining dimension.
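
For a plain array and an integer dimension index, the same idea can be sketched as follows. The helper sketch_collapse is hypothetical and not the library's implementation.

```julia
using Statistics

# Hypothetical helper: apply f over every dimension except `dim`, then drop those dimensions.
function sketch_collapse(f, A, dim::Int)
    others = Tuple(setdiff(1:ndims(A), dim))
    return dropdims(f(A; dims = others); dims = others)
end

A = reshape(collect(1.0:24.0), 2, 3, 4)
c = sketch_collapse(mean, A, 3)      # [3.5, 9.5, 15.5, 21.5]
```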

Missing data

When loading an array with ncread, the values of the returned array may contain missing values if the actual data contain missing values according to the CF standards. In other packages or programming languages these missing values are handled "internally", and statistical operations such as mean explicitly skip over them. A typical workflow, for example, is to create an array, assign missing to all of its values over land, and then take the mean of the array, which is then the "mean over ocean".

ClimateBase.jl does not follow this approach for two reasons: 1) it does not comply with Julia's missing propagation logic, 2) using proper statistical weights gives more power to the user. As you have already seen in the documentation strings of e.g. timeagg, spaceagg or dropagg, you can provide explicit statistical weights of various forms. Explicit weights are more expressive than missing values, because with missing the weights can effectively only be 0 (missing value) or 1 (non-missing value). As an example, a "pixel" of your spatial grid will have ambiguous values if it is not 100% covered by ocean, and to do a proper average over ocean you should instead provide weights W whose value is simply the ocean fraction of each pixel.

But what if you already have an array with missing values and you want to do what was described in the beginning, e.g. average by skipping the missing values? In that case, use the function missing_weights. See also sinusoidal_continuation if the missing values are only in a subset of your temporal coverage.

ClimateBase.missing_weights - Function
missing_weights(A::ClimArray, val = missing_val(A)) → B, W

Generate a new array B with values like A, but with A's missing values replaced with val. Also generate an array of weights, which has the value 0 when A had missing, and the value 1 otherwise.

The output of this function should be used in conjunction with any of ClimateBase.jl's aggregating functions like spacemean, timemean, ..., when your data have missing values which you want to completely skip during the aggregation process.

This function returns A, nothing if A has no missing values.
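
The behaviour described above can be sketched on a plain array. The helper sketch_missing_weights is hypothetical, and its default val = 0.0 is an assumption of the sketch rather than the value taken from the metadata.

```julia
# Hypothetical helper: replace missing with `val` and return 0/1 weights alongside.
function sketch_missing_weights(A::AbstractArray, val = 0.0)
    any(ismissing, A) || return A, nothing     # nothing to do without missing values
    B = coalesce.(A, val)                      # missing -> val
    W = Float64.(.!ismissing.(A))              # 0 where A was missing, 1 otherwise
    return B, W
end

A = [1.0, missing, 3.0]
B, W = sketch_missing_weights(A)               # B == [1.0, 0.0, 3.0], W == [1.0, 0.0, 1.0]
m = sum(B .* W) / sum(W)                       # 2.0, the mean that skips the missing entry
```

Because the replaced entries carry zero weight, the choice of val does not affect the weighted result.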

ClimateBase.missing_val - Function
missing_val(A)

Return the value that represents "missing" data in A, according to A's metadata. If A does not have the _FillValue metadata, return 0 instead.
