Statistics
Temporal
Functions related to the Time dimension.
ClimateBase.timemean — Function
timemean(A::ClimArray [, w]) = timeagg(mean, A, w)
Temporal average of A, see timeagg.
ClimateBase.timeagg — Function
timeagg(f, A::ClimArray, W = nothing)
Perform a proper temporal aggregation of the function f (e.g. mean, std) on A where:
- Only full year spans of A are included, see maxyearspan (because most processes are affected by the yearly cycle, and averaging over an uneven number of cycles typically results in artifacts)
- Each month in A is weighted with its length in days (for monthly sampled data)
If you don't want these features, just do dropagg(f, A, Time, W). This is also done in the case where the time sampling is unknown.
W are possible statistical weights that are used in conjunction with the temporal weighting, to weight each time point differently. If they are not a vector (a weight for each time point), then they have to be a dimensional array with the same dimensional layout as A (a weight for each data point).
See also monthlyagg, yearlyagg, seasonalyagg.
timeagg(f, t::Vector, x::Vector, w = nothing)
Same as above, but for an arbitrary vector x accompanied by a time vector t.
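The two rules described for timeagg (keep only full year spans, weight each month by its length in days) can be illustrated with a minimal sketch that uses only the Julia standard library. It does not call ClimateBase.jl; the toy data and the variable names are assumptions made for illustration.

```julia
using Dates, Statistics

# Monthly time axis covering 14 months; only the first 12 form a full year.
t = collect(Date(2001, 1, 15):Month(1):Date(2002, 2, 15))
x = Float64.(month.(t))            # toy data: the value equals the month number

# Keep the largest prefix that covers whole years (a multiple of 12 months).
n = 12 * (length(t) ÷ 12)
tfull, xfull = t[1:n], x[1:n]

# Weight each month by its length in days.
w = daysinmonth.(tfull)
weighted_mean = sum(w .* xfull) / sum(w)

# Unweighted mean over the same span, for comparison.
plain_mean = mean(xfull)
```

The weighted and unweighted results differ slightly because the months do not all have the same length.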
ClimateBase.monthlyagg — Function
monthlyagg(A::ClimArray, f = mean; mday = 15) -> B
Create a new array where the temporal information has been aggregated into months using the function f. The dates of the new array always have day number of mday.
ClimateBase.yearlyagg — Function
yearlyagg(A::ClimArray, f = mean) -> B
Create a new array where the temporal information has been aggregated into years using the function f. By convention, the dates of the new array always have month and day number of 1.
ClimateBase.seasonalyagg — Function
seasonalyagg(A::ClimArray, f = mean) -> B
Create a new array where the temporal information has been aggregated into seasons using the function f. By convention, seasons are represented as Dates spaced 3 months apart, where only the months December, March, June and September are used to specify the date, with day 1.
ClimateBase.temporalranges — Function
temporalranges(A::ClimArray, f = Dates.month) → r
temporalranges(t::AbstractVector{<:TimeType}, f = Dates.month) → r
Return a vector of ranges, where each range contains the indices of the values of t that belong to the same month, year, day, or season, depending on f. f can take the values Dates.year, Dates.month, Dates.day or season (all are functions).
Used in e.g. monthlyagg, yearlyagg or seasonalyagg.
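The idea of grouping indices into ranges can be sketched with the standard library alone. The helper group_ranges below is hypothetical and is not the ClimateBase.jl implementation; it assumes t is sorted in time.

```julia
using Dates

# Hypothetical helper: split the indices of a sorted date vector into
# contiguous ranges whose entries share the same value of f.
function group_ranges(t::AbstractVector{<:TimeType}, f = Dates.month)
    ranges = UnitRange{Int}[]
    start = firstindex(t)
    for i in eachindex(t)
        # Close the current range at the last index or when f changes.
        if i == lastindex(t) || f(t[i + 1]) != f(t[i])
            push!(ranges, start:i)
            start = i + 1
        end
    end
    return ranges
end

t = collect(Date(2020, 1, 30):Day(1):Date(2020, 2, 2))
r = group_ranges(t)                 # one range for January, one for February
ry = group_ranges(t, Dates.year)    # a single range, all dates are in 2020
```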
ClimateBase.maxyearspan — Function
maxyearspan(A::ClimArray) = maxyearspan(dims(A, Time))
maxyearspan(t::Vector{<:DateTime}) → i
Find the maximum index i of t so that t[1:i] covers exact(*) multiples of years.
(*) For monthly spaced data i is a multiple of 12, while for daily data we find the largest possible multiple of DAYS_IN_ORBIT.
ClimateBase.temporal_sampling — Function
temporal_sampling(x) → symbol
Return the temporal sampling type of x, which is either an array of Dates or a dimensional array (with Time dimension).
Possible return values are:
- :hourly, where the temporal difference between successive entries is exactly 1 hour.
- :daily, where the temporal difference between successive entries is exactly 1 day.
- :monthly, where all dates have the same day, but different month.
- :yearly, where all dates have the same month and day, but different year.
- :other, which means that x doesn't fall into any of the above categories.
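The categories listed above can be mimicked with a small classifier written against the Dates standard library. The function sampling_sketch is a hypothetical illustration of the stated rules, not the library code, and it assumes t has at least two entries.

```julia
using Dates

# Hypothetical classifier that follows the rules listed above.
function sampling_sketch(t::AbstractVector{<:TimeType})
    steps = diff(t)
    all(==(Hour(1)), steps) && return :hourly
    all(==(Day(1)), steps) && return :daily
    sameday = all(d -> day(d) == day(t[1]), t)
    samemonth = all(d -> month(d) == month(t[1]), t)
    n = length(t)
    if sameday && samemonth && all(i -> year(t[i + 1]) != year(t[i]), 1:n-1)
        return :yearly
    elseif sameday && all(i -> month(t[i + 1]) != month(t[i]), 1:n-1)
        return :monthly
    end
    return :other
end

daily = collect(Date(2000, 1, 1):Day(1):Date(2000, 1, 10))
monthly = collect(Date(2000, 1, 15):Month(1):Date(2000, 12, 15))
yearly = collect(Date(2000, 3, 1):Year(1):Date(2005, 3, 1))
hourly = collect(DateTime(2000, 1, 1):Hour(1):DateTime(2000, 1, 1, 5))
irregular = [Date(2000, 1, 1), Date(2000, 1, 4), Date(2000, 2, 20)]
```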
ClimateBase.realtime_days — Function
realtime_days(t::AbstractVector{<:TimeType}, T = Float32)
Convert the given sequential date time vector t into a vector in "real time" format, where time is represented by real numbers, increasing cumulatively, as is the case when representing a timeseries x(t). As only differences matter in this form, the returned vector always starts from 0. The measurement unit of time here is days.
For temporal sampling finer than daily, return realtime_milliseconds(t) ./ (24*60*60*1000).
Example:
julia> t = Date(2004):Month(1):Date(2004, 6)
Date("2004-01-01"):Month(1):Date("2004-06-01")
julia> realtime_days(t)
6-element Vector{Float32}:
0.0
29.0
60.0
90.0
121.0
151.0
ClimateBase.realtime_milliseconds — Function
realtime_milliseconds(t::AbstractArray{<:TimeType}, T = Float64)
Similar to realtime_days, but now the measurement unit is milliseconds. For extra accuracy, direct differences in t are used.
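As a rough illustration of what "direct differences" means for sub-daily data, the following sketch computes elapsed milliseconds with the Dates standard library and converts them to days. It is an assumption about the general approach, not a copy of the library code.

```julia
using Dates

# Six-hourly time axis spanning one day.
t = collect(DateTime(2004, 1, 1):Hour(6):DateTime(2004, 1, 2))

# Elapsed time since the first entry, in milliseconds.
ms = Float64.(Dates.value.(t .- t[1]))

# The same elapsed time expressed in days.
days = ms ./ (24 * 60 * 60 * 1000)
```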
ClimateBase.seasonality — Function
seasonality(t, x; y0 = year(t[1])) → dates, vals
Calculate the "seasonality" of a vector x defined with respect to a datetime vector t and return dates, vals. dates are all unique dates present in t disregarding the year (so only the month and day are compared). The dates have as year entry y0. vals is a vector of vectors, where vals[i] are all the values of x that have day and month same as dates[i]. The elements of vals are sorted as encountered in x.
Typically one is interested in mean.(vals), which actually is the seasonality, and std.(vals), which is the interannual variability at each date.
seasonality(A::ClimArray) → dates, vals
If given a ClimArray, then the array must have only one dimension (time).
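The grouping behind seasonality can be sketched as follows. The helper group_by_monthday is hypothetical: it orders the dates by first appearance (the ordering used by the library is not stated above), and it would fail for 29 February if y0 is not a leap year.

```julia
using Dates, Statistics

# Hypothetical helper: collect the values of x that share the same
# (month, day), in order of first appearance in t.
function group_by_monthday(t, x; y0 = year(t[1]))
    monthdays = Tuple{Int,Int}[]
    vals = Vector{eltype(x)}[]
    for (ti, xi) in zip(t, x)
        k = (month(ti), day(ti))
        j = findfirst(==(k), monthdays)
        if j === nothing
            push!(monthdays, k)
            push!(vals, [xi])
        else
            push!(vals[j], xi)
        end
    end
    dates = [Date(y0, m, d) for (m, d) in monthdays]
    return dates, vals
end

# Two years of monthly data.
t = collect(Date(2001, 1, 1):Month(1):Date(2002, 12, 1))
x = Float64.(1:24)
dates, vals = group_by_monthday(t, x)

seasonal_cycle = mean.(vals)    # mean of each calendar date across years
interannual = std.(vals)        # spread of each calendar date across years
```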
ClimateBase.sametimespan — Function
sametimespan(Xs; mintime = nothing, maxtime = nothing) → Ys
Given a container of ClimArrays, return the same ClimArrays but now accessed in the Time dimension so that they all span the same time interval. Also works for dictionaries whose values are ClimArrays.
Optionally you can provide Date or DateTime values for the keywords mintime, maxtime that can further limit the minimum/maximum time span accessed.
sametimespan takes into consideration the temporal sampling of the arrays for better accuracy.
Spatial
All functions in this section work for both types of space, see Types of spatial information.
ClimateBase.zonalmean — Function
zonalmean(A::ClimArray [, W])
Return the zonal mean of A. Works for both OrthogonalSpace and CoordinateSpace. Optionally provide statistical weights W. These can either have the same size as A or only the same latitude structure as A.
ClimateBase.latmean — Function
latmean(A::ClimArray)
Return the latitude-mean of A (mean across dimension Lat). This function properly weights by the cosine of the latitude.
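Cosine-of-latitude weighting can be illustrated on a plain vector. This is a minimal sketch with made-up data, assuming latitudes in degrees; it does not use ClimArray.

```julia
using Statistics

lats = collect(-80.0:20.0:80.0)     # latitudes in degrees
x = abs.(lats)                      # toy field that grows towards the poles

# Bands near the poles cover less area, so they receive smaller weights.
w = cosd.(lats)
weighted = sum(w .* x) / sum(w)

# The unweighted mean over-represents the polar bands.
unweighted = mean(x)
```

Here weighted is smaller than unweighted, because the large polar values are down-weighted.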
ClimateBase.spacemean — Function
spacemean(A::ClimArray [, W]) = spaceagg(mean, A, W)
Average A over its spatial coordinates. Optionally provide statistical weights in W.
ClimateBase.spaceagg — Function
spaceagg(f, A::ClimArray, W = nothing)
Aggregate A using function f (e.g. mean, std) over all available space (i.e. longitude and latitude) of A, weighting every part of A by its spatial area.
W can be extra weights with which to weight each spatial point. W can either be a ClimArray with the same spatial information as A, or have exactly the same dimensions as A.
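Combining area weights with extra weights W can be sketched on a small longitude by latitude matrix. The grid, the data and the use of the cosine of latitude as a stand-in for cell area are assumptions made for illustration only.

```julia
lons = 0.0:90.0:270.0
lats = -60.0:60.0:60.0

# 4×3 matrix (longitude × latitude); each value equals its latitude.
A = [lat for lon in lons, lat in lats]

# Relative area of each cell on a regular grid, proportional to cos(latitude).
area = [cosd(lat) for lon in lons, lat in lats]

# Extra weights: ignore the southernmost latitude band entirely.
W = ones(size(A))
W[:, 1] .= 0

# Total weight is the product of area weight and extra weight.
wtot = area .* W
agg = sum(wtot .* A) / sum(wtot)
```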
ClimateBase.hemispheric_means — Function
hemispheric_means(A [,W]) → nh, sh
Return the (proper) averages of A over the northern and southern hemispheres. Notice that this function explicitly does both zonal as well as meridional averaging. Use hemispheric_functions to just split A into two hemispheres.
Optionally provide weights W that need to have the same structure as in spaceagg.
ClimateBase.hemispheric_functions — Function
hemispheric_functions(A::ClimArray) → north, south
Return two arrays north, south, by splitting A into its northern and southern hemispheres, appropriately translating the latitudes of south so that both arrays have the same latitudinal dimension (and thus can be compared and used in operations with each other).
ClimateBase.tropics_extratropics — Function
tropics_extratropics(A::ClimArray; lower_lat=30, higher_lat=90) → tropics, extratropics
Separate the given array into two arrays: one having latitudes ℓ ∈ [-lower_lat, +lower_lat], and one having latitudes ℓ ∈ [-higher_lat, -lower_lat] ∪ [lower_lat, higher_lat].
ClimateBase.lonlatfirst — Function
lonlatfirst(A::ClimArray, args...) → B
Permute the dimensions of A to make a new array B that has first dimension longitude, second dimension latitude, with the remaining dimensions of A following (useful for most plotting functions). Optional extra dimensions can be given as args..., specifying a specific order for the remaining dimensions.
Example:
B = lonlatfirst(A)
C = lonlatfirst(A, Time)
ClimateBase.longitude_circshift — Function
longitude_circshift(X::ClimArray [, l]; wrap = true) → Y::ClimArray
Perform the same action as Base.circshift, but only for the longitudinal dimension of X with shift l. If wrap = true the longitudes are wrapped to (-180, 180) degrees using the modulo operation.
If l is not given, the shift is as large as necessary so that all longitudes > 180 are wrapped.
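The effect of shifting and wrapping can be shown on plain vectors with Base.circshift. This sketch wraps with a simple conditional instead of the modulo operation mentioned above, and the data are made up.

```julia
lons = collect(0.0:90.0:270.0)      # 0, 90, 180, 270
x = [10, 20, 30, 40]                # one value per longitude

# Map longitudes above 180 to their negative equivalents.
wrapped = [l > 180 ? l - 360 : l for l in lons]

# Shift by the number of wrapped entries so longitudes increase again.
shift = count(>(180), lons)
lons_shifted = circshift(wrapped, shift)
x_shifted = circshift(x, shift)     # the data move together with the longitudes
```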
Types of spatial information
Spatial information (excluding height/pressure dimensions) in ClimateBase.jl exists in one of two forms:
ClimateBase.OrthogonalSpace — Type
Space information is represented by two orthogonal dimensions Lon, Lat, one being longitude and the other being latitude.
ClimateBase.CoordinateSpace — Type
Space information is represented by a single dimension Coord, whose elements are coordinates, i.e. 2-element SVector(longitude, latitude). Each coordinate represents the center of an arbitrary polygon in space. The actual limits of each polygon are not included in the dimension for performance reasons.
In statistical functions such as spaceagg, it is assumed that each entry of the coordinates covers an equal amount of area. If this is not the case, you can simply provide an additional weights vector which would correspond to the area covered.
This dimension also allows indexing by latitude ranges, e.g. you can do
A # some `ClimArray` with a `Coord` dimension
A[Coord(Lat(-30..30))]
Most functions of ClimateBase.jl implicitly assume that the coordinates are sorted by latitude. You can achieve this with the following code:
A # some `ClimArray` with a `Coord` dimension
coords = gnv(dims(A, Coord))
si = sortperm(coords; by = reverse)
A = A[Coord(si)]
This is done automatically by ncread.
ClimateBase.jl works with either type of spatial coordinate system. Therefore, physically inspired averaging functions, like spacemean or zonalmean, work for both types of spatial coordinates. In addition, the function spatialidxs returns an iterator over the spatial coordinates of the data, and works for both types (grid or equal-area):
ClimateBase.spatialidxs — Function
spatialidxs(A::ClimArray) → idxs
Return an iterable that can be used to access all spatial points of A with the syntax
idxs = spatialidxs(A)
for i in idxs
    slice_at_given_space_point = A[i...]
end
Works for all types of space (... is necessary because i is a Tuple).
ncread tries to automatically deduce the correct space type and create the appropriate dimension.
General aggregation
The physical averages of the previous section are done by taking advantage of a general aggregation syntax, which works with any aggregating function like mean, sum, std, etc.
ClimateBase.dropagg — Function
dropagg(f, A::ClimArray, d [, W])
Apply statistics/aggregating function f (e.g. sum or mean) on array A across dimension(s) d and drop the corresponding dimension(s) from the result (Julia inherently keeps singleton dimensions).
If A is one dimensional, dropagg will return the single number that results from applying f(A).
Optionally you can provide statistical weights in the form of a W::ClimArray. W must either have the same size as A or have only one dimension and satisfy length(W) == length(dims(A, d)) (i.e., a weight for each value of the dimension d). The latter case can only work when d is a single dimension. See also missing_weights for (properly) dealing with data that have missing values.
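The remark that Julia keeps singleton dimensions, and the one-weight-per-entry case, can be seen on a plain array. This sketch uses Base and Statistics only; it is an analogy for what dropagg does, not its implementation.

```julia
using Statistics

A = reshape(1.0:24.0, 2, 3, 4)

# mean keeps the reduced dimension as a singleton; dropdims removes it.
m_keep = mean(A; dims = 3)              # size (2, 3, 1)
m_drop = dropdims(m_keep; dims = 3)     # size (2, 3)

# Weighted version: one weight per entry along the aggregated dimension.
w = [1.0, 1.0, 2.0, 0.0]
wr = reshape(w, 1, 1, :)                # align the weights with dimension 3
m_w = dropdims(sum(A .* wr; dims = 3) ./ sum(w); dims = 3)
```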
ClimateBase.collapse — Function
collapse(f, A, dim)
Reduce A towards dimension dim using the collapsing function f (e.g. mean). This means that f is applied across all other dimensions of A, each of which is subsequently dropped, leaving only the collapsed result of A vs. the remaining dimension.
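On a plain array, collapsing towards one dimension amounts to reducing over all the others. The following is a minimal sketch of that idea using Base and Statistics, with dimensions given as integers rather than named dimensions.

```julia
using Statistics

A = reshape(1.0:24.0, 2, 3, 4)
dim = 2                                     # the dimension to keep

# Reduce over every dimension except dim, then flatten to a vector.
others = Tuple(setdiff(1:ndims(A), dim))    # (1, 3)
c = vec(mean(A; dims = others))             # one value per entry of dimension 2
```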
Missing data
When loading an array with ncread, the values of the returned array may contain missing values if the actual data contain missing values according to the CF-standards. In other packages or other programming languages these missing values are handled "internally" and, e.g. in statistical operations like mean, the statistics explicitly skip over missing values. For example, a typical workflow is to create an array, assign missing to all values of the array over land, and then take the mean of the array, which would be the "mean over ocean".
ClimateBase.jl does not follow this approach for two reasons: 1) it does not comply with Julia's missing propagation logic, 2) using proper statistical weights gives more power to the user. As you have already seen in the documentation strings of e.g. timeagg, spaceagg or dropagg, you can provide explicit statistical weights of various forms. This gives you more power, because in the case of missing your statistical weights can only be 0 (missing value) or 1 (non-missing value). As an example, a "pixel" of your spatial grid will have ambiguous values if it is not 100% covered by ocean, and to do a proper average over ocean you should instead provide weights W whose value is quite simply the ocean fraction of each pixel.
But what if you already have an array with missing values and you want to do what was described in the beginning, e.g. average by skipping the missings? Do not worry, we have you covered! Use the function missing_weights! See also sinusoidal_continuation if the missing values are only in a subset of your temporal coverage.
ClimateBase.missing_weights — Function
missing_weights(A::ClimArray, val = missing_val(A)) → B, W
Generate a new array B with values like A, but with A's missing values replaced with val. Also generate an array of weights, which has the value 0 when A had missing, and the value 1 otherwise.
The output of this function should be used in conjunction with any of ClimateBase.jl aggregating functions like spacemean, timemean, ..., when your data have missing values which you want to completely skip during the aggregation process.
This function returns A, nothing if A has no missing values.
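The relation between 0/1 weights and skipping missing values can be checked on a plain vector. The sketch below builds a stand-in for the B, W pair by hand with Base functions; the replacement value 0.0 is an arbitrary choice.

```julia
using Statistics

A = [1.0, missing, 3.0, missing, 5.0]

# Stand-in for the B, W pair described above.
val = 0.0
B = coalesce.(A, val)               # missing replaced by val
W = Float64.(.!ismissing.(A))       # 0 where A was missing, 1 otherwise

# A weighted mean with 0/1 weights ignores the replaced entries.
weighted = sum(W .* B) / sum(W)

# The same result, obtained by skipping missing values directly.
skipped = mean(skipmissing(A))
```

Because the weight is 0 wherever a value was replaced, the choice of val does not affect the weighted mean.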
ClimateBase.missing_val — Function
missing_val(A)
Return the value that represents "missing" data in A, according to A's metadata. If A does not have the _FillValue metadata, return 0 instead.