# Datasets

## Importing a NetCDF dataset

The entry point of ClimateTools is the `load` function, which reads data from a NetCDF file. It returns an in-memory representation of the variable contained in that file.

For a single-file dataset:

    C = load(filename::String, vari::String; poly::Array, data_units::String, start_date::Tuple, end_date::Tuple, dimension::Bool=true)

or, for a multi-file dataset:

    C = load(filename::Array{String}, vari::String; poly::Array, data_units::String, start_date::Tuple, end_date::Tuple, dimension::Bool=true)

`load` returns a `ClimGrid`, which represents a single variable. By default, the function tries to attach physical units to the data array using the Unitful.jl package. The advantage of physical units is that one can subtract a `ClimGrid` in Kelvin from a `ClimGrid` in Celsius and get coherent results. Be warned that some operations are not allowed on some units (for instance, you cannot "add" Celsius values). To do calculations without this physical logic, load the dataset without units by passing `dimension=false`.

Using the optional `poly` argument, the user can provide a polygon, and the returned `ClimGrid` will only contain the grid points inside that polygon. The polygon should use the -180, +180 longitude convention. If the polygon crosses the International Date Line, it should be split into multiple parts (i.e. multi-polygons).
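To make the idea of keeping only the grid points inside a polygon concrete, here is a minimal sketch of a ray-casting point-in-polygon test that builds a mask of `1.0` (inside) and `NaN` (outside). It is an illustration only: `inpolygon` and `polygon_mask` are hypothetical helpers, the 2×N polygon layout (longitudes in row 1, latitudes in row 2) is an assumption, and none of this is claimed to be how ClimateTools implements `poly` internally.

```julia
# Hypothetical helper (not a ClimateTools function): ray-casting test.
# Assumption: `poly` is a 2×N matrix, row 1 = longitudes, row 2 = latitudes.
function inpolygon(lon, lat, poly::AbstractMatrix)
    n = size(poly, 2)
    inside = false
    j = n                      # previous vertex; the ring closes implicitly
    for i in 1:n
        xi, yi = poly[1, i], poly[2, i]
        xj, yj = poly[1, j], poly[2, j]
        # Toggle when a horizontal ray from the point crosses edge (j, i).
        if (yi > lat) != (yj > lat) &&
           lon < (xj - xi) * (lat - yi) / (yj - yi) + xi
            inside = !inside
        end
        j = i
    end
    return inside
end

# Build a mask of 1.0 (inside) and NaN (outside) over 2-D coordinate grids.
function polygon_mask(longrid::AbstractMatrix, latgrid::AbstractMatrix, poly::AbstractMatrix)
    return [inpolygon(longrid[k], latgrid[k], poly) ? 1.0 : NaN
            for k in CartesianIndices(longrid)]
end

# A rectangle in the -180, +180 longitude convention.
poly    = [-80.0 -65.0 -65.0 -80.0;
            42.0  42.0  52.0  52.0]
longrid = [-70.0 -60.0; -70.0 -60.0]
latgrid = [ 45.0  45.0;  60.0  60.0]
mask    = polygon_mask(longrid, latgrid, poly)
```

Multiplying a data slice by such a mask leaves values inside the polygon unchanged and turns everything else into `NaN`.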

`start_date` and `end_date` can also be provided. This is useful when climate simulation files span multiple decades or centuries and only a temporal subset is needed. Dates should be provided as a `Tuple` of the form `(year, month, day, hour, minute, seconds)`, where only the year is mandatory (e.g. `(2000,)` can be provided and defaults to `(2000, 01, 01)`).
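The date-tuple convention can be illustrated with Julia's standard `Dates` module, whose `DateTime` constructor fills omitted trailing fields with their first valid value. This is a sketch of the convention only: `tuple_to_datetime` is a hypothetical helper, not part of ClimateTools, and it does not show how `load` parses the tuple internally.

```julia
using Dates

# Hypothetical helper (not a ClimateTools function): expand a partial date
# tuple into a full DateTime. Omitted fields default to the start of the period.
tuple_to_datetime(t::Tuple) = DateTime(t...)

start_date = tuple_to_datetime((2000,))          # only the year is given
end_date   = tuple_to_datetime((2010, 12, 31))   # year, month, day
```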

For some variables, the optional keyword argument `data_units` can be provided. For example, precipitation in climate models is usually provided in kg/m^2/s. By specifying `data_units = "mm"`, the `load` function returns the accumulation at the data's time resolution. Similarly, the user can provide `"Celsius"` as `data_units` and `load` will return Celsius instead of Kelvin.
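The arithmetic behind these two conversions is simple enough to sketch in plain Julia. The helper names below are hypothetical and only illustrate the unit relationships; the precipitation conversion assumes liquid water with a density of 1000 kg/m^3, so that 1 kg/m^2 corresponds to a 1 mm layer.

```julia
# Hypothetical helpers illustrating the arithmetic; not ClimateTools functions.

# 1 kg/m^2 of liquid water is a 1 mm layer (assuming a density of 1000 kg/m^3),
# so a flux in kg/m^2/s times the step length in seconds gives mm per time step.
flux_to_mm(flux, step_seconds) = flux * step_seconds

# Kelvin and Celsius differ by a constant offset.
kelvin_to_celsius(t) = t - 273.15

daily_mm = flux_to_mm(5.0e-5, 86_400)   # daily data: 86 400 s per step, about 4.32 mm
tmax_c   = kelvin_to_celsius(293.15)    # about 20 degrees Celsius
```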

The `ClimGrid` structure has the following fields:

    struct ClimGrid
        data::AxisArray # Data
        longrid::AbstractArray{N,2} where N # the longitude grid
        latgrid::AbstractArray{N,2} where N # the latitude grid
        msk::Array{N, 2} where N # Data mask (NaNs and 1.0)
        grid_mapping::Dict#{String, Any} # bindings for native grid
        dimension_dict::Dict
        model::String
        frequency::String # Day, month, years
        experiment::String # Historical, RCP4.5, RCP8.5, etc.
        run::String
        project::String # CORDEX, CMIP5, etc.
        institute::String # UQAM, DMI, etc.
        filename::String # Path of the original file
        dataunits::String # Celsius, kelvin, etc.
        latunits::String # latitude coordinate unit
        lonunits::String # longitude coordinate unit
        variable::String # Type of variable (i.e. can be the same as "typeofvar", but it is changed when calculating indices)
        typeofvar::String # Variable type (e.g. tasmax, tasmin, pr)
        typeofcal::String # Calendar type
        timeattrib::Dict # Time attributes (e.g. days since ... )
        varattribs::Dict # Variable attributes dictionary
        globalattribs::Dict # Global attributes dictionary
    end

## Manipulations

Once the data is loaded into a `ClimGrid`, it can be further subset in space and time.

### Spatial

The `spatialsubset` function acts on a `ClimGrid` and subsets the data spatially using a provided polygon. The function returns a `ClimGrid`. Polygons need to use -180, +180 longitude coordinates, as data coordinates default to that grid. For instance, global models are often on a 0-360 degree grid, but the `load` function shifts the data onto -180, +180 coordinates.

    C = spatialsubset(C::ClimGrid, poly::Array{N, 2} where N)
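The shift from a 0-360 grid to -180, +180 coordinates can be sketched as follows. This is an illustration of the coordinate convention, not the code used by `load`; `wrap_longitude` is a hypothetical helper, and the same mapping can be applied to polygon vertices that were digitized on a 0-360 grid.

```julia
# Hypothetical helper: map a longitude from [0, 360) onto the -180, +180 convention.
wrap_longitude(lon) = lon > 180 ? lon - 360 : lon

lon360 = [0.0, 90.0, 180.0, 270.0]
data   = [10.0, 20.0, 30.0, 40.0]      # one value per longitude

lon180 = wrap_longitude.(lon360)       # wrap each longitude
order  = sortperm(lon180)              # reorder so longitudes increase from west to east
lon180, data = lon180[order], data[order]
```

After reordering, the value that sat at 270 degrees is now the first element, at -90 degrees.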

### Temporal

Temporal subsetting of the data is also possible with the `temporalsubset` function:

    C = temporalsubset(C::ClimGrid, startdate::Tuple, enddate::Tuple)
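Conceptually, a temporal subset selects the time steps that fall between two dates. The sketch below shows that selection on a plain time vector using the standard `Dates` module. The data are made up, and treating both bounds as inclusive is an assumption of this sketch rather than a documented property of `temporalsubset`.

```julia
using Dates

# Daily time axis and matching values for a hypothetical dataset.
times = collect(DateTime(1999, 12, 30):Day(1):DateTime(2000, 1, 5))
vals  = collect(1.0:length(times))

# Date tuples in the same form as `startdate` and `enddate`.
startdate, enddate = (2000,), (2000, 1, 3)

# Assumption: both bounds are treated as inclusive in this sketch.
keep = findall(t -> DateTime(startdate...) <= t <= DateTime(enddate...), times)
sub_times, sub_vals = times[keep], vals[keep]
```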

### Discontinuous temporal (e.g. resampling)

It is also possible to keep only a given non-continuous period within a timeframe. For example, we might be interested in keeping only the northern summer months (June-July-August) from a continuous `ClimGrid` covering 1961-2100. `resample` returns such a subset as a `ClimGrid`.

    Csub = resample(C, "JJA") # built-in ClimateTools season
    Csub = resample(C, 6, 8) # custom subset example for June-July-August
    Csub = resample(C, 1, 2) # custom subset example for January-February
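The month-window selection behind calls such as `resample(C, 6, 8)` can be sketched on a plain vector of dates. `in_month_window` is a hypothetical helper, and the wrap-around handling for windows that cross the new year (e.g. 12 to 2) is an assumption of this sketch, not a statement about how `resample` treats such input.

```julia
using Dates

# Hypothetical helper: is month `m` inside the window startmonth..endmonth?
# Assumption: a window with startmonth > endmonth wraps over the new year
# (e.g. 12..2 for December-January-February).
function in_month_window(m::Integer, startmonth::Integer, endmonth::Integer)
    if startmonth <= endmonth
        return startmonth <= m <= endmonth
    else
        return m >= startmonth || m <= endmonth
    end
end

# Monthly time axis covering two years.
times = collect(Date(1961, 1, 15):Month(1):Date(1962, 12, 15))

jja = filter(t -> in_month_window(month(t), 6, 8), times)  # June-July-August
jf  = filter(t -> in_month_window(month(t), 1, 2), times)  # January-February
```

The result is a discontinuous time axis: the selected months of each year are kept and everything in between is dropped.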