Depth in Functional space
- class DepthFunc
Functional Data-Depth
Return the depth of each sample w.r.t. a dataset, D(x,data) in a functional space, using a chosen depth notion.
Data depth measures the centrality (similarity, belongingness) of a sample ‘x’ with respect to a dataset ‘data’.
Notes
Possible depth notions are: mahalanobis, halfspace, zonoid, projection, aprojection, cexpchullstar, cexpchull, geometrical.
- For each discretization point i = 1, …, L:
Extract the data slice data[:, i, :] (shape: N_data x D)
Extract the query vector x[i, :] (shape: D)
Compute the multivariate depth of the query vector relative to the data slice
Average the results over all L time points
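The steps above can be sketched with NumPy. This is a minimal illustration, not the class's actual implementation: the helper names `mahalanobis_depth` and `pointwise_functional_depth` are hypothetical, and the Mahalanobis depth (with the sample mean and sample covariance as estimators) is used only because it is short to write down. Any other multivariate depth could be passed in its place.

```python
import numpy as np

def mahalanobis_depth(x, sample):
    """Mahalanobis depth of a point x (shape D) w.r.t. sample (shape N x D)."""
    mu = sample.mean(axis=0)
    # atleast_2d keeps the D = 1 case working, where np.cov returns a scalar
    cov = np.atleast_2d(np.cov(sample, rowvar=False))
    diff = x - mu
    d2 = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return 1.0 / (1.0 + d2)

def pointwise_functional_depth(x, data, depth_fn=mahalanobis_depth):
    """Average a multivariate depth over the L discretization points.

    x has shape (L, D); data has shape (N_data, L, D).
    """
    L = data.shape[1]
    depths = [depth_fn(x[i, :], data[:, i, :]) for i in range(L)]
    return float(np.mean(depths))

# Illustrative data: 50 curves, 10 time points, 2 dimensions
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 10, 2))
central = data.mean(axis=0)   # the pointwise mean curve
outlying = central + 5.0      # a curve shifted far from the data

print(pointwise_functional_depth(central, data))
print(pointwise_functional_depth(outlying, data))
```

The mean curve sits at the center of every slice and so receives the largest possible Mahalanobis depth, while the shifted curve receives a depth close to zero.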
Methods
load_dataset([data, y, timestamp_col, ...]) – Load the dataset X for reference calculations.
projection_based_func_depth(query[, notion, ...]) – Compute projection-based functional depth for query functional data with respect to a reference dataset.
- load_dataset(data: DataFrame = None, y: ndarray = None, timestamp_col: str | int = 'timestamp', value_cols: str | list | int = 'value', case_id: str | int = 'case_id', interpolate_grid: bool = True, N_grid: int = 10, interpolation_type: str = 'linear')
Load the dataset X for reference calculations. Depth is computed with respect to this dataset.
- Parameters:
data (DataFrame or array_like) – Dataset used as the reference for depth computation.
y (Ignored, default = None) – Not used, present for API consistency by convention.
timestamp_col (str | int, default = 'timestamp') – Column used for discretization of the dataset. timestamp_col can be a string giving the name of the column or an integer giving its position. If the dataset is an array, timestamp_col must be an integer; otherwise it is ignored and a new timestamp column is created.
case_id (str | int, default = 'case_id') – Column used to separate different functions. For a pandas DataFrame, it must be either the column name or its position. For a NumPy array, it must be an integer giving the column position; otherwise it is treated as the dimension separation.
value_cols (str | list | int, default = 'value') – Columns used for the multivariate depth computation. If value_cols is a string, all columns whose names contain that string are used. If value_cols is a list, it is taken as the list of columns to use. If data is an array, value_cols must be an integer or a list of integers; otherwise all columns other than the timestamp and case_id columns are used.
interpolate_grid (bool, default = True) – Interpolates the timestamp grid using an equally spaced array from the minimum to the maximum timestamp value.
N_grid (int, default = 10) – Determines the number of grid points in the interpolation process.
interpolation_type (str, default = 'linear') – Interpolation method to use. Supported options are those provided by scipy.interpolate.interp1d, including:
- 'linear': linear interpolation (default)
- 'nearest': nearest-neighbor interpolation
- 'cubic': cubic spline interpolation
- 'quadratic': quadratic spline interpolation
- 'previous', 'next': stepwise interpolation
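To make the effect of interpolate_grid and N_grid concrete, the sketch below resamples one multivariate curve onto an equally spaced grid between its minimum and maximum timestamp. It is an illustration under stated assumptions rather than the library's code: the helper `resample_on_grid` is hypothetical, it handles a single curve only, and it uses `numpy.interp`, which covers the 'linear' option; the other interpolation types would need scipy.interpolate.interp1d.

```python
import numpy as np

def resample_on_grid(t, values, n_grid=10):
    """Linearly resample one curve onto an equally spaced timestamp grid.

    t: increasing timestamps, shape (n_obs,)
    values: observations, shape (n_obs, D)
    """
    t = np.asarray(t, dtype=float)
    values = np.asarray(values, dtype=float)
    # Equally spaced grid from the smallest to the largest timestamp
    grid = np.linspace(t.min(), t.max(), n_grid)
    # np.interp is univariate, so interpolate each value column separately
    resampled = np.column_stack(
        [np.interp(grid, t, values[:, d]) for d in range(values.shape[1])]
    )
    return grid, resampled

# Irregularly sampled two-dimensional curve
t = np.array([0.0, 1.0, 4.0, 9.0])
values = np.column_stack([2.0 * t, t + 1.0])

grid, resampled = resample_on_grid(t, values, n_grid=10)
print(grid)
print(resampled)
```

After resampling, every curve is described by the same number of points, which is what allows the curves to be compared slice by slice.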
- Return type:
DepthFunc model
Notes
- For each discretization point i = 1, …, L:
Extract the data slice data[:, i, :] (shape: N_data x D)
Extract the query vector query_point[i, :] (shape: D)
Compute the multivariate depth of the query vector relative to the data slice
Average the results over all L time points
- projection_based_func_depth(query, notion='halfspace', solver='neldermead', NRandom=100, output_option: Literal['lowest_depth', 'final_depth_dir'] = 'lowest_depth', **kwargs)
Compute projection-based functional depth for query functional data with respect to a reference dataset.
This function computes depth values of functional observations (in query) relative to a reference dataset (df) using projection-based methods such as halfspace depth. Each function (trajectory) is represented by a sequence of multivariate values over time.
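As background on what a projection-based approximation does, the sketch below estimates the halfspace depth of a single multivariate point from random directions, and returns both the lowest depth found and the direction that attains it. It is a simplified stand-in and not this method's algorithm: the function `random_halfspace_depth` and its arguments are hypothetical, it samples directions uniformly at random instead of running a solver such as neldermead, and it works on one multivariate slice rather than on whole trajectories.

```python
import numpy as np

def random_halfspace_depth(x, sample, n_random=100, seed=0):
    """Approximate the halfspace depth of x (shape D) w.r.t. sample (N x D).

    Returns the lowest depth over the explored directions and that direction.
    """
    rng = np.random.default_rng(seed)
    # Normalized Gaussian vectors are uniformly distributed on the unit sphere
    dirs = rng.normal(size=(n_random, sample.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

    proj_sample = sample @ dirs.T   # shape (N, n_random)
    proj_x = dirs @ x               # shape (n_random,)
    # Fraction of sample points in the closed halfspace {z : <u, z> >= <u, x>}
    frac = (proj_sample >= proj_x).mean(axis=0)

    k = int(np.argmin(frac))        # direction giving the lowest depth
    return float(frac[k]), dirs[k]

rng = np.random.default_rng(1)
sample = rng.normal(size=(200, 3))

depth_center, dir_center = random_halfspace_depth(np.zeros(3), sample)
depth_far, dir_far = random_halfspace_depth(np.full(3, 10.0), sample)
print(depth_center, depth_far)
```

Because only finitely many directions are explored, the value returned is an upper bound on the exact halfspace depth; using more directions tightens the bound at a higher cost.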
- Parameters:
query (pandas.DataFrame) – Query dataset containing functional observations whose depth will be computed relative to df. Must have the same column structure as df.
notion ({"mahalanobis", "halfspace", "zonoid", "projection", "aprojection", "cexpchullstar", "cexpchull", "geometrical"}, default='halfspace') – Type of functional depth to compute. Currently supports ‘halfspace’, but can be extended to other projection-based depths.
solver ({'simplegrid', 'refinedgrid', 'simplerandom', 'refinedrandom', 'coordinatedescent', 'randomsimplices', 'neldermead', 'simulatedannealing'}, default='neldermead') – Optimization solver used to approximate the depth within the internal depth computation.
NRandom (int, default=100) – Number of random projections or optimization restarts used in computing projection-based depth.
notion – {"mahalanobis", "halfspace", "zonoid", "projection", "aprojection", "cexpchullstar", "cexpchull", "geometrical"}. Which depth will be computed.
n_refinements – For solver = refinedrandom or refinedgrid, the maximum number of iterations for computing the depth of one point.
sphcap_shrink – For solver = refinedrandom or refinedgrid, the shrinking of the spherical cap.
alpha_Dirichlet – For solver = randomsimplices, the parameter of the Dirichlet distribution.
cooling_factor – For solver = randomsimplices, the cooling factor.
cap_size – For solver = simulatedannealing or neldermead, the size of the spherical cap.
start – {'mean', 'random'}. For solver = simulatedannealing or neldermead, the method used to compute the first depth.
space – {'sphere', 'euclidean'}. For solver = coordinatedescent or neldermead, the type of space in which the solver runs.
line_solver – {'uniform', 'goldensection'}. For solver = coordinatedescent, the line search strategy used by this solver.
bound_gc – For solver = neldermead, True if the search is limited to the closed hemisphere.
- Returns:
depth_array (np.ndarray of shape (n_query,)) – Array of depth values, where n_query is the number of functional observations (unique case_id values) in the query dataset.
The first return value is the lowest computed depth over all explored directions in space.
The second return value is the direction that best represents the analyzed point, i.e. the direction corresponding to the lowest depth.
- If output_option == "lowest_depth", returns an array_like:
Lowest Asymmetrical Projection Depth
- If output_option == "final_depth_dir", returns a tuple of array_like:
Lowest Asymmetrical Projection Depth
Direction corresponding to the lowest depth
Notes
- If timestamp is of type datetime64, it is converted internally to seconds relative to the global minimum timestamp (t_min).
- Duplicate timestamps within each case_id group are automatically dropped.
- Interpolation uses linear extrapolation outside the observed time range.
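The first two preprocessing steps can be sketched with NumPy for a single case_id group. This is only an illustration of the described behavior, not the library's code. Two assumptions are made: t_min is taken as the minimum of this one series (the notes refer to the global minimum across all cases), and the first occurrence of a duplicated timestamp is the one kept, which the notes do not specify.

```python
import numpy as np

# One case_id group with a duplicated timestamp (illustrative values)
timestamps = np.array(
    [
        "2024-01-01T00:00:00",
        "2024-01-01T00:00:30",
        "2024-01-01T00:00:30",
        "2024-01-01T00:02:00",
    ],
    dtype="datetime64[s]",
)
values = np.array([1.0, 2.0, 2.5, 4.0])

# Convert datetime64 to float seconds elapsed since the minimum timestamp
t_min = timestamps.min()
seconds = (timestamps - t_min) / np.timedelta64(1, "s")

# Drop duplicate timestamps, keeping the first occurrence of each (assumption)
unique_seconds, first_idx = np.unique(seconds, return_index=True)
unique_values = values[first_idx]

print(unique_seconds)
print(unique_values)
```

Working in seconds from a common origin puts all trajectories on one numeric time axis, and removing duplicates keeps the timestamps strictly increasing, which one-dimensional interpolation routines generally expect.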