Depth in Euclidean space (multivariate depth)
- class DepthEucl[source]
Statistical data depth.
Return the depth of each sample w.r.t. a dataset, D(x,data), using a chosen depth notion.
Data depth measures the centrality (similarity, belongingness) of a sample 'x' with respect to a dataset 'data'.
- Parameters:
data ({array-like} of shape (n,d)) – Reference dataset with respect to which the depth of a sample x is computed.
x ({array-like} of shape (n_samples,d)) – Matrix of samples whose depth is to be computed.
exact (bool, default=True) – Whether the depth computation is exact.
mah_estimate (str, {"moment", "mcd"}, default="moment") – Specifies which estimates to use when calculating the depth.
mah_parMcd (float, default=0.75) – Value of the argument alpha for the function covMcd.
solver (str, default="neldermead") – The type of solver used to approximate the depth.
NRandom (int, default=1000) – Total number of directions used for the approximate depth.
n_refinements (int, default=10) – For solver='refinedrandom' or 'refinedgrid', the number of iterations used to approximate the depth.
sphcap_shrink (float, default=0.5) – For solver='refinedrandom' or 'refinedgrid', the shrinking of the spherical cap.
alpha_Dirichlet (float, default=1.25) – For solver='randomsimplices', the parameter of the Dirichlet distribution.
cooling_factor (float, default=0.95) – For solver='simulatedannealing', the cooling factor.
cap_size (int | float, default=1) – For solver='simulatedannealing' or 'neldermead', the size of the spherical cap.
start (str {'mean', 'random'}, default='mean') – For solver='simulatedannealing' or 'neldermead', the method used to compute the first depth.
space (str {'sphere', 'euclidean'}, default='sphere') – For solver='coordinatedescent' or 'neldermead', the type of space in which the solver runs.
line_solver (str {'uniform', 'goldensection'}, default='goldensection') – For solver='coordinatedescent', the line search strategy used by this solver.
bound_gc (bool, default=True) – For solver='neldermead', True if the search is limited to the closed hemisphere.
output_option (str {'lowest_depth', 'final_depht_dir', 'all_depth', 'all_depth_directions'}, default='final_depht_dir') – Determines what is computed alongside the final depth.
evaluate_dataset (bool, default=False) – Determines whether the loaded dataset is evaluated.
- data
Returns loaded dataset
- Type:
{array-like}, default=None,
- {depth-name}Depth
Returns the computed depth using {depth-name} notion. Available for all depth notions. Example: halfspaceDepth, projectionDepth
- Type:
{array-like}, default=None,
- {depth-name}Dir
Returns the direction corresponding to {depth-name}Depth under the {depth-name} notion. Available only for projection-based depths. Example: halfspaceDir, projectionDir
- Type:
{array-like}, default=None,
- {depth-name}DepthDS
Returns the computed depth of the loaded dataset using {depth-name} notion. Available for all depth notions. Example: halfspaceDepthDS, projectionDepthDS
- Type:
{array-like}, default=None,
- {depth-name}DirDS
Returns the direction corresponding to {depth-name}DepthDS under the {depth-name} notion. Available only for projection-based depths. Example: halfspaceDirDS, projectionDirDS
- Type:
{array-like}, default=None,
Methods
L2([x, mah_estimate, mah_parMcd, ...]) – Calculates the L2-depth of points w.r.t.
aprojection([x, solver, NRandom, ...]) – Calculates approximately the asymmetric projection depth of points w.r.t.
betaSkeleton([x, beta, distance, Lp_p, ...]) – Calculates the beta-skeleton depth of points w.r.t.
cexpchull([x, solver, NRandom, ...]) – Calculates approximately the continuous expected convex hull depth of points w.r.t.
cexpchullstar([x, solver, NRandom, option, ...]) – Calculates approximately the continuous modified expected convex hull depth of points w.r.t.
change_dataset(newDataset[, newY, ...]) – Modify loaded dataset.
computeMCD([mat, h, mfull, nstep, ...]) – Compute Minimum Covariance Determinant (MCD)
geometrical([x, solver, NRandom, ...]) – Calculates approximately the geometrical depth of points w.r.t.
halfspace([x, exact, method, solver, ...]) – Calculates the exact and approximated Tukey (=halfspace, location) depth (Tukey, 1975) of points w.r.t.
load_dataset([data, distribution, CUDA, y]) – Load the dataset X for reference calculations.
mahalanobis([x, exact, mah_estimate, ...]) – Calculates the Mahalanobis depth of points w.r.t.
potential([x, pretransform, kernel, ...]) – Calculate the potential of the points w.r.t.
projection([x, solver, NRandom, ...]) – Calculates approximately the projection depth of points w.r.t.
qhpeeling([x, evaluate_dataset]) – Calculates the convex hull peeling depth of points w.r.t.
set_seed([seed]) – Set seed for computation
simplicial([x, exact, k, evaluate_dataset]) – Calculates the simplicial depth of points w.r.t.
simplicialVolume([x, exact, k, ...]) – Calculates the simplicial volume depth of points w.r.t.
spatial([x, mah_estimate, mah_parMcd, ...]) – Calculates the spatial depth of points w.r.t.
zonoid([x, exact, solver, NRandom, ...]) – Calculates the zonoid depth of points w.r.t.
- L2(x: ndarray | None = None, mah_estimate: str = 'moment', mah_parMcd: float = 0.75, evaluate_dataset: bool = False) ndarray [source]
Calculates the L2-depth of points w.r.t. a multivariate data set.
- Arguments
- x: array-like or None, default=None
Matrix of objects (numerical vector as one object) whose depth is to be calculated; each row contains a d-variate point. Should have the same dimension as data.
- mah_estimate: str, {"moment", "mcd"}, default="moment"
A character string specifying which estimates to use when calculating the Mahalanobis depth; can be 'moment' or 'mcd', determining whether traditional moment or Minimum Covariance Determinant (MCD) estimates for mean and covariance are used.
- mah_parMcd: float, default=0.75
The value of the argument alpha for the function covMcd; used when mah_estimate='mcd'.
- evaluate_dataset: bool, default=False
Determines whether the loaded dataset is evaluated. If True, x is automatically set to the loaded dataset.
- References
Zuo, Y. and Serfling, R. (2000). General notions of statistical depth function. The Annals of Statistics, 28, 461–482.
Mosler, K. and Mozharovskyi, P. (2022). Choosing among notions of multivariate depth statistics. Statistical Science, 37(3), 348-368.
- Examples
>>> import numpy as np
>>> from depth.model import DepthEucl
>>> mat1=[[1, 0, 0, 0, 0],[0, 2, 0, 0, 0],[0, 0, 3, 0, 0],[0, 0, 0, 2, 0],[0, 0, 0, 0, 1]]
>>> mat2=[[1, 0, 0, 0, 0],[0, 1, 0, 0, 0],[0, 0, 1, 0, 0],[0, 0, 0, 1, 0],[0, 0, 0, 0, 1]]
>>> data = np.random.multivariate_normal([0,0,0,0,0], mat1, 1000)
>>> model=DepthEucl().load_dataset(data)
>>> x = np.random.multivariate_normal([1,1,1,1,1], mat2, 10)
>>> model.L2(x)
[0.2867197 0.19718391 0.18896649 0.24623271 0.20979579 0.22055673 0.20396566 0.20779032 0.24901829 0.26734192]
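To make the quantity behind L2 more concrete, the following is a minimal sketch of the textbook L2-depth, 1 / (1 + average distance from x to the data points), written in plain Python. It is not the library's implementation: the function name l2_depth_sketch is hypothetical, and it uses the unadjusted Euclidean norm, whereas model.L2 applies the covariance estimate selected by mah_estimate, so the numbers will generally differ.

```python
import math


def l2_depth_sketch(x, data):
    """Unadjusted L2-depth: 1 / (1 + mean Euclidean distance from x to data).

    Hypothetical helper for illustration only; no covariance adjustment.
    """
    mean_dist = sum(math.dist(x, p) for p in data) / len(data)
    return 1.0 / (1.0 + mean_dist)


# Reference data: the four corners of a square centred at the origin.
data = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]

center_depth = l2_depth_sketch((0.0, 0.0), data)   # central point
far_depth = l2_depth_sketch((10.0, 10.0), data)    # outlying point
print(center_depth, far_depth)
```

The central point is at distance sqrt(2) from every corner, so its depth is 1 / (1 + sqrt(2)); the outlying point receives a smaller value, which is the ordering a depth function is meant to produce.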
- aprojection(x: ndarray | None = None, solver: str = 'neldermead', NRandom: int = 1000, n_refinements: int = 10, sphcap_shrink: float = 0.5, alpha_Dirichlet: float = 1.25, cooling_factor: float = 0.95, cap_size: int = 1, start: str = 'mean', space: str = 'sphere', line_solver: str = 'goldensection', bound_gc: bool = True, output_option: Literal['lowest_depth', 'final_depht_dir', 'all_depth', 'all_depth_directions'] = 'final_depht_dir', evaluate_dataset: bool = False, CUDA: bool = False) ndarray [source]
Calculates approximately the asymmetric projection depth of points w.r.t. a multivariate data set.
- Arguments
- x: array-like or None, default=None
Matrix of objects (numerical vector as one object) whose depth is to be calculated; each row contains a d-variate point. Should have the same dimension as data.
- solver: str {'simplegrid', 'refinedgrid', 'simplerandom', 'refinedrandom', 'coordinatedescent', 'randomsimplices', 'neldermead', 'simulatedannealing'}, default='neldermead'
The type of solver used to approximate the depth.
- NRandom: int, default=1000
The total number of iterations used to compute the depth. Some solvers converge faster, so they are run several times to reach NRandom iterations.
- n_refinements: int, default=10
The maximum number of iterations for computing the depth of one point. Used for solver='refinedrandom' or 'refinedgrid'.
- sphcap_shrink: float, default=0.5
The shrinking of the spherical cap. Used for solver='refinedrandom' or 'refinedgrid'.
- alpha_Dirichlet: float, default=1.25
The parameter of the Dirichlet distribution. Used for solver='randomsimplices'.
- cooling_factor: float, default=0.95
The cooling factor. Used for solver='simulatedannealing'.
- cap_size: int | float, default=1
The size of the spherical cap. Used for solver='simulatedannealing' or 'neldermead'.
- start: str {'mean', 'random'}, default='mean'
The method used to compute the first depth. Used for solver='simulatedannealing' or 'neldermead'.
- space: str {'sphere', 'euclidean'}, default='sphere'
The type of space in which the solver runs. Used for solver='coordinatedescent' or 'neldermead'.
- line_solver: str {'uniform', 'goldensection'}, default='goldensection'
The line search strategy used by the solver. Used for solver='coordinatedescent'.
- bound_gc: bool, default=True
If True, the search is limited to the closed hemisphere. Used for solver='neldermead'.
- output_option: str {'lowest_depth', 'final_depht_dir', 'all_depth', 'all_depth_directions'}, default='final_depht_dir'
Determines what is computed alongside the final depth.
If output_option='lowest_depth', only the approximated depths are returned.
If output_option='final_depht_dir', the best directions used to approximate the depths are also returned.
If output_option='all_depth', the depths calculated at every iteration are also returned.
If output_option='all_depth_directions', the random directions used to project depths are also returned, together with the convergence indices for the selected solver.
- evaluate_dataset: bool, default=False
Determines whether the loaded dataset is evaluated. If True, x is automatically set to the loaded dataset.
- References
Dyckerhoff, R. (2004). Data depths satisfying the projection property. Allgemeines Statistisches Archiv, 88, 163–190.
Dyckerhoff, R., Mozharovskyi, P., and Nagy, S. (2021). Approximate computation of projection depths. Computational Statistics and Data Analysis, 157, 107166.
- Examples
>>> import numpy as np
>>> from depth.model import DepthEucl
>>> np.random.seed(0)
>>> mat1=[[1, 0, 0, 0, 0],[0, 2, 0, 0, 0],[0, 0, 3, 0, 0],[0, 0, 0, 2, 0],[0, 0, 0, 0, 1]]
>>> mat2=[[1, 0, 0, 0, 0],[0, 1, 0, 0, 0],[0, 0, 1, 0, 0],[0, 0, 0, 1, 0],[0, 0, 0, 0, 1]]
>>> x = np.random.multivariate_normal([1,1,1,1,1], mat2, 10)
>>> data = np.random.multivariate_normal([0,0,0,0,0], mat1, 100)
>>> model = DepthEucl().load_dataset(data)
>>> model.aprojection(x, NRandom=1000)
[0.090223 0.19577999 0.15769263 0.20123535 0.10375507 0.14635662 0.20611053 0.17846703 0.19801984 0.23230606]
- betaSkeleton(x: ndarray | None = None, beta: int = 2, distance: str = 'Lp', Lp_p: int = 2, mah_estimate: str = 'moment', mah_parMcd: float = 0.75, evaluate_dataset: bool = False) ndarray [source]
Calculates the beta-skeleton depth of points w.r.t. a multivariate data set.
- Arguments
- x: {array-like} of shape (n_samples,d)
Matrix of objects (numerical vector as one object) whose depth is to be calculated. Each row contains a d-variate point and should have the same dimension as data.
- beta: int, default=2
The parameter defining the positioning of the balls' centers; see Yang and Modarres (2018) for details. By default (together with other arguments) it equals 2, which corresponds to the lens depth; see Liu and Modarres (2011).
- distance: str, default='Lp'
A character string defining the distance to be used for determining inclusion of a point into the lens (influence region); see Yang and Modarres (2018) for details. Possibilities are 'Lp' for the Lp-metric (default) or 'Mahalanobis' for the Mahalanobis distance adjustment.
- Lp_p: int, default=2
A non-negative number defining the distance's power; equals 2 by default (Euclidean distance). Used only when distance='Lp'.
- mah_estimate: str, {"moment", "mcd"}, default="moment"
A character string specifying which estimates to use when calculating the sample covariance matrix; can be 'none', 'moment' or 'mcd', determining whether traditional moment or Minimum Covariance Determinant (MCD) estimates for mean and covariance are used. By default 'moment' is used. Used only when distance='Mahalanobis'.
- mah_parMcd: float, default=0.75
The value of the argument alpha for Minimum Covariance Determinant (MCD); used when distance='Mahalanobis' and mah_estimate='mcd'.
- evaluate_dataset: bool, default=False
Determines whether the loaded dataset is evaluated. If True, x is automatically set to the loaded dataset.
- References
Elmore, R. T., Hettmansperger, T. P. and Xuan, F. (2006). Spherical data depth and a multivariate median. In R. Y. Liu, R. Serfling, and D. L. Souvaine, (Eds.), Data Depth: Robust Multivariate Analysis, Computational Geometry and Applications, DIMACS Series Discrete Mathematics and Theoretical Computer Science, 72, American Mathematical Society, Providence, RI, 87–101.
Liu, Z. and Modarres, R. (2011). Lens data depth and median. Journal of Nonparametric Statistics, 23, 1063–1074.
Kleindessner, M. and Von Luxburg, U. (2017). Lens depth function and k-relative neighborhood graph: Versatile tools for ordinal data analysis. Journal of Machine Learning Research, 18(58), 1–52.
Yang, M. and Modarres, R. (2018). β-skeleton depth functions and medians. Communications in Statistics - Theory and Methods, 47, 5127–5143.
- Examples
>>> import numpy as np
>>> from depth.model import DepthEucl
>>> mat1=[[1, 0, 0, 0, 0],[0, 2, 0, 0, 0],[0, 0, 3, 0, 0],[0, 0, 0, 2, 0],[0, 0, 0, 0, 1]]
>>> mat2=[[1, 0, 0, 0, 0],[0, 1, 0, 0, 0],[0, 0, 1, 0, 0],[0, 0, 0, 1, 0],[0, 0, 0, 0, 1]]
>>> x = np.random.multivariate_normal([1,1,1,1,1], mat2, 10)
>>> data = np.random.multivariate_normal([0,0,0,0,0], mat1, 1000)
>>> model=DepthEucl().load_dataset(data)
>>> model.betaSkeleton(x)
[0.16467668 0.336002 0.43702102 0.25827828 0.4204044 0.46894895 0.27825225 0.11572372 0.4663003 0.18778579]
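For the default setting (beta=2 with the Euclidean distance), the beta-skeleton depth reduces to the lens depth of Liu and Modarres (2011): the fraction of data pairs whose lens contains x. The sketch below illustrates that definition in plain Python; lens_depth_sketch is a hypothetical helper and makes no claim about how the library computes the depth internally or handles other values of beta.

```python
import math
from itertools import combinations


def lens_depth_sketch(x, data):
    """Fraction of pairs (a, b) from data whose lens contains x.

    x lies in the lens of (a, b) when it is no farther from either
    endpoint than the endpoints are from each other.
    """
    pairs = list(combinations(data, 2))
    inside = sum(
        1
        for a, b in pairs
        if max(math.dist(x, a), math.dist(x, b)) <= math.dist(a, b)
    )
    return inside / len(pairs)


# Reference data: four points on the unit circle.
data = [(-1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]

central = lens_depth_sketch((0.0, 0.0), data)   # origin lies in every lens
outlying = lens_depth_sketch((5.0, 5.0), data)  # far point lies in none
print(central, outlying)
```

The cost of this direct evaluation grows quadratically with the number of data points, which is worth keeping in mind for large reference datasets.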
- cexpchull(x: ndarray | None = None, solver: str = 'neldermead', NRandom: int = 1000, n_refinements: int = 10, sphcap_shrink: float = 0.5, alpha_Dirichlet: float = 1.25, cooling_factor: float = 0.95, cap_size: int | float = 1, start: str = 'mean', space: str = 'sphere', line_solver: str = 'goldensection', bound_gc: bool = True, output_option: Literal['lowest_depth', 'final_depht_dir', 'all_depth', 'all_depth_directions'] = 'final_depht_dir', evaluate_dataset: bool = False) ndarray [source]
Calculates approximately the continuous expected convex hull depth of points w.r.t. a multivariate data set.
- Arguments
- x: array-like or None, default=None
Matrix of objects (numerical vector as one object) whose depth is to be calculated; each row contains a d-variate point. Should have the same dimension as data.
- solver: str {'simplegrid', 'refinedgrid', 'simplerandom', 'refinedrandom', 'coordinatedescent', 'randomsimplices', 'neldermead', 'simulatedannealing'}, default='neldermead'
The type of solver used to approximate the depth.
- NRandom: int, default=1000
The total number of iterations used to compute the depth. Some solvers converge faster, so they are run several times to reach NRandom iterations.
- n_refinements: int, default=10
The maximum number of iterations for computing the depth of one point. Used for solver='refinedrandom' or 'refinedgrid'.
- sphcap_shrink: float, default=0.5
The shrinking of the spherical cap. Used for solver='refinedrandom' or 'refinedgrid'.
- alpha_Dirichlet: float, default=1.25
The parameter of the Dirichlet distribution. Used for solver='randomsimplices'.
- cooling_factor: float, default=0.95
The cooling factor. Used for solver='simulatedannealing'.
- cap_size: int | float, default=1
The size of the spherical cap. Used for solver='simulatedannealing' or 'neldermead'.
- start: str {'mean', 'random'}, default='mean'
The method used to compute the first depth. Used for solver='simulatedannealing' or 'neldermead'.
- space: str {'sphere', 'euclidean'}, default='sphere'
The type of space in which the solver runs. Used for solver='coordinatedescent' or 'neldermead'.
- line_solver: str {'uniform', 'goldensection'}, default='goldensection'
The line search strategy used by the solver. Used for solver='coordinatedescent'.
- bound_gc: bool, default=True
If True, the search is limited to the closed hemisphere. Used for solver='neldermead'.
- output_option: str {'lowest_depth', 'final_depht_dir', 'all_depth', 'all_depth_directions'}, default='final_depht_dir'
Determines what is computed alongside the final depth.
If output_option='lowest_depth', only the approximated depths are returned.
If output_option='final_depht_dir', the best directions used to approximate the depths are also returned.
If output_option='all_depth', the depths calculated at every iteration are also returned.
If output_option='all_depth_directions', the random directions used to project depths are also returned, together with the convergence indices for the selected solver.
- evaluate_dataset: bool, default=False
Determines whether the loaded dataset is evaluated. If True, x is automatically set to the loaded dataset.
- References
Dyckerhoff, R. and Mosler, K. (2011). Weighted-mean trimming of multivariate data. Journal of Multivariate Analysis, 102, 405–421.
Dyckerhoff, R., Mozharovskyi, P., and Nagy, S. (2021). Approximate computation of projection depths. Computational Statistics and Data Analysis, 157, 107166.
- Examples
>>> import numpy as np
>>> from depth.model import DepthEucl
>>> np.random.seed(0)
>>> mat1=[[1, 0, 0, 0, 0],[0, 2, 0, 0, 0],[0, 0, 3, 0, 0],[0, 0, 0, 2, 0],[0, 0, 0, 0, 1]]
>>> mat2=[[1, 0, 0, 0, 0],[0, 1, 0, 0, 0],[0, 0, 1, 0, 0],[0, 0, 0, 1, 0],[0, 0, 0, 0, 1]]
>>> x = np.random.multivariate_normal([1,1,1,1,1], mat2, 10)
>>> data = np.random.multivariate_normal([0,0,0,0,0], mat1, 100)
>>> model=DepthEucl().load_dataset(data)
>>> model.cexpchull(x, NRandom=1000)
[0.090223 0.19577999 0.15769263 0.20123535 0.10375507 0.14635662 0.20611053 0.17846703 0.19801984 0.23230606]
- cexpchullstar(x: ndarray | None = None, solver: str = 'neldermead', NRandom: int = 1000, option: int = 1, n_refinements: int = 10, sphcap_shrink: float = 0.5, alpha_Dirichlet: float = 1.25, cooling_factor: float = 0.95, cap_size: int = 1, start: str = 'mean', space: str = 'sphere', line_solver: str = 'goldensection', bound_gc: bool = True, output_option: Literal['lowest_depth', 'final_depht_dir', 'all_depth', 'all_depth_directions'] = 'final_depht_dir', evaluate_dataset: bool = False) ndarray [source]
Calculates approximately the continuous modified expected convex hull depth of points w.r.t. a multivariate data set.
- Arguments
- x: array-like or None, default=None
Matrix of objects (numerical vector as one object) whose depth is to be calculated; each row contains a d-variate point. Should have the same dimension as data.
- solver: str {'simplegrid', 'refinedgrid', 'simplerandom', 'refinedrandom', 'coordinatedescent', 'randomsimplices', 'neldermead', 'simulatedannealing'}, default='neldermead'
The type of solver used to approximate the depth.
- NRandom: int, default=1000
The total number of iterations used to compute the depth. Some solvers converge faster, so they are run several times to reach NRandom iterations.
- n_refinements: int, default=10
The maximum number of iterations for computing the depth of one point. Used for solver='refinedrandom' or 'refinedgrid'.
- sphcap_shrink: float, default=0.5
The shrinking of the spherical cap. Used for solver='refinedrandom' or 'refinedgrid'.
- alpha_Dirichlet: float, default=1.25
The parameter of the Dirichlet distribution. Used for solver='randomsimplices'.
- cooling_factor: float, default=0.95
The cooling factor. Used for solver='simulatedannealing'.
- cap_size: int | float, default=1
The size of the spherical cap. Used for solver='simulatedannealing' or 'neldermead'.
- start: str {'mean', 'random'}, default='mean'
The method used to compute the first depth. Used for solver='simulatedannealing' or 'neldermead'.
- space: str {'sphere', 'euclidean'}, default='sphere'
The type of space in which the solver runs. Used for solver='coordinatedescent' or 'neldermead'.
- line_solver: str {'uniform', 'goldensection'}, default='goldensection'
The line search strategy used by the solver. Used for solver='coordinatedescent'.
- bound_gc: bool, default=True
If True, the search is limited to the closed hemisphere. Used for solver='neldermead'.
- output_option: str {'lowest_depth', 'final_depht_dir', 'all_depth', 'all_depth_directions'}, default='final_depht_dir'
Determines what is computed alongside the final depth.
If output_option='lowest_depth', only the approximated depths are returned.
If output_option='final_depht_dir', the best directions used to approximate the depths are also returned.
If output_option='all_depth', the depths calculated at every iteration are also returned.
If output_option='all_depth_directions', the random directions used to project depths are also returned, together with the convergence indices for the selected solver.
- evaluate_dataset: bool, default=False
Determines whether the loaded dataset is evaluated. If True, x is automatically set to the loaded dataset.
- References
Dyckerhoff, R. and Mosler, K. (2011). Weighted-mean trimming of multivariate data. Journal of Multivariate Analysis, 102, 405–421.
Dyckerhoff, R., Mozharovskyi, P., and Nagy, S. (2021). Approximate computation of projection depths. Computational Statistics and Data Analysis, 157, 107166.
- Examples
>>> import numpy as np
>>> from depth.model import DepthEucl
>>> np.random.seed(0)
>>> mat1=[[1, 0, 0, 0, 0],[0, 2, 0, 0, 0],[0, 0, 3, 0, 0],[0, 0, 0, 2, 0],[0, 0, 0, 0, 1]]
>>> mat2=[[1, 0, 0, 0, 0],[0, 1, 0, 0, 0],[0, 0, 1, 0, 0],[0, 0, 0, 1, 0],[0, 0, 0, 0, 1]]
>>> x = np.random.multivariate_normal([1,1,1,1,1], mat2, 10)
>>> data = np.random.multivariate_normal([0,0,0,0,0], mat1, 100)
>>> model=DepthEucl().load_dataset(data)
>>> model.cexpchullstar(x, NRandom=1000)
[0.090223 0.19577999 0.15769263 0.20123535 0.10375507 0.14635662 0.20611053 0.17846703 0.19801984 0.23230606]
- change_dataset(newDataset: ndarray, newY: ndarray | None = None, newDistribution: ndarray | None = None, keepOld: bool = False) None [source]
Modify loaded dataset.
- Arguments
- newDataset:np.ndarray
New dataset
- newDistribution:np.ndarray|None, default=None,
Distribution related to the dataset
- newY:np.ndarray|None, default=None,
Only for convention.
- keepOld:bool, default=False,
Boolean determining whether the current dataset is kept. If True, newDataset is appended to the end of the old one.
- Returns
None
- Examples
>>> import numpy as np
>>> from depth.model import DepthEucl
>>> mat1=[[1, 0, 0, 0, 0],[0, 2, 0, 0, 0],[0, 0, 3, 0, 0],[0, 0, 0, 2, 0],[0, 0, 0, 0, 1]]
>>> mat2=[[1, 0, 0, 0, 0],[0, 1, 0, 0, 0],[0, 0, 1, 0, 0],[0, 0, 0, 1, 0],[0, 0, 0, 0, 1]]
>>> np.random.seed(0)
>>> data = np.random.multivariate_normal([0,0,0,0,0], mat1, 1000)
>>> data2 = np.random.multivariate_normal([0,0,0,0,0], mat1, 1000)
>>> model = DepthEucl().load_dataset(data)
>>> model.change_dataset(data2,)
- computeMCD(mat: ndarray | None = None, h: int | float = 1, mfull: int = 10, nstep: int = 7, hiRegimeCompleteLastComp: bool = True) None [source]
Compute Minimum Covariance Determinant (MCD)
- geometrical(x: ndarray | None = None, solver: str = 'neldermead', NRandom: int = 1000, n_refinements: int = 10, sphcap_shrink: float = 0.5, alpha_Dirichlet: float = 1.25, cooling_factor: float = 0.95, cap_size: int = 1, start: str = 'mean', space: str = 'sphere', line_solver: str = 'goldensection', bound_gc: bool = True, output_option: Literal['lowest_depth', 'final_depht_dir', 'all_depth', 'all_depth_directions'] = 'final_depht_dir', evaluate_dataset: bool = False) ndarray [source]
Calculates approximately the geometrical depth of points w.r.t. a multivariate data set.
- Arguments
- x: array-like or None, default=None
Matrix of objects (numerical vector as one object) whose depth is to be calculated; each row contains a d-variate point. Should have the same dimension as data.
- solver: str {'simplegrid', 'refinedgrid', 'simplerandom', 'refinedrandom', 'coordinatedescent', 'randomsimplices', 'neldermead', 'simulatedannealing'}, default='neldermead'
The type of solver used to approximate the depth.
- NRandom: int, default=1000
The total number of iterations used to compute the depth. Some solvers converge faster, so they are run several times to reach NRandom iterations.
- n_refinements: int, default=10
The maximum number of iterations for computing the depth of one point. Used for solver='refinedrandom' or 'refinedgrid'.
- sphcap_shrink: float, default=0.5
The shrinking of the spherical cap. Used for solver='refinedrandom' or 'refinedgrid'.
- alpha_Dirichlet: float, default=1.25
The parameter of the Dirichlet distribution. Used for solver='randomsimplices'.
- cooling_factor: float, default=0.95
The cooling factor. Used for solver='simulatedannealing'.
- cap_size: int | float, default=1
The size of the spherical cap. Used for solver='simulatedannealing' or 'neldermead'.
- start: str {'mean', 'random'}, default='mean'
The method used to compute the first depth. Used for solver='simulatedannealing' or 'neldermead'.
- space: str {'sphere', 'euclidean'}, default='sphere'
The type of space in which the solver runs. Used for solver='coordinatedescent' or 'neldermead'.
- line_solver: str {'uniform', 'goldensection'}, default='goldensection'
The line search strategy used by the solver. Used for solver='coordinatedescent'.
- bound_gc: bool, default=True
If True, the search is limited to the closed hemisphere. Used for solver='neldermead'.
- output_option: str {'lowest_depth', 'final_depht_dir', 'all_depth', 'all_depth_directions'}, default='final_depht_dir'
Determines what is computed alongside the final depth.
If output_option='lowest_depth', only the approximated depths are returned.
If output_option='final_depht_dir', the best directions used to approximate the depths are also returned.
If output_option='all_depth', the depths calculated at every iteration are also returned.
If output_option='all_depth_directions', the random directions used to project depths are also returned, together with the convergence indices for the selected solver.
- evaluate_dataset: bool, default=False
Determines whether the loaded dataset is evaluated. If True, x is automatically set to the loaded dataset.
- References
Dyckerhoff, R. and Mosler, K. (2011). Weighted-mean trimming of multivariate data. Journal of Multivariate Analysis, 102, 405–421.
Dyckerhoff, R., Mozharovskyi, P., and Nagy, S. (2021). Approximate computation of projection depths. Computational Statistics and Data Analysis, 157, 107166.
- Examples
>>> import numpy as np
>>> from depth.model import DepthEucl
>>> np.random.seed(0)
>>> mat1=[[1, 0, 0, 0, 0],[0, 2, 0, 0, 0],[0, 0, 3, 0, 0],[0, 0, 0, 2, 0],[0, 0, 0, 0, 1]]
>>> mat2=[[1, 0, 0, 0, 0],[0, 1, 0, 0, 0],[0, 0, 1, 0, 0],[0, 0, 0, 1, 0],[0, 0, 0, 0, 1]]
>>> x = np.random.multivariate_normal([1,1,1,1,1], mat2, 10)
>>> data = np.random.multivariate_normal([0,0,0,0,0], mat1, 100)
>>> model=DepthEucl().load_dataset(data)
>>> model.geometrical(x, NRandom=1000)
[0.090223 0.19577999 0.15769263 0.20123535 0.10375507 0.14635662 0.20611053 0.17846703 0.19801984 0.23230606]
- halfspace(x: ndarray | None = None, exact: bool = True, method: str = 'recursive', solver: str = 'neldermead', NRandom: int = 1000, n_refinements: int = 10, sphcap_shrink: float = 0.5, alpha_Dirichlet: float = 1.25, cooling_factor: float = 0.95, cap_size: int = 1, start: str = 'mean', space: str = 'sphere', line_solver: str = 'goldensection', bound_gc: bool = True, CUDA: bool = False, output_option: Literal['lowest_depth', 'final_depht_dir', 'all_depth', 'all_depth_directions'] = 'final_depht_dir', evaluate_dataset: bool = False) ndarray [source]
Calculates the exact and approximated Tukey (=halfspace, location) depth (Tukey, 1975) of points w.r.t. a multivariate data set.
- Arguments
- x: array-like or None, default=None
Matrix of objects (numerical vector as one object) whose depth is to be calculated; each row contains a d-variate point. Should have the same dimension as data.
- exact: bool, default=True
The type of method used. If exact=True (the default), the Tukey depth is computed exactly, with method='recursive' by default. If exact=False, the Tukey depth is computed approximately.
- method: str, default='recursive'
For exact=True, the Tukey depth is calculated as the minimum over all combinations of k points from data. In this case the parameter method specifies k, with possible values 1 for method='recursive' (the default), d−2 for method='plane', and d−1 for method='line'. Either the name of the method or just the parameter exact may be given; in the latter case the default method is used.
- solver: str {'simplegrid', 'refinedgrid', 'simplerandom', 'refinedrandom', 'coordinatedescent', 'randomsimplices', 'neldermead', 'simulatedannealing'}, default='neldermead'
The type of solver used to approximate the depth.
- NRandom: int, default=1000
The total number of iterations used to compute the depth. Some solvers converge faster, so they are run several times to reach NRandom iterations.
- n_refinements: int, default=10
The maximum number of iterations for computing the depth of one point. Used for solver='refinedrandom' or 'refinedgrid'.
- sphcap_shrink: float, default=0.5
The shrinking of the spherical cap. Used for solver='refinedrandom' or 'refinedgrid'.
- alpha_Dirichlet: float, default=1.25
The parameter of the Dirichlet distribution. Used for solver='randomsimplices'.
- cooling_factor: float, default=0.95
The cooling factor. Used for solver='simulatedannealing'.
- cap_size: int | float, default=1
The size of the spherical cap. Used for solver='simulatedannealing' or 'neldermead'.
- start: str {'mean', 'random'}, default='mean'
The method used to compute the first depth. Used for solver='simulatedannealing' or 'neldermead'.
- space: str {'sphere', 'euclidean'}, default='sphere'
The type of space in which the solver runs. Used for solver='coordinatedescent' or 'neldermead'.
- line_solver: str {'uniform', 'goldensection'}, default='goldensection'
The line search strategy used by the solver. Used for solver='coordinatedescent'.
- bound_gc: bool, default=True
If True, the search is limited to the closed hemisphere. Used for solver='neldermead'.
- CUDA: bool, default=False
Determines whether the approximate computation is performed on the GPU. Available only for solver='simplerandom' or 'refinedrandom'.
- output_option: str {'lowest_depth', 'final_depht_dir', 'all_depth', 'all_depth_directions'}, default='final_depht_dir'
Determines what is computed alongside the final depth.
If output_option='lowest_depth', only the approximated depths are returned.
If output_option='final_depht_dir', the best directions used to approximate the depths are also returned.
If output_option='all_depth', the depths calculated at every iteration are also returned.
If output_option='all_depth_directions', the random directions used to project depths are also returned, together with the convergence indices for the selected solver.
- evaluate_dataset: bool, default=False
Determines whether the loaded dataset is evaluated. If True, x is automatically set to the loaded dataset.
- References
Tukey, J. W. (1975). Mathematics and the picturing of data. In R. James (Ed.), Proceedings of the International Congress of Mathematicians, Volume 2, Canadian Mathematical Congress, 523–531.
Donoho, D. L. and M. Gasko (1992). Breakdown properties of location estimates based on halfspace depth and projected outlyingness. The Annals of Statistics, 20(4), 1803–1827.
Dyckerhoff, R. and Mozharovskyi, P. (2016): Exact computation of the halfspace depth. Computational Statistics and Data Analysis, 98, 19–30.
Dyckerhoff, R., Mozharovskyi, P., and Nagy, S. (2021). Approximate computation of projection depths. Computational Statistics and Data Analysis, 157, 107166.
- Examples
>>> import numpy as np
>>> from depth.model import DepthEucl
>>> mat1=[[1, 0, 0],[0, 2, 0],[0, 0, 1]]
>>> mat2=[[1, 0, 0],[0, 1, 0],[0, 0, 1]]
>>> x = np.random.multivariate_normal([1,1,1], mat2, 10)
>>> data = np.random.multivariate_normal([0,0,0], mat1, 200)
>>> model=DepthEucl().load_dataset(data)
>>> model.halfspace(x,)
[0. 0.005 0.005 0. 0.04 0.01 0. 0. 0.04 0.01 ]
>>> model.halfspace(x, exact=True)
[0. 0.005 0.005 0. 0.04 0.01 0. 0. 0.04 0.01 ]
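The idea behind approximating the Tukey depth with random directions can be sketched in a few lines of plain Python: for each direction u, count the fraction of data points in the closed halfspace {z : u·z >= u·x}, and keep the smallest fraction found. This is only an illustration under simplifying assumptions; halfspace_depth_random and its parameters are hypothetical names, and the library's solvers (for example 'simplerandom' or 'neldermead') are more elaborate than this loop.

```python
import random


def halfspace_depth_random(x, data, n_directions=1000, seed=0):
    """Upper bound on the Tukey depth of x from randomly drawn directions."""
    rng = random.Random(seed)
    dim = len(x)
    best = 1.0
    for _ in range(n_directions):
        # Gaussian coordinates give a uniformly distributed direction;
        # normalising is unnecessary because only comparisons are used.
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        proj_x = sum(ui * xi for ui, xi in zip(u, x))
        # Fraction of data points in the closed halfspace bounded at x.
        frac = sum(
            1
            for p in data
            if sum(ui * pi for ui, pi in zip(u, p)) >= proj_x
        ) / len(data)
        best = min(best, frac)
    return best


# Reference data: the four corners of a square centred at the origin.
data = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]

inner = halfspace_depth_random((0.0, 0.0), data)  # centre of symmetry
outer = halfspace_depth_random((3.0, 3.0), data)  # outside the convex hull
print(inner, outer)
```

Because the minimum is taken over a finite set of directions, the result can only overestimate the exact depth; increasing n_directions tightens the bound at a proportional cost.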
- load_dataset(data: ndarray = None, distribution: ndarray | None = None, CUDA: bool = False, y: ndarray | None = None) None [source]
Load the dataset X for reference calculations. Depth is computed with respect to this dataset.
- Parameters:
data ({array-like} of shape (n,d).) – Dataset that will be used for depth computation
distribution (Ignored, default=None) – Not used, present for API consistency by convention.
CUDA (bool, default=False) – Determines whether the CUDA device will be used
y (Ignored, default=None) – Not used, present for API consistency by convention.
- Return type:
loaded dataset
- mahalanobis(x: ndarray | None = None, exact: bool = True, mah_estimate: Literal['moment', 'mcd'] = 'moment', mah_parMcd: float = 0.75, solver='neldermead', NRandom=1000, n_refinements=10, sphcap_shrink=0.5, alpha_Dirichlet=1.25, cooling_factor=0.95, cap_size=1, start='mean', space='sphere', line_solver='goldensection', bound_gc=True, output_option: Literal['lowest_depth', 'final_depht_dir', 'all_depth', 'all_depth_directions'] = 'final_depht_dir', evaluate_dataset: bool = False) ndarray [source]
Calculates the Mahalanobis depth of points w.r.t. a multivariate data set.
- Arguments
- x: array-like or None, default=None
Matrix of objects (numerical vector as one object) whose depth is to be calculated; each row contains a d-variate point. Should have the same dimension as data.
- exact: bool, default=True
The type of method used. If exact=True (the default), the Mahalanobis depth is computed exactly, using the closed-form expression. If exact=False, the depth is computed approximately, using the method defined by the argument solver.
- mah_estimate: str, {"moment", "mcd"}, default="moment"
A character string specifying which estimates to use when calculating the Mahalanobis depth; can be 'moment' or 'MCD', determining whether traditional moment or Minimum Covariance Determinant (MCD) estimates for mean and covariance are used. By default, 'moment' is used.
- mah_parMcd: float, default=0.75
The value of the argument alpha for the function covMcd; used when mah_estimate='MCD'.
- solver: str, default="neldermead"
The type of solver used to approximate the depth. One of 'simplegrid', 'refinedgrid', 'simplerandom', 'refinedrandom', 'coordinatedescent', 'randomsimplices', 'neldermead', 'simulatedannealing'.
- NRandom: int, default=1000
The total number of iterations used to compute the depth. Some solvers converge faster, so they are run several times to reach NRandom iterations.
- n_refinements: int, default=10
The maximum number of iterations for computing the depth of one point. Used for solver='refinedrandom' or 'refinedgrid'.
- sphcap_shrink: float, default=0.5
The shrinking factor of the spherical cap. Used for solver='refinedrandom' or 'refinedgrid'.
- alpha_Dirichlet: float, default=1.25
The parameter of the Dirichlet distribution. Used for solver='randomsimplices'.
- cooling_factor: float, default=0.95
The cooling factor. Used for solver='simulatedannealing'.
- cap_size: int | float, default=1
The size of the spherical cap. Used for solver='simulatedannealing' or 'neldermead'.
- start: str {'mean', 'random'}, default='mean'
For solver='simulatedannealing' or 'neldermead', the method used to compute the first depth.
- space: str {'sphere', 'euclidean'}, default='sphere'
For solver='coordinatedescent' or 'neldermead', the type of space in which the solver runs.
- line_solver: str {'uniform', 'goldensection'}, default='goldensection'
For solver='coordinatedescent', the line search strategy used by this solver.
- bound_gc: bool, default=True
For solver='neldermead', True if the search is limited to the closed hemisphere.
- output_option: str {"lowest_depth", "final_depht_dir", "all_depth", "all_depth_directions"}, default="final_depht_dir"
Determines what is computed alongside the final depth.
If output_option="lowest_depth", only the approximated depths are returned.
If output_option="final_depht_dir", the best directions used to approximate the depths are also returned.
If output_option="all_depth", the depths calculated at every iteration are also returned.
If output_option="all_depth_directions", the random directions used to project the depths are also returned, together with the convergence indices for the selected solver.
- evaluate_dataset: bool, default=False
Determines whether the loaded dataset itself is evaluated. If True, x is automatically set to the dataset.
- References
Mahalanobis, P. C. (1936). On the generalized distance in statistics. Proceedings of the National Institute of Sciences of India, 12, 49–55.
Mosler, K. and Mozharovskyi, P. (2022). Choosing among notions of multivariate depth statistics. Statistical Science, 37(3), 348-368.
- Examples
>>> import numpy as np
>>> from depth.model import DepthEucl
>>> mat1=[[1, 0, 0, 0, 0],[0, 2, 0, 0, 0],[0, 0, 3, 0, 0],[0, 0, 0, 2, 0],[0, 0, 0, 0, 1]]
>>> mat2=[[1, 0, 0, 0, 0],[0, 1, 0, 0, 0],[0, 0, 1, 0, 0],[0, 0, 0, 1, 0],[0, 0, 0, 0, 1]]
>>> np.random.seed(0)
>>> x = np.random.multivariate_normal([1,1,1,1,1], mat2, 10)
>>> data = np.random.multivariate_normal([0,0,0,0,0], mat1, 1000)
>>> model = DepthEucl().load_dataset(data)
>>> model.mahalanobis(x)
[0.17849871 0.10412453 0.1331417 0.13578021 0.3154836 0.29103769 0.13398989 0.13913017 0.59339051 0.10556139]
>>> model.mahalanobisDepth
[0.17849871 0.10412453 0.1331417 0.13578021 0.3154836 0.29103769 0.13398989 0.13913017 0.59339051 0.10556139]
>>> model.mahalanobis(x, exact="True", mah_estimate="MCD", mah_parMcd = 0.75)
[0.17758703 0.10367974 0.131705 0.13575221 0.31847867 0.29034948 0.13291613 0.13792774 0.59094958 0.10491694]
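The closed-form expression mentioned for exact=True is commonly written as D(x) = 1 / (1 + (x - mu)' S^{-1} (x - mu)), where mu and S are location and scatter estimates. The sketch below evaluates it with moment estimates in plain NumPy. The function name mahalanobis_depth is hypothetical, and the use of the sample covariance with n - 1 in the denominator is an assumption; the library may normalise differently.

```python
import numpy as np

def mahalanobis_depth(x, data):
    """Mahalanobis depth with moment estimates (illustrative sketch only)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    mu = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)  # assumption: unbiased sample covariance
    diff = x - mu
    # Squared Mahalanobis distance of every row of x to the mean.
    d2 = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return 1.0 / (1.0 + d2)

# Four corners of a square: the centre (1, 1) is the deepest point.
square = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
depths = mahalanobis_depth(np.array([[1.0, 1.0], [3.0, 1.0]]), square)
```

The depth equals 1 at the mean and decreases towards 0 as the point moves away from it.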
- potential(x: ndarray | None = None, pretransform: str = '1Mom', kernel: str = 'EDKernel', mah_parMcd: float = 0.75, kernel_bandwidth: int = 0, evaluate_dataset: bool = False) ndarray [source]
Calculate the potential of the points w.r.t. a multivariate data set. The potential is the kernel-estimated density multiplied by the prior probability of a class. Different from the data depths, a density estimate measures at a given point how much mass is located around it.
- Arguments
- x: array-like or None, default=None
Matrix of objects (numerical vector as one object) whose depth is to be calculated; each row contains a d-variate point. Should have the same dimension as data.
- pretransform: str, default="1Mom"
The method of data scaling. '1Mom' or 'NMom' for scaling using data moments; '1MCD' or 'NMCD' for scaling using robust data moments (Minimum Covariance Determinant, MCD).
- kernel: str, default="EDKernel"
'EDKernel' for the kernel of type 1/(1 + kernel_bandwidth * EuclideanDistance^2(x, y)); 'GKernel' (recommended) for the simple Gaussian kernel; 'EKernel' for the exponential kernel exp(-kernel_bandwidth * EuclideanDistance(x, y)); 'VarGKernel' for the variable Gaussian kernel, where kernel_bandwidth is proportional to the zonoid depth of a point.
- kernel_bandwidth: int, default=0
The single bandwidth parameter of the kernel. If 0, Scott's rule of thumb is used.
- mah_parMcd: float, default=0.75
Value of the argument alpha for the function covMcd.
- evaluate_dataset: bool, default=False
Determines whether the loaded dataset itself is evaluated. If True, x is automatically set to the dataset.
- References
Pokotylo, O. and Mosler, K. (2019). Classification with the pot-pot plot. Statistical Papers, 60, 903-931.
- Examples
>>> import numpy as np
>>> from depth.model import DepthEucl
>>> mat1=[[1, 0, 0],[0, 2, 0],[0, 0, 1]]
>>> mat2=[[1, 0, 0],[0, 1, 0],[0, 0, 1]]
>>> data = np.random.multivariate_normal([0,0,0], mat1, 20)
>>> model=DepthEucl().load_dataset(data)
>>> x = np.random.multivariate_normal([1,1,1], mat2, 10)
>>> model.potential(x,)
[7.51492797 8.34322926 5.42761506 6.25418171 4.25774485 8.09733146 6.65788017 5.11324521 5.74407939 9.26030661]
>>> model.potential(x, kernel_bandwidth=0.1)
[13.56510469 13.95553893 11.23251702 12.42491604 10.17527509 13.70947682 12.67352469 11.2080649 11.73402562 14.93067103]
>>> model.potential(x, pretransform = "NMCD", mah_parMcd=0.6, kernel_bandwidth=0.1)
[11.0603282 11.49509828 8.99303793 8.63168006 7.86456928 11.03588551 10.45468945 8.84989798 9.56799496 12.29832608]
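The two kernel formulas spelled out above can be written down directly. The following sketch averages kernel values between query points and the data; all names (pairwise_sq_dist, ed_kernel, e_kernel, kernel_average) are hypothetical. It deliberately omits the pretransform step, the automatic bandwidth choice, and any normalising constant, so its values are not expected to match the output of potential().

```python
import numpy as np

def pairwise_sq_dist(x, data):
    """Squared Euclidean distances between rows of x and rows of data."""
    diff = x[:, None, :] - data[None, :, :]
    return np.einsum("ijk,ijk->ij", diff, diff)

def ed_kernel(sq_dist, bandwidth):
    # Kernel of type 1 / (1 + bandwidth * squared Euclidean distance).
    return 1.0 / (1.0 + bandwidth * sq_dist)

def e_kernel(sq_dist, bandwidth):
    # Exponential kernel exp(-bandwidth * Euclidean distance).
    return np.exp(-bandwidth * np.sqrt(sq_dist))

def kernel_average(x, data, kernel, bandwidth):
    """Average kernel value of each row of x against all data points."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    return kernel(pairwise_sq_dist(x, data), bandwidth).mean(axis=1)
```

A point surrounded by many nearby observations receives a large average, which is the sense in which a potential measures how much mass lies around a point.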
- projection(x: ndarray | None = None, solver: str = 'neldermead', NRandom: int = 1000, n_refinements: int = 10, sphcap_shrink: float = 0.5, alpha_Dirichlet: float = 1.25, cooling_factor: float = 0.95, cap_size: int = 1, start: str = 'mean', space: str = 'sphere', line_solver: str = 'goldensection', bound_gc: bool = True, CUDA: bool = False, output_option: Literal['lowest_depth', 'final_depht_dir', 'all_depth', 'all_depth_directions'] = 'final_depht_dir', evaluate_dataset: bool = False) ndarray [source]
Calculates approximately the projection depth of points w.r.t. a multivariate data set.
- Arguments
- x: array-like or None, default=None
Matrix of objects (numerical vector as one object) whose depth is to be calculated; each row contains a d-variate point. Should have the same dimension as data.
- solver: str {'simplegrid', 'refinedgrid', 'simplerandom', 'refinedrandom', 'coordinatedescent', 'randomsimplices', 'neldermead', 'simulatedannealing'}, default="neldermead"
The type of solver used to approximate the depth.
- NRandom: int, default=1000
The total number of iterations used to compute the depth. Some solvers converge faster, so they are run several times to reach NRandom iterations.
- n_refinements: int, default=10
The maximum number of iterations for computing the depth of one point. Used for solver='refinedrandom' or 'refinedgrid'.
- sphcap_shrink: float, default=0.5
The shrinking factor of the spherical cap. Used for solver='refinedrandom' or 'refinedgrid'.
- alpha_Dirichlet: float, default=1.25
The parameter of the Dirichlet distribution. Used for solver='randomsimplices'.
- cooling_factor: float, default=0.95
The cooling factor. Used for solver='simulatedannealing'.
- cap_size: int | float, default=1
The size of the spherical cap. Used for solver='simulatedannealing' or 'neldermead'.
- start: str {'mean', 'random'}, default='mean'
For solver='simulatedannealing' or 'neldermead', the method used to compute the first depth.
- space: str {'sphere', 'euclidean'}, default='sphere'
For solver='coordinatedescent' or 'neldermead', the type of space in which the solver runs.
- line_solver: str {'uniform', 'goldensection'}, default='goldensection'
For solver='coordinatedescent', the line search strategy used by this solver.
- bound_gc: bool, default=True
For solver='neldermead', True if the search is limited to the closed hemisphere.
- CUDA: bool, default=False
Determines whether the approximate computation is performed on the GPU. Available only for solver='simplerandom' or 'refinedrandom'.
- output_option: str {"lowest_depth", "final_depht_dir", "all_depth", "all_depth_directions"}, default="final_depht_dir"
Determines what is computed alongside the final depth.
If output_option="lowest_depth", only the approximated depths are returned.
If output_option="final_depht_dir", the best directions used to approximate the depths are also returned.
If output_option="all_depth", the depths calculated at every iteration are also returned.
If output_option="all_depth_directions", the random directions used to project the depths are also returned, together with the convergence indices for the selected solver.
- evaluate_dataset: bool, default=False
Determines whether the loaded dataset itself is evaluated. If True, x is automatically set to the dataset.
- References
Zuo, Y. and Serfling, R. (2000). General notions of statistical depth function. The Annals of Statistics, 28, 461–482.
Dyckerhoff, R., Mozharovskyi, P., and Nagy, S. (2021). Approximate computation of projection depths. Computational Statistics and Data Analysis, 157, 107166.
- Examples
>>> import numpy as np
>>> from depth.model import DepthEucl
>>> np.random.seed(0)
>>> mat1=[[1, 0, 0, 0, 0],[0, 2, 0, 0, 0],[0, 0, 3, 0, 0],[0, 0, 0, 2, 0],[0, 0, 0, 0, 1]]
>>> mat2=[[1, 0, 0, 0, 0],[0, 1, 0, 0, 0],[0, 0, 1, 0, 0],[0, 0, 0, 1, 0],[0, 0, 0, 0, 1]]
>>> data = np.random.multivariate_normal([0,0,0,0,0], mat1, 100)
>>> model=DepthEucl().load_dataset(data)
>>> x = np.random.multivariate_normal([1,1,1,1,1], mat2, 10)
>>> model.projection(x, NRandom=1000)
[0.090223 0.19577999 0.15769263 0.20123535 0.10375507 0.14635662 0.20611053 0.17846703 0.19801984 0.23230606]
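As a rough picture of what the approximation does, the projection depth can be written as 1 / (1 + O(x)), where the outlyingness O(x) is the largest value of |u'x - med(u'data)| / MAD(u'data) over unit directions u (Zuo and Serfling, 2000). The sketch below searches over purely random directions. The name projection_depth_random is hypothetical, the MAD is used without a consistency factor (an assumption), and the library's solvers search the sphere far more cleverly than this.

```python
import numpy as np

def projection_depth_random(x, data, n_directions=1000, seed=0):
    """Upper bound on the projection depth via random directions (sketch)."""
    rng = np.random.default_rng(seed)
    x = np.atleast_2d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    # Directions drawn uniformly on the unit sphere.
    u = rng.standard_normal((n_directions, data.shape[1]))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    proj_data = data @ u.T   # shape (n, n_directions)
    proj_x = x @ u.T         # shape (n_samples, n_directions)
    # Univariate median and median absolute deviation per direction.
    med = np.median(proj_data, axis=0)
    mad = np.median(np.abs(proj_data - med), axis=0)
    # Largest standardised deviation found over the sampled directions.
    outlyingness = (np.abs(proj_x - med) / mad).max(axis=1)
    return 1.0 / (1.0 + outlyingness)
```

Because only finitely many directions are tried, the outlyingness is underestimated and the returned depth is never smaller than the true one.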
- qhpeeling(x: ndarray | None = None, evaluate_dataset: bool = False) ndarray [source]
Calculates the convex hull peeling depth of points w.r.t. a multivariate data set.
- Usage
qhpeeling(x, evaluate_dataset=False)
- Arguments
- x: array-like or None, default=None
Matrix of objects (numerical vector as one object) whose depth is to be calculated; each row contains a d-variate point. Should have the same dimension as data.
- evaluate_dataset: bool, default=False
Determines whether the loaded dataset itself is evaluated. If True, x is automatically set to the dataset.
- References
Barnett, V. (1976). The ordering of multivariate data. Journal of the Royal Statistical Society, Series A, 139, 318–355.
Eddy, W. F. (1981). Graphics for the multivariate two-sample problem: Comment. Journal of the American Statistical Association, 76, 287–289.
- Examples
>>> import numpy as np
>>> from depth.model import DepthEucl
>>> mat1=[[1, 0, 0, 0, 0],[0, 2, 0, 0, 0],[0, 0, 3, 0, 0],[0, 0, 0, 2, 0],[0, 0, 0, 0, 1]]
>>> mat2=[[1, 0, 0, 0, 0],[0, 1, 0, 0, 0],[0, 0, 1, 0, 0],[0, 0, 0, 1, 0],[0, 0, 0, 0, 1]]
>>> x = np.random.multivariate_normal([1,1,1,1,1], mat2, 10)
>>> data = np.random.multivariate_normal([0,0,0,0,0], mat1, 100)
>>> model=DepthEucl().load_dataset(data)
>>> model.qhpeeling(x)
[0. 0. 0. 0. 0. 0. 0.01 0. 0. 0.01]
- set_seed(seed: int = 2801) None [source]
Set seed for computation
- simplicial(x: ndarray | None = None, exact: bool = True, k: float = 0.05, evaluate_dataset: bool = False) ndarray [source]
Calculates the simplicial depth of points w.r.t. a multivariate data set.
- Arguments
- x: array-like or None, default=None
Matrix of objects (numerical vector as one object) whose depth is to be calculated; each row contains a d-variate point. Should have the same dimension as data.
- exact: bool, default=True
exact=True (the default) implies the exact algorithm; exact=False implies the approximate algorithm, considering k simplices.
- k: float or int, default=0.05
Number (k > 1) or portion (if 0 < k < 1) of simplices that are considered if exact=False. If k > 1, then the algorithmic complexity is polynomial in d but is independent of the number of observations in data, given k. If 0 < k < 1, then the algorithmic complexity is exponential in the number of observations in data, but the calculation precision stays approximately the same.
- evaluate_dataset: bool, default=False
Determines whether the loaded dataset itself is evaluated. If True, x is automatically set to the dataset.
- References
Liu, R. Y. (1990). On a notion of data depth based on random simplices. The Annals of Statistics, 18, 405–414.
- Examples
>>> import numpy as np
>>> from depth.model import DepthEucl
>>> mat1=[[1, 0, 0],[0, 1, 0],[0, 0, 1]]
>>> mat2=[[1, 0, 0],[0, 1, 0],[0, 0, 1]]
>>> x = np.random.multivariate_normal([1,1,1], mat2, 10)
>>> data = np.random.multivariate_normal([0,0,0], mat1, 25)
>>> model=DepthEucl().load_dataset(data)
>>> model.simplicial(x,)
[0.04458498 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
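The role of k in the approximate algorithm can be illustrated with a small Monte Carlo sketch: draw simplices with d + 1 vertices from the data and report the fraction that contain the query point. The function simplicial_depth_random is hypothetical and is not the library's algorithm. It reads k as a portion of all C(n, d + 1) simplices when 0 < k < 1 and as a count otherwise, following the description above, and it assumes data in general position so that no sampled simplex is degenerate.

```python
import math
import numpy as np

def simplicial_depth_random(x, data, k=0.05, seed=0):
    """Monte Carlo estimate of the simplicial depth (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x = np.atleast_2d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    # k in (0, 1): portion of all simplices; otherwise: number of simplices.
    if 0 < k < 1:
        n_simplices = max(1, round(k * math.comb(n, d + 1)))
    else:
        n_simplices = int(k)
    # Each simplex uses d + 1 distinct data points; simplices may repeat.
    idx = np.array([rng.choice(n, size=d + 1, replace=False)
                    for _ in range(n_simplices)])
    vertices = data[idx]                                   # (k, d+1, d)
    ones = np.ones((n_simplices, 1, d + 1))
    # Barycentric system: weights sum to 1 and reproduce the point.
    A = np.concatenate([ones, vertices.transpose(0, 2, 1)], axis=1)
    depths = np.empty(len(x))
    for j, point in enumerate(x):
        b = np.tile(np.concatenate([[1.0], point]), (n_simplices, 1))[:, :, None]
        lam = np.linalg.solve(A, b)[:, :, 0]
        # The point lies in a simplex iff all barycentric weights are >= 0.
        depths[j] = np.mean(np.all(lam >= -1e-12, axis=1))
    return depths
```

With a fixed count of simplices the cost per point does not grow with the number of observations, which matches the complexity remark for k > 1.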
- simplicialVolume(x: ndarray | None = None, exact: bool = True, k: float = 0.05, mah_estimate: str = 'moment', mah_parMCD: float = 0.75, evaluate_dataset: bool = False) ndarray [source]
Calculates the simplicial volume depth of points w.r.t. a multivariate data set.
- Arguments
- x: array-like or None, default=None
Matrix of objects (numerical vector as one object) whose depth is to be calculated; each row contains a d-variate point. Should have the same dimension as data.
- exact: bool, default=True
exact=True (the default) implies the exact algorithm; exact=False implies the approximate algorithm, considering k simplices.
- k: float or int, default=0.05
Number (k > 1) or portion (if 0 < k < 1) of simplices that are considered if exact=False. If k > 1, then the algorithmic complexity is polynomial in d but is independent of the number of observations in data, given k. If 0 < k < 1, then the algorithmic complexity is exponential in the number of observations in data, but the calculation precision stays approximately the same.
- mah_estimate: str, {"moment", "mcd"}, default="moment"
A character string specifying which estimates to use when calculating the Mahalanobis depth; can be 'moment' or 'MCD', determining whether traditional moment or Minimum Covariance Determinant (MCD) estimates for mean and covariance are used.
- mah_parMcd: float, default=0.75
The value of the argument alpha for the function covMcd; used when mah_estimate='MCD'.
- evaluate_dataset: bool, default=False
Determines whether the loaded dataset itself is evaluated. If True, x is automatically set to the dataset.
- References
Oja, H. (1983). Descriptive statistics for multivariate distributions. Statistics and Probability Letters, 1, 327–332.
- Examples
>>> import numpy as np
>>> from depth.model import DepthEucl
>>> mat1=[[1, 0, 0],[0, 2, 0],[0, 0, 1]]
>>> mat2=[[1, 0, 0],[0, 1, 0],[0, 0, 1]]
>>> x = np.random.multivariate_normal([1,1,1], mat2, 10)
>>> data = np.random.multivariate_normal([0,0,0], mat1, 20)
>>> model=DepthEucl().load_dataset(data)
>>> model.simplicialVolume(x, exact=True)
[0.45749049 0.34956166 0.2263421 0.68742137 0.94796538 0.51112415 0.85250931 0.67914988 0.79165292 0.33192247]
>>> model.simplicialVolume(x, exact=False, k=0.2)
[0.46826813 0.40138917 0.23189724 0.69025277 0.938543 0.56005713 0.8113647 0.72220103 0.82036139 0.33908597]
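One common formulation of the simplicial volume (Oja) depth is 1 / (1 + V(x) / sqrt(det S)), where V(x) is the average volume of the simplices spanned by x and d data points, and S is a scatter estimate that makes the depth affine-invariant. The sketch below enumerates all such simplices, which is only feasible for small data sets. The name oja_depth is hypothetical, d >= 2 is assumed, and the choice of the moment covariance for S is an assumption that loosely corresponds to mah_estimate='moment'.

```python
import itertools
import math
import numpy as np

def oja_depth(x, data):
    """Exact simplicial volume depth by full enumeration (sketch, d >= 2)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    # All subsets of d data points; each forms a simplex together with x.
    combos = np.array(list(itertools.combinations(range(n), d)))
    vertices = data[combos]                                # (C, d, d)
    # Assumption: scale by the square root of det of the sample covariance.
    scale = math.sqrt(np.linalg.det(np.cov(data, rowvar=False)))
    depths = np.empty(len(x))
    for j, point in enumerate(x):
        # Volume of a simplex = |det(edge vectors from x)| / d!
        volumes = np.abs(np.linalg.det(vertices - point)) / math.factorial(d)
        depths[j] = 1.0 / (1.0 + volumes.mean() / scale)
    return depths
```

Replacing the full enumeration by a random subset of the combinations gives the kind of approximation that exact=False with k simplices refers to.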
- spatial(x: ndarray | None = None, mah_estimate: str = 'moment', mah_parMcd: float = 0.75, evaluate_dataset: bool = False) ndarray [source]
Calculates the spatial depth of points w.r.t. a multivariate data set.
- Arguments
- x: array-like or None, default=None
Matrix of objects (numerical vector as one object) whose depth is to be calculated; each row contains a d-variate point. Should have the same dimension as data.
- mah_estimate: str, {"moment", "mcd"}, default="moment"
A character string specifying which estimates to use when calculating the Mahalanobis depth; can be 'moment' or 'MCD', determining whether traditional moment or Minimum Covariance Determinant (MCD) estimates for mean and covariance are used.
- mah_parMcd: float, default=0.75
The value of the argument alpha for the function covMcd; used when mah_estimate='MCD'.
- evaluate_dataset: bool, default=False
Determines whether the loaded dataset itself is evaluated. If True, x is automatically set to the dataset.
- References
Serfling, R. (2002). A depth function and a scale curve based on spatial quantiles. In Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods, Statistics in Industry and Technology, Birkhäuser, Basel, 25–38.
- Examples
>>> import numpy as np
>>> from depth.model import DepthEucl
>>> mat1=[[1, 0, 0, 0, 0],[0, 2, 0, 0, 0],[0, 0, 3, 0, 0],[0, 0, 0, 2, 0],[0, 0, 0, 0, 1]]
>>> mat2=[[1, 0, 0, 0, 0],[0, 1, 0, 0, 0],[0, 0, 1, 0, 0],[0, 0, 0, 1, 0],[0, 0, 0, 0, 1]]
>>> x = np.random.multivariate_normal([1,1,1,1,1], mat2, 10)
>>> data = np.random.multivariate_normal([0,0,0,0,0], mat1, 1000)
>>> model=DepthEucl().load_dataset(data)
>>> model.spatial(x, )
[0.22548919451212823, 0.14038895785356165, 0.2745517635029123, 0.35450156620496354, 0.42373722245348566, 0.34562025044812095, 0.3585616673301636, 0.16916309940691643, 0.573349631625784, 0.32017213635679687]
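A common definition of the spatial depth is one minus the norm of the average unit vector pointing from the data points to x, computed after whitening the data with a scatter estimate. The sketch below follows that definition with the moment covariance. The function name spatial_depth is hypothetical, and the whitening via a Cholesky factor is one of several equivalent choices rather than a statement about how the library does it.

```python
import numpy as np

def spatial_depth(x, data):
    """Affine-invariant spatial depth with moment estimates (sketch)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    cov = np.cov(data, rowvar=False)
    # Whitening matrix W with W cov W' = I, from the Cholesky factor of cov.
    W = np.linalg.inv(np.linalg.cholesky(cov))
    depths = np.empty(len(x))
    for j, point in enumerate(x):
        diff = (point - data) @ W.T
        norms = np.linalg.norm(diff, axis=1)
        keep = norms > 0  # a data point equal to x contributes a zero vector
        units = diff[keep] / norms[keep, None]
        # Norm of the average unit vector; close to 0 for central points.
        depths[j] = 1.0 - np.linalg.norm(units.sum(axis=0) / len(data))
    return depths
```

For a point far outside the data cloud all unit vectors nearly coincide, so the average has norm close to 1 and the depth is close to 0.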
- zonoid(x: ndarray | None = None, exact: bool = True, solver='neldermead', NRandom=1000, n_refinements=10, sphcap_shrink=0.5, alpha_Dirichlet=1.25, cooling_factor=0.95, cap_size=1, start='mean', space='sphere', line_solver='goldensection', bound_gc=True, output_option: Literal['lowest_depth', 'final_depht_dir', 'all_depth', 'all_depth_directions'] = 'final_depht_dir', evaluate_dataset: bool = False) ndarray [source]
Calculates the zonoid depth of points w.r.t. a multivariate data set.
- Arguments
- x: array-like or None, default=None
Matrix of objects (numerical vector as one object) whose depth is to be calculated; each row contains a d-variate point. Should have the same dimension as data.
- solver: str {'simplegrid', 'refinedgrid', 'simplerandom', 'refinedrandom', 'coordinatedescent', 'randomsimplices', 'neldermead', 'simulatedannealing'}, default="neldermead"
The type of solver used to approximate the depth.
- NRandom: int, default=1000
The total number of iterations used to compute the depth. Some solvers converge faster, so they are run several times to reach NRandom iterations.
- n_refinements: int, default=10
The maximum number of iterations for computing the depth of one point. Used for solver='refinedrandom' or 'refinedgrid'.
- sphcap_shrink: float, default=0.5
The shrinking factor of the spherical cap. Used for solver='refinedrandom' or 'refinedgrid'.
- alpha_Dirichlet: float, default=1.25
The parameter of the Dirichlet distribution. Used for solver='randomsimplices'.
- cooling_factor: float, default=0.95
The cooling factor. Used for solver='simulatedannealing'.
- cap_size: int | float, default=1
The size of the spherical cap. Used for solver='simulatedannealing' or 'neldermead'.
- start: str {'mean', 'random'}, default='mean'
For solver='simulatedannealing' or 'neldermead', the method used to compute the first depth.
- space: str {'sphere', 'euclidean'}, default='sphere'
For solver='coordinatedescent' or 'neldermead', the type of space in which the solver runs.
- line_solver: str {'uniform', 'goldensection'}, default='goldensection'
For solver='coordinatedescent', the line search strategy used by this solver.
- bound_gc: bool, default=True
For solver='neldermead', True if the search is limited to the closed hemisphere.
- output_option: str {"lowest_depth", "final_depht_dir", "all_depth", "all_depth_directions"}, default="final_depht_dir"
Determines what is computed alongside the final depth.
If output_option="lowest_depth", only the approximated depths are returned.
If output_option="final_depht_dir", the best directions used to approximate the depths are also returned.
If output_option="all_depth", the depths calculated at every iteration are also returned.
If output_option="all_depth_directions", the random directions used to project the depths are also returned, together with the convergence indices for the selected solver.
- evaluate_dataset: bool, default=False
Determines whether the loaded dataset itself is evaluated. If True, x is automatically set to the dataset.
- References
Dyckerhoff, R., Koshevoy, G. and Mosler, K. (1996). Zonoid data depth: Theory and computation. In A. Pratt, (Ed.), COMPSTAT 1996, Proceedings in Computational Statistics, Physica-Verlag, Heidelberg, 235–240.
Koshevoy, G. and Mosler, K. (1997). Zonoid trimming for multivariate distributions. The Annals of Statistics, 25, 1998–2017.
Dyckerhoff, R., Mozharovskyi, P., and Nagy, S. (2021). Approximate computation of projection depths. Computational Statistics and Data Analysis, 157, 107166.
- Examples
>>> import numpy as np
>>> from depth.model import DepthEucl
>>> mat1=[[1, 0, 0, 0, 0],[0, 2, 0, 0, 0],[0, 0, 3, 0, 0],[0, 0, 0, 2, 0],[0, 0, 0, 0, 1]]
>>> mat2=[[1, 0, 0, 0, 0],[0, 1, 0, 0, 0],[0, 0, 1, 0, 0],[0, 0, 0, 1, 0],[0, 0, 0, 0, 1]]
>>> x = np.random.multivariate_normal([1,1,1,1,1], mat2, 10)
>>> data = np.random.multivariate_normal([0,0,0,0,0], mat1, 1000)
>>> model=DepthEucl().load_dataset(data)
>>> model.zonoid(x,)
[0. 0.00769552 0.03087017 0. 0.30945453 0.0142515 0. 0.01970896 0.02169483 0. ]
- Asymmetric projection depth
- Beta-skeleton depth
- Continuous expected convex hull depth
- Continuous modified expected convex hull depth
- Geometrical depth
- Halfspace depth
- L2-depth
- Mahalanobis depth
- Potential depth
- Projection depth
- Convex-hull-peeling depth
- Simplicial depth
- Simplicial volume depth
- Spatial depth
- Zonoid depth
- Modify dataset