STUMPY API
Have A Question?
Overview
- stumpy.stump: Compute the z-normalized matrix profile
- stumpy.stumped: Compute the z-normalized matrix profile with a dask/ray cluster
- stumpy.gpu_stump: Compute the z-normalized matrix profile with one or more GPU devices
- stumpy.mass: Compute the distance profile using the MASS algorithm
- stumpy.scrump: A class to compute an approximate z-normalized matrix profile
- stumpy.stumpi: A class to compute an incremental z-normalized matrix profile for streaming data
- stumpy.mstump: Compute the multi-dimensional z-normalized matrix profile
- stumpy.mstumped: Compute the multi-dimensional z-normalized matrix profile with a dask/ray cluster
- stumpy.subspace: Compute the k-dimensional matrix profile subspace for a given subsequence index and its nearest neighbor index
- stumpy.mdl: Compute the multi-dimensional number of bits needed to compress one multi-dimensional subsequence with another along each of the k-dimensions using the minimum description length (MDL)
- stumpy.atsc: Compute the anchored time series chain (ATSC)
- stumpy.allc: Compute the all-chain set (ALLC)
- stumpy.fluss: Compute the Fast Low-cost Unipotent Semantic Segmentation (FLUSS) for static data (i.e., batch processing)
- stumpy.floss: A class to compute the Fast Low-cost Online Semantic Segmentation (FLOSS) for streaming data
- stumpy.ostinato: Find the z-normalized consensus motif of multiple time series
- stumpy.ostinatoed: Find the z-normalized consensus motif of multiple time series with a dask/ray cluster
- stumpy.gpu_ostinato: Find the z-normalized consensus motif of multiple time series with one or more GPU devices
- stumpy.mpdist: Compute the z-normalized matrix profile distance (MPdist) measure between any two time series
- stumpy.mpdisted: Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with a dask/ray cluster
- stumpy.gpu_mpdist: Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with one or more GPU devices
- stumpy.motifs: Discover the top motifs for time series T
- stumpy.match: Find all matches of a query Q in a time series T
- stumpy.mmotifs: Discover the top motifs for the multi-dimensional time series
- stumpy.snippets: Identify the top k snippets that best represent the time series, T
- stumpy.stimp: A class to compute the Pan Matrix Profile
- stumpy.stimped: A class to compute the Pan Matrix Profile with a dask/ray cluster
- stumpy.gpu_stimp: A class to compute the Pan Matrix Profile with one or more GPU devices
stump
- stumpy.stump(T_A, m, T_B=None, ignore_trivial=True, normalize=True, p=2.0, k=1, T_A_subseq_isconstant=None, T_B_subseq_isconstant=None)
Compute the z-normalized matrix profile
This is a convenience wrapper around the Numba JIT-compiled parallelized _stump function, which computes the (top-k) matrix profile according to STOMPopt with Pearson correlations.
- Parameters:
- T_A : numpy.ndarray
The time series or sequence for which to compute the matrix profile.
- m : int
Window size.
- T_B : numpy.ndarray, default None
The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded. Default is None, which corresponds to a self-join.
- ignore_trivial : bool, default True
Set to True if this is a self-join. Otherwise, for an AB-join, set this to False.
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
- k : int, default 1
The number of top k smallest distances used to construct the matrix profile. Note that this will increase the total computational time and memory usage when k > 1. If you have access to a GPU device, then you may be able to leverage gpu_stump for better performance and scalability.
- T_A_subseq_isconstant : numpy.ndarray or function, default None
A boolean array that indicates whether a subsequence in T_A is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_A is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- T_B_subseq_isconstant : numpy.ndarray or function, default None
A boolean array that indicates whether a subsequence in T_B is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_B is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- Returns:
- out : numpy.ndarray
When k = 1 (default), the first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices. However, when k > 1, the output array will contain exactly 2 * k + 2 columns. The first k columns (i.e., out[:, :k]) consist of the top-k matrix profile, the next set of k columns (i.e., out[:, k : 2 * k]) consist of the corresponding top-k matrix profile indices, and the last two columns (i.e., out[:, 2 * k] and out[:, 2 * k + 1] or, equivalently, out[:, -2] and out[:, -1]) correspond to the top-1 left matrix profile indices and the top-1 right matrix profile indices, respectively.
For convenience, the matrix profile (distances) and matrix profile indices can also be accessed via their corresponding named array attributes, .P_ and .I_, respectively. Similarly, the corresponding left matrix profile indices and right matrix profile indices may also be accessed via the .left_I_ and .right_I_ array attributes. See examples below.
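To make the 2 * k + 2 column layout concrete, here is a minimal pure-Python sketch (no NumPy or STUMPY needed) that builds rows with the same layout from a brute-force self-join. It assumes an exclusion zone of ceil(m / 4) and no constant subsequences; the helpers `znorm_dist` and `topk_layout` are hypothetical names for illustration, not part of STUMPY.

```python
import bisect
import math


def znorm_dist(a, b):
    """z-normalized Euclidean distance between two equal-length windows.

    Uses d = sqrt(2 * m * (1 - rho)) with the Pearson correlation rho;
    assumes neither window is constant.
    """
    m = len(a)
    mu_a, mu_b = sum(a) / m, sum(b) / m
    da = [x - mu_a for x in a]
    db = [y - mu_b for y in b]
    rho = sum(x * y for x, y in zip(da, db)) / math.sqrt(
        sum(x * x for x in da) * sum(y * y for y in db)
    )
    return math.sqrt(max(0.0, 2 * m * (1.0 - rho)))


def topk_layout(T, m, k=1):
    """Rows laid out as [P_1, ..., P_k, I_1, ..., I_k, left_I, right_I]."""
    n = len(T) - m + 1
    excl_zone = math.ceil(m / 4)
    windows = [T[i:i + m] for i in range(n)]
    rows = []
    for i in range(n):
        top_d, top_j = [], []  # k smallest distances so far, sorted ascending
        left_d, left_j = math.inf, -1
        right_d, right_j = math.inf, -1
        for j in range(n):
            if abs(i - j) <= excl_zone:
                continue  # skip trivial matches inside the exclusion zone
            d = znorm_dist(windows[i], windows[j])
            pos = bisect.bisect_right(top_d, d)
            if pos < k:  # d belongs among the k smallest seen so far
                top_d.insert(pos, d)
                top_j.insert(pos, j)
                del top_d[k:], top_j[k:]
            if j < i and d < left_d:
                left_d, left_j = d, j
            if j > i and d < right_d:
                right_d, right_j = d, j
        # pad rows that have fewer than k valid neighbors
        top_d += [math.inf] * (k - len(top_d))
        top_j += [-1] * (k - len(top_j))
        rows.append(top_d + top_j + [left_j, right_j])
    return rows


T = [584.0, -11.0, 23.0, 79.0, 1001.0, 0.0, -19.0]
k = 2
out = topk_layout(T, m=3, k=k)
P = [row[:k] for row in out]        # like out[:, :k]
I = [row[k:2 * k] for row in out]   # like out[:, k : 2 * k]
left_I = [row[-2] for row in out]   # like out[:, 2 * k]
right_I = [row[-1] for row in out]  # like out[:, 2 * k + 1]
```

The last four lines mirror the NumPy slices described above on a plain list of rows. With k = 1, the rows should agree with the mp output in the Examples section up to floating-point rounding.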
See also
stumpy.stumped: Compute the z-normalized matrix profile with a dask/ray cluster
stumpy.gpu_stump: Compute the z-normalized matrix profile with one or more GPU devices
stumpy.scrump: Compute an approximate z-normalized matrix profile
Notes
DOI: 10.1007/s10115-017-1138-x
See Section 4.5
The above reference outlines a general approach for traversing the distance matrix in a diagonal fashion rather than in a row-wise fashion.
See Section 3.1 and Section 3.3
The above reference outlines the use of the Pearson correlation via Welford’s centered sum-of-products along each diagonal of the distance matrix in place of the sliding window dot product found in the original STOMP method.
See Table II
Time series, T_A, will be annotated with the distance location (or index) of all its subsequences in another time series, T_B.
Return: For every subsequence, Q, in T_A, you will get a distance and index for the closest subsequence in T_B. Thus, the array returned will have length T_A.shape[0] - m + 1. Additionally, the left and right matrix profiles are also returned.
Note: Unlike in Table II, where T_A.shape is expected to be equal to T_B.shape, this implementation is generalized so that the shapes of T_A and T_B can be different. In the case where T_A.shape == T_B.shape, our algorithm reduces down to the same algorithm found in Table II.
Additionally, unlike STAMP where the exclusion zone is m/2, the default exclusion zone for STOMP is m/4 (see Definition 3 and Figure 3).
For self-joins, set ignore_trivial = True in order to avoid the trivial match.
Note that left and right matrix profiles are only available for self-joins.
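The diagonal traversal and Pearson-correlation bookkeeping described in these notes can be sketched in pure Python. This is a simplified illustration under stated assumptions (self-join only, top-1, no constant subsequences, exclusion zone of ceil(m / 4)), not STUMPY's _stump. It updates the centered sum-of-products exactly from one diagonal cell to the next instead of recomputing it, keeps the largest correlation per row, and converts to the z-normalized distance only at the end via d = sqrt(2 * m * (1 - rho)). The function name `diagonal_self_join` is hypothetical.

```python
import math


def diagonal_self_join(T, m):
    """Top-1 self-join matrix profile computed by walking diagonals.

    Assumes no constant windows and an exclusion zone of ceil(m / 4).
    """
    n = len(T) - m + 1
    excl_zone = math.ceil(m / 4)
    mu = [sum(T[i:i + m]) / m for i in range(n)]
    # sum of squared deviations per window (equals m * sigma**2)
    ss = [sum((x - mu[i]) ** 2 for x in T[i:i + m]) for i in range(n)]
    # per-step terms for sliding a window from position i to i + 1
    df = [(T[i + m] - T[i]) / 2 for i in range(n - 1)]
    dg = [(T[i + m] - mu[i + 1]) + (T[i] - mu[i]) for i in range(n - 1)]

    rho_best = [-math.inf] * n
    I = [-1] * n
    for g in range(excl_zone + 1, n):  # diagonal offset g = j - i
        # centered sum-of-products for the first cell (0, g) of this diagonal
        cov = sum((T[t] - mu[0]) * (T[t + g] - mu[g]) for t in range(m))
        for i in range(n - g):
            j = i + g
            if i > 0:
                # exact update from cell (i - 1, j - 1) to cell (i, j)
                cov += df[i - 1] * dg[j - 1] + df[j - 1] * dg[i - 1]
            rho = cov / math.sqrt(ss[i] * ss[j])
            # the self-join distance matrix is symmetric: update both rows
            if rho > rho_best[i]:
                rho_best[i], I[i] = rho, j
            if rho > rho_best[j]:
                rho_best[j], I[j] = rho, i
    P = [
        math.sqrt(max(0.0, 2 * m * (1.0 - r))) if I[idx] != -1 else math.inf
        for idx, r in enumerate(rho_best)
    ]
    return P, I


P, I = diagonal_self_join([584.0, -11.0, 23.0, 79.0, 1001.0, 0.0, -19.0], m=3)
```

Because each diagonal cell updates two rows at once, only the diagonals beyond the exclusion zone on one side of the main diagonal need to be visited. On this input, P and I are expected to match mp.P_ and mp.I_ from the example below up to rounding.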
Examples
>>> import stumpy
>>> import numpy as np
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> mp
mparray([[0.11633857113691416, 4, -1, 4],
         [2.694073918063438, 3, -1, 3],
         [3.0000926340485923, 0, 0, 4],
         [2.694073918063438, 1, 1, -1],
         [0.11633857113691416, 0, 0, -1]], dtype=object)
>>>
>>> mp.P_
mparray([0.11633857, 2.69407392, 3.00009263, 2.69407392, 0.11633857])
>>> mp.I_
mparray([4, 3, 0, 1, 0])
stumped
- stumpy.stumped(client, T_A, m, T_B=None, ignore_trivial=True, normalize=True, p=2.0, k=1, T_A_subseq_isconstant=None, T_B_subseq_isconstant=None)
Compute the z-normalized matrix profile with a dask/ray cluster
This is a highly distributed implementation around the Numba JIT-compiled parallelized _stump function, which computes the (top-k) matrix profile according to STOMPopt with Pearson correlations.
- Parameters:
- client : client
A dask/ray client. Setting up a cluster is beyond the scope of this library. Please refer to the dask/ray documentation.
- T_A : numpy.ndarray
The time series or sequence for which to compute the matrix profile.
- m : int
Window size.
- T_B : numpy.ndarray, default None
The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded. Default is None, which corresponds to a self-join.
- ignore_trivial : bool, default True
Set to True if this is a self-join. Otherwise, for an AB-join, set this to False.
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
- k : int, default 1
The number of top k smallest distances used to construct the matrix profile. Note that this will increase the total computational time and memory usage when k > 1. If you have access to a GPU device, then you may be able to leverage gpu_stump for better performance and scalability.
- T_A_subseq_isconstant : numpy.ndarray or function, default None
A boolean array that indicates whether a subsequence in T_A is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_A is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- T_B_subseq_isconstant : numpy.ndarray or function, default None
A boolean array that indicates whether a subsequence in T_B is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_B is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- Returns:
- out : numpy.ndarray
When k = 1 (default), the first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices. However, when k > 1, the output array will contain exactly 2 * k + 2 columns. The first k columns (i.e., out[:, :k]) consist of the top-k matrix profile, the next set of k columns (i.e., out[:, k : 2 * k]) consist of the corresponding top-k matrix profile indices, and the last two columns (i.e., out[:, 2 * k] and out[:, 2 * k + 1] or, equivalently, out[:, -2] and out[:, -1]) correspond to the top-1 left matrix profile indices and the top-1 right matrix profile indices, respectively.
For convenience, the matrix profile (distances) and matrix profile indices can also be accessed via their corresponding named array attributes, .P_ and .I_, respectively. Similarly, the corresponding left matrix profile indices and right matrix profile indices may also be accessed via the .left_I_ and .right_I_ array attributes. See examples below.
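As an illustration of the user-defined function form of T_A_subseq_isconstant and T_B_subseq_isconstant, here is a hypothetical rule that flags a window as constant when its range is within a tolerance, with the extra tolerance argument curried via functools.partial so that the callable only takes (a, w). The name `isconstant_within_tol` and its atol parameter are assumptions, not part of STUMPY. The sketch is pure Python and returns a list; for use with STUMPY you would presumably return a NumPy boolean array instead (e.g., by wrapping the result with np.asarray). It also assumes finite input, since windows containing np.nan/np.inf are set to False automatically, as stated above.

```python
import functools


def isconstant_within_tol(a, w, atol=1e-8):
    """Hypothetical rule: flag each length-w window of a whose range is <= atol."""
    return [
        max(a[i:i + w]) - min(a[i:i + w]) <= atol for i in range(len(a) - w + 1)
    ]


# Curry the extra argument so that the callable only takes (a, w)
is_flat = functools.partial(isconstant_within_tol, atol=0.5)
flags = is_flat([1.0, 1.0, 1.2, 5.0, 5.0, 5.0], 3)  # [True, False, False, True]
```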
See also
stumpy.stump: Compute the z-normalized matrix profile
stumpy.gpu_stump: Compute the z-normalized matrix profile with one or more GPU devices
stumpy.scrump: Compute an approximate z-normalized matrix profile
Notes
DOI: 10.1007/s10115-017-1138-x
See Section 4.5
The above reference outlines a general approach for traversing the distance matrix in a diagonal fashion rather than in a row-wise fashion.
See Section 3.1 and Section 3.3
The above reference outlines the use of the Pearson correlation via Welford’s centered sum-of-products along each diagonal of the distance matrix in place of the sliding window dot product found in the original STOMP method.
See Table II
This is a dask/ray implementation of stump that scales across multiple servers and is a convenience wrapper around the parallelized stump._stump function.
Time series, T_A, will be annotated with the distance location (or index) of all its subsequences in another time series, T_B.
Return: For every subsequence, Q, in T_A, you will get a distance and index for the closest subsequence in T_B. Thus, the array returned will have length T_A.shape[0] - m + 1. Additionally, the left and right matrix profiles are also returned.
Note: Unlike in Table II, where T_A.shape is expected to be equal to T_B.shape, this implementation is generalized so that the shapes of T_A and T_B can be different. In the case where T_A.shape == T_B.shape, our algorithm reduces down to the same algorithm found in Table II.
Additionally, unlike STAMP where the exclusion zone is m/2, the default exclusion zone for STOMP is m/4 (see Definition 3 and Figure 3).
For self-joins, set ignore_trivial = True in order to avoid the trivial match.
Note that left and right matrix profiles are only available for self-joins.
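One way to picture a distributed matrix profile is to split the diagonals of the distance matrix into chunks, compute a partial matrix profile per chunk, and merge the partial results with an element-wise minimum. The sketch below uses concurrent.futures.ThreadPoolExecutor purely as a local stand-in for a dask/ray cluster; it illustrates this split-and-merge pattern under simplifying assumptions (self-join, top-1, no constant subsequences, exclusion zone of ceil(m / 4)) and is not stumped's actual scheduling code. All function names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def znorm_dist(a, b):
    """z-normalized Euclidean distance; assumes neither window is constant."""
    m = len(a)
    mu_a, mu_b = sum(a) / m, sum(b) / m
    da = [x - mu_a for x in a]
    db = [y - mu_b for y in b]
    rho = sum(x * y for x, y in zip(da, db)) / math.sqrt(
        sum(x * x for x in da) * sum(y * y for y in db)
    )
    return math.sqrt(max(0.0, 2 * m * (1.0 - rho)))


def partial_profile(T, m, diagonals):
    """Top-1 self-join profile restricted to the given diagonal offsets."""
    n = len(T) - m + 1
    P, I = [math.inf] * n, [-1] * n
    for g in diagonals:
        for i in range(n - g):
            j = i + g
            d = znorm_dist(T[i:i + m], T[j:j + m])
            # symmetric distance matrix: one cell updates rows i and j
            if d < P[i]:
                P[i], I[i] = d, j
            if d < P[j]:
                P[j], I[j] = d, i
    return P, I


def split_and_merge(T, m, n_workers=2):
    n = len(T) - m + 1
    excl_zone = math.ceil(m / 4)
    diagonals = list(range(excl_zone + 1, n))
    # round-robin assignment of diagonals to workers
    chunks = [diagonals[w::n_workers] for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda c: partial_profile(T, m, c), chunks))
    # merge the partial results with an element-wise minimum
    P, I = [math.inf] * n, [-1] * n
    for part_P, part_I in parts:
        for idx in range(n):
            if part_P[idx] < P[idx]:
                P[idx], I[idx] = part_P[idx], part_I[idx]
    return P, I


P, I = split_and_merge([584.0, -11.0, 23.0, 79.0, 1001.0, 0.0, -19.0], m=3)
```

Since every diagonal is owned by exactly one chunk, the merged result is the same as a single-process computation; only the work is divided.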
Examples
>>> import stumpy
>>> import numpy as np
>>> from dask.distributed import Client
>>> if __name__ == "__main__":
...     with Client() as dask_client:
...         mp = stumpy.stumped(
...             dask_client,
...             np.array([584., -11., 23., 79., 1001., 0., -19.]),
...             m=3)
>>> mp
mparray([[0.11633857113691416, 4, -1, 4],
         [2.694073918063438, 3, -1, 3],
         [3.0000926340485923, 0, 0, 4],
         [2.694073918063438, 1, 1, -1],
         [0.11633857113691416, 0, 0, -1]], dtype=object)
>>>
>>> mp.P_
mparray([0.11633857, 2.69407392, 3.00009263, 2.69407392, 0.11633857])
>>> mp.I_
mparray([4, 3, 0, 1, 0])
Alternatively, you can also use ray:
>>> import ray
>>> if __name__ == "__main__":
...     ray.init()
...     mp = stumpy.stumped(
...         ray,
...         np.array([584., -11., 23., 79., 1001., 0., -19.]),
...         m=3)
...     ray.shutdown()
gpu_stump
- stumpy.gpu_stump(T_A, m, T_B=None, ignore_trivial=True, device_id=0, normalize=True, p=2.0, k=1, T_A_subseq_isconstant=None, T_B_subseq_isconstant=None)
Compute the z-normalized matrix profile with one or more GPU devices
This is a convenience wrapper around the Numba cuda.jit _gpu_stump function, which computes the matrix profile according to GPU-STOMP. The default number of threads-per-block is set to 512 and may be changed by setting the global parameter config.STUMPY_THREADS_PER_BLOCK to an appropriate number based on your GPU hardware.
- Parameters:
- T_A : numpy.ndarray
The time series or sequence for which to compute the matrix profile.
- m : int
Window size.
- T_B : numpy.ndarray, default None
The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded. Default is None, which corresponds to a self-join.
- ignore_trivial : bool, default True
Set to True if this is a self-join. Otherwise, for an AB-join, set this to False.
- device_id : int or list, default 0
The (GPU) device number to use. The default value is 0. A list of valid device ids (int) may also be provided for parallel GPU-STUMP computation. A list of all valid device ids can be obtained by executing [device.id for device in numba.cuda.list_devices()].
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
- k : int, default 1
The number of top k smallest distances used to construct the matrix profile. Note that this will increase the total computational time and memory usage when k > 1.
- T_A_subseq_isconstant : numpy.ndarray or function, default None
A boolean array that indicates whether a subsequence in T_A is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_A is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- T_B_subseq_isconstant : numpy.ndarray or function, default None
A boolean array that indicates whether a subsequence in T_B is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_B is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- Returns:
- out : numpy.ndarray
When k = 1 (default), the first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices. However, when k > 1, the output array will contain exactly 2 * k + 2 columns. The first k columns (i.e., out[:, :k]) consist of the top-k matrix profile, the next set of k columns (i.e., out[:, k : 2 * k]) consist of the corresponding top-k matrix profile indices, and the last two columns (i.e., out[:, 2 * k] and out[:, 2 * k + 1] or, equivalently, out[:, -2] and out[:, -1]) correspond to the top-1 left matrix profile indices and the top-1 right matrix profile indices, respectively.
For convenience, the matrix profile (distances) and matrix profile indices can also be accessed via their corresponding named array attributes, .P_ and .I_, respectively. Similarly, the corresponding left matrix profile indices and right matrix profile indices may also be accessed via the .left_I_ and .right_I_ array attributes. See examples below.
See also
stumpy.stump: Compute the z-normalized matrix profile
stumpy.stumped: Compute the z-normalized matrix profile with a dask/ray cluster
stumpy.scrump: Compute an approximate z-normalized matrix profile
Notes
See Table II, Figure 5, and Figure 6
Time series, T_A, will be annotated with the distance location (or index) of all its subsequences in another time series, T_B.
Return: For every subsequence, Q, in T_A, you will get a distance and index for the closest subsequence in T_B. Thus, the array returned will have length T_A.shape[0] - m + 1. Additionally, the left and right matrix profiles are also returned.
Note: Unlike in Table II, where T_A.shape is expected to be equal to T_B.shape, this implementation is generalized so that the shapes of T_A and T_B can be different. In the case where T_A.shape == T_B.shape, our algorithm reduces down to the same algorithm found in Table II.
Additionally, unlike STAMP where the exclusion zone is m/2, the default exclusion zone for STOMP is m/4 (see Definition 3 and Figure 3).
For self-joins, set ignore_trivial = True in order to avoid the trivial match.
Note that left and right matrix profiles are only available for self-joins.
Examples
>>> import stumpy
>>> import numpy as np
>>> from numba import cuda
>>> if __name__ == "__main__":
...     all_gpu_devices = [device.id for device in cuda.list_devices()]
...     mp = stumpy.gpu_stump(
...         np.array([584., -11., 23., 79., 1001., 0., -19.]),
...         m=3,
...         device_id=all_gpu_devices)
>>> mp
mparray([[0.11633857113691416, 4, -1, 4],
         [2.694073918063438, 3, -1, 3],
         [3.0000926340485923, 0, 0, 4],
         [2.694073918063438, 1, 1, -1],
         [0.11633857113691416, 0, 0, -1]], dtype=object)
>>>
>>> mp.P_
mparray([0.11633857, 2.69407392, 3.00009263, 2.69407392, 0.11633857])
>>> mp.I_
mparray([4, 3, 0, 1, 0])
mass
- stumpy.mass(Q, T, M_T=None, Σ_T=None, normalize=True, p=2.0, T_subseq_isfinite=None, T_subseq_isconstant=None, Q_subseq_isconstant=None, query_idx=None)
Compute the distance profile using the MASS algorithm
This is a convenience wrapper around the Numba JIT-compiled _mass function.
- Parameters:
- Q : numpy.ndarray
Query array or subsequence.
- T : numpy.ndarray
Time series or sequence.
- M_T : numpy.ndarray, default None
Sliding mean of T.
- Σ_T : numpy.ndarray, default None
Sliding standard deviation of T.
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. This parameter is ignored when normalize == True.
- T_subseq_isfinite : numpy.ndarray, default None
A boolean array that indicates whether a subsequence in T contains a np.nan/np.inf value (False). This parameter is ignored when normalize == True.
- T_subseq_isconstant : numpy.ndarray or function, default None
A boolean array that indicates whether a subsequence in T is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- Q_subseq_isconstant : numpy.ndarray or function, default None
A boolean array that indicates whether the subsequence in Q is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether the subsequence in Q is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- query_idx : int, default None
This is the index position along the time series, T, where the query subsequence, Q, is located. query_idx should be set to None if Q is not a subsequence of T. If Q is a subsequence of T, providing this argument is optional. If query_idx is provided, the distance between Q and T[query_idx : query_idx + m] will automatically be set to zero.
- Returns:
- distance_profile : numpy.ndarray
Distance profile.
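When normalize=False, the distance is a plain Minkowski p-norm between Q and each window of T, with p = 1 and p = 2 giving the Manhattan and Euclidean distances. Below is a minimal pure-Python sketch of such a non-normalized distance profile; `minkowski_distance_profile` is a hypothetical helper that ignores T_subseq_isfinite and is not the code path STUMPY re-routes to.

```python
def minkowski_distance_profile(Q, T, p=2.0):
    """Non-normalized distance profile: the p-norm between Q and every window of T."""
    m = len(Q)
    return [
        sum(abs(q - t) ** p for q, t in zip(Q, T[i:i + m])) ** (1.0 / p)
        for i in range(len(T) - m + 1)
    ]


Q = [0.0, 0.0]
T = [3.0, 4.0, 0.0]
euclidean = minkowski_distance_profile(Q, T, p=2.0)  # [5.0, 4.0]
manhattan = minkowski_distance_profile(Q, T, p=1.0)  # [7.0, 4.0]
```

No z-normalization happens here, so shifting or scaling T changes these distances, unlike the default normalize=True behaviour.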
See also
stumpy.motifs: Discover the top motifs for time series T
stumpy.match: Find all matches of a query Q in a time series T
Notes
See Table II
Note that Q and T are not directly required to calculate D.
Note: Unlike the Matrix Profile I paper, here, M_T and Σ_T can be calculated once for all subsequences of T and passed in, so the redundancy is removed.
Examples
>>> import stumpy
>>> import numpy as np
>>> stumpy.mass(
...     np.array([-11.1, 23.4, 79.5, 1001.0]),
...     np.array([584., -11., 23., 79., 1001., 0., -19.]))
array([3.18792463e+00, 1.11297393e-03, 3.23874018e+00, 3.34470195e+00])
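The note about M_T and Σ_T can be illustrated with a sketch that computes the sliding statistics once and reuses them for any number of queries. It uses rho_i = (QT_i - m * mu_Q * M_T[i]) / (m * sigma_Q * Σ_T[i]) and d_i = sqrt(2 * m * (1 - rho_i)), where QT_i is the sliding dot product and the standard deviations are population ones. Unlike MASS, which obtains QT with the FFT, this pure-Python version computes QT directly in O(nm) time, so it only illustrates the arithmetic. The function names are hypothetical, `S_T` stands in for Σ_T, and constant windows are assumed absent.

```python
import math


def sliding_mean_std(T, m):
    """Sliding mean and population standard deviation of every length-m window."""
    n = len(T) - m + 1
    M_T = [sum(T[i:i + m]) / m for i in range(n)]
    S_T = [
        math.sqrt(sum((x - M_T[i]) ** 2 for x in T[i:i + m]) / m)
        for i in range(n)
    ]
    return M_T, S_T


def distance_profile(Q, T, M_T, S_T, query_idx=None):
    """z-normalized distance profile of Q against T, reusing M_T and S_T.

    QT is computed directly here, whereas MASS uses the FFT.
    Assumes no constant windows.
    """
    m = len(Q)
    mu_Q = sum(Q) / m
    sigma_Q = math.sqrt(sum((q - mu_Q) ** 2 for q in Q) / m)
    D = []
    for i in range(len(T) - m + 1):
        QT = sum(q * t for q, t in zip(Q, T[i:i + m]))  # sliding dot product
        rho = (QT - m * mu_Q * M_T[i]) / (m * sigma_Q * S_T[i])
        D.append(math.sqrt(max(0.0, 2 * m * (1.0 - rho))))
    if query_idx is not None:
        D[query_idx] = 0.0  # exact self-match of Q inside T
    return D


T = [584.0, -11.0, 23.0, 79.0, 1001.0, 0.0, -19.0]
M_T, S_T = sliding_mean_std(T, 4)  # computed once
D1 = distance_profile([-11.1, 23.4, 79.5, 1001.0], T, M_T, S_T)
D2 = distance_profile(T[2:6], T, M_T, S_T, query_idx=2)  # reuses M_T, S_T
```

With the inputs from the example above, D1 should agree with the printed distance profile up to floating-point rounding, and passing query_idx zeroes the self-match as described for that parameter.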
scrump
- stumpy.scrump(T_A, m, T_B=None, ignore_trivial=True, percentage=0.01, pre_scrump=False, s=None, normalize=True, p=2.0, k=1, T_A_subseq_isconstant=None, T_B_subseq_isconstant=None)
A class to compute an approximate z-normalized matrix profile
This is a convenience wrapper around the Numba JIT-compiled parallelized _stump function, which computes the matrix profile according to SCRIMP.
- Parameters:
- T_A : numpy.ndarray
The time series or sequence for which to compute the matrix profile.
- m : int
Window size.
- T_B : numpy.ndarray, default None
The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded.
- ignore_trivial : bool, default True
Set to True if this is a self-join. Otherwise, for an AB-join, set this to False.
- percentage : float, default 0.01
Approximate percentage completed. The value is between 0.0 and 1.0.
- pre_scrump : bool, default False
A flag for whether or not to perform the PreSCRIMP calculation prior to computing SCRIMP. If set to True, this is equivalent to computing SCRIMP++ and may lead to faster convergence.
- s : int, default None
The size of the PreSCRIMP fixed interval. If pre_scrump = True and s = None, then s will automatically be set to s = int(np.ceil(m / config.STUMPY_EXCL_ZONE_DENOM)), which is the size of the exclusion zone.
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this class gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized class decorator.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
- k : int, default 1
The number of top k smallest distances used to construct the matrix profile. Note that this will increase the total computational time and memory usage when k > 1.
- T_A_subseq_isconstant : numpy.ndarray or function, default None
A boolean array that indicates whether a subsequence in T_A is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_A is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- T_B_subseq_isconstant : numpy.ndarray or function, default None
A boolean array that indicates whether a subsequence in T_B is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_B is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- Attributes:
P_ : numpy.ndarray
Get the updated (top-k) matrix profile.
I_ : numpy.ndarray
Get the updated (top-k) matrix profile indices.
left_I_ : numpy.ndarray
Get the updated left (top-1) matrix profile indices.
right_I_ : numpy.ndarray
Get the updated right (top-1) matrix profile indices.
Methods
update()
Update the matrix profile and the matrix profile indices by computing additional new distances (limited by percentage) that make up the full distance matrix. It updates the (top-k) matrix profile, (top-1) left matrix profile, (top-1) right matrix profile, (top-k) matrix profile indices, (top-1) left matrix profile indices, and (top-1) right matrix profile indices.
See also
stumpy.stump: Compute the z-normalized matrix profile
stumpy.stumped: Compute the z-normalized matrix profile with a dask/ray cluster
stumpy.gpu_stump: Compute the z-normalized matrix profile with one or more GPU devices
Notes
See Algorithm 1 and Algorithm 2
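To illustrate the anytime behaviour of scrump and update(), here is a simplified, hypothetical class that shuffles the diagonals beyond the exclusion zone and evaluates only a slice of them on each update() call. Every finite value in P_ is an upper bound on the exact matrix profile, and the result becomes exact once all diagonals have been visited. Treating percentage as a fraction of diagonals (rather than of pairwise distances), the fixed random seed, and the class name `ApproxProfile` are all assumptions; the sketch omits PreSCRIMP, k > 1, left/right indices, and constant-subsequence handling.

```python
import math
import random


def znorm_dist(a, b):
    """z-normalized Euclidean distance; assumes neither window is constant."""
    m = len(a)
    mu_a, mu_b = sum(a) / m, sum(b) / m
    da = [x - mu_a for x in a]
    db = [y - mu_b for y in b]
    rho = sum(x * y for x, y in zip(da, db)) / math.sqrt(
        sum(x * x for x in da) * sum(y * y for y in db)
    )
    return math.sqrt(max(0.0, 2 * m * (1.0 - rho)))


class ApproxProfile:
    """Hypothetical anytime self-join profile in the spirit of SCRIMP."""

    def __init__(self, T, m, percentage=0.01, seed=0):
        self.T, self.m = list(T), m
        n = len(self.T) - m + 1
        excl_zone = math.ceil(m / 4)
        self._diags = list(range(excl_zone + 1, n))
        random.Random(seed).shuffle(self._diags)  # random diagonal order
        # diagonals per update() call (assumed interpretation of percentage)
        self._chunk = max(1, math.ceil(percentage * len(self._diags)))
        self._next = 0
        self.P_ = [math.inf] * n
        self.I_ = [-1] * n

    def update(self):
        stop = min(self._next + self._chunk, len(self._diags))
        for g in self._diags[self._next:stop]:
            for i in range(len(self.P_) - g):
                j = i + g
                d = znorm_dist(self.T[i:i + self.m], self.T[j:j + self.m])
                if d < self.P_[i]:
                    self.P_[i], self.I_[i] = d, j
                if d < self.P_[j]:
                    self.P_[j], self.I_[j] = d, i
        self._next = stop


T = [584.0, -11.0, 23.0, 79.0, 1001.0, 0.0, -19.0]
approx = ApproxProfile(T, m=3, percentage=0.34)
approx.update()  # first slice of diagonals: an upper bound on the exact profile
approx.update()  # all diagonals visited: exact for this tiny input
```

Evaluating whole diagonals in random order spreads the early work across every row, which is why even a small percentage tends to give a usable first approximation.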
Examples
>>> import stumpy
>>> import numpy as np
>>> approx_mp = stumpy.scrump(
...     np.array([584., -11., 23., 79., 1001., 0., -19.]),
...     m=3)
>>> approx_mp.update()
>>> approx_mp.P_
array([2.982409 , 3.28412702, inf, 2.982409 , 3.28412702])
>>> approx_mp.I_
array([ 3, 4, -1, 0, 1])
stumpi
- stumpy.stumpi(T, m, egress=True, normalize=True, p=2.0, k=1, mp=None, T_subseq_isconstant_func=None)
A class to compute an incremental z-normalized matrix profile for streaming data
This is based on the on-line STOMPI and STAMPI algorithms.
- Parameters:
- T : numpy.ndarray
The time series or sequence for which the matrix profile and matrix profile indices will be returned.
- m : int
Window size.
- egress : bool, default True
If set to True, the oldest data point in the time series is removed and the time series length remains constant rather than forever increasing.
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this class gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized class decorator.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. This parameter is ignored when normalize == True.
- k : int, default 1
The number of top k smallest distances used to construct the matrix profile. Note that this will increase the total computational time and memory usage when k > 1.
- mp : numpy.ndarray, default None
A pre-computed matrix profile (and corresponding matrix profile indices). This is a 2D array of shape (len(T) - m + 1, 2 * k + 2), where the first k columns are the top-k matrix profile and the next k columns are their corresponding indices. The last two columns correspond to the top-1 left and top-1 right matrix profile indices. When None (default), this array is computed internally using stumpy.stump.
- T_subseq_isconstant_func : function, default None
A custom, user-defined function that returns a boolean array that indicates whether a subsequence in T is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- Attributes:
P_ : numpy.ndarray
Get the (top-k) matrix profile.
I_ : numpy.ndarray
Get the (top-k) matrix profile indices.
left_P_ : numpy.ndarray
Get the (top-1) left matrix profile.
left_I_ : numpy.ndarray
Get the (top-1) left matrix profile indices.
T_ : numpy.ndarray
Get the time series.
Methods
update(t)
Append a single new data point, t, to the time series, T, and update the matrix profile.
Notes
DOI: 10.1007/s10618-017-0519-9
See Table V
Note that line 11 is missing an important sqrt operation!
Examples
>>> import stumpy
>>> import numpy as np
>>> stream = stumpy.stumpi(
...     np.array([584., -11., 23., 79., 1001., 0.]),
...     m=3)
>>> stream.update(-19.0)
>>> stream.left_P_
array([ inf, 3.00009263, 2.69407392, 3.05656417])
>>> stream.left_I_
array([-1, 0, 1, 2])
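The incremental idea behind update(t) can be sketched naively: each new point completes exactly one new window, so only that window's distances to the earlier windows need to be computed. Its best match becomes its left matrix profile value (there are no windows to its right yet), and any earlier window for which the new window is closer gets its profile value and index replaced. The class below is a hypothetical pure-Python illustration of a growing series (comparable to egress=False); it computes each distance from scratch rather than with the faster incremental updates of STOMPI, and it assumes no constant subsequences and an exclusion zone of ceil(m / 4).

```python
import math


def znorm_dist(a, b):
    """z-normalized Euclidean distance; assumes neither window is constant."""
    m = len(a)
    mu_a, mu_b = sum(a) / m, sum(b) / m
    da = [x - mu_a for x in a]
    db = [y - mu_b for y in b]
    rho = sum(x * y for x, y in zip(da, db)) / math.sqrt(
        sum(x * x for x in da) * sum(y * y for y in db)
    )
    return math.sqrt(max(0.0, 2 * m * (1.0 - rho)))


class IncrementalProfile:
    """Hypothetical naive incremental top-1 self-join profile (no egress)."""

    def __init__(self, m):
        self.m = m
        self.excl_zone = math.ceil(m / 4)
        self.T_ = []
        self.P_, self.I_ = [], []
        self.left_P_, self.left_I_ = [], []

    def update(self, t):
        self.T_.append(t)
        new = len(self.T_) - self.m  # index of the newly completed window
        if new < 0:
            return  # not enough points for a full window yet
        q = self.T_[new:new + self.m]
        best_d, best_j = math.inf, -1
        for j in range(new - self.excl_zone):  # earlier, non-trivial windows
            d = znorm_dist(q, self.T_[j:j + self.m])
            if d < best_d:
                best_d, best_j = d, j
            if d < self.P_[j]:  # the new window may be an older window's neighbor
                self.P_[j], self.I_[j] = d, new
        # the newest window has no right neighbors yet, so P equals left P
        self.P_.append(best_d)
        self.I_.append(best_j)
        self.left_P_.append(best_d)
        self.left_I_.append(best_j)


stream = IncrementalProfile(m=3)
for t in [584.0, -11.0, 23.0, 79.0, 1001.0, 0.0, -19.0]:
    stream.update(t)
```

After the last point, P_ and I_ should match a full self-join of the same series (compare the stump example), and left_P_ and left_I_ never change once a window's entry has been appended.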
mstump
- stumpy.mstump(T, m, include=None, discords=False, normalize=True, p=2.0, T_subseq_isconstant=None)
Compute the multi-dimensional z-normalized matrix profile
This is a convenience wrapper around the Numba JIT-compiled parallelized _mstump function, which computes the multi-dimensional matrix profile and multi-dimensional matrix profile index according to mSTOMP, a variant of mSTAMP. Note that only self-joins are supported.
- Parameters:
- T : numpy.ndarray
The time series or sequence for which to compute the multi-dimensional matrix profile. Each row in T represents data from the same dimension, while each column in T represents a different point in time (i.e., T has one row per dimension).
- m : int
Window size.
- include : list, numpy.ndarray, default None
A list of (zero-based) indices corresponding to the dimensions in T that must be included in the constrained multidimensional motif search. For more information, see Section IV D in:
- discords : bool, default False
When set to True, this reverses the distance matrix, which results in a multi-dimensional matrix profile that favors larger matrix profile values (i.e., discords) rather than smaller values (i.e., motifs). Note that indices in include are still maintained and respected.
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
- T_subseq_isconstant : numpy.ndarray, function, or list, default None
A parameter that is used to show whether a subsequence of a time series in T is constant (True) or not. T_subseq_isconstant can be a 2D boolean numpy.ndarray or a function that can be applied to each time series in T. Alternatively, for maximum flexibility, a list (with length equal to the total number of time series) may also be used. In this case, T_subseq_isconstant[i] corresponds to the i-th time series T[i] and each element in the list can either be a 1D boolean numpy.ndarray, a function, or None.
- Returns:
- P : numpy.ndarray
The multi-dimensional matrix profile. Each row of the array corresponds to each matrix profile for a given dimension (i.e., the first row is the 1-D matrix profile and the second row is the 2-D matrix profile).
- I : numpy.ndarray
The multi-dimensional matrix profile index where each row of the array corresponds to each matrix profile index for a given dimension.
See also
stumpy.mstumped : Compute the multi-dimensional z-normalized matrix profile with a dask/ray cluster
stumpy.subspace : Compute the k-dimensional matrix profile subspace for a given subsequence index and its nearest neighbor index
stumpy.mdl : Compute the number of bits needed to compress one array with another using the minimum description length (MDL)
Notes
See mSTAMP Algorithm
Examples
>>> import stumpy
>>> import numpy as np
>>> stumpy.mstump(
...     np.array([[584., -11., 23., 79., 1001., 0., -19.],
...               [  1.,   2., 4.,  8.,   16., 0.,  32.]]),
...     m=3)
(array([[0.        , 1.43947142, 0.        , 2.69407392, 0.11633857],
        [0.777905  , 2.36179922, 1.50004632, 2.92246722, 0.777905  ]]),
 array([[2, 4, 0, 1, 0],
        [4, 4, 0, 1, 0]]))
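The row structure of P can be made concrete with a brute-force, pure-Python sketch of an mSTAMP-style aggregation. This is an illustration, not STUMPY's _mstump. For every candidate neighbor, the per-dimension z-normalized distances are sorted and cumulatively averaged, and row k keeps the smallest average over the k + 1 closest dimensions. Two details are assumptions of this sketch: the ceil(m / 4) trivial-match exclusion zone, and the constant-subsequence convention used in znorm_dist.

```python
import math


def znorm_dist(x, y):
    """z-normalized Euclidean distance between two equal-length windows.

    Assumed convention: two constant windows are 0 apart, and a constant
    window is sqrt(m) away from any non-constant one.
    """
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / m)  # population std
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / m)
    if sx == 0.0 and sy == 0.0:
        return 0.0
    if sx == 0.0 or sy == 0.0:
        return math.sqrt(m)
    return math.sqrt(sum(((a - mx) / sx - (b - my) / sy) ** 2 for a, b in zip(x, y)))


def mstump_sketch(T, m):
    """Brute-force multi-dimensional matrix profile (self-join only).

    T is a list of d equal-length rows, one row per dimension.
    Returns (P, I), each a d x (n - m + 1) nested list; row k aggregates
    the k + 1 best-matching dimensions.
    """
    d, n = len(T), len(T[0])
    n_sub = n - m + 1
    excl = math.ceil(m / 4)  # assumed trivial-match exclusion zone
    P = [[math.inf] * n_sub for _ in range(d)]
    I = [[-1] * n_sub for _ in range(d)]
    for i in range(n_sub):
        for j in range(n_sub):
            if abs(i - j) <= excl:
                continue  # skip trivial matches
            # Per-dimension distances between windows i and j, ascending
            D = sorted(znorm_dist(T[k][i:i + m], T[k][j:j + m]) for k in range(d))
            running = 0.0
            for k in range(d):
                running += D[k]
                avg = running / (k + 1)  # mean of the k + 1 smallest distances
                if avg < P[k][i]:
                    P[k][i], I[k][i] = avg, j
    return P, I


T = [[584., -11., 23., 79., 1001., 0., -19.],
     [1., 2., 4., 8., 16., 0., 32.]]
P, I = mstump_sketch(T, m=3)
print(I[0])  # [2, 4, 0, 1, 0]
```

On the two-dimensional input from the example above with m=3, the first row of this sketch agrees with the mstump output shown there, up to floating-point error. It runs in O(d * n^2 * m) time, so it is only meant for tiny inputs.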
mstumped
- stumpy.mstumped(client, T, m, include=None, discords=False, p=2.0, normalize=True, T_subseq_isconstant=None)
Compute the multi-dimensional z-normalized matrix profile with a dask/ray cluster
This is a highly distributed implementation around the Numba JIT-compiled parallelized _mstump function which computes the multi-dimensional matrix profile according to STOMP. Note that only self-joins are supported.
- Parameters:
- client : client
A dask/ray client. Setting up a cluster is beyond the scope of this library. Please refer to the dask/ray documentation.
- T : numpy.ndarray
The time series or sequence for which to compute the multi-dimensional matrix profile. Each row in T represents data from the same dimension while each column in T represents data from a different dimension.
- m : int
Window size.
- include : list, numpy.ndarray, default None
A list of (zero-based) indices corresponding to the dimensions in T that must be included in the constrained multidimensional motif search. For more information, see Section IV D in:
- discords : bool, default False
When set to True, this reverses the distance matrix which results in a multi-dimensional matrix profile that favors larger matrix profile values (i.e., discords) rather than smaller values (i.e., motifs). Note that indices in include are still maintained and respected.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively.
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
- T_subseq_isconstant : numpy.ndarray, function, or list, default None
A parameter that is used to show whether a subsequence of a time series in T is constant (True) or not. T_subseq_isconstant can be a 2D boolean numpy.ndarray or a function that can be applied to each time series in T. Alternatively, for maximum flexibility, a list (with length equal to the total number of time series) may also be used. In this case, T_subseq_isconstant[i] corresponds to the i-th time series T[i] and each element in the list can either be a 1D boolean numpy.ndarray, a function, or None.
- Returns:
- P : numpy.ndarray
The multi-dimensional matrix profile. Each row of the array corresponds to each matrix profile for a given dimension (i.e., the first row is the 1-D matrix profile and the second row is the 2-D matrix profile).
- I : numpy.ndarray
The multi-dimensional matrix profile index where each row of the array corresponds to each matrix profile index for a given dimension.
See also
stumpy.mstump : Compute the multi-dimensional z-normalized matrix profile
stumpy.subspace : Compute the k-dimensional matrix profile subspace for a given subsequence index and its nearest neighbor index
stumpy.mdl : Compute the number of bits needed to compress one array with another using the minimum description length (MDL)
Notes
See mSTAMP Algorithm
Examples
>>> import stumpy
>>> import numpy as np
>>> from dask.distributed import Client
>>> if __name__ == "__main__":
...     with Client() as dask_client:
...         stumpy.mstumped(
...             dask_client,
...             np.array([[584., -11., 23., 79., 1001., 0., -19.],
...                       [  1.,   2., 4.,  8.,   16., 0.,  32.]]),
...             m=3)
(array([[0.        , 1.43947142, 0.        , 2.69407392, 0.11633857],
        [0.777905  , 2.36179922, 1.50004632, 2.92246722, 0.777905  ]]),
 array([[2, 4, 0, 1, 0],
        [4, 4, 0, 1, 0]]))
Alternatively, you can use ray:
>>> import ray
>>> if __name__ == "__main__":
...     ray.init()
...     stumpy.mstumped(
...         ray,
...         np.array([[584., -11., 23., 79., 1001., 0., -19.],
...                   [  1.,   2., 4.,  8.,   16., 0.,  32.]]),
...         m=3)
...     ray.shutdown()
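The p parameter selects the Minkowski distance that is used when normalize=False. The hypothetical helper below is a minimal sketch of that distance for two equal-length subsequences. It shows that p=1 and p=2 reduce to the Manhattan and Euclidean distances, respectively.

```python
def minkowski(x, y, p=2.0):
    """Minkowski distance: (sum |x_i - y_i|^p) ** (1/p)."""
    if len(x) != len(y):
        raise ValueError("subsequences must have equal length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


x, y = [0.0, 3.0, 1.0], [4.0, 0.0, 1.0]
print(minkowski(x, y, p=1.0))  # Manhattan: 4 + 3 + 0 -> 7.0
print(minkowski(x, y, p=2.0))  # Euclidean: sqrt(16 + 9 + 0) -> 5.0
```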
subspace
- stumpy.subspace(T, m, subseq_idx, nn_idx, k, include=None, discords=False, discretize_func=None, n_bit=8, normalize=True, p=2.0, T_subseq_isconstant=None)
Compute the k-dimensional matrix profile subspace for a given subsequence index and its nearest neighbor index
- Parameters:
- T : numpy.ndarray
The time series or sequence for which the multi-dimensional matrix profile and multi-dimensional matrix profile indices were computed.
- m : int
Window size.
- subseq_idx : int
The subsequence index in T.
- nn_idx : int
The nearest neighbor index in T.
- k : int
The subset number of dimensions out of D = T.shape[0]-dimensions to return the subspace for. Note that zero-based indexing is used.
- include : numpy.ndarray, default None
A list of (zero-based) indices corresponding to the dimensions in T that must be included in the constrained multidimensional motif search. For more information, see Section IV D in:
- discords : bool, default False
When set to True, this reverses the distance profile to favor discords rather than motifs. Note that indices in include are still maintained and respected.
- discretize_func : func, default None
A function for discretizing each input array. When this is None, an appropriate discretization function (based on the normalize parameter) will be applied.
- n_bit : int, default 8
The number of bits used for discretization. For more information on an appropriate value, see Figure 4 in:
and Figure 2 in:
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
- T_subseq_isconstant : numpy.ndarray, function, or list, default None
A parameter that is used to show whether a subsequence of a time series in T is constant (True) or not. T_subseq_isconstant can be a 2D boolean numpy.ndarray or a function that can be applied to each time series in T. Alternatively, for maximum flexibility, a list (with length equal to the total number of time series) may also be used. In this case, T_subseq_isconstant[i] corresponds to the i-th time series T[i] and each element in the list can either be a 1D boolean numpy.ndarray, a function, or None.
- Returns:
- S : numpy.ndarray
An array that contains the (singular) k-th-dimensional subspace for the subsequence with index equal to subseq_idx. Note that k + 1 rows will be returned.
See also
stumpy.mstump : Compute the multi-dimensional z-normalized matrix profile
stumpy.mstumped : Compute the multi-dimensional z-normalized matrix profile with a dask/ray cluster
stumpy.mdl : Compute the number of bits needed to compress one array with another using the minimum description length (MDL)
Examples
>>> import stumpy
>>> import numpy as np
>>> mps, indices = stumpy.mstump(
...     np.array([[584., -11., 23., 79., 1001., 0., -19.],
...               [  1.,   2., 4.,  8.,   16., 0.,  32.]]),
...     m=3)
>>> motifs_idx = np.argsort(mps, axis=1)[:, :2]
>>> k = 1
>>> stumpy.subspace(
...     np.array([[584., -11., 23., 79., 1001., 0., -19.],
...               [  1.,   2., 4.,  8.,   16., 0.,  32.]]),
...     m=3,
...     subseq_idx=motifs_idx[k][0],
...     nn_idx=indices[k][motifs_idx[k][0]],
...     k=k)
array([0, 1])
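subspace reports the k + 1 dimensions (zero-based k) along which a subsequence and its nearest neighbor agree best. The sketch below illustrates only that ranking idea and makes several simplifying assumptions. It skips discretization (discretize_func, n_bit), include, and discords. It z-normalizes each dimension, ranks dimensions by Euclidean distance, and returns the k + 1 closest. subspace_sketch is a hypothetical helper, not STUMPY's implementation.

```python
import math


def _znorm(x):
    m = len(x)
    mu = sum(x) / m
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / m)
    # Assumption: a constant window z-normalizes to all zeros
    return [0.0] * m if sd == 0.0 else [(v - mu) / sd for v in x]


def subspace_sketch(T, m, subseq_idx, nn_idx, k):
    """Return the k + 1 dimension indices with the smallest pairwise distance."""
    dists = []
    for dim, row in enumerate(T):
        a = _znorm(row[subseq_idx:subseq_idx + m])
        b = _znorm(row[nn_idx:nn_idx + m])
        dists.append((math.dist(a, b), dim))
    dists.sort()  # ascending distance; ties broken by dimension index
    return [dim for _, dim in dists[:k + 1]]


T = [[0., 1., 2., 0., 1., 2.5],   # similar shapes
     [0., 1., 2., 2., 1., 0.],    # reversed shape
     [5., 6., 8., 5., 6., 8.]]    # identical windows
print(subspace_sketch(T, 3, subseq_idx=0, nn_idx=3, k=1))  # [2, 0]
```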
mdl
- stumpy.mdl(T, m, subseq_idx, nn_idx, include=None, discords=False, discretize_func=None, n_bit=8, normalize=True, p=2.0, T_subseq_isconstant=None)
Compute the multi-dimensional number of bits needed to compress one multi-dimensional subsequence with another along each of the k-dimensions using the minimum description length (MDL)
- Parameters:
- T : numpy.ndarray
The time series or sequence for which the multi-dimensional matrix profile and multi-dimensional matrix profile indices were computed.
- m : int
Window size.
- subseq_idx : numpy.ndarray
The multi-dimensional subsequence indices in T.
- nn_idx : numpy.ndarray
The multi-dimensional nearest neighbor index in T.
- include : numpy.ndarray, default None
A list of (zero-based) indices corresponding to the dimensions in T that must be included in the constrained multidimensional motif search. For more information, see Section IV D in:
- discords : bool, default False
When set to True, this reverses the distance profile to favor discords rather than motifs. Note that indices in include are still maintained and respected.
- discretize_func : func, default None
A function for discretizing each input array. When this is None, an appropriate discretization function (based on the normalize parameter) will be applied.
- n_bit : int, default 8
The number of bits used for discretization and for computing the bit size. For more information on an appropriate value, see Figure 4 in:
and Figure 2 in:
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
- T_subseq_isconstant : numpy.ndarray, function, or list, default None
A parameter that is used to show whether a subsequence of a time series in T is constant (True) or not. T_subseq_isconstant can be a 2D boolean numpy.ndarray or a function that can be applied to each time series in T. Alternatively, for maximum flexibility, a list (with length equal to the total number of time series) may also be used. In this case, T_subseq_isconstant[i] corresponds to the i-th time series T[i] and each element in the list can either be a 1D boolean numpy.ndarray, a function, or None.
- Returns:
- bit_sizes : numpy.ndarray
The total number of bits computed from MDL for representing each pair of multidimensional subsequences.
- S : list
A list of numpy.ndarrays that contain the k-th-dimensional subspaces.
See also
stumpy.mstump : Compute the multi-dimensional z-normalized matrix profile
stumpy.mstumped : Compute the multi-dimensional z-normalized matrix profile with a dask/ray cluster
stumpy.subspace : Compute the k-dimensional matrix profile subspace for a given subsequence index and its nearest neighbor index
Examples
>>> import stumpy
>>> import numpy as np
>>> mps, indices = stumpy.mstump(
...     np.array([[584., -11., 23., 79., 1001., 0., -19.],
...               [  1.,   2., 4.,  8.,   16., 0.,  32.]]),
...     m=3)
>>> motifs_idx = np.argsort(mps, axis=1)[:, 0]
>>> stumpy.mdl(
...     np.array([[584., -11., 23., 79., 1001., 0., -19.],
...               [  1.,   2., 4.,  8.,   16., 0.,  32.]]),
...     m=3,
...     subseq_idx=motifs_idx,
...     nn_idx=indices[np.arange(motifs_idx.shape[0]), motifs_idx])
(array([ 80.      , 111.509775]), [array([1]), array([0, 1])])
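The idea behind mdl can be illustrated with a generic minimum description length calculation. This is not STUMPY's bit-size formula. The sketch assumes a simple min-max quantization to n_bit bits. It stores one subsequence verbatim at n_bit bits per value and charges the other the empirical entropy of its quantized difference from the first. All function names are hypothetical. Subsequences that compress well against each other therefore cost fewer bits.

```python
import math
from collections import Counter


def discretize(x, n_bit=8):
    """Map values linearly onto the integers 0 .. 2**n_bit - 1 (assumed scheme)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0] * len(x)
    levels = 2 ** n_bit - 1
    return [round((v - lo) / (hi - lo) * levels) for v in x]


def entropy_bits(symbols):
    """Total empirical-entropy cost, in bits, of a sequence of symbols."""
    n = len(symbols)
    h = -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())
    return n * h


def mdl_bits_sketch(a, b, n_bit=8):
    """Bits to store `a` verbatim plus `b` encoded as its residual from `a`."""
    da, db = discretize(a, n_bit), discretize(b, n_bit)
    raw = n_bit * len(da)  # uncompressed cost of the reference sequence
    residual = entropy_bits([y - x for x, y in zip(da, db)])
    return raw + residual


a = [0.0, 1.0, 2.0, 3.0]
b = [3.0, 0.0, 2.0, 1.0]
print(mdl_bits_sketch(a, a), mdl_bits_sketch(a, b))  # 32.0 40.0
```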
atsc
- stumpy.atsc(IL, IR, j)
Compute the anchored time series chain (ATSC)
Note that since the matrix profile indices, IL and IR, are pre-computed, this function is agnostic to subsequence normalization.
- Parameters:
- IL : numpy.ndarray
Left matrix profile indices.
- IR : numpy.ndarray
Right matrix profile indices.
- j : int
The index value for which to compute the ATSC.
- Returns:
- out : numpy.ndarray
Anchored time series chain for index, j.
See also
stumpy.allc : Compute the all-chain set (ALLC)
Notes
See Table I
This is the implementation for the anchored time series chains (ATSC).
Unlike the original paper, we’ve replaced the while-loop with a more stable for-loop.
Examples
>>> import stumpy
>>> import numpy as np
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.atsc(mp[:, 2], mp[:, 3], 1)
array([1, 3])
>>> # Alternative example using named attributes
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.atsc(mp.left_I_, mp.right_I_, 1)
array([1, 3])
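The anchored chain walks rightward from j as long as each link is confirmed in both directions, i.e. IL[IR[j]] == j. The minimal pure-Python sketch below shows that walk, bounded by a for-loop as the note above describes. atsc_sketch and the toy IL/IR arrays are hypothetical, and -1 marks a missing neighbor.

```python
def atsc_sketch(IL, IR, j):
    """Anchored time series chain starting at index j."""
    chain = [j]
    # At most len(IR) hops, so a malformed index array cannot loop forever
    for _ in range(len(IR)):
        nxt = IR[j]
        if nxt < 0 or IL[nxt] != j:
            break  # no right neighbor, or the link is not bidirectional
        chain.append(nxt)
        j = nxt
    return chain


# Toy left/right nearest-neighbor indices (hypothetical)
IL = [-1, 0, -1, 1, 2]
IR = [1, 3, 4, -1, -1]
print(atsc_sketch(IL, IR, 0))  # [0, 1, 3]
print(atsc_sketch(IL, IR, 2))  # [2, 4]
```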
allc
- stumpy.allc(IL, IR)
Compute the all-chain set (ALLC)
Note that since the matrix profile indices, IL and IR, are pre-computed, this function is agnostic to subsequence normalization.
- Parameters:
- IL : numpy.ndarray
Left matrix profile indices.
- IR : numpy.ndarray
Right matrix profile indices.
- Returns:
- S : list(numpy.ndarray)
All-chain set.
- C : numpy.ndarray
Anchored time series chain for the longest chain (also known as the unanchored chain). Note that when there are multiple different chains with length equal to len(C), then only one chain from this set is returned. You may iterate over the all-chain set, S, to find all other possible chains with length len(C).
See also
stumpy.atsc : Compute the anchored time series chain (ATSC)
Notes
See Table II
Unlike the original paper, we’ve replaced the while-loop with a more stable for-loop.
This is the implementation for the all-chain set (ALLC) and the unanchored chain is simply the longest one among the all-chain set. Both the all-chain set and unanchored chain are returned.
The all-chain set, S, is returned as a list of unique numpy arrays.
Examples
>>> import stumpy
>>> import numpy as np
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.allc(mp[:, 2], mp[:, 3])
([array([1, 3]), array([2]), array([0, 4])], array([0, 4]))
>>> # Alternative example using named attributes
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.allc(mp.left_I_, mp.right_I_)
([array([1, 3]), array([2]), array([0, 4])], array([0, 4]))
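The all-chain set can be sketched as follows. A chain starts only at indices that have not already been absorbed into an earlier chain, each chain's length is tracked, and the longest chain is reported as the unanchored chain. The helper below is a hypothetical pure-Python illustration of that bookkeeping, using the same bidirectional-link test as ATSC. It is not STUMPY's code.

```python
def allc_sketch(IL, IR):
    """Return (all-chain set, longest chain) from left/right NN indices."""
    n = len(IL)
    length = [1] * n  # chain length rooted at i; -1 marks absorbed indices
    chains = []
    for i in range(n):
        if length[i] != 1:
            continue  # i already sits inside an earlier chain
        chain, j = [i], i
        for _ in range(n):  # bounded walk instead of a while-loop
            nxt = IR[j]
            if nxt < 0 or IL[nxt] != j:
                break
            j = nxt
            length[j] = -1
            length[i] += 1
            chain.append(j)
        chains.append(chain)
    longest = max(chains, key=len)  # first chain wins ties
    return chains, longest


IL = [-1, 0, -1, 1, 2, -1]  # toy indices (hypothetical)
IR = [1, 3, 4, -1, -1, -1]
print(allc_sketch(IL, IR))  # ([[0, 1, 3], [2, 4], [5]], [0, 1, 3])
```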
fluss
- stumpy.fluss(I, L, n_regimes, excl_factor=5, custom_iac=None)
Compute the Fast Low-cost Unipotent Semantic Segmentation (FLUSS) for static data (i.e., batch processing)
Essentially, this is a wrapper to compute the corrected arc curve and regime locations. Note that since the matrix profile indices, I, are pre-computed, this function is agnostic to subsequence normalization.
- Parameters:
- I : numpy.ndarray
The matrix profile indices for the time series of interest.
- L : int
The subsequence length that is set roughly to be one period length. This is likely to be the same value as the window size, m, used to compute the matrix profile and matrix profile index but it can be different since this is only used to manage edge effects and has no bearing on any of the IAC or CAC core calculations.
- n_regimes : int
The number of regimes to search for. This is one more than the number of regime changes as denoted in the original paper.
- excl_factor : int, default 5
The multiplying factor for the regime exclusion zone.
- custom_iac : numpy.ndarray, default None
A custom idealized arc curve (IAC) that will be used for correcting the arc curve.
- Returns:
- cac : numpy.ndarray
A corrected arc curve (CAC).
- regime_locs : numpy.ndarray
The locations of the regimes.
See also
stumpy.floss : Compute the Fast Low-cost Online Semantic Segmentation (FLOSS) for streaming data
Notes
See Section A
This is the implementation for Fast Low-cost Unipotent Semantic Segmentation (FLUSS).
Examples
>>> import stumpy
>>> import numpy as np
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.fluss(mp[:, 1], 3, 2)
(array([1., 1., 1., 1., 1.]), array([0]))
>>> # Alternative example using named attributes
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.fluss(mp.I_, 3, 2)
(array([1., 1., 1., 1., 1.]), array([0]))
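FLUSS counts how many nearest-neighbor arcs (i, I[i]) pass over each position, divides that arc curve by an idealized arc curve (IAC), caps the ratio at 1, and looks for minima. The sketch below is a simplified illustration that locates a single regime change, and it makes three assumptions. First, it uses the parabolic IAC 2*x*(n - x)/n (peak n/2) in place of STUMPY's fitted IAC. Second, it counts an arc at positions min(i, I[i]) <= x < max(i, I[i]). Third, it pins L * excl_factor positions at each edge to 1. All names are hypothetical.

```python
def arc_curve(I):
    """Number of arcs (i, I[i]) passing over each position x."""
    n = len(I)
    delta = [0] * (n + 1)
    for i, nn in enumerate(I):
        delta[min(i, nn)] += 1  # arc starts here
        delta[max(i, nn)] -= 1  # arc stops covering from here on
    ac, running = [], 0
    for x in range(n):
        running += delta[x]
        ac.append(running)
    return ac


def fluss_sketch(I, L, excl_factor=5):
    """Corrected arc curve and the single most likely regime change location."""
    n = len(I)
    ac = arc_curve(I)
    cac = []
    for x in range(n):
        iac = 2.0 * x * (n - x) / n  # assumed parabolic idealized arc curve
        cac.append(min(ac[x] / iac, 1.0) if iac > 0 else 1.0)
    edge = L * excl_factor  # edge effects: pin both ends to 1
    for x in list(range(min(edge, n))) + list(range(max(n - edge, 0), n)):
        cac[x] = 1.0
    regime_loc = min(range(n), key=lambda x: cac[x])
    return cac, regime_loc


# Two regimes of 10 points each; every arc stays inside its own regime
I = [i + 5 if i % 10 < 5 else i - 5 for i in range(20)]
cac, loc = fluss_sketch(I, L=1, excl_factor=2)
print(loc)  # 9
```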
floss
- stumpy.floss(mp, T, m, L, excl_factor=5, n_iter=1000, n_samples=1000, custom_iac=None, normalize=True, p=2.0, T_subseq_isconstant_func=None)
A class to compute the Fast Low-cost Online Semantic Segmentation (FLOSS) for streaming data
- Parameters:
- mp : numpy.ndarray
The first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices.
- T : numpy.ndarray
A 1-D time series used to generate the matrix profile and matrix profile indices found in mp. Note that the right matrix profile index is used and the right matrix profile is intelligently recomputed on the fly from T instead of using the bidirectional matrix profile.
- m : int
The window size for computing sliding window mass. This is identical to the window size used in the matrix profile calculation. For managing edge effects, see the L parameter.
- L : int
The subsequence length that is set roughly to be one period length. This is likely to be the same value as the window size, m, used to compute the matrix profile and matrix profile index but it can be different since this is only used to manage edge effects and has no bearing on any of the IAC or CAC core calculations.
- excl_factor : int, default 5
The multiplying factor for the regime exclusion zone. Note that this is unrelated to the excl_zone used to compute the matrix profile.
- n_iter : int, default 1000
Number of iterations to average over when determining the parameters for the IAC beta distribution.
- n_samples : int, default 1000
Number of distribution samples to draw during each iteration when computing the IAC.
- custom_iac : numpy.ndarray, default None
A custom idealized arc curve (IAC) that will be used for correcting the arc curve.
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
- T_subseq_isconstant_func : function, default None
A custom, user-defined function that returns a boolean array that indicates whether a subsequence in T is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- Attributes:
- cac_1d_ : numpy.ndarray
Get the updated 1-dimensional corrected arc curve (CAC_1D).
- P_ : numpy.ndarray
Get the updated matrix profile.
- I_ : numpy.ndarray
Get the updated (right) matrix profile indices.
- T_ : numpy.ndarray
Get the updated time series, T.
Methods
update(t)
Ingress a new data point, t, onto the time series, T, followed by egressing the oldest single data point from T. Then, update the 1-dimensional corrected arc curve (CAC_1D) and the matrix profile.
See also
stumpy.fluss : Compute the Fast Low-cost Unipotent Semantic Segmentation (FLUSS) for static data (i.e., batch processing)
Notes
See Section C
This is the implementation for Fast Low-cost Online Semantic Segmentation (FLOSS).
Examples
>>> import stumpy
>>> import numpy as np
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0.]), m=3)
>>> stream = stumpy.floss(
...     mp,
...     np.array([584., -11., 23., 79., 1001., 0.]),
...     m=3,
...     L=3)
>>> stream.update(19.)
>>> stream.cac_1d_
array([1., 1., 1., 1.])
ostinato
- stumpy.ostinato(Ts, m, normalize=True, p=2.0, Ts_subseq_isconstant=None)
Find the z-normalized consensus motif of multiple time series
This is a wrapper around the vanilla version of the ostinato algorithm which finds the best radius and a helper function that finds the most central conserved motif.
- Parameters:
- Ts : list
A list of time series for which to find the most central consensus motif.
- m : int
Window size.
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
- Ts_subseq_isconstant : list, default None
A list of rolling window isconstant values, one for each time series in Ts.
- Returns:
- central_radius : float
Radius of the most central consensus motif.
- central_Ts_idx : int
The time series index in Ts that contains the most central consensus motif.
- central_subseq_idx : int
The subsequence index within time series Ts[central_Ts_idx] that contains the most central consensus motif.
See also
stumpy.ostinatoed : Find the z-normalized consensus motif of multiple time series with a dask/ray cluster
stumpy.gpu_ostinato : Find the z-normalized consensus motif of multiple time series with one or more GPU devices
Notes
See Table 2
The ostinato algorithm proposed in the paper finds the best radius in Ts. Intuitively, the radius is the minimum distance of a subsequence to encompass at least one nearest neighbor subsequence from all other time series. The best radius in Ts is the minimum radius amongst all radii. Some data sets might contain multiple subsequences which have the same optimal radius. The greedy Ostinato algorithm only finds one of them, which might not be the most central motif. The most central motif amongst the subsequences with the best radius is the one with the smallest mean distance to nearest neighbors in all other time series. To find this central motif it is necessary to search the subsequences with the best radius via stumpy.ostinato._get_central_motif.
Examples
>>> import stumpy
>>> import numpy as np
>>> stumpy.ostinato(
...     [np.array([584., -11., 23., 79., 1001., 0., 19.]),
...      np.array([600., -10., 23., 17.]),
...      np.array([  1.,   9.,  6.,  0.])],
...     m=3)
(1.2370237678153826, 0, 4)
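The radius definition in the notes can be made concrete with a brute-force sketch. For each candidate subsequence, it finds the nearest z-normalized neighbor in every other time series and takes the largest of those distances as the candidate's radius. It keeps the candidate with the smallest radius and breaks ties by the smallest mean nearest-neighbor distance. This is an exhaustive illustration rather than the Ostinato search itself. ostinato_sketch and the all-zeros convention for constant windows are assumptions.

```python
import math


def _znorm(x):
    m = len(x)
    mu = sum(x) / m
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / m)
    # Assumption: a constant window z-normalizes to all zeros
    return [0.0] * m if sd == 0.0 else [(v - mu) / sd for v in x]


def ostinato_sketch(Ts, m):
    """Return (central_radius, central_Ts_idx, central_subseq_idx)."""
    subseqs = [[_znorm(T[i:i + m]) for i in range(len(T) - m + 1)] for T in Ts]
    best = (math.inf, math.inf, -1, -1)  # (radius, mean NN dist, Ts idx, subseq idx)
    for a, candidates in enumerate(subseqs):
        for i, q in enumerate(candidates):
            # Nearest-neighbor distance from q to every *other* time series
            nn = [min(math.dist(q, s) for s in others)
                  for b, others in enumerate(subseqs) if b != a]
            radius, mean_nn = max(nn), sum(nn) / len(nn)
            if (radius, mean_nn) < best[:2]:  # strict: earlier candidates win ties
                best = (radius, mean_nn, a, i)
    return best[0], best[2], best[3]


Ts = [[584., -11., 23., 79., 1001., 0., 19.],
      [600., -10., 23., 17.],
      [1., 9., 6., 0.]]
print(ostinato_sketch(Ts, 3))  # approximately (1.237, 0, 4)
```

On the three time series from the example above with m=3, this sketch returns essentially the same result as the ostinato output shown, though it scales far worse.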
ostinatoed
- stumpy.ostinatoed(client, Ts, m, normalize=True, p=2.0, Ts_subseq_isconstant=None)
Find the z-normalized consensus motif of multiple time series with a dask/ray cluster
This is a wrapper around the vanilla version of the ostinato algorithm which finds the best radius and a helper function that finds the most central conserved motif.
- Parameters:
- client : client
A dask/ray client. Setting up a dask/ray cluster is beyond the scope of this library. Please refer to the dask/ray documentation.
- Ts : list
A list of time series for which to find the most central consensus motif.
- m : int
Window size.
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
- Ts_subseq_isconstant : list, default None
A list of rolling window isconstant values, one for each time series in Ts.
- Returns:
- central_radius : float
Radius of the most central consensus motif.
- central_Ts_idx : int
The time series index in Ts that contains the most central consensus motif.
- central_subseq_idx : int
The subsequence index within time series Ts[central_Ts_idx] that contains the most central consensus motif.
See also
stumpy.ostinato : Find the z-normalized consensus motif of multiple time series
stumpy.gpu_ostinato : Find the z-normalized consensus motif of multiple time series with one or more GPU devices
Notes
See Table 2
The ostinato algorithm proposed in the paper finds the best radius in Ts. Intuitively, the radius is the minimum distance of a subsequence to encompass at least one nearest neighbor subsequence from all other time series. The best radius in Ts is the minimum radius amongst all radii. Some data sets might contain multiple subsequences which have the same optimal radius. The greedy Ostinato algorithm only finds one of them, which might not be the most central motif. The most central motif amongst the subsequences with the best radius is the one with the smallest mean distance to nearest neighbors in all other time series. To find this central motif it is necessary to search the subsequences with the best radius via stumpy.ostinato._get_central_motif.
Examples
>>> import stumpy
>>> import numpy as np
>>> from dask.distributed import Client
>>> if __name__ == "__main__":
...     with Client() as dask_client:
...         stumpy.ostinatoed(
...             dask_client,
...             [np.array([584., -11., 23., 79., 1001., 0., 19.]),
...              np.array([600., -10., 23., 17.]),
...              np.array([  1.,   9.,  6.,  0.])],
...             m=3)
(1.2370237678153826, 0, 4)
Alternatively, you can use ray:
>>> import ray
>>> if __name__ == "__main__":
...     ray.init()
...     stumpy.ostinatoed(
...         ray,
...         [np.array([584., -11., 23., 79., 1001., 0., 19.]),
...          np.array([600., -10., 23., 17.]),
...          np.array([  1.,   9.,  6.,  0.])],
...         m=3)
...     ray.shutdown()
gpu_ostinato
- stumpy.gpu_ostinato(Ts, m, device_id=0, normalize=True, p=2.0, Ts_subseq_isconstant=None)
Find the z-normalized consensus motif of multiple time series with one or more GPU devices
This is a wrapper around the vanilla version of the ostinato algorithm which finds the best radius and a helper function that finds the most central conserved motif.
- Parameters:
- Ts : list
A list of time series for which to find the most central consensus motif.
- m : int
Window size.
- device_id : int or list, default 0
The (GPU) device number to use. The default value is 0. A list of valid device ids (int) may also be provided for parallel GPU-STUMP computation. A list of all valid device ids can be obtained by executing [device.id for device in numba.cuda.list_devices()].
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
- Ts_subseq_isconstant : list, default None
A list of rolling window isconstant values, one for each time series in Ts.
- Returns:
- central_radius : float
Radius of the most central consensus motif.
- central_Ts_idx : int
The time series index in Ts that contains the most central consensus motif.
- central_subseq_idx : int
The subsequence index within time series Ts[central_Ts_idx] that contains the most central consensus motif.
See also
stumpy.ostinato : Find the z-normalized consensus motif of multiple time series
stumpy.ostinatoed : Find the z-normalized consensus motif of multiple time series with a dask/ray cluster
Notes
See Table 2
The ostinato algorithm proposed in the paper finds the best radius in Ts. Intuitively, the radius is the minimum distance of a subsequence to encompass at least one nearest neighbor subsequence from all other time series. The best radius in Ts is the minimum radius amongst all radii. Some data sets might contain multiple subsequences which have the same optimal radius. The greedy Ostinato algorithm only finds one of them, which might not be the most central motif. The most central motif amongst the subsequences with the best radius is the one with the smallest mean distance to nearest neighbors in all other time series. To find this central motif it is necessary to search the subsequences with the best radius via stumpy.ostinato._get_central_motif.
Examples
>>> import stumpy
>>> import numpy as np
>>> from numba import cuda
>>> if __name__ == "__main__":
...     all_gpu_devices = [device.id for device in cuda.list_devices()]
...     stumpy.gpu_ostinato(
...         [np.array([584., -11., 23., 79., 1001., 0., 19.]),
...          np.array([600., -10., 23., 17.]),
...          np.array([  1.,   9.,  6.,  0.])],
...         m=3,
...         device_id=all_gpu_devices)
(1.2370237678153826, 0, 4)
mpdist
- stumpy.mpdist(T_A, T_B, m, percentage=0.05, k=None, normalize=True, p=2.0, T_A_subseq_isconstant=None, T_B_subseq_isconstant=None)
Compute the z-normalized matrix profile distance (MPdist) measure between any two time series
The MPdist distance measure considers two time series to be similar if they share many subsequences, regardless of the order of matching subsequences. MPdist concatenates the output of an AB-join and a BA-join and returns the k-th smallest value as the reported distance. Note that MPdist is a measure and not a metric. Therefore, it does not obey the triangle inequality but the method is highly scalable.
- Parameters:
- T_A : numpy.ndarray
The first time series or sequence for which to compute the matrix profile.
- T_B : numpy.ndarray
The second time series or sequence for which to compute the matrix profile.
- m : int
Window size.
- percentage : float, default 0.05
The percentage of distances that will be used to report mpdist. The value is between 0.0 and 1.0.
- k : int, default None
Specify the k-th value in the concatenated matrix profiles to return. When k is not None, then the percentage parameter is ignored.
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
- T_A_subseq_isconstant : numpy.ndarray or function, default None
A boolean array that indicates whether a subsequence in T_A is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_A is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- T_B_subseq_isconstant : numpy.ndarray or function, default None
A boolean array that indicates whether a subsequence in T_B is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_B is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- Returns:
- MPdist : float
The matrix profile distance.
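When normalize=False, subsequences are compared with the Minkowski distance using the p-norm described under the p parameter. A minimal plain-Python sketch of that distance follows; the helper name minkowski is hypothetical and is not part of STUMPY's API, which uses its own optimized routines. It shows that p=1 gives the Manhattan distance and p=2 the Euclidean distance.

```python
def minkowski(a, b, p=2.0):
    """Hypothetical helper: p-norm of the element-wise difference of a and b."""
    if len(a) != len(b):
        raise ValueError("subsequences must have the same length")
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


S1 = [1.0, 2.0, 4.0]
S2 = [2.0, 4.0, 7.0]

print(minkowski(S1, S2, p=1.0))  # Manhattan: 1 + 2 + 3 = 6.0
print(minkowski(S1, S2, p=2.0))  # Euclidean: sqrt(1 + 4 + 9) = sqrt(14)
```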
See also
mpdisted : Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with a dask/ray cluster
gpu_mpdist : Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with one or more GPU devices
Notes
See Section III
Examples
>>> import stumpy
>>> import numpy as np
>>> stumpy.mpdist(
...     np.array([-11.1, 23.4, 79.5, 1001.0]),
...     np.array([584., -11., 23., 79., 1001., 0., -19.]),
...     m=3)
0.00019935236191097894
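To make the MPdist definition concrete, here is a brute-force, plain-Python sketch: compute the AB-join and BA-join profiles with z-normalized Euclidean distances, concatenate and sort them, and return the k-th smallest value. The helper names are hypothetical, and the rule for turning percentage into k (ceil(percentage * (len(T_A) + len(T_B))), clamped to the last index) is our assumption about STUMPY's behavior, not a quote of its source. The sketch also assumes no subsequence is constant.

```python
import math


def znorm(seq):
    # Z-normalize with the population standard deviation (ddof=0).
    # Assumes the subsequence is not constant (sigma > 0).
    mu = sum(seq) / len(seq)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in seq) / len(seq))
    return [(x - mu) / sigma for x in seq]


def znorm_dist(a, b):
    # Euclidean distance between the z-normalized subsequences
    za, zb = znorm(a), znorm(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(za, zb)))


def naive_mpdist(T_A, T_B, m, percentage=0.05, k=None):
    A = [T_A[i : i + m] for i in range(len(T_A) - m + 1)]
    B = [T_B[i : i + m] for i in range(len(T_B) - m + 1)]
    P_AB = [min(znorm_dist(a, b) for b in B) for a in A]  # AB-join profile
    P_BA = [min(znorm_dist(b, a) for a in A) for b in B]  # BA-join profile
    P_ABBA = sorted(P_AB + P_BA)  # concatenate, then sort
    if k is None:
        # Assumption: k is derived from the combined input length
        k = math.ceil(percentage * (len(T_A) + len(T_B)))
    k = min(int(k), len(P_ABBA) - 1)  # clamp to a valid (zero-based) index
    return P_ABBA[k]


T_A = [-11.1, 23.4, 79.5, 1001.0]
T_B = [584.0, -11.0, 23.0, 79.0, 1001.0, 0.0, -19.0]
print(naive_mpdist(T_A, T_B, m=3))
```

With the inputs from the example above, this quadratic-time sketch should land very close to the documented 0.00019935236191097894; it is only meant for checking intuition on tiny inputs.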
mpdisted
- stumpy.mpdisted(client, T_A, T_B, m, percentage=0.05, k=None, normalize=True, p=2.0, T_A_subseq_isconstant=None, T_B_subseq_isconstant=None)
Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with a dask/ray cluster
The MPdist distance measure considers two time series to be similar if they share many subsequences, regardless of the order of matching subsequences. MPdist concatenates the output of an AB-join and a BA-join and returns the k-th smallest value as the reported distance. Note that MPdist is a measure and not a metric. Therefore, it does not obey the triangle inequality, but the method is highly scalable.
- Parameters:
- client : client
A dask/ray client. Setting up a dask/ray cluster is beyond the scope of this library. Please refer to the dask/ray documentation.
- T_A : numpy.ndarray
The first time series or sequence for which to compute the matrix profile.
- T_B : numpy.ndarray
The second time series or sequence for which to compute the matrix profile.
- m : int
Window size.
- percentage : float, default 0.05
The percentage of distances that will be used to report mpdist. The value is between 0.0 and 1.0. This parameter is ignored when k is not None.
- k : int, default None
Specify the k-th value in the concatenated matrix profiles to return. When k is not None, then the percentage parameter is ignored.
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
- T_A_subseq_isconstant : numpy.ndarray or function, default None
A boolean array that indicates whether a subsequence in T_A is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_A is constant (True). The function must take exactly two arguments, a (a 1-D array) and w (the window size); additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- T_B_subseq_isconstant : numpy.ndarray or function, default None
A boolean array that indicates whether a subsequence in T_B is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_B is constant (True). The function must take exactly two arguments, a (a 1-D array) and w (the window size); additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- Returns:
- MPdist : float
The matrix profile distance.
See also
mpdist : Compute the z-normalized matrix profile distance (MPdist) measure between any two time series
gpu_mpdist : Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with one or more GPU devices
Notes
See Section III
Examples
>>> import stumpy
>>> import numpy as np
>>> from dask.distributed import Client
>>> if __name__ == "__main__":
...     with Client() as dask_client:
...         stumpy.mpdisted(
...             dask_client,
...             np.array([-11.1, 23.4, 79.5, 1001.0]),
...             np.array([584., -11., 23., 79., 1001., 0., -19.]),
...             m=3)
0.00019935236191097894
Alternatively, you can also use ray:
>>> import ray
>>> if __name__ == "__main__":
...     ray.init()
...     stumpy.mpdisted(
...         ray,
...         np.array([-11.1, 23.4, 79.5, 1001.0]),
...         np.array([584., -11., 23., 79., 1001., 0., -19.]),
...         m=3)
...     ray.shutdown()
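The T_A_subseq_isconstant and T_B_subseq_isconstant parameters accept a user-defined function of exactly (a, w), with extra arguments curried via functools.partial. The sketch below is one hypothetical such function, isconstant_within, which treats a window as constant when its peak-to-peak range is at most a tolerance; the tolerance rule is our own illustrative choice, not STUMPY's built-in definition of "constant".

```python
import functools

import numpy as np


def isconstant_within(a, w, tol=1e-8):
    # Hypothetical user-defined function: flag each length-w window of the 1-D
    # array a as "constant" when its peak-to-peak range is at most tol.
    windows = np.lib.stride_tricks.sliding_window_view(np.asarray(a, dtype=float), w)
    # Windows containing np.nan compare as False here (STUMPY also sets them to False)
    return (windows.max(axis=1) - windows.min(axis=1)) <= tol


# Curry the extra argument so the function only takes (a, w)
loose_isconstant = functools.partial(isconstant_within, tol=0.5)

T = np.array([1.0, 1.0, 1.0, 2.0, 5.0])
print(loose_isconstant(T, 3))  # [ True False False]
```

The curried loose_isconstant could then be passed as, for example, T_A_subseq_isconstant=loose_isconstant to stumpy.mpdist or stumpy.mpdisted.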
gpu_mpdist
- stumpy.gpu_mpdist(T_A, T_B, m, percentage=0.05, k=None, device_id=0, normalize=True, p=2.0, T_A_subseq_isconstant=None, T_B_subseq_isconstant=None)
Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with one or more GPU devices
The MPdist distance measure considers two time series to be similar if they share many subsequences, regardless of the order of matching subsequences. MPdist concatenates and sorts the output of an AB-join and a BA-join and returns the value of the k-th smallest number as the reported distance. Note that MPdist is a measure and not a metric. Therefore, it does not obey the triangle inequality, but the method is highly scalable.
- Parameters:
- T_A : numpy.ndarray
The first time series or sequence for which to compute the matrix profile.
- T_B : numpy.ndarray
The second time series or sequence for which to compute the matrix profile.
- m : int
Window size.
- percentage : float, default 0.05
The percentage of distances that will be used to report mpdist. The value is between 0.0 and 1.0. This parameter is ignored when k is not None.
- k : int, default None
Specify the k-th value in the concatenated matrix profiles to return. When k is not None, then the percentage parameter is ignored.
- device_id : int or list, default 0
The (GPU) device number to use. The default value is 0. A list of valid device ids (int) may also be provided for parallel GPU-STUMP computation. A list of all valid device ids can be obtained by executing [device.id for device in numba.cuda.list_devices()].
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
- T_A_subseq_isconstant : numpy.ndarray or function, default None
A boolean array that indicates whether a subsequence in T_A is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_A is constant (True). The function must take exactly two arguments, a (a 1-D array) and w (the window size); additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- T_B_subseq_isconstant : numpy.ndarray or function, default None
A boolean array that indicates whether a subsequence in T_B is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T_B is constant (True). The function must take exactly two arguments, a (a 1-D array) and w (the window size); additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- Returns:
- MPdist : float
The matrix profile distance.
Notes
See Section III
Examples
>>> import stumpy
>>> import numpy as np
>>> from numba import cuda
>>> if __name__ == "__main__":
...     all_gpu_devices = [device.id for device in cuda.list_devices()]
...     stumpy.gpu_mpdist(
...         np.array([-11.1, 23.4, 79.5, 1001.0]),
...         np.array([584., -11., 23., 79., 1001., 0., -19.]),
...         m=3,
...         device_id=all_gpu_devices)
0.00019935236191097894
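The example above assumes at least one CUDA device is present. A defensive variant of the same device-id lookup, sketched below with a hypothetical helper name, returns an empty list when numba is not installed or numba.cuda.is_available() reports no usable GPU, so the caller can decide what to do.

```python
def available_gpu_device_ids():
    """Return all CUDA device ids visible to numba, or [] when none are usable."""
    try:
        from numba import cuda  # numba is optional for this sketch
    except ImportError:
        return []
    if not cuda.is_available():  # no driver or no CUDA-capable GPU
        return []
    return [device.id for device in cuda.list_devices()]


device_ids = available_gpu_device_ids()
print(device_ids)
```

One possible policy (our suggestion, not something STUMPY does for you) is to pass device_id=device_ids to stumpy.gpu_mpdist when the list is non-empty and to fall back to the CPU-based stumpy.mpdist otherwise.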
motifs
- stumpy.motifs(T, P, min_neighbors=1, max_distance=None, cutoff=None, max_matches=10, max_motifs=1, atol=1e-08, normalize=True, p=2.0, T_subseq_isconstant=None)
Discover the top motifs for time series T
A subsequence, Q, becomes a candidate motif if there are at least min_neighbors other subsequence matches in T (outside the exclusion zone) with a distance less than or equal to max_distance.
Note that, in the best-case scenario, the returned arrays would have shape (max_motifs, max_matches) and contain all finite values. However, in reality, many conditions (see below) need to be satisfied in order for this to be true. Any truncation in the number of rows (i.e., motifs) may be the result of insufficient candidate motifs with at least min_neighbors matches, or of the matrix profile value for the candidate motif being larger than cutoff. Similarly, any truncation in the number of columns (i.e., matches) may be the result of insufficient matches being found with distances (to their corresponding candidate motif) that are equal to or less than max_distance. Only motifs and matches that satisfy all of these constraints will be returned.
If you must return a shape of (max_motifs, max_matches), then you may consider specifying a smaller min_neighbors, a larger max_distance, and/or a larger cutoff. For example, while it is ill-advised, setting min_neighbors=1, max_distance=np.inf, and cutoff=np.inf will ensure that the shape of the output arrays will be (max_motifs, max_matches). However, given the lack of constraints, the quality of each motif and the quality of each match may be drastically different. Setting appropriate conditions will help ensure appropriately constrained results that may be easier to interpret.
- Parameters:
- T : numpy.ndarray
The time series or sequence.
- P : numpy.ndarray
The (1-dimensional) matrix profile of T. In the case where the matrix profile was computed with k > 1 (i.e., top-k nearest neighbors), you must summarize the top-k nearest-neighbor distances for each subsequence into a single value (e.g., np.mean, np.min, etc.) and then use that derived value as your P.
- min_neighbors : int, default 1
The minimum number of similar matches a subsequence needs to have in order to be considered a motif. This defaults to 1, which means that a subsequence must have at least one similar match in order to be considered a motif.
- max_distance : float or function, default None
For a candidate motif, Q, and a non-trivial subsequence, S, max_distance is the maximum distance allowed between Q and S so that S is considered a match of Q. If max_distance is a function, then it must be a function that accepts a single parameter, D, in its function signature, which is the distance profile between Q and T. If None, this defaults to np.nanmax([np.nanmean(D) - 2.0 * np.nanstd(D), np.nanmin(D)]).
- cutoff : float, default None
The largest matrix profile value (distance) that a candidate motif is allowed to have. If None, this defaults to np.nanmax([np.nanmean(P) - 2.0 * np.nanstd(P), np.nanmin(P)]).
- max_matches : int, default 10
The maximum number of similar matches of a motif representative to be returned. The resulting matches are sorted by distance, so a value of 10 means that the indices of the 10 most similar subsequences are returned. If None, all matches within max_distance of the motif representative will be returned. Note that the first match is always the self-match/trivial-match for each motif.
- max_motifs : int, default 1
The maximum number of motifs to return. To consider returning all possible valid motifs, try setting max_motifs to the length of your input matrix profile (i.e., max_motifs=len(P)).
- atol : float, default 1e-8
The absolute tolerance parameter. This value will be added to max_distance when comparing distances between subsequences.
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
- T_subseq_isconstant : numpy.ndarray or function, default None
A boolean array that indicates whether a subsequence in T is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T is constant (True). The function must take exactly two arguments, a (a 1-D array) and w (the window size); additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- Returns:
- motif_distances : numpy.ndarray
The distances corresponding to a set of subsequence matches for each motif. Note that the first column always corresponds to the distance for the self-match/trivial-match for each motif.
- motif_indices : numpy.ndarray
The indices corresponding to a set of subsequence matches for each motif. Note that the first column always corresponds to the index for the self-match/trivial-match for each motif.
See also
stumpy.match : Find all matches of a query Q in a time series T
stumpy.mmotifs : Discover the top motifs for the multi-dimensional time series T
stumpy.stump : Compute the z-normalized matrix profile
stumpy.stumped : Compute the z-normalized matrix profile with a dask/ray cluster
stumpy.gpu_stump : Compute the z-normalized matrix profile with one or more GPU devices
stumpy.scrump : Compute an approximate z-normalized matrix profile
Examples
>>> import stumpy
>>> import numpy as np
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.motifs(
...     np.array([584., -11., 23., 79., 1001., 0., -19.]),
...     mp[:, 0],
...     max_distance=2.0)
(array([[0. , 0.11633857]]), array([[0, 4]]))
>>> # Alternative example using named attributes
>>> mp = stumpy.stump(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3)
>>> stumpy.motifs(
...     np.array([584., -11., 23., 79., 1001., 0., -19.]),
...     mp.P_,
...     max_distance=2.0)
(array([[0. , 0.11633857]]), array([[0, 4]]))
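Both documented defaults (cutoff applied to P, and max_distance applied to each candidate's distance profile D) use the same rule: the larger of "mean minus two standard deviations" and the minimum. Below is a plain-Python sketch of that rule plus a hypothetical count of how many distances fall within max_distance + atol. It simplifies by dropping all non-finite values (NumPy's nan-functions only skip NaN), and it ignores exclusion zones and the trivial self-match.

```python
import math
import statistics


def default_threshold(values):
    # Documented default: max(nanmean(values) - 2.0 * nanstd(values), nanmin(values)).
    # Sketch simplification: drop non-finite entries instead of only NaN.
    finite = [v for v in values if math.isfinite(v)]
    mean = statistics.fmean(finite)
    std = statistics.pstdev(finite)  # population std, like np.nanstd (ddof=0)
    return max(mean - 2.0 * std, min(finite))


def count_matches(D, max_distance, atol=1e-8):
    # Number of distance-profile entries within max_distance (plus atol).
    # Exclusion zones and the trivial self-match are ignored in this sketch.
    return sum(1 for d in D if d <= max_distance + atol)


P = [0.0, 10.0]
print(default_threshold(P))  # max(5.0 - 2.0 * 5.0, 0.0) = 0.0
print(count_matches([0.0, 0.1, 2.5, 0.05, 3.0], max_distance=0.1))  # 3
```

Because default_threshold takes a single argument, it has the shape the documentation requires for a callable max_distance, although whether it behaves identically to the built-in default on every input is not guaranteed.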
match
- stumpy.match(Q, T, M_T=None, Σ_T=None, max_distance=None, max_matches=None, atol=1e-08, query_idx=None, normalize=True, p=2.0, T_subseq_isfinite=None, T_subseq_isconstant=None, Q_subseq_isconstant=None)
Find all matches of a query Q in a time series T
The returned matches are the subsequences whose distances to Q are less than or equal to max_distance, sorted by distance (lowest to highest). Around each occurrence, an exclusion zone is applied before searching for the next.
- Parameters:
- Q : numpy.ndarray
The query sequence. Q does not have to be a subsequence of T.
- T : numpy.ndarray
The time series of interest.
- M_T : numpy.ndarray, default None
Sliding mean of time series, T.
- Σ_T : numpy.ndarray, default None
Sliding standard deviation of time series, T.
- max_distance : float or function, default None
Maximum distance between Q and a subsequence, S, for S to be considered a match. If max_distance is a function, then it must be a function that accepts a single parameter, D, in its function signature, which is the distance profile between Q and T (a 1D numpy array of size n - m + 1, where n = len(T) and m = len(Q)). If None, this defaults to np.nanmax([np.nanmean(D) - 2 * np.nanstd(D), np.nanmin(D)]) (i.e., at least the closest match will be returned).
- max_matches : int, default None
The maximum number of similar occurrences to be returned. The resulting occurrences are sorted by distance, so a value of 10 means that the indices of the 10 most similar subsequences are returned. If None, then all occurrences are returned.
- atol : float, default 1e-8
The absolute tolerance parameter. This value will be added to max_distance when comparing distances between subsequences.
- query_idx : int, default None
This is the index position along the time series, T, where the query subsequence, Q, is located. query_idx should only be used when the matrix profile is a self-join and should be set to None for matrix profiles computed from AB-joins. If query_idx is set to a specific integer value, then this will help ensure that the self-match will be returned first.
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
- T_subseq_isfinite : numpy.ndarray, default None
A boolean array that indicates whether a subsequence in T contains a np.nan/np.inf value (False). This parameter is ignored when normalize=True.
- T_subseq_isconstant : numpy.ndarray or function, default None
A boolean array that indicates whether a subsequence (of length equal to len(Q)) in T is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T is constant (True). The function must take exactly two arguments, a (a 1-D array) and w (the window size); additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- Q_subseq_isconstant : numpy.ndarray or function, default None
A boolean array (of size 1) that indicates whether Q is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in Q is constant (True). The function must take exactly two arguments, a (a 1-D array) and w (the window size); additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- Returns:
- out : numpy.ndarray
The first column consists of distances of subsequences of T whose distances to Q are less than or equal to max_distance, sorted by distance (lowest to highest). The second column consists of the corresponding indices in T.
See also
stumpy.motifs : Discover the top motifs for time series T
stumpy.mmotifs : Discover the top motifs for the multi-dimensional time series T
stumpy.stump : Compute the z-normalized matrix profile
stumpy.stumped : Compute the z-normalized matrix profile with a dask/ray cluster
stumpy.gpu_stump : Compute the z-normalized matrix profile with one or more GPU devices
stumpy.scrump : Compute an approximate z-normalized matrix profile
Examples
>>> import stumpy
>>> import numpy as np
>>> stumpy.match(
...     np.array([-11.1, 23.4, 79.5, 1001.0]),
...     np.array([584., -11., 23., 79., 1001., 0., -19.])
... )
array([[0.0011129739290248121, 1]], dtype=object)
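The "exclusion zone before searching for the next" step can be illustrated with a greedy loop over a precomputed distance profile D: take the nearest remaining subsequence, accept it if it is within max_distance + atol, then blank out its neighbors. This is a hypothetical plain-Python sketch, not STUMPY's implementation; in particular, excl_zone is passed explicitly here, whereas STUMPY derives it from the window size (we believe ceil(m / 4) by default, but treat that as an assumption).

```python
import math


def greedy_matches(D, max_distance, excl_zone, atol=1e-8, max_matches=None):
    """Pick (distance, index) matches from distance profile D, nearest first."""
    # Work on a copy; treat NaN distances as unmatched
    D = [math.inf if math.isnan(d) else d for d in D]
    out = []
    while D and (max_matches is None or len(out) < max_matches):
        idx = min(range(len(D)), key=D.__getitem__)  # current nearest subsequence
        if not D[idx] <= max_distance + atol:  # nothing left within range
            break
        out.append((D[idx], idx))
        # Suppress trivial matches around the accepted occurrence
        for j in range(max(0, idx - excl_zone), min(len(D), idx + excl_zone + 1)):
            D[j] = math.inf
    return out


D = [0.5, 0.1, 0.2, 3.0, 0.15, 4.0]
print(greedy_matches(D, max_distance=1.0, excl_zone=1))  # [(0.1, 1), (0.15, 4)]
```

Without the exclusion zone, index 2 (distance 0.2) would be accepted right after index 1 even though it largely overlaps the same occurrence.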
mmotifs
- stumpy.mmotifs(T, P, I, min_neighbors=1, max_distance=None, cutoffs=None, max_matches=10, max_motifs=1, atol=1e-08, k=None, include=None, normalize=True, p=2.0, T_subseq_isconstant=None)
Discover the top motifs for the multi-dimensional time series T.
- Parameters:
- T : numpy.ndarray
The multi-dimensional time series or sequence.
- P : numpy.ndarray
Multi-dimensional Matrix Profile of T.
- I : numpy.ndarray
Multi-dimensional Matrix Profile indices.
- min_neighbors : int, default 1
The minimum number of similar matches a subsequence needs to have in order to be considered a motif. This defaults to 1, which means that a subsequence must have at least one similar match in order to be considered a motif.
- max_distance : float, default None
Maximum distance that is allowed between a query subsequence (a candidate motif) and all subsequences in T to be considered as a match. If None, this defaults to np.nanmax([np.nanmean(D) - 2 * np.nanstd(D), np.nanmin(D)]) (i.e., at least the closest match will be returned).
- cutoffs : numpy.ndarray or float, default None
The largest matrix profile value (distance) for each dimension of the multidimensional matrix profile that a multidimensional candidate motif is allowed to have. If cutoffs is a scalar value, then this value will be applied to every dimension.
- max_matches : int, default 10
The maximum number of similar matches (nearest neighbors) to return for each motif. The first match is always the self/trivial-match for each motif.
- max_motifs : int, default 1
The maximum number of motifs to return. To consider returning all possible valid motifs, try setting max_motifs to the length of your input matrix profile (i.e., max_motifs=len(P)).
- atol : float, default 1e-8
The absolute tolerance parameter. This value will be added to max_distance when comparing distances between subsequences.
- k : int, default None
The number of dimensions (k + 1) required for discovering all motifs. This value is available for doing guided search or, together with include, for constrained search. If k is None, then this will be automatically computed for each motif using MDL (unconstrained search).
- include : numpy.ndarray, default None
A list of (zero-based) indices corresponding to the dimensions in T that must be included in the constrained multidimensional motif search.
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
- T_subseq_isconstant : numpy.ndarray, function, or list, default None
A parameter that is used to show whether a subsequence of a time series in T is constant (True) or not. T_subseq_isconstant can be a 2D boolean numpy.ndarray or a function that can be applied to each time series in T. Alternatively, for maximum flexibility, a list (with length equal to the total number of time series) may also be used. In this case, T_subseq_isconstant[i] corresponds to the i-th time series T[i] and each element in the list can either be a 1D boolean numpy.ndarray, a function, or None.
- Returns:
- motif_distances : numpy.ndarray
The distances corresponding to a set of subsequence matches for each motif.
- motif_indices : numpy.ndarray
The indices corresponding to a set of subsequence matches for each motif.
- motif_subspaces : list
A list consisting of arrays that contain the k-dimensional subspace for each motif.
- motif_mdls : list
A list consisting of arrays that contain the MDL results for finding the dimension of each motif.
See also
stumpy.motifs : Find the top motifs for time series T
stumpy.match : Find all matches of a query Q in a time series T
stumpy.mstump : Compute the multi-dimensional z-normalized matrix profile
stumpy.mstumped : Compute the multi-dimensional z-normalized matrix profile with a dask/ray cluster
stumpy.subspace : Compute the k-dimensional matrix profile subspace for a given subsequence index and its nearest neighbor index
stumpy.mdl : Compute the number of bits needed to compress one array with another using the minimum description length (MDL)
Notes
For more information on include and search types, see Section IV D and IV E.
Examples
>>> import stumpy
>>> import numpy as np
>>> mps, indices = stumpy.mstump(
...     np.array([[584., -11., 23., 79., 1001., 0., -19.],
...               [ 1., 2., 4., 8., 16., 0., 32.]]),
...     m=3)
>>> stumpy.mmotifs(
...     np.array([[584., -11., 23., 79., 1001., 0., -19.],
...               [ 1., 2., 4., 8., 16., 0., 32.]]),
...     mps,
...     indices)
(array([[4.47034836e-08, 4.47034836e-08]]), array([[0, 2]]), [array([1])], [array([ 80. , 111.509775])])
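The cutoffs parameter accepts either one value per dimension or a single scalar that "will be applied to every dimension". The sketch below illustrates only that documented broadcasting rule and a per-dimension comparison against the matrix profile, using hypothetical helper names and toy values. How mmotifs actually chooses candidates and k (for example via MDL) is not shown and may differ.

```python
import numbers


def expand_cutoffs(cutoffs, n_dims):
    """Return one cutoff per dimension, broadcasting a scalar as documented."""
    if isinstance(cutoffs, numbers.Real):
        return [float(cutoffs)] * n_dims
    cutoffs = [float(c) for c in cutoffs]
    if len(cutoffs) != n_dims:
        raise ValueError("need exactly one cutoff per dimension")
    return cutoffs


def passes_cutoff(P, idx, k, cutoffs):
    # Assumes row P[k] holds the (k + 1)-dimensional matrix profile values
    return P[k][idx] <= expand_cutoffs(cutoffs, len(P))[k]


# Toy two-dimensional matrix profile (values are illustrative only)
P = [
    [0.2, 0.9, 0.4],  # k = 0
    [0.5, 1.3, 0.8],  # k = 1
]
print(expand_cutoffs(0.6, len(P)))  # [0.6, 0.6]
print(passes_cutoff(P, idx=0, k=1, cutoffs=0.6))  # True, since 0.5 <= 0.6
```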
snippets
- stumpy.snippets(T, m, k, percentage=1.0, s=None, mpdist_percentage=0.05, mpdist_k=None, normalize=True, p=2.0, mpdist_T_subseq_isconstant=None)
Identify the top k snippets that best represent the time series, T
- Parameters:
- T : numpy.ndarray
The time series or sequence for which to find the snippets.
- m : int
The snippet window size.
- k : int
The desired number of snippets.
- percentage : float, default 1.0
With the length of each non-overlapping subsequence, S[i], set to m, this is the percentage of S[i] (i.e., percentage * m) to set s (the sub-subsequence length) to. When percentage == 1.0, then the full length of S[i] is used to compute the mpdist_vect. When percentage < 1.0, then a shorter sub-subsequence length of s = min(math.ceil(percentage * m), m) from each S[i] is used to compute mpdist_vect. When s is not None, then the percentage parameter is ignored.
- s : int, default None
With the length of each non-overlapping subsequence, S[i], set to m, this is essentially the sub-subsequence length (i.e., a shorter part of S[i]). When s == m, then the full length of S[i] is used to compute the mpdist_vect. When s < m, then shorter subsequences with length s from each S[i] are used to compute mpdist_vect. When s is not None, then the percentage parameter is ignored.
- mpdist_percentage : float, default 0.05
The percentage of distances that will be used to report mpdist. The value is between 0.0 and 1.0.
- mpdist_k : int, default None
Specify the k-th value in the concatenated matrix profiles to return. When mpdist_k is not None, then the mpdist_percentage parameter is ignored.
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
- mpdist_T_subseq_isconstant : numpy.ndarray or function, default None
A boolean array that indicates whether a subsequence (of length equal to s) in T is constant (True). Alternatively, a custom, user-defined function that returns a boolean array that indicates whether a subsequence in T is constant (True). The function must take exactly two arguments, a (a 1-D array) and w (the window size); additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- Returns:
- snippets : numpy.ndarray
The top k snippets.
- snippets_indices : numpy.ndarray
The index locations for each of the top k snippets.
- snippets_profiles : numpy.ndarray
The MPdist profiles for each of the top k snippets.
- snippets_fractions : numpy.ndarray
The fraction of data that each of the top k snippets represents.
- snippets_areas : numpy.ndarray
The area under the curve corresponding to each profile for each of the top k snippets.
- snippets_regimes : numpy.ndarray
The index slices corresponding to the set of regimes for each of the top k snippets. The first column is the (zero-based) snippet index while the second and third columns correspond to the (inclusive) regime start indices and the (exclusive) regime stop indices, respectively.
Notes
See Table I
Examples
>>> import stumpy
>>> import numpy as np
>>> stumpy.snippets(np.array([584., -11., 23., 79., 1001., 0., -19.]), m=3, k=2)
(array([[ 584., -11., 23.],
        [ 79., 1001., 0.]]),
 array([0, 3]),
 array([[0. , 3.2452632 , 3.00009263, 2.982409 , 0.11633857],
        [2.982409 , 2.69407392, 3.01719586, 0. , 2.92154586]]),
 array([0.6, 0.4]),
 array([9.3441034 , 5.81050512]),
 array([[0, 0, 1],
        [0, 2, 3],
        [0, 4, 5],
        [1, 1, 2],
        [1, 3, 4]]))
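Two documented relationships can be checked in plain Python: how percentage resolves to the sub-subsequence length s, and how, in the example above, the regime slices line up with snippets_fractions (snippet 0 covers 3 of 5 profile positions, snippet 1 covers 2). Both helpers are hypothetical; the second reflects our reading of the example output, not necessarily how STUMPY computes the fractions internally.

```python
import math


def resolve_s(m, percentage=1.0, s=None):
    # Documented rule: an explicit s wins; otherwise s = min(ceil(percentage * m), m)
    if s is not None:
        return s
    return min(math.ceil(percentage * m), m)


def fractions_from_regimes(regimes, n_snippets):
    # Each regime row: (snippet index, inclusive start, exclusive stop)
    covered = [0] * n_snippets
    for snippet_idx, start, stop in regimes:
        covered[snippet_idx] += stop - start
    total = sum(covered)
    return [c / total for c in covered]


# Regimes copied from the documented snippets example
regimes = [[0, 0, 1], [0, 2, 3], [0, 4, 5], [1, 1, 2], [1, 3, 4]]
print(resolve_s(m=3, percentage=0.5))  # min(ceil(1.5), 3) = 2
print(fractions_from_regimes(regimes, 2))  # [0.6, 0.4]
```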
stimp
- stumpy.stimp(T, min_m=3, max_m=None, step=1, percentage=0.01, pre_scrump=True, normalize=True, p=2.0, T_subseq_isconstant_func=None)
A class to compute the Pan Matrix Profile
This is based on the SKIMP algorithm.
- Parameters:
- T : numpy.ndarray
The time series or sequence for which to compute the pan matrix profile.
- min_m : int, default 3
The starting (or minimum) subsequence window size for which a matrix profile may be computed.
- max_m : int, default None
The stopping (or maximum) subsequence window size for which a matrix profile may be computed. When max_m = None, this is set to the maximum allowable subsequence window size.
- step : int, default 1
The step between subsequence window sizes.
- percentage : float, default 0.01
The percentage of the full matrix profile to compute for each subsequence window size. When percentage < 1.0, then the scrump algorithm is used. Otherwise, the stump algorithm is used when the exact matrix profile is requested.
- pre_scrump : bool, default True
A flag for whether or not to perform the PreSCRIMP calculation prior to computing SCRIMP. If set to True, this is equivalent to computing SCRIMP++. This parameter is ignored when percentage = 1.0.
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
- T_subseq_isconstant_func : function, default None
A custom, user-defined function that returns a boolean array that indicates whether a subsequence in T is constant (True). The function must take exactly two arguments, a (a 1-D array) and w (the window size); additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- Attributes:
- PAN_ : numpy.ndarray
The transformed (i.e., normalized, contrasted, binarized, and repeated) pan matrix profile.
- M_ : numpy.ndarray
The full list of (breadth-first-search (level) ordered) subsequence window sizes.
Methods
update():
Compute the next matrix profile using the next available (breadth-first-search (level) ordered) subsequence window size and update the pan matrix profile.
See also
stumpy.stimped : Compute the Pan Matrix Profile with a dask/ray cluster
stumpy.gpu_stimp : Compute the Pan Matrix Profile with one or more GPU devices
Notes
See Table 2
Examples
>>> import stumpy
>>> import numpy as np
>>> pmp = stumpy.stimp(np.array([584., -11., 23., 79., 1001., 0., -19.]))
>>> pmp.update()
>>> pmp.PAN_
array([[0., 1., 1., 1., 1., 1., 1.],
       [0., 1., 1., 1., 1., 1., 1.]])
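M_ lists window sizes in breadth-first-search (level) order, so early update() calls spread coarsely across the whole range of window sizes before filling in the gaps. Below is a minimal sketch of such an ordering: a level-order traversal over midpoints of a balanced binary split. The helper name is hypothetical, and STUMPY's own M_ may choose midpoints or break ties differently.

```python
from collections import deque


def bfs_window_order(window_sizes):
    """Level-order (breadth-first) traversal of a balanced binary split of window_sizes."""
    order = []
    queue = deque([(0, len(window_sizes))])  # half-open index ranges [lo, hi)
    while queue:
        lo, hi = queue.popleft()
        if lo >= hi:
            continue
        mid = (lo + hi) // 2
        order.append(window_sizes[mid])  # visit the midpoint of this range first
        queue.append((lo, mid))
        queue.append((mid + 1, hi))
    return order


print(bfs_window_order(list(range(3, 10))))  # [6, 4, 8, 3, 5, 7, 9]
```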
stimped
- stumpy.stimped(client, T, min_m=3, max_m=None, step=1, normalize=True, p=2.0, T_subseq_isconstant_func=None)
A class to compute the Pan Matrix Profile with a dask/ray cluster
This is based on the SKIMP algorithm.
- Parameters:
- clientclient
A
dask/rayclient. Setting up adask/raycluster is beyond the scope of this library. Please refer to thedask/raydocumentation.- Tnumpy.ndarray
The time series or sequence for which to compute the pan matrix profile.
- min_mint, default 3
The starting (or minimum) subsequence window size for which a matrix profile may be computed.
- max_m : int, default None
The stopping (or maximum) subsequence window size for which a matrix profile may be computed. When max_m = None, this is set to the maximum allowable subsequence window size.
- step : int, default 1
The step between subsequence window sizes.
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
- T_subseq_isconstant_func : function, default None
A custom, user-defined function that returns a boolean array that indicates whether a subsequence in T is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- Attributes:
- PAN_ : numpy.ndarray
The transformed (i.e., normalized, contrasted, binarized, and repeated) pan matrix profile.
- M_ : numpy.ndarray
The full list of (breadth-first-search (level) ordered) subsequence window sizes.
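For reference, the Minkowski distance between two equal-length subsequences x and y is (Σ|x_i - y_i|^p)^(1/p). The short sketch below computes it directly with NumPy (not through STUMPY) using a hypothetical helper, minkowski_distance, to show how p = 1 and p = 2 reduce to the Manhattan and Euclidean distances mentioned for the p parameter. As noted there, p only matters when normalize=False.

```python
import numpy as np


def minkowski_distance(x, y, p=2.0):
    """Hypothetical helper: Minkowski (p-norm) distance between two
    equal-length 1-D arrays."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)


x = np.array([0.0, 0.0])
y = np.array([3.0, 4.0])
print(minkowski_distance(x, y, p=1.0))  # 7.0 (Manhattan)
print(minkowski_distance(x, y, p=2.0))  # 5.0 (Euclidean)
```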
Methods
update():
Compute the next matrix profile using the next available (breadth-first-search (level) ordered) subsequence window size and update the pan matrix profile.
See also
stumpy.stimp : Compute the Pan Matrix Profile
stumpy.gpu_stimp : Compute the Pan Matrix Profile with one or more GPU devices
Notes
See Table 2
Examples
>>> import stumpy
>>> import numpy as np
>>> from dask.distributed import Client
>>> if __name__ == "__main__":
...     with Client() as dask_client:
...         pmp = stumpy.stimped(
...             dask_client,
...             np.array([584., -11., 23., 79., 1001., 0., -19.]))
...         pmp.update()
...         pmp.PAN_
array([[0., 1., 1., 1., 1., 1., 1.],
       [0., 1., 1., 1., 1., 1., 1.]])
Alternatively, you can also use ray:
>>> import ray
>>> if __name__ == "__main__":
...     ray.init()
...     pmp = stumpy.stimped(
...         ray,
...         np.array([584., -11., 23., 79., 1001., 0., -19.]))
...     ray.shutdown()
gpu_stimp
- stumpy.gpu_stimp(T, min_m=3, max_m=None, step=1, device_id=0, normalize=True, p=2.0, T_subseq_isconstant_func=None)
A class to compute the Pan Matrix Profile with one or more GPU devices.
This is based on the SKIMP algorithm.
- Parameters:
- T : numpy.ndarray
The time series or sequence for which to compute the pan matrix profile.
- min_m : int, default 3
The starting (or minimum) subsequence window size for which a matrix profile may be computed.
- max_m : int, default None
The stopping (or maximum) subsequence window size for which a matrix profile may be computed. When max_m = None, this is set to the maximum allowable subsequence window size.
- step : int, default 1
The step between subsequence window sizes.
- device_id : int or list, default 0
The (GPU) device number to use. The default value is 0. A list of valid device ids (int) may also be provided for parallel GPU-STUMP computation. A list of all valid device ids can be obtained by executing [device.id for device in numba.cuda.list_devices()].
- normalize : bool, default True
When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets re-routed to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
- p : float, default 2.0
The p-norm to apply for computing the Minkowski distance. Minkowski distance is typically used with p being 1 or 2, which correspond to the Manhattan distance and the Euclidean distance, respectively. This parameter is ignored when normalize == True.
- T_subseq_isconstant_func : function, default None
A custom, user-defined function that returns a boolean array that indicates whether a subsequence in T is constant (True). The function must only take two arguments, a, a 1-D array, and w, the window size, while additional arguments may be specified by currying the user-defined function using functools.partial. Any subsequence with at least one np.nan/np.inf will automatically have its corresponding value set to False in this boolean array.
- Attributes:
- PAN_ : numpy.ndarray
The transformed (i.e., normalized, contrasted, binarized, and repeated) pan matrix profile.
- M_ : numpy.ndarray
The full list of (breadth-first-search (level) ordered) subsequence window sizes.
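When normalize=True, each subsequence is z-normalized (shifted to zero mean and scaled to unit standard deviation) before the Euclidean distance is taken. The CPU-only sketch below uses hypothetical helpers (z_normalize, z_normalized_distance) written in plain NumPy, not STUMPY's GPU code. It shows that two subsequences differing only in offset and scale end up at distance 0, and that two perfectly anti-correlated subsequences of length m reach the maximum distance 2√m, consistent with the identity d = √(2m(1 − ρ)) for Pearson correlation ρ.

```python
import numpy as np


def z_normalize(x):
    """Hypothetical helper: shift to zero mean and scale to unit
    (population) standard deviation. Constant inputs (std == 0) would
    divide by zero and are not handled in this sketch."""
    x = np.asarray(x, dtype=np.float64)
    return (x - x.mean()) / x.std()


def z_normalized_distance(x, y):
    """Euclidean distance between the z-normalized versions of x and y."""
    return np.linalg.norm(z_normalize(x) - z_normalize(y))


m = 3
# Same shape, different offset/scale: distance is (numerically) zero
print(np.isclose(z_normalized_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 0.0))  # True
# Perfectly anti-correlated: distance equals 2 * sqrt(m)
print(np.isclose(z_normalized_distance([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]), 2 * np.sqrt(m)))  # True
```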
Methods
update():
Compute the next matrix profile using the next available (breadth-first-search (level) ordered) subsequence window size and update the pan matrix profile.
See also
stumpy.stimp : Compute the Pan Matrix Profile
stumpy.stimped : Compute the Pan Matrix Profile with a dask/ray cluster
Notes
See Table 2
Examples
>>> import stumpy
>>> import numpy as np
>>> from numba import cuda
>>> if __name__ == "__main__":
...     all_gpu_devices = [device.id for device in cuda.list_devices()]
...     pmp = stumpy.gpu_stimp(
...         np.array([584., -11., 23., 79., 1001., 0., -19.]),
...         device_id=all_gpu_devices)
...     pmp.update()
...     pmp.PAN_
array([[0., 1., 1., 1., 1., 1., 1.],
       [0., 1., 1., 1., 1., 1., 1.]])