Semantic segmentation on multi-dimensional Time Series #20

ywy9876 · 2019-05-25T00:37:00Z

Hello,
I was looking at the example of multi-dimensional time series data with MSTUMP. I ran the code and found that the returned values (matrix_profile, matrix_profile_indices) each have three rows (one for each time series, I guess). My question is: when the indices for a segment i point to different nearest segments (say 847, 237, 847), how do we decide which one is the nearest neighbour? A document by Prof. Eamonn says that we simply add the distance profiles together and find the minimum, but with your implementation I'm not sure how to do that.
I also wonder whether FLOSS or a similar algorithm for semantic segmentation is already implemented.

Thank you in advance.

seanlaw · 2019-05-25T01:26:12Z · Maintainer

@ywy9876 Thank you for your question. Our implementation of multi-dimensional STOMP (MSTUMP) produces identical results to the original author's open source reference implementation that can be found here.

To answer your question, each row of the returned matrix profile corresponds to using k of the D time series, for k = 1 up to D. So, if you have three time series, the first row of the matrix profile chooses the one time series (out of three) that produces the smallest (minimum) distance value for that window. The second row chooses the two time series (out of three) that produce the smallest (minimum) average distance value for that window. Finally, the third row gives the (average) distance value when all three time series are used.

The matrix profile indices tell you where along the time series you'd find the matching subsequence. However, it doesn't tell you which of the time series is chosen as you increase k, the number of dimensions to include.

We have not implemented FLOSS but I suspect that you can do this fairly easily since the hard part is having an efficient way to compute the full matrix profile indices (which STUMPY does for you).
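As a rough illustration of why the matrix profile indices are the hard part, here is a sketch of the corrected arc curve at the core of FLUSS, computed only from a 1-dimensional matrix profile index `I`. Each subsequence i draws an arc to I[i], and regime boundaries show up where few arcs cross. The arc counts are normalized by an idealized arc curve, 2i(n - i)/n, which approximates the expected number of crossings if nearest neighbours were uniformly random, and the edges are masked. The helper name `corrected_arc_curve` and the `excl_factor=5` edge mask are assumptions for this sketch, not STUMPY's API.

```python
import numpy as np


def corrected_arc_curve(I, L, excl_factor=5):
    """FLUSS-style corrected arc curve from a 1-D matrix profile index (hypothetical helper).

    I[i] is the nearest-neighbour index of subsequence i; L is the expected
    regime length used to mask the unreliable edges.
    """
    I = np.asarray(I)
    n = len(I)
    idx = np.arange(n)
    lo, hi = np.minimum(idx, I), np.maximum(idx, I)
    # Each arc i <-> I[i] is counted at positions lo .. hi - 1
    delta = np.zeros(n)
    np.add.at(delta, lo, 1)
    np.add.at(delta, hi, -1)
    arc_counts = np.cumsum(delta)
    # Idealized arc curve: roughly the expected crossings for uniformly random neighbours
    iac = 2 * idx * (n - idx) / n
    cac = np.ones(n)
    ok = iac > 0
    cac[ok] = np.minimum(arc_counts[ok] / iac[ok], 1.0)
    # Arcs near the ends are sparse by construction, so mask both edges
    edge = min(excl_factor * L, n)
    cac[:edge] = 1.0
    cac[n - edge:] = 1.0
    return cac


# Example: the first and last 50 subsequences only point within their own half
I = np.array([i - 1 if i in (49, 99) else i + 1 for i in range(100)])
cac = corrected_arc_curve(I, L=5)
regime_change = int(np.argmin(cac))
```

Values near 0 mean few arcs cross that position relative to chance, which marks a candidate regime change; in the example no arc crosses between positions 49 and 50, so the minimum falls at position 49.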

Let me know if this answers your question.

ywy9876 · 2019-05-25T09:18:47Z · Author

@seanlaw ,
Thank you for your reply.
Just to be clear with your sentence:

The matrix profile indices tell you where along the time series you'd find the matching subsequence.

If I have the following results:

matrix_profile_indices[0][4] 
Out[22]: 867

matrix_profile_indices[1][4]
Out[23]: 205

matrix_profile_indices[2][4]
Out[24]: 867

Should I interpret this as follows: for the subsequence starting at position 4 of the original TS, the nearest neighbour is the subsequence starting at position 867 if we take all the dimensions into account (due to matrix_profile_indices[2][4])? Or am I wrong?

However, it doesn't tell you which of the time series is chosen as you increase k, the number of dimensions to include.

I'm not sure what you meant. What is the purpose of knowing which of the time series is chosen?
Isn't a subsequence chosen as the nearest neighbour because its averaged distance across all three time series is the minimum?

Looking forward to hearing from you and thank you again.

seanlaw · 2019-05-25T13:29:37Z · Maintainer

Should I interpret this as follows: for the subsequence starting at position 4 of the original TS, the nearest neighbour is the subsequence starting at position 867 if we take all the dimensions into account (due to matrix_profile_indices[2][4])? Or am I wrong?

This is correct. However, you must also take into account what the matrix profile values are for k=1 to k=3 (where k is the number of sub-dimensions). Since each row averages over the k best-matching dimensions, the value can only stay the same or grow as k increases, but it may stay low up to k=2 and then jump at k=3. The original paper explains very clearly that one is rarely interested in using all of the dimensions. Instead, you want to choose the subset of dimensions, k, that still produces a small mean matrix profile value before it increases sharply.
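For a single pair of windows, the value that feeds into row k - 1 is just the mean of the k smallest per-dimension distances (before the minimum over all candidate positions is taken). The tiny sketch below uses made-up distances (illustrative numbers, not real data) and a hypothetical helper `k_dim_values`, assuming the averaging scheme described earlier in this thread.

```python
import numpy as np


def k_dim_values(per_dim_dists):
    """Mean of the k smallest per-dimension distances, for k = 1..d (hypothetical helper)."""
    ordered = np.sort(np.asarray(per_dim_dists, dtype=float))
    return np.cumsum(ordered) / np.arange(1, ordered.size + 1)


# Illustrative distances between one fixed pair of windows in three dimensions
values = k_dim_values([0.5, 6.5, 1.0])
```

Here k=1 and k=2 give 0.5 and 0.75, while k=3 jumps to about 2.67 because of the dimension with distance 6.5; averaging over all three dimensions would hide a good two-dimensional match.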

However, it doesn't tell you which of the time series is chosen as you increase k, the number of dimensions to include.

Consistent with what I said above, if k=2 is the best choice, then the matrix profile index only tells you where along the three original time series to look. However, for k=2, neither the matrix profile nor the matrix profile index tells you which two of the three time series contain the subsequence that was used in the matrix profile calculation. Now, imagine that you have D=20 time series and, for a given subsequence, k=5. You would have no idea which 5 of the 20 time series are important, and the matrix profile index doesn't give you this information either.

Fortunately, this is something that can be tracked (though it is more complicated for MSTUMPED), and I would welcome a Pull Request for that.
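One way to track the chosen dimensions, sketched here under the same assumptions as a brute-force multi-dimensional matrix profile (z-normalized Euclidean distance per dimension, averaging over the k best dimensions), is to revisit a window `i` and its reported nearest neighbour `j` and rank the dimensions by how well they match there. `subspace_dims` is a hypothetical helper, not part of STUMPY's API, and the sketch does not address MSTUMPED.

```python
import numpy as np


def _znorm(x):
    return (x - x.mean()) / x.std()


def subspace_dims(T, m, i, j, k):
    """Indices of the k dimensions in which windows i and j match best (hypothetical helper).

    T has shape (d, n); j should be the nearest neighbour reported for window i
    in row k - 1 of the multi-dimensional matrix profile.
    """
    dists = np.array([
        np.linalg.norm(_znorm(T[dim, i:i + m]) - _znorm(T[dim, j:j + m]))
        for dim in range(T.shape[0])
    ])
    # Same best-to-worst ordering the averaging step uses; ties fall back to the lower index
    best = np.argsort(dists, kind="stable")[:k]
    return np.sort(best)


# Example: plant one pattern in dimensions 0 and 2 at positions 5 and 40
rng = np.random.default_rng(1)
T = rng.standard_normal((3, 80))
pattern = np.sin(np.linspace(0, 2 * np.pi, 10))
for dim, scale in ((0, 1.0), (2, 3.0)):
    T[dim, 5:15] = scale * pattern
    T[dim, 40:50] = scale * pattern
dims = subspace_dims(T, 10, 5, 40, k=2)
```

Dimensions 0 and 2 share the planted pattern while dimension 1 is noise, so `dims` is `[0, 2]`; the same idea scales to picking 5 of 20 dimensions.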

Let me know if this explanation helps and then I can close this issue.

ywy9876 · 2019-05-25T14:14:28Z · Author

Thanks for the clarification.

For my particular case, the data has only 2 dimensions (u,v components of wind), so I think this shouldn't be an issue for me.

Fortunately, this is something that can be tracked (though it is more complicated for MSTUMPED), and I would welcome a Pull Request for that.

Interesting work! I will see if I can manage it.

Let me know if this explanation helps and then I can close this issue.

Thanks again for the explanation. I think the issue can be closed.

seanlaw · 2019-05-25T14:33:11Z · Maintainer

That's great. We also lack a Tutorial for MSTUMP, so that is something you could contribute as well if you have a nice data set that you can share.

seanlaw · 2019-08-06T06:11:08Z · Maintainer

@ywy9876 FLUSS and FLOSS have been implemented for 1-dimensional data. You might want to take a look at Tutorial 3.

rnjv · 2020-09-22T02:09:52Z

I was wondering whether FLUSS or FLOSS can be applied to segment multidimensional time series. Any ideas on how to achieve this?

Reply from huricha1 · Jan 3, 2022

Did you solve the problem of multivariate time series segmentation?

seanlaw · 2020-09-22T02:16:33Z · Maintainer

@rnjv Unfortunately, it isn't clear how one would do this.

Reply from huricha1 · Jan 3, 2022

I want to perform segmentation on multivariate time series. Which method should I use?
Thank you so much!

rnjv · 2020-09-26T02:17:42Z

Could we port ESPRESSO (https://github.com/cruiseresearchgroup/ESPRESSO), along with its dependency https://github.com/cruiseresearchgroup/IGTS-python?

seanlaw · 2020-09-26T15:19:49Z · Maintainer

@rnjv Would you mind opening a separate issue for this request and clearly describing the problem that you are trying to solve? ESPRESSO may be beyond the scope of STUMPY.
