The Wayback Machine - https://web.archive.org/web/20230220164531/https://github.com/elastic/elasticsearch/issues/51691
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow percentiles_bucket use interpolation #51691

Open
gbarba opened this issue Jan 30, 2020 · 6 comments · May be fixed by #62922
Open

Allow percentiles_bucket use interpolation #51691

gbarba opened this issue Jan 30, 2020 · 6 comments · May be fixed by #62922
Labels
:Analytics/Aggregations Aggregations >enhancement good first issue low hanging fruit help wanted adoptme Team:Analytics Meta label for analytics/geo team

Comments

@gbarba
Copy link

gbarba commented Jan 30, 2020

Change the behavior or add a param to make the percentiles_bucket pipeline aggregation interpolate the result instead of return the nearest value.

The current behaviour is expected as it's properly documented and it was also discussed in the PR that introduced this pipeline (#13186 (comment)).

But I find it annoying that the percentils metrics aggregation interpolates the value so both are not consistent.

@pgomulka pgomulka added the :Analytics/Aggregations Aggregations label Jan 31, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@rjernst rjernst added the Team:Analytics Meta label for analytics/geo team label May 4, 2020
@polyfractal polyfractal added good first issue low hanging fruit help wanted adoptme and removed team-discuss labels Jul 8, 2020
@alpparkan
Copy link

Hi, if there is an agreed solution within the team I can work on this.

@alpparkan
Copy link

Hi @polyfractal, I was making the related changes to calculate percentiles with interpolation but there are couple of questions I want to ask before. Should I change the default behavior or add a new parameter to use interpolation? Also, is it ok to use percentiles from Apache Commons Math? If It's ok to use Commons but you want to use Excel style interpolation, we need to update the Apache Commons Math version from 3.2 to 3.4(min) to use Excel style computation or we can implement our own.

@polyfractal
Copy link
Contributor

Thanks for the ping, missed your prior message :)

We try to avoid including dependencies unless we really need them, especially to the core server module. So in this case I think we should just implement the interpolation method of choice as they are relatively small.

We probably don't need the fancy interpolations though, or at least not all of them (or in the first PR). I was thinking a simple linear interpolation between adjacent values if the requested percentile doesn't land squarely on a value. We can certainly add more, but I think that's the most important / most used variation (especially since this agg is dealing with a complete "population" of values, not a sample)

Should I change the default behavior or add a new parameter to use interpolation?

Hmm, let's add a new parameter, that way we can add more interpolation types in the future if we want. Perhaps interpolation: none is the default, and then we can add interpolation: linear, etc for new types

@Rockstar5645
Copy link

Is @alpparkan still working on this? Can I try it?

@alpparkan
Copy link

Hi @Rockstar5645, I just finished the implementation and I'll send a PR in a couple days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >enhancement good first issue low hanging fruit help wanted adoptme Team:Analytics Meta label for analytics/geo team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants