-
Notifications
You must be signed in to change notification settings - Fork 3.7k
introducing lists of existing columns in the fields of select queries' output #2491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
introducing lists of existing columns in the fields of select queries' output #2491
Conversation
failing UT in java8 only.
|
return arg1; | ||
} | ||
|
||
List<String> newColumns = Lists.newArrayList(arg1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we just create a set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they do not need to be a List. I will change them into sets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
About the failing UTs in java8 only. HashMaps of java7 and java8 are different in ordering elements. But, they are just little hash maps which we should not expect them to order elements though, AFAIK. |
resolve merge conflicts and rebase it. |
@CHOIJAEHONG1 Cannot see newly added fields in |
@navis AFAIC, there are not newly added fields in a query itself which |
@jaehc what is the main problem being addressed by this PR? Is it to improve the potential performance of the select query later on or to reduce the size of the output? |
ping @sirpkt |
@fjy Sorry for being late. This PR is intended to reduce the size of output of search queries by turning outputs into tabular format like csv, tsv, etc. |
👍 |
while (index < objects.length && !(objects[index] instanceof DateTime)) { | ||
values.add(new EventHolder(null, 0, (Map) objects[index++])); | ||
} | ||
|
||
retVal.add(new Result<>(timestamp, new SelectResultValue(null, values))); | ||
retVal.add(new Result<>(timestamp, new SelectResultValue(null, Sets.<String>newHashSet(), Sets.<String>newHashSet(), values))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a caching test that has some actual values in dimensions and metrics set ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. I added code to test cached values of dimensions and metrics set in a select query.
Looks good to me, pending comments :- Cache key needs to be updated and need some more tests for caching |
Also create a github issue to track the future work needed to actually reduce the result size. |
@jaehc any chance of finishing this up? |
@fjy I will address the comment. |
@nishantmonu51 are you cool with the changes? |
@@ -66,7 +66,7 @@ | |||
*/ | |||
public class SelectQueryQueryToolChest extends QueryToolChest<Result<SelectResultValue>, SelectQuery> | |||
{ | |||
private static final byte SELECT_QUERY = 0x13; | |||
private static final byte SELECT_QUERY = 0x15; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This collides with the cache code in SearchQueryQueryToolchest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. It's fixed to 0x16.
test seems fine, one last thing new cache key for select query is same as the one for SearchQuery |
@nishantmonu51 Oops, my bad. I fixed it to 0x16. |
looks good to me, 👍 |
In druid 0.9.2 - list of dimensions and metrics was added to druid results Druid PR - apache/druid#2491 Since calcite uses its own parsing logic, this PR fixes the logic to ignore dimensions and metrics from resultset if present. Tested this with both Druid v0.9.1 and v0.9.2
In druid 0.9.2 - list of dimensions and metrics was added to druid results Druid PR - apache/druid#2491 Since calcite uses its own parsing logic, this PR fixes the logic to ignore dimensions and metrics from resultset if present. Tested this with both Druid v0.9.1 and v0.9.2
This PR is introducing
dimensions
andmetrics
fields in the output of aselect
query, as the below. Looking at these fields, I think we could easilyfind the full list of existing columns
in the whole events and then tell whether those column are adimension
or ametric
. We could also get rid of columns' names which arerepeated in events
lists, so as toreduce the output size
by means of these fields in the future. This would be especially beneficial when the output is huge and has a lot of records.