Skip to content

introducing lists of existing columns in the fields of select queries' output #2491

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 25, 2016
Merged

introducing lists of existing columns in the fields of select queries' output #2491

merged 4 commits into from
Aug 25, 2016

Conversation

jaehc
Copy link
Contributor

@jaehc jaehc commented Feb 18, 2016

This PR is introducing dimensions and metrics fields in the output of a select query, as the below. Looking at these fields, I think we could easily find the full list of existing columns in the whole events and then tell whether those column are a dimension or a metric. We could also get rid of columns' names which are repeated in events lists, so as to reduce the output size by means of these fields in the future. This would be especially beneficial when the output is huge and has a lot of records.

[ {
  "timestamp" : "2013-08-29T00:00:00.000Z",
  "result" : {
    "pagingIdentifiers" : {
      "wikipedia_2013-08-29T00:00:00.000Z_2013-08-30T00:00:00.000Z_2016-02-18T07:37:25.834Z" : 3
    },
    "dimensions" : [ "page", "language", "user", "unpatrolled", "newPage", "robot", "anonymous", "namespace", "continent", "country", "region", "city" ],
    "metrics" : [ "added", "count", "delta", "deleted" ],
    "events" : [ {
      "segmentId" : "wikipedia_2013-08-29T00:00:00.000Z_2013-08-30T00:00:00.000Z_2016-02-18T07:37:25.834Z",
      "offset" : 0,
      "event" : {
        "timestamp" : "2013-08-29T01:02:33.000Z",
        "region" : "Bay Area",
        "anonymous" : "false",
        "unpatrolled" : "true",
        "page" : "Gypsy Danger50",
        "continent" : "North America",
        "newPage" : "true",
        "robot" : "false",
        "language" : "en",
        "user" : "nuclear",
        "city" : "San Francisco",
        "country" : "United States",
        "namespace" : "article",
        "count" : 1,
        "added" : 57,
        "delta" : -143,
        "deleted" : 200
      }
    },

@jaehc jaehc changed the title introducing lists of available columns in the fields of select queries' output introducing lists of existing columns in the fields of select queries' output Feb 18, 2016
@jaehc
Copy link
Contributor Author

jaehc commented Feb 19, 2016

failing UT in java8 only.
The order of the metric sequence is changed when the result is from a QueryableIndex. It's a somewhat strange thing.

testFullSelectNoResults[io.druid.query.FinalizeResultsQueryRunner@4a7761b1:descending=false](io.druid.query.select.SelectQueryRunnerTest)  Time elapsed: 0.007 sec  <<< FAILURE!
java.lang.AssertionError: expected:<[index, quality_uniques]> but was:<[quality_uniques, index]>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.junit.Assert.assertEquals(Assert.java:144)
    at io.druid.query.select.SelectQueryRunnerTest.verify(SelectQueryRunnerTest.java:627)
    at io.druid.query.select.SelectQueryRunnerTest.testFullSelectNoResults(SelectQueryRunnerTest.java:474)

@fjy fjy added this to the 0.9.1 milestone Feb 19, 2016
@fjy fjy closed this Feb 19, 2016
@fjy fjy reopened this Feb 19, 2016
return arg1;
}

List<String> newColumns = Lists.newArrayList(arg1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we just create a set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they do not need to be a List. I will change them into sets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@fjy fjy modified the milestones: 0.9.2, 0.9.1 Feb 19, 2016
@jaehc
Copy link
Contributor Author

jaehc commented Feb 20, 2016

About the failing UTs in java8 only.
A list of available column names from a queryable index are turned into a HashMap to get a difference set, as in.
https://github.com/druid-io/druid/blob/master/processing/src/main/java/io/druid/segment/QueryableIndexStorageAdapter.java#L92

HashMaps of java7 and java8 are different in ordering elements. But, they are just little hash maps which we should not expect them to order elements though, AFAIK.

@jaehc
Copy link
Contributor Author

jaehc commented Mar 3, 2016

resolve merge conflicts and rebase it.

@navis
Copy link
Contributor

navis commented Mar 24, 2016

@CHOIJAEHONG1 Cannot see newly added fields in computeCacheKey. Could you add test case for it?

@jaehc
Copy link
Contributor Author

jaehc commented Mar 25, 2016

@navis AFAIC, there are not newly added fields in a query itself which computeCacheKey uses to make a byte array. I think it would be better adding a flag to switch on/off the newly added functionality that is listing dimensions and metrics in a query output. Correct me if I am wrong.

@fjy
Copy link
Contributor

fjy commented Jun 28, 2016

@jaehc what is the main problem being addressed by this PR? Is it to improve the potential performance of the select query later on or to reduce the size of the output?

@fjy
Copy link
Contributor

fjy commented Jul 29, 2016

ping @sirpkt

@jaehc
Copy link
Contributor Author

jaehc commented Aug 1, 2016

@fjy Sorry for being late. This PR is intended to reduce the size of output of search queries by turning outputs into tabular format like csv, tsv, etc.

@fjy
Copy link
Contributor

fjy commented Aug 11, 2016

👍

while (index < objects.length && !(objects[index] instanceof DateTime)) {
values.add(new EventHolder(null, 0, (Map) objects[index++]));
}

retVal.add(new Result<>(timestamp, new SelectResultValue(null, values)));
retVal.add(new Result<>(timestamp, new SelectResultValue(null, Sets.<String>newHashSet(), Sets.<String>newHashSet(), values)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a caching test that has some actual values in dimensions and metrics set ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. I added code to test cached values of dimensions and metrics set in a select query.

@nishantmonu51
Copy link
Member

Looks good to me, pending comments :- Cache key needs to be updated and need some more tests for caching

@nishantmonu51
Copy link
Member

Also create a github issue to track the future work needed to actually reduce the result size.

@fjy
Copy link
Contributor

fjy commented Aug 18, 2016

@jaehc any chance of finishing this up?

@jaehc
Copy link
Contributor Author

jaehc commented Aug 23, 2016

@fjy I will address the comment.

@fjy
Copy link
Contributor

fjy commented Aug 24, 2016

@nishantmonu51 are you cool with the changes?

@@ -66,7 +66,7 @@
*/
public class SelectQueryQueryToolChest extends QueryToolChest<Result<SelectResultValue>, SelectQuery>
{
private static final byte SELECT_QUERY = 0x13;
private static final byte SELECT_QUERY = 0x15;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This collides with the cache code in SearchQueryQueryToolchest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. It's fixed to 0x16.

@nishantmonu51
Copy link
Member

test seems fine, one last thing new cache key for select query is same as the one for SearchQuery

@jaehc
Copy link
Contributor Author

jaehc commented Aug 25, 2016

@nishantmonu51 Oops, my bad. I fixed it to 0x16.

@nishantmonu51
Copy link
Member

looks good to me, 👍

@nishantmonu51 nishantmonu51 merged commit 2e0f253 into apache:master Aug 25, 2016
nishantmonu51 added a commit to nishantmonu51/calcite that referenced this pull request Mar 31, 2017
In druid 0.9.2 - list of dimensions and metrics was added to druid
results Druid PR - apache/druid#2491

Since calcite uses its own parsing logic, this PR fixes the logic to
ignore dimensions and metrics from resultset if present.
Tested this with both Druid v0.9.1 and v0.9.2
nishantmonu51 added a commit to nishantmonu51/calcite that referenced this pull request Mar 31, 2017
In druid 0.9.2 - list of dimensions and metrics was added to druid
results Druid PR - apache/druid#2491

Since calcite uses its own parsing logic, this PR fixes the logic to
ignore dimensions and metrics from resultset if present.
Tested this with both Druid v0.9.1 and v0.9.2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants