introducing lists of existing columns in the fields of select queries' output #2491

jaehc · 2016-02-18T08:27:14Z

This PR is introducing dimensions and metrics fields in the output of a select query, as the below. Looking at these fields, I think we could easily find the full list of existing columns in the whole events and then tell whether those column are a dimension or a metric. We could also get rid of columns' names which are repeated in events lists, so as to reduce the output size by means of these fields in the future. This would be especially beneficial when the output is huge and has a lot of records.

[ {
  "timestamp" : "2013-08-29T00:00:00.000Z",
  "result" : {
    "pagingIdentifiers" : {
      "wikipedia_2013-08-29T00:00:00.000Z_2013-08-30T00:00:00.000Z_2016-02-18T07:37:25.834Z" : 3
    },
    "dimensions" : [ "page", "language", "user", "unpatrolled", "newPage", "robot", "anonymous", "namespace", "continent", "country", "region", "city" ],
    "metrics" : [ "added", "count", "delta", "deleted" ],
    "events" : [ {
      "segmentId" : "wikipedia_2013-08-29T00:00:00.000Z_2013-08-30T00:00:00.000Z_2016-02-18T07:37:25.834Z",
      "offset" : 0,
      "event" : {
        "timestamp" : "2013-08-29T01:02:33.000Z",
        "region" : "Bay Area",
        "anonymous" : "false",
        "unpatrolled" : "true",
        "page" : "Gypsy Danger50",
        "continent" : "North America",
        "newPage" : "true",
        "robot" : "false",
        "language" : "en",
        "user" : "nuclear",
        "city" : "San Francisco",
        "country" : "United States",
        "namespace" : "article",
        "count" : 1,
        "added" : 57,
        "delta" : -143,
        "deleted" : 200
      }
    },

jaehc · 2016-02-19T00:27:05Z

failing UT in java8 only.
The order of the metric sequence is changed when the result is from a QueryableIndex. It's a somewhat strange thing.

testFullSelectNoResults[io.druid.query.FinalizeResultsQueryRunner@4a7761b1:descending=false](io.druid.query.select.SelectQueryRunnerTest)  Time elapsed: 0.007 sec  <<< FAILURE!
java.lang.AssertionError: expected:<[index, quality_uniques]> but was:<[quality_uniques, index]>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.junit.Assert.assertEquals(Assert.java:144)
    at io.druid.query.select.SelectQueryRunnerTest.verify(SelectQueryRunnerTest.java:627)
    at io.druid.query.select.SelectQueryRunnerTest.testFullSelectNoResults(SelectQueryRunnerTest.java:474)

fjy · 2016-02-19T21:10:19Z

processing/src/main/java/io/druid/query/select/SelectBinaryFn.java

+      return arg1;
+    }
+
+    List<String> newColumns = Lists.newArrayList(arg1);


can't we just create a set?

Yes, they do not need to be a List. I will change them into sets.

jaehc · 2016-02-20T08:35:26Z

About the failing UTs in java8 only.
A list of available column names from a queryable index are turned into a HashMap to get a difference set, as in.
https://github.com/druid-io/druid/blob/master/processing/src/main/java/io/druid/segment/QueryableIndexStorageAdapter.java#L92

HashMaps of java7 and java8 are different in ordering elements. But, they are just little hash maps which we should not expect them to order elements though, AFAIK.

jaehc · 2016-03-03T11:01:35Z

resolve merge conflicts and rebase it.

navis · 2016-03-24T01:27:13Z

@CHOIJAEHONG1 Cannot see newly added fields in computeCacheKey. Could you add test case for it?

jaehc · 2016-03-25T14:24:11Z

@navis AFAIC, there are not newly added fields in a query itself which computeCacheKey uses to make a byte array. I think it would be better adding a flag to switch on/off the newly added functionality that is listing dimensions and metrics in a query output. Correct me if I am wrong.

fjy · 2016-06-28T21:57:25Z

@jaehc what is the main problem being addressed by this PR? Is it to improve the potential performance of the select query later on or to reduce the size of the output?

fjy · 2016-07-29T22:22:23Z

ping @sirpkt

jaehc · 2016-08-01T07:57:31Z

@fjy Sorry for being late. This PR is intended to reduce the size of output of search queries by turning outputs into tabular format like csv, tsv, etc.

fjy · 2016-08-11T18:12:59Z

👍

nishantmonu51 · 2016-08-17T15:12:25Z

server/src/test/java/io/druid/client/CachingClusteredClientTest.java

      while (index < objects.length && !(objects[index] instanceof DateTime)) {
        values.add(new EventHolder(null, 0, (Map) objects[index++]));
      }

-      retVal.add(new Result<>(timestamp, new SelectResultValue(null, values)));
+      retVal.add(new Result<>(timestamp, new SelectResultValue(null, Sets.<String>newHashSet(), Sets.<String>newHashSet(), values)));


can you add a caching test that has some actual values in dimensions and metrics set ?

done. I added code to test cached values of dimensions and metrics set in a select query.

nishantmonu51 · 2016-08-17T15:25:45Z

Looks good to me, pending comments :- Cache key needs to be updated and need some more tests for caching

nishantmonu51 · 2016-08-17T15:36:25Z

Also create a github issue to track the future work needed to actually reduce the result size.

fjy · 2016-08-18T17:40:24Z

@jaehc any chance of finishing this up?

…' output

jaehc · 2016-08-23T08:15:43Z

@fjy I will address the comment.

fjy · 2016-08-24T23:45:47Z

@nishantmonu51 are you cool with the changes?

nishantmonu51 · 2016-08-25T08:23:13Z

processing/src/main/java/io/druid/query/select/SelectQueryQueryToolChest.java

@@ -66,7 +66,7 @@
 */
 public class SelectQueryQueryToolChest extends QueryToolChest<Result<SelectResultValue>, SelectQuery>
 {
-  private static final byte SELECT_QUERY = 0x13;
+  private static final byte SELECT_QUERY = 0x15;


This collides with the cache code in SearchQueryQueryToolchest

done. It's fixed to 0x16.

nishantmonu51 · 2016-08-25T08:24:42Z

test seems fine, one last thing new cache key for select query is same as the one for SearchQuery

jaehc · 2016-08-25T15:07:36Z

@nishantmonu51 Oops, my bad. I fixed it to 0x16.

nishantmonu51 · 2016-08-25T16:07:06Z

looks good to me, 👍

In druid 0.9.2 - list of dimensions and metrics was added to druid results Druid PR - apache/druid#2491 Since calcite uses its own parsing logic, this PR fixes the logic to ignore dimensions and metrics from resultset if present. Tested this with both Druid v0.9.1 and v0.9.2

jaehc changed the title ~~introducing lists of available columns in the fields of select queries' output~~ introducing lists of existing columns in the fields of select queries' output Feb 18, 2016

fjy added the Improvement label Feb 19, 2016

fjy added this to the 0.9.1 milestone Feb 19, 2016

fjy closed this Feb 19, 2016

fjy reopened this Feb 19, 2016

fjy reviewed Feb 19, 2016
View reviewed changes

fjy modified the milestones: 0.9.2, 0.9.1 Feb 19, 2016

nishantmonu51 reviewed Aug 17, 2016
View reviewed changes

introducing lists of existing columns in the fields of select queries…

abf36e9

…' output

jaehc added 2 commits August 24, 2016 10:00

rebase master

64d931b

address the comment. add test code for select query caching

1f19a1b

nishantmonu51 reviewed Aug 25, 2016
View reviewed changes

change the cache code in SelectQueryQueryToolChest to 0x16

3213471

nishantmonu51 merged commit 2e0f253 into apache:master Aug 25, 2016

nishantmonu51 mentioned this pull request Mar 31, 2017

[CALCITE-1734] Fix select query result parsing with druid 0.9.2 apache/calcite#416

Closed

introducing lists of existing columns in the fields of select queries' output #2491

introducing lists of existing columns in the fields of select queries' output #2491

Uh oh!

Conversation

jaehc commented Feb 18, 2016

Uh oh!

jaehc commented Feb 19, 2016

Uh oh!

fjy Feb 19, 2016

Choose a reason for hiding this comment

Uh oh!

jaehc Feb 20, 2016

Choose a reason for hiding this comment

Uh oh!

jaehc Feb 22, 2016

Choose a reason for hiding this comment

Uh oh!

jaehc commented Feb 20, 2016

Uh oh!

jaehc commented Mar 3, 2016

Uh oh!

navis commented Mar 24, 2016

Uh oh!

jaehc commented Mar 25, 2016

Uh oh!

fjy commented Jun 28, 2016

Uh oh!

fjy commented Jul 29, 2016

Uh oh!

jaehc commented Aug 1, 2016

Uh oh!

fjy commented Aug 11, 2016

Uh oh!

nishantmonu51 Aug 17, 2016

Choose a reason for hiding this comment

Uh oh!

jaehc Aug 24, 2016

Choose a reason for hiding this comment

Uh oh!

nishantmonu51 commented Aug 17, 2016

Uh oh!

nishantmonu51 commented Aug 17, 2016

Uh oh!

fjy commented Aug 18, 2016

Uh oh!

jaehc commented Aug 23, 2016

Uh oh!

fjy commented Aug 24, 2016

Uh oh!

nishantmonu51 Aug 25, 2016

Choose a reason for hiding this comment

Uh oh!

jaehc Aug 25, 2016

Choose a reason for hiding this comment

Uh oh!

nishantmonu51 commented Aug 25, 2016

Uh oh!

jaehc commented Aug 25, 2016

Uh oh!

nishantmonu51 commented Aug 25, 2016

Uh oh!

Uh oh!