Skip to content

[CALCITE-5695] Add MAP_KEYS, MAP_VALUES function (enabled in Spark li… #3194

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 21, 2023

Conversation

liuyongvs
Copy link
Contributor

@liuyongvs liuyongvs commented May 10, 2023

MAP_KEYS - Returns an unordered array containing the keys of the map.
MAP_VALUES - Returns an unordered array containing the values of the map.
For more details
https://spark.apache.org/docs/latest/sql-ref-functions-builtin.html

@liuyongvs
Copy link
Contributor Author

hi @JiajunBernoulli could you also help review it?

@@ -3797,6 +3798,16 @@ public static List distinct(List list) {
return new ArrayList<>(result);
}

/** Support the MAP_KEYS function. */
public static List mapKeys(Map map) {
return (List) map.keySet().stream().collect(Collectors.toList());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the expected result if the input map is null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return null @zoudan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we should handle null here? It seems a NPE Exception will be thrown?

Copy link
Contributor Author

@liuyongvs liuyongvs May 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zoudan not needs. the runtime will does it like other engine, the NullPolicy does that

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and why i doesn't add unit test map_keys(cast(null as map[keyType, valueType])) is because the calcite doesn't supported, which you can see here #3189
java.lang.RuntimeException: Error while parsing query: select cast(map['washington', 1, null, 44] as map[CHAR(10), INTEGER NOT NULL>] from (values (1))
at org.apache.calcite.sql.test.AbstractSqlTester.parseAndValidate(AbstractSqlTester.java:157)
at org.apache.calcite.sql.test.AbstractSqlTester.validateAndThen(AbstractSqlTester.java:248)
at org.apache.calcite.sql.test.AbstractSqlTester.checkColumnType(AbstractSqlTester.java:177)
at org.apache.calcite.test.SqlValidatorFixture.columnType(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use new ArrayList<>(map.keySets())?

@liuyongvs liuyongvs force-pushed the map_keys_map_values branch 2 times, most recently from 79ab686 to f1fad81 Compare May 10, 2023 10:10
@JiajunBernoulli JiajunBernoulli self-assigned this May 10, 2023
/** Tests {@code MAP_KEYS} function from Spark. */
@Test void testMapKeysFunc() {
final SqlOperatorFixture f = fixture()
.setFor(SqlLibraryOperators.MAP_KEYS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be good to write a test that fails if library is not set to Spark (you only want it to work for Spark).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added @tanclary

@@ -2723,6 +2723,8 @@ BigQuery's type system uses confusingly different names for types and functions:
| m | TO_BASE64(string) | Converts the *string* to base-64 encoded form and returns a encoded string
| b m | FROM_BASE64(string) | Returns the decoded result of a base-64 *string* as a string
| b o | LTRIM(string) | Returns *string* with all blanks removed from the start
| s | MAP_KEYS(map) | Returns an unordered array containing the keys of the map.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove '.'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed @julianhyde

@liuyongvs
Copy link
Contributor Author

hi @julianhyde @tanclary @zoudan fix all your reviews, please have a look again,thanks. and i will supports all the spark collection functions latter.

@liuyongvs liuyongvs force-pushed the map_keys_map_values branch from 1110d8d to 48def8d Compare May 12, 2023 01:42
@liuyongvs
Copy link
Contributor Author

hi @JiajunBernoulli do you have time to review it ?

@JiajunBernoulli
Copy link
Contributor

hi @JiajunBernoulli do you have time to review it ?

I usually only have time on weekends.

So I will review it tomorrow.

@julianhyde
Copy link
Contributor

@liuyongvs Can you respond to my comment in Jira? What is an 'unordered array'?

@@ -2723,6 +2723,8 @@ BigQuery's type system uses confusingly different names for types and functions:
| m | TO_BASE64(string) | Converts the *string* to base-64 encoded form and returns a encoded string
| b m | FROM_BASE64(string) | Returns the decoded result of a base-64 *string* as a string
| b o | LTRIM(string) | Returns *string* with all blanks removed from the start
| s | MAP_KEYS(map) | Returns an unordered array containing the keys of the map
| s | MAP_VALUES(map) | Returns an unordered array containing the values of the map
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

map should be *map*.

* <p>For example, given {@code (INTEGER, DATE) MAP}, returns
* {@code DATE ARRAY}.
*/
public static final SqlReturnTypeInference TO_MAP_VALUES_NULLABLE =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have two types: not null and nullable.

  public static final SqlReturnTypeInference TO_MAP_KEYS =
      ARG0.andThen(SqlTypeTransforms.TO_MAP_KEYS);

  public static final SqlReturnTypeInference TO_MAP_KEYS_NULLABLE =
      TO_MAP_KEYS.andThen(SqlTypeTransforms.TO_NULLABLE);


/** Support the MAP_VALUES function. */
public static List mapValues(Map map) {
return (List) map.values().stream().collect(Collectors.toList());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return new ArrayList<>(map.values());

I think it's clearer and there won't be NPE.

@JiajunBernoulli JiajunBernoulli added the discussion-in-jira There's open discussion in JIRA to be resolved before proceeding with the PR label May 13, 2023
@JiajunBernoulli
Copy link
Contributor

I left some comments for the PR.

And we should discuss 'unordered array' in JIRA.

@liuyongvs
Copy link
Contributor Author

liuyongvs commented May 14, 2023

fix all your review @JiajunBernoulli
and the spark doesn't support MULTISET type. an unordered array means the element doesn't sort.
and i response in the issue, sorry for late response @julianhyde

@liuyongvs liuyongvs requested a review from JiajunBernoulli May 14, 2023 12:26
@@ -3797,6 +3798,16 @@ public static List distinct(List list) {
return new ArrayList<>(result);
}

/** Support the MAP_KEYS function. */
public static List mapKeys(Map map) {
return (List) map.keySet().stream().collect(Collectors.toList());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use new ArrayList<>(map.keySet())?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liuyongvs liuyongvs requested a review from JiajunBernoulli May 15, 2023 03:51
@liuyongvs
Copy link
Contributor Author

fixed , thanks for your details review @JiajunBernoulli. do you also have any comments.
If not, i will squash the commits.

@liuyongvs liuyongvs force-pushed the map_keys_map_values branch 2 times, most recently from ecd2ba1 to 822509a Compare May 16, 2023 11:18
@liuyongvs
Copy link
Contributor Author

have squashed the commits, could anyone merge it thanks @JiajunBernoulli @julianhyde @zoudan

@liuyongvs liuyongvs force-pushed the map_keys_map_values branch 2 times, most recently from 2d5c4b0 to 42c02f2 Compare May 17, 2023 00:01
@JiajunBernoulli JiajunBernoulli removed the discussion-in-jira There's open discussion in JIRA to be resolved before proceeding with the PR label May 17, 2023
@JiajunBernoulli
Copy link
Contributor

Conflicting files because #3198

Please resolve them.

@liuyongvs
Copy link
Contributor Author

hi @JiajunBernoulli fixed the conflict and ci passed

@JiajunBernoulli
Copy link
Contributor

Please delete '.' in commit message.

@liuyongvs liuyongvs force-pushed the map_keys_map_values branch from a212a06 to 281d094 Compare May 18, 2023 02:01
@liuyongvs
Copy link
Contributor Author

@JiajunBernoulli fixed all the current prs message ending with '.'

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 10 Code Smells

95.7% 95.7% Coverage
0.0% 0.0% Duplication

@JiajunBernoulli JiajunBernoulli added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label May 19, 2023
@JiajunBernoulli JiajunBernoulli merged commit 9c33d7a into apache:main May 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LGTM-will-merge-soon Overall PR looks OK. Only minor things left.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants