Skip to content

[CALCITE-6283] Function ARRAY_APPEND with a NULL array argument crashes with NullPointerException #3706

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 11, 2024

Conversation

mihaibudiu
Copy link
Contributor

@mihaibudiu mihaibudiu commented Feb 27, 2024

No description provided.

f.checkNull("array_append(null, 2)");
f.checkNull("array_prepend(null, 2)");
f.checkNull("array_remove(null, 2)");
f.checkNull("array_contains(null, 2)");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have tests for appending null elements

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tanclary Yes, similar to:
f.checkNull("array_append(cast(null as integer array), 1)");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant a test where the element is null, not the array. But i found that we do so that's good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it in spark3.4. It seems that the first parameter of array_append is null. Why do we need to change it like this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand your question

Copy link
Member

@caicancai caicancai Feb 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is that spark3.4 version does not support array_append(null, 1), it will report an error. Currently, array_append and array_prepend are adapted for spark in calcite. If spark does not support array_append(null , 1), why do we need to return null? I am not sure whether higher versions of spark support it.
I don't know if my opinion is useful, if it is problematic, please ignore it. thank you.
捕获

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Yes I agree with you. What do you think @mihaibudiu ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the type checker to reject NULL literals for arrays.
However, it seems to be that Spark is inconsistent, since it allows NULL values for arrays at runtime, but not at compile time. This is probably because Spark uses a different type checking/inference algorithm.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the type checker to reject NULL literals for arrays. However, it seems to be that Spark is inconsistent, since it allows NULL values for arrays at runtime, but not at compile time. This is probably because Spark uses a different type checking/inference algorithm.

spark sql, after generating the Unresolved Logical Plan, runs a series of rules to turn it into the Logical Plan, which solves some type conversion issues there, and the type handling of array_append(null,2) seems to be in that part

@chucheng92
Copy link
Member

chucheng92 commented Feb 28, 2024

pls also see https://issues.apache.org/jira/browse/CALCITE-5893 to support array_except/array_intersect. they also should reject NULL literals.

calcite spark:

// return null
select array_contains(array[1, 2], null);
// return null
select array_except(array[1, 2, 3], null)
// return null
select array_intersect(array[1,2,3], null)

they should reject NULL.

@chucheng92
Copy link
Member

chucheng92 commented Feb 28, 2024

FYI: One background is that many spark functions related to calcite's array are contributed by one contributor. One basic strategy is to turn on NullPolicy, which means that NULL literals, including NULL and cast(null as xxx) are returned null directly. But in fact, spark will throw an exception in the former. We need to address the former case like this PR.

@mihaibudiu
Copy link
Contributor Author

The situation may be even more complicated if one considers cases where the compiler needs to insert implicit casts. I don't have a spark engine handy to test what Spark does, but if the array element types and the added element do not have the same type, Spark may choose to insert implicit casts in different ways than Calcite does.

Another fact I don't have a good handle on is the nullability of array elements: https://issues.apache.org/jira/browse/CALCITE-6275

@chucheng92
Copy link
Member

array_append needs to be capitalized in commit name/jira name/pr name.

@chucheng92
Copy link
Member

chucheng92 commented Feb 28, 2024

The situation may be even more complicated if one considers cases where the compiler needs to insert implicit casts. I don't have a spark engine handy to test what Spark does, but if the array element types and the added element do not have the same type, Spark may choose to insert implicit casts in different ways than Calcite does.

Another fact I don't have a good handle on is the nullability of array elements: https://issues.apache.org/jira/browse/CALCITE-6275

make sense. I just read it wrong. Your PR only handles the situation when array arg is NULL.
My point is that the element is also a NULL literal.

@mihaibudiu
Copy link
Contributor Author

make sense. I just read it wrong. Your PR only handles the situation when array arg is NULL. My point is that the element is also a NULL literal.

There are already tests for this case, e.g.: testArrayAppendFunc

@mihaibudiu
Copy link
Contributor Author

pls also see https://issues.apache.org/jira/browse/CALCITE-5893 to support array_except/array_intersect. they also should reject NULL literals.

calcite spark:

// return null select array_contains(array[1, 2], null); // return null select array_except(array[1, 2, 3], null) // return null select array_intersect(array[1,2,3], null)

they should reject NULL.

That's great, we should merge that PR if it solves the problem. I will try to review it.

@mihaibudiu
Copy link
Contributor Author

array_append needs to be capitalized in commit name/jira name/pr name.

I will update these when I squash the commits

@mihaibudiu mihaibudiu changed the title [CALCITE-6283] Function array_append with a NULL array argument crashes with NullPointerException [CALCITE-6283] Function ARRAY_APPEND with a NULL array argument crashes with NullPointerException Feb 28, 2024
@chucheng92
Copy link
Member

chucheng92 commented Feb 29, 2024

make sense. I just read it wrong. Your PR only handles the situation when array arg is NULL. My point is that the element is also a NULL literal.

There are already tests for this case, e.g.: testArrayAppendFunc

If the first array arg is NULL, then it is an exception for all these array functions. There is no doubt about this, but if the second arg is NULL, the semantics are different. array_contains/array_except/array_intersect is currently wrong. And you are right to mention array_append/array_prepend as it does support inserting null.

image

yes, I have fixed array_contains/array_except/array_intersect in that PR, and that PR will affect this PR.

@mihaibudiu
Copy link
Contributor Author

Let's merge [CALCITE-5893] first and I will rebase this one.

Comment on lines 6286 to 6290
f.checkFails("^array_append(null, 2)^", expected, false);
f.checkFails("^array_prepend(null, 2)^", expected, false);
f.checkFails("^array_remove(null, 2)^", expected, false);
f.checkFails("^array_contains(null, 2)^", expected, false);
f.checkFails("^array_position(null, 2)^", expected, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems there is a similar issue for array_min/array_max should we fix it within this issue or create a separate one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since I got started I will add these too

@mihaibudiu
Copy link
Contributor Author

In my last commit I have fixed array_min and array_max as well.
I have slightly modified the helper function introduced by @chucheng92 in #3355 and used it for typechecking more functions.
I have also corrected a test for array_contains, according to the Databricks manual, that test was incorrect.
https://docs.databricks.com/en/sql/language-manual/functions/array_contains.html

this.allowCast = allowCast;
public ArrayElementOperandTypeChecker(boolean arrayMayBeNull, boolean elementMayBeNull) {
this.arrayMayBeNull = arrayMayBeNull;
this.elementMayBeNull = elementMayBeNull;
}

//~ Methods ----------------------------------------------------------------

@Override public boolean checkOperandTypes(
SqlCallBinding callBinding,
boolean throwOnFailure) {
Copy link
Member

@chucheng92 chucheng92 Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If array is not first argument? such as array_xxx(1, array, 2). It seems that this typechecker only works in this situation.

But I think if this happens, we can add another typechecker or refactor, such as passing in index to indicate the index of the array parameter.

@chucheng92
Copy link
Member

overall looks good to me +1

…es with NullPointerException

Signed-off-by: Mihai Budiu <[email protected]>
Copy link

@mihaibudiu
Copy link
Contributor Author

Interpreting the +1 as a "go ahead and merge"

@mihaibudiu mihaibudiu merged commit 90beb01 into apache:main Mar 11, 2024
@mihaibudiu mihaibudiu deleted the issue6283 branch March 11, 2024 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants