Skip to content

[CALCITE-5831] Add SOUNDEX function (enabled in Spark library) #3307

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 4, 2023

Conversation

chucheng92
Copy link
Member

@chucheng92 chucheng92 commented Jul 10, 2023

What is the purpose of the change

Returns Soundex code of the string.

SELECT soundex('Miller');
M460

details pls see: https://issues.apache.org/jira/browse/CALCITE-5831
note: spark soundex not throw exception but return original string if the input string can't be mapped.

Brief change log

Add soundex function (enabled in Spark Library)

Verifying this change

SqlOperatorTests#testSoundex2Func()

@chucheng92 chucheng92 force-pushed the CALCITE-5831 branch 2 times, most recently from 1c1aaa1 to ac8961a Compare July 10, 2023 12:23
Copy link
Contributor

@tanclary tanclary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good I think. Is the behavior for the other dialects correct? I would imagine so but on the off chance it's not, we could avoid adding a whole extra operator. Otherwise just left a couple small comments.

@@ -227,6 +227,7 @@
import static org.apache.calcite.sql.fun.SqlLibraryOperators.SINH;
import static org.apache.calcite.sql.fun.SqlLibraryOperators.SORT_ARRAY;
import static org.apache.calcite.sql.fun.SqlLibraryOperators.SOUNDEX;
import static org.apache.calcite.sql.fun.SqlLibraryOperators.SOUNDEX2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like SOUNDEX2 implies it takes 2 arguments, maybe SOUNDEX_SPARK would be a better name? Similar to the SUBSTR or FLOOR/CEIL functions.

@NobiGo
Copy link
Contributor

NobiGo commented Jul 11, 2023

@chucheng92 Please make the commit info stay the same as the Jira summary.

@chucheng92
Copy link
Member Author

@chucheng92 Please make the commit info stay the same as the Jira summary.

@NobiGo thanks. I added a commit for letting commiters to review. should i merge to one commit and force-push it?

@chucheng92
Copy link
Member Author

chucheng92 commented Jul 11, 2023

Looks pretty good I think. Is the behavior for the other dialects correct? I would imagine so but on the off chance it's not, we could avoid adding a whole extra operator. Otherwise just left a couple small comments.

Maybe the behavior is still a bit different, such as calcite mysql library(it limits 4 chars) now

mysql> SELECT SOUNDEX('Quadratically');
         -> 'Q363'

And mysql:

mysql> SELECT SOUNDEX('Quadratically');
         -> 'Q36324'

But this behavior is also different from spark, so we may need to deal with each dialect separately later. Given that these issues are orthogonal, I prepare to create another issue to address it. WDYT?

@chucheng92
Copy link
Member Author

chucheng92 commented Jul 11, 2023

@tanclary thanks for reviewing. I have updated this pr. please take a look.

Copy link
Contributor

@tanclary tanclary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is looking good, I just left one comment about the operator table entry. If you wouldn't mind changing the commit message from Soundex->SOUNDEX whenever you squash your commits that'd be great. Thank you!

@chucheng92 chucheng92 requested a review from tanclary July 12, 2023 02:28
@chucheng92 chucheng92 force-pushed the CALCITE-5831 branch 3 times, most recently from 6c98a97 to b680dc3 Compare July 12, 2023 06:32
@chucheng92
Copy link
Member Author

@tanclary @olivrlee hi, I have addressed the problems and rebased calcite-main, PTAL. thanks.

@chucheng92 chucheng92 changed the title [CALCITE-5831] Add Soundex function (enabled in Spark library) [CALCITE-5831] Add SOUNDEX function (enabled in Spark library) Jul 12, 2023
Copy link
Contributor

@tanclary tanclary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks great! Thanks for addressing all the comments. If no one has other suggestions I will merge later today.

@tanclary tanclary added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Jul 12, 2023
@chucheng92
Copy link
Member Author

@tanclary Hi, tanner. Main Code freeze has been closed. Can you help to merge this?

@tanclary
Copy link
Contributor

@tanclary Hi, tanner. Main Code freeze has been closed. Can you help to merge this?

I saw you opened an issue for making an addition to SqlBasicFunction and I believe you referenced this case. Should we resolve that so this case can use that logic?

@chucheng92
Copy link
Member Author

chucheng92 commented Jul 27, 2023

@tanclary Hi, tanner. Main Code freeze has been closed. Can you help to merge this?

I saw you opened an issue for making an addition to SqlBasicFunction and I believe you referenced this case. Should we resolve that so this case can use that logic?

ok, it makes sense, thanks.

@chucheng92
Copy link
Member Author

chucheng92 commented Jul 29, 2023

@tanclary hi, tanner. I have added a method of withReturnTypeInference and use it, just like Julian suggested in the issue to squash it in this PR instead of open a new PR. PTAL~ thanks a lot.

@chucheng92 chucheng92 requested a review from tanclary July 31, 2023 13:12
Copy link
Contributor

@tanclary tanclary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good and is just about ready. Would you mind writing a RelToSqlConverter test and ensure that it gets unparsed as SOUNDEX(..) and not SOUNDEX_SPARK(..)? Sorry for the delay in review I have been out the past few days.

@chucheng92
Copy link
Member Author

chucheng92 commented Aug 2, 2023

I think this looks good and is just about ready. Would you mind writing a RelToSqlConverter test and ensure that it gets unparsed as SOUNDEX(..) and not SOUNDEX_SPARK(..)? Sorry for the delay in review I have been out the past few days.

Hi, @tanclary I got your point. I have added it. PTAL.

@tanclary
Copy link
Contributor

tanclary commented Aug 2, 2023

I think this looks good and is just about ready. Would you mind writing a RelToSqlConverter test and ensure that it gets unparsed as SOUNDEX(..) and not SOUNDEX_SPARK(..)? Sorry for the delay in review I have been out the past few days.

Hi, @tanclary I got your point. I have added it. PTAL.

Looks good. Squash your commits please and I will merge soon.

@chucheng92
Copy link
Member Author

I think this looks good and is just about ready. Would you mind writing a RelToSqlConverter test and ensure that it gets unparsed as SOUNDEX(..) and not SOUNDEX_SPARK(..)? Sorry for the delay in review I have been out the past few days.

Hi, @tanclary I got your point. I have added it. PTAL.

Looks good. Squash your commits please and I will merge soon.

Done. thanks again.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Aug 3, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 2 Code Smells

100.0% 100.0% Coverage
0.0% 0.0% Duplication

@chucheng92 chucheng92 requested a review from tanclary August 3, 2023 14:28
@tanclary tanclary merged commit 76ba489 into apache:main Aug 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LGTM-will-merge-soon Overall PR looks OK. Only minor things left.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants