The Wayback Machine - https://web.archive.org/web/20210914133236/https://github.com/opencv/opencv/pull/17675
[GSoC] Add digit and text recognition samples. #17675

Merged
alalek merged 1 commit into opencv:master from zihaomu:GSoC_digit_text_detect_and_recog on Aug 22, 2020

@zihaomu zihaomu commented Jun 27, 2020

Hi, this is my GSoC project to add digit and text recognition samples.

Status Update:

A detailed tutorial on using the OCR models and on training your own OCR model has been added at doc/tutorials/dnn/dnn_OCR/dnn_OCR.markdown.

1. digit recognition

The sample takes live images from the camera, uses connected component analysis to find candidate regions that each contain a digit, and classifies each region with LeNet.
On CPU only (i5-8300), it reaches 12 FPS.
demo1
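
To illustrate the detection step, here is a minimal sketch of connected component analysis on a binary mask. It is pure Python for readability and is not the code from the sample, which presumably relies on OpenCV's own routines. The function name `find_components`, the 8-connectivity choice, and the `min_area` filter are assumptions made for this illustration.

```python
from collections import deque


def find_components(mask, min_area=1):
    """Return bounding boxes (x, y, w, h) of 8-connected foreground regions.

    `mask` is a list of rows; non-zero entries are foreground pixels.
    Hypothetical helper for illustration only.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood-fill one component with BFS while tracking its extent.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, bottom, left, right, area = r, r, c, c, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Small regions are treated as noise (assumed filter).
            if area >= min_area:
                boxes.append((left, top, right - left + 1, bottom - top + 1))
    return boxes


mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
]
print(find_components(mask))
```

Each returned box would then be cropped, resized to the classifier's input size, and passed to the digit classifier.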

2. scene text recognition

My laptop environment: CPU i5-8300, GPU 1050, Ubuntu 18.
The sample takes live images from the camera and uses EAST as the text detector. Each bounding box returned by the detector is cropped and passed to a VGG-based text recognizer, and the recognized text is printed next to the box. With the GPU, it reaches around 9 FPS.
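
The recognition models listed below are CTC-based, so their per-time-step outputs have to be decoded into a string. A minimal sketch of greedy CTC decoding follows: pick the best class at each step, collapse repeats, and drop blanks. The blank index (0), the tiny alphabet, and the helper names are assumptions for this illustration and may differ from what the sample does.

```python
BLANK = 0  # assumed index of the CTC blank class


def ctc_greedy_decode(scores, alphabet, blank=BLANK):
    """Decode a sequence of per-time-step class scores into text.

    `scores` is a list of score lists, one per time step.
    Class i > 0 is assumed to map to alphabet[i - 1].
    """
    text = []
    prev = blank
    for step in scores:
        # Index of the highest-scoring class at this time step.
        best = max(range(len(step)), key=step.__getitem__)
        # Emit a character only when it is not blank and not a repeat.
        if best != blank and best != prev:
            text.append(alphabet[best - 1])
        prev = best
    return "".join(text)


def one_hot(index, size):
    """Hypothetical helper that builds a fake score vector for the demo."""
    return [1.0 if i == index else 0.0 for i in range(size)]


alphabet = "abc"  # toy alphabet, for the demo only
best_classes = [1, 1, 0, 1, 2, 2, 0, 3]
scores = [one_hot(k, len(alphabet) + 1) for k in best_classes]
print(ctc_greedy_decode(scores, alphabet))
```

The blank between the two `a` steps is what lets a doubled letter survive the repeat-collapsing rule.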

After loading each model into OpenCV, I tested the text recognition models on several datasets. The results are in the following table:

| Model name | IIIT5k (%) | SVT (%) | ICDAR03 (%) | ICDAR13 (%) | ICDAR15 (%) | SVTP (%) | CUTE80 (%) | Average acc. (%) | FPS | Parameters (×10^6) |
|---|---|---|---|---|---|---|---|---|---|---|
| DenseNet-CTC | 72.267 | 67.39 | 82.814 | 80 | 48.387 | 49.457 | 42.509 | 63.260571 | 134.63 | 0.239 |
| DenseNet-BiLSTM-CTC | 73.767 | 72.334 | 86.159 | 83.153 | 50.676 | 57.984 | 49.826 | 67.699857 | 27.59 | 3.636 |
| VGG-CTC | 75.967 | 75.425 | 85.928 | 83.547 | 54.891 | 57.519 | 50.174 | 69.064429 | 108.04 | 5.569 |
| CRNN_VGG-BiLSTM-CTC | 82.633 | 82.071 | 92.964 | 88.867 | 66.285 | 71.008 | 62.369 | 78.028143 | 31.94 | 8.452 |
| ResNet-CTC | 84 | 84.08 | 92.388 | 88.966 | 67.742 | 74.729 | 67.596 | 79.928714 | 15.87 | 44.283 |
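
The average accuracy column is consistent with an unweighted mean over the seven datasets. The short check below recomputes it from the per-dataset figures in the table; the dictionary layout and variable names are only for this illustration.

```python
# Per-dataset accuracies (%) copied from the table, in column order:
# IIIT5k, SVT, ICDAR03, ICDAR13, ICDAR15, SVTP, CUTE80
ACCURACY = {
    "DenseNet-CTC": [72.267, 67.39, 82.814, 80, 48.387, 49.457, 42.509],
    "DenseNet-BiLSTM-CTC": [73.767, 72.334, 86.159, 83.153, 50.676, 57.984, 49.826],
    "VGG-CTC": [75.967, 75.425, 85.928, 83.547, 54.891, 57.519, 50.174],
    "CRNN_VGG-BiLSTM-CTC": [82.633, 82.071, 92.964, 88.867, 66.285, 71.008, 62.369],
    "ResNet-CTC": [84, 84.08, 92.388, 88.966, 67.742, 74.729, 67.596],
}

# Unweighted mean: every dataset counts equally, regardless of its size.
averages = {name: sum(vals) / len(vals) for name, vals in ACCURACY.items()}

for name, avg in averages.items():
    print(f"{name}: {avg:.6f}")
```

Because the mean is unweighted, small datasets such as CUTE80 influence the average as much as larger ones such as IIIT5k.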

These pre-trained models can be found at https://drive.google.com/drive/folders/1cTbQ3nuZG-EKWak6emD_s8_hHXWz7lAr?usp=sharing (Google Drive). The FPS in the table is the speed of the text recognition model alone on my computer; it does not include the text detection model.
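
The PR does not describe how the FPS figures were measured. One common approach, sketched here under that caveat, is to time repeated forward passes of the recognizer alone after a few warm-up runs. `measure_fps` and `fake_recognizer` are hypothetical names, and the dummy workload stands in for a real model call.

```python
import time


def measure_fps(run_once, iterations=50, warmup=5):
    """Return calls per second of `run_once`, excluding warm-up calls."""
    for _ in range(warmup):
        run_once()  # warm-up runs are not timed
    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return iterations / elapsed if elapsed > 0 else float("inf")


def fake_recognizer():
    """Stand-in workload; a real test would run the recognition model here."""
    return sum(i * i for i in range(10_000))


print(f"{measure_fps(fake_recognizer):.2f} FPS")
```

Timing only the recognizer call, as here, matches the note above that detection time is excluded from the table.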

demo2
demo3

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under OpenCV (BSD) License.
  • To the best of my knowledge, the proposed patch is not based on code under the GPL or another license that is incompatible with OpenCV
  • The PR is proposed to proper branch
  • There is reference to original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
@alalek alalek added the GSoC label Jun 27, 2020
@zihaomu zihaomu mentioned this pull request Jun 27, 2020

@alalek alalek commented Jun 29, 2020

I have closed the previous PR (#17462) which contains LeNet_digit recognition only.

Users may want to compare different algorithms/approaches on their tasks.


@zihaomu zihaomu commented Jun 30, 2020

> I have closed the previous PR (#17462) which contains LeNet_digit recognition only.
>
> Users may want to compare different algorithms/approaches on their tasks.

Hi, I have kept the original digits.cpp as digits_SVM.cpp and added the proposed sample as digits_LeNet.cpp.

@zihaomu zihaomu requested a review from dkurt Jun 30, 2020

@zihaomu zihaomu left a comment

I changed the original font size; the previous one was too large and degraded the display.


@vpisarev vpisarev commented Jul 29, 2020

@zihaomu, thank you very much! I've tested your code and it works great!

Can you, please, also update text_detection.py to use the same model for detection and add OCR part?


@zihaomu zihaomu commented Jul 31, 2020

> @zihaomu, thank you very much! I've tested your code and it works great!
>
> Can you, please, also update text_detection.py to use the same model for detection and add OCR part?

@vpisarev Thank you for your reply.
Indeed, the implementation of text_detection.py does have errors. To avoid affecting this GSoC PR, I have created a separate PR, #17992.


@dkurt dkurt commented Aug 21, 2020

This looks ready to merge to me. May I ask you to squash all the commits into one?

@zihaomu zihaomu force-pushed the zihaomu:GSoC_digit_text_detect_and_recog branch from 5f7eef9 to 397ba2d Aug 21, 2020
@alalek alalek requested a review from dkurt Aug 21, 2020
@alalek alalek added this to the 4.5.0 milestone Aug 21, 2020
dkurt approved these changes Aug 21, 2020

@dkurt dkurt left a comment

👍 Thank you!

@dkurt dkurt self-assigned this Aug 21, 2020
@alalek alalek merged commit 3547ac4 into opencv:master Aug 22, 2020
1 check passed
@opencv-pushbot
default Required builds passed
@zihaomu zihaomu changed the title [GSoC] Add digit and text recongnition samples. [GSoC] Add digit and text recognition samples. Sep 8, 2020
4 participants