Abstract: Disclosed herein is a method and system for identifying the type of an input document in real-time. In an embodiment, visual features and keywords of the input document are compared with reference visual features and reference keywords extracted from a plurality of predetermined document types to compute a relative similarity score for the input document. Subsequently, one or more best-match document types are identified among the plurality of predetermined document types based on the relative similarity score of the input document. Thereafter, the visual features and keywords of the input document are compared with global and local characteristics of the best-match document types to identify the type of the input document. In an embodiment, the present disclosure helps in recognizing the type of a document prior to digitizing it, and thereby helps in storing the digitized documents in correct formats and appropriate storage directories. FIG. 1
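The first stage summarized above (relative similarity score and best-match selection, as recited in claims 4, 7 and 8) can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the disclosure: visual features are reduced to hashable (element, region) tokens, keywords to lowercase strings, Jaccard set overlap stands in for the pre-trained multi-class classifier of claims 5 and 15, and the visual and textual scores are aggregated by an equally weighted average. `DocumentType`, `relative_similarity`, `best_matches`, the example types and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentType:
    """A predetermined document type with its reference features (hypothetical model)."""
    name: str
    ref_visual: frozenset    # reference visual features as (element, region) tuples
    ref_keywords: frozenset  # reference keywords, lowercase


def jaccard(a: set, b: set) -> float:
    """Set overlap in [0, 1]; a stand-in for the learned similarity of claim 5."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relative_similarity(visual, keywords, types, w_visual=0.5):
    """Return {type name: aggregated score}, i.e. one score per predetermined type."""
    scores = {}
    for t in types:
        v = jaccard(set(visual), t.ref_visual)      # visual similarity score
        k = jaccard(set(keywords), t.ref_keywords)  # textual similarity score
        scores[t.name] = w_visual * v + (1 - w_visual) * k  # assumed weighted average
    return scores


def best_matches(scores, threshold=0.5):
    """Names of types whose score is higher than the threshold, best first."""
    hits = [name for name, s in scores.items() if s > threshold]
    return sorted(hits, key=scores.get, reverse=True)


# Usage with two made-up document types.
types = [
    DocumentType("invoice",
                 frozenset({("table", "middle"), ("logo", "top-left")}),
                 frozenset({"invoice", "total", "due"})),
    DocumentType("claim_form",
                 frozenset({("check_box", "left"), ("box_sequence", "top")}),
                 frozenset({"claim", "policy", "insured"})),
]
scores = relative_similarity({("table", "middle"), ("logo", "top-left")},
                             {"invoice", "total", "date"}, types)
print(scores)                # {'invoice': 0.75, 'claim_form': 0.0}
print(best_matches(scores))  # ['invoice']
```

Because each score is computed independently for every predetermined type, the returned dictionary corresponds to the per-type relative similarity of claim 7, and the thresholded list plays the role of the best-match document types handed to the second stage.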
Claims:
WE CLAIM:
1. A method for identifying the type of a document in real-time, the method comprising:
extracting, by a document identification system (105), one or more visual features (102A) and one or more keywords (103A) from an input document (101);
comparing, by the document identification system (105), each of the one or more visual features (102A) and each of the one or more keywords (103A) with one or more reference visual features (102B) and with one or more reference keywords (103B) associated with a plurality of predetermined document types (109);
computing, by the document identification system (105), a relative similarity score (211) for the input document (101) based on the comparison;
identifying, by the document identification system (105), one or more best-match document types (111), among the plurality of predetermined document types (109), for the input document (101) based on the relative similarity score (211) of the input document (101); and
identifying, by the document identification system (105), the type (117) of the input document (101) by comparing the one or more visual features (102A) and the one or more keywords (103A) extracted from the input document (101) with one or more global characteristics (113) and one or more local characteristics (115) associated with each of the one or more best-match document types (111).
2. The method as claimed in claim 1, wherein the one or more visual features (102A) and the one or more keywords (103A) are extracted from the input document (101) using a predetermined character recognition technique configured in the document identification system (105).
3. The method as claimed in claim 1, wherein the one or more visual features (102A) comprise location and pattern of each of lines, keywords, text boxes, check boxes, box sequences, tables, labels and logos in the input document (101).
4. The method as claimed in claim 1, wherein computing the relative similarity score (211) for the input document (101) comprises:
assigning a visual similarity score for the input document (101) based on comparison of each of the one or more visual features (102A) extracted from the input document (101) with one or more reference visual features (102B) of each of the plurality of predetermined document types (109);
assigning a textual similarity score for the input document (101) based on comparison of each of the one or more keywords (103A) extracted from the input document (101) with one or more reference keywords (103B) associated with each of the plurality of predetermined document types (109); and
aggregating the visual similarity score and the textual similarity score for obtaining the relative similarity score (211) of the input document (101).
5. The method as claimed in claim 4, wherein the visual similarity score and the textual similarity score for the input document (101) are assigned using a pre-trained multi-class classifier configured in the document identification system (105).
6. The method as claimed in claim 5, wherein the pre-trained multi-class classifier is trained using one or more visual features (102A) and one or more keywords (103A) extracted from one or more documents filled with contents and one or more non-filled documents of each of the plurality of predetermined document types (109).
7. The method as claimed in claim 1, wherein the relative similarity score (211) of the input document (101) indicates relative similarity of the input document (101) with each of the plurality of predetermined document types (109).
8. The method as claimed in claim 1, wherein one or more of the plurality of predetermined document types (109) are identified as the one or more best-match document types (111) when the relative similarity score (211) of the input document (101) is higher than a threshold similarity score.
9. The method as claimed in claim 1, wherein the one or more global characteristics (113) indicate presence and count of each of lines, keywords, text boxes, check boxes, box sequences, tables, labels and logos in the one or more best-match document types (111).
10. The method as claimed in claim 1, wherein the one or more local characteristics (115) indicate location and pattern of each of the one or more global characteristics (113) in the one or more best-match document types (111).
11. A document identification system (105) for identifying the type of a document in real-time, the document identification system (105) comprising:
a processor (203); and
a memory (205), communicatively coupled to the processor (203), wherein the memory (205) stores processor-executable instructions, which on execution cause the processor (203) to:
extract one or more visual features (102A) and one or more keywords (103A) from an input document (101);
compare each of the one or more visual features (102A) and each of the one or more keywords (103A) with one or more reference visual features (102B) and with one or more reference keywords (103B) associated with a plurality of predetermined document types (109);
compute a relative similarity score (211) for the input document (101) based on the comparison;
identify one or more best-match document types (111), among the plurality of predetermined document types (109), for the input document (101) based on the relative similarity score (211) of the input document (101); and
identify the type (117) of the input document (101) based on comparison of the one or more visual features (102A) and the one or more keywords (103A) extracted from the input document (101) with one or more global characteristics (113) and one or more local characteristics (115) associated with each of the one or more best-match document types (111).
12. The document identification system (105) as claimed in claim 11, wherein the processor (203) extracts the one or more visual features (102A) and the one or more keywords (103A) from the input document (101) using a predetermined character recognition technique configured in the document identification system (105).
13. The document identification system (105) as claimed in claim 11, wherein the one or more visual features (102A) comprise location and pattern of each of lines, keywords, text boxes, check boxes, box sequences, tables, labels and logos in the input document (101).
14. The document identification system (105) as claimed in claim 11, wherein to compute the relative similarity score (211) for the input document (101), the processor (203) is configured to:
assign a visual similarity score for the input document (101) based on comparison of each of the one or more visual features (102A) extracted from the input document (101) with one or more reference visual features (102B) of each of the plurality of predetermined document types (109);
assign a textual similarity score for the input document (101) based on comparison of each of the one or more keywords (103A) extracted from the input document (101) with one or more reference keywords (103B) associated with each of the plurality of predetermined document types (109); and
aggregate the visual similarity score and the textual similarity score to obtain the relative similarity score (211) of the input document (101).
15. The document identification system (105) as claimed in claim 14, wherein the processor (203) assigns the visual similarity score and the textual similarity score for the input document (101) using a pre-trained multi-class classifier configured in the document identification system (105).
16. The document identification system (105) as claimed in claim 15, wherein the processor (203) trains the pre-trained multi-class classifier using one or more visual features (102A) and one or more keywords (103A) extracted from one or more documents filled with contents, and one or more non-filled documents of each of the plurality of predetermined document types (109).
17. The document identification system (105) as claimed in claim 11, wherein the relative similarity score (211) of the input document (101) indicates relative similarity of the input document (101) with each of the plurality of predetermined document types (109).
18. The document identification system (105) as claimed in claim 11, wherein the processor (203) identifies one or more of the plurality of predetermined document types (109) as the one or more best-match document types (111), when the relative similarity score (211) of the input document (101) is higher than a threshold similarity score.
19. The document identification system (105) as claimed in claim 11, wherein the one or more global characteristics (113) indicate presence and count of each of lines, keywords, text boxes, check boxes, box sequences, tables, labels and logos in the one or more best-match document types (111).
20. The document identification system (105) as claimed in claim 11, wherein the one or more local characteristics (115) indicate location and pattern of each of the one or more global characteristics (113) in the one or more best-match document types (111).
Dated this 28th day of March 2018
SWETHA S. N
OF K&S PARTNERS
ATTORNEY FOR THE APPLICANT
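The second stage of claims 1, 9 and 10 (comparing the input document with the global and local characteristics of each best-match type) can be sketched in the same hedged way. Here a page is assumed to be a list of layout elements with normalised coordinates; global characteristics are modelled as per-kind presence and counts (keywords can be counted as a `"keyword"` kind), local characteristics only as element locations matched within a distance tolerance, and the visual "pattern" of an element is not modelled at all. `Element`, `global_score`, `local_score`, `identify_type`, the 0.05 tolerance and the equal weighting are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Element:
    """A layout element; x, y are normalised page coordinates in [0, 1] (hypothetical)."""
    kind: str  # e.g. "line", "keyword", "text_box", "check_box", "table", "logo"
    x: float
    y: float


def global_score(doc, ref):
    """Compare presence and count of each element kind (global characteristics)."""
    c_doc = Counter(e.kind for e in doc)
    c_ref = Counter(e.kind for e in ref)
    kinds = set(c_doc) | set(c_ref)
    if not kinds:
        return 0.0
    # Per kind: min/max of the two counts, so 1.0 for equal counts and 0.0 if absent on one side.
    return sum(min(c_doc[k], c_ref[k]) / max(c_doc[k], c_ref[k]) for k in kinds) / len(kinds)


def local_score(doc, ref, tol=0.05):
    """Fraction of reference elements with a same-kind input element within `tol`.

    Simplification: one input element may satisfy several reference elements.
    """
    if not ref:
        return 0.0
    matched = sum(
        any(e.kind == r.kind and dist((e.x, e.y), (r.x, r.y)) <= tol for e in doc)
        for r in ref
    )
    return matched / len(ref)


def identify_type(doc, candidates, w_global=0.5):
    """Return the best-match type whose global and local characteristics fit best."""
    def combined(name):
        ref = candidates[name]
        return w_global * global_score(doc, ref) + (1 - w_global) * local_score(doc, ref)
    return max(candidates, key=combined)


# Usage: two made-up best-match types from the first stage.
doc = [Element("logo", 0.1, 0.05), Element("table", 0.5, 0.5), Element("line", 0.5, 0.9)]
candidates = {
    "invoice": [Element("logo", 0.1, 0.06), Element("table", 0.5, 0.52),
                Element("line", 0.5, 0.9)],
    "purchase_order": [Element("logo", 0.9, 0.05), Element("table", 0.5, 0.3),
                       Element("table", 0.5, 0.7)],
}
print(identify_type(doc, candidates))  # invoice
```

In this sketch the element-by-element comparison runs only on the short list produced by the first stage rather than on every predetermined document type.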
Description:
TECHNICAL FIELD
The present subject matter is, in general, related to feature extraction and, more particularly but not exclusively, to a method and system for identifying the type of a document in real-time.
| # | Name | Date |
|---|---|---|
| 1 | 201841011593-STATEMENT OF UNDERTAKING (FORM 3) [28-03-2018(online)].pdf | 2018-03-28 |
| 2 | 201841011593-REQUEST FOR EXAMINATION (FORM-18) [28-03-2018(online)].pdf | 2018-03-28 |
| 3 | 201841011593-POWER OF AUTHORITY [28-03-2018(online)].pdf | 2018-03-28 |
| 4 | 201841011593-FORM 18 [28-03-2018(online)].pdf | 2018-03-28 |
| 5 | 201841011593-FORM 1 [28-03-2018(online)].pdf | 2018-03-28 |
| 6 | 201841011593-DRAWINGS [28-03-2018(online)].pdf | 2018-03-28 |
| 7 | 201841011593-DECLARATION OF INVENTORSHIP (FORM 5) [28-03-2018(online)].pdf | 2018-03-28 |
| 8 | 201841011593-COMPLETE SPECIFICATION [28-03-2018(online)].pdf | 2018-03-28 |
| 9 | abstract 201841011593.jpg | 2018-04-02 |
| 10 | 201841011593-REQUEST FOR CERTIFIED COPY [04-05-2018(online)].pdf | 2018-05-04 |
| 11 | 201841011593-Proof of Right (MANDATORY) [30-07-2018(online)].pdf | 2018-07-30 |
| 12 | Correspondence by Agent_Form 30_0108-2018.pdf | 2018-08-29 |
| 13 | 201841011593-PETITION UNDER RULE 137 [01-03-2021(online)].pdf | 2021-03-01 |
| 14 | 201841011593-FORM 3 [01-03-2021(online)].pdf | 2021-03-01 |
| 15 | 201841011593-ABSTRACT [02-03-2021(online)].pdf | 2021-03-02 |
| 16 | 201841011593-CLAIMS [02-03-2021(online)].pdf | 2021-03-02 |
| 17 | 201841011593-COMPLETE SPECIFICATION [02-03-2021(online)].pdf | 2021-03-02 |
| 18 | 201841011593-CORRESPONDENCE [02-03-2021(online)].pdf | 2021-03-02 |
| 19 | 201841011593-OTHERS [02-03-2021(online)].pdf | 2021-03-02 |
| 20 | 201841011593-DRAWING [02-03-2021(online)].pdf | 2021-03-02 |
| 21 | 201841011593-FER_SER_REPLY [02-03-2021(online)].pdf | 2021-03-02 |
| 22 | 201841011593-FER.pdf | 2021-10-17 |
| 23 | 201841011593-US(14)-HearingNotice-(HearingDate-22-05-2023).pdf | 2023-04-21 |
| 24 | 201841011593-AMENDED DOCUMENTS [28-04-2023(online)].pdf | 2023-04-28 |
| 25 | 201841011593-Correspondence to notify the Controller [28-04-2023(online)].pdf | 2023-04-28 |
| 26 | 201841011593-FORM 13 [28-04-2023(online)].pdf | 2023-04-28 |
| 27 | 201841011593-POA [28-04-2023(online)].pdf | 2023-04-28 |
| 28 | 201841011593-FORM 3 [06-06-2023(online)].pdf | 2023-06-06 |
| 29 | 201841011593-FORM-26 [06-06-2023(online)].pdf | 2023-06-06 |
| 30 | 201841011593-Written submissions and relevant documents [06-06-2023(online)].pdf | 2023-06-06 |
| 31 | 201841011593-PatentCertificate16-09-2023.pdf | 2023-09-16 |
| 32 | 201841011593-IntimationOfGrant16-09-2023.pdf | 2023-09-16 |

| # | Name |
|---|---|
| 1 | search201841011593E_07-10-2020.pdf |