A Method And System For Identifying Type Of A Document

Abstract: Disclosed herein is a method and system for identifying the type of an input document in real-time. In an embodiment, visual features and keywords of the input document are compared with reference visual features and reference keywords extracted from a plurality of predetermined document types to compute a relative similarity score for the input document. Subsequently, one or more best-match document types are identified among the plurality of predetermined document types based on the relative similarity score of the input document. Thereafter, the visual features and keywords of the input document are compared with global and local characteristics of the best-match document types to identify the type of the input document. In an embodiment, the present disclosure helps in recognizing the type of a document prior to digitizing the document, and thereby helps in storing the digitized documents in correct formats and appropriate storage directories. FIG. 1
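
The first stage described in the abstract (scoring the input document against every predetermined document type and keeping the best matches) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `DocumentType` layout, the Jaccard set overlap, the equal weighting and the 0.5 threshold are not taken from the specification, which instead assigns the visual and textual scores with a pre-trained multi-class classifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentType:
    """Reference data for one predetermined document type (hypothetical layout)."""
    name: str
    visual_features: frozenset  # e.g. {"table", "logo", "check_box"}
    keywords: frozenset         # e.g. {"invoice", "total"}


def jaccard(a, b):
    """Set overlap in [0, 1]; a stand-in for the classifier-assigned score."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def relative_similarity(doc_visual, doc_keywords, ref, visual_weight=0.5):
    """Aggregate a visual and a textual similarity score into one value."""
    visual = jaccard(doc_visual, ref.visual_features)
    textual = jaccard(doc_keywords, ref.keywords)
    return visual_weight * visual + (1.0 - visual_weight) * textual


def best_matches(doc_visual, doc_keywords, references, threshold=0.5):
    """Return (name, score) pairs scoring above the threshold, best first."""
    scored = [(ref.name, relative_similarity(doc_visual, doc_keywords, ref))
              for ref in references]
    shortlisted = [(name, score) for name, score in scored if score > threshold]
    return sorted(shortlisted, key=lambda pair: pair[1], reverse=True)
```

Calling `best_matches` with the features and keywords extracted from an input document returns the shortlist that the second stage would then examine in detail; an empty list means no predetermined type cleared the threshold.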

Patent Information

Application #:
Filing Date: 28 March 2018
Publication Number: 40/2019
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Email: bangalore@knspartners.com
Parent Application:
Patent Number:
Legal Status:
Grant Date: 2023-09-16
Renewal Date:

Applicants

WIPRO LIMITED
Doddakannelli, Sarjapur Road, Bangalore 560035, Karnataka, India.

Inventors

1. GHULAM MOHIUDDIN KHAN
F-901, Concorde Manhattans, Electronic City Phase 1, Bangalore - 560100, Karnataka, India
2. DR. GOPICHAND AGNIHOTRAM
A-207, S.K. ASTER, Doddathogur Village, Electronics City, (Near) Narashimha Swami Temple, Bangalore - 560100, Karnataka, India

Specification

Claims: WE CLAIM:

1. A method for identifying type of a document in real-time, the method comprising:
extracting, by a document identification system (105), one or more visual features (102A) and one or more keywords (103A) from an input document (101);
comparing, by the document identification system (105), each of the one or more visual features (102A) and each of the one or more keywords (103A) with one or more reference visual features (102B) and with one or more reference keywords (103B) associated with a plurality of predetermined document types (109);
computing, by the document identification system (105), a relative similarity score (211) for the input document (101) based on the comparison;
identifying, by the document identification system (105), one or more best-match document types (111), among the plurality of predetermined document types (109), for the input document (101) based on the relative similarity score (211) of the input document (101); and
identifying, by the document identification system (105), the type (117) of the input document (101) by comparing the one or more visual features (102A) and the one or more keywords (103A) extracted from the input document (101) with one or more global characteristics (113) and one or more local characteristics (115) associated with each of the one or more best-match document types (111).

2. The method as claimed in claim 1, wherein the one or more visual features (102A) and the one or more keywords (103A) are extracted from the input document (101) using a predetermined character recognition technique configured in the document identification system (105).

3. The method as claimed in claim 1, wherein the one or more visual features (102A) comprises location and pattern of each of lines, keywords, text boxes, check boxes, box sequences, tables, labels and logos in the input document (101).

4. The method as claimed in claim 1, wherein computing the relative similarity score (211) for the input document (101) comprises:
assigning a visual similarity score for the input document (101) based on comparison of each of the one or more visual features (102A) extracted from the input document (101) with one or more reference visual features (102B) of each of the plurality of predetermined document types (109);
assigning a textual similarity score for the input document (101) based on comparison of each of the one or more keywords (103A) extracted from the input document (101) with one or more reference keywords (103B) associated with each of the plurality of predetermined document types (109); and
aggregating the visual similarity score and the textual similarity score for obtaining the relative similarity score (211) of the input document (101).

5. The method as claimed in claim 4, wherein the visual similarity score and the textual similarity score for the input document (101) are assigned using a pre-trained multi-class classifier configured in the document identification system (105).

6. The method as claimed in claim 5, wherein the pre-trained multi-class classifier is trained using one or more visual features (102A) and one or more keywords (103A) extracted from one or more documents filled with contents and one or more non-filled documents of each of the plurality of predetermined document types (109).

7. The method as claimed in claim 1, wherein the relative similarity score (211) of the input document (101) indicates relative similarity of the input document (101) with each of the plurality of predetermined document types (109).

8. The method as claimed in claim 1, wherein one or more of the plurality of predetermined document types (109) are identified as the one or more best-match document types (111) when the relative similarity score (211) of the input document (101) is higher than a threshold similarity score.

9. The method as claimed in claim 1, wherein the one or more global characteristics (113) indicate presence and count of each of lines, keywords, text boxes, check boxes, box sequences, tables, labels and logos in the one or more best-match document types (111).

10. The method as claimed in claim 1, wherein the one or more local characteristics (115) indicate location and pattern of each of one or more global characteristics (113) in the one or more best-match document types (111).

11. A document identification system (105) for identifying type of a document in real-time, the document identification system (105) comprising:
a processor (203); and
a memory (205), communicatively coupled to the processor (203), wherein the memory (205) stores processor-executable instructions, which on execution cause the processor (203) to:
extract one or more visual features (102A) and one or more keywords (103A) from an input document (101);
compare each of the one or more visual features (102A) and each of the one or more keywords (103A) with one or more reference visual features (102B) and with one or more reference keywords (103B) associated with a plurality of predetermined document types (109);
compute a relative similarity score (211) for the input document (101) based on the comparison;
identify one or more best-match document types (111), among the plurality of predetermined document types (109), for the input document (101) based on the relative similarity score (211) of the input document (101); and
identify the type (117) of the input document (101) based on comparison of the one or more visual features (102A) and the one or more keywords (103A) extracted from the input document (101) with one or more global characteristics (113) and one or more local characteristics (115) associated with each of the one or more best-match document types (111).

12. The document identification system (105) as claimed in claim 11, wherein the processor (203) extracts the one or more visual features (102A) and the one or more keywords (103A) from the input document (101) using a predetermined character recognition technique configured in the document identification system (105).

13. The document identification system (105) as claimed in claim 11, wherein the one or more visual features (102A) comprises location and pattern of each of lines, keywords, text boxes, check boxes, box sequences, tables, labels and logos in the input document (101).

14. The document identification system (105) as claimed in claim 11, wherein to compute the relative similarity score (211) for the input document (101), the processor (203) is configured to:
assign a visual similarity score for the input document (101) based on comparison of each of the one or more visual features (102A) extracted from the input document (101) with one or more reference visual features (102B) of each of the plurality of predetermined document types (109);
assign a textual similarity score for the input document (101) based on comparison of each of the one or more keywords (103A) extracted from the input document (101) with one or more reference keywords (103B) associated with each of the plurality of predetermined document types (109); and
aggregate the visual similarity score and the textual similarity score to obtain the relative similarity score (211) of the input document (101).

15. The document identification system (105) as claimed in claim 14, wherein the processor (203) assigns the visual similarity score and the textual similarity score for the input document (101) using a pre-trained multi-class classifier configured in the document identification system (105).

16. The document identification system (105) as claimed in claim 15, wherein the processor (203) trains the pre-trained multi-class classifier using one or more visual features (102A) and one or more keywords (103A) extracted from one or more documents filled with contents, and one or more non-filled documents of each of the plurality of predetermined document types (109).

17. The document identification system (105) as claimed in claim 11, wherein the relative similarity score (211) of the input document (101) indicates relative similarity of the input document (101) with each of the plurality of predetermined document types (109).

18. The document identification system (105) as claimed in claim 11, wherein the processor (203) identifies one or more of the plurality of predetermined document types (109) as the one or more best-match document types (111), when the relative similarity score (211) of the input document (101) is higher than a threshold similarity score.

19. The document identification system (105) as claimed in claim 11, wherein the one or more global characteristics (113) indicate presence and count of each of lines, keywords, text boxes, check boxes, box sequences, tables, labels and logos in the one or more best-match document types (111).

20. The document identification system (105) as claimed in claim 11, wherein the one or more local characteristics (115) indicate location and pattern of each of one or more global characteristics (113) in the one or more best-match document types (111).

Dated this 28th day of March 2018

SWETHA S. N
OF K&S PARTNERS
ATTORNEY FOR THE APPLICANT
Description: TECHNICAL FIELD
The present subject matter is, in general, related to feature extraction and more particularly, but not exclusively, to a method and system for identifying type of a document in real-time.
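
Claims 9 and 10 distinguish global characteristics (presence and count of elements such as lines, tables and logos) from local characteristics (location and pattern of those elements). A minimal Python sketch of how the second-stage comparison against the best-match document types might use both is given below. The representation of an element as a `(kind, (x, y))` pair with page-normalised coordinates, the position tolerance and the unweighted sum of the two scores are all assumptions made for illustration and do not come from the specification.

```python
from collections import Counter


def global_match(doc_elements, ref_elements):
    """Fraction of reference element kinds whose count matches the input."""
    doc_counts = Counter(kind for kind, _ in doc_elements)
    ref_counts = Counter(kind for kind, _ in ref_elements)
    if not ref_counts:
        return 0.0
    hits = sum(1 for kind, n in ref_counts.items() if doc_counts[kind] == n)
    return hits / len(ref_counts)


def local_match(doc_elements, ref_elements, tolerance=0.05):
    """Fraction of reference elements with a same-kind input element nearby."""
    if not ref_elements:
        return 0.0
    hits = 0
    for kind, (rx, ry) in ref_elements:
        if any(k == kind and abs(x - rx) <= tolerance and abs(y - ry) <= tolerance
               for k, (x, y) in doc_elements):
            hits += 1
    return hits / len(ref_elements)


def identify_type(doc_elements, candidates):
    """Pick the best-match candidate with the highest combined score."""
    def combined(name):
        ref = candidates[name]
        return global_match(doc_elements, ref) + local_match(doc_elements, ref)
    return max(candidates, key=combined) if candidates else None
```

In this sketch two candidate types that contain the same kinds and counts of elements obtain the same global score, so the local comparison of element positions is what separates them.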

Documents

Application Documents

# Name Date
1 201841011593-STATEMENT OF UNDERTAKING (FORM 3) [28-03-2018(online)].pdf 2018-03-28
2 201841011593-REQUEST FOR EXAMINATION (FORM-18) [28-03-2018(online)].pdf 2018-03-28
3 201841011593-POWER OF AUTHORITY [28-03-2018(online)].pdf 2018-03-28
4 201841011593-FORM 18 [28-03-2018(online)].pdf 2018-03-28
5 201841011593-FORM 1 [28-03-2018(online)].pdf 2018-03-28
6 201841011593-DRAWINGS [28-03-2018(online)].pdf 2018-03-28
7 201841011593-DECLARATION OF INVENTORSHIP (FORM 5) [28-03-2018(online)].pdf 2018-03-28
8 201841011593-COMPLETE SPECIFICATION [28-03-2018(online)].pdf 2018-03-28
9 abstract 201841011593.jpg 2018-04-02
10 201841011593-REQUEST FOR CERTIFIED COPY [04-05-2018(online)].pdf 2018-05-04
11 201841011593-Proof of Right (MANDATORY) [30-07-2018(online)].pdf 2018-07-30
12 Correspondence by Agent_Form 30_0108-2018.pdf 2018-08-29
13 201841011593-PETITION UNDER RULE 137 [01-03-2021(online)].pdf 2021-03-01
14 201841011593-FORM 3 [01-03-2021(online)].pdf 2021-03-01
15 201841011593-OTHERS [02-03-2021(online)].pdf 2021-03-02
16 201841011593-FER_SER_REPLY [02-03-2021(online)].pdf 2021-03-02
17 201841011593-DRAWING [02-03-2021(online)].pdf 2021-03-02
18 201841011593-CORRESPONDENCE [02-03-2021(online)].pdf 2021-03-02
19 201841011593-COMPLETE SPECIFICATION [02-03-2021(online)].pdf 2021-03-02
20 201841011593-CLAIMS [02-03-2021(online)].pdf 2021-03-02
21 201841011593-ABSTRACT [02-03-2021(online)].pdf 2021-03-02
22 201841011593-FER.pdf 2021-10-17
23 201841011593-US(14)-HearingNotice-(HearingDate-22-05-2023).pdf 2023-04-21
24 201841011593-POA [28-04-2023(online)].pdf 2023-04-28
25 201841011593-FORM 13 [28-04-2023(online)].pdf 2023-04-28
26 201841011593-Correspondence to notify the Controller [28-04-2023(online)].pdf 2023-04-28
27 201841011593-AMENDED DOCUMENTS [28-04-2023(online)].pdf 2023-04-28
28 201841011593-Written submissions and relevant documents [06-06-2023(online)].pdf 2023-06-06
29 201841011593-FORM-26 [06-06-2023(online)].pdf 2023-06-06
30 201841011593-FORM 3 [06-06-2023(online)].pdf 2023-06-06
31 201841011593-PatentCertificate16-09-2023.pdf 2023-09-16
32 201841011593-IntimationOfGrant16-09-2023.pdf 2023-09-16

Search Strategy

1 search201841011593E_07-10-2020.pdf

ERegister / Renewals

3rd: 14 Dec 2023, From 28/03/2020 - To 28/03/2021
4th: 14 Dec 2023, From 28/03/2021 - To 28/03/2022
5th: 14 Dec 2023, From 28/03/2022 - To 28/03/2023
6th: 14 Dec 2023, From 28/03/2023 - To 28/03/2024
7th: 19 Mar 2024, From 28/03/2024 - To 28/03/2025
8th: 28 Mar 2025, From 28/03/2025 - To 28/03/2026