Abstract: The present disclosure relates to a method and system for extracting information from templateless formats. The method comprises: (1) receiving, by a processing unit [102], a data message; (2) identifying a set of specific patterns of entity values; (3) substituting the identified specific patterns with corresponding key values to generate a processed data message; (4) storing each key value in a memory unit [108]; (5) generating, by a fine-tuned tokenizer [104], a set of data message tokens, organisation data tokens and domain data from the processed data message; (6) receiving, by a pre-trained sub-system [106], the generated set of data message tokens, the organisation data tokens, and the domain data from the fine-tuned tokenizer [104]; (7) extracting information comprising a set of entity values and meta data associated with the data message; and (8) associating each of the identified entity values with a corresponding key value stored in the memory unit [108].
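By way of illustration only, steps (2) to (4) and step (8) may be sketched as follows. The sketch assumes regular-expression patterns for dates and amounts and a dictionary standing in for the memory unit [108]; the pattern set, the key format `<LABEL_n>`, and the function names are hypothetical and are not taken from the disclosure.

```python
import re

# Hypothetical patterns; the disclosure does not say which entity patterns are used.
PATTERNS = {
    "DATE": re.compile(r"\b\d{2}-\d{2}-\d{4}\b"),
    "AMOUNT": re.compile(r"\b\d+\.\d{2}\b"),
}


def mask_entities(message):
    """Replace each matched entity value with a key and record key -> value."""
    store = {}  # stands in for the memory unit [108]
    processed = message
    for label, pattern in PATTERNS.items():
        def substitute(match, label=label):
            # Keys are numbered in the order in which values are found.
            key = f"<{label}_{len(store)}>"
            store[key] = match.group(0)
            return key
        processed = pattern.sub(substitute, processed)
    return processed, store


def restore_entities(keys, store):
    """Associate keys found downstream with the values stored earlier."""
    return {key: store[key] for key in keys if key in store}


message = "Rs 1500.00 debited from your account on 07-11-2022"
processed, store = mask_entities(message)
print(processed)  # Rs <AMOUNT_1> debited from your account on <DATE_0>
print(restore_entities(["<AMOUNT_1>"], store))  # {'<AMOUNT_1>': '1500.00'}
```

Under these assumptions, only the processed message with key values is passed on for tokenization, and the original values are looked up again once the entities have been extracted.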
We Claim:
1. A method for extracting information from templateless formats, the method comprising:
- receiving, by a processing unit [102], a data message;
- identifying, by the processing unit [102], a set of one or more specific patterns of entity values;
- substituting, by the processing unit [102], each entity value of the identified specific patterns of entity values with a corresponding key value to generate a processed data message;
- storing, by the processing unit [102], each entity value in a memory unit [108];
- generating, by a fine-tuned tokenizer [104], a set of one or more data message tokens, organisation data tokens and a domain data, from the processed data message;
- receiving, by a pre-trained sub-system [106], the generated set of one or more data message tokens, the organisation data tokens, and the domain data from the fine-tuned tokenizer [104];
- extracting, by the pre-trained sub-system [106], an information comprising a set of one or more entity values and a meta data associated with the data message; and
- associating, by the pre-trained sub-system [106], each of the identified entity values with a corresponding key value stored in the memory unit [108].
2. The method as claimed in claim 1, the method further comprising storing, by the pre-trained sub-system [106], each of the one or more entity values and the meta data in the memory unit [108].
3. The method as claimed in claim 1, wherein the pre-trained sub-system [106] is a layered system.
4. The method as claimed in claim 3, wherein the pre-training of the sub-system [106] further comprises:
- converting, by one or more embedding layers, the organisation data tokens into organisation data embeddings and the data message tokens into data message embeddings; and
- feeding the organisation data embeddings and the data message embeddings to one or more bi-directional Long Short-Term Memory (LSTM) layers.
5. The method as claimed in claim 3, wherein the extracting, by the pre-trained sub-system [106], an information comprising a set of one or more entity values and a meta data associated with the data message, further comprises:
- feeding tokens, from the one or more bi-directional Long Short-Term Memory (LSTM) layers, to a fully connected layer and a fully connected and adaptive Max-pooling Layers; and
- predicting, by a Conditional Random Field (CRF) layer, an entity sequence and a set of meta data labels.
6. The method as claimed in claim 5, wherein the predicting, by a Conditional Random Field (CRF) layer, an entity sequence and a set of meta data labels, is based on a transition matrix and a set of Negative Log Likelihood Loss (NLLL) weights.
7. A system for extracting information from templateless formats, the system comprising:
- a processing unit [102] configured to:
o receive a data message;
o identify a set of one or more specific patterns of entity values;
o substitute each entity value of the identified specific patterns of entity values with a corresponding key value to generate a processed data message; and
o store each entity value in a memory unit [108] coupled with the processing unit [102];
- a fine-tuned tokenizer [104] connected to the processing unit [102] and the memory unit [108], the fine-tuned tokenizer [104] configured to:
o generate a set of one or more data message tokens, organisation data tokens and a domain data, from the processed data message; and
- a pre-trained sub-system [106] connected to the processing unit [102], the memory unit [108], and the fine-tuned tokenizer [104], the pre-trained sub-system [106] configured to:
o receive the generated set of one or more data message tokens, the organisation data tokens, and the domain data from the fine-tuned tokenizer [104];
o extract an information comprising a set of one or more entity values and a meta data associated with the data message; and
o associate each of the identified entity values with a corresponding key value stored in the memory unit [108].
8. The system as claimed in claim 7, wherein the pre-trained sub-system [106] is further configured to store each of the one or more entity values and the meta data in the memory unit [108].
9. The system as claimed in claim 7, wherein the pre-trained sub-system [106] is a layered system.
10. The system as claimed in claim 9, wherein the sub-system [106], for the pre-training, is further configured to:
- convert, using one or more embedding layers of the sub-system [106], the organisation data tokens into organisation data embeddings and the data message tokens into data message embeddings; and
- feed the organisation data embeddings and the data message embeddings to one or more bi-directional Long Short-Term Memory (LSTM) layers.
11. The system as claimed in claim 9, wherein the pre-trained sub-system [106], for extracting an information comprising a set of one or more entity values and the meta data associated with the data message, is further configured to:
- feed tokens, from the one or more bi-directional Long Short-Term Memory (LSTM) layers, to a fully connected layer and a fully connected and adaptive Max-pooling Layers; and
- predict, by a Conditional Random Field (CRF) layer, an entity sequence and a set of meta data labels.
12. The system as claimed in claim 11, wherein the pre-trained sub-system [106] predicts, by a Conditional Random Field (CRF) layer, the entity sequence and a set of meta data labels, based on a transition matrix and a set of Negative Log Likelihood Loss (NLLL) weights.
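By way of illustration only, the role of the transition matrix in the Conditional Random Field (CRF) prediction of claims 5, 6, 11 and 12 may be sketched as follows. The claims do not name a decoding procedure; the sketch assumes Viterbi decoding, which is commonly used with linear-chain CRFs. The label set, the emission scores and the transition scores are hypothetical values chosen for the example, and the Negative Log Likelihood Loss used in training is not shown.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring sequence of label indices.

    emissions[t][j]   : score of label j at token t (output of the earlier layers)
    transitions[i][j] : score of moving from label i to label j
    """
    num_labels = len(transitions)
    scores = list(emissions[0])
    backpointers = []
    for step in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(num_labels):
            # Best previous label for arriving at label j.
            best_prev = max(range(num_labels),
                            key=lambda i: scores[i] + transitions[i][j])
            pointers.append(best_prev)
            new_scores.append(scores[best_prev] + transitions[best_prev][j] + step[j])
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    best = max(range(num_labels), key=lambda j: scores[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


# Hypothetical label set and scores for a three-token message.
labels = ["O", "B-AMOUNT", "I-AMOUNT"]
transitions = [
    [0.5, 0.0, -10.0],  # from O: moving straight to I-AMOUNT is heavily penalised
    [0.0, -1.0, 1.0],   # from B-AMOUNT
    [0.0, -1.0, 0.5],   # from I-AMOUNT
]
emissions = [
    [2.0, 0.5, 0.0],
    [0.2, 1.0, 1.5],
    [1.0, 0.0, 0.3],
]
decoded = [labels[i] for i in viterbi_decode(emissions, transitions)]
print(decoded)  # ['O', 'B-AMOUNT', 'I-AMOUNT']
```

In this example, choosing the highest emission score for each token independently would give O, I-AMOUNT, O, which starts an entity with an inside label. The transition scores steer the decoder to a consistent sequence instead.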
| # | Name | Date |
|---|---|---|
| 1 | 202241063464-STATEMENT OF UNDERTAKING (FORM 3) [07-11-2022(online)].pdf | 2022-11-07 |
| 2 | 202241063464-REQUEST FOR EXAMINATION (FORM-18) [07-11-2022(online)].pdf | 2022-11-07 |
| 3 | 202241063464-REQUEST FOR EARLY PUBLICATION(FORM-9) [07-11-2022(online)].pdf | 2022-11-07 |
| 4 | 202241063464-PROOF OF RIGHT [07-11-2022(online)].pdf | 2022-11-07 |
| 5 | 202241063464-POWER OF AUTHORITY [07-11-2022(online)].pdf | 2022-11-07 |
| 6 | 202241063464-FORM-9 [07-11-2022(online)].pdf | 2022-11-07 |
| 7 | 202241063464-FORM 18 [07-11-2022(online)].pdf | 2022-11-07 |
| 8 | 202241063464-FORM 1 [07-11-2022(online)].pdf | 2022-11-07 |
| 9 | 202241063464-FIGURE OF ABSTRACT [07-11-2022(online)].pdf | 2022-11-07 |
| 10 | 202241063464-DRAWINGS [07-11-2022(online)].pdf | 2022-11-07 |
| 11 | 202241063464-DECLARATION OF INVENTORSHIP (FORM 5) [07-11-2022(online)].pdf | 2022-11-07 |
| 12 | 202241063464-COMPLETE SPECIFICATION [07-11-2022(online)].pdf | 2022-11-07 |
| 13 | 202241063464-Request Letter-Correspondence [09-11-2022(online)].pdf | 2022-11-09 |
| 14 | 202241063464-Power of Attorney [09-11-2022(online)].pdf | 2022-11-09 |
| 15 | 202241063464-Form 1 (Submitted on date of filing) [09-11-2022(online)].pdf | 2022-11-09 |
| 16 | 202241063464-Covering Letter [09-11-2022(online)].pdf | 2022-11-09 |
| 17 | 202241063464-Correspondence_Form-1 And POA_21-11-2022.pdf | 2022-11-21 |
| 18 | 202241063464-FER.pdf | 2023-03-10 |
| 19 | 202241063464-FER_SER_REPLY [08-09-2023(online)].pdf | 2023-09-08 |
| 20 | 202241063464-PatentCertificate01-10-2024.pdf | 2024-10-01 |
| 21 | 202241063464-IntimationOfGrant01-10-2024.pdf | 2024-10-01 |
| 1 | SearchHistoryE_10-03-2023.pdf | |