Abstract: Systems and methods for generating a report are described. The system receives one or more documents along with a target intent. The system analyzes first metadata of placeholders of a template in order to extract the relevant content, to be placed in each placeholder, from the one or more documents. The system identifies a plurality of candidate contents in the documents by using an Optical Character Recognition (OCR) technique. Further, the system assigns a score to each candidate content upon analyzing the candidate content based on its meaning and pattern. The system extracts one or more contents, from the candidate contents, as the relevant content based on the score, along with second metadata that indicates a type of the extracted contents. The system then generates the report by populating the extracted content in each placeholder based on a correlation between the first metadata and the second metadata. FIG. 1
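The abstract summarizes a multi-stage pipeline: receive documents and a target intent, OCR the documents into candidate contents, score the candidates for contextual relevance, extract the best-scoring ones, and fill a template. Below is a minimal, self-contained Python sketch of that flow; every name in it (`CandidateContent`, `ocr_extract`, `score_candidate`, `generate_report`) is a hypothetical illustration, and the keyword-overlap scorer merely stands in for the trained model the disclosure actually uses.

```python
# A minimal sketch of the report-generation pipeline described in the
# abstract. All names here are hypothetical illustrations, not the
# actual system's API.
from dataclasses import dataclass, field

@dataclass
class CandidateContent:
    text: str                                    # text recovered by OCR
    pattern: dict = field(default_factory=dict)  # e.g. font-size, font-style
    score: float = 0.0                           # relevance to the target intent

def ocr_extract(document: str) -> list[CandidateContent]:
    """Stand-in for the OCR step: a real engine (e.g. Tesseract) would
    return text regions with bounding boxes and style attributes."""
    return [CandidateContent(text=line) for line in document.splitlines() if line]

def score_candidate(candidate: CandidateContent, target_intent: str) -> float:
    """Stand-in for the trained model: a naive keyword-overlap score."""
    overlap = set(candidate.text.lower().split()) & set(target_intent.lower().split())
    return len(overlap) / max(len(target_intent.split()), 1)

def generate_report(documents: list[str], target_intent: str,
                    template: dict[str, str]) -> dict[str, str]:
    """Fill each placeholder whose first metadata (the expected content
    type) correlates with an extracted content."""
    candidates = [c for doc in documents for c in ocr_extract(doc)]
    for c in candidates:
        c.score = score_candidate(c, target_intent)
    # Keep the highest-scoring candidates as the relevant content.
    extracted = sorted(candidates, key=lambda c: c.score, reverse=True)
    report = {}
    for placeholder, expected_type in template.items():
        # The first/second metadata correlation is reduced here to a
        # substring check, purely for illustration.
        match = next((c for c in extracted
                      if expected_type.lower() in c.text.lower()), None)
        report[placeholder] = match.text if match else ""
    return report
```

For instance, calling `generate_report` with `template={"institution": "university"}` would fill that placeholder with the highest-scoring candidate whose text mentions "university".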
Claims:
We Claim:
1. A method for generating a report (116, 306) characterized by extracting relevant content in an unstructured format and populating the relevant content in a template (212) of the report (116, 306), the method comprising:
receiving, by a report generating system (102), one or more documents (108) along with a target intent (106) associated with the report (116, 306) to be generated, wherein the one or more documents (108) are received in the unstructured format;
analyzing, by the report generating system (102), first metadata (214) of one or more placeholders of the template (212) in order to extract the relevant content, to be placed in each placeholder, from the one or more documents (108), wherein the first metadata (214) indicates a type of a content to be inserted into each placeholder;
identifying, by the report generating system (102), a plurality of candidate contents (110) present in each document by using an Optical Character Recognition (OCR) technique;
assigning, by the report generating system (102), a score (112) to each candidate content (110) based on meaning and pattern associated with each candidate content (110), wherein the score (112) is assigned based on a model (240) trained using documents relevant to the target intent (106) by using at least one machine learning technique, and wherein the score (112) indicates contextual relevance of a candidate content (110) with the target intent (106);
extracting, by the report generating system (102), one or more contents (114), from the plurality of candidate contents (110), as the relevant content based on the score (112) and storing each relevant content, in a structured format, along with a second metadata (216) in a database associated with the report generating system (102), wherein the second metadata (216) indicates a type of the extracted contents (114); and
generating, by the report generating system (102), the report (116, 306) by populating the extracted content (114) in each placeholder based on correlation between the first metadata (214) and the second metadata (216) thereby extracting the relevant content in the unstructured format and populating the relevant content in the template (212) of the report (116, 306).
2. The method as claimed in claim 1, further comprising selecting the one or more documents (108) from a plurality of documents (104) by:
determining a plurality of categories associated with the plurality of documents (104);
analyzing the plurality of documents (104) based on the corresponding plurality of categories in relation to the target intent (106); and
selecting the one or more documents (108) from the plurality of documents (104) based on the model (240) trained using the at least one machine learning technique.
3. The method as claimed in claim 1, wherein the first metadata (214) and the second metadata (216) comprise at least one of name, college, university, discipline, percentage, and CGPA.
4. The method as claimed in claim 1, wherein the score (112) is assigned by:
splitting each candidate content (110), of the plurality of candidate contents (110), into at least one of noun, verb, adjective, pronoun, preposition and conjunction;
determining at least one of the meaning and the pattern of the at least one of noun, verb, adjective, pronoun, preposition and conjunction of each candidate content (110) using a Natural Language Processing (NLP) technique, wherein the pattern comprises at least one of font-size, font-family, color and font-style; and
assigning the score (112) to each of the plurality of candidate contents (110) based on at least one of the meaning and the pattern determined for the at least one of the noun, verb, adjective, pronoun, preposition and conjunction of each of the plurality of candidate contents (110), wherein the model (240) is updated based on the assigning of the score (112).
5. The method as claimed in claim 1, wherein the plurality of candidate contents (110) is identified upon bounding a box over each candidate content (110).
6. A method for generating a report (116, 306), the method comprising:
receiving one or more documents (108) along with a target intent (106) associated with the report (116, 306) to be generated, wherein the one or more documents (108) are received in an unstructured format;
analyzing first metadata (214) of one or more placeholders of a template (212) of the report (116, 306), wherein the first metadata (214) indicates a type of a content to be inserted into each placeholder;
identifying a plurality of candidate contents (110) from each of the one or more documents (108) using an Optical Character Recognition (OCR) technique;
assigning a score (112) to each of the plurality of candidate contents (110) of each of the one or more documents (108) based on a model (240), wherein the score (112) indicates contextual relevance of each of the plurality of candidate contents (110) with the target intent (106);
extracting one or more contents (114), from the plurality of candidate contents (110) based on the corresponding scores (112) of the one or more contents (114), along with a second metadata (216), wherein the one or more contents (114) are stored in a database in a structured format, and wherein the second metadata (216) indicates a type of the extracted one or more contents; and
generating the report by populating the extracted one or more contents (114) at one or more respective placeholders provided in the template (212) of the report (116, 306) based on correlation of the first metadata (214) and the second metadata (216).
7. A report generating system (102) for generating a report (116, 306) characterized by extracting relevant content in an unstructured format and thereby populating the relevant content in a template (212) of the report (116, 306), the report generating system (102) comprises:
a receiving unit (220) to receive one or more documents (108) along with a target intent (106) associated with the report (116, 306) to be generated, wherein the one or more documents (108) are received in the unstructured format;
an analyzing unit (224) to analyze first metadata (214) of one or more placeholders of the template (212) in order to extract the relevant content, to be placed in each placeholder, from the one or more documents (108), wherein the first metadata (214) indicates a type of a content to be inserted into each placeholder;
an identifying unit (228) to identify a plurality of candidate contents (110) present in each document by using an Optical Character Recognition (OCR) technique;
an assigning unit (232) to assign a score (112) to each candidate content (110) upon analyzing each candidate content (110) based on meaning and pattern associated with each candidate content (110), wherein the score (112) is assigned based on a model (240) trained using documents relevant to the target intent (106) by using at least one machine learning technique, and wherein the score (112) indicates contextual relevance of a candidate content (110) with the target intent (106);
an extracting unit (234) to extract one or more contents (114), from the plurality of candidate contents (110), as the relevant content based on the score (112) thereby storing each relevant content, in a structured format, along with a second metadata (216) in a database associated with the report generating system (102), wherein the second metadata (216) indicates a type of the extracted content (114); and
a generating unit (236) to generate the report (116, 306) by populating the extracted content (114) in each placeholder based on correlation between the first metadata (214) and the second metadata (216), thereby extracting the relevant content in the unstructured format and populating the relevant content in the template (212) of the report (116, 306).
8. The report generating system (102) as claimed in claim 7 is configured to select the one or more documents (108) from a plurality of documents (104) by:
determining a plurality of categories associated with the plurality of documents (104);
analyzing the plurality of documents (104) based on the corresponding plurality of categories in relation to the target intent (106); and
selecting the one or more documents (108) from the plurality of documents (104) based on the model (240) trained using the at least one machine learning technique.
9. The report generating system (102) as claimed in claim 7, wherein the first metadata (214) and the second metadata (216) comprise at least one of name, college, university, discipline, percentage, and CGPA.
10. The report generating system (102) as claimed in claim 7 is configured to assign the score (112) by:
splitting each candidate content (110), of the plurality of candidate contents (110), into at least one of noun, verb, adjective, pronoun, preposition and conjunction;
determining at least one of the meaning and the pattern of the at least one of noun, verb, adjective, pronoun, preposition and conjunction of each candidate content (110) using a Natural Language Processing (NLP) technique, wherein the pattern comprises at least one of font-size, font-family, color and font-style; and
assigning the score (112) to each of the plurality of candidate contents (110) based on at least one of the meaning and the pattern determined for the at least one of the noun, verb, adjective, pronoun, preposition and conjunction of each of the plurality of candidate contents (110), wherein the model (240) is updated based on the assigning of the score (112).
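Claims 4 and 10 recite the scoring step in more detail: each candidate content is split into parts of speech, the meaning and the visual pattern (font-size, font-family, color, font-style) are examined with an NLP technique, and a score is assigned. The toy sketch below illustrates that shape under loud assumptions: the tagger is a keyword lookup and the weights are arbitrary, both standing in for the claimed trained model (240).

```python
# A toy illustration of the scoring recited in claims 4 and 10. The tagger,
# weights, and pattern rules are placeholder assumptions, not the claimed
# machine-learning technique.
NOUN_HINTS = {"university", "college", "name", "discipline", "percentage", "cgpa"}

def toy_pos_tag(text: str) -> list[tuple[str, str]]:
    """Naive stand-in for an NLP tagger: words in NOUN_HINTS are 'NOUN',
    everything else 'OTHER'. A real system would use a trained tagger."""
    return [(w, "NOUN" if w.lower().strip(".,:") in NOUN_HINTS else "OTHER")
            for w in text.split()]

def assign_score(text: str, pattern: dict, target_intent: str) -> float:
    tagged = toy_pos_tag(text)
    # Meaning: fraction of tagged nouns that also appear in the target intent.
    intent_words = set(target_intent.lower().split())
    nouns = [w.lower() for w, tag in tagged if tag == "NOUN"]
    meaning = sum(w in intent_words for w in nouns) / max(len(nouns), 1)
    # Pattern: large or bold text (e.g. headings) is weighted up slightly.
    big = pattern.get("font-size", 0) >= 14
    bold = pattern.get("font-style") == "bold"
    emphasis = 0.2 if (big or bold) else 0.0
    return min(meaning + emphasis, 1.0)

print(assign_score("Anna University", {"font-size": 16},
                   "generate a student academic report"))  # -> 0.2
```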
Dated this 26th day of July, 2019
R. RAMYA RAO
IN/PA-1607
OF K & S PARTNERS
ATTORNEY FOR THE APPLICANT(S)
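Claims 2 and 8 recite selecting the one or more documents from a larger plurality by determining categories and relating those categories to the target intent. A hedged sketch follows, with a keyword classifier standing in for the model (240) trained using a machine learning technique:

```python
# A sketch of the document selection in claims 2 and 8: categorize each
# document, relate the categories to the target intent, and keep the
# matches. The categorizer is a keyword stand-in for the trained model.
def categorize(document: str) -> str:
    """Placeholder classifier; the claims use a trained model instead."""
    lowered = document.lower()
    if "cgpa" in lowered or "university" in lowered:
        return "academic"
    return "other"

def select_documents(documents: list[str], target_intent: str) -> list[str]:
    # Relate each document's category to the target intent; keep matches.
    wanted = "academic" if "academic" in target_intent.lower() else "other"
    return [d for d in documents if categorize(d) == wanted]

docs = ["Anna University, CGPA 8.9", "Grocery receipt: milk, eggs"]
print(select_documents(docs, "generate an academic report"))
# -> ['Anna University, CGPA 8.9']
```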
Description:
TECHNICAL FIELD
The present disclosure relates in general to document management. More particularly, but not exclusively, the present disclosure discloses a method and system to convert unstructured data into structured data and thereby generate a report.
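As a rough illustration of that unstructured-to-structured conversion, the sketch below types each extracted snippet with regex heuristics (standing in for the second metadata (216)) and correlates those types with the placeholder types (the first metadata (214)) to fill a report. All rules and names here are assumptions for illustration, not the disclosed implementation.

```python
# A minimal sketch: type unstructured snippets (second metadata) with
# regex heuristics, then correlate types with placeholder expectations
# (first metadata) to populate a report. All rules are assumptions.
import re

TYPE_PATTERNS = {  # hypothetical typing rules for second metadata
    "percentage": re.compile(r"\b\d{1,3}(\.\d+)?\s*%"),
    "cgpa": re.compile(r"\b\d\.\d{1,2}\b"),
    "name": re.compile(r"\b[A-Z][a-z]+(\s[A-Z][a-z]+)+\b"),
}

def to_structured(snippets: list[str]) -> list[dict]:
    """Attach second metadata (a content type) to each extracted snippet."""
    rows = []
    for s in snippets:
        for content_type, pat in TYPE_PATTERNS.items():
            if pat.search(s):
                rows.append({"type": content_type, "content": s})
                break
    return rows

def populate(template: dict[str, str], rows: list[dict]) -> dict[str, str]:
    """Correlate first metadata (the placeholder's expected type) with the
    second metadata (the stored row's type) to fill the report."""
    return {ph: next((r["content"] for r in rows if r["type"] == t), "")
            for ph, t in template.items()}

rows = to_structured(["Jane Doe", "Scored 87.5 % overall", "CGPA of 8.9"])
print(populate({"student_name": "name", "aggregate": "percentage"}, rows))
# -> {'student_name': 'Jane Doe', 'aggregate': 'Scored 87.5 % overall'}
```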
| # | Name | Date |
|---|---|---|
| 1 | 201921030389-COMPLETE SPECIFICATION [26-07-2019(online)].pdf | 2019-07-26 |
| 2 | 201921030389-FORM 1 [26-07-2019(online)].pdf | 2019-07-26 |
| 3 | 201921030389-DRAWINGS [26-07-2019(online)].pdf | 2019-07-26 |
| 4 | 201921030389-DECLARATION OF INVENTORSHIP (FORM 5) [26-07-2019(online)].pdf | 2019-07-26 |
| 5 | 201921030389-STATEMENT OF UNDERTAKING (FORM 3) [26-07-2019(online)].pdf | 2019-07-26 |
| 6 | 201921030389-POWER OF AUTHORITY [26-07-2019(online)].pdf | 2019-07-26 |
| 7 | 201921030389-FORM 18 [29-07-2019(online)].pdf | 2019-07-29 |
| 8 | 201921030389-Proof of Right (MANDATORY) [30-07-2019(online)].pdf | 2019-07-30 |
| 9 | Abstract1.jpg | 2019-10-25 |
| 10 | 201921030389-ORIGINAL UR 6(1A) FORM 1-310719.pdf | 2019-12-02 |
| 11 | searchE_16-04-2021.pdf | |
| 12 | 201921030389-FER.pdf | 2021-10-19 |
| 13 | 201921030389-FER_SER_REPLY [27-10-2021(online)].pdf | 2021-10-27 |
| 14 | 201921030389-CLAIMS [27-10-2021(online)].pdf | 2021-10-27 |
| 15 | 201921030389-COMPLETE SPECIFICATION [27-10-2021(online)].pdf | 2021-10-27 |
| 16 | 201921030389-DRAWING [27-10-2021(online)].pdf | 2021-10-27 |
| 17 | 201921030389-CORRESPONDENCE [27-10-2021(online)].pdf | 2021-10-27 |
| 18 | 201921030389-OTHERS [27-10-2021(online)].pdf | 2021-10-27 |
| 19 | 201921030389-US(14)-HearingNotice-(HearingDate-17-12-2024).pdf | 2024-11-29 |
| 20 | 201921030389-FORM-26 [12-12-2024(online)].pdf | 2024-12-12 |
| 21 | 201921030389-Correspondence to notify the Controller [12-12-2024(online)].pdf | 2024-12-12 |
| 22 | 201921030389-Written submissions and relevant documents [30-12-2024(online)].pdf | 2024-12-30 |
| 23 | 201921030389-ORIGINAL UR 6(1A) FORM 26-301224.pdf | 2025-01-14 |
| 24 | 201921030389-PatentCertificate10-02-2025.pdf | 2025-02-10 |
| 25 | 201921030389-IntimationOfGrant10-02-2025.pdf | 2025-02-10 |