Method, Device, And System For Clustering Document Objects Based On Information Content

Abstract: This disclosure relates to a method, a device, and a system for clustering document objects based on information content. The method may include identifying a plurality of object chunks from at least one document based on semantic context of each of the plurality of object chunks, determining at least one document portion from the at least one document as a base document based on a plurality of parameters applied to the plurality of object chunks, determining a plurality of hierarchies within the base document, and categorizing the plurality of object chunks based on the plurality of hierarchies and information in each of the plurality of object chunks. Each of the plurality of object chunks may include at least one object selected from the at least one document.
Figure 2

Patent Information

Application #:
Filing Date: 30 November 2018
Publication Number: 23/2020
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Email: bangalore@knspartners.com
Parent Application:
Patent Number:
Legal Status:
Grant Date: 2023-10-17
Renewal Date:

Applicants

WIPRO LIMITED
Doddakannelli, Sarjapur Road, Bangalore, Karnataka, India, Pin Code-560 035.

Inventors

1. MANJUNATH RAMACHANDRA IYER
80, Sadhana, 2nd Main, BSK 3rd Stage, Katriguppe East, Bangalore, Karnataka, India, Pin Code-560 085.
2. BOBY CHAITANYA VILLARI
199 West Park Road, Near New Basketball Stadium, Malleswaram West, Bengaluru, Karnataka, India, Pin Code-560 055.
3. RAMESHWAR PRATAP
U-101 Purva Fairmont Apartment, 24th Main, Sector-2, HSR Layout, Bangalore, Karnataka, India.

Specification

Claims:
WE CLAIM:
1. A method of clustering document objects based on information content, the method comprising:
identifying, by a document clustering device, a plurality of object chunks from at least one document based on semantic context of each of the plurality of object chunks, wherein each of the plurality of object chunks comprises at least one object selected from the at least one document;
determining, by the document clustering device, at least one document portion from the at least one document as a base document, based on a plurality of parameters applied to the plurality of object chunks;
determining, by the document clustering device, a plurality of hierarchies within the base document; and
categorizing, by the document clustering device, the plurality of object chunks based on the plurality of hierarchies and information in each of the plurality of object chunks.

2. The method of claim 1, wherein each of the at least one object comprises at least one of text, an image, a figure, a table, or a graph.

3. The method of claim 1, wherein identifying an object chunk from the plurality of object chunks comprises:
summarizing a paragraph within a document from the at least one document;
iteratively adding at least one sentence to the paragraph;
iteratively computing a summary quotient based on length of sentences within the paragraph and length of the at least one sentence added in a current iteration; and
iteratively comparing the summary quotient with a predefined threshold.

4. The method of claim 3, further comprising demarcating the object chunk in a current iteration, when the summary quotient in the current iteration exceeds the predefined threshold, wherein the demarcated object chunk excludes the at least one sentence added in the current iteration.

5. The method of claim 1, wherein determining the at least one document portion as the base document comprises:
determining the plurality of parameters for each document portion in a plurality of document portions within the at least one document, wherein the plurality of document portions comprise the at least one document portion;
computing, for each document portion, a weighted sum of the plurality of parameters in response to determining the plurality of parameters for each document portion; and
selecting the at least one document portion as the base document in response to computing the weighted sum for each document portion, wherein the at least one document portion comprises the highest weighted sum.

6. The method of claim 1, wherein the plurality of parameters comprises at least one of: number of object chunks in each document portion, number of object chunks in each document portion that are common with remaining document portions in the plurality of document portions, number of object chunks in each document portion that overlap with one or more of the remaining document portions, or number of documents from the at least one document that each document portion overlaps.

7. The method of claim 1, wherein categorizing an object chunk from the plurality of object chunks comprises:
creating an index for the object chunk based on iterative summarization of the object chunk; and
extracting information context from the object chunk based on frequency of occurrence of each term in the object chunk and total number of terms in the object chunk.

8. The method of claim 7, wherein iterative summarization is performed to reduce a summary of the object chunk to a predefined number of words.

9. The method of claim 7, wherein the object chunk is categorized in a hierarchy from the plurality of hierarchies based on similarity of the index and the information context with the hierarchy.

10. The method of claim 1, further comprising receiving a user query, wherein the user query comprises at least one of a textual query and a vocal query.

11. The method of claim 10 further comprising:
extracting keywords from the user query to determine a context of the user query;
comparing the extracted keywords with each hierarchy in the plurality of hierarchies to identify a hierarchy matching the extracted keywords;
retrieving at least one object chunk from a set of chunks categorized within the matching hierarchy; and
presenting the at least one object chunk to a user generating the user query.

12. The method of claim 11, wherein the at least one object chunk is retrieved based on history associated with the user.

13. A system for clustering document objects based on information content, the system comprising:
a document clustering device comprising at least one processor and a memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising:
identifying a plurality of object chunks from at least one document based on semantic context of each of the plurality of object chunks, wherein each of the plurality of object chunks comprises at least one object selected from the at least one document;
determining at least one document portion from the at least one document as a base document, based on a plurality of parameters applied to the plurality of object chunks;
determining a plurality of hierarchies within the base document; and
categorizing the plurality of object chunks based on the plurality of hierarchies and information in each of the plurality of object chunks.

14. The system of claim 13, wherein identifying an object chunk from the plurality of object chunks comprises:
summarizing a paragraph within a document from the at least one document;
iteratively adding at least one sentence to the paragraph;
iteratively computing a summary quotient based on length of sentences within the paragraph and length of the at least one sentence added in a current iteration; and
iteratively comparing the summary quotient with a predefined threshold.

15. The system of claim 14, wherein the operations further comprise demarcating the object chunk in a current iteration, when the summary quotient in the current iteration exceeds the predefined threshold, wherein the demarcated object chunk excludes the at least one sentence added in the current iteration.

16. The system of claim 13, wherein determining the at least one document portion as the base document comprises:
determining the plurality of parameters for each document portion in a plurality of document portions within the at least one document, wherein the plurality of document portions comprise the at least one document portion;
computing, for each document portion, a weighted sum of the plurality of parameters in response to determining the plurality of parameters for each document portion; and
selecting the at least one document portion as the base document in response to computing the weighted sum for each document portion, wherein the at least one document portion comprises the highest weighted sum.

17. The system of claim 13, wherein categorizing an object chunk from the plurality of object chunks comprises:
creating an index for the object chunk based on iterative summarization of the object chunk; and
extracting information context from the object chunk based on frequency of occurrence of each term in the object chunk and total number of terms in the object chunk.

18. The system of claim 17, wherein iterative summarization is performed to reduce a summary of the object chunk to a predefined number of words, and wherein the object chunk is categorized in a hierarchy from the plurality of hierarchies based on similarity of the index and the information context with the hierarchy.

19. The system of claim 13, wherein the operations further comprise:
receiving a user query;
extracting keywords from the user query to determine a context of the user query;
comparing the extracted keywords with each hierarchy in the plurality of hierarchies to identify a hierarchy matching the extracted keywords;
retrieving at least one object chunk from a set of chunks categorized within the matching hierarchy, wherein the at least one object chunk is retrieved based on history associated with the user; and
presenting the at least one object chunk to a user generating the user query.
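
Claims 5, 6, and 16 describe selecting the base document by computing a weighted sum of per-portion parameters and keeping the portion with the highest sum. The following is a minimal Python sketch of that selection step only, under stated assumptions: the `PortionStats` type, its field names, the function names, and the weight values are all hypothetical, since the claims name the parameters but give no weights or data structures.

```python
from dataclasses import dataclass


@dataclass
class PortionStats:
    """Hypothetical container for the four parameters listed in claim 6."""
    name: str
    chunk_count: int           # object chunks in this portion
    common_chunks: int         # chunks in common with the remaining portions
    overlapping_chunks: int    # chunks overlapping one or more other portions
    documents_overlapped: int  # documents that this portion overlaps


# Illustrative weights only; the claims do not specify any values.
WEIGHTS = (0.4, 0.3, 0.2, 0.1)


def weighted_sum(portion, weights=WEIGHTS):
    """Return the weighted sum of the portion's four parameters."""
    params = (
        portion.chunk_count,
        portion.common_chunks,
        portion.overlapping_chunks,
        portion.documents_overlapped,
    )
    return sum(w * v for w, v in zip(weights, params))


def select_base_portion(portions, weights=WEIGHTS):
    """Return the portion with the highest weighted sum (first one on ties)."""
    if not portions:
        raise ValueError("at least one document portion is required")
    return max(portions, key=lambda p: weighted_sum(p, weights))
```

Calling `select_base_portion` with a list of `PortionStats` values returns the single highest-scoring portion. The claims allow "at least one" portion to be selected, so a fuller version might return every portion tied for the top score.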

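Claims 7, 9, and 17 describe extracting an information context from an object chunk using the frequency of each term and the total number of terms, then placing the chunk in the most similar hierarchy. The sketch below illustrates one possible reading in Python. It is an assumption-laden example: the tokenizer, the keyword-set representation of a hierarchy, the summed-frequency similarity score, and the function names are not taken from the claims, and the index built by iterative summarization is left out.

```python
import re
from collections import Counter


def term_frequencies(text):
    """Map each term to (occurrences of the term) / (total number of terms)."""
    # Assumed tokenizer: lowercase alphanumeric runs.
    terms = re.findall(r"[a-z0-9]+", text.lower())
    total = len(terms)
    if total == 0:
        return {}
    return {term: count / total for term, count in Counter(terms).items()}


def categorize(chunk_text, hierarchies):
    """Return the label of the hierarchy whose keywords best match the chunk.

    `hierarchies` is a hypothetical mapping from a hierarchy label to a set
    of keywords. The score is the summed term frequency of those keywords,
    an assumed stand-in for the unspecified similarity measure.
    """
    if not hierarchies:
        raise ValueError("at least one hierarchy is required")
    tf = term_frequencies(chunk_text)
    scores = {
        label: sum(tf.get(keyword, 0.0) for keyword in keywords)
        for label, keywords in hierarchies.items()
    }
    # On ties, max() keeps the first label in the mapping's iteration order.
    return max(scores, key=scores.get)
```

A summed-frequency score is used here only because it depends on exactly the two quantities the claims mention. Any other similarity measure over the same term frequencies could be substituted.
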
Documents

Application Documents

# Name Date
1 201841045339-STATEMENT OF UNDERTAKING (FORM 3) [30-11-2018(online)].pdf 2018-11-30
2 201841045339-REQUEST FOR EXAMINATION (FORM-18) [30-11-2018(online)].pdf 2018-11-30
3 201841045339-POWER OF AUTHORITY [30-11-2018(online)].pdf 2018-11-30
4 201841045339-FORM 18 [30-11-2018(online)].pdf 2018-11-30
5 201841045339-FORM 1 [30-11-2018(online)].pdf 2018-11-30
6 201841045339-FIGURE OF ABSTRACT [30-11-2018].jpg 2018-11-30
7 201841045339-DRAWINGS [30-11-2018(online)].pdf 2018-11-30
8 201841045339-DECLARATION OF INVENTORSHIP (FORM 5) [30-11-2018(online)].pdf 2018-11-30
9 201841045339-COMPLETE SPECIFICATION [30-11-2018(online)].pdf 2018-11-30
10 201841045339-Request Letter-Correspondence [11-12-2018(online)].pdf 2018-12-11
11 201841045339-Power of Attorney [11-12-2018(online)].pdf 2018-12-11
12 201841045339-Form 1 (Submitted on date of filing) [11-12-2018(online)].pdf 2018-12-11
13 201841045339-Proof of Right (MANDATORY) [09-05-2019(online)].pdf 2019-05-09
14 Correspondence by Agent_Proof of Right_15-05-2019.pdf 2019-05-15
15 201841045339-PETITION UNDER RULE 137 [01-10-2021(online)].pdf 2021-10-01
16 201841045339-OTHERS [01-10-2021(online)].pdf 2021-10-01
17 201841045339-Information under section 8(2) [01-10-2021(online)].pdf 2021-10-01
18 201841045339-FORM-26 [01-10-2021(online)].pdf 2021-10-01
19 201841045339-FORM 3 [01-10-2021(online)].pdf 2021-10-01
20 201841045339-FER_SER_REPLY [01-10-2021(online)].pdf 2021-10-01
21 201841045339-CORRESPONDENCE [01-10-2021(online)].pdf 2021-10-01
22 201841045339-CLAIMS [01-10-2021(online)].pdf 2021-10-01
23 201841045339-FER.pdf 2021-10-17
24 201841045339-PatentCertificate17-10-2023.pdf 2023-10-17
25 201841045339-IntimationOfGrant17-10-2023.pdf 2023-10-17
26 201841045339-PROOF OF ALTERATION [11-01-2024(online)].pdf 2024-01-11

Search Strategy

1 SearchStrategy45339E_23-02-2021.pdf

ERegister / Renewals

3rd: 11 Jan 2024

From 30/11/2020 - To 30/11/2021

4th: 11 Jan 2024

From 30/11/2021 - To 30/11/2022

5th: 11 Jan 2024

From 30/11/2022 - To 30/11/2023

6th: 11 Jan 2024

From 30/11/2023 - To 30/11/2024

7th: 26 Nov 2024

From 30/11/2024 - To 30/11/2025