
A Method And A System For Dynamic Summarization Of An Input Text

Abstract: In an embodiment, a method (200) of dynamic summarization of text is disclosed. The method (200) may include receiving (202) a plurality of text documents (104A) and a user input query (104B), and segregating (204) the input text into a plurality of paragraphs. The method (200) may further include creating (206B) a plurality of vectors corresponding to the plurality of paragraphs, and clustering (208) the plurality of vectors to generate one or more clusters of vectors. The method (200) may further include identifying a set of clusters from the one or more clusters, based on the user input query, and identifying relevant sentences from the set of clusters, based on the user input query. The method (200) may further include generating (212) a text summary using the relevant sentences.


Patent Information

Application #
Filing Date
16 September 2020
Publication Number
24/2023
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
MOHAMMED.FAISAL@LTTS.COM
Parent Application
Patent Number
Legal Status
Grant Date
2025-03-27
Renewal Date

Applicants

L&T TECHNOLOGY SERVICES LIMITED
DLF IT SEZ Park 2nd Floor – Block 3, 1/124, Mount Poonamallee Road, Ramapuram Chennai

Inventors

1. MADHUSUDAN SINGH
B-603, Ajmera Stone Park 1st Cross Street, Electronic City -1 Bangalore- 560100
2. ARITRA GHOSH DASTIDAR
Flat-06 181 Banamali Banerjee Road, Near Sodepur Shanidev Mandir, Kolkata-700082
3. VANAPALLI VENKATA NIRMAL RAMESH RAYULU
6-37 Gopalnagar, Behind Govt Junior College, Sabbavaram Mandal, Visakhapatnam-531035

Specification

Claims:

We Claim:
1. A method (200) for dynamic summarization of a text, the method (200) comprising:
receiving (202), by a text summarization device (102), a plurality of text documents (104A) and a user input query (104B);
segregating (204), by the text summarization device (102), the input text into a plurality of paragraphs, wherein the plurality of paragraphs is stored in an object database;
creating (206B), by the text summarization device (102), a plurality of vectors corresponding to the plurality of paragraphs;
clustering (208), by the text summarization device (102), the plurality of vectors to generate one or more clusters of vectors, wherein the clustering is performed based on a clustering criterion;
identifying, by the text summarization device (102), a set of clusters from the one or more clusters, based on the user input query;
identifying, by the text summarization device (102), relevant sentences from the set of clusters, based on the user input query; and
generating (212), by the text summarization device (102), a text summary using the relevant sentences.
2. The method (200) as claimed in claim 1, wherein the plurality of vectors is created corresponding to the plurality of paragraphs using at least one of Word2vec model or a Global Vectors (GloVe) model.
3. The method (200) as claimed in claim 1, wherein the clustering criterion is based on at least one of a hierarchical clustering or a K-Means clustering.
4. The method (200) as claimed in claim 1, comprising performing a silhouette analysis to determine a number of clusters.

5. The method (200) as claimed in claim 1, wherein generating the text summary comprises performing (210) a cosine similarity between the set of clusters and a vector corresponding to the user input query.

Description:

Technical Field
[001] This disclosure relates generally to text summarization, and more particularly to a method and a system for dynamically generating text summarization for an input text.
BACKGROUND
[002] In today’s fast-paced world, summarization of text plays a significant role in retrieving useful content from lengthy text documents (or manuals) that are available through different sources, such as the World Wide Web. Since these text documents contain multiple topics, manually processing them to obtain relevant information in a concise form can be a difficult task.
[003] Some automated techniques are known to be used for text summarization. For example, one technique includes summarizing the source text using concepts of graph theory. The source text is processed by structural analysis techniques to create a structural summarization. This structural summarization is further summarized by compressing portions of it that are inclined to generalization. However, since each instance needs to be up-to-date with the data from other instances, data exchange and its validity between the instances poses a challenge.
[004] Therefore, a semantic based system and method for dynamic summarization of an input text based on an input keyword is desired.
SUMMARY OF THE INVENTION
[005] In an embodiment, a method of dynamic summarization of text is disclosed. The method may include receiving a plurality of documents and a user input query. The method may further include segregating the input text into a plurality of paragraphs, and creating a plurality of vectors corresponding to the plurality of paragraphs. The method may further include clustering the plurality of vectors to generate one or more clusters of vectors, and identifying a set of clusters from the one or more clusters, based on the user input query. The method may further include identifying relevant sentences from the set of clusters, based on the user input query, and generating a text summary using the relevant sentences.
BRIEF DESCRIPTION OF THE DRAWINGS
[006] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles.
[007] FIG. 1 is a block diagram of a system for dynamic summarization of an input text, in accordance with an embodiment of the present disclosure.
[008] FIG. 2 is a flowchart of a method for dynamic summarization of an input text, in accordance with an embodiment of the present disclosure.
[009] FIG. 3A is an exemplary process of clustering a plurality of vectors to generate one or more clusters of vectors, in accordance with an embodiment of the present disclosure.
[010] FIG. 3B is an exemplary process of identifying relevant sentences from the set of clusters, in accordance with an embodiment of the present disclosure.
[011] FIG. 3C is an exemplary process of generating text summary using the relevant sentences, in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION OF THE DRAWINGS
[012] Exemplary embodiments are described with reference to the accompanying drawings. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the spirit and scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope and spirit being indicated by the following claims. Additional illustrative embodiments are listed below.
[013] Referring to FIG. 1, a block diagram of a system 100 for dynamic summarization of an input text is illustrated, in accordance with an embodiment of the present disclosure. The system 100 may include a text summarization device 102 that may be configured to summarize an input text. The text summarization device 102 may further include an input text module 106, a segregation module 108, a paragraph vector module 110A, a query vector module 110B, a clustering module 112, a cluster identification module 114, a sentence identification module 116, and an output module 118.
[014] The input text module 106 may receive a plurality of text documents 104A and a user input query 104B. For example, the input text module 106 may have two input parameters: documents and number of topics. By way of an example, the input text module 106 may receive the user input query 104B from a user in the form of words or sentences based on which summarization needs to be performed. Further, the input text module 106 may receive the plurality of text documents 104A that may be stored in a database (not shown in FIG. 1) of the text summarization device 102. The text documents 104A may include a collection of multiple topics of one or more domains, such as, but not limited to, electrical and electronics, mechanical, civil, and biotechnology.
[015] The segregation module 108 may be configured to segregate the input text into a plurality of paragraphs. The plurality of paragraphs may be stored in an object database. In some embodiments, the one or more paragraphs may be mapped into a dictionary with IDs and then stored in the object database. The paragraph vector module 110A may be configured to create a plurality of vectors corresponding to the plurality of paragraphs. In other words, the paragraph vector module 110A may create a vector corresponding to each of the plurality of paragraphs. In some embodiments, the plurality of vectors may be created corresponding to the plurality of paragraphs using at least one of a Word2vec model or a Global Vectors (GloVe) model. For example, the Word2vec model may create a vector representation of each word of dimension (1, 300) in one paragraph and then take the mean to create a vector of dimension (1, 300) which represents the paragraph. The query vector module 110B may create vectors for the user input query 104B. Similar to the paragraph vector module 110A, the query vector module 110B may create the vectors using the Word2vec model or the GloVe model.
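The mean-of-word-vectors step described above can be sketched as follows. The toy 3-dimensional embeddings below are hypothetical stand-ins for a trained Word2vec or GloVe model, which would map each word to a 300-dimensional vector:

```python
import numpy as np

# Hypothetical toy embeddings; a trained Word2vec or GloVe model would
# instead map each vocabulary word to a 300-dimensional vector.
EMBEDDINGS = {
    "motor":   np.array([0.9, 0.1, 0.0]),
    "torque":  np.array([0.8, 0.2, 0.1]),
    "cell":    np.array([0.1, 0.9, 0.2]),
    "protein": np.array([0.0, 0.8, 0.3]),
}

def paragraph_vector(paragraph: str) -> np.ndarray:
    """Represent a paragraph as the mean of its known word vectors."""
    known = [EMBEDDINGS[w] for w in paragraph.lower().split() if w in EMBEDDINGS]
    if not known:
        return np.zeros(3)  # fallback for paragraphs with no known words
    return np.mean(known, axis=0)

vec = paragraph_vector("The motor torque")
```

The same routine can serve both the paragraph vector module 110A and the query vector module 110B, since both produce a single mean vector from input text.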
[016] The clustering module 112 may be configured to cluster the plurality of vectors to generate one or more clusters of vectors. The clustering may be performed based on a clustering criterion. In some embodiments, the clustering criterion may be based on content similarity or context similarity. Further, the clustering criterion may be based on at least one of a hierarchical clustering or a K-Means clustering. Further in some embodiments, the clustering module 112 may perform a silhouette analysis to determine a number of clusters. This is further explained in detail in conjunction with FIG. 3A-3C.
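The K-Means clustering with silhouette analysis described above can be illustrated with scikit-learn (an assumed implementation choice; the disclosure does not name a library). The synthetic vectors below stand in for real paragraph vectors drawn from two topics:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Synthetic paragraph vectors drawn around two topic centres,
# standing in for real Word2vec/GloVe paragraph vectors.
vectors = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(10, 3)),
    rng.normal(loc=1.0, scale=0.1, size=(10, 3)),
])

# Silhouette analysis: try several cluster counts and keep the one
# with the highest mean silhouette score.
best_k, best_score = None, -1.0
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(vectors)
    score = silhouette_score(vectors, labels)
    if score > best_score:
        best_k, best_score = k, score
```

With two well-separated topic centres, the silhouette score peaks at two clusters, which is how the module can determine the number of clusters without it being supplied in advance.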
[017] The cluster identification module 114 may identify a set of clusters from the one or more clusters, based on the user input query. For example, the cluster identification module 114 may identify the set of clusters which are similar to the input query. The sentence identification module 116 may identify relevant sentences from the set of clusters, based on the user input query. For example, the sentence identification module 116 may identify the relevant sentences from the set of clusters based on similarity of the sentences with the input query. In some embodiments, the sentence identification module 116 may perform a cosine similarity analysis of the vectors of the user input query 104B with the mean vector of the one or more paragraphs of each of the plurality of clusters. Based on the cosine similarity (i.e., matching the cosine similarity), the output module 118 may generate the summarized output text.
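The cluster identification step can be sketched as a cosine-similarity comparison between the query vector and each cluster's mean vector. The cluster names and 2-dimensional vectors below are hypothetical, chosen only to make the example small:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical mean vectors of two clusters, and a query vector such as
# one created by the query vector module 110B from the user input query.
cluster_means = {
    "mechanical": np.array([0.9, 0.1]),
    "biomedical": np.array([0.1, 0.9]),
}
query_vec = np.array([0.2, 0.8])

# Pick the cluster whose mean vector is most similar to the query.
best_cluster = max(cluster_means,
                   key=lambda c: cosine_similarity(query_vec, cluster_means[c]))
```

A similarity threshold, rather than a single maximum, could equally be used to retain a set of clusters, as the disclosure describes.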
[018] Referring now to FIG. 2, a flowchart of a method 200 for dynamic summarization of an input text is illustrated, in accordance with an embodiment of the present disclosure. As mentioned earlier, a user may feed the user input query 104B to the input text module 106 in the form of words or sentences based on which the text is to be summarized. The user input query 104B may be any natural language words or sentences that are fed by the user. Further, a plurality of text documents 104A may be fed to the input text module 106. As such, at step 202, input text in the form of the plurality of documents 104A and the user input query 104B may be received. At step 204, the input text may be segregated into a plurality of paragraphs. The plurality of paragraphs may be stored in an object database. It may be noted that in order to segregate the input text into the plurality of paragraphs, one or more techniques may be used. For example, the plurality of paragraphs may be identified by image analysis to distinguish the paragraphs from one another.
[019] At step 206, a plurality of vectors may be created corresponding to the plurality of paragraphs. For example, the plurality of vectors may be created corresponding to the plurality of paragraphs using a Word2vec model or a Global Vectors (GloVe) model. At step 208, the plurality of vectors may be clustered to generate one or more clusters of vectors. The clustering may be performed based on a clustering criterion. In some embodiments, the clustering criterion may be based on content similarity or context similarity. For example, the clustering criterion may be based on a hierarchical clustering or a K-Means clustering. Further, in some embodiments, at step 208, a silhouette analysis may be performed to determine a number of clusters. The clustering is further explained in conjunction with FIG. 3A.
[020] Further, a set of clusters may be identified from the one or more clusters, based on the user input query. At step 210, relevant sentences may be identified from the set of clusters, based on the user input query. In some embodiments, the relevant sentences may be identified from the set of clusters by matching cosine similarity between all the vectors of the cluster vectors with the input query received from an end-user. As a result of this matching, content of the cluster vector with maximum cosine similarity may be obtained and displayed. This is further explained in detail in conjunction with FIG. 3B.
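The sentence-selection step of 210 can be sketched as ranking candidate sentences by cosine similarity to the query vector. The sentences and 3-dimensional vectors below are hypothetical; in the disclosed method both would come from the same Word2vec/GloVe model:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical sentence vectors from the matched clusters.
sentences = {
    "The stent expands inside the artery.": np.array([0.1, 0.9, 0.1]),
    "The gearbox ratio controls torque.":   np.array([0.9, 0.1, 0.1]),
}
query_vec = np.array([0.0, 1.0, 0.2])  # e.g. a biomedical query

# Rank sentences by cosine similarity to the query; the top-ranked
# sentences form the dynamic, query-specific summary.
ranked = sorted(sentences,
                key=lambda s: cosine_similarity(sentences[s], query_vec),
                reverse=True)
top_sentence = ranked[0]
```

Because the ranking depends entirely on the query vector, a different query yields a different summary from the same documents, which is the "dynamic" behaviour noted in paragraph [021].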
[021] At step 212, a text summary may be generated using the relevant sentences. It may be noted that since the summarization of the input text is generated based on the user input query, a different summarization of the input text may be generated for every different user input query. This is further explained in detail in conjunction with FIG. 3C.
[022] Referring now to FIG. 3A, a process 300A of clustering (a plurality of vectors to generate one or more clusters of vectors) is illustrated, in accordance with an embodiment of the present disclosure. By way of an example, an input text 302 may be fed into the text summarization device 102. The clustering module 112 may generate clusters 304A, 304B, 304C, 304D from the input text 302. As mentioned above, the clustering may be performed based on a hierarchical clustering or a K-Means clustering. For example, the input text 302 may include multiple topics from various domains, such as mechanical, electrical, civil, bioengineering, etc. Each of the plurality of clusters may include one or more paragraphs of similar content.
[023] Referring now to FIG. 3B, a process 300B of identifying relevant sentences from the set of clusters is illustrated, in accordance with an embodiment of the present disclosure. For example, the clusters 304A, 304B, 304C, 304D may be fed into the sentence identification module 116. The sentence identification module 116 may apply the cosine similarity between all the vectors of the cluster vectors and the input query 306 (received from an end-user). As a result of this matching, content 308 comprising relevant sentences with maximum cosine similarity may be obtained and displayed. As shown in FIG. 3B, the content 308 may contain one or more relevant sentences. For example, when the user inputs an input query related to biomedical engineering (BME), content 308 (i.e., relevant sentences) related to BME is identified.
[024] Referring now to FIG. 3C, a process 300C of generating a text summary using the relevant sentences is illustrated, in accordance with an embodiment of the present disclosure. For example, the output module 118 of the text summarization device 102 may generate a text summary 310 using the relevant sentences 308 (i.e., the content 308). It may be noted that the text summary 310 may be generated based on the input query 306 using the cosine similarity. For example, when the user inputs an input query related to BME, then a cosine similarity matching may be performed between the vectors of the one or more paragraphs of each of the plurality of clusters and the vectors of the user input query 306. Upon matching the content related to BME from one of the plurality of clusters, the summarized text may be generated (as shown in FIG. 3C).
[025] The present disclosure discusses various techniques for semantic based dynamic summarization of the input text based on the input keyword using the concept of clustering. The techniques employ hierarchical clustering/K-means algorithm for clustering one or more paragraphs. Further, the techniques provide topic wise segregation of the paragraphs of the input text. The techniques provide a unique solution of dynamic summarization of an input text by performing cosine similarity between all the vectors of the cluster vectors with the input topic by the end-user, thereby reducing difficulty and cost in retrieving relevant content from the plurality of documents.
[026] It is intended that the disclosure and examples be considered as exemplary only, with a true scope and spirit of disclosed embodiments being indicated by the following claims.

Documents

Application Documents

# Name Date
1 202041040119-STATEMENT OF UNDERTAKING (FORM 3) [16-09-2020(online)].pdf 2020-09-16
2 202041040119-PROVISIONAL SPECIFICATION [16-09-2020(online)].pdf 2020-09-16
3 202041040119-FORM 1 [16-09-2020(online)].pdf 2020-09-16
4 202041040119-DRAWINGS [16-09-2020(online)].pdf 2020-09-16
5 202041040119-DECLARATION OF INVENTORSHIP (FORM 5) [16-09-2020(online)].pdf 2020-09-16
6 202041040119-Proof of Right [04-03-2021(online)].pdf 2021-03-04
7 202041040119-DRAWING [16-09-2021(online)].pdf 2021-09-16
8 202041040119-Covering Letter [16-09-2021(online)].pdf 2021-09-16
9 202041040119-COMPLETE SPECIFICATION [16-09-2021(online)].pdf 2021-09-16
10 202041040119-Correspondence-14-12-2021.pdf 2021-12-14
11 202041040119-Correspondence_Request Form Mail Id Update_30-06-2022.pdf 2022-06-30
12 202041040119-Form18_Examination Request_29-08-2022.pdf 2022-08-29
13 202041040119-Correspondence_Form18_29-08-2022.pdf 2022-08-29
14 202041040119-FORM-26 [07-09-2022(online)].pdf 2022-09-07
15 202041040119-Covering Letter [09-05-2023(online)].pdf 2023-05-09
16 202041040119-FER.pdf 2023-11-21
17 202041040119-PETITION UNDER RULE 137 [21-05-2024(online)].pdf 2024-05-21
18 202041040119-OTHERS [21-05-2024(online)].pdf 2024-05-21
19 202041040119-FER_SER_REPLY [21-05-2024(online)].pdf 2024-05-21
20 202041040119-DRAWING [21-05-2024(online)].pdf 2024-05-21
21 202041040119-COMPLETE SPECIFICATION [21-05-2024(online)].pdf 2024-05-21
22 202041040119-CLAIMS [21-05-2024(online)].pdf 2024-05-21
23 202041040119-US(14)-HearingNotice-(HearingDate-16-07-2024).pdf 2024-06-27
24 202041040119-FORM-26 [02-07-2024(online)].pdf 2024-07-02
25 202041040119-Correspondence to notify the Controller [02-07-2024(online)].pdf 2024-07-02
26 202041040119-Written submissions and relevant documents [23-07-2024(online)].pdf 2024-07-23
27 202041040119-PETITION UNDER RULE 137 [23-07-2024(online)].pdf 2024-07-23
28 202041040119-FORM-26 [23-07-2024(online)].pdf 2024-07-23
29 202041040119-FORM 13 [23-07-2024(online)].pdf 2024-07-23
30 202041040119-PatentCertificate27-03-2025.pdf 2025-03-27
31 202041040119-IntimationOfGrant27-03-2025.pdf 2025-03-27

Search Strategy

1 Search040119E_12-10-2023.pdf

ERegister / Renewals

3rd: 04 Jun 2025

From 16/09/2022 - To 16/09/2023

4th: 04 Jun 2025

From 16/09/2023 - To 16/09/2024

5th: 04 Jun 2025

From 16/09/2024 - To 16/09/2025

6th: 04 Jun 2025

From 16/09/2025 - To 16/09/2026

7th: 04 Jun 2025

From 16/09/2026 - To 16/09/2027

8th: 04 Jun 2025

From 16/09/2027 - To 16/09/2028

9th: 04 Jun 2025

From 16/09/2028 - To 16/09/2029

10th: 04 Jun 2025

From 16/09/2029 - To 16/09/2030