Abstract: The present subject matter relates to entity resolution and, in particular, to resolving entities from a plurality of documents. The method comprises obtaining the plurality of documents from at least one data source. The plurality of documents is blocked into at least one bucket based on textual similarity and inter-document references among the documents. Further, within each bucket, a merged document for each entity may be created based on an iterative match-merge technique. The iterative match-merge technique identifies, from the plurality of documents, at least one matching pair of documents and merges the matching pair to create the merged document for each entity. The merged documents pertaining to each entity may then be consolidated across buckets, based on a graph clustering technique, to generate a resolved entity-document for each entity.
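The blocking step described above can be illustrated with a minimal sketch. It assumes MinHash signatures with LSH banding for textual similarity and one shared bucket per explicit reference between two documents; the function names, the `text` and `refs` fields, and the band and row counts are hypothetical and are not taken from the specification.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=3):
    """Return the set of word k-grams (shingles) of a text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(items, num_hashes=16):
    """MinHash signature: for each seeded hash, keep the minimum over all items."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int(hashlib.md5(f"{seed}:{item}".encode()).hexdigest(), 16)
            for item in items
        ))
    return signature


def block_documents(docs, bands=8, rows=2):
    """Assign document ids to buckets (a document may fall in several).

    docs: dict mapping doc id -> {"text": str, "refs": list of doc ids}
    Buckets come from two sources: LSH bands over the MinHash signature
    (textual similarity) and explicit references between documents.
    """
    buckets = defaultdict(set)
    for doc_id, doc in docs.items():
        sig = minhash_signature(shingles(doc["text"]), bands * rows)
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[("lsh", b, band)].add(doc_id)
        # A document shares a bucket with every document it references.
        for ref in doc["refs"]:
            if ref in docs:
                key = ("ref", min(doc_id, ref), max(doc_id, ref))
                buckets[key].update({doc_id, ref})
    # A bucket holding a single document cannot yield a match; drop it.
    return {key: ids for key, ids in buckets.items() if len(ids) > 1}
```

Documents with identical text always collide in every band, while near-duplicates collide with a probability that grows with their shingle overlap. Documents that are textually dissimilar but linked by a reference still end up in a common bucket, which is the reason for combining the two signals.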
CLAIMS:1. A method for resolving entities from a plurality of documents, the method comprising:
obtaining, by a processor (110), the plurality of documents corresponding to a plurality of entities, from at least one data source;
blocking, by the processor (110), the plurality of documents into at least one bucket based on textual similarity and inter-document references among the plurality of documents;
creating, by the processor (110), within each bucket, a merged document for each entity based on an iterative match-merge technique, wherein the iterative match-merge technique identifies, from the plurality of documents, at least one matching pair of documents and merges the at least one matching pair of documents to create the merged document for each entity; and
generating, by the processor (110), a resolved entity-document for each entity by consolidating the merged documents pertaining to each entity from each bucket based on a graph-clustering technique.
2. The method as claimed in claim 1 further comprising updating a resolved entity-document collection upon receiving a new set of documents, wherein the updating is performed based on the textual similarity and inter-document references among the new set of documents and the resolved entity-documents.
3. The method as claimed in claim 1, wherein the at least one matching pair of documents is identified based on the textual similarity and the inter-document references among the plurality of documents.
4. The method as claimed in claim 1, wherein the textually similar documents are hashed using a Locality Sensitive Hashing (LSH) technique.
5. The method as claimed in claim 1, wherein the inter-document references among the plurality of documents are determined using a document traversal technique.
6. The method as claimed in claim 1, wherein the merged document for each entity is created using an R-Swoosh technique.
7. An entity resolution system (102) for entity resolution from a plurality of documents, the entity resolution system (102) comprising:
a processor (110);
a blocking module (120), coupled to the processor (110), to,
obtain the plurality of documents corresponding to a plurality of entities, from at least one data source; and
block the plurality of documents into at least one bucket based on textual similarity and inter-document references among the plurality of documents;
and
a merging module (122), coupled to the processor (110), to,
create, within each bucket, a merged document for each entity based on an iterative match-merge technique, wherein the iterative match-merge technique identifies, from the plurality of documents, at least one matching pair of documents and merges the at least one matching pair of documents to create the merged document for each entity; and
generate a resolved entity-document for each entity by consolidating the merged documents pertaining to each entity from each bucket based on a graph-clustering technique.
8. The entity resolution system (102) as claimed in claim 7 further comprising an updating module (124), coupled to the processor (110), to update a resolved entity-document collection upon receiving a new set of documents, wherein the updating is performed based on the textual similarity and inter-document references among the new set of documents and the resolved entity-documents.
9. The entity resolution system (102) as claimed in claim 7, wherein the at least one matching pair of documents is identified based on the textual similarity and the inter-document references among the plurality of documents.
10. The entity resolution system (102) as claimed in claim 7, wherein the blocking module (120) hashes the textually similar documents using a Locality Sensitive Hashing (LSH) technique.
11. The entity resolution system (102) as claimed in claim 7, wherein the blocking module (120) determines the inter-document references among the plurality of documents using a document traversal technique.
12. The entity resolution system (102) as claimed in claim 7, wherein the merging module (122) creates the merged document for each entity using an R-Swoosh technique.
13. A non-transitory computer-readable medium having embodied thereon a computer program for executing a method comprising:
obtaining, by a processor (110), the plurality of documents corresponding to a plurality of entities, from at least one data source;
blocking, by the processor (110), the plurality of documents into at least one bucket based on textual similarity and inter-document references among the plurality of documents;
creating, by the processor (110), within each bucket, a merged document for each entity based on an iterative match-merge technique, wherein the iterative match-merge technique identifies, from the plurality of documents, at least one matching pair of documents and merges the at least one matching pair of documents to create the merged document for each entity; and
generating, by the processor (110), a resolved entity-document for each entity by consolidating the merged documents pertaining to each entity from each bucket based on a graph-clustering technique.
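The match-merge and consolidation steps recited in the claims can be sketched as follows. The loop follows the general shape of R-Swoosh (compare each pending record against the resolved set; on a match, merge and re-queue the result). The match rule, the merge rule, the record layout, and the use of connected components as the graph-clustering step are all assumptions made for illustration; the claims do not specify them.

```python
def match(r1, r2):
    """Hypothetical match rule: records match if they share a value for any attribute."""
    return any(r1["attrs"].get(k, set()) & vals for k, vals in r2["attrs"].items())


def merge(r1, r2):
    """Hypothetical merge rule: union of source ids and of attribute values."""
    keys = set(r1["attrs"]) | set(r2["attrs"])
    return {
        "ids": r1["ids"] | r2["ids"],
        "attrs": {k: r1["attrs"].get(k, set()) | r2["attrs"].get(k, set()) for k in keys},
    }


def r_swoosh(records):
    """Iterative match-merge within one bucket, in the style of R-Swoosh."""
    pending = list(records)
    resolved = []
    while pending:
        current = pending.pop()
        partner = next((r for r in resolved if match(current, r)), None)
        if partner is None:
            resolved.append(current)
        else:
            # Re-queue the merged record: it may now match other records.
            resolved.remove(partner)
            pending.append(merge(current, partner))
    return resolved


def consolidate(merged_per_bucket):
    """Group source ids across buckets by connected components (union-find).

    Two merged documents are linked when they share a source document id.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for merged in merged_per_bucket:
        for rec in merged:
            ids = sorted(rec["ids"])
            find(ids[0])  # register singletons as well
            for other in ids[1:]:
                parent[find(other)] = find(ids[0])
    clusters = {}
    for doc_id in parent:
        clusters.setdefault(find(doc_id), set()).add(doc_id)
    return sorted(sorted(c) for c in clusters.values())
```

Because blocking can place one document in several buckets, the same source id may appear in merged documents from different buckets. Treating shared ids as graph edges lets the consolidation step join those partial results into one resolved entity per component.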
| # | Name | Date |
|---|---|---|
| 1 | SPEC IN.pdf | 2018-08-11 |
| 2 | PD011515IN-SC_Request for Priority Documents-PCT.pdf | 2018-08-11 |
| 3 | FORM 5.pdf | 2018-08-11 |
| 4 | FORM 3.pdf | 2018-08-11 |
| 5 | FIG IN.pdf | 2018-08-11 |
| 6 | ABSTRACT1.jpg | 2018-08-11 |
| 7 | 169-MUM-2014-Power of Attorney-200115.pdf | 2018-08-11 |
| 8 | 169-MUM-2014-FORM 1(6-3-2014).pdf | 2018-08-11 |
| 9 | 169-MUM-2014-Correspondence-200115.pdf | 2018-08-11 |
| 10 | 169-MUM-2014-CORRESPONDENCE(6-3-2014).pdf | 2018-08-11 |
| 11 | 169-MUM-2014-FER.pdf | 2019-11-26 |
| 12 | 169-MUM-2014-FORM 3 [22-04-2020(online)].pdf | 2020-04-22 |
| 13 | 169-MUM-2014-OTHERS [26-05-2020(online)].pdf | 2020-05-26 |
| 14 | 169-MUM-2014-FER_SER_REPLY [26-05-2020(online)].pdf | 2020-05-26 |
| 15 | 169-MUM-2014-CLAIMS [26-05-2020(online)].pdf | 2020-05-26 |
| 1 | 169MUM2014SearchStrategy_14-10-2019.pdf | |