Entity Resolution From Documents

Abstract: The present subject matter relates to entity resolution and, in particular, to resolving entities from a plurality of documents. The method comprises obtaining the plurality of documents from at least one data source. The documents are blocked into at least one bucket based on textual similarity and inter-document references among the documents. Within each bucket, a merged document may be created for each entity using an iterative match-merge technique, which identifies at least one matching pair of documents and merges each matching pair. The merged documents pertaining to each entity may then be consolidated, based on a graph clustering technique, to generate a resolved entity-document for each entity.
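
The final consolidation step can be illustrated with a minimal sketch. The listing does not say which graph clustering technique is used, so the code below substitutes plain connected components (via union-find) and assumes that two merged documents belong to the same entity when they share at least one source document. The function name `consolidate` and the representation of a merged document as a set of source-document ids are hypothetical.

```python
from collections import defaultdict


def consolidate(merged_docs):
    """Group merged documents that share a source document.

    merged_docs: list of sets of source-document ids, one set per merged
    document (possibly coming from different buckets).
    Returns a list of sets, one per resolved entity.
    Assumption: connected components stand in for the unspecified
    graph clustering technique.
    """
    parent = list(range(len(merged_docs)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    # Link every merged document to the first one that used the same source id.
    first_seen = {}
    for idx, doc_ids in enumerate(merged_docs):
        for doc_id in doc_ids:
            if doc_id in first_seen:
                union(first_seen[doc_id], idx)
            else:
                first_seen[doc_id] = idx

    # Collect the source ids of each connected component.
    clusters = defaultdict(set)
    for idx, doc_ids in enumerate(merged_docs):
        clusters[find(idx)] |= doc_ids
    return list(clusters.values())


entities = consolidate([{"d1", "d2"}, {"d2", "d3"}, {"d4"}, {"d5", "d3"}])
```

Here the first, second, and fourth merged documents are chained together through `d2` and `d3`, so they collapse into one resolved entity, while `d4` stays on its own.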

Patent Information

Application #:
Filing Date: 17 January 2014
Publication Number: 35/2015
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Email: iprdel@lakshmisri.com
Parent Application:

Applicants

TATA CONSULTANCY SERVICES LIMITED
Nirmal Building, 9th Floor, Nariman Point, Mumbai, Maharashtra 400021

Inventors

1. AGARWAL, Puneet
154B Block A, Sector 63 Noida
2. SHROFF, Gautam
154B Block A, Sector 63 Noida
3. MALHOTRA, Pankaj
154B Block A, Sector 63 Noida

Specification

CLAIMS:
1. A method for resolving entities from a plurality of documents, the method comprising:
obtaining, by a processor (110), the plurality of documents corresponding to a plurality of entities, from at least one data source;
blocking, by the processor (110), the plurality of documents into at least one bucket based on textual similarity and inter-document references among the plurality of documents;
creating, by the processor (110), within each bucket, a merged document for each entity based on an iterative match-merge technique, wherein the iterative match-merge technique identifies, from the plurality of documents, at least one matching pair of documents and merges the at least one matching pair of documents to create the merged document for each entity; and
generating, by the processor (110), a resolved entity-document for each entity by consolidating the merged documents pertaining to each entity from each bucket based on a graph-clustering technique.
2. The method as claimed in claim 1 further comprising updating a resolved entity-document collection upon receiving a new set of documents, wherein the updating is performed based on the textual similarity and inter-document references among the new set of documents and the resolved entity-documents.
3. The method as claimed in claim 1, wherein the at least one matching pair of documents is identified based on the textual similarity and the inter-document references among the plurality of documents.
4. The method as claimed in claim 1, wherein the textually similar documents are hashed using a Locality Sensitive Hashing (LSH) technique.
5. The method as claimed in claim 1, wherein the inter-document references among the plurality of documents are determined using a document traversal technique.
6. The method as claimed in claim 1, wherein the merged document for each entity is created using an R-Swoosh technique.
7. An entity resolution system (102) for entity resolution from a plurality of documents, the entity resolution system (102) comprising:
a processor (110);
a blocking module (120), coupled to the processor (110), to,
obtain the plurality of documents corresponding to a plurality of entities, from at least one data source; and
block the plurality of documents into at least one bucket based on textual similarity and inter-document references among the plurality of documents;
and
a merging module (122), coupled to the processor (110), to,
create, within each bucket, a merged document for each entity based on an iterative match-merge technique, wherein the iterative match-merge technique identifies, from the plurality of documents, at least one matching pair of documents and merges the at least one matching pair of documents to create the merged document for each entity; and
generate a resolved entity-document for each entity by consolidating the merged documents pertaining to each entity from each bucket based on a graph-clustering technique.
8. The entity resolution system (102) as claimed in claim 7 further comprising an updating module (124), coupled to the processor (110), to update a resolved entity-document collection upon receiving a new set of documents, wherein the updating is performed based on the textual similarity and inter-document references among the new set of documents and the resolved entity-documents.
9. The entity resolution system (102) as claimed in claim 7, wherein the at least one matching pair of documents is identified based on the textual similarity and the inter-document references among the plurality of documents.
10. The entity resolution system (102) as claimed in claim 7, wherein the blocking module (120) hashes the textually similar documents using a Locality Sensitive Hashing (LSH) technique.
11. The entity resolution system (102) as claimed in claim 7, wherein the blocking module (120) determines the inter-document references among the plurality of documents using a document traversal technique.
12. The entity resolution system (102) as claimed in claim 7, wherein the merging module (122) creates the merged document for each entity using an R-Swoosh technique.
13. A non-transitory computer-readable medium having embodied thereon a computer program for executing a method comprising:
obtaining, by a processor (110), the plurality of documents corresponding to a plurality of entities, from at least one data source;
blocking, by the processor (110), the plurality of documents into at least one bucket based on textual similarity and inter-document references among the plurality of documents;
creating, by the processor (110), within each bucket, a merged document for each entity based on an iterative match-merge technique, wherein the iterative match-merge technique identifies, from the plurality of documents, at least one matching pair of documents and merges the at least one matching pair of documents to create the merged document for each entity; and
generating, by the processor (110), a resolved entity-document for each entity by consolidating the merged documents pertaining to each entity from each bucket based on a graph-clustering technique.
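
The claims name two known techniques, Locality Sensitive Hashing for blocking and R-Swoosh for match-merge. The sketch below shows one way these could fit together; it is not taken from the specification. It assumes MinHash signatures with banding as the LSH scheme, character shingles as the text features, records represented as dictionaries of value sets, and a match rule that fires when two records share an `email` or `phone` value. All function names, parameters, and sample data are hypothetical, and blocking on inter-document references is not shown.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=3):
    """Character k-shingles of a lower-cased text."""
    text = text.lower()
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def minhash_signature(items, num_hashes=32):
    """One minimum per seeded hash function (a standard MinHash signature)."""
    return [
        min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest(),
                "big",
            )
            for item in items
        )
        for seed in range(num_hashes)
    ]


def lsh_buckets(docs, num_hashes=32, bands=8):
    """Block documents: two documents share a bucket if any band of their
    signatures is identical. docs maps a document id to its text."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for doc_id, text in docs.items():
        sig = minhash_signature(shingles(text), num_hashes)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(doc_id)
    # Buckets with a single document give nothing to match against.
    return [ids for ids in buckets.values() if len(ids) > 1]


def match(a, b, keys=("email", "phone")):
    """Assumed match rule: the records share a value for any key attribute."""
    return any(a.get(k, set()) & b.get(k, set()) for k in keys)


def merge(a, b):
    """Merge two records by taking the union of their values per attribute."""
    return {k: a.get(k, set()) | b.get(k, set()) for k in a.keys() | b.keys()}


def r_swoosh(records):
    """R-Swoosh: compare each pending record against the resolved set; on a
    match, merge the pair and put the result back into the pending list."""
    pending = list(records)
    resolved = []
    while pending:
        current = pending.pop()
        buddy = next((r for r in resolved if match(current, r)), None)
        if buddy is None:
            resolved.append(current)
        else:
            resolved.remove(buddy)
            pending.append(merge(current, buddy))
    return resolved


records = [
    {"name": {"A. Rao"}, "email": {"rao@example.org"}},
    {"name": {"Anita Rao"}, "email": {"rao@example.org"}, "phone": {"555-0100"}},
    {"name": {"Rao, A."}, "phone": {"555-0100"}},
    {"name": {"B. Sen"}, "email": {"sen@example.org"}},
]
resolved = r_swoosh(records)
```

In this example the first and third records never match each other directly; they end up in the same merged record only because the second record links them, which is the situation the iterative re-insertion in R-Swoosh is meant to handle. In a full pipeline, `r_swoosh` would be run separately on the records of each bucket returned by `lsh_buckets`.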

Documents

Application Documents

# Name Date
1 SPEC IN.pdf 2018-08-11
2 PD011515IN-SC_Request for Priority Documents-PCT.pdf 2018-08-11
3 FORM 5.pdf 2018-08-11
4 FORM 3.pdf 2018-08-11
5 FIG IN.pdf 2018-08-11
6 ABSTRACT1.jpg 2018-08-11
7 169-MUM-2014-Power of Attorney-200115.pdf 2018-08-11
8 169-MUM-2014-FORM 1(6-3-2014).pdf 2018-08-11
9 169-MUM-2014-Correspondence-200115.pdf 2018-08-11
10 169-MUM-2014-CORRESPONDENCE(6-3-2014).pdf 2018-08-11
11 169-MUM-2014-FER.pdf 2019-11-26
12 169-MUM-2014-FORM 3 [22-04-2020(online)].pdf 2020-04-22
13 169-MUM-2014-OTHERS [26-05-2020(online)].pdf 2020-05-26
14 169-MUM-2014-FER_SER_REPLY [26-05-2020(online)].pdf 2020-05-26
15 169-MUM-2014-CLAIMS [26-05-2020(online)].pdf 2020-05-26

Search Strategy

1 169MUM2014SearchStrategy_14-10-2019.pdf