System And Method For Processing Of Contract Documents


Abstract: The present disclosure provides a method for processing of contract documents and a system for rendering searchable a database comprising a set of contract documents utilizing the said method. The method comprises providing a Named Entity Recognition (NER) model configured for recognizing entities in text of contract documents; receiving a given contract document; extracting a set of entities from the text of the given contract document by implementing the NER model; and associating the extracted set of entities with the given contract document. The system comprises an interface configured to receive one or more keywords for searching of a desired contract document in the set of contract documents in the database; and a processing arrangement configured to compare and identify one or more contract documents based on match between the received one or more keywords and corresponding associated one or more entities in the sets of entities. FIG. 4


Patent Information

Application #

Filing Date

26 September 2023

Publication Number

13/2025

Publication Type

INA

Invention Field

ELECTRONICS

Status

Parent Application

Applicants

QUOQO TECHNOLOGIES PRIVATE LIMITED

A-307 Brigade Omega, Banashankari VI Stage, Bangalore 560 062 India

Inventors

1. Chetan Nagendra

C/o Quoqo Technologies (P) Ltd., A-307 Brigade Omega, Banashankari VI Stage, Bangalore 560 062

2. Gurunath Gandikota

C/o Quoqo Technologies (P) Ltd., A-307 Brigade Omega, Banashankari VI Stage, Bangalore 560 062

Specification

Description: SYSTEM AND METHOD FOR PROCESSING OF CONTRACT DOCUMENTS

FIELD OF THE PRESENT DISCLOSURE
[0001] The present disclosure generally relates to processing of contract documents, and particularly to a method for extracting a set of entities for a given contract document, and a system for rendering searchable a plurality of contract documents based on associated entities therewith.

BACKGROUND
[0002] Companies increasingly hold a huge corpus of historical legal documents that they need to store, maintain, and act upon until the documents expire. If these documents are simply dumped into a data store, referring back to them when required takes a great amount of effort. Particularly, if the documents are still active, it is important to be able to refer to them from time to time to address customer queries or deal with legal proceedings. For instance, most modern enterprises have a large number of contracts in force at any given time. A contract document defines the scope of obligations and benefits with regard to the external and internal parties involved. For example, a non-disclosure agreement (NDA) is a type of binding contract document between two or more parties that prevents sensitive information from being shared with others. Enterprises may regularly add new NDA contracts for each new business deal, for example, with a customer, a contractor, a vendor, or the like. Contract document review may be described as a process of reviewing the content of documents to identify information relevant to one or more topics. For example, NDAs typically have many binding clauses which need to be analysed. Such contract document review is typically performed in order to understand contractual obligations, navigate client or customer relationships, and understand compliance risk.
[0003] The process of sifting through the documents and finding the right information can potentially be automated. Putting in place an efficient data management system with obligation management functionalities requires concepts to be identified in each document, so that when a user needs information about a particular contract signed with a particular party, that information is readily accessible. As another use case, in traditional organizations where legal professionals stayed with a company for many years, historical information regarding engagements with certain parties of interest used to reside in their memory. With changing work cultures and increased employee churn, continuity of information is difficult to establish, and historical information about a particular engagement is lost.
[0004] In such scenarios, having a smart repository for storing and easily accessing contracts is of paramount importance. These smart repositories should possess features such as data storage, multi-format file management, searchability across documents using multiple criteria, and obligation management, among others. Advanced machine learning algorithms are necessary to handle some of these tasks, enabling organizations to streamline their legal document management and improve overall efficiency. However, currently there are no efficient tools that may streamline such processes.
[0005] Therefore, in light of the foregoing discussion, there exists a need to overcome problems associated with the prior-art, and provide efficient techniques for extracting a set of entities for a given contract document, and for rendering searchable a plurality of contract documents based on associated entities therewith.

SUMMARY
[0006] The present disclosure aims to provide a solution for organizing, indexing, and searching legal documents, leveraging advanced machine learning techniques to automate the process of identifying and extracting relevant information. The present disclosure supports better decision-making for enterprises, ensuring that critical knowledge is preserved and readily available when needed. In particular, the present disclosure provides a repository designed to maintain, store, and render searchable historical documents, with a primary focus on intelligent contract management. Such repository enables users to query information from the stored documents, streamlining the process of accessing relevant data.
[0007] In this context, when multiple documents are uploaded to a smart repository, they must be tagged with relevant metadata. Critical tags for a document may include the name of the document, contract type, effective date, term of the contract, renewal date, jurisdiction, and parties involved, among others. This metadata serves as the foundation for search engines to identify and retrieve specific documents. However, manually assigning tags can become a tedious process when dealing with hundreds or thousands of documents uploaded to a repository. The present disclosure proposes an automatic tag extraction algorithm that leverages machine learning models to efficiently extract tags from multiple documents uploaded by a user to a repository. By employing such techniques, the present disclosure is able to significantly reduce the time and effort involved in tagging documents, ultimately improving the overall efficiency of the repository's document management capabilities. This solution not only streamlines the process of organizing and indexing documents but also ensures that critical information is easily accessible and searchable when needed.
[0008] In an aspect, the present disclosure provides a method for processing of contract documents. The method comprises providing a Named Entity Recognition (NER) model trained on a training corpus of a plurality of contract documents and configured for recognizing entities in text of contract documents. The method further comprises receiving a given contract document. The method further comprises pre-processing text of the given contract document as per requirements of the NER model. The method further comprises extracting a set of entities from the pre-processed text of the given contract document by implementing the NER model. The method further comprises associating the extracted set of entities with the given contract document.
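The method steps recited above (receive, pre-process, extract, associate) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `ContractDocument` class, the `preprocess` function, and in particular `toy_ner_model`, which is a regex-based stand-in and not the trained NER model of the disclosure.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Entity = Tuple[str, str]  # (entity type, entity text)

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")


@dataclass
class ContractDocument:
    """Hypothetical container for a contract and its associated entities."""
    doc_id: str
    text: str
    entities: List[Entity] = field(default_factory=list)


def preprocess(text: str) -> str:
    """Collapse runs of whitespace; a real model dictates its own requirements."""
    return re.sub(r"\s+", " ", text).strip()


def toy_ner_model(text: str) -> List[Entity]:
    """Stand-in for a trained NER model: two regex rules, for illustration only."""
    entities: List[Entity] = []
    for m in re.finditer(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b", text):
        entities.append(("EFFECTIVE_DATE", m.group(0)))
    for m in re.finditer(
            r"\b[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)* (?:Ltd|Inc|LLC)\b", text):
        entities.append(("PARTY", m.group(0)))
    return entities


def process_contract(doc: ContractDocument,
                     ner_model: Callable[[str], List[Entity]]) -> ContractDocument:
    """Pre-process the text, extract entities, and associate them with the document."""
    doc.entities = ner_model(preprocess(doc.text))
    return doc


doc = ContractDocument(
    doc_id="nda-001",
    text="This Agreement is made on 26 September 2023 between "
         "Acme Widgets Ltd and\n  Globex Inc.",
)
process_contract(doc, toy_ner_model)
```

Passing the model in as a callable keeps the pipeline independent of how the model was trained, so the regex stand-in could be swapped for any function that maps text to a list of typed entities.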
[0009] In one or more embodiments, the NER model is trained by pre-processing text in each contract document in the training corpus of the plurality of contract documents by utilizing one or more of tokenization technique, normalization technique, and encoding technique. The NER model is further trained by annotating the pre-processed text in each contract document in the training corpus of the plurality of contract documents, to generate labels representing entities within the text in each contract document in the training corpus of the plurality of contract documents. The NER model is further trained by training the NER model for extracting the entities in the contract documents by applying machine learning technique(s) to the pre-processed and annotated text from each contract document in the training corpus of the plurality of contract documents.
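As a rough illustration of the tokenization, normalization, and encoding techniques named above, the following Python sketch pre-processes a sentence into integer ids. The regex tokenizer, lowercase normalization, and the `<pad>`/`<unk>` vocabulary layout are assumptions chosen for brevity; the disclosure does not specify which concrete techniques are used.

```python
import re
from typing import Dict, Iterable, List


def tokenize(text: str) -> List[str]:
    """Split into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def normalize(tokens: Iterable[str]) -> List[str]:
    """Lowercase every token (one simple form of normalization)."""
    return [t.lower() for t in tokens]


def build_vocab(token_lists: Iterable[List[str]]) -> Dict[str, int]:
    """Assign an integer id to each distinct token, reserving 0 and 1."""
    vocab: Dict[str, int] = {"<pad>": 0, "<unk>": 1}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens: Iterable[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids; tokens not seen during training map to <unk>."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]


train_tokens = normalize(tokenize("Governed by the laws of India."))
vocab = build_vocab([train_tokens])
ids = encode(normalize(tokenize("the laws of France")), vocab)
```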
[0010] In one or more embodiments, annotating the pre-processed text comprises generating Inside-Outside-Beginning (IOB) labels for the pre-processed text in each contract document in the training corpus of the plurality of contract documents.
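For readers unfamiliar with IOB labelling, the sketch below converts character-level entity annotations into per-token tags: `B-` marks the first token of an entity, `I-` a subsequent token inside it, and `O` a token outside any entity. The whitespace tokenizer, the `iob_labels` function name, and the `PARTY` label are assumptions for illustration; tokens with punctuation attached would need a finer tokenizer than the one shown.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity label)


def iob_labels(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Tag each whitespace-delimited token with a B-, I-, or O label."""
    tagged: List[Tuple[str, str]] = []
    for m in re.finditer(r"\S+", text):
        tok_start, tok_end = m.span()
        tag = "O"
        for start, end, label in spans:
            # A token belongs to an entity when it lies fully inside the span.
            if tok_start >= start and tok_end <= end:
                tag = ("B-" if tok_start == start else "I-") + label
                break
        tagged.append((m.group(), tag))
    return tagged


text = "Agreement between Acme Widgets Ltd and Globex Inc"
spans = []
for name in ("Acme Widgets Ltd", "Globex Inc"):
    start = text.index(name)
    spans.append((start, start + len(name), "PARTY"))

tagged = iob_labels(text, spans)
```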
[0011] In one or more embodiments, the machine learning technique(s) comprises at least one of Conditional Random Fields (CRFs), Bidirectional Long Short-Term Memory (BiLSTM), Convolutional Neural Networks (CNN), Embeddings from Language Models (ElMo), Stanford NLP, and transformer-based models.
[0012] In one or more embodiments, the NER model is configured to be implemented for a group of types of entities. Herein, multiple NER models, configured for different groups of types of entities, are implemented to extract the set of entities in the pre-processed text of the given contract document.
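One way to picture several NER models, each covering a different group of entity types, is to run them over the same pre-processed text and merge the results. The sketch below assumes each model is a callable returning `(type, text)` pairs; `date_model` and `jurisdiction_model` are hypothetical regex stand-ins, not trained models.

```python
import re
from typing import Callable, List, Tuple

Entity = Tuple[str, str]  # (entity type, entity text)
NerModel = Callable[[str], List[Entity]]


def date_model(text: str) -> List[Entity]:
    """Hypothetical model for the date group of entity types."""
    return [("EFFECTIVE_DATE", m.group(0))
            for m in re.finditer(r"\b\d{1,2} [A-Z][a-z]+ \d{4}\b", text)]


def jurisdiction_model(text: str) -> List[Entity]:
    """Hypothetical model for the jurisdiction group of entity types."""
    return [("JURISDICTION", m.group(1))
            for m in re.finditer(r"laws of ([A-Z][a-z]+)", text)]


def extract_with_models(text: str, models: List[NerModel]) -> List[Entity]:
    """Run every model on the text and merge results, dropping duplicates."""
    merged: List[Entity] = []
    seen = set()
    for model in models:
        for entity in model(text):
            if entity not in seen:
                seen.add(entity)
                merged.append(entity)
    return merged


entities = extract_with_models(
    "Effective 1 January 2024, governed by the laws of India.",
    [date_model, jurisdiction_model],
)
```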
[0013] In one or more embodiments, the extracted set of entities comprises one or more of: contract parties, contract term, contract jurisdiction, contract effective date, contract type.
[0014] In another aspect, the present disclosure provides a system for searching in a database of a set of contract documents. The system comprises a processing arrangement. The processing arrangement is configured to pre-process text of each contract document in the set of contract documents as per requirements of a Named Entity Recognition (NER) model, wherein the NER model is trained on a training corpus of a plurality of contract documents and configured for recognizing entities in text of contract documents. The processing arrangement is also configured to extract a set of entities from the pre-processed text of each contract document in the set of contract documents by implementing the NER model. The processing arrangement is further configured to associate the respective extracted set of entities to corresponding one of each contract document in the set of contract documents. The system further comprises an interface configured to receive a first input comprising one or more keywords for searching of a desired contract document in the database of the set of contract documents. Herein, the processing arrangement is configured to compare the received one or more keywords with the sets of entities associated with the contract documents in the set of contract documents. The processing arrangement is further configured to identify one or more contract documents in the set of contract documents based on match between the received one or more keywords and corresponding associated one or more entities in the sets of entities. The processing arrangement is further configured to output a list of the identified one or more contract documents. Herein, the interface is further configured to display the list of the identified one or more contract documents.
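The keyword comparison described above can be sketched as a lookup over the entity sets associated with each document. The in-memory `index` dictionary and the matching rule (case-insensitive substring match, any keyword sufficing) are assumptions; the disclosure does not state how a match is determined.

```python
from typing import Dict, List, Tuple

Entity = Tuple[str, str]  # (entity type, entity text)


def search(index: Dict[str, List[Entity]], keywords: List[str]) -> List[str]:
    """Return ids of documents with an associated entity matching any keyword."""
    wanted = [k.strip().lower() for k in keywords if k.strip()]
    hits: List[str] = []
    for doc_id, entities in index.items():
        values = [value.lower() for _, value in entities]
        if any(k in v for k in wanted for v in values):
            hits.append(doc_id)
    return hits


# Hypothetical index: document id -> entities previously extracted for it.
index = {
    "nda-001": [("PARTY", "Acme Widgets Ltd"), ("JURISDICTION", "India")],
    "msa-002": [("PARTY", "Globex Inc"), ("JURISDICTION", "Singapore")],
}

results = search(index, ["acme"])
```

Because the search runs against the extracted entities and not the full text, its cost depends on the number of associated entities per document, not on document length.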
[0015] In one or more embodiments, the interface is further configured to receive a second input for selection of one of the contract documents from the displayed list of the identified one or more contract documents. Herein, the processing arrangement is further configured to generate a second output comprising the respective set of entities identified for the selected contract document, based on the received second input. The interface is further configured to display the respective set of entities corresponding to the selected contract document.
[0016] In one or more embodiments, the interface is further configured to provide a filter with a list of entities based on the entities associated with the contract documents in the set of contract documents. The interface is also configured to receive a third input for selection of one of the entities from the list of entities. Herein, the processing arrangement is further configured to filter one or more contract documents in the set of contract documents based on a match between the entity selected by the received third input and the corresponding associated one or more entities in the sets of entities. The processing arrangement is also configured to generate a third output comprising a list of the filtered one or more contract documents. The interface is further configured to display the list of the filtered one or more contract documents.
[0017] In one or more embodiments, the interface is further configured to display a portion of text of each one of the filtered one or more contract documents being displayed, wherein the said portion of text comprises at least one of the entities corresponding to the received third input.
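The entity filter and the displayed portion of text can be sketched together as follows. The exact-match filter, the `snippet` helper, and its fixed-width character window are assumptions for illustration; the disclosure does not specify how the displayed portion is chosen.

```python
from typing import Dict, List, Tuple

Entity = Tuple[str, str]  # (entity type, entity text)


def filter_by_entity(index: Dict[str, List[Entity]], selected: Entity) -> List[str]:
    """Keep only documents whose associated entities include the selected one."""
    return [doc_id for doc_id, entities in index.items() if selected in entities]


def snippet(text: str, entity_text: str, width: int = 30) -> str:
    """Return the entity with up to `width` characters of context on each side."""
    pos = text.find(entity_text)
    if pos < 0:
        return ""
    start = max(0, pos - width)
    end = min(len(text), pos + len(entity_text) + width)
    return text[start:end]


# Hypothetical data: extracted entities and raw text per document id.
index = {
    "nda-001": [("PARTY", "Acme Widgets Ltd"), ("JURISDICTION", "India")],
    "msa-002": [("PARTY", "Globex Inc"), ("JURISDICTION", "Singapore")],
}
texts = {
    "nda-001": "This Agreement shall be governed by the laws of India "
               "and is effective immediately.",
}

filtered = filter_by_entity(index, ("JURISDICTION", "India"))
previews = {doc_id: snippet(texts[doc_id], "India", width=10) for doc_id in filtered}
```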
[0018] In yet another aspect, the present disclosure provides a computer program comprising computer executable program code, when executed the program code causes a computing arrangement to perform the method as described in the preceding paragraphs.
[0019] Embodiments of the present disclosure substantially eliminate or at least partially address the aforementioned problems in the prior art, and enable an efficient and user-friendly system and method for training a machine learning model for processing of contract documents.
[0020] Additional aspects, advantages, features and objects of the present disclosure would be made apparent from the drawings and the detailed description of the illustrative embodiments construed in conjunction with the appended claims that follow.
[0021] It will be appreciated that features of the present disclosure are susceptible to being combined in various combinations without departing from the scope of the present disclosure as defined by the appended claims.

BRIEF DESCRIPTION OF THE FIGURES
[0022] The summary above, as well as the following detailed description of illustrative embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the present disclosure, exemplary constructions of the disclosure are shown in the drawings. However, the present disclosure is not limited to specific methods and instrumentalities disclosed herein. Moreover, those skilled in the art will understand that the drawings are not to scale. Wherever possible, like elements have been indicated by identical numbers.
[0023] For a more complete understanding of example embodiments of the present disclosure, reference is now made to the following descriptions taken in connection with the accompanying drawings in which:
FIG. 1 illustrates a system that may reside on and may be executed by a computer, which may be connected to a network, in accordance with one or more exemplary embodiments of the present disclosure;
FIG. 2 illustrates a diagrammatic view of a server, in accordance with one or more exemplary embodiments of the present disclosure;
FIG. 3 illustrates a diagrammatic view of a user device, in accordance with one or more exemplary embodiments of the present disclosure;
FIG. 4 illustrates an exemplary flowchart listing steps involved in a method for processing of contract documents, in accordance with one or more exemplary embodiments of the present disclosure;
FIG. 5 illustrates a schematic of a modelling pipeline for generating a Named Entity Recognition (NER) model as implemented in the method of FIG. 4, in accordance with one or more exemplary embodiments of the present disclosure;
FIG. 6 illustrates a schematic of an inference pipeline for implementing the generated NER model, as per the modelling pipeline of FIG. 5, in the method of FIG. 4, in accordance with one or more exemplary embodiments of the present disclosure;
FIG. 7 illustrates a high-level schematic of a workflow for training the NER model for processing of contract documents, in accordance with one or more exemplary embodiments of the present disclosure;
FIG. 8 illustrates a block diagram of a system for rendering searchable a database comprising a set of contract documents, in accordance with one or more exemplary embodiments of the present disclosure;
FIG. 9 illustrates an exemplary interface implemented in the system for managing the database, in accordance with one or more exemplary embodiments of the present disclosure;
FIG. 10 illustrates an exemplary interface implemented in the system for uploading contract documents in the database, in accordance with one or more exemplary embodiments of the present disclosure;
FIG. 11 illustrates an exemplary interface implemented in the system for receiving a user input for searching a desired contract document in the database and displaying, as results, a set of contract documents, in accordance with one or more exemplary embodiments of the present disclosure;
FIG. 12 illustrates an exemplary interface implemented in the system for selection of one of the contract documents from the set of contract documents, in accordance with one or more exemplary embodiments of the present disclosure;
FIG. 13 illustrates an exemplary interface implemented in the system for searching the set of contract documents, in accordance with one or more exemplary embodiments of the present disclosure; and
FIG. 14 illustrates an exemplary interface implemented in the system for displaying a portion of text of filtered one or more contract documents, in accordance with one or more exemplary embodiments of the present disclosure.
[0024] In the accompanying drawings, an underlined number is employed to represent an item over which the underlined number is positioned or an item to which the underlined number is adjacent. A non-underlined number relates to an item identified by a line linking the non-underlined number to the item. When a number is non-underlined and accompanied by an associated arrow, the non-underlined number is used to identify a general item at which the arrow is pointing.

DETAILED DESCRIPTION
[0025] In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be apparent, however, to one skilled in the art that the present disclosure is not limited to these specific details.
[0026] Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Further, the terms “a” and “an” herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced items. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not for other embodiments.
[0027] Furthermore, in the following detailed description of the present disclosure, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. However, it will be understood that the present disclosure may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the present disclosure.
[0028] Embodiments described herein may be discussed in the general context of computer-executable instructions residing on some form of computer-readable storage medium, such as program modules, executed by one or more computers or other devices. By way of example, and not limitation, computer-readable storage media may comprise non-transitory computer-readable storage media and communication media; non-transitory computer-readable media include all computer-readable media except for a transitory, propagating signal. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or distributed as desired in various embodiments.
[0029] Some portions of the detailed description that follows are presented and discussed in terms of a process or method. Although steps and sequencing thereof are disclosed in figures herein describing the operations of this method, such steps and sequencing are exemplary. Embodiments are well suited to performing various other steps or variations of the steps recited in the flowchart of the figure herein, and in a sequence other than that depicted and described herein. Some portions of the detailed descriptions that follow are presented in terms of procedures, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. In the present application, a procedure, logic block, process, or the like, is conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those utilizing physical manipulations of physical quantities. Usually, although not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as transactions, bits, values, elements, symbols, characters, samples, pixels, or the like.
[0030] In some implementations, any suitable computer usable or computer readable medium (or media) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer-usable, or computer-readable, storage medium (including a storage device associated with a computing device) may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable medium may include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fibre, a portable compact disc read-only memory (CD-ROM), an optical storage device, a digital versatile disk (DVD), a static random access memory (SRAM), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, a media such as those supporting the internet or an intranet, or a magnetic storage device. Note that the computer-usable or computer-readable medium could even be a suitable medium upon which the program is stored, scanned, compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory. In the context of the present disclosure, a computer-usable or computer-readable, storage medium may be any tangible medium that can contain or store a program for use by or in connection with the instruction execution system, apparatus, or device.
[0031] In some implementations, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. In some implementations, such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. In some implementations, the computer readable program code may be transmitted using any appropriate medium, including but not limited to the internet, wireline, optical fibre cable, RF, etc. In some implementations, a computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
[0032] In some implementations, computer program code for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. However, the computer program code for carrying out operations of the present disclosure may also be written in conventional procedural programming languages, such as the "C" programming language, PASCAL, or similar programming languages, as well as in scripting languages such as JavaScript, PERL, or Python. In present implementations, the language or tooling used for training may be one of Python, TensorFlow, Bazel, C, or C++. Further, a decoder in the user device (as will be discussed) may use C, C++ or any processor specific ISA. Furthermore, assembly code inside C/C++ may be utilized for specific operations. Also, the ASR (automatic speech recognition) and G2P decoder, along with the entire user system, can be run in embedded Linux (any distribution), Android, iOS, Windows, or the like, without any limitations. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the internet using an Internet Service Provider).
In some implementations, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGAs) or other hardware accelerators, micro-controller units (MCUs), or programmable logic arrays (PLAs) may execute the computer readable program instructions/code by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.
[0033] In some implementations, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus (systems), methods and computer program products according to various implementations of the present disclosure. Each block in the flowchart and/or block diagrams, and combinations of blocks in the flowchart and/or block diagrams, may represent a module, segment, or portion of code, which comprises one or more executable computer program instructions for implementing the specified logical function(s)/act(s). These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the computer program instructions, which may execute via the processor of the computer or other programmable data processing apparatus, create the ability to implement one or more of the functions/acts specified in the flowchart and/or block diagram block or blocks or combinations thereof. It should be noted that, in some implementations, the functions noted in the block(s) may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
[0034] In some implementations, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks or combinations thereof.
[0035] In some implementations, the computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed (not necessarily in a particular order) on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions/acts (not necessarily in a particular order) specified in the flowchart and/or block diagram block or blocks or combinations thereof.
[0036] Referring to example implementation of FIG. 1, there is shown a computing arrangement 100 that may reside on and may be executed by a computer (e.g., computer 12), which may be connected to a network (e.g., network 14) (e.g., the internet or a local area network). Examples of computer 12 may include, but are not limited to, a personal computer(s), a laptop computer(s), mobile computing device(s), a server computer, a series of server computers, a mainframe computer(s), or a computing cloud(s). In some implementations, each of the aforementioned may be generally described as a computing device. In certain implementations, a computing device may be a physical or virtual device. In many implementations, a computing device may be any device capable of performing operations, such as a dedicated processor, a portion of a processor, a virtual processor, a portion of a virtual processor, portion of a virtual device, or a virtual device. In some implementations, a processor may be a physical processor or a virtual processor. In some implementations, a virtual processor may correspond to one or more parts of one or more physical processors. In some implementations, the instructions/logic may be distributed and executed across one or more processors, virtual or physical, to execute the instructions/logic. Computer 12 may execute an operating system, for example, but not limited to, Microsoft Windows; Mac OS X; Red Hat Linux, or a custom operating system. (Microsoft and Windows are registered trademarks of Microsoft Corporation in the United States, other countries or both; Mac and OS X are registered trademarks of Apple Inc. in the United States, other countries or both; Red Hat is a registered trademark of Red Hat Corporation in the United States, other countries or both; and Linux is a registered trademark of Linus Torvalds in the United States, other countries or both).
[0037] In some implementations, the instruction sets and subroutines of computing arrangement 100, which may be stored on a storage device, such as storage device 16, coupled to computer 12, may be executed by one or more processors (not shown) and one or more memory architectures included within computer 12. In some implementations, storage device 16 may include but is not limited to: a hard disk drive; a flash drive; a tape drive; an optical drive; a RAID array (or other array); a random-access memory (RAM); and a read-only memory (ROM).
[0038] In some implementations, network 14 may be connected to one or more secondary networks (e.g., network 18), examples of which may include but are not limited to: a local area network; a wide area network; or an intranet, for example.
[0039] In some implementations, computer 12 may include a data store, such as a database (e.g., relational database, object-oriented database, triplestore database, etc.) and may be located within any suitable memory location, such as storage device 16 coupled to computer 12. In some implementations, data, metadata, information, etc. described throughout the present disclosure may be stored in the data store. In some implementations, computer 12 may utilize any known database management system such as, but not limited to, DB2, in order to provide multi-user access to one or more databases, such as the above noted relational database. In some implementations, the data store may also be a custom database, such as, for example, a flat file database or an XML database. In some implementations, any other form(s) of a data storage structure and/or organization may also be used. In some implementations, computing arrangement 100 may be a component of the data store, a standalone application that interfaces with the above noted data store and/or an applet / application that is accessed via client applications 22, 24, 26, 28. In some implementations, the above noted data store may be, in whole or in part, distributed in a cloud computing topology. In this way, computer 12 and storage device 16 may refer to multiple devices, which may also be distributed throughout the network.
[0040] In some implementations, computer 12 may execute application 20 for processing of contract documents to extract and associate set of entities therewith, and thereby for rendering them searchable (as discussed later in more detail). In some implementations, computing arrangement 100 and/or application 20 may be accessed via one or more of client applications 22, 24, 26, 28. In some implementations, computing arrangement 100 may be a standalone application, or may be an applet / application / script / extension that may interact with and/or be executed within application 20, a component of application 20, and/or one or more of client applications 22, 24, 26, 28. In some implementations, application 20 may be a standalone application, or may be an applet / application / script / extension that may interact with and/or be executed within computing arrangement 100, a component of computing arrangement 100, and/or one or more of client applications 22, 24, 26, 28. In some implementations, one or more of client applications 22, 24, 26, 28 may be a standalone application, or may be an applet / application / script / extension that may interact with and/or be executed within and/or be a component of computing arrangement 100 and/or application 20. Examples of client applications 22, 24, 26, 28 may include, but are not limited to, a standard and/or mobile web browser, an email application (e.g., an email client application), a textual and/or a graphical user interface, a customized web browser, a plugin, an Application Programming Interface (API), or a custom application. The instruction sets and subroutines of client applications 22, 24, 26, 28, which may be stored on storage devices 30, 32, 34, 36, coupled to user devices 38, 40, 42, 44, may be executed by one or more processors and one or more memory architectures incorporated into user devices 38, 40, 42, 44.
[0041] In some implementations, one or more of storage devices 30, 32, 34, 36, may include but are not limited to: hard disk drives; flash drives, tape drives; optical drives; RAID arrays; random access memories (RAM); and read-only memories (ROM). Examples of user devices 38, 40, 42, 44 (and/or computer 12) may include, but are not limited to, a personal computer (e.g., user device 38), a laptop computer (e.g., user device 40), a smart/data-enabled, cellular phone (e.g., user device 42), a notebook computer (e.g., user device 44), a tablet (not shown), a server (not shown), a television (not shown), a smart television (not shown), a media (e.g., video, photo, etc.) capturing device (not shown), and a dedicated network device (not shown). User devices 38, 40, 42, 44 may each execute an operating system, examples of which may include but are not limited to, Android, Apple iOS, Mac OS X; Red Hat Linux, or a custom operating system.
[0042] In some implementations, one or more of client applications 22, 24, 26, 28 may be configured to effectuate some or all of the functionality of computing arrangement 100 (and vice versa). Accordingly, in some implementations, computing arrangement 100 may be a purely server-side application, a purely client-side application, or a hybrid server-side / client-side application that is cooperatively executed by one or more of client applications 22, 24, 26, 28 and/or computing arrangement 100.
[0043] In some implementations, one or more of client applications 22, 24, 26, 28 may be configured to effectuate some or all of the functionality of application 20 (and vice versa). Accordingly, in some implementations, application 20 may be a purely server-side application, a purely client-side application, or a hybrid server-side / client-side application that is cooperatively executed by one or more of client applications 22, 24, 26, 28 and/or application 20. As one or more of client applications 22, 24, 26, 28, computing arrangement 100, and application 20, taken singly or in any combination, may effectuate some or all of the same functionality, any description of effectuating such functionality via one or more of client applications 22, 24, 26, 28, computing arrangement 100, application 20, or combination thereof, and any described interaction(s) between one or more of client applications 22, 24, 26, 28, computing arrangement 100, application 20, or combination thereof to effectuate such functionality, should be taken as an example only and not to limit the scope of the disclosure.
[0044] In some implementations, one or more of users 46, 48, 50, 52 may access computer 12 and computing arrangement 100 (e.g., using one or more of user devices 38, 40, 42, 44) directly through network 14 or through secondary network 18. Further, computer 12 may be connected to network 14 through secondary network 18, as illustrated with phantom link line 54. Computing arrangement 100 may include one or more user interfaces, such as browsers and textual or graphical user interfaces, through which users 46, 48, 50, 52 may access computing arrangement 100.
[0045] In some implementations, the various user devices may be directly or indirectly coupled to communication network, such as communication network 14 and communication network 18, hereinafter simply referred to as network 14 and network 18, respectively. For example, user device 38 is shown directly coupled to network 14 via a hardwired network connection. Further, user device 44 is shown directly coupled to network 18 via a hardwired network connection. User device 40 is shown wirelessly coupled to network 14 via wireless communication channel 56 established between user device 40 and wireless access point (i.e., WAP) 58, which is shown directly coupled to network 14. WAP 58 may be, for example, an IEEE 802.11a, 802.11b, 802.11g, Wi-Fi, RFID, and/or Bluetooth (including Bluetooth Low Energy) device that is capable of establishing wireless communication channel 56 between user device 40 and WAP 58. User device 42 is shown wirelessly coupled to network 14 via wireless communication channel 60 established between user device 42 and cellular network / bridge 62, which is shown directly coupled to network 14.
[0046] In some implementations, some or all of the IEEE 802.11x specifications may use Ethernet protocol and carrier sense multiple access with collision avoidance (i.e., CSMA/CA) for path sharing. The various 802.11x specifications may use phase-shift keying (i.e., PSK) modulation or complementary code keying (i.e., CCK) modulation, for example. Bluetooth (including Bluetooth Low Energy) is a telecommunications industry specification that allows, e.g., mobile phones, computers, smart phones, and other electronic devices to be interconnected using a short-range wireless connection. Other forms of interconnection (e.g., Near Field Communication (NFC)) may also be used.
[0047] The computing arrangement 100 may include a server (such as server 200, as shown in FIG. 2) for processing of contract documents (as will be described later in more detail). In the present implementations, the computing arrangement 100 itself may be embodied as the server 200. Herein, FIG. 2 is a block diagram of an example of the server 200 capable of implementing embodiments according to the present disclosure. In the example of FIG. 2, the server 200 includes a processing arrangement 205 for running software applications (such as, the application 20 of FIG. 1) and optionally an operating system. As illustrated, the server 200 further includes a database 210 which stores applications and data for use by the processing arrangement 205. Storage 215 provides non-volatile storage for applications and data and may include fixed disk drives, removable disk drives, flash memory devices, and CD-ROM, DVD-ROM or other optical storage devices. An optional user input device 220 includes devices that communicate user inputs from one or more users to the server 200 and may include keyboards, mice, joysticks, touch screens, etc. A communication or network interface 225 is provided which allows the server 200 to communicate with other computer systems via an electronic communications network, including wired and/or wireless communication and including an Intranet or the Internet. In one embodiment, the server 200 receives instructions and user inputs from a remote computer through communication interface 225. Communication interface 225 can comprise a transmitter and receiver for communicating with remote devices. An optional display device 250 may be provided which can be any device capable of displaying visual information in response to a signal from the server 200. 
The components of the server 200, including the processing arrangement 205, the database 210, the data storage 215, the user input devices 220, the communication interface 225, and the display device 250, may be coupled via one or more data buses 260.
[0048] In the embodiment of FIG. 2, a graphics system 230 may be coupled with the data bus 260 and the components of the server 200. The graphics system 230 may include a physical graphics processing unit (GPU) 235 and graphics memory. The GPU 235 generates pixel data for output images from rendering commands. The physical GPU 235 can be configured as multiple virtual GPUs that may be used in parallel (concurrently) by a number of applications or processes executing in parallel. For example, mass scaling processes for rigid bodies or a variety of constraint solving processes may be run in parallel on the multiple virtual GPUs. Graphics memory may include a display memory 240 (e.g., a framebuffer) used for storing pixel data for each pixel of an output image. In another embodiment, the display memory 240 and/or additional memory 245 may be part of the database 210 and may be shared with the processing arrangement 205. Alternatively, the display memory 240 and/or additional memory 245 can be one or more separate memories provided for the exclusive use of the graphics system 230. In another embodiment, the graphics system 230 includes one or more additional physical GPUs 255, similar to the GPU 235. Each additional GPU 255 may be adapted to operate in parallel with the GPU 235. Each additional GPU 255 generates pixel data for output images from rendering commands. Each additional physical GPU 255 can be configured as multiple virtual GPUs that may be used in parallel (concurrently) by a number of applications or processes executing in parallel, e.g., processes that solve constraints. Each additional GPU 255 can operate in conjunction with the GPU 235, for example, to simultaneously generate pixel data for different portions of an output image, or to simultaneously generate pixel data for different output images. 
Each additional GPU 255 can be located on the same circuit board as the GPU 235, sharing a connection with the GPU 235 to the data bus 260, or each additional GPU 255 can be located on another circuit board separately coupled with the data bus 260. Each additional GPU 255 can also be integrated into the same module or chip package as the GPU 235. Each additional GPU 255 can have additional memory, similar to the display memory 240 and additional memory 245, or can share the memories 240 and 245 with the GPU 235. It is to be understood that the circuits and/or functionality of GPU as described herein could also be implemented in other types of processors, such as general-purpose or other special-purpose coprocessors, or within a CPU.
[0049] The computing arrangement 100 may also include a user device 300 (as shown in FIG. 3). In embodiments of the present disclosure, the user device 300 may embody a smartphone, a personal computer, a tablet, or the like. Herein, FIG. 3 is a block diagram of an example of the user device 300 capable of implementing embodiments according to the present disclosure. In the example of FIG. 3, the user device 300 includes a processor 305 (hereinafter, referred to as CPU 305) for running software applications (such as, the application 20 of FIG. 1) and optionally an operating system. A user input device 320 is provided which includes devices that communicate user inputs from one or more users and may include keyboards, mice, joysticks, touch screens, and/or microphones. Further, a network adapter 325 is provided which allows the user device 300 to communicate with other computer systems (e.g., the server 200 of FIG. 2) via an electronic communications network, including wired and/or wireless communication and including the Internet. The user device 300 may also include a decoder 355, which may be any device capable of decoding (decompressing) data that may be encoded (compressed). A display device 350 may be provided which may be any device capable of displaying visual information, including information received from the decoder 355. In particular, as will be described below, the display device 350 may provide an interface, such that the display device 350 is configured to display information received from the server 200 of FIG. 2. The components of the user device 300 may be coupled via one or more data buses 360.
[0050] It may be seen that compared to the server 200 in the example of FIG. 2, the user device 300 in the example of FIG. 3 may have fewer components and less functionality. However, the user device 300 may include other components, for example, in addition to those described above. In general, the user device 300 may be any type of device that has one or more of display capability and the capability to receive inputs from a user and send such inputs to the server 200. However, it may be appreciated that the user device 300 may have additional capabilities beyond those just mentioned.
[0051] Referring to FIG. 4, illustrated is an exemplary flowchart listing steps involved in a method (as represented by reference numeral 400) for processing of contract documents. In particular, the method 400 is embodied for processing of a given contract document to extract a set of entities therefrom, and subsequently associate the extracted set of entities therewith, to render searchable the said given contract document. As used herein, the term “entities” refers to specific pieces of information or data points within the text of the contract documents. These entities can be understood as key elements that hold significant meaning and value in understanding and interpreting the contract. Examples of entities include contract parties, contract term, contract jurisdiction, contract effective date, and contract type. For the said purpose of extracting the set of entities from the given contract document, the present method 400 implements a Named Entity Recognition (NER) model, as discussed in detail in the following paragraphs. When the NER model extracts these entities from the contract documents, they can be used as tags or metadata. Tags and metadata are descriptors or labels that provide additional context and structure to the contract documents, allowing for more efficient organization, searchability, and retrieval of relevant information within a database or repository. By associating these extracted entities with the corresponding contract documents, the system can enable users to search and filter the documents based on specific criteria, such as the contract type or involved parties, making the database of contract documents more accessible and user-friendly.
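As a minimal sketch of how extracted entities might serve as searchable tags, the following assumes a simple in-memory dictionary as the store. The names `ContractIndex`, `associate`, and `search`, as well as the entity type strings and document identifiers, are hypothetical and chosen only for illustration; an actual implementation may use any database described herein.

```python
from collections import defaultdict


class ContractIndex:
    """Hypothetical in-memory store mapping contract documents to entity tags."""

    def __init__(self):
        # document id -> {entity type -> set of normalized entity values}
        self._tags = defaultdict(lambda: defaultdict(set))

    def associate(self, doc_id, entities):
        """Attach extracted (entity_type, value) pairs to a document as metadata."""
        for entity_type, value in entities:
            self._tags[doc_id][entity_type].add(value.strip().lower())

    def search(self, keyword, entity_type=None):
        """Return sorted ids of documents having an entity value containing the keyword."""
        needle = keyword.strip().lower()
        hits = []
        for doc_id, by_type in self._tags.items():
            types = [entity_type] if entity_type else list(by_type)
            if any(needle in value for t in types for value in by_type.get(t, ())):
                hits.append(doc_id)
        return sorted(hits)


index = ContractIndex()
index.associate("doc-1", [("contract_type", "Non-Disclosure Agreement"),
                          ("contract_party", "Acme Corp")])
index.associate("doc-2", [("contract_type", "Lease Agreement"),
                          ("contract_party", "Acme Corp")])

print(index.search("acme"))                                # both documents
print(index.search("lease", entity_type="contract_type"))  # only the lease
```

Restricting a search to one entity type corresponds to filtering by a specific criterion such as contract type, while an unrestricted search matches the keyword against all associated entities.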
[0052] Herein, at step 402, the method 400 includes providing a Named Entity Recognition (NER) model trained on a training corpus of a plurality of contract documents and configured for recognizing entities in text of contract documents. As used herein, Named Entity Recognition is a subtask of Natural Language Processing (NLP), and specifically of information extraction, that seeks to locate named entities mentioned in a given text and classify them into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc., which, as may be contemplated, is required for any kind of analysis of a given contract document. Named entities refer to proper nouns or specific terms that can be associated with such categories. The primary goal of NER is to extract structured information from unstructured text data, which can then be used for various purposes, such as information retrieval, data analysis, and relationship extraction.
[0053] As discussed, the NER model is trained on a training corpus of a plurality of contract documents and configured for recognizing entities in text of contract documents. In general, the NER model is developed using machine learning techniques, with supervised learning being the most common approach. In supervised learning, the model is trained on a labelled dataset, where the named entities within the text have been annotated with their respective categories. The model learns to recognize patterns and relationships between words, phrases, and their corresponding entity categories based on these annotations. There are several types of NER models that can be employed, ranging from rule-based models, which rely on linguistic patterns and heuristics, to more advanced machine learning-based models, as discussed in more detail in the following paragraphs.
[0054] In an embodiment, the NER model is trained by, first, pre-processing text in each contract document in the training corpus of the plurality of contract documents. The documents are first cleaned to remove any unnecessary punctuation marks. The data is then tokenized by splitting sentences into individual words or subwords (tokens). Features are then extracted from the text, for example, part-of-speech tags, word shapes, and context information. Word embeddings can also be used as features.
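The tokenization and feature extraction described above can be sketched as follows. This is an illustrative example using only the Python standard library; the regular expression, the function names, and the chosen feature keys are assumptions, and part-of-speech tags and word embeddings are omitted because they would require an external tagger or embedding model.

```python
import re

# Words and numbers, allowing internal periods, apostrophes, or hyphens
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+(?:[.'-][A-Za-z0-9]+)*")


def tokenize(text):
    """Split text into word tokens, dropping stray punctuation."""
    return TOKEN_PATTERN.findall(text)


def word_shape(token):
    """Map characters to X (upper), x (lower), d (digit); keep other characters."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return "".join(shape)


def token_features(tokens, i):
    """Build a feature dict for token i from its shape and neighbouring words."""
    return {
        "lower": tokens[i].lower(),
        "shape": word_shape(tokens[i]),
        "is_title": tokens[i].istitle(),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


tokens = tokenize(
    "This Agreement is effective from 1 January 2024, between Acme Ltd. and Beta Inc."
)
features = [token_features(tokens, i) for i in range(len(tokens))]
```

Feature dictionaries of this kind are a typical input for sequence labelling models such as Conditional Random Fields; neural models would usually consume token identifiers or embeddings instead.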
[0055] The NER model is further trained by annotating the pre-processed text in each contract document in the training corpus of the plurality of contract documents, to generate labels representing entities within the text in each contract document in the training corpus of the plurality of contract documents. That is, to train the NER model effectively, the pre-processed text in each contract document within the training corpus is annotated. This annotation process involves manually or semi-automatically identifying and labelling entities of interest within the text. These labels represent the named entities that the NER model will learn to recognize and classify. By providing annotated examples, the NER model can learn the patterns and features that are associated with specific types of entities in the contract documents.
[0056] Before the annotation process begins, the categories of entities that are relevant to the task must be determined. These categories can include, but are not limited to, person names, organizations, locations, dates, monetary values, and percentages. The defined categories guide the annotators during the labelling process and determine the types of entities the NER model will recognize. In the present embodiments, the entities of interest may include contract parties, contract term, contract jurisdiction, contract effective date, and contract type. Thereafter, annotators, who may be domain experts or crowdsourced workers, review the pre-processed text in each contract document and identify instances of named entities within the text. An annotation tool is chosen to facilitate the process of labelling the entities with the appropriate category tags in the pre-processed text. Examples of annotation tools include Prodigy, INCEpTION, Doccano, and Brat. INCEpTION is a web-based annotation platform that supports collaborative annotation projects. Doccano is a web-based annotation tool that can be used to create training data for NER models; it is easy to use, including by people with no NLP experience. Brat is a web-based annotation tool that is similar to Doccano; it may be less user-friendly than Doccano, but it can be used to create more complex training data, such as annotated relations between entities. These tools provide interfaces and features to assist annotators in marking up the text with entity labels.
[0057] Once the annotation process is completed, the annotation tool compiles the marked-up text and generates labels representing the entities in the contract documents. In an embodiment, annotating the pre-processed text comprises generating Inside-Outside-Beginning (IOB) labels for the pre-processed text in each contract document in the training corpus of the plurality of contract documents. That is, these labels are often formatted using the IOB scheme, which captures the position and category of each entity in the text: the first token of an entity is marked as Beginning, subsequent tokens of the same entity as Inside, and all other tokens as Outside. The IOB format is particularly useful for handling multi-word entities and distinguishing between adjacent entities of the same or different categories. After the annotated text is obtained, the labelled data is used to train the NER model. The model learns to identify and classify named entities in the contract documents based on the patterns and relationships found in the annotated examples. This training process enables the NER model to recognize and extract entities of interest in unseen contract documents, thereby automating the process of information extraction and classification. In an alternate approach, annotation is done by simply identifying the text and the underlying entities and storing the data in the form of a .json file. This kind of annotation is implemented for fine-tuning of LLM-based models to perform NER tasks.
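To illustrate the two annotation formats mentioned above, the following sketch converts token-level entity spans into IOB labels and also serializes the same annotation as a JSON record. The `PARTY` label, the span representation (token indices with an exclusive end), and the JSON field names are assumptions made for this example only; the disclosure does not prescribe a particular layout.

```python
import json


def to_iob(tokens, spans):
    """Convert entity spans to IOB labels.

    spans: list of (start, end, label) over token indices, end exclusive.
    """
    labels = ["O"] * len(tokens)
    for start, end, label in spans:
        labels[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            labels[i] = f"I-{label}"          # remaining tokens of the entity
    return labels


tokens = ["Agreement", "between", "Acme", "Ltd", "and", "Beta", "Inc"]
spans = [(2, 4, "PARTY"), (5, 7, "PARTY")]

labels = to_iob(tokens, spans)
# labels == ["O", "O", "B-PARTY", "I-PARTY", "O", "B-PARTY", "I-PARTY"]

# Alternate JSON-style annotation (hypothetical layout) for LLM fine-tuning
record = {
    "text": " ".join(tokens),
    "entities": [
        {"label": label, "text": " ".join(tokens[start:end])}
        for start, end, label in spans
    ],
}
json_line = json.dumps(record)
```

The `B-` prefix on "Beta" is what lets a model distinguish two separate parties from a single long one, even when both have the same category.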
[0058] Finally, the NER model is trained to extract the entities in the contract documents by applying machine learning technique(s) to the pre-processed and annotated text from each contract document in the training corpus of the plurality of contract documents. That is, using the machine learning technique(s) and the extracted features, the NER model is trained on the annotated text from the training corpus. During the training process, the NER model learns to map the input features to the corresponding entity labels by minimizing the prediction error. This involves adjusting the model's parameters iteratively using optimization algorithms such as gradient descent, until an optimal set of parameters is achieved.
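The idea of iteratively adjusting parameters to minimize prediction error can be illustrated with a deliberately small stand-in: a binary logistic classifier that decides whether a token is part of an entity, trained by batch gradient descent. This is not one of the NER architectures named in the present disclosure; the two features, the toy data, the learning rate, and the function names are all assumptions chosen to keep the example self-contained.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    losses = []
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        loss = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
            error = p - y                      # gradient of the loss w.r.t. the logit
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        losses.append(loss / n)                # loss before this epoch's update
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias, losses


# Toy features per token: [is_capitalized, follows "between" or "and"]
X = [[1, 0], [0, 0], [1, 1], [0, 0], [1, 1], [0, 0]]
y = [0, 0, 1, 0, 1, 0]                         # 1 = token belongs to an entity

weights, bias, losses = train_logistic(X, y)
```

The recorded `losses` shrink over the epochs, which is the "minimizing the prediction error" behaviour described above; practical NER models apply the same principle to far larger parameter sets.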
[0059] Herein, based on the problem requirements and the nature of the data, an appropriate machine learning technique or a combination of techniques is selected to train the NER model. In an example embodiment, the machine learning technique(s) comprises at least one of Conditional Random Fields (CRFs), Bidirectional Long Short-Term Memory (BiLSTM), Convolutional Neural Networks (CNN), Embeddings from Language Models (ELMo), Stanford NLP, and transformer-based models. CRFs are a probabilistic graphical model designed for sequence labelling tasks like NER, capturing dependencies between adjacent labels and leveraging contextual information. BiLSTM is an RNN variant that processes input sequences in both forward and backward directions, allowing the model to better capture the context surrounding each word for NER tasks. CNNs, primarily used for image processing, can learn local text features, such as character or word n-grams, for NER tasks by applying convolutional and pooling layers. ELMo generates context-based word embeddings that capture both semantic and syntactic information, improving the performance of various machine learning models in NER tasks. Stanford NLP is a suite of NLP tools, including a Named Entity Recognizer, that combines rule-based and machine learning approaches to identify and classify named entities in text. Transformer models, such as BERT, RoBERTa, and GPT, use self-attention mechanisms to weigh the importance of different words in the context, and can be fine-tuned for specific NER tasks to achieve state-of-the-art performance. In alternate approaches, Large Language Models (LLMs) can be fine-tuned to enable them to perform NER tasks. Each of these machine learning techniques has its strengths and weaknesses, and the choice of the most suitable technique depends on factors such as the size and complexity of the dataset, the nature of the task, and the desired level of performance. 
In some cases, a combination of these techniques can be employed to achieve better results in NER tasks.
[0060] In some examples, the trained NER model is further validated/evaluated. To assess the performance of the trained NER model, it is tested on a validation dataset that is separate from the training dataset. This dataset consists of pre-processed contract documents with annotated entities, similar to the training data. The model's predictions are compared against the actual entity labels in the validation data to calculate performance metrics, such as precision, recall, and F1 score. This evaluation process helps in identifying any potential issues with the model, such as overfitting or underfitting, and provides insights for further model improvement. If necessary, the NER model can be fine-tuned and optimized to enhance its performance. This can involve adjusting hyperparameters, experimenting with different machine learning techniques, or incorporating additional features. The goal is to develop an NER model that effectively and accurately extracts and classifies named entities in the contract documents.
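The comparison of predicted and actual entity labels can be sketched as an entity-level evaluation over IOB label sequences. Precision is the metric named above; recall and F1 score are commonly reported alongside it and are included here as an assumption. Exact span matching (same start, end, and type) is likewise one possible convention, and the function names are hypothetical.

```python
def extract_entities(labels):
    """Collect (start, end, type) spans from an IOB label sequence."""
    spans, start, current = set(), None, None
    for i, label in enumerate(labels + ["O"]):   # sentinel closes a trailing span
        begins = label.startswith("B-")
        continues = label.startswith("I-") and current == label[2:]
        if current is not None and not continues:
            spans.add((start, i, current))
            start, current = None, None
        if begins:
            start, current = i, label[2:]
    return spans


def precision_recall_f1(gold_labels, predicted_labels):
    """Score predictions by exact match of entity spans and types."""
    gold = extract_entities(gold_labels)
    pred = extract_entities(predicted_labels)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = ["O", "O", "B-PARTY", "I-PARTY", "O", "B-PARTY", "I-PARTY"]
pred = ["O", "O", "B-PARTY", "I-PARTY", "O", "B-PARTY", "O"]

scores = precision_recall_f1(gold, pred)
# One of two predicted spans is exactly right, and one of two gold spans is found
```

Scoring whole spans rather than individual tokens penalizes the truncated second party in `pred`, which a token-level score would largely overlook.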
[0061] Referring to FIG. 5, illustrated is a schematic of a modelling pipeline (as represented by reference numeral 500) for generating the NER model, as per embodiments of the present disclosure. As shown, in the modelling pipeline 500, at step 502, the extracted text from the contract document(s) is first inputted. At step 504, the modelling pipeline 500 involves data preparation for extracting entities. Herein, the data preparation includes manual/algorithmic extraction of entities. Herein, the manual extraction may involve legal professional(s) manually separating each entity from each of the sample contract documents in a training dataset. The algorithmic extraction may involve implementing available algorithm(s) to automate the data preparation task. At step 506, the modelling pipeline 500 involves data labelling for different entity types. This step 506 of data labelling may be performed by legal professionals (expert annotators). In the present implementation, the data labelling may result in a training list of various entities, which may be provided as a sample dataset of entities (as may be reviewed by legal professional(s)). At step 508, the modelling pipeline 500 involves data cleaning. Herein, the data cleaning may include text processing to prepare complete sentences from text, stop-word removal, removal of punctuation/numbers, etc. At step 510, the modelling pipeline 500 involves data preparation. Herein, the data preparation may include data normalization, including data stemming (a crude heuristic process that chops off the ends of words) and/or data lemmatization (which uses a vocabulary and morphological analysis of words to reduce them to their base forms). The data preparation may also include data vectorization, which is used to derive distinct features from the text for the model to train on. 
In the present examples, the data vectorization may be achieved by using one or more techniques, such as TF-IDF (term frequency–inverse document frequency), frequency vectorization, word embeddings, word encoding using transformer architectures, and the like, as known in the art. The data preparation may further include POS (Parts-Of-Speech) tagging for different languages and retention of specific types of POS for training. Such techniques for the data preparation may be contemplated by a person skilled in the art, and thus have not been described in detail herein for the brevity of the present disclosure.
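As an illustration of the TF-IDF vectorization mentioned above, the following sketch computes term weights for a few tokenized documents using only the standard library. It uses one common variant (term frequency as count divided by document length, inverse document frequency as log(N / df)); other weighting and smoothing variants exist, and the function name and sample documents are assumptions.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per tokenized document.

    tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(documents)
    # Number of documents in which each term appears at least once
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


docs = [
    ["lease", "agreement", "between", "parties"],
    ["employment", "agreement", "between", "parties"],
    ["lease", "term", "five", "years"],
]
vectors = tfidf(docs)
```

Terms that occur in fewer documents, such as "employment" here, receive a higher weight than terms shared across documents, such as "agreement", which is what makes the resulting features distinctive.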
[0062] Further, at step 512, the modelling pipeline 500 involves using the data (as a result of the above steps) for training of the NER model. At step 514, the modelling pipeline 500 may further involve checking a performance of the implemented (trained) NER model. This may be achieved by feeding the NER model with test text, and having legal professionals manually evaluate the output(s) therefrom. If the results are satisfactory (as represented by block 516), the NER model may be considered for deployment. If the results are not satisfactory, the modelling pipeline 500 may move to step 518 (as shown), which involves further training of the NER model using supplemental data generated by using data augmentation techniques. Herein, the data augmentation may include adding new entities manually and/or programmatically. The supplemental data generated by data augmentation (in the step 518) may be processed using data preparation techniques (as discussed in the step 510), to be further fed to the NER model for its training as discussed in the step 512 of the modelling pipeline 500. It may then be checked again whether the results are satisfactory (as represented by block 516), and thereafter the NER model may be considered for deployment. It may be appreciated that for purposes of the present disclosure, the NER model may be deployed using cloud platforms as either server-based or serverless architectures, without any limitations.
[0063] Referring back to FIG. 4, at step 404, the method 400 includes receiving a given contract document(s) to be processed. For the purposes of the present disclosure, an interface may be provided for a user to upload the contract document(s) to be processed. Herein, the interface may allow the user to upload the said contract document(s) directly from the user device (such as, the user device 300), or from a cloud platform, like Google Drive, OneDrive, Dropbox, etc. by implementing the corresponding APIs (as known in the art) without any limitations. In other examples, the method 400 may involve automatically fetching the contract document(s) from a data repository of the user (like an enterprise client), given the access thereto, to directly process each newly added contract document.
[0064] Further, at step 406, the method 400 includes pre-processing text of the given contract document as per requirements of the NER model. For this purpose, the method 400 may involve performing initial pre-processing of the received contract document(s) to make those usable for further processing as per embodiments of the present disclosure. In the present implementations, the contract document(s) as received may be converted to a suitable format, such as, but not limited to, PDF for further processing. Herein, the received contract document(s) may be in the form of scanned images with machine unreadable text. In such cases, the received contract document(s) may be pre-processed using Optical Character Recognition (OCR) techniques to convert the text in the contract document(s) to machine readable form for further processing. In an example, the machine readable text from each contract document is extracted into a separate file (such as, a text file). Such pre-processing step may utilize available resources, such as, but not limited to, Python libraries, AWS Textract, Azure computer vision, and the like, as known and widely used in the art.
[0065] It may be appreciated that the step 406 of pre-processing text of the given contract document as per requirements of the NER model may involve similar or same sub-steps as described for pre-processing of each contract document for training of the NER model, to ensure that the text is in a suitable format for the NER model to process and analyse effectively. These sub-steps may include, but are not limited to, tokenization, lowercasing, removing special characters and punctuation, stop-word removal, stemming and lemmatization, removing or replacing specific domain-specific terms or jargon, encoding, padding and truncating sequences, etc. In some examples, the method 400 may further involve implementing a co-reference resolution procedure. Co-reference resolution is the task of finding all expressions that refer to the same entity in a text. In the present method 400, the text is sanitized for possible co-references before entity extraction. These pre-processing steps prepare the text of the given contract document for efficient and accurate analysis by the NER model, ensuring that the model can effectively extract and classify named entities present within the text.
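By way of a non-limiting illustration, a few of the sub-steps listed above (lowercasing, removal of special characters and punctuation, tokenization, stop-word removal, and padding and truncating) may be sketched as follows. The function name, the stop-word list, the pad token and the sequence length are assumptions made for this example only and are not prescribed by the present disclosure. Whether lowercasing is appropriate may depend on the NER model used, since capitalization can be a useful cue for names.

```python
import re
from typing import List

# Illustrative stop-word list (an assumption); a real pipeline would use a fuller list.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "by", "this"}


def preprocess(text: str, max_len: int = 8, pad_token: str = "<pad>") -> List[str]:
    """Lowercase, strip punctuation, tokenize, drop stop-words, then pad or truncate."""
    lowered = text.lower()
    # Replace anything that is not a letter, digit or whitespace with a space.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", lowered)
    # Whitespace tokenization followed by stop-word removal.
    tokens = [tok for tok in cleaned.split() if tok not in STOP_WORDS]
    # Truncate to max_len, then pad up to max_len.
    tokens = tokens[:max_len]
    return tokens + [pad_token] * (max_len - len(tokens))


print(preprocess("This Agreement is governed by the laws of India."))
```

Stemming, lemmatization, encoding and co-reference resolution are omitted from this sketch, as they would typically rely on an external NLP library.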
[0066] Further, at step 408, the method 400 includes extracting a set of entities from the pre-processed text of the given contract document by implementing the NER model. Herein, the NER model is implemented to extract important details from the pre-processed text of the given contract document. Generally, the extraction of the set of entities from the pre-processed text of the given contract document by implementing the NER model involves analysing the text to identify and classify named entities based on their types, such as person names, organizations, locations, dates, monetary values, and percentages. The NER model utilizes various features, patterns, and contextual information present in the pre-processed text to accurately recognize and distinguish between different types of entities. It takes into account the linguistic structure, syntactic patterns, and domain-specific terminology to improve the effectiveness of the entity extraction process. The extraction process continues throughout the entire pre-processed text, generating a comprehensive set of entities that represent crucial information present in the contract document. These extracted entities can then be used for further analysis, organization, or searchability within the smart repository, enhancing the overall contract management capabilities and facilitating easy access to critical contract details when required.
[0067] In the present embodiments, the extracted set of entities comprises one or more of: contract parties, contract term, contract jurisdiction, contract effective date, contract type. Herein, contract parties related entities refer to the individuals, organizations, or entities involved in the contract. This may include the names of companies, individuals, or other legal entities that have entered into the agreement, which is crucial for understanding the parties' rights, obligations, and responsibilities under the contract. Contract term related entity refers to the duration of the contract, which may be expressed in days, months, or years. The contract term is important for determining the period during which the contract remains active and enforceable, as well as any applicable deadlines, milestones, or renewal dates associated with the agreement. Contract jurisdiction related entity represents the legal jurisdiction governing the contract, which typically refers to the country, state, or region whose laws and regulations apply to the agreement. Contract effective date related entity refers to the date on which the contract comes into force and becomes legally binding on the parties involved. Contract type related entity refers to the specific category or classification of the contract, such as a non-disclosure agreement, service agreement, employment contract, or license agreement. Identifying the contract type is essential for understanding the nature and scope of the agreement, as well as the specific rights and obligations that apply to the parties involved.
[0068] In the present embodiments, the NER model is configured to be implemented for a group of types of entities, and herein multiple NER models, configured for different groups of types of entities, are implemented to extract the set of entities in the pre-processed text of the given contract document. In other words, each NER model of the multiple NER models is configured for one of group of types of entities from different groups of types of entities. The method 400 may utilize a combined NER model for a group of entities at a time, or separate NER models for each entity type, or a mix of both, without any limitations. These multiple NER models may work in parallel or sequentially to analyse the pre-processed text of the given contract document and extract the set of entities belonging to their respective entity groups. For example, one NER model might be configured to identify and classify entities related to organizations, locations, and dates, while another NER model could be focused on monetary values, percentages, and quantities. Herein, multiple NER models, each configured for different groups of types of entities, are employed to increase the effectiveness and coverage of the entity extraction process. By utilizing multiple NER models, each tailored to specific groups of entity types, it may be possible to achieve a more comprehensive and accurate extraction of entities in the contract document.
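As a minimal sketch of the multi-model arrangement described above, the following example runs two extractors sequentially over the same text and merges their outputs into a single set of entities. The two extractors are simple regular-expression stand-ins used only so that the example is self-contained; they are not the trained NER models of the present disclosure, and the labels and function names are hypothetical.

```python
import re
from typing import Callable, List, Tuple

Entity = Tuple[str, str]  # (entity type, matched text)

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")


def date_model(text: str) -> List[Entity]:
    # Stand-in for an NER model configured for date-related entities.
    pattern = r"\b\d{1,2} (?:" + MONTHS + r") \d{4}\b"
    return [("EFFECTIVE_DATE", m) for m in re.findall(pattern, text)]


def money_model(text: str) -> List[Entity]:
    # Stand-in for an NER model configured for monetary values.
    return [("MONETARY_VALUE", m) for m in re.findall(r"(?:USD|INR) [\d,]+", text)]


def extract_all(text: str, models: List[Callable[[str], List[Entity]]]) -> List[Entity]:
    """Run each model in turn and merge the results, skipping exact duplicates."""
    merged: List[Entity] = []
    for model in models:
        for entity in model(text):
            if entity not in merged:
                merged.append(entity)
    return merged


sample = "This Agreement is effective from 26 September 2023 for a fee of INR 50,000."
print(extract_all(sample, [date_model, money_model]))
```

Because each model only receives the text and returns a list of typed entities, the same merge step would apply whether the models are run sequentially, as here, or in parallel.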
[0069] As may be understood, a given contract document may include a plurality of clauses. It may be appreciated by a person skilled in the art that the clauses may be classified under certain predefined categories, such as clause(s) related to: (i) Parties, (ii) Purpose, (iii) Confidential Information, (iv) Recipient’s Treatment of Confidential Information, (v) Tangible Confidential Information, (vi) Exceptions to Confidential Information, (vii) Information that was available in the public domain, (viii) Information that is obtained other than through a breach of confidentiality, (ix) Information disclosure compelled by legal process, (x) Information that was developed independently, (xi) Term, (xii) No License, (xiii) Governing Law, (xiv) Equitable Relief, (xv) Entire Agreement, (xvi) No Assignment, (xvii) Severability, (xviii) Notices, (xix) No Implied Waiver, (xx) Headings and Interpretation, etc. Therefore, in order to implement the NER model for extracting entities in the given contract document, it may also be required to extract and classify clauses from the text of the given contract document. The present method 400 may implement a clause classification model for the said purpose of extracting and classifying clauses from the text of the given contract document. In the present implementation, in some examples, the clause classification model may be part of the NER model itself. Herein, the clause classification model may first be trained, and then may be utilized for providing inferences by processing the text from the contract document(s) for clause classification. A simple model using a limited number of clauses per type may first be trained, and then be used to label clauses in a much bigger dataset. The labelled clauses may then be reviewed and corrected by experts. The corrected labelled dataset may then be used to develop the final model.
The clause classification model may utilize one or more machine learning techniques for its implementation. In the present examples, the clause classification model may be based on any one of a Naive Bayes classifier, Logistic Regression, a Support Vector Machine (SVM) algorithm, a neural embeddings based classifier, Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) models, transfer learning using transformer models, and the like. Such algorithmic extraction may use techniques like splitting the text into paragraphs, identifying keywords from a list of predefined keywords using phrase matching in the spaCy library, automatic labelling of the text, and the like. Therefore, by extracting clauses, specific entities can be extracted from specific clauses of interest. Herein, for instance, each of the said implemented multiple NER models may be trained on extracting types of entities from specific types of clauses (as extracted using the clause classification model).
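The paragraph-splitting and keyword-based automatic labelling mentioned above may be sketched, under stated assumptions, as follows. Plain substring matching is used here in place of the phrase matching of the spaCy library so that the example has no external dependencies; the keyword lists and function names are hypothetical. Labels produced in this way would be weak labels, intended for expert review and correction before training of the final model.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical keyword lists for three of the clause categories named above.
CLAUSE_KEYWORDS: Dict[str, List[str]] = {
    "Governing Law": ["governed by", "governing law"],
    "Term": ["shall remain in force", "term of this agreement"],
    "No Assignment": ["may not assign", "shall not assign"],
}


def split_paragraphs(text: str) -> List[str]:
    """Split on blank lines and drop empty fragments."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def label_clause(paragraph: str) -> str:
    """Return the first category whose keyword occurs in the paragraph."""
    lowered = paragraph.lower()
    for label, keywords in CLAUSE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return label
    return "Unlabelled"


def auto_label(text: str) -> List[Tuple[str, str]]:
    return [(label_clause(p), p) for p in split_paragraphs(text)]


contract = (
    "This Agreement shall be governed by the laws of India.\n\n"
    "This Agreement shall remain in force for two years.\n\n"
    "The parties met in Bangalore."
)
for label, paragraph in auto_label(contract):
    print(label, "|", paragraph)
```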
[0070] Finally, at step 410, the method 400 includes associating the extracted set of entities with the given contract document. Herein, associating the extracted set of entities with the given contract document involves linking or attaching the identified entities to the document in a structured manner, typically in the form of metadata. As may be understood, metadata is data that provides information about other data, making it easier to manage, search, and analyse the content. In the context of contract documents, associating the extracted entities as metadata can involve creating a structured record that includes the identified entities such as contract parties, contract term, contract jurisdiction, contract effective date, and contract type. This record can be stored alongside the contract document, either within the document itself or in a separate database or metadata repository. The association of these entities as metadata offers several benefits for contract management and analysis, such as, but not limited to, enhanced searchability, improved organization, streamlined analysis, and the like.
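One possible shape for such a structured metadata record is sketched below. The record fields follow the entity types named above, while the class name, the entity labels and the JSON serialization are assumptions made for illustration; the present disclosure does not prescribe a particular storage format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ContractMetadata:
    document_id: str
    parties: List[str] = field(default_factory=list)
    term: Optional[str] = None
    jurisdiction: Optional[str] = None
    effective_date: Optional[str] = None
    contract_type: Optional[str] = None


# Hypothetical mapping from entity labels to single-valued record fields.
SINGLE_VALUED = {
    "TERM": "term",
    "JURISDICTION": "jurisdiction",
    "EFFECTIVE_DATE": "effective_date",
    "CONTRACT_TYPE": "contract_type",
}


def associate(document_id: str, entities: List[Tuple[str, str]]) -> ContractMetadata:
    """Build a metadata record for one document from (label, value) pairs."""
    record = ContractMetadata(document_id=document_id)
    for label, value in entities:
        if label == "PARTY":
            record.parties.append(value)
        elif label in SINGLE_VALUED:
            setattr(record, SINGLE_VALUED[label], value)
    return record


record = associate(
    "nda_001.pdf",
    [("PARTY", "Acme Corp"), ("PARTY", "Globex Ltd"),
     ("JURISDICTION", "India"), ("CONTRACT_TYPE", "Non-Disclosure Agreement")],
)
# The serialized record could be stored in a separate metadata repository.
print(json.dumps(asdict(record), indent=2))
```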
[0071] Referring to FIG. 6, illustrated is a schematic of an inference pipeline (as represented by reference numeral 600) for implementing the NER model for extracting entities from the contract document(s) and associating the extracted entities with the contract document(s), as per the steps 404-410 of the present method 400 described in the preceding paragraphs. As shown, in the inference pipeline 600, at step 602, the contract document(s), which may be in the form of PDF file, DOCX/DOC file, etc., is first received. At step 604, the inference pipeline 600 involves extracting text from the contract document(s) by using OCR techniques, i.e., converting the text into machine readable form. At step 606, the inference pipeline 600 involves data cleaning. Herein, the data cleaning may include text processing to split the extracted text into sentences and also, if required, to prepare complete sentences from the extracted text. The data cleaning may further include removal of punctuations and stop-words from the extracted text. At step 608, the inference pipeline 600 involves data preparation. Herein, the data preparation may include data normalization, including data stemming and/or data lemmatization; data vectorization using one or more of techniques, such as TFIDF (term frequency–inverse document frequency), frequency vectorization, word embeddings, word encoding using transformer architectures, and the like; POS tagging; etc. (as described with reference to the step 510 of the modelling pipeline 500 of FIG. 5). Further, at step 610, the inference pipeline 600 involves using the data (as result of above steps) to be fed to the NER model (as trained using the modelling pipeline 500 of FIG. 5). At step 612 of the inference pipeline 600, the NER model provides extracted entities. Herein, the NER model may classify each extracted entity as per the clause categories/types to allow for corresponding analysis thereof. 
Further, in the step 612 of the inference pipeline 600, the identified entities are then associated with the respective contract document(s) in a structured manner.
[0072] The present disclosure further provides a scheme for continuous training of the implemented NER model for processing of the contract documents for extracting entities therein, in order to improve the NER model over time. That is, once deployed, the NER model needs to be continuously re-trained/updated to improve its performance. For this purpose, the present method 400 may involve providing a user-interface to allow for a provision for the user to indicate if predictions of the NER model are correct or not. For example, the extracted entities may be verified by a competent person who may further identify whether a particular entity is properly extracted or not, and then correct the prediction for the particular entity if incorrect, as would be required.
[0073] Referring to FIG. 7, illustrated is a high-level schematic of a workflow (as represented by numeral 700) for (continuous) training of the NER model for processing of contract documents. As shown, the workflow 700 includes three separate processes 710, 720, 730 which may occur simultaneously or non-simultaneously without any limitations. Herein, the process 710 relates to initial training of the NER model for development of a (base) pre-trained instance of the NER model (as represented by numeral 724); the process 720 relates to implementation of the pre-trained instance of the NER model for information extraction; and the process 730 relates to re-training of the NER model to generate an updated instance of the NER model for model improvement.
[0074] As shown, in the process 710, at step 712, user inputs related to training the NER model are received. Herein, the user inputs may be in the form of training dataset comprising information for training the NER model. In the present implementation, such user inputs may be received via a user input device (such as, the user input device 320) of a user device (such as, the user device 300). At step 714, the process 710 includes data preparation and pre-processing. Such steps of data preparation and pre-processing have been explained in the preceding paragraphs in reference to the modelling pipeline 500 (specifically, steps 502 to 510) of FIG. 5, and thus not repeated herein for brevity of the present disclosure. Further, at step 716, the process 710 includes initial training of the NER model. Such step of initial training of the NER model has been explained in the preceding paragraphs in reference to the modelling pipeline 500 (specifically, steps 512 to 518) of FIG. 5, and thus not repeated herein for brevity of the present disclosure.
[0075] Also, as shown in FIG. 7, in the process 720, at step 722, a contract document for information extraction is received. In some embodiments, the process 720 may involve checking for duplication of the contract document being utilized for training the NER model by comparing the contract document to existing contract documents having previously been used for training the NER model. That is, in order to ensure that there is no duplication of documents, the backend algorithm compares the underlying text of each document being uploaded with that of the existing documents, and disallows the upload of any duplicate. This saves the effort which may have otherwise been involved in unnecessary labelling of the contract document. Further, in the process 720, at step 724, the received contract document is processed via the pre-trained instance of the NER model. At step 726, the process 720 includes extracting information from the received contract document. In the present implementation, the process 720 may implement the pre-trained instance of the NER model 724 to parse the received contract document and identify one or more entities in the received contract document. Further, at step 728, the process 720 includes generating results in the form of a list of the one or more entities in the contract document. Such steps of implementation of the NER model have been explained in the preceding paragraphs in reference to the inference pipeline 600 of FIG. 6, and thus not repeated herein for brevity of the present disclosure.
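The duplication check described above may, for instance, be sketched by fingerprinting the underlying text of each uploaded document. The use of a SHA-256 hash over whitespace-normalized, lowercased text, as well as the class and function names, are assumptions made for this example; this approach detects only documents whose normalized text is identical, and detecting near-duplicates would require a similarity measure instead.

```python
import hashlib
import re
from typing import Set


def fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class UploadRegistry:
    """Remembers fingerprints of documents that have already been uploaded."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def try_upload(self, text: str) -> bool:
        """Return True if the document is new, False if it is a duplicate."""
        fp = fingerprint(text)
        if fp in self._seen:
            return False
        self._seen.add(fp)
        return True


registry = UploadRegistry()
print(registry.try_upload("This Agreement is made between A and B."))
print(registry.try_upload("This  Agreement is made\nbetween A and B."))
```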
[0076] Further, as shown in FIG. 7, in the process 730, at step 732, the contract document may be reviewed in respect of the extracted information therefrom. Herein, the process 730 may include allowing the user to either confirm the correctness of the corresponding one of the one or more entities with respect to the text of the contract document or to edit the corresponding one of the one or more entities with respect to the text of the contract document. At step 734, the process 730 includes utilizing the text with corresponding confirmed entities and the text with corresponding edited entities for re-training the NER model to generate an updated instance of the NER model.
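A minimal sketch of how confirmed and edited entities might be gathered into a re-training set is given below. The data structure and names are hypothetical; the re-training step itself is outside the scope of this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Review:
    text: str                        # text of the clause that was reviewed
    predicted: str                   # entity predicted by the NER model
    corrected: Optional[str] = None  # None means the reviewer confirmed the prediction


def build_retraining_set(reviews: List[Review]) -> List[Tuple[str, str]]:
    """Pair each text with the edited entity if present, else the confirmed one."""
    return [
        (r.text, r.corrected if r.corrected is not None else r.predicted)
        for r in reviews
    ]


reviews = [
    Review("Governed by the laws of India.", predicted="India"),
    Review("Between Acme Corp and Globex Ltd.", predicted="Acme", corrected="Acme Corp"),
]
print(build_retraining_set(reviews))
```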
[0077] Referring now to FIG. 8, illustrated is a schematic block diagram of a system (as represented by reference numeral 800) for implementing the present method 400 (as described) for processing contract documents. Herein, the method 400 for processing contract documents serves as the foundation for functionality of the system 800 and provides the core processes and algorithms to be executed by the system 800. The system 800 is built upon the method 400, incorporating it as a core component, and is designed to facilitate the execution of the method 400 by providing the necessary hardware and software infrastructure. The system 800 leverages the method 400 to enable efficient and automated extraction, processing, and association of entities within contract documents, rendering the database 802 of the set of contract documents 804 searchable.
[0078] As illustrated in FIG. 8, the system 800 is adapted for execution with a database (as represented by reference numeral 802) comprising a set of contract documents (as represented by reference numeral 804) and is configured for rendering searchable the database 802 for processing of the set of contract documents 804 therein. The system 800 includes an interface 806 which may be adapted for various needs of the system 800 at any particular instance. Referring to FIG. 9, as illustrated, the interface 806 is adapted as a first interface (as represented by reference numeral 900) which may be adapted for managing the database 802, as may be contemplated by a person skilled in the art. Referring to FIG. 10, as illustrated, the interface 806 is adapted as a second interface (as represented by reference numeral 1000) which may be adapted for uploading of new contract documents, either as single document upload or bulk documents upload, in the database 802. The second interface 1000 may also provide various fields for naming of the uploaded contract document(s), defining contract type for the uploaded contract document(s), providing manual tags for the uploaded contract document(s), defining effective or renewal date for the uploaded contract document(s), and the like.
[0079] Referring back to FIG. 8, in present implementation, the system 800 is executed in the computing arrangement 100, and may utilize the processing arrangement 205 therein, for executing the steps 402-410 of the method 400 for extracting a set of entities from the set of contract documents 804 and associating the extracted set of entities with the set of contract documents 804. In particular, in the system 800, the processing arrangement 205 includes a first module (as represented by reference numeral 810) which is configured to pre-process text of each contract document in the set of contract documents as per requirements of a Named Entity Recognition (NER) model (as represented by reference numeral 812), wherein the NER model 812 is trained on a training corpus of a plurality of contract documents and configured for recognizing entities in text of contract documents. The first module 810 of the processing arrangement 205 is further configured to extract a set of entities from the pre-processed text of each contract document in the set of contract documents by implementing the NER model 812. The first module 810 of the processing arrangement 205 is further configured to associate the respective extracted set of entities to corresponding one of each contract document in the set of contract documents. These steps for the first module 810 have been described in respect to the method 400 in the preceding paragraphs, and can be applied mutatis mutandis for the present system 800; and are thus not repeated herein for brevity of the present disclosure.
[0080] Referring to FIG. 11, as illustrated, the interface 806 is adapted as a third interface (as represented by reference numeral 1100) configured to receive a first input (as represented by reference numeral 808a in FIG. 8) comprising one or more keywords for searching of a desired contract document in the set of contract documents 804 in the database 802. In such case, in the system 800 of FIG. 8, the processing arrangement 205 is configured to process the first input 808a and facilitate searching through the database 802 of the set of contract documents 804. This is achieved by utilizing the extracted entities associated with each contract document, as metadata or tags, in the set of contract documents 804. In particular, the processing arrangement 205 is further configured to compare the received one or more keywords with the sets of entities associated with the contract documents in the set of contract documents 804. That is, when a user provides one or more keywords as search input, the processing arrangement 205 compares these keywords against the sets of entities associated with each contract document in the database 802. This comparison can be performed using various techniques such as exact matching, partial matching, or semantic similarity, depending on the specific requirements and the desired level of search accuracy. The processing arrangement 205 is further configured to identify one or more contract documents in the set of contract documents based on match between the received one or more keywords and corresponding associated one or more entities in the sets of entities. That is, upon comparing the received keywords with the associated sets of entities, the processing arrangement 205 identifies one or more contract documents where there is a match between the keywords and the corresponding entities in the sets of entities. 
This matching process helps the system 800 to filter and narrow down the list of relevant contract documents based on the user's search criteria. The processing arrangement 205 is further configured to generate a first output (as represented by reference numeral 808b in FIG. 8) comprising a list of the identified one or more contract documents. That is, the processing arrangement 205 provides the first output 808b with the list of the identified contract documents that match the user's search query. This efficient search capability is enabled by the extraction and association of entities from the contract documents using the NER model 812, as described. Herein, as shown in FIG. 11, the third interface 1100 is further configured to display the list of the identified one or more contract documents. This may be achieved by providing the first output 808b to the user device 300, as shown in FIG. 8.
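The comparison of keywords against the associated sets of entities may be sketched as follows, covering the exact and partial matching techniques mentioned above. The in-memory index, the function name and the choice to return a document when any one keyword matches are assumptions made for this example; semantic similarity matching is not shown, as it would typically rely on an embedding model.

```python
from typing import Dict, List

# Hypothetical index: document identifier -> entities associated as metadata.
INDEX: Dict[str, List[str]] = {
    "nda_001.pdf": ["Acme Corp", "Non-Disclosure Agreement", "India"],
    "svc_002.pdf": ["Globex Ltd", "Service Agreement", "Singapore"],
}


def search(keywords: List[str], index: Dict[str, List[str]], partial: bool = True) -> List[str]:
    """Return documents having at least one entity that matches any keyword."""
    results: List[str] = []
    for doc_id, entities in index.items():
        lowered = [entity.lower() for entity in entities]
        for keyword in keywords:
            needle = keyword.lower()
            if partial:
                matched = any(needle in entity for entity in lowered)
            else:
                matched = needle in lowered
            if matched:
                results.append(doc_id)
                break  # one matching keyword is enough for this document
    return results


print(search(["acme"], INDEX))
print(search(["agreement"], INDEX))
print(search(["india"], INDEX, partial=False))
```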
[0081] Referring to FIG. 12, as illustrated, the interface 806 is adapted as a fourth interface (as represented by reference numeral 1200) configured to receive a second input (as represented by reference numeral 808c in FIG. 8) for selection of one of contract documents from the displayed list of the identified one or more contract documents. That is, when the list of identified contract documents is displayed to the user, they have the option to further refine their search or focus on a specific document. To facilitate this, the system 800 receives the second input 808c from the user, indicating their selection of one of the contract documents from the displayed list. In such case, the processing arrangement 205 is further configured to generate a second output (as represented by reference numeral 808d in FIG. 8) comprising an identified respective set of entities corresponding to the said one of contract documents based on the received second input 808c for the selection of one of contract documents. That is, upon receiving the second input 808c, the processing arrangement 205 may retrieve the respective set of entities corresponding to the selected contract document. These entities, which were previously extracted using the NER model 812 and associated with the respective contract document as metadata or tags, provide valuable information about content of the corresponding contract document. Further, as shown in FIG. 12, the fourth interface 1200 is configured to display the identified respective set of entities corresponding to the said one of contract documents. That is, the system 800 then displays the respective set of entities to the user, offering a summary or an overview of the essential information contained within the selected contract document. This allows the user to quickly understand the key details of the document without having to read through the entire text, thus saving time and effort. 
Moreover, it enables the user to easily compare and analyse different contract documents in the set of contract documents 804 based on the displayed entities, further improving the efficiency of the document search and review process.
[0082] Referring to FIG. 13, as illustrated, the interface 806 is adapted as a fifth interface (as represented by reference numeral 1300) configured for searching the set of contract documents 804. In particular, to enhance the user's experience in searching and analysing the contract documents, the system 800 provides a filter functionality, via the fifth interface 1300, based on the associated entities of the contract documents in the set of contract documents 804. This filter presents a list of entities to the user, which can be used to further narrow down the search results or focus on specific aspects of the documents. The fifth interface 1300 is further configured to receive a third input (as represented by reference numeral 808e in FIG. 8) for selection of one of the entities from the list of entities. That is, upon presenting the list of entities, the system 800 receives the third input 808e from the user, indicating their selection of one of the entities from the list. Such third input 808e enables the user to refine their search or analysis by focusing on the chosen entity, allowing them to quickly access relevant information within the set of contract documents 804. As shown in FIG. 13, the third input 808e may be in the form of selection of one or more of contract type, entity (tag) name, contract date, etc. By providing the filter functionality with a list of entities, the system 800 empowers users to efficiently navigate through the set of contract documents 804 based on the metadata or tags associated with them. This feature not only simplifies the search process but also allows users to easily compare and analyse specific aspects of multiple contract documents, leading to more informed decision-making and an overall improvement in the management of contract documents.
[0083] In such case, in the system 800, the processing arrangement 205 is further configured to filter one or more contract documents in the set of contract documents 804 based on a match between the received third input 808e for selection of one of the entities from the list of entities and the corresponding associated one or more entities in the sets of entities. That is, upon receiving the third input 808e from the user, the processing arrangement 205 filters the contract documents in the set of contract documents 804 by identifying those documents whose associated entities match the selected entity from the list. This filtering process allows the user to focus on a specific aspect or attribute of the contract documents that is of interest to them. The processing arrangement 205 is further configured to generate a third output (as represented by reference numeral 808f in FIG. 8) comprising a list of the filtered one or more contract documents. That is, once the filtering process is complete, the processing arrangement 205 generates the third output 808f comprising the list of the filtered contract documents. This list presents the user with a narrowed-down set of contract documents that contain the selected entity, making it easier for the user to locate and analyse the relevant information. Further, as illustrated in FIG. 13, the fifth interface 1300 is configured to display the list of the filtered one or more contract documents. By providing this filtering functionality, the system 800 further improves the efficiency of searching and analysing contract documents, enabling users to quickly identify the documents that are most relevant to their needs and interests. This feature streamlines the contract management process and supports more informed decision-making in various business contexts.
[0084] Referring to FIG. 14, in some embodiments, the interface 806 is adapted as a sixth interface (as represented by reference numeral 1400) configured to display a portion of text of each one of the filtered one or more contract documents being displayed, wherein the said portion of text comprises at least one of the entities corresponding to the received third input for selection of one of the entities. Such sixth interface 1400 is designed to enhance the user experience by displaying a portion of text from each of the filtered contract documents. This portion of text contains at least one of the entities corresponding to the received third input 808e for the selection of one of the entities. By displaying the relevant portions of text, the sixth interface 1400 allows users to quickly assess the context in which the selected entity appears within the filtered contract documents. This feature provides users with an immediate understanding of how the entity relates to the content of each contract document, without requiring them to open and read the entire document. This additional display functionality not only improves the overall usability of the system 800 but also enables users to efficiently locate and review the specific information they are interested in. As a result, users can make more informed decisions and better manage their contracts, saving time and resources in the process.
[0085] In some embodiments, the interface 806, or specifically the sixth interface 1400, is further configured to highlight the said at least one of the entities in the displayed portion of text of each one of the filtered one or more contract documents. That is, the sixth interface 1400 is further optimized to improve the user experience by highlighting the said at least one of the entities in the displayed portion of text of each one of the filtered contract documents. By visually emphasizing the selected entities, users can quickly identify and focus on the relevant information within the displayed text. This highlighting feature not only directs users' attention to the specific entities they are interested in but also simplifies the process of locating and reviewing the related information within the contract documents. As a result, users can more easily understand the context in which the highlighted entities appear, enabling them to assess the significance of the entities within the contract document. Thereby, the system 800 significantly streamlines the process of searching and analysing contract documents, leading to increased efficiency and productivity for users managing large volumes of contracts.
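The display of a portion of text around a selected entity, with the entity highlighted, may be sketched as below. The fixed character window and the use of asterisks as a highlight marker are assumptions made for illustration; an actual interface would apply its own visual styling.

```python
import re
from typing import Optional


def highlight_snippet(text: str, entity: str, window: int = 30,
                      marker: str = "**") -> Optional[str]:
    """Return the text around the first occurrence of the entity, with it marked."""
    match = re.search(re.escape(entity), text, flags=re.IGNORECASE)
    if match is None:
        return None
    start = max(0, match.start() - window)
    end = min(len(text), match.end() + window)
    return (text[start:match.start()]
            + marker + match.group(0) + marker
            + text[match.end():end])


clause = "This Agreement shall be governed by the laws of India and its courts."
print(highlight_snippet(clause, "India", window=20))
```

The match is case-insensitive, but the snippet preserves the casing found in the document, so that the displayed text remains faithful to the contract.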
[0086] It may be understood that when documents are uploaded into a smart repository, tags need to be manually set. If hundreds of documents are being uploaded at a time, setting up tags is a laborious process. Since identifying tags, document types, effective dates, renewal dates, etc. is essential to the functioning of the smart repository, automation of the process of tag identification using the machine learning model would be helpful. Herein, an NER based algorithm is used to extract tags for the uploaded documents. Some tags of interest are: party1, party2 and effective date, which are available in the party clause of a contract; and contract type, which may, for example, be available in the heading of the document and the term/termination clause of the contract document. The proposed process of dynamic review and the present machine learning model training platform may be used to train the tag extraction model, which may further be fine-tuned over time. Firstly, the entities are identified by a reviewer in corresponding clauses. That is, the first step is to train a model with some data. Ideally, the user may need to identify where the entity can be found in the entire document, and use only that paragraph for training the machine learning model. The reviewer may identify the required information for 20-30 such clauses, and the training process is initiated to train the model. Then, a prediction module may identify hidden information from the remaining thousands of clauses. A list of such paragraphs across multiple documents may then be used for training the model. In an example, separate models may be created to identify each or a group of these entities by training the corresponding NER models. The NER algorithm is trained by using data with tags in an IOB form (short for inside, outside, beginning), extracted from various documents. The machine learning model may be trained using a variety of models that are available in the open domain. The process can be iterated multiple times till a desired level of accuracy is achieved. Once the models are trained, tag extraction is carried out on relevant clauses using the inference pipeline. The users can extract information of their interest using this framework. This gives users an intuitive and quick way to extract custom information from text data.
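By way of a non-limiting illustration, the IOB form of the training data may be sketched as follows. The helper names to_iob and from_iob, the whitespace tokenization, the example sentence and the PARTY1 / PARTY2 label names are assumptions made only for this example and do not represent the actual annotation scheme or NER model of the present disclosure.

```python
def to_iob(tokens, spans):
    """Convert reviewer-marked entity spans into IOB tags.

    `spans` holds (start, end, label) token-index ranges, end exclusive.
    The first token of an entity gets B-<label>, later tokens I-<label>,
    and every token outside an entity gets O.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def from_iob(tokens, tags):
    """Group IOB-tagged tokens back into (label, text) entities.

    An I- tag that does not continue an open entity of the same label
    is ignored in this sketch.
    """
    entities, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(current)
            current = (tag[2:], [token])
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(token)
        else:
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return [(label, " ".join(words)) for label, words in entities]


# Hypothetical party clause, tokenized on whitespace for simplicity.
tokens = "This Agreement is made between Acme Corp and Beta LLC".split()
spans = [(5, 7, "PARTY1"), (8, 10, "PARTY2")]

tags = to_iob(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
print(from_iob(tokens, tags))
```

Here to_iob corresponds to the annotation step performed on reviewer-identified entities, while from_iob corresponds to reading tags back out of per-token predictions at inference time.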
[0087] The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the present disclosure.
Claims: WE CLAIM:
1. A computer-implemented method for processing of contract documents, the method comprising:
providing a Named Entity Recognition (NER) model trained on a training corpus of a plurality of contract documents and configured for recognizing entities in text of contract documents;
receiving a given contract document;
pre-processing text of the given contract document as per requirements of the NER model;
extracting a set of entities from the pre-processed text of the given contract document by implementing the NER model; and
associating the extracted set of entities with the given contract document.

2. The method as claimed in claim 1, wherein the NER model is trained by:
pre-processing text in each contract document in the training corpus of the plurality of contract documents by utilizing one or more of tokenization technique, normalization technique, and encoding technique;
annotating the pre-processed text in each contract document in the training corpus of the plurality of contract documents, to generate labels representing entities within the text in each contract document in the training corpus of the plurality of contract documents; and
training the NER model for extracting the entities in the contract documents by applying machine learning technique(s) to the pre-processed and annotated text from each contract document in the training corpus of the plurality of contract documents.

3. The method as claimed in claim 2, wherein annotating the pre-processed text comprises generating Inside-Outside-Beginning (IOB) labels for the pre-processed text in each contract document in the training corpus of the plurality of contract documents.

4. The method as claimed in claim 2, wherein the machine learning technique(s) comprises at least one of Conditional Random Fields (CRFs), Bidirectional Long Short-Term Memory (BiLSTM), Convolutional Neural Networks (CNN), Embeddings from Language Models (ELMo), Stanford NLP, and transformer-based models.

5. The method as claimed in claim 1, wherein the NER model is configured to be implemented for a group of types of entities, and wherein multiple NER models, configured for different groups of types of entities, are implemented to extract the set of entities in the pre-processed text of the given contract document.

6. The method as claimed in claim 1, wherein the extracted set of entities comprises one or more of: contract parties, contract term, contract jurisdiction, contract effective date, contract type.

7. A system for rendering searchable a database comprising a set of contract documents, the system comprising:
a processing arrangement configured to:
pre-process text of each contract document in the set of contract documents as per requirements of a Named Entity Recognition (NER) model, wherein the NER model is trained on a training corpus of a plurality of contract documents and configured for recognizing entities in text of contract documents;
extract a set of entities from the pre-processed text of each contract document in the set of contract documents by implementing the NER model; and
associate the respective extracted set of entities to corresponding one of each contract document in the set of contract documents; and
an interface configured to receive a first input comprising one or more keywords for searching of a desired contract document in the set of contract documents,
wherein the processing arrangement is further configured to:
compare the received one or more keywords with the sets of entities associated with the contract documents in the set of contract documents;
identify one or more contract documents in the set of contract documents based on match between the received one or more keywords and corresponding associated one or more entities in the sets of entities;
generate a first output comprising a list of the identified one or more contract documents; and
wherein the interface is further configured to display the list of the identified one or more contract documents.

8. The system as claimed in claim 7, wherein the interface is further configured to receive a second input for selection of one of contract documents from the displayed list of the identified one or more contract documents.

9. The system as claimed in claim 8, wherein the processing arrangement is further configured to generate a second output comprising an identified respective set of entities corresponding to the said one of contract documents based on the received second input for the selection of one of contract documents.

10. The system as claimed in claim 9, wherein the interface is further configured to display the identified respective set of entities corresponding to the said one of contract documents.

11. The system as claimed in claim 7, wherein the interface is further configured to:
provide a filter with a list of entities based on the associated entities to the contract document in the set of contract documents; and
receive a third input for selection of one of entities from the list of entities.

12. The system as claimed in claim 11, wherein the processing arrangement is further configured to:
filter one or more contract documents in the set of contract documents based on a match between the received third input for selection of one of entities from the list of entities and the corresponding associated one or more entities in the sets of entities; and
generate a third output comprising a list of the filtered one or more contract documents.

13. The system as claimed in claim 12, wherein the interface is further configured to display the list of the filtered one or more contract documents.

14. The system as claimed in claim 13, wherein the interface is further configured to display a portion of text of each one of the filtered one or more contract documents being displayed, wherein the said portion of text comprises at least one of the entities corresponding to the received third input for selection of one of entities.

15. A computer program comprising computer executable program code which, when executed, causes a computing arrangement to perform the method according to any one of claims 1-6.

Documents

Application Documents

#	Name	Date
1	202341064475-FORM FOR STARTUP [26-09-2023(online)].pdf	2023-09-26
2	202341064475-FORM FOR SMALL ENTITY(FORM-28) [26-09-2023(online)].pdf	2023-09-26
3	202341064475-FORM 1 [26-09-2023(online)].pdf	2023-09-26
4	202341064475-FIGURE OF ABSTRACT [26-09-2023(online)].pdf	2023-09-26
5	202341064475-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [26-09-2023(online)].pdf	2023-09-26
6	202341064475-EVIDENCE FOR REGISTRATION UNDER SSI [26-09-2023(online)].pdf	2023-09-26
7	202341064475-DRAWINGS [26-09-2023(online)].pdf	2023-09-26
8	202341064475-DECLARATION OF INVENTORSHIP (FORM 5) [26-09-2023(online)].pdf	2023-09-26
9	202341064475-COMPLETE SPECIFICATION [26-09-2023(online)].pdf	2023-09-26
10	202341064475-Proof of Right [06-12-2023(online)].pdf	2023-12-06
11	202341064475-FORM-26 [22-12-2023(online)].pdf	2023-12-22