Method And System To Extract And Analyse An Information

Abstract: A system and method for providing one or more recommendations. The method encompasses transforming, one or more Curriculum Vitae (CVs) in a corresponding HTML format comprising HTML element(s). The method thereafter comprises extracting, section(s) from the CV(s) based on an analysis of the HTML element(s) present in the corresponding HTML format of the CV(s). Further the method encompasses extracting, information from the section(s) using at least one of a set of rules, a pre-trained data set, a layout based sectional identification, a layout and text based parsing, one or more natural language processing techniques and one or more patterns associated with the one or more sections. The method thereafter comprises segregating, the information into an important and/or a non-important information based on a work history context associated with the CV(s) and/or normalized job title(s), to provide the one or more recommendations based on the important information.

Patent Information

Application #

Filing Date

20 March 2021

Publication Number

52/2022

Publication Type

INA

Invention Field

BIO-MEDICAL ENGINEERING

Status

Parent Application

Applicants

HT Media Ltd

18-20, HT House, KG Marg, New Delhi- 110001

Inventors

1. Madhukar Kumar

L 1401, JMD Garden, Sohna Road, Sector 33, Gurgaon, Haryana - 122001

2. Krishan Kumar Arya

3/919, G2, Sector-3, Vasundhara, Ghaziabad, Uttar Pradesh - 201012

3. Kumar Harsh

Flat no. 13 B, Kailash apartment, behind Rajeshwar hospital, Tilak Nagar, kankarbagh, Patna, Bihar - 800020

4. Narender Singh

B1/144 Gali no 3, Near Main Shiv Market, Vijay Enclave Dwarka Palam Road, Delhi - 110045

5. Shubham Bharadwaj

W-56 Regency Park 2, DLF Phase 4, Gurgaon, Haryana - 122001

6. Aman Kumar Sawarn

C P N Colony, Club Road, Mithanpura, Muzaffarpur, Bihar - 842002

Claims

2. The method as claimed in claim 1, wherein each format from the one or more formats is one of a doc format, docx format, pdf format and rtf format.

3. The method as claimed in claim 1, wherein extracting, by the extraction unit [106], one or more information from the one or more sections further comprises: generating, by the extraction unit [106], one or more sentences for each of the one or more sections using at least one of the set of rules, the pre-trained data set, a density parameter and an analysis of the one or more CVs, and extracting, by the extraction unit [106], the one or more information from the one or more sections based at least on the one or more sentences.

4. The method as claimed in claim 3, wherein the set of rules comprises one or more rules to divide each section from the one or more sections into one or more sentences based on one or more field information stored in one or more data libraries, and wherein the one or more data libraries comprises: at least one of a manually curated and an automatically curated data for one or more fields, and a trie based data structure.

5. The method as claimed in claim 3, wherein the pre-trained data set comprises at least a trained data associated with multiple sentences.

6. A system of providing one or more recommendations, the system comprising: a transceiver unit [102], configured to receive, one or more Curriculum Vitae (CVs) wherein the one or more Curriculum Vitae (CVs) are received in one or more formats; a processing unit [104], configured to transform, each CV from the one or more CVs in a corresponding HTML format comprising one or more HTML elements; and an extraction unit [106], configured to extract: one or more sections from the one or more CVs based on an analysis of the one or more HTML elements present in the corresponding HTML format of the one or more CVs, and one or more information from the one or more sections using at least one of a set of rules, a pre-trained data set, a layout based sectional identification, a layout and text based parsing, one or more natural language processing techniques and one or more patterns associated with the one or more sections, wherein the processing unit [104] is further configured to: segregate, each information from the one or more information associated with the one or more sections into one of an important information and a non-important information based on at least one of a work history context associated with the one or more CVs and one or more normalized job titles, and provide, the one or more recommendations based on the important information.

7. The system as claimed in claim 6, wherein each format from the one or more formats is one of a doc format, docx format, pdf format and rtf format.

8. The system as claimed in claim 6, the extraction unit [106] to extract the one or more information from the one or more sections is configured to: generate, one or more sentences for each of the one or more sections using at least one of the set of rules, the pre-trained data set, a density parameter and an analysis of the one or more CVs, and extract, the one or more information from the one or more sections based at least on the one or more sentences.

9. The system as claimed in claim 8, wherein the set of rules comprises one or more rules to divide each section from the one or more sections into one or more sentences based on one or more field information stored in one or more data libraries, and wherein the one or more data libraries comprises: at least one of a manually curated and an automatically curated data for one or more fields, and a trie based data structure.

10. The system as claimed in claim 8, wherein the pre-trained data set comprises at least a trained data associated with multiple sentences.

Specification

TECHNICAL FIELD:
The present invention generally relates to the field of data management and more particularly to systems and methods for providing one or more recommendations based on an extraction and analysis of an information from one or more Curriculum Vitae (CVs).
BACKGROUND OF THE DISCLOSURE:
The following description of the related art is intended to provide background information pertaining to the field of the disclosure. This section may include certain aspects of the art that may be related to various features of the present disclosure. However, it should be appreciated that this section is used only to enhance the understanding of the reader with respect to the present disclosure, and not as admissions of the prior art.
The Curriculum Vitae (CV)/resume of a candidate is an important document for analyzing parameters such as the skills, work experience and educational qualifications of the candidate. Further, with the advancement in digital technologies it is now possible to manage a huge number of CVs via a digital platform, but it remains challenging to analyze each CV to extract relevant information. The analysis and extraction of the relevant information from a CV may further be used to provide various recommendations to recruiters and/or job seekers, but as the resume/CV data is unstructured, it is difficult to digitally gather relevant data out of such unstructured data. Also, manually analyzing and gathering relevant data from a huge number of CVs is not feasible.
Furthermore, with the advancement of the digital technologies it is now possible to store a CV in different formats such as a pdf file, a doc file etc. Therefore there are high chances that the huge number of CVs uploaded on a database of a digital platform may have different formats and hence the extraction of information and analysis of such CVs becomes a complex and time consuming process.
Therefore, in order to at least provide one or more relevant recommendations to at least one of the recruiters and the candidates, there is a need in the art to provide a solution to overcome the problems related to extraction and parsing of relevant data from CVs and to provide a solution to efficiently analyze and extract relevant data from one or more CVs having unstructured data and different file formats.
Although existing technologies have provided various solutions to extract and analyze information from CV(s), these currently known solutions have many limitations and fail to provide recommendation(s) based on an efficient and effective analysis and extraction of relevant information from CV(s). Therefore, there is a need for improvement in this area of technology.
SUMMARY OF THE DISCLOSURE
This section is provided to introduce certain objects and aspects of the present invention in a simplified form that are further described below in the detailed description. This summary is not intended to identify the key features or the scope of the claimed subject matter.
In order to overcome at least some of the drawbacks mentioned in the previous section and those otherwise known to persons skilled in the art, an object of the present invention is to provide a method and system for providing one or more recommendations based on an extraction and analysis of information from one or more Curriculum Vitae (CVs). Another object of the present invention is to extract relevant information from different sections such as Designation, Skills etc. of one or more CVs having unstructured data and different file formats. Also, an object of the present invention is to convert the unstructured data of one or more CVs into a structured form such that the structured data can be used across various domains like Search, Recommendations, CV Styler etc.

Another object of the present invention is to convert resume data into a structured form irrespective of its file format. Also, an object of the present invention is to identify important and scalable information out of the resume(s). Yet another object of the present invention is to segregate various parameters such as important and non-important skills, past employer and current employer, client name and company name etc. from the CV(s), at least to provide various recommendations.
Furthermore, in order to achieve the aforementioned objectives, the present invention provides a method and system for providing one or more recommendations.
A first aspect of the present invention relates to the method for providing one or more recommendations. The method comprises receiving, by a transceiver unit, one or more Curriculum Vitae (CVs) wherein the one or more Curriculum Vitae (CVs) are received in one or more formats. The method further comprises transforming, by a processing unit, each CV from the one or more CVs in a corresponding HTML format comprising one or more HTML elements. The method thereafter comprises extracting, by an extraction unit, one or more sections from the one or more CVs based on an analysis of the one or more HTML elements present in the corresponding HTML format of the one or more CVs. Further the method encompasses extracting, by the extraction unit, one or more information from the one or more sections using at least one of a set of rules, a pre-trained data set, a layout based sectional identification, a layout and text based parsing, one or more natural language processing techniques and one or more patterns associated with the one or more sections. The method thereafter comprises segregating, by the processing unit, each information from the one or more information associated with the one or more sections into one of an important information and a non-important information based on at least one of a work history context associated with the one or more CVs and one or more normalized job titles. Thereafter the method comprises providing, by the processing unit, the one or more recommendations based on the important information.

Another aspect of the present invention relates to a system for providing one or more recommendations. The system comprises a transceiver unit, configured to receive, one or more Curriculum Vitae (CVs) wherein the one or more Curriculum Vitae (CVs) are received in one or more formats. The system also comprises a processing unit, configured to transform, each CV from the one or more CVs in a corresponding HTML format comprising one or more HTML elements. Also, the system comprises an extraction unit, configured to extract one or more sections from the one or more CVs based on an analysis of the one or more HTML elements present in the corresponding HTML format of the one or more CVs. Further the extraction unit is configured to extract one or more information from the one or more sections using at least one of a set of rules, a pre-trained data set, a layout based sectional identification, a layout and text based parsing, one or more natural language processing techniques and one or more patterns associated with the one or more sections. Also, the processing unit is further configured to: segregate, each information from the one or more information associated with the one or more sections into one of an important information and a non-important information based on at least one of a work history context associated with the one or more CVs and one or more normalized job titles, and provide, the one or more recommendations based on the important information.
BRIEF DESCRIPTION OF DRAWINGS
The accompanying drawings, which are incorporated herein, and constitute a part of this disclosure, illustrate exemplary embodiments of the disclosed methods and systems in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry of each component. It will be appreciated by those skilled in the art that disclosure of such drawings includes disclosure of electrical components, electronic components or circuitry commonly used to implement such components.

Figure 1 illustrates an exemplary block diagram of a system [100] for providing one or more recommendations, in accordance with exemplary embodiments of the present invention.
Figure 2 illustrates an exemplary method flow diagram [200], for providing one or more recommendations, in accordance with exemplary embodiments of the present invention.
The foregoing shall be more apparent from the following more detailed description of the disclosure.
DESCRIPTION OF THE INVENTION
In the following description, for the purposes of explanation, various specific details are set forth in order to provide a thorough understanding of embodiments of the present disclosure. It will be apparent, however, that embodiments of the present disclosure may be practiced without these specific details. Several features described hereafter can each be used independently of one another or with any combination of other features. An individual feature may not address any of the problems discussed above or might address only some of the problems discussed above.
The ensuing description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the disclosure as set forth.
Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail.
Also, it is noted that individual embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed but could have additional steps not included in a figure.
The word "exemplary" and/or "demonstrative" is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as "exemplary" and/or "demonstrative" is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms "includes," "has," "contains," and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive—in a manner similar to the term "comprising" as an open transition word—without precluding any additional or other elements.
As used herein, a "processing unit" or "processor" or "operating processor" includes one or more processors, wherein processor refers to any logic circuitry for processing instructions. A processor may be a general-purpose processor, a special purpose processor, a conventional processor, a digital signal processor, a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits, Field Programmable Gate Array circuits, any other type of integrated circuits, a graphics processing unit etc. The processor may perform signal coding, data processing, input/output processing, and/or any other functionality that enables the working of the system according to the present disclosure. More specifically, the processor or processing unit is a hardware processor. Furthermore, to execute certain operations, the processing unit/processor as disclosed in the present disclosure may include one or more Central Processing Units (CPUs) and one or more Graphics Processing Units (GPUs), selected based on said certain operations. Furthermore, the graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs have a highly parallel structure which makes them more efficient than general-purpose central processing units (CPUs) at processing large blocks of data in parallel, in use cases that include modern deep learning algorithms which operate on vectors and tensors and have millions of parameters to process. Such use cases generally require the parallel processing capability of the GPU, which may reduce the computation by up to 20 times compared to the CPU. In an implementation of the present invention, the Central Processing Unit (CPU) and/or the Graphics Processing Unit (GPU) are implemented at least for general processing (such as processing of one or more rules, data extraction and analysis etc.) and natural language processing techniques, respectively.
As used herein, "a user equipment", "a user device", "a smart-user-device", "a smart-device", "an electronic device", "a mobile device", "a handheld device", "a wireless communication device", "a mobile communication device", "a communication device" may be any electrical, electronic and/or computing device or equipment, capable of implementing the features of the present disclosure. The user equipment/device may include, but is not limited to, a mobile phone, smart phone, laptop, a general-purpose computer, desktop, personal digital assistant, tablet computer, wearable device or any other computing device which is capable of implementing the features of the present disclosure. Also, the user device may contain at least one input means configured to receive an input from a transceiver unit, a processing unit, a storage unit and any other such unit(s) which are required to implement the features of the present disclosure.
As used herein, "storage unit" or "memory unit" refers to a machine or computer-readable medium including any mechanism for storing information in a form readable by a computer or similar machine. For example, a computer-readable medium includes read-only memory ("ROM"), random access memory ("RAM"), magnetic disk storage media, optical storage media, flash memory devices or other types of machine-accessible storage media. The storage unit stores at least the data that may be required by one or more units of the system to perform their respective functions.
As used herein, a "transceiver unit" may comprise one or more transmitter units and one or more receiver units, configured to transmit and receive respectively, at least one of one or more signals, data and commands from various units/modules of the system/server/user device to implement the features of the present disclosure. The transceiver unit may be any such transmitting and receiving unit known to the person skilled in the art, to implement the features of the present disclosure.
As disclosed in the background section, existing technologies have many limitations and in order to overcome at least some of the limitations of the prior known solutions, the present disclosure provides a solution for efficiently and effectively providing one or more recommendations based on an extraction and analysis of an information from one or more Curriculum Vitae (CVs). More specifically, the solution as disclosed in the present disclosure encompasses use of a data extracted from sections like personal, professional, education & skills etc. of the CV(s) to generate a relevant data, wherein such relevant data is further used to provide the one or more recommendations. In an implementation as used herein a 'recommendation' may comprise details of one or more relevant candidates (i.e., the one or more candidates having details similar to the relevant data) and said recommendation may be provided on a user device of one or more recruiters. In another implementation as used herein a 'recommendation' may comprise details of one or more relevant jobs (i.e., the one or more jobs having details similar to the relevant data) and said recommendation may be provided on a user device of one or more candidates (job seekers). Therefore, the relevant data (i.e., an important data) also helps in improving search experience of recruiters and/or candidates.
Furthermore, for extraction and analysis of the information/data from the section(s) of the one or more CVs, the present disclosure discloses use of technologies including, but not limited to, custom transformer-based architectures, various rules based on trie based library lookups, regex based matching, grammar based extraction to extract and parse CV/resume data, parallel execution of modules/units etc. Further, to segregate the data extracted from the section(s) of the CV(s) into relevant and irrelevant data, the present disclosure discloses use of a work history context associated with the CV(s), normalized job title(s) and similar data. The solution as disclosed in the present invention is capable of parsing all formats of CVs and identifying relevant/important information (such as Job Skills etc.), along with segregating important and non-important information (for instance skills) out of the CVs.
The present invention therefore provides a novel solution of providing one or more recommendations. The present invention also provides a technical advancement over the currently known solutions at least by: providing novel techniques for parsing and extracting information from one or more CVs having different file formats, generating structured data from the unstructured data of CV(s), and reducing manual effort in identification and extraction of relevant data from a huge number of CVs. Also, based on the features of the present disclosure, a reduction in manual steps for a candidate during profile creation/update is achieved. Also, during searching, use of relevant/important data as generated in the present solution (such as important skills, last 2 companies and designation etc.) can improve the search experience for recruiters. The implementation of the features of the present disclosure provides structured information, which further results in better recommendations of jobs to the candidate(s) based on their parsed structured information from the resume, such as important skills, organizations, highest degree etc. Also, the features of the present disclosure result in better recommendations of candidates to the recruiter(s) based on mapping of a job description to the parsed structured information of the CV(s) of the candidate(s).
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily carry out the present disclosure.
Referring to Figure 1, an exemplary block diagram of a system [100] for providing one or more recommendations is shown. The system [100] comprises at least one transceiver unit [102], at least one processing unit [104], at least one extraction unit [106] and at least one storage unit [108]. Also, all of the components/ units of the system [100] are assumed to be connected to each other unless otherwise indicated below. Also, in Fig. 1 only a few units are shown, however, the system [100] may comprise multiple such units or the system [100] may comprise any such numbers of said units, as required to implement the features of the present disclosure. Further, in an implementation, the system [100] may be present in a server device to implement the features of the present invention.
The system [100] is configured to provide one or more recommendations, with the help of the interconnection between the components/units of the system [100].
The transceiver unit [102] of the system [100] is connected to the at least one processing unit [104], the at least one extraction unit [106] and the at least one storage unit [108]. The transceiver unit [102] is configured to receive, one or more Curriculum Vitae (CVs) wherein the one or more Curriculum Vitae (CVs) are received in one or more formats. Also, each format from the one or more formats is one of a doc format, docx format, pdf format and rtf format, but the same is not limited thereto and there may be any format of the CV that is obvious to a person skilled in the art. Also, the transceiver unit [102] is configured to receive the one or more CVs from at least one of the storage unit [108], one or more cloud storages and/or one or more external storage units.
Further, the transceiver unit [102] is configured to transmit the received one or more CVs to the processing unit [104] of the system [100]. The processing unit [104] is further connected to the extraction unit [106] and the storage unit [108]. Also, after receiving the one or more CVs, the processing unit [104] is configured to analyse each CV of the one or more CVs. Thereafter based on the analysis of the one or more CVs, the processing unit [104] is configured to transform each CV from the one or more CVs in a corresponding HTML format comprising one or more HTML elements. In an implementation the processing unit [104] that is configured to analyse the one or more CVs, to convert the one or more CVs into the corresponding HTML format is a Central Processing Unit (CPU).
Further, the processing unit [104] is also configured to provide the one or more CVs in their corresponding HTML format to the extraction unit [106]. In an implementation the extraction unit [106] is same as that of a processing unit that is obvious to a person skilled in the art to implement the functions of the present disclosure. After receiving the one or more CVs in their corresponding HTML format, the extraction unit [106] is configured to analyze the one or more HTML elements present in the corresponding HTML format of the one or more CVs. Thereafter, the extraction unit [106] is configured to identify and extract one or more sections from the one or more CVs based on the analysis of the one or more HTML elements present in the corresponding HTML format of the one or more CVs. In an example a section may be one of a personal details section, an education details section, a professional details section and a skills details section, but the same is not limited thereto.
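For illustration only, the identification of sections from HTML elements described above may be sketched as follows. This is a minimal sketch and not the implementation of the extraction unit [106]: it assumes that section headings appear inside heading-like HTML elements and that their text matches a small keyword table. The names SECTION_KEYWORDS, HEADING_TAGS, SectionExtractor and extract_sections are hypothetical and do not appear in the present disclosure.

```python
from html.parser import HTMLParser

# Hypothetical heading keywords mapped to section types (assumption).
SECTION_KEYWORDS = {
    "personal details": "personal",
    "education": "education",
    "work experience": "professional",
    "professional experience": "professional",
    "skills": "skills",
}
# Assumed set of HTML elements that may carry a section heading.
HEADING_TAGS = {"h1", "h2", "h3", "h4", "b", "strong"}


class SectionExtractor(HTMLParser):
    """Group the text of an HTML-converted CV under detected section headings."""

    def __init__(self):
        super().__init__()
        self.sections = {}       # section type -> list of text fragments
        self._current = None     # section currently being filled
        self._heading_depth = 0  # > 0 while inside a heading-like element

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._heading_depth += 1

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._heading_depth > 0:
            self._heading_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._heading_depth > 0:
            section = SECTION_KEYWORDS.get(text.lower().rstrip(":"))
            if section is not None:
                # A recognised heading starts a new section.
                self._current = section
                self.sections.setdefault(section, [])
                return
        # Text before the first recognised heading is ignored in this sketch.
        if self._current is not None:
            self.sections[self._current].append(text)


def extract_sections(html):
    parser = SectionExtractor()
    parser.feed(html)
    parser.close()
    return parser.sections
```

In this sketch, emphasised text that is not a known heading (for example a skill shown in bold) is kept as ordinary content of the current section.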
After extracting the one or more sections from the one or more CVs, the extraction unit [106] is configured to parse the one or more sections to extract one or more information from the one or more sections. Also, each information of the one or more information may include data relating to the section from which it is extracted, for instance: an information extracted from a personal details section may include, but is not limited to, a person name (candidate name, father name, spouse name etc.), an email, a contact number etc.; an information extracted from a professional details section may include, but is not limited to, a work history such as company, designation, duration, job specific skills etc. and skills data such as work specific/generic/important skills etc.; and an information extracted from an educational details section may include, but is not limited to, education details such as degree, college, duration etc. More particularly, the extraction unit [106] is configured to extract the one or more information from the one or more sections using at least one of a set of rules, a pre-trained data set, a layout based sectional identification, a layout and text based parsing, one or more natural language processing techniques and one or more patterns associated with the one or more sections, wherein the one or more patterns may be identified based on various use cases. As used herein, the layout is an HTML layout having a trie based structure including one or more headings, where each heading represents a type of section. Therefore, for extraction of one or more information from a CV using the HTML layout (i.e., the one or more headings in the trie based structure), one or more sections are identified and/or parsed, and based on such identification and/or parsing the one or more information is extracted. Also, as used herein, the pattern indicates a writing pattern associated with one or more sections present in a CV; for instance, the writing manner (i.e., the words, language etc. used) in which information in a particular section is captured may be referred to as a pattern.
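As a hedged illustration of pattern based extraction from a personal details section, the following sketch uses regular expressions to pick out an email and a contact number. The patterns EMAIL_RE and PHONE_RE and the function extract_contact_fields are assumptions made for this example only; the present disclosure does not specify the actual patterns, and the contact number pattern assumes a 10 digit number with an optional country code.

```python
import re

# Assumed patterns for illustration; real CVs would need broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"(?:\+?\d{1,3}[\s-]?)?\d{10}\b")


def extract_contact_fields(section_text):
    """Return the first email and contact number found in the section text."""
    email = EMAIL_RE.search(section_text)
    phone = PHONE_RE.search(section_text)
    return {
        "email": email.group(0) if email else None,
        "contact_number": phone.group(0) if phone else None,
    }
```

A field for which no pattern matches is returned as None, so that a later step can fall back to another technique.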
In an implementation other techniques such as the techniques that are obvious to a person skilled in the art to extract the one or more information may also be used. Also, the extraction unit [106] configured to process the techniques to extract the one or more information may be a Graphics Processing Unit (GPU).

In an implementation, to extract the one or more information from the one or more sections, the extraction unit [106] is configured to generate, one or more sentences for each of the one or more sections using at least one of the set of rules, the pre-trained data set, a density parameter and an analysis of the one or more CVs. Once the one or more sentences for each of the one or more sections are generated, the extraction unit [106] is then configured to extract the one or more information from said one or more sentences. Also, the set of rules used to generate the one or more sentences to further extract the one or more information comprises one or more rules to divide each section from the one or more sections into the one or more sentences based on one or more field information. The one or more field information is stored in one or more data libraries, wherein the one or more data libraries comprises at least one of a manually curated and an automatically curated data for the one or more fields, and a trie based data structure. Also, each field of the one or more fields may be any field associated with the one or more sections, including but not limited to Company, Designation, Skills, College, Course, City, Names (Person's name) and like fields obvious to a person skilled in the art. The one or more data libraries therefore hold the curated data for such fields in order to identify relevant fields/information from the one or more sections for generation of the one or more sentences. Also, the trie based data structure of the one or more data libraries enables faster lookups and therefore provides a technical advancement over the known art.
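The trie based lookup of field information described above may be sketched as follows. This is a minimal sketch under stated assumptions, not the data library of the present disclosure: the trie is keyed on lowercase whitespace-separated tokens, the longest library entry starting at each position is preferred, and punctuation handling is omitted. The names TrieNode, FieldTrie, add and find_all are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # token -> TrieNode
        self.field = None   # field label if a library entry ends here


class FieldTrie:
    """Token-level trie holding curated entries for fields such as Skills or City."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, phrase, field):
        node = self.root
        for token in phrase.lower().split():
            node = node.children.setdefault(token, TrieNode())
        node.field = field

    def find_all(self, text):
        """Return (phrase, field) pairs, taking the longest match at each position."""
        tokens = text.lower().split()
        matches = []
        i = 0
        while i < len(tokens):
            node = self.root
            best_end, best_field = None, None
            j = i
            while j < len(tokens) and tokens[j] in node.children:
                node = node.children[tokens[j]]
                j += 1
                if node.field is not None:
                    best_end, best_field = j, node.field
            if best_end is not None:
                matches.append((" ".join(tokens[i:best_end]), best_field))
                i = best_end  # continue after the matched phrase
            else:
                i += 1
        return matches
```

Each lookup walks at most as many nodes as the longest entry has tokens, independent of the number of entries in the library, which is the property that makes a trie attractive for large curated lists.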
Further, the density parameter indicates a ratio of a total number of a specific type of keywords (such as work experience related keywords etc.) in a section to a total number of words in said section. In an implementation a sentence may be generated for the section that is associated with the highest density parameter. In an event where the same density parameter is determined for two or more sections of a CV, a sentence may be generated for the section that has an upper level in the trie based structure (i.e., in the HTML format) of the CV. Also, the analysis of the one or more CVs for the generation of the one or more sentences for the one or more sections may include an analysis of the whole CVs from which the one or more sections are extracted to further extract the one or more information.
Furthermore, the pre-trained data set used for the generation of the one or more sentences comprises at least a trained data associated with multiple sentences. In an implementation said multiple sentences may be further associated with a plurality of CVs. More specifically, in said implementation each sentence of the multiple sentences may be manually and/or automatically mapped with one or more entities/fields present in one or more CVs from the plurality of CVs. Further, these multiple sentences mapped with the fields may be multi-processed and transformed into the trained data comprising details including but not limited to resume/CV index, sentence index, sentence label, entity/field, part of speech (POS) tag columns etc. Also, in an event, sentences associated with only non-entities from said multiple sentences may be massively undersampled to create more focus on detecting patterns of occurrence of one or more entities in the trained data. Furthermore, in an event, a sub-system (such as an ML model) may be fine-tuned using the trained data in order to generate the one or more sentences for each of the one or more sections.
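The density parameter and its tie-breaking rule, as described above, can be sketched as follows. The keyword set, the function names and the use of an integer `depth` to represent the level of a section in the HTML structure are assumptions for illustration; they are not taken from the disclosure.

```python
import re

# Hypothetical work-experience keywords; a real list would come from the data libraries.
WORK_KEYWORDS = {"experience", "worked", "company", "manager", "employed", "designation"}


def density(section_text, keywords):
    """Ratio of keyword occurrences to the total number of words in the section."""
    words = re.findall(r"[a-z']+", section_text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in keywords)
    return hits / len(words)


def pick_section(sections, keywords):
    """Pick the section with the highest density.

    `sections` is a list of (name, depth, text) tuples, where a smaller depth
    means an upper level in the HTML structure. Ties on density are broken in
    favour of the smaller depth.
    """
    return max(sections, key=lambda s: (density(s[2], keywords), -s[1]))


sections = [
    ("Summary", 2, "worked as manager"),
    ("Experience", 1, "company experience gained"),
]
chosen = pick_section(sections, WORK_KEYWORDS)
```

In this example both sections have a density of 2/3, so the section at the upper level (depth 1) is selected.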
Further, once the one or more sentences are generated for the one or more sections, the extraction unit [106] is configured to extract the one or more information from the one or more sections based at least on the one or more sentences. Also, the extraction unit [106] is thereafter configured to provide the one or more information associated with the one or more sections to the processing unit [104]. The processing unit [104] is then configured to segregate each information from the one or more information into one of an important information and a non-important information based on at least one of a work history context associated with the one or more CVs and one or more normalized job titles. The one or more job titles comprise one or more titles associated with one or more jobs, and the one or more job titles may be normalized by incorporating synonyms, by identifying and correcting misspellings and other variations in the one or more job titles, and by techniques such as skill mapping and the like normalization techniques that are obvious to a person skilled in the art. Also, the work history context associated with the one or more CVs may comprise one or more historical details related to one or more work profiles indicated in the one or more CVs. Therefore, the processing unit [104] is configured to segregate each information from the one or more information into one of the important information and the non-important information based on the one or more historical details related to the one or more work profiles indicated in the one or more CVs and the one or more normalized titles associated with one or more jobs. For example, if a normalized job title is 'Manager for ABC profile with 5 years' experience', a work history context comprises a historical work detail related to ABC profile, and an information extracted from a professional details section of a CV indicates a work history - ABC Manager from last 5 years, the processing unit [104] in the given example is configured to segregate the information (i.e., the work history - ABC Manager from last 5 years) as an important information based on the matching details in the work history context (i.e., historical work detail related to ABC profile) and the normalized job title (i.e., Manager for ABC profile with 5 years' experience).
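The normalization of job titles by synonyms and misspelling correction, as described above, can be sketched as follows. The synonym table, the list of canonical titles and the similarity cutoff are hypothetical; the fuzzy matching uses `difflib.get_close_matches` from the Python standard library, which is a swapped-in technique and not necessarily the one used in the disclosed system.

```python
import difflib

# Hypothetical abbreviation/synonym table and canonical title list.
SYNONYMS = {"sr": "senior", "jr": "junior", "mgr": "manager", "engg": "engineer"}
KNOWN_TITLES = ["senior manager", "software engineer", "data analyst"]


def normalize_title(raw, cutoff=0.8):
    """Expand abbreviations, then snap near-misses to a known canonical title."""
    tokens = [SYNONYMS.get(t, t) for t in raw.lower().replace(".", " ").split()]
    candidate = " ".join(tokens)
    # Misspelling correction: take the closest known title above the cutoff, if any.
    close = difflib.get_close_matches(candidate, KNOWN_TITLES, n=1, cutoff=cutoff)
    return close[0] if close else candidate


normalized = normalize_title("Sr. Manger")
```

Titles with no sufficiently close canonical form are returned in their expanded, lower-cased form rather than being forced onto an unrelated title.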
After segregating the one or more information associated with the one or more sections into one of the important information and the non-important information, the processing unit [104] is further configured to provide the one or more recommendations based on the important information. In an implementation as used herein the 'recommendation' may comprise details of one or more relevant candidates (i.e., the one or more candidates having details similar to the important information) and said recommendation may be provided on the user device of one or more recruiters. In another implementation as used herein the 'recommendation' may comprise details of one or more relevant jobs (i.e., the one or more jobs having details similar to the important information) and said recommendation may be provided on the user device of one or more candidates (job seekers). For instance, in the above example where the information, i.e., the work history - ABC Manager from last 5 years, is segregated as the important information, the processing unit [104] may be further configured to provide: 1) a recommendation such as a notification comprising details from one or more CVs of one or more candidates having details similar to the work history - ABC Manager from last 5 years (i.e., the important information), on the user device of the one or more recruiters and/or 2) a recommendation such as a notification comprising details from one or more job openings having details similar to the work history - ABC Manager from last 5 years (i.e., the important information), on the user device of the one or more candidates.
Therefore, the present disclosure provides a novel and inventive solution of providing the one or more recommendations based on the extraction and analysis of the information from the one or more Curriculum Vitae (CVs).
Referring to Figure 2 an exemplary method flow diagram [200], for providing one or more recommendations, in accordance with exemplary embodiments of the present invention is shown. In an implementation the method is performed by the system [100]. Further, in an implementation, the system [100] may be present in a server device to implement the features of the present invention. Also, as shown in Figure 2, the method starts at step [202].
At step [204] the method comprises receiving, by the transceiver unit [102], one or more Curriculum Vitae (CVs) wherein the one or more Curriculum Vitae (CVs) are received in one or more formats. Also, each format from the one or more formats is one of a doc format, docx format, pdf format and rtf format, but the same is not limited thereto and there may be any format of the CV that is obvious to a person skilled in the art. Also, the transceiver unit [102] receives the one or more CVs from at least one of the storage unit [108], the one or more cloud storages and/or the one or more external storage units. Further, the method comprises transmitting by the transceiver unit [102], the received one or more CVs to the processing unit [104] of the system [100].
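As an illustration of handling CVs received in the doc, docx, pdf and rtf formats, the following minimal sketch identifies the format of a received file from its leading bytes. The function name `sniff_format` is hypothetical, and the disclosure does not state how formats are detected; note also that the ZIP signature only indicates a ZIP container, so treating it as docx is an assumption of this sketch.

```python
def sniff_format(data: bytes):
    """Guess the CV file format from well-known leading byte signatures."""
    if data.startswith(b"%PDF"):
        return "pdf"
    if data.startswith(b"{\\rtf"):
        return "rtf"
    if data.startswith(b"PK\x03\x04"):
        # ZIP container; assumed here to be a docx file.
        return "docx"
    if data.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        # OLE2 compound file, the container used by legacy .doc files.
        return "doc"
    return None
```

A caller could use the returned value to route each CV to a format-specific converter before the transformation at step [206].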

Next at step [206] after receiving the one or more CVs at the processing unit [104], the method comprises transforming, by the processing unit [104], each CV from the one or more CVs in a corresponding HTML format comprising one or more HTML elements, wherein said transformation is based on an analysis of each CV of the one or more CVs by the processing unit [104]. In an implementation the processing unit [104] that analyses the one or more CVs, to convert the one or more CVs into the corresponding HTML format is a Central Processing Unit (CPU).
Further, the one or more CVs in their corresponding HTML format are provided to the extraction unit [106] by the processing unit [104]. In an implementation the extraction unit [106] is same as that of a processing unit that is obvious to a person skilled in the art to implement the functions of the present disclosure. Next, at step [208] the method comprises identifying and extracting, by the extraction unit [106], one or more sections from the one or more CVs based on an analysis of the one or more HTML elements present in the corresponding HTML format of the one or more CVs. In an example a section may be one of a personal details section, an education details section, a professional details section and a skills details section, but the same is not limited thereto and there may be any section that may be present in a CV and is obvious to a person skilled in the art.
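The identification of sections from HTML elements at step [208] can be sketched as below, under the assumption that section titles appear as heading elements (h1 to h4) in the converted HTML. The class `SectionParser` and the function `extract_sections` are hypothetical names; the parsing relies on `html.parser.HTMLParser` from the Python standard library.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4"}  # assumed markers of section titles


class SectionParser(HTMLParser):
    """Group the text that follows each heading under that heading's title."""

    def __init__(self):
        super().__init__()
        self.sections = {}
        self._current = None
        self._in_heading = False
        self._heading_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._in_heading = True
            self._heading_parts = []

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._in_heading:
            self._in_heading = False
            self._current = " ".join(self._heading_parts).strip()
            self.sections.setdefault(self._current, [])

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_heading:
            self._heading_parts.append(text)
        elif self._current is not None:
            self.sections[self._current].append(text)


def extract_sections(html):
    parser = SectionParser()
    parser.feed(html)
    parser.close()
    return {name: " ".join(parts) for name, parts in parser.sections.items()}


cv_html = (
    "<h2>Education</h2><p>B.Tech, XYZ College</p>"
    "<h2>Skills</h2><ul><li>Python</li><li>SQL</li></ul>"
)
sections = extract_sections(cv_html)
```

Text that appears before the first heading is ignored in this sketch; a fuller implementation might assign it to a default section such as personal details.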
After extracting the one or more sections from the one or more CVs, at step [210] the method comprises extracting, by the extraction unit [106], one or more information from the one or more sections using at least one of a set of rules, a pre-trained data set, a layout based sectional identification, a layout and text based parsing, one or more natural language processing techniques and one or more patterns associated with the one or more sections, wherein the one or more patterns may be identified based on various use cases. As used herein the layout is an HTML layout having a trie based structure including one or more headings, where each heading represents a type of section; therefore, for extraction of one or more information from a CV using the HTML layout (i.e., the one or more headings in the trie based structure), one or more sections are identified and/or parsed. Thereafter, based on such identification and/or parsing, the one or more information is extracted. Also as used herein the pattern indicates a writing pattern associated with one or more sections present in a CV; for instance, a writing manner (i.e., the words, language etc. used) in which an information in a particular section is captured may be referred to as a pattern. Also, each information of the one or more information may include data relating to the section from which it is extracted, for instance: an information extracted from a personal details section may include, but is not limited to, a person name (candidate name, father name, spouse name etc.), an email, a contact number etc.; an information extracted from a professional details section may include, but is not limited to, a work history such as company, designation, duration, job specific skills etc. and skills data such as work specific/generic/important skills etc.; and an information extracted from an educational details section may include, but is not limited to, education details such as degree, college, duration etc. In an implementation other techniques such as the techniques that are obvious to a person skilled in the art to extract the one or more information may also be used. Also, the extraction unit [106] that processes the techniques to extract the one or more information may be a Graphics Processing Unit (GPU).
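As one example of pattern based extraction from a personal details section, the following sketch pulls an email address and a contact number out of free text with regular expressions. The two patterns and the function `extract_contacts` are simplified assumptions for illustration and would not cover every real-world format.

```python
import re

# Simplified, hypothetical patterns for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{8,}\d")


def extract_contacts(text):
    """Return the first email and phone number found in the text, if any."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "email": email.group() if email else None,
        # Strip spaces and hyphens so the number is stored in a uniform form.
        "phone": re.sub(r"[\s-]", "", phone.group()) if phone else None,
    }


details = extract_contacts(
    "Name: A. Kumar Email: a.kumar@example.com Phone: +91 98765-43210"
)
```

Fields such as person names typically need more than a regular expression, which is where the data libraries and natural language processing techniques mentioned above would come in.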
Also, in an implementation the process of extracting, by the extraction unit [106], the one or more information from the one or more sections comprises generating, by the extraction unit [106], one or more sentences for each of the one or more sections using at least one of the set of rules, the pre-trained data set, a density parameter and an analysis of the one or more CVs. Further, in the given implementation, once the one or more sentences for each of the one or more sections are generated, said process encompasses extracting, by the extraction unit [106], the one or more information from the one or more sections based at least on the one or more sentences. Also, the set of rules used to generate the one or more sentences to further extract the one or more information comprises one or more rules to divide each section from the one or more sections into the one or more sentences based on one or more field information. The one or more field information is stored in one or more data libraries, wherein the one or more data libraries comprise at least one of a manually curated and an automatically curated data for the one or more fields, and a trie based data structure. Also, each field of the one or more fields may be any field associated with the one or more sections, including but not limited to Company, Designation, Skills, College, Course, City, Names (Person's name) and the like fields obvious to a person skilled in the art. Therefore, the one or more data libraries comprise at least one of the manually curated and the automatically curated data for the fields such as Company, Designation, Skills, College, Course, City, Names (Person's name) and the like fields obvious to a person skilled in the art, to identify relevant fields/information from the one or more sections for generation of the one or more sentences. Also, the trie based data structure of the one or more data libraries enables faster lookups and therefore provides a technical advancement over the known art. Further, the density parameter indicates a ratio of a total number of a specific type of keywords (such as work experience related keywords etc.) in a section to a total number of words in said section. In an implementation a sentence may be generated for the section that is associated with the highest density parameter. In an event where the same density parameter is determined for two or more sections of a CV, a sentence may be generated for the section that has an upper level in the trie based structure (i.e., in the HTML format) of the CV. Also, the analysis of the one or more CVs for the generation of the one or more sentences for the one or more sections may include an analysis of the whole CVs from which the one or more sections are extracted to further extract the one or more information.
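A rule that divides a section into sentences based on field information can be sketched as follows. Here a new sentence is started whenever a line contains a known Designation term; the term set, the function name `split_into_sentences` and the choice of Designation as the boundary field are assumptions for illustration, since the disclosure leaves the concrete rules open.

```python
# Hypothetical Designation entries that mark the start of a new work-history sentence.
KNOWN_DESIGNATIONS = {"manager", "engineer", "analyst"}


def split_into_sentences(section_lines, boundary_terms):
    """Group consecutive lines into sentences, breaking at lines with a boundary term."""
    sentences, current = [], []
    for line in section_lines:
        words = {w.strip(",.;:").lower() for w in line.split()}
        # Start a new sentence when a known field value appears and one is in progress.
        if words & boundary_terms and current:
            sentences.append(" ".join(current))
            current = []
        current.append(line.strip())
    if current:
        sentences.append(" ".join(current))
    return sentences


lines = [
    "Manager, ABC Ltd, 2018-2021",
    "Led a team of 10",
    "Engineer, XYZ Corp, 2015-2018",
    "Built billing systems",
]
sentences = split_into_sentences(lines, KNOWN_DESIGNATIONS)
```

Each resulting sentence then carries one designation together with its surrounding details, which keeps the later field extraction scoped to a single work-history entry.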
Furthermore, the pre-trained data set used for the generation of the one or more sentences comprises at least a trained data associated with multiple sentences. In an implementation said multiple sentences may be further associated with a plurality of CVs. More specifically, in said implementation each sentence of the multiple sentences may be manually and/or automatically mapped with one or more entities/fields present in one or more CVs from the plurality of CVs. Further, these multiple sentences mapped with the fields may be multi-processed and transformed into the trained data comprising details including but not limited to resume/CV index, sentence index, sentence label, entity/field, part of speech (POS) tag columns etc. Also, in an event, sentences associated with only non-entities from said multiple sentences may be massively undersampled to create more focus on detecting patterns of occurrence of one or more entities in the trained data. Furthermore, in an event, a sub-system (such as an ML model) may be fine-tuned using the trained data in order to generate the one or more sentences for each of the one or more sections.
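The undersampling of sentences that contain only non-entities can be sketched as follows. The row layout (keys `cv_index`, `sentence_index`, `entities`), the `keep_fraction` value and the function name `undersample` are hypothetical; the disclosure does not state a sampling ratio.

```python
import random


def undersample(rows, keep_fraction=0.1, seed=0):
    """Keep every sentence that has entities and a small random share of the rest."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    with_entities = [r for r in rows if r["entities"]]
    without_entities = [r for r in rows if not r["entities"]]
    k = max(1, int(len(without_entities) * keep_fraction)) if without_entities else 0
    return with_entities + rng.sample(without_entities, k)


# Illustrative training rows: 3 sentences with entities, 20 without.
rows = [
    {"cv_index": 0, "sentence_index": i, "entities": ["Company"]} for i in range(3)
] + [
    {"cv_index": 0, "sentence_index": 3 + i, "entities": []} for i in range(20)
]
sampled = undersample(rows)
```

With these inputs the 20 entity-free sentences are reduced to 2, so sentences with entities make up most of the retained data.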
Further, the method comprises providing, by the extraction unit [106], the one or more information associated with the one or more sections to the processing unit [104]. Thereafter, at step [212] the method comprises segregating, by the processing unit [104], each information from the one or more information associated with the one or more sections into one of an important information and a non-important information based on at least one of a work history context associated with the one or more CVs and one or more normalized job titles. The one or more job titles comprise one or more titles associated with one or more jobs, and the one or more job titles may be normalized by incorporating synonyms, by identifying and correcting misspellings and other variations in the one or more job titles, and by techniques such as skill mapping and the like normalization techniques that are obvious to a person skilled in the art. Also, the work history context associated with the one or more CVs may comprise one or more historical details related to one or more work profiles indicated in the one or more CVs. Therefore, the processing unit [104] segregates each information from the one or more information into one of the important information and the non-important information based on the one or more historical details related to the one or more work profiles indicated in the one or more CVs and the one or more normalized titles associated with one or more jobs. For example, if a normalized job title is 'Sr. Manager for CBA profile with 5 years' experience', a work history context comprises a historical work detail related to CBA profile, and an information extracted from a professional details section of a CV indicates a work history - CBA Expert from last 4+ years, the processing unit [104] in the given example segregates the information (i.e., the work history - CBA Expert from last 4+ years) as an important information based on the matching details in the work history context (i.e., historical work detail related to CBA profile) and the normalized job title (i.e., Sr. Manager for CBA profile with 5 years' experience).
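The segregation in the above example can be sketched with a simple term-overlap test. The matching criterion, the stop-word list and the function names are assumptions for illustration; the disclosure does not specify how the match between an information, the work history context and the normalized job title is computed.

```python
import re

# Hypothetical stop words that should not count as a match on their own.
STOP_WORDS = {"for", "with", "from", "last", "years", "and", "the", "of", "in"}


def terms(text):
    """Lower-cased alphanumeric tokens of the text, minus stop words."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS}


def segregate(items, work_history_context, normalized_title):
    """Split items into (important, non_important) by overlap with the reference terms."""
    reference = terms(work_history_context) | terms(normalized_title)
    important, non_important = [], []
    for item in items:
        target = important if terms(item) & reference else non_important
        target.append(item)
    return important, non_important


important, non_important = segregate(
    ["CBA Expert from last 4+ years", "Hobbies: chess and cricket"],
    work_history_context="historical work detail related to CBA profile",
    normalized_title="Sr. Manager for CBA profile with 5 years' experience",
)
```

Here the work history shares the term "cba" with both references and is marked important, while the hobbies entry shares none and is marked non-important.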
Next after segregating the one or more information associated with the one or more sections into one of the important information and the non-important information at step [214] the method comprises providing, by the processing unit [104], the one or more recommendations based on the important information. In an implementation as used herein the 'recommendation' may comprise details of one or more relevant candidates (i.e., the one or more candidates having details similar to the important information) and said recommendation may be provided on the user device of one or more recruiters. In another implementation as used herein the 'recommendation' may comprise details of one or more relevant jobs (i.e., the one or more jobs having details similar to the important information) and said recommendation may be provided on the user device of one or more candidates (job seekers). For instance in the above example where the information i.e., the work history - CBA Expert from last 4+ years is segregated as the important information, the processing unit [104] may provide: 1) a recommendation such as a notification comprising of details from one or more CVs of one or more candidates having similar details as that of the work history - CBA Expert from last 4+ years (i.e., the important information), on the user device of the one or more recruiters and/or 2) a recommendation such as a notification comprising of details from one or more job openings having similar details as that of the work history - CBA Expert from last 4+ years (i.e., the important information), on the user device of the one or more candidates.
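One way to select jobs or candidates "having details similar to the important information" at step [214] is to rank them by a set similarity score. The sketch below uses Jaccard similarity, which is a swapped-in technique chosen for illustration and is not stated in the disclosure; the function names and sample data are likewise hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two term collections: |A and B| / |A or B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(important_terms, candidates, top_n=3):
    """Return up to top_n candidate names with non-zero similarity, best first."""
    scored = [(jaccard(important_terms, t), name) for name, t in candidates.items()]
    scored = [(score, name) for score, name in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_n]]


# Illustrative job openings described by their terms.
jobs = {
    "CBA Lead": {"cba", "lead"},
    "Chef": {"cooking"},
    "CBA Expert": {"cba", "expert"},
}
recommended = recommend({"cba", "expert"}, jobs)
```

The same function can rank candidate CVs for a recruiter by passing the terms of each CV's important information as `candidates`.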
After providing the one or more recommendations, the method terminates at step [216].

Thus, the present invention provides a novel solution for providing one or more recommendations. The present invention also provides a technical advancement over the currently known solutions at least by: providing novel techniques for parsing and extracting information from one or more CVs having different file formats, generating structured data from the unstructured data of the CV(s), and reducing manual effort in the identification and extraction of relevant data from a large number of CVs. Also, based on the features of the present disclosure, a reduction in manual steps for a candidate during profile creation/updating is achieved. Also, during searching, the use of relevant/important data as generated in the present solution (such as important skills, last 2 companies and designation etc.) can improve the search experience for recruiters. The implementation of the features of the present disclosure provides structured information, which further results in better recommendations of jobs to the candidate(s) on the basis of their parsed structured information from the resume, such as important skills, organizations, highest degree etc. Also, the features of the present disclosure result in better recommendations of candidates to the recruiter(s) on the basis of mapping a job description to the parsed structured information of the CV(s) of the candidate(s).
While considerable emphasis has been placed herein on the preferred embodiments, it will be appreciated that many embodiments can be made and that many changes can be made in the preferred embodiments without departing from the principles of the invention. These and other changes in the preferred embodiments of the invention will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.

We Claim:

1. A method of providing one or more recommendations, the method comprising:
receiving, by a transceiver unit [102], one or more Curriculum Vitae (CVs) wherein the one or more Curriculum Vitae (CVs) are received in one or more formats;
transforming, by a processing unit [104], each CV from the one or more CVs in a corresponding HTML format comprising one or more HTML elements;
extracting, by an extraction unit [106], one or more sections from the one or more CVs based on an analysis of the one or more HTML elements present in the corresponding HTML format of the one or more CVs;
extracting, by the extraction unit [106], one or more information from the one or more sections using at least one of a set of rules, a pre-trained data set, a layout based sectional identification, a layout and text based parsing, one or more natural language processing techniques and one or more patterns associated with the one or more sections;
segregating, by the processing unit [104], each information from the one or more information associated with the one or more sections into one of an important information and a non-important information based on at least one of a work history context associated with the one or more CVs and one or more normalized job titles; and
providing, by the processing unit [104], the one or more recommendations based on the important information.

2. The method as claimed in claim 1, wherein each format from the one or more formats is one of a doc format, docx format, pdf format and rtf format.
3. The method as claimed in claim 1, wherein extracting, by the extraction unit [106], one or more information from the one or more sections further comprises:
generating, by the extraction unit [106], one or more sentences for each of the one or more sections using at least one of the set of rules, the pre-trained data set, a density parameter and an analysis of the one or more CVs, and
extracting, by the extraction unit [106], the one or more information from the one or more sections based at least on the one or more sentences.
4. The method as claimed in claim 3, wherein the set of rules comprises one or more rules to divide each section from the one or more sections into one or more sentences based on one or more field information stored in one or more data libraries, and wherein the one or more data libraries comprises:
at least one of a manually curated and an automatically curated data for one or more fields, and
a trie based data structure.
5. The method as claimed in claim 3, wherein the pre-trained data set comprises at least a trained data associated with multiple sentences.
6. A system of providing one or more recommendations, the system comprising:

a transceiver unit [102], configured to receive, one or more Curriculum Vitae (CVs) wherein the one or more Curriculum Vitae (CVs) are received in one or more formats;
a processing unit [104], configured to transform, each CV from the one or more CVs in a corresponding HTML format comprising one or more HTML elements; and
an extraction unit [106], configured to extract:
one or more sections from the one or more CVs based on an analysis of the one or more HTML elements present in the corresponding HTML format of the one or more CVs, and
one or more information from the one or more sections using at least one of a set of rules, a pre-trained data set, a layout based sectional identification, a layout and text based parsing, one or more natural language processing techniques and one or more patterns associated with the one or more sections, wherein the processing unit [104] is further configured to:
segregate, each information from the one or more information associated with the one or more sections into one of an important information and a non-important information based on at least one of a work history context associated with the one or more CVs and one or more normalized job titles, and
provide, the one or more recommendations based on the important information.
7. The system as claimed in claim 6, wherein each format from the one or more formats is one of a doc format, docx format, pdf format and rtf format.

8. The system as claimed in claim 6, wherein the extraction unit [106], to extract the one or more information from the one or more sections, is configured to:
generate, one or more sentences for each of the one or more sections using at least one of the set of rules, the pre-trained data set, a density parameter and an analysis of the one or more CVs, and
extract, the one or more information from the one or more sections based at least on the one or more sentences.
9. The system as claimed in claim 8, wherein the set of rules comprises one or more rules to divide each section from the one or more sections into one or more sentences based on one or more field information stored in one or more data libraries, and wherein the one or more data libraries comprises:
at least one of a manually curated and an automatically curated data for one or more fields, and
a trie based data structure.
10. The system as claimed in claim 8, wherein the pre-trained data set comprises at least a trained data associated with multiple sentences.

Documents

Application Documents

#	Name	Date
1	202111011905-STATEMENT OF UNDERTAKING (FORM 3) [20-03-2021(online)].pdf	2021-03-20
2	202111011905-PROVISIONAL SPECIFICATION [20-03-2021(online)].pdf	2021-03-20
3	202111011905-FORM 1 [20-03-2021(online)].pdf	2021-03-20
4	202111011905-DRAWINGS [20-03-2021(online)].pdf	2021-03-20
5	202111011905-FORM-26 [03-06-2021(online)].pdf	2021-06-03
6	202111011905-Proof of Right [18-06-2021(online)].pdf	2021-06-18
7	202111011905-Power of Attorney-060921.pdf	2021-10-19
8	202111011905-OTHERS-060921.pdf	2021-10-19
9	202111011905-Correspondence-060921.pdf	2021-10-19
10	202111011905-ENDORSEMENT BY INVENTORS [15-02-2022(online)].pdf	2022-02-15
11	202111011905-DRAWING [15-02-2022(online)].pdf	2022-02-15
12	202111011905-COMPLETE SPECIFICATION [15-02-2022(online)].pdf	2022-02-15
13	202111011905-FORM 18 [09-05-2022(online)].pdf	2022-05-09
14	202111011905-FER.pdf	2023-02-17
15	202111011905-AbandonedLetter.pdf	2024-02-16

Search Strategy

1	SS_202111011905E_14-02-2023.pdf