System And Method For Creating And Building A Domain Dictionary

Abstract: This disclosure relates to a system and method for dynamically creating and building a domain dictionary. In one embodiment, the method comprises computing a syntactic similarity score, a usage similarity score, and a contextual similarity score for an input word with respect to each of a plurality of domain specific words in each of a plurality of existing domains. The method further comprises computing a weighted overall similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains based on the syntactic similarity score, the usage similarity score, and the contextual similarity score. The method further comprises determining belongingness of the input word to each of the plurality of existing domains based on the weighted overall similarity score. FIG. 3
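The scoring and belongingness steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The weights, the two thresholds, the names `Scores`, `overall_score` and `belongingness`, and the rule that a domain matches when at least one of its words scores at or above the threshold are all assumptions made for illustration; the record gives no numeric values and does not say how per-word scores are aggregated per domain.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Scores:
    """The three similarity scores of an input word against one domain word."""
    syntactic: float
    usage: float
    contextual: float


# Hypothetical values: the source does not state weights or thresholds.
WEIGHTS = (0.2, 0.3, 0.5)      # syntactic, usage, contextual
SCORE_THRESHOLD = 0.6          # assumed pre-defined threshold score
MAX_DOMAINS = 2                # assumed pre-defined threshold number of domains


def overall_score(s: Scores, weights: Tuple[float, float, float] = WEIGHTS) -> float:
    """Weighted average of the three scores (normalized by the weight sum)."""
    w_syn, w_use, w_ctx = weights
    total = w_syn + w_use + w_ctx
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_syn * s.syntactic + w_use * s.usage + w_ctx * s.contextual) / total


def belongingness(
    scores_by_domain: Dict[str, List[Scores]],
    score_threshold: float = SCORE_THRESHOLD,
    max_domains: int = MAX_DOMAINS,
) -> Tuple[List[str], bool]:
    """Return (matching domains, resolved flag) for one input word.

    Assumption: a domain matches if any of its words reaches the threshold.
    The word is treated as unresolved when no domain matches or when more
    domains match than max_domains allows.
    """
    matched = sorted(
        domain
        for domain, per_word in scores_by_domain.items()
        if any(overall_score(s) >= score_threshold for s in per_word)
    )
    resolved = 0 < len(matched) <= max_domains
    return matched, resolved


if __name__ == "__main__":
    example = {
        "finance": [Scores(0.9, 0.8, 0.7), Scores(0.1, 0.1, 0.1)],
        "medicine": [Scores(0.2, 0.1, 0.3)],
    }
    print(belongingness(example))
```

Normalizing by the weight sum keeps the overall score in the same range as its inputs, so a single threshold can be applied regardless of how the weights are chosen.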

Patent Information

Application #:
Filing Date: 21 December 2016
Publication Number: 25/2018
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Email:
Parent Application:
Patent Number:
Legal Status:
Grant Date: 2022-11-10
Renewal Date:

Applicants

WIPRO LIMITED
Doddakannelli, Sarjapur Road, Bangalore 560035, Karnataka, India.

Inventors

1. RAGHAVENDRA HOSABETTU
#3080, Venkatadri Nilaya, 2nd Main, 3rd Cross, VHBCS Layout, Banashankari 3rd Stage, Bangalore – 560085, Karnataka, India.

Specification

Claims: WE CLAIM
1. A method for creating and building a domain dictionary, the method comprising:
computing, by a domain dictionary creation and building (DDCB) engine, a syntactic similarity score, a usage similarity score, and a contextual similarity score for an input word with respect to each of a plurality of domain specific words in each of a plurality of existing domains;
computing, by the DDCB engine, a weighted overall similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains based on the syntactic similarity score, the usage similarity score, and the contextual similarity score; and
determining, by the DDCB engine, belongingness of the input word to each of the plurality of existing domains based on the weighted overall similarity score.
2. The method of claim 1, further comprising determining the input word from an input document by:
extracting a plurality of words from the input document; and
curating the plurality of words to determine a plurality of input words.
3. The method of claim 1, further comprising creating each of the plurality of existing domains comprising the plurality of domain specific words from a plurality of domain specific input documents by:
extracting a plurality of words from the plurality of domain specific input documents for a given domain;
curating the plurality of words to determine a plurality of domain specific input words;
computing a contextual similarity score for each of the plurality of domain specific input words as a parent word or a child word with respect to each of the plurality of remaining domain specific input words;
determining a fuzzy membership value for each of the plurality of domain specific input words based on the corresponding contextual similarity score; and
building an n-array tree of the plurality of domain specific input words for the given domain based on the fuzzy membership value for each of the plurality of domain specific input words.
4. The method of claim 1, wherein computing the syntactic similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains comprises determining a Jaro distance between the input word and the given domain specific word.
5. The method of claim 1, wherein computing the usage similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains comprises determining a ratio between a number of occurrences of the input word in a neighborhood of each of the plurality of domain specific words in each of the plurality of existing domains and a total number of occurrences of the input word.
6. The method of claim 1, wherein computing the contextual similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains comprises determining a ratio between a number of occurrences of the input word as a parent word or as a child word with respect to each of the plurality of domain specific words in each of the plurality of existing domains and a total number of occurrences of the input word.
7. The method of claim 1, wherein determining belongingness of the input word comprises:
comparing the weighted overall similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains with a pre-defined threshold score to determine one or more of the plurality of existing domains to which the input word belongs; and
comparing a number of the one or more existing domains to which the input word belongs with a pre-defined threshold number of domains.
8. The method of claim 7, further comprising categorizing the input word as an unresolved input word if:
the weighted overall similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains is less than the pre-defined threshold score; or
the number of the one or more existing domains to which the input word belongs exceeds the pre-defined threshold number of domains.
9. The method of claim 8, further comprising resolving a plurality of unresolved input words upon reaching a pre-defined threshold number of unresolved input words, or at a pre-defined time interval, or upon an indication by a user.
10. The method of claim 7, further comprising providing the input word along with the one or more of the plurality of existing domains to which the input word belongs for cognitive validation and correction.
11. The method of claim 7, further comprising building an n-array tree for each of the one or more of the plurality of existing domains to which the input word belongs.
12. A system for creating and building a domain dictionary, the system comprising:
at least one processor; and
a memory for storing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising:
computing a syntactic similarity score, a usage similarity score, and a contextual similarity score for an input word with respect to each of a plurality of domain specific words in each of a plurality of existing domains;
computing a weighted overall similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains based on the syntactic similarity score, the usage similarity score, and the contextual similarity score; and
determining belongingness of the input word to each of the plurality of existing domains based on the weighted overall similarity score.
13. The system of claim 12, wherein the operations further comprise determining the input word from an input document by:
extracting a plurality of words from the input document; and
curating the plurality of words to determine a plurality of input words.
14. The system of claim 12, wherein the operations further comprise creating each of the plurality of existing domains comprising the plurality of domain specific words from a plurality of domain specific input documents by:
extracting a plurality of words from the plurality of domain specific input documents for a given domain;
curating the plurality of words to determine a plurality of domain specific input words;
computing a contextual similarity score for each of the plurality of domain specific input words as a parent word or a child word with respect to each of the plurality of remaining domain specific input words;
determining a fuzzy membership value for each of the plurality of domain specific input words based on the corresponding contextual similarity score; and
building an n-array tree of the plurality of domain specific input words for the given domain based on the fuzzy membership value for each of the plurality of domain specific input words.
15. The system of claim 12, wherein determining belongingness of the input word comprises:
comparing the weighted overall similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains with a pre-defined threshold score to determine one or more of the plurality of existing domains to which the input word belongs; and
comparing a number of the one or more existing domains to which the input word belongs with a pre-defined threshold number of domains.
16. The system of claim 15, wherein the operations further comprise:
categorizing the input word as an unresolved input word if:
the weighted overall similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains is less than the pre-defined threshold score; or
the number of the one or more existing domains to which the input word belongs exceeds the pre-defined threshold number of domains; and
resolving a plurality of unresolved input words upon reaching a pre-defined threshold number of unresolved input words, or at a pre-defined time interval, or upon an indication by a user.
17. The system of claim 15, wherein the operations further comprise:
providing the input word along with the one or more of the plurality of existing domains to which the input word belongs for cognitive validation and correction; and
building an n-array tree for each of the one or more of the plurality of existing domains to which the input word belongs.
18. A non-transitory computer-readable medium storing instructions for creating and building a domain dictionary, wherein upon execution of the instructions by one or more processors, the processors perform operations comprising:
computing a syntactic similarity score, a usage similarity score, and a contextual similarity score for an input word with respect to each of a plurality of domain specific words in each of a plurality of existing domains;
computing a weighted overall similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains based on the syntactic similarity score, the usage similarity score, and the contextual similarity score; and
determining belongingness of the input word to each of the plurality of existing domains based on the weighted overall similarity score.
19. The non-transitory computer-readable medium of claim 18, wherein the operations further comprise creating each of the plurality of existing domains comprising the plurality of domain specific words from a plurality of domain specific input documents by:
extracting a plurality of words from the plurality of domain specific input documents for a given domain;
curating the plurality of words to determine a plurality of domain specific input words;
computing a contextual similarity score for each of the plurality of domain specific input words as a parent word or a child word with respect to each of the plurality of remaining domain specific input words;
determining a fuzzy membership value for each of the plurality of domain specific input words based on the corresponding contextual similarity score; and
building an n-array tree of the plurality of domain specific input words for the given domain based on the fuzzy membership value for each of the plurality of domain specific input words.
20. The non-transitory computer-readable medium of claim 18, wherein determining belongingness of the input word comprises:
comparing the weighted overall similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains with a pre-defined threshold score to determine one or more of the plurality of existing domains to which the input word belongs;
comparing a number of the one or more existing domains to which the input word belongs with a pre-defined threshold number of domains;
categorizing the input word as an unresolved input word if:
the weighted overall similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains is less than the pre-defined threshold score; or
the number of the one or more existing domains to which the input word belongs exceeds the pre-defined threshold number of domains; and
resolving a plurality of unresolved input words upon reaching a pre-defined threshold number of unresolved input words, or at a pre-defined time interval.

Dated this 21st day of December, 2016

R Ramya Rao
Of K&S Partners
Agent for the Applicant
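
The three component scores recited in claims 4 to 6 can be sketched in Python as follows. This is an illustrative reading under stated assumptions, not the claimed implementation: the Jaro measure is implemented in its standard similarity form (1.0 for identical strings), the neighborhood is assumed to be a fixed token window, and the parent/child relations are assumed to arrive as precomputed `(parent, child)` pairs, since the record does not say how they are derived. All function and parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple


def jaro_similarity(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        # Look for an unmatched equal character within the match window.
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def usage_similarity(tokens: Sequence[str], input_word: str,
                     domain_word: str, window: int = 3) -> float:
    """Share of input-word occurrences lying within `window` tokens of the
    domain word (the window size is an assumed neighborhood definition)."""
    positions = [i for i, t in enumerate(tokens) if t == input_word]
    if not positions:
        return 0.0
    domain_positions = [i for i, t in enumerate(tokens) if t == domain_word]
    near = sum(
        any(0 < abs(i - j) <= window for j in domain_positions)
        for i in positions
    )
    return near / len(positions)


def contextual_similarity(edges: List[Tuple[str, str]], input_word: str,
                          domain_word: str, total_occurrences: int) -> float:
    """Share of input-word occurrences that are parent or child of the
    domain word, given assumed precomputed (parent, child) pairs."""
    if total_occurrences <= 0:
        return 0.0
    linked = sum(
        1 for parent, child in edges
        if {parent, child} == {input_word, domain_word}
    )
    # Clamp so that repeated links cannot push the ratio above 1.
    return min(linked / total_occurrences, 1.0)


if __name__ == "__main__":
    text = "the loan interest rate rose while the bank cut the deposit rate"
    print(jaro_similarity("MARTHA", "MARHTA"))
    print(usage_similarity(text.split(), "rate", "interest", window=2))
```

Each function returns a value between 0 and 1, which makes the three scores directly comparable when they are later combined into a weighted overall score.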
Description: TECHNICAL FIELD
This disclosure relates generally to information processing, and more particularly to a system and method for dynamically creating and building a domain dictionary.

Documents

Application Documents

# Name Date
1 Form5_As Filed_21-12-2016.pdf 2016-12-21
2 Form3_As Filed_21-12-2016.pdf 2016-12-21
3 Form2 Title Page_Complete_21-12-2016.pdf 2016-12-21
4 Form18_Normal Request_21-12-2016.pdf 2016-12-21
5 Drawings_Normal Request_21-12-2016.pdf 2016-12-21
6 Description Complete_As Filed_21-12-2016.pdf 2016-12-21
7 Claims_As Filed_21-12-2016.pdf 2016-12-21
8 Abstract_As Filed_21-12-2016.pdf 2016-12-21
9 Form26_General Power Of Attorney_22-12-2016.pdf 2016-12-22
10 Form26_General Power Of Attorney_22-12-2016..pdf 2016-12-22
11 Correspondence by Agent_Certified Copy Of GPA _22-12-2016.pdf 2016-12-22
12 abstract 201641043708.jpg 2016-12-29
13 Other Patent Document [26-04-2017(online)].pdf 2017-04-26
14 Correspondence by Agent_Form 1_27-04-2017.pdf 2017-04-27
15 201641043708-FER.pdf 2020-05-11
16 201641043708-OTHERS [14-10-2020(online)].pdf 2020-10-14
17 201641043708-PETITION UNDER RULE 137 [14-10-2020(online)].pdf 2020-10-14
18 201641043708-RELEVANT DOCUMENTS [14-10-2020(online)].pdf 2020-10-14
19 201641043708-Information under section 8(2) [14-10-2020(online)].pdf 2020-10-14
20 201641043708-FORM 3 [14-10-2020(online)].pdf 2020-10-14
21 201641043708-FER_SER_REPLY [14-10-2020(online)].pdf 2020-10-14
22 201641043708-DRAWING [14-10-2020(online)].pdf 2020-10-14
23 201641043708-CORRESPONDENCE [14-10-2020(online)].pdf 2020-10-14
24 201641043708-COMPLETE SPECIFICATION [14-10-2020(online)].pdf 2020-10-14
25 201641043708-CLAIMS [14-10-2020(online)].pdf 2020-10-14
26 201641043708-US(14)-HearingNotice-(HearingDate-21-09-2022).pdf 2022-08-17
27 201641043708-POA [23-08-2022(online)].pdf 2022-08-23
28 201641043708-FORM 13 [23-08-2022(online)].pdf 2022-08-23
29 201641043708-Correspondence to notify the Controller [23-08-2022(online)].pdf 2022-08-23
30 201641043708-AMENDED DOCUMENTS [23-08-2022(online)].pdf 2022-08-23
31 201641043708-Written submissions and relevant documents [04-10-2022(online)].pdf 2022-10-04
32 201641043708-PatentCertificate10-11-2022.pdf 2022-11-10
33 201641043708-IntimationOfGrant10-11-2022.pdf 2022-11-10

Search Strategy

1 SearchStrategyMatrix(43708)E_30-04-2020.pdf

ERegister / Renewals

3rd: 01 Feb 2023 (From 21/12/2018 - To 21/12/2019)
4th: 01 Feb 2023 (From 21/12/2019 - To 21/12/2020)
5th: 01 Feb 2023 (From 21/12/2020 - To 21/12/2021)
6th: 01 Feb 2023 (From 21/12/2021 - To 21/12/2022)
7th: 01 Feb 2023 (From 21/12/2022 - To 21/12/2023)
8th: 19 Dec 2023 (From 21/12/2023 - To 21/12/2024)
9th: 18 Dec 2024 (From 21/12/2024 - To 21/12/2025)