Abstract: This disclosure relates to a system and method for dynamically creating and building a domain dictionary. In one embodiment, the method comprises computing a syntactic similarity score, a usage similarity score, and a contextual similarity score for an input word with respect to each of a plurality of domain specific words in each of a plurality of existing domains. The method further comprises computing a weighted overall similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains based on the syntactic similarity score, the usage similarity score, and the contextual similarity score. The method further comprises determining belongingness of the input word to each of the plurality of existing domains based on the weighted overall similarity score. FIG. 3
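The scoring flow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the weights, the threshold values, the rule that a domain qualifies when at least one of its words reaches the threshold, and the names `overall_score`, `belonging_domains`, and `classify` are all assumptions introduced for illustration. The comparison against a threshold score and a threshold number of domains follows the wording of the claims.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, float, float]  # (syntactic, usage, contextual)

# Hypothetical weights: the source states only that the overall score is weighted.
WEIGHTS: Scores = (0.5, 0.25, 0.25)


def overall_score(scores: Scores, weights: Scores = WEIGHTS) -> float:
    """Weighted sum of the three similarity scores for one domain specific word."""
    return sum(w * s for w, s in zip(weights, scores))


def belonging_domains(word_scores: Dict[str, Dict[str, Scores]],
                      threshold: float) -> List[str]:
    """Domains in which at least one domain word scores at or above the threshold.

    The 'at least one word' rule is an assumption made for this sketch.
    """
    return [
        domain
        for domain, per_word in word_scores.items()
        if any(overall_score(s) >= threshold for s in per_word.values())
    ]


def classify(word_scores: Dict[str, Dict[str, Scores]],
             threshold: float = 0.5,
             max_domains: int = 2) -> Tuple[str, List[str]]:
    """Mark the input word unresolved if no domain qualifies or too many do."""
    domains = belonging_domains(word_scores, threshold)
    if not domains or len(domains) > max_domains:
        return "unresolved", domains
    return "resolved", domains


if __name__ == "__main__":
    # Illustrative scores for one input word against two made-up domains.
    example = {
        "finance": {"loan": (0.8, 0.5, 0.5)},
        "medicine": {"dose": (0.2, 0.0, 0.0)},
    }
    print(classify(example))
```

Keeping the three component scores separate until the final weighted sum makes it easy to tune the weights per deployment without recomputing the underlying similarities.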
Claims: WE CLAIM
1. A method for creating and building a domain dictionary, the method comprising:
computing, by a domain dictionary creation and building (DDCB) engine, a syntactic similarity score, a usage similarity score, and a contextual similarity score for an input word with respect to each of a plurality of domain specific words in each of a plurality of existing domains;
computing, by the DDCB engine, a weighted overall similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains based on the syntactic similarity score, the usage similarity score, and the contextual similarity score; and
determining, by the DDCB engine, belongingness of the input word to each of the plurality of existing domains based on the weighted overall similarity score.
2. The method of claim 1, further comprising determining the input word from an input document by:
extracting a plurality of words from the input document; and
curating the plurality of words to determine a plurality of input words.
3. The method of claim 1, further comprising creating each of the plurality of existing domains comprising the plurality of domain specific words from a plurality of domain specific input documents by:
extracting a plurality of words from the plurality of domain specific input documents for a given domain;
curating the plurality of words to determine a plurality of domain specific input words;
computing a contextual similarity score for each of the plurality of domain specific input words as a parent word or a child word with respect to each of the plurality of remaining domain specific input words;
determining a fuzzy membership value for each of the plurality of domain specific input words based on the corresponding contextual similarity score; and
building an n-ary tree of the plurality of domain specific input words for the given domain based on the fuzzy membership value for each of the plurality of domain specific input words.
4. The method of claim 1, wherein computing the syntactic similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains comprises determining a Jaro distance between the input word and the given domain specific word.
5. The method of claim 1, wherein computing the usage similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains comprises determining a ratio between a number of occurrences of the input word in a neighborhood of each of the plurality of domain specific words in each of the plurality of existing domains and a total number of occurrences of the input word.
6. The method of claim 1, wherein computing the contextual similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains comprises determining a ratio between a number of occurrences of the input word as a parent word or as a child word with respect to each of the plurality of domain specific words in each of the plurality of existing domains and a total number of occurrences of the input word.
7. The method of claim 1, wherein determining belongingness of the input word comprises:
comparing the weighted overall similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains with a pre-defined threshold score to determine one or more of the plurality of existing domains to which the input word belongs; and
comparing a number of the one or more existing domains to which the input word belongs with a pre-defined threshold number of domains.
8. The method of claim 7, further comprising categorizing the input word as an unresolved input word if:
the weighted overall similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains is less than the pre-defined threshold score; or
the number of the one or more existing domains to which the input word belongs exceeds the pre-defined threshold number of domains.
9. The method of claim 8, further comprising resolving a plurality of unresolved input words upon reaching a pre-defined threshold number of unresolved input words, or at a pre-defined time interval, or upon an indication by a user.
10. The method of claim 7, further comprising providing the input word along with the one or more of the plurality of existing domains to which the input word belongs for cognitive validation and correction.
11. The method of claim 7, further comprising building an n-ary tree for each of the one or more of the plurality of existing domains to which the input word belongs.
12. A system for creating and building a domain dictionary, the system comprising:
at least one processor; and
a memory for storing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising:
computing a syntactic similarity score, a usage similarity score, and a contextual similarity score for an input word with respect to each of a plurality of domain specific words in each of a plurality of existing domains;
computing a weighted overall similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains based on the syntactic similarity score, the usage similarity score, and the contextual similarity score; and
determining belongingness of the input word to each of the plurality of existing domains based on the weighted overall similarity score.
13. The system of claim 12, wherein the operations further comprise determining the input word from an input document by:
extracting a plurality of words from the input document; and
curating the plurality of words to determine a plurality of input words.
14. The system of claim 12, wherein the operations further comprise creating each of the plurality of existing domains comprising the plurality of domain specific words from a plurality of domain specific input documents by:
extracting a plurality of words from the plurality of domain specific input documents for a given domain;
curating the plurality of words to determine a plurality of domain specific input words;
computing a contextual similarity score for each of the plurality of domain specific input words as a parent word or a child word with respect to each of the plurality of remaining domain specific input words;
determining a fuzzy membership value for each of the plurality of domain specific input words based on the corresponding contextual similarity score; and
building an n-ary tree of the plurality of domain specific input words for the given domain based on the fuzzy membership value for each of the plurality of domain specific input words.
15. The system of claim 12, wherein determining belongingness of the input word comprises:
comparing the weighted overall similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains with a pre-defined threshold score to determine one or more of the plurality of existing domains to which the input word belongs; and
comparing a number of the one or more existing domains to which the input word belongs with a pre-defined threshold number of domains.
16. The system of claim 15, wherein the operations further comprise:
categorizing the input word as an unresolved input word if:
the weighted overall similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains is less than the pre-defined threshold score; or
the number of the one or more existing domains to which the input word belongs exceeds the pre-defined threshold number of domains; and
resolving a plurality of unresolved input words upon reaching a pre-defined threshold number of unresolved input words, or at a pre-defined time interval, or upon an indication by a user.
17. The system of claim 15, wherein the operations further comprise:
providing the input word along with the one or more of the plurality of existing domains to which the input word belongs for cognitive validation and correction; and
building an n-ary tree for each of the one or more of the plurality of existing domains to which the input word belongs.
18. A non-transitory computer-readable medium storing instructions for creating and building a domain dictionary, wherein upon execution of the instructions by one or more processors, the processors perform operations comprising:
computing a syntactic similarity score, a usage similarity score, and a contextual similarity score for an input word with respect to each of a plurality of domain specific words in each of a plurality of existing domains;
computing a weighted overall similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains based on the syntactic similarity score, the usage similarity score, and the contextual similarity score; and
determining belongingness of the input word to each of the plurality of existing domains based on the weighted overall similarity score.
19. The non-transitory computer-readable medium of claim 18, wherein the operations further comprise creating each of the plurality of existing domains comprising the plurality of domain specific words from a plurality of domain specific input documents by:
extracting a plurality of words from the plurality of domain specific input documents for a given domain;
curating the plurality of words to determine a plurality of domain specific input words;
computing a contextual similarity score for each of the plurality of domain specific input words as a parent word or a child word with respect to each of the plurality of remaining domain specific input words;
determining a fuzzy membership value for each of the plurality of domain specific input words based on the corresponding contextual similarity score; and
building an n-ary tree of the plurality of domain specific input words for the given domain based on the fuzzy membership value for each of the plurality of domain specific input words.
20. The non-transitory computer-readable medium of claim 18, wherein determining belongingness of the input word comprises:
comparing the weighted overall similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains with a pre-defined threshold score to determine one or more of the plurality of existing domains to which the input word belongs;
comparing a number of the one or more existing domains to which the input word belongs with a pre-defined threshold number of domains;
categorizing the input word as an unresolved input word if:
the weighted overall similarity score for the input word with respect to each of the plurality of domain specific words in each of the plurality of existing domains is less than the pre-defined threshold score; or
the number of the one or more existing domains to which the input word belongs exceeds the pre-defined threshold number of domains; and
resolving a plurality of unresolved input words upon reaching a pre-defined threshold number of unresolved input words, or at a pre-defined time interval.
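Two of the component scores recited in the claims above lend themselves to a short illustration. The Python sketch below computes a standard Jaro similarity between two words (the claims refer to a "Jaro distance"; the sketch uses the similarity form, where 1.0 means identical) and a usage ratio in the sense of the usage similarity claim. The tokenized input, the fixed token window used as the "neighborhood", and the function names `jaro_similarity` and `usage_similarity` are assumptions made for illustration, not details taken from the specification.

```python
from typing import List


def jaro_similarity(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2  # transpositions
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def usage_similarity(tokens: List[str], input_word: str,
                     domain_word: str, window: int = 3) -> float:
    """Occurrences of input_word near domain_word divided by all its occurrences.

    'Near' is assumed to mean within `window` tokens on either side.
    """
    total = 0
    near = 0
    for i, tok in enumerate(tokens):
        if tok != input_word:
            continue
        total += 1
        if domain_word in tokens[max(0, i - window): i + window + 1]:
            near += 1
    return near / total if total else 0.0


if __name__ == "__main__":
    print(round(jaro_similarity("MARTHA", "MARHTA"), 4))
    text = "the bank approved the loan and the river bank flooded".split()
    print(usage_similarity(text, "bank", "loan"))
```

The contextual similarity score would follow the same ratio pattern as `usage_similarity`, but it depends on parent and child relations between words, which this sketch does not attempt to model.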
Dated this 21st day of December, 2016
R Ramya Rao
Of K&S Partners
Agent for the Applicant
Description: TECHNICAL FIELD
This disclosure relates generally to information processing, and more particularly to a system and method for dynamically creating and building a domain dictionary.
| # | Name | Date |
|---|---|---|
| 1 | Form5_As Filed_21-12-2016.pdf | 2016-12-21 |
| 2 | Form3_As Filed_21-12-2016.pdf | 2016-12-21 |
| 3 | Form2 Title Page_Complete_21-12-2016.pdf | 2016-12-21 |
| 4 | Form18_Normal Request_21-12-2016.pdf | 2016-12-21 |
| 5 | Drawings_Normal Request_21-12-2016.pdf | 2016-12-21 |
| 6 | Description Complete_As Filed_21-12-2016.pdf | 2016-12-21 |
| 7 | Claims_As Filed_21-12-2016.pdf | 2016-12-21 |
| 8 | Abstract_As Filed_21-12-2016.pdf | 2016-12-21 |
| 9 | Form26_General Power Of Attorney_22-12-2016.pdf | 2016-12-22 |
| 10 | Form26_General Power Of Attorney_22-12-2016..pdf | 2016-12-22 |
| 11 | Correspondence by Agent_Certified Copy Of GPA _22-12-2016.pdf | 2016-12-22 |
| 12 | abstract 201641043708.jpg | 2016-12-29 |
| 13 | Other Patent Document [26-04-2017(online)].pdf | 2017-04-26 |
| 14 | Correspondence by Agent_Form 1_27-04-2017.pdf | 2017-04-27 |
| 15 | 201641043708-FER.pdf | 2020-05-11 |
| 16 | 201641043708-OTHERS [14-10-2020(online)].pdf | 2020-10-14 |
| 17 | 201641043708-PETITION UNDER RULE 137 [14-10-2020(online)].pdf | 2020-10-14 |
| 18 | 201641043708-RELEVANT DOCUMENTS [14-10-2020(online)].pdf | 2020-10-14 |
| 19 | 201641043708-Information under section 8(2) [14-10-2020(online)].pdf | 2020-10-14 |
| 20 | 201641043708-FORM 3 [14-10-2020(online)].pdf | 2020-10-14 |
| 21 | 201641043708-FER_SER_REPLY [14-10-2020(online)].pdf | 2020-10-14 |
| 22 | 201641043708-DRAWING [14-10-2020(online)].pdf | 2020-10-14 |
| 23 | 201641043708-CORRESPONDENCE [14-10-2020(online)].pdf | 2020-10-14 |
| 24 | 201641043708-COMPLETE SPECIFICATION [14-10-2020(online)].pdf | 2020-10-14 |
| 25 | 201641043708-CLAIMS [14-10-2020(online)].pdf | 2020-10-14 |
| 26 | 201641043708-US(14)-HearingNotice-(HearingDate-21-09-2022).pdf | 2022-08-17 |
| 27 | 201641043708-POA [23-08-2022(online)].pdf | 2022-08-23 |
| 28 | 201641043708-FORM 13 [23-08-2022(online)].pdf | 2022-08-23 |
| 29 | 201641043708-Correspondence to notify the Controller [23-08-2022(online)].pdf | 2022-08-23 |
| 30 | 201641043708-AMENDED DOCUMENTS [23-08-2022(online)].pdf | 2022-08-23 |
| 31 | 201641043708-Written submissions and relevant documents [04-10-2022(online)].pdf | 2022-10-04 |
| 32 | 201641043708-PatentCertificate10-11-2022.pdf | 2022-11-10 |
| 33 | 201641043708-IntimationOfGrant10-11-2022.pdf | 2022-11-10 |

| # | Name |
|---|---|
| 1 | SearchStrategyMatrix(43708)E_30-04-2020.pdf |