Abstract: A method and device for n-gram identification and extraction is disclosed. The method includes identifying at least one n-gram from a sentence inputted by a user based on a confidence score associated with each of the at least one n-gram. The method further includes determining a direction context entropy coefficient for each of the at least one n-gram. The method includes iteratively expanding one or more of the at least one n-gram by the smallest n-gram unit at each iteration in a predefined direction in the sentence to generate at least one expanded n-gram, based on an associated direction context entropy coefficient. The method further includes extracting at each expanding iteration one or more of the at least one expanded n-gram based on an associated confidence score. The method includes grouping semantically linked n-grams from the one or more of the at least one expanded n-gram. FIG. 1
Claims: WE CLAIM:
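The abstract does not state how the direction context entropy coefficient is computed. The following is a minimal sketch only, assuming the coefficient is the Shannon entropy of the words observed immediately to the left or right of an n-gram in a reference corpus that stands in for the "existing word patterns". The function names `context_entropy` and `iterative_expand`, and the parameters `threshold` and `max_iters`, are hypothetical and do not come from the disclosure. Extraction by confidence score and grouping of semantically linked n-grams are left out.

```python
import math
from collections import Counter


def context_entropy(ngram, corpus, direction):
    """Shannon entropy (in bits) of the words adjacent to `ngram` in `corpus`.

    ASSUMPTION: this stands in for the "direction context entropy
    coefficient"; the disclosure does not give a formula here.
    """
    n = len(ngram)
    neighbours = Counter()
    for sent in corpus:
        for i in range(len(sent) - n + 1):
            if tuple(sent[i:i + n]) != ngram:
                continue
            # Index of the adjacent word on the requested side.
            j = i - 1 if direction == "left" else i + n
            if 0 <= j < len(sent):
                neighbours[sent[j]] += 1
    total = sum(neighbours.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in neighbours.values())


def iterative_expand(sentence, start, end, corpus, direction, threshold, max_iters):
    """Grow the span sentence[start:end] by one unigram per iteration.

    Expansion continues while the coefficient for the current n-gram is
    higher than `threshold`, the sentence edge has not been reached, and
    fewer than `max_iters` iterations have run.
    """
    expanded = []
    for _ in range(max_iters):
        ngram = tuple(sentence[start:end])
        if context_entropy(ngram, corpus, direction) <= threshold:
            break
        if direction == "left":
            if start == 0:
                break
            start -= 1
        else:
            if end == len(sentence):
                break
            end += 1
        expanded.append(tuple(sentence[start:end]))
    return expanded
```

Under these assumptions, a call such as `iterative_expand(sentence, 1, 3, corpus, "right", 1.0, 3)` returns every (n+1)-gram produced along the way, so a later step could score each one separately.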
1. A method for n-gram identification and extraction, the method comprising:
identifying, by a computing device, at least one n-gram from a sentence inputted by a user based on a confidence score associated with each of the at least one n-gram, wherein a confidence score for an n-gram is computed based on comparison of the n-gram with existing word patterns;
determining, by the computing device, a direction context entropy coefficient for each of the at least one n-gram, based on the existing word patterns;
iteratively expanding, by the computing device, one or more of the at least one n-gram by the smallest n-gram unit at each iteration in a predefined direction in the sentence to generate at least one expanded n-gram, based on an associated direction context entropy coefficient;
extracting at each expanding iteration, by the computing device, one or more of the at least one expanded n-gram based on a confidence score associated with each of the one or more of the at least one expanded n-gram, wherein a confidence score for an expanded n-gram is computed based on comparison of the expanded n-gram with the existing word patterns; and
grouping, by the computing device, semantically linked n-grams from the one or more of the at least one expanded n-gram.
2. The method of claim 1, further comprising removing one or more n-grams from the at least one n-gram, wherein the one or more n-grams cannot be expanded by the smallest n-gram unit based on the associated direction context entropy coefficient.
3. The method of claim 1, wherein the predefined direction comprises one of left direction and right direction in the sentence with respect to an n-gram from the at least one n-gram.
4. The method of claim 1, wherein the smallest n-gram unit is a unigram, and wherein an n-gram is expanded by a unigram in the predefined direction in each iteration to generate an (n+1)-gram.
5. The method of claim 1, wherein the confidence score for an n-gram is computed based on an association coefficient computed for a degree of association of the n-gram relative to at least one adjacent word when compared with the existing word patterns.
6. The method of claim 1, wherein the direction context entropy coefficient for an n-gram comprises at least one of a left context entropy coefficient and a right context entropy coefficient,
wherein the n-gram is expanded in the left direction in the sentence, when the left context entropy coefficient for the n-gram is higher than an associated predefined context threshold, and
wherein the n-gram is expanded in the right direction in the sentence, when the right context entropy coefficient for the n-gram is higher than an associated predefined context threshold.
7. The method of claim 1, further comprising validating expansion of an n-gram in the predefined direction based on at least one of a cross context entropy coefficient and a reverse cross context entropy coefficient.
8. The method of claim 1, further comprising filtering one or more of the at least one n-gram and one or more of the at least one expanded n-gram based on at least one associated context divergence coefficient.
9. The method of claim 8, wherein each of the one or more filtered n-grams comprises a low relevancy confidence score determined based on the at least one associated context divergence coefficient.
10. The method of claim 1, further comprising limiting the number of expanding iterations based on a predefined iteration threshold.
11. A computing device for n-gram identification and extraction, the computing device comprises:
a processor; and
a memory communicatively coupled to the processor, wherein the memory stores processor instructions, which, on execution, cause the processor to:
identify at least one n-gram from a sentence inputted by a user based on a confidence score associated with each of the at least one n-gram, wherein a confidence score for an n-gram is computed based on comparison of the n-gram with existing word patterns;
determine a direction context entropy coefficient for each of the at least one n-gram, based on the existing word patterns;
iteratively expand one or more of the at least one n-gram by the smallest n-gram unit at each iteration in a predefined direction in the sentence to generate at least one expanded n-gram, based on an associated direction context entropy coefficient;
extract at each expanding iteration one or more of the at least one expanded n-gram based on a confidence score associated with each of the one or more of the at least one expanded n-gram, wherein a confidence score for an expanded n-gram is computed based on comparison of the expanded n-gram with the existing word patterns; and
group semantically linked n-grams from the one or more of the at least one expanded n-gram.
12. The computing device of claim 11, wherein the processor instructions further cause the processor to remove one or more n-grams from the at least one n-gram, wherein the one or more n-grams cannot be expanded by the smallest n-gram unit based on the associated direction context entropy coefficient.
13. The computing device of claim 11, wherein the predefined direction comprises one of left direction and right direction in the sentence with respect to an n-gram from the at least one n-gram.
14. The computing device of claim 11, wherein the confidence score for an n-gram is computed based on an association coefficient computed for a degree of association of the n-gram relative to at least one adjacent word when compared with the existing word patterns.
15. The computing device of claim 11, wherein the direction context entropy coefficient for an n-gram comprises at least one of a left context entropy coefficient and a right context entropy coefficient,
wherein the n-gram is expanded in the left direction in the sentence, when the left context entropy coefficient for the n-gram is higher than an associated predefined context threshold, and
wherein the n-gram is expanded in the right direction in the sentence, when the right context entropy coefficient for the n-gram is higher than an associated predefined context threshold.
16. The computing device of claim 11, wherein the processor instructions further cause the processor to validate expansion of an n-gram in the predefined direction based on at least one of a cross context entropy coefficient and a reverse cross context entropy coefficient.
17. The computing device of claim 11, wherein the processor instructions further cause the processor to filter one or more of the at least one n-gram and one or more of the at least one expanded n-gram based on at least one associated context divergence coefficient.
18. The computing device of claim 17, wherein each of the one or more filtered n-grams comprises a low relevancy confidence score determined based on the at least one associated context divergence coefficient.
19. The computing device of claim 11, wherein the processor instructions further cause the processor to limit the number of expanding iterations based on a predefined iteration threshold.
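Claims 5 and 14 tie the confidence score to an association coefficient between an n-gram and an adjacent word, but give no formula. Purely as an illustration, the sketch below assumes a pointwise mutual information (PMI) estimate over a small reference corpus and keeps candidates whose score meets a cut-off. The names `count_ngrams`, `association_score`, and `extract`, and the PMI choice itself, are assumptions and not the claimed method.

```python
import math
from collections import Counter


def count_ngrams(corpus, n):
    """Count every contiguous n-gram (as a tuple) in a tokenised corpus."""
    counts = Counter()
    for sent in corpus:
        for i in range(len(sent) - n + 1):
            counts[tuple(sent[i:i + n])] += 1
    return counts


def association_score(ngram, adjacent, corpus):
    """Crude PMI (bits) between `ngram` and the word that follows it.

    ASSUMPTION: PMI is one possible association coefficient. All
    probabilities are normalised by the unigram total, which is a
    simplification.
    """
    unigrams = count_ngrams(corpus, 1)
    total = sum(unigrams.values())
    joint = count_ngrams(corpus, len(ngram) + 1)[ngram + (adjacent,)]
    left = count_ngrams(corpus, len(ngram))[ngram]
    right = unigrams[(adjacent,)]
    if joint == 0 or left == 0 or right == 0:
        return float("-inf")
    return math.log2((joint * total) / (left * right))


def extract(candidates, corpus, threshold):
    """Keep candidates whose last word is strongly associated with the rest."""
    return [
        ng for ng in candidates
        if len(ng) >= 2 and association_score(ng[:-1], ng[-1], corpus) >= threshold
    ]
```

Recounting the corpus on every call keeps the sketch short; a practical version would cache the counts once per corpus.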
Dated this 20th day of March, 2018
Swetha SN
Of K&S Partners
Agent for the Applicant
IN/PA-2123
Description: TECHNICAL FIELD
This disclosure relates generally to n-grams and more particularly to a method and device for n-gram identification and extraction.
| # | Name | Date |
|---|---|---|
| 1 | 201841010111-STATEMENT OF UNDERTAKING (FORM 3) [20-03-2018(online)].pdf | 2018-03-20 |
| 2 | 201841010111-REQUEST FOR EXAMINATION (FORM-18) [20-03-2018(online)].pdf | 2018-03-20 |
| 3 | 201841010111-POWER OF AUTHORITY [20-03-2018(online)].pdf | 2018-03-20 |
| 4 | 201841010111-FORM 18 [20-03-2018(online)].pdf | 2018-03-20 |
| 5 | 201841010111-FORM 1 [20-03-2018(online)].pdf | 2018-03-20 |
| 6 | 201841010111-DRAWINGS [20-03-2018(online)].pdf | 2018-03-20 |
| 7 | 201841010111-DECLARATION OF INVENTORSHIP (FORM 5) [20-03-2018(online)].pdf | 2018-03-20 |
| 8 | 201841010111-COMPLETE SPECIFICATION [20-03-2018(online)].pdf | 2018-03-20 |
| 9 | abstract201841010111.jpg | 2018-03-21 |
| 10 | 201841010111-REQUEST FOR CERTIFIED COPY [04-05-2018(online)].pdf | 2018-05-04 |
| 11 | 201841010111-Proof of Right (MANDATORY) [30-07-2018(online)].pdf | 2018-07-30 |
| 12 | Correspondence by Agent_Form1_01-08-2018.pdf | 2018-08-01 |
| 13 | 201841010111-PETITION UNDER RULE 137 [22-03-2021(online)].pdf | 2021-03-22 |
| 14 | 201841010111-OTHERS [22-03-2021(online)].pdf | 2021-03-22 |
| 15 | 201841010111-FORM 3 [22-03-2021(online)].pdf | 2021-03-22 |
| 16 | 201841010111-FER_SER_REPLY [22-03-2021(online)].pdf | 2021-03-22 |
| 17 | 201841010111-DRAWING [22-03-2021(online)].pdf | 2021-03-22 |
| 18 | 201841010111-CORRESPONDENCE [22-03-2021(online)].pdf | 2021-03-22 |
| 19 | 201841010111-COMPLETE SPECIFICATION [22-03-2021(online)].pdf | 2021-03-22 |
| 20 | 201841010111-CLAIMS [22-03-2021(online)].pdf | 2021-03-22 |
| 21 | 201841010111-ABSTRACT [22-03-2021(online)].pdf | 2021-03-22 |
| 22 | 201841010111-FER.pdf | 2021-10-17 |
| 23 | 201841010111-US(14)-HearingNotice-(HearingDate-02-06-2023).pdf | 2023-05-08 |
| 24 | 201841010111-POA [17-05-2023(online)].pdf | 2023-05-17 |
| 25 | 201841010111-FORM 13 [17-05-2023(online)].pdf | 2023-05-17 |
| 26 | 201841010111-Correspondence to notify the Controller [17-05-2023(online)].pdf | 2023-05-17 |
| 27 | 201841010111-AMENDED DOCUMENTS [17-05-2023(online)].pdf | 2023-05-17 |
| 28 | 201841010111-US(14)-ExtendedHearingNotice-(HearingDate-08-06-2023).pdf | 2023-06-01 |
| 29 | 201841010111-Correspondence to notify the Controller [05-06-2023(online)].pdf | 2023-06-05 |
| 30 | 201841010111-Written submissions and relevant documents [23-06-2023(online)].pdf | 2023-06-23 |
| 31 | 201841010111-FORM-26 [23-06-2023(online)].pdf | 2023-06-23 |
| 32 | 201841010111-FORM 3 [23-06-2023(online)].pdf | 2023-06-23 |
| 33 | 201841010111-PatentCertificate27-07-2023.pdf | 2023-07-27 |
| 34 | 201841010111-IntimationOfGrant27-07-2023.pdf | 2023-07-27 |
| 1 | 2020-09-2913-05-34E_29-09-2020.pdf | |