Method and Device for N-Gram Identification and Extraction

Abstract: A method and device for n-gram identification and extraction is disclosed. The method includes identifying at least one n-gram from a sentence inputted by a user based on a confidence score associated with each of the at least one n-gram. The method further includes determining a direction context entropy coefficient for each of the at least one n-gram. The method includes iteratively expanding one or more of the at least one n-gram by the smallest n-gram unit at each iteration in a predefined direction in the sentence to generate at least one expanded n-gram, based on an associated direction context entropy coefficient. The method further includes extracting at each expanding iteration one or more of the at least one expanded n-gram based on an associated confidence score. The method includes grouping semantically linked n-grams from the one or more of the at least one expanded n-gram. FIG.1
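
The expand-and-extract loop summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the toy corpus standing in for the "existing word patterns", the use of Shannon entropy of neighbouring words as the direction context entropy coefficient, the raw occurrence count as the confidence score, and all function names and threshold values are hypothetical. The rule "expand when the coefficient is higher than the threshold" follows the wording of claim 6. The grouping of semantically linked n-grams is not covered.

```python
import math
from collections import Counter

# Hypothetical "existing word patterns"; the source does not specify a corpus.
CORPUS = [s.split() for s in (
    "machine learning models need data",
    "machine learning systems need data",
    "machine translation systems need data",
    "we study machine learning",
    "they build machine vision systems",
)]


def count(ngram):
    """Assumed confidence score: how often ngram (a tuple of words) occurs in CORPUS."""
    n = len(ngram)
    return sum(tuple(s[i:i + n]) == ngram
               for s in CORPUS for i in range(len(s) - n + 1))


def context_entropy(ngram, direction):
    """Assumed coefficient: Shannon entropy (bits) of the words next to ngram on one side."""
    n, neighbours = len(ngram), Counter()
    for s in CORPUS:
        for i in range(len(s) - n + 1):
            if tuple(s[i:i + n]) == ngram:
                j = i - 1 if direction == "left" else i + n
                if 0 <= j < len(s):
                    neighbours[s[j]] += 1
    total = sum(neighbours.values())
    if not total:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in neighbours.values())


def expand_and_extract(sentence, start, end, context_threshold=0.5,
                       confidence_threshold=2, max_iterations=3):
    """Grow words[start:end] by one unigram per iteration; keep confident expansions."""
    words, extracted = sentence.split(), []
    for _ in range(max_iterations):  # iteration limit, as in claim 10
        ngram = tuple(words[start:end])
        if end < len(words) and context_entropy(ngram, "right") > context_threshold:
            end += 1      # expand to the right by one unigram
        elif start > 0 and context_entropy(ngram, "left") > context_threshold:
            start -= 1    # expand to the left by one unigram
        else:
            break         # no direction qualifies, so expansion stops
        expanded = tuple(words[start:end])
        if count(expanded) >= confidence_threshold:
            extracted.append(" ".join(expanded))
    return extracted


# Seed n-gram is the unigram "machine" (word index 2) in the input sentence.
print(expand_and_extract("we study machine learning models", 2, 3))
# prints ['machine learning']
```

With this toy data, "machine" is expanded to "machine learning", which is extracted; the further expansion "machine learning models" occurs only once in the corpus and falls below the assumed confidence threshold.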

Patent Information

Application #:
Filing Date: 20 March 2018
Publication Number: 39/2019
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Email: bangalore@knspartners.com
Parent Application:
Patent Number:
Legal Status:
Grant Date: 2023-07-27
Renewal Date:

Applicants

WIPRO LIMITED
Doddakannelli, Sarjapur Road, Bangalore

Inventors

1. BALAJI JAGAN
G01, Kristal Amber II, 8th Main, 14th Cross, N S Palya, BTM Stage 2, Bangalore 560076
2. MEENAKSHI SUNDARAM MURUGESHAN
No.1264, Ground Floor, 15th Main, BTM 2nd Stage, Bangalore 560076

Specification

Claims:
WE CLAIM:
1. A method for n-gram identification and extraction, the method comprising:
identifying, by a computing device, at least one n-gram from a sentence inputted by a user based on a confidence score associated with each of the at least one n-gram, wherein a confidence score for an n-gram is computed based on comparison of the n-gram with existing word patterns;
determining, by the computing device, a direction context entropy coefficient for each of the at least one n-gram, based on the existing word patterns;
iteratively expanding, by the computing device, one or more of the at least one n-gram by the smallest n-gram unit at each iteration in a predefined direction in the sentence to generate at least one expanded n-gram, based on an associated direction context entropy coefficient;
extracting at each expanding iteration, by the computing device, one or more of the at least one expanded n-gram based on a confidence score associated with each of the one or more of the at least one expanded n-gram, wherein a confidence score for an expanded n-gram is computed based on comparison of the expanded n-gram with the existing word patterns; and
grouping, by the computing device, semantically linked n-grams from the one or more of the at least one expanded n-gram.
2. The method of claim 1, further comprising removing one or more n-grams from the at least one n-gram, wherein the one or more n-grams cannot be expanded by the smallest n-gram unit based on the associated direction context entropy coefficient.
3. The method of claim 1, wherein the predefined direction comprises one of left direction and right direction in the sentence with respect to an n-gram from the at least one n-gram.
4. The method of claim 1, wherein the smallest n-gram unit is a unigram, and wherein an n-gram is expanded by a unigram in the predefined direction in each iteration to generate an (n+1)-gram.
5. The method of claim 1, wherein the confidence score for an n-gram is computed based on an association coefficient computed for the degree of association of the n-gram relative to at least one adjacent word when compared with the existing word patterns.
6. The method of claim 1, wherein the direction context entropy coefficient for an n-gram comprises at least one of a left context entropy coefficient and a right context entropy coefficient,
wherein the n-gram is expanded in the left direction in the sentence, when the left context entropy coefficient for the n-gram is higher than an associated predefined context threshold, and
wherein the n-gram is expanded in the right direction in the sentence, when the right context entropy coefficient for the n-gram is higher than an associated predefined context threshold.
7. The method of claim 1, further comprising validating expansion of an n-gram in the predefined direction based on at least one of a cross context entropy coefficient and a reverse cross context entropy coefficient.
8. The method of claim 1, further comprising filtering one or more of the at least one n-gram and one or more of the at least one expanded n-gram based on associated at least one context divergence coefficient.
9. The method of claim 8, wherein each of the one or more filtered n-grams comprises a low relevancy confidence score determined based on the associated at least one context divergence coefficient.
10. The method of claim 1, further comprising limiting the number of expanding iterations based on a predefined iteration threshold.
11. A computing device for n-gram identification and extraction, the computing device comprises:
a processor; and
a memory communicatively coupled to the processor, wherein the memory stores processor instructions, which, on execution, cause the processor to:
identify at least one n-gram from a sentence inputted by a user based on a confidence score associated with each of the at least one n-gram, wherein a confidence score for an n-gram is computed based on comparison of the n-gram with existing word patterns;
determine a direction context entropy coefficient for each of the at least one n-gram, based on the existing word patterns;
iteratively expand one or more of the at least one n-gram by the smallest n-gram unit at each iteration in a predefined direction in the sentence to generate at least one expanded n-gram, based on an associated direction context entropy coefficient;
extract at each expanding iteration one or more of the at least one expanded n-gram based on a confidence score associated with each of the one or more of the at least one expanded n-gram, wherein a confidence score for an expanded n-gram is computed based on comparison of the expanded n-gram with the existing word patterns; and
group semantically linked n-grams from the one or more of the at least one expanded n-gram.
12. The computing device of claim 11, wherein the processor instructions further cause the processor to remove one or more n-grams from the at least one n-gram, wherein the one or more n-grams cannot be expanded by the smallest n-gram unit based on the associated direction context entropy coefficient.
13. The computing device of claim 11, wherein the predefined direction comprises one of left direction and right direction in the sentence with respect to an n-gram from the at least one n-gram.
14. The computing device of claim 11, wherein the confidence score for an n-gram is computed based on an association coefficient computed for the degree of association of the n-gram relative to at least one adjacent word when compared with the existing word patterns.
15. The computing device of claim 11, wherein the direction context entropy coefficient for an n-gram comprises at least one of a left context entropy coefficient and a right context entropy coefficient,
wherein the n-gram is expanded in the left direction in the sentence, when the left context entropy coefficient for the n-gram is higher than an associated predefined context threshold, and
wherein the n-gram is expanded in the right direction in the sentence, when the right context entropy coefficient for the n-gram is higher than an associated predefined context threshold.
16. The computing device of claim 11, wherein the processor instructions further cause the processor to validate expansion of an n-gram in the predefined direction based on at least one of a cross context entropy coefficient and a reverse cross context entropy coefficient.
17. The computing device of claim 11, wherein the processor instructions further cause the processor to filter one or more of the at least one n-gram and one or more of the at least one expanded n-gram based on associated at least one context divergence coefficient.
18. The computing device of claim 17, wherein each of the one or more filtered n-grams comprises a low relevancy confidence score determined based on the associated at least one context divergence coefficient.
19. The computing device of claim 11, wherein the processor instructions further cause the processor to limit the number of expanding iterations based on a predefined iteration threshold.

Dated this 20th day of March, 2018

Swetha SN
Of K&S Partners
Agent for the Applicant
IN/PA-2123
Description:
TECHNICAL FIELD
This disclosure relates generally to n-grams and more particularly to a method and device for n-gram identification and extraction.
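
For readers unfamiliar with the term, an n-gram is a contiguous sequence of n words in a sentence. A minimal sketch of enumerating them is shown below; the function name `ngrams` and the whitespace tokenization are illustrative assumptions and are not taken from the specification.

```python
def ngrams(sentence, n):
    """Return every contiguous n-word sequence in the sentence, left to right."""
    words = sentence.split()  # assumption: words are separated by whitespace
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


print(ngrams("the quick brown fox", 2))
# prints [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
```

Each bigram above is one of the unigrams extended by a single adjacent word, which is the sense in which the claims speak of expanding an n-gram by a unigram to generate an (n+1)-gram.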

Documents

Application Documents

# Name Date
1 201841010111-STATEMENT OF UNDERTAKING (FORM 3) [20-03-2018(online)].pdf 2018-03-20
2 201841010111-REQUEST FOR EXAMINATION (FORM-18) [20-03-2018(online)].pdf 2018-03-20
3 201841010111-POWER OF AUTHORITY [20-03-2018(online)].pdf 2018-03-20
4 201841010111-FORM 18 [20-03-2018(online)].pdf 2018-03-20
5 201841010111-FORM 1 [20-03-2018(online)].pdf 2018-03-20
6 201841010111-DRAWINGS [20-03-2018(online)].pdf 2018-03-20
7 201841010111-DECLARATION OF INVENTORSHIP (FORM 5) [20-03-2018(online)].pdf 2018-03-20
8 201841010111-COMPLETE SPECIFICATION [20-03-2018(online)].pdf 2018-03-20
9 abstract201841010111.jpg 2018-03-21
10 201841010111-REQUEST FOR CERTIFIED COPY [04-05-2018(online)].pdf 2018-05-04
11 201841010111-Proof of Right (MANDATORY) [30-07-2018(online)].pdf 2018-07-30
12 Correspondence by Agent_Form1_01-08-2018.pdf 2018-08-01
13 201841010111-PETITION UNDER RULE 137 [22-03-2021(online)].pdf 2021-03-22
14 201841010111-OTHERS [22-03-2021(online)].pdf 2021-03-22
15 201841010111-FORM 3 [22-03-2021(online)].pdf 2021-03-22
16 201841010111-FER_SER_REPLY [22-03-2021(online)].pdf 2021-03-22
17 201841010111-DRAWING [22-03-2021(online)].pdf 2021-03-22
18 201841010111-CORRESPONDENCE [22-03-2021(online)].pdf 2021-03-22
19 201841010111-COMPLETE SPECIFICATION [22-03-2021(online)].pdf 2021-03-22
20 201841010111-CLAIMS [22-03-2021(online)].pdf 2021-03-22
21 201841010111-ABSTRACT [22-03-2021(online)].pdf 2021-03-22
22 201841010111-FER.pdf 2021-10-17
23 201841010111-US(14)-HearingNotice-(HearingDate-02-06-2023).pdf 2023-05-08
24 201841010111-POA [17-05-2023(online)].pdf 2023-05-17
25 201841010111-FORM 13 [17-05-2023(online)].pdf 2023-05-17
26 201841010111-Correspondence to notify the Controller [17-05-2023(online)].pdf 2023-05-17
27 201841010111-AMENDED DOCUMENTS [17-05-2023(online)].pdf 2023-05-17
28 201841010111-US(14)-ExtendedHearingNotice-(HearingDate-08-06-2023).pdf 2023-06-01
29 201841010111-Correspondence to notify the Controller [05-06-2023(online)].pdf 2023-06-05
30 201841010111-Written submissions and relevant documents [23-06-2023(online)].pdf 2023-06-23
31 201841010111-FORM-26 [23-06-2023(online)].pdf 2023-06-23
32 201841010111-FORM 3 [23-06-2023(online)].pdf 2023-06-23
33 201841010111-PatentCertificate27-07-2023.pdf 2023-07-27
34 201841010111-IntimationOfGrant27-07-2023.pdf 2023-07-27

Search Strategy

1 2020-09-2913-05-34E_29-09-2020.pdf

ERegister / Renewals

3rd: 16 Oct 2023

From 20/03/2020 - To 20/03/2021

4th: 16 Oct 2023

From 20/03/2021 - To 20/03/2022

5th: 16 Oct 2023

From 20/03/2022 - To 20/03/2023

6th: 16 Oct 2023

From 20/03/2023 - To 20/03/2024

7th: 18 Mar 2024

From 20/03/2024 - To 20/03/2025

8th: 17 Mar 2025

From 20/03/2025 - To 20/03/2026