Abstract: SYSTEM AND METHOD FOR AUTHENTICATING CUSTOMER. A system and a method for authenticating a customer are provided. The method comprises receiving, from an image source, a series of image frames, with each image frame of the series of image frames having captured at least a face of the customer therein. The method further comprises generating facial embeddings from at least one of temporal pitch value and temporal yaw value of the face for each image frame of the series of image frames. The method further comprises processing the facial embeddings, by implementing a trained liveness classifier, to compute a probability of liveness of the series of image frames. The method further comprises confirming facial liveness for the customer based on the computed probability of liveness being above a predefined movement threshold. FIG. 4
Claims: WE CLAIM:
1. A method for authenticating a customer, comprising:
receiving, from an image source, a series of image frames, with each image frame of the series of image frames having captured at least a face of the customer therein;
generating facial embeddings from at least one of temporal pitch value and temporal yaw value of the face for each image frame of the series of image frames;
processing the facial embeddings, by implementing a trained liveness classifier, to compute a probability of liveness of the series of image frames; and
confirming facial liveness for the customer based on the computed probability of liveness being above a predefined movement threshold.
2. The method as claimed in claim 1 further comprising:
processing at least one image frame of the series of image frames to generate a facial encoding vector corresponding to the face of the customer;
comparing the generated facial encoding vector with pre-generated facial encoding vectors stored in a customer dataset, to determine Euclidean distances therefrom; and
confirming facial recognition for the customer based on a nearest match for the Euclidean distance being within a predefined distance threshold.
3. The method as claimed in claim 2, wherein the facial encoding vector is generated using a vector generation module implementing a Siamese architecture based neural network optimized to minimize triplet loss.
4. The method as claimed in claim 1, wherein the step of generating facial embeddings further comprises:
identifying a plurality of facial landmarks from each image frame of the series of image frames; and
generating the facial embeddings for each image frame using the corresponding identified plurality of facial landmarks.
5. The method as claimed in claim 1, wherein the facial embeddings are generated using an embeddings generation module implementing a transformer based machine learning model.
6. The method as claimed in claim 1, wherein the step of receiving the series of image frames comprises:
providing one or more instructions to the customer to move a head thereof according to a generally predefined movement; and
receiving, from the image source, the series of image frames, while the customer is moving the head thereof as per the one or more instructions.
7. The method as claimed in claim 6, wherein the predefined movement comprises one or more of a nod and a shake of the head by the customer.
8. The method as claimed in claim 6, wherein the step of processing the facial embeddings comprises determining a change in at least one of the temporal pitch values and the temporal yaw values of the face between consecutive image frames of the series of image frames due to the movement of the face, with the determined change corresponding to the computed probability of liveness.
9. The method as claimed in claim 1 further comprising enhancing quality of one or more image frames of the series of image frames using a facial feature enhancement module implementing a pre-trained generator-discriminator architecture.
10. A system for authenticating a customer, comprising:
an image source configured to capture image frames; and
a processing arrangement configured to:
receive, from the image source, a series of image frames, with each image frame of the series of image frames having captured at least a face of the customer therein;
generate facial embeddings from at least one of temporal pitch value and temporal yaw value of the face for each image frame of the series of image frames;
process the facial embeddings, by implementing a trained liveness classifier, to compute a probability of liveness of the series of image frames; and
confirm facial liveness for the customer based on the computed probability of liveness being above a predefined movement threshold.
11. The system as claimed in claim 10 further comprising a customer dataset having at least one image of each of a plurality of customers with a pre-generated facial encoding vector thereof stored therein.
12. The system as claimed in claim 11, wherein the processing arrangement is further configured to:
process at least one image frame of the series of image frames to generate a facial encoding vector corresponding to the face of the customer;
compare the generated facial encoding vector with the pre-generated facial encoding vectors stored in the customer dataset, to determine Euclidean distances therefrom; and
confirm facial recognition for the customer based on a nearest match for the Euclidean distance being within a predefined distance threshold.
13. The system as claimed in claim 12 further comprising a vector generation module implementing a Siamese architecture based neural network optimized to minimize triplet loss and implemented to generate the facial encoding vector.
14. The system as claimed in claim 10 further comprising an embeddings generation module implementing a transformer based machine learning model to generate the facial embeddings.
15. The system as claimed in claim 10 further comprising a facial feature enhancement module implementing a pre-trained generator-discriminator architecture for enhancing quality of one or more image frames of the series of image frames.
Description: SYSTEM AND METHOD FOR AUTHENTICATING CUSTOMER
FIELD OF THE PRESENT DISCLOSURE
[0001] The present disclosure generally relates to user authentication, and particularly to a system and method for authenticating a customer by implementing facial recognition along with liveness detection in a single workflow.
BACKGROUND
[0002] Biometric authentication techniques are being widely implemented in many sectors, including banking, for customer authentication, such as for authorizing a transaction. Facial verification is one of the most important techniques at the disposal of a bank to ensure fraud prevention and an enhanced user experience. In facial verification, the face of a person is used for identity verification and authentication. In such techniques, face information extracted from one or more captured images is compared with available face information, which might have been gathered as enrolment data during onboarding of the user. If the face information from the captured images matches the available face information from the enrolment data, then the facial verification succeeds. However, it has been found that most of the known facial verification techniques can be spoofed. In an example, a bad actor could pass facial verification simply by presenting a picture of the face of an authorized user obtained in any manner. A possible countermeasure to such fraud is some form of liveness detection, which attempts to determine whether the person in front of the image source, from which the image(s) may have been captured, is indeed alive. In particular, liveness detection refers to techniques of detecting whether an entity, which may exhibit what are ostensibly human characteristics, is actually a real, living being or is a non-living entity masquerading as such.
[0003] There are known authentication systems which use liveness detection, each with its own advantages and disadvantages. In one example, the liveness detection can be implemented using stereo cameras which may employ infrared based sensors or the like. Such a set-up can capture a 3D face image and create a facial model with facial depth information to make accurate identification at certain key feature points, including the eyes, ears and nose, and to calculate spatial information, such as pupil distance, nose height, distances from the eyes to the mouth and ears, and so on. It has been found that such a set-up has high accuracy, can effectively resist attacks from photos and videos, and has stronger adaptability to changes in light, complex background environments, and other factors. However, most widely utilized cameras, such as those in smartphones, ATMs, etc., are monoscopic cameras, and thus adapting the discussed set-up with stereo cameras may not be possible for implementation in sectors like banking, as the entire infrastructure would need to be changed, which may not be feasible.
[0004] The major challenge with a monoscopic camera based implementation is that, while facial recognition can be achieved effectively, there is no reliable and effective way of performing liveness detection. Moreover, it has been a challenge with known techniques to achieve both the facial recognition and the liveness detection in a single workflow. Therefore, in light of the foregoing discussion, there exists a need to overcome the problems associated with conventional techniques and to provide systems and/or methods for authenticating a customer by utilizing both facial recognition and liveness detection techniques in a single workflow with a monoscopic camera based implementation.
SUMMARY
[0005] In an aspect, a method for authenticating a customer is provided. The method comprises receiving, from an image source, a series of image frames, with each image frame of the series of image frames having captured at least a face of the customer therein. The method further comprises generating facial embeddings from at least one of temporal pitch value and temporal yaw value of the face for each image frame of the series of image frames. The method further comprises processing the facial embeddings, by implementing a trained liveness classifier, to compute a probability of liveness of the series of image frames. The method further comprises confirming facial liveness for the customer based on the computed probability of liveness being above a predefined movement threshold.
[0006] In one or more embodiments, the method also comprises processing at least one image frame of the series of image frames to generate a facial encoding vector corresponding to the face of the customer; comparing the generated facial encoding vector with pre-generated facial encoding vectors stored in a customer dataset, to determine Euclidean distances therefrom; and confirming facial recognition for the customer based on a nearest match for the Euclidean distance being within a predefined distance threshold.
[0007] In one or more embodiments, the facial encoding vector is generated using a vector generation module implementing a Siamese architecture based neural network optimized to minimize triplet loss.
[0008] In one or more embodiments, the step of generating facial embeddings further comprises: identifying a plurality of facial landmarks from each image frame of the series of image frames; and generating the facial embeddings for each image frame using the corresponding identified plurality of facial landmarks.
[0009] In one or more embodiments, the facial embeddings are generated using an embeddings generation module implementing a transformer based machine learning model.
[0010] In one or more embodiments, the step of receiving the series of image frames comprises: providing one or more instructions to the customer to move a head thereof according to a generally predefined movement; and receiving, from the image source, the series of image frames, while the customer is moving the head thereof as per the one or more instructions.
[0011] In one or more embodiments, the predefined movement comprises one or more of a nod and a shake of the head by the customer.
[0012] In one or more embodiments, the step of processing the facial embeddings comprises determining a change in at least one of the temporal pitch values and the temporal yaw values of the face between consecutive image frames of the series of image frames due to the movement of the face, with the determined change corresponding to the computed probability of liveness.
[0013] In one or more embodiments, the method further comprises enhancing quality of one or more image frames of the series of image frames using a facial feature enhancement module implementing a pre-trained generator-discriminator architecture.
[0014] In another aspect, a system for authenticating a customer is provided. The system comprises an image source configured to capture image frames. The system also comprises a processing arrangement. The processing arrangement is configured to receive, from the image source, a series of image frames, with each image frame of the series of image frames having captured at least a face of the customer therein. The processing arrangement is further configured to generate facial embeddings from at least one of temporal pitch value and temporal yaw value of the face for each image frame of the series of image frames. The processing arrangement is further configured to process the facial embeddings, by implementing a trained liveness classifier, to compute a probability of liveness of the series of image frames. The processing arrangement is further configured to confirm facial liveness for the customer based on the computed probability of liveness being above a predefined movement threshold.
[0015] In one or more embodiments, the system further comprises a customer dataset having at least one image of each of a plurality of customers with a pre-generated facial encoding vector thereof stored therein.
[0016] In one or more embodiments, the processing arrangement is also configured to process at least one image frame of the series of image frames to generate a facial encoding vector corresponding to the face of the customer; compare the generated facial encoding vector with the pre-generated facial encoding vectors stored in the customer dataset, to determine Euclidean distances therefrom; and confirm facial recognition for the customer based on a nearest match for the Euclidean distance being within a predefined distance threshold.
[0017] In one or more embodiments, the system comprises a vector generation module implementing a Siamese architecture based neural network optimized to minimize triplet loss and implemented to generate the facial encoding vector.
[0018] In one or more embodiments, the system comprises an embeddings generation module implementing a transformer based machine learning model to generate the facial embeddings.
[0019] In one or more embodiments, the system comprises a facial feature enhancement module implementing a pre-trained generator-discriminator architecture for enhancing quality of one or more image frames of the series of image frames.
[0020] The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the drawings and the following detailed description.
BRIEF DESCRIPTION OF THE FIGURES
[0021] For a more complete understanding of example embodiments of the present disclosure, reference is now made to the following descriptions taken in connection with the accompanying drawings in which:
[0022] FIG. 1 illustrates a schematic of a system that may reside on and may be executed by a computer, which may be connected to a network, for implementation in authenticating a customer, in accordance with one or more embodiments of the present disclosure;
[0023] FIG. 2 illustrates a schematic of an exemplary computing system for implementation in authenticating a customer, in accordance with one or more embodiments of the present disclosure;
[0024] FIG. 3 illustrates a flowchart listing steps involved in a method for authenticating a customer, in accordance with one or more embodiments of the present disclosure;
[0025] FIG. 4 illustrates a process diagram of a workflow for authenticating a customer, in accordance with one or more embodiments of the present disclosure;
[0026] FIG. 5 illustrates a schematic of a step of capturing image frames, in accordance with one or more embodiments of the present disclosure;
[0027] FIG. 6A illustrates a depiction of one predefined movement (a nod movement) of a head to be performed by the customer, in accordance with one or more embodiments of the present disclosure;
[0028] FIG. 6B illustrates a depiction of another predefined movement (a shake movement) of a head to be performed by the customer, in accordance with one or more embodiments of the present disclosure;
[0029] FIG. 7 illustrates a schematic of a step of cropping face from an image frame, in accordance with one or more embodiments of the present disclosure;
[0030] FIG. 8 illustrates a schematic of a step of enhancing quality of an image frame, in accordance with one or more embodiments of the present disclosure;
[0031] FIG. 9A illustrates a schematic of a step of generating facial encoding vectors for a plurality of customers’ images in a customer dataset, in accordance with one or more embodiments of the present disclosure;
[0032] FIG. 9B illustrates a schematic of a step of generating a facial encoding vector for an image frame, in accordance with one or more embodiments of the present disclosure; and
[0033] FIG. 10 illustrates a schematic of a step of generating facial embeddings, in accordance with one or more embodiments of the present disclosure.
DETAILED DESCRIPTION
[0034] In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be apparent, however, to one skilled in the art that the present disclosure is not limited to these specific details.
[0035] Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Further, the terms “a” and “an” herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced items. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not for other embodiments.
[0036] Furthermore, in the following detailed description of the present disclosure, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. However, it will be understood that the present disclosure may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the present disclosure.
[0037] Embodiments described herein may be discussed in the general context of computer-executable instructions residing on some form of computer-readable storage medium, such as program modules, executed by one or more computers or other devices. By way of example, and not limitation, computer-readable storage media may comprise non-transitory computer-readable storage media and communication media; non-transitory computer-readable media include all computer-readable media except for a transitory, propagating signal. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or distributed as desired in various embodiments.
[0038] Some portions of the detailed description that follows are presented and discussed in terms of a process or method. Although steps and sequencing thereof are disclosed in figures herein describing the operations of this method, such steps and sequencing are exemplary. Embodiments are well suited to performing various other steps or variations of the steps recited in the flowchart of the figure herein, and in a sequence other than that depicted and described herein. Some portions of the detailed descriptions that follow are presented in terms of procedures, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. In the present disclosure, a procedure, logic block, process, or the like, is conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those utilizing physical manipulations of physical quantities. Usually, although not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as transactions, bits, values, elements, symbols, characters, samples, pixels, or the like.
[0039] In some implementations, any suitable computer usable or computer readable medium (or media) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer-usable, or computer-readable, storage medium (including a storage device associated with a computing device) may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable medium may include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fibre, a portable compact disc read-only memory (CD-ROM), an optical storage device, a digital versatile disk (DVD), a static random access memory (SRAM), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, a media such as those supporting the internet or an intranet, or a magnetic storage device. In some implementations, the computer-usable or computer-readable medium could even be a suitable medium upon which the program is stored, scanned, compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory. In the context of the present disclosure, a computer-usable or computer-readable, storage medium may be any tangible medium that can contain or store a program for use by or in connection with the instruction execution system, apparatus, or device.
[0040] In some implementations, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. In some implementations, such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. In some implementations, the computer readable program code may be transmitted using any appropriate medium, including but not limited to the internet, wireline, optical fibre cable, RF, etc. In some implementations, a computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
[0041] In some implementations, computer program code for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Java®, Smalltalk, C++ or the like. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. However, the computer program code for carrying out operations of the present disclosure may also be written in conventional procedural programming languages, such as the "C" programming language, PASCAL, or similar programming languages, as well as in scripting languages such as JavaScript, PERL, or Python. In present implementations, the language or framework used for training may be one of Python, TensorFlow™, Bazel, C, or C++. Further, a decoder in the user device (as will be discussed) may use C, C++ or any processor specific ISA. Furthermore, assembly code inside C/C++ may be utilized for specific operations. Also, an ASR (automatic speech recognition) and G2P decoder, along with the entire user system, can be run on embedded Linux (any distribution), Android, iOS, Windows, or the like, without any limitations. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user’s computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user’s computer through a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the internet using an Internet Service Provider). In some implementations, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGAs) or other hardware accelerators, micro-controller units (MCUs), or programmable logic arrays (PLAs) may execute the computer readable program instructions/code by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.
[0042] In some implementations, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus (systems), methods and computer program products according to various implementations of the present disclosure. Each block in the flowchart and/or block diagrams, and combinations of blocks in the flowchart and/or block diagrams, may represent a module, segment, or portion of code, which comprises one or more executable computer program instructions for implementing the specified logical function(s)/act(s). These computer program instructions may be provided to a processor of a general-purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the computer program instructions, which may execute via the processor of the computer or other programmable data processing apparatus, create the ability to implement one or more of the functions/acts specified in the flowchart and/or block diagram block or blocks or combinations thereof. It should be noted that, in some implementations, the functions noted in the block(s) may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
[0043] In some implementations, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks or combinations thereof.
[0044] In some implementations, the computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed (not necessarily in a particular order) on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions/acts (not necessarily in a particular order) specified in the flowchart and/or block diagram block or blocks or combinations thereof.
[0045] Referring now to the example implementation of FIG. 1, there is shown a system 100 that may reside on and may be executed by a computer (e.g., computer 12), which may be connected to a network (e.g., network 14) (e.g., the internet or a local area network). Examples of computer 12 may include, but are not limited to, a personal computer(s), a laptop computer(s), mobile computing device(s), a server computer, a series of server computers, a mainframe computer(s), or a computing cloud(s). In some implementations, each of the aforementioned may be generally described as a computing device. In certain implementations, a computing device may be a physical or virtual device. In many implementations, a computing device may be any device capable of performing operations, such as a dedicated processor, a portion of a processor, a virtual processor, a portion of a virtual processor, a portion of a virtual device, or a virtual device. In some implementations, a processor may be a physical processor or a virtual processor. In some implementations, a virtual processor may correspond to one or more parts of one or more physical processors. In some implementations, the instructions/logic may be distributed and executed across one or more processors, virtual or physical, to execute the instructions/logic. Computer 12 may execute an operating system, for example, but not limited to, Microsoft® Windows®; Mac® OS X®; Red Hat® Linux®, or a custom operating system. (Microsoft and Windows are registered trademarks of Microsoft Corporation in the United States, other countries or both; Mac and OS X are registered trademarks of Apple Inc. in the United States, other countries or both; Red Hat is a registered trademark of Red Hat Corporation in the United States, other countries or both; and Linux is a registered trademark of Linus Torvalds in the United States, other countries or both).
[0046] In some implementations, the instruction sets and subroutines of system 100, which may be stored on storage device, such as storage device 16, coupled to computer 12, may be executed by one or more processors (not shown) and one or more memory architectures included within computer 12. In some implementations, storage device 16 may include but is not limited to: a hard disk drive; a flash drive, a tape drive; an optical drive; a RAID array (or other array); a random-access memory (RAM); and a read-only memory (ROM). In some implementations, network 14 may be connected to one or more secondary networks (e.g., network 18), examples of which may include but are not limited to: a local area network; a wide area network; or an intranet, for example.
[0047] In some implementations, computer 12 may include a data store, such as a database (e.g., relational database, object-oriented database, triplestore database, etc.) and may be located within any suitable memory location, such as storage device 16 coupled to computer 12. In some implementations, data, metadata, information, etc. described throughout the present disclosure may be stored in the data store. In some implementations, computer 12 may utilize any known database management system such as, but not limited to, DB2, in order to provide multi-user access to one or more databases, such as the above noted relational database. In some implementations, the data store may also be a custom database, such as, for example, a flat file database or an XML database. In some implementations, any other form(s) of a data storage structure and/or organization may also be used. In some implementations, system 100 may be a component of the data store, a standalone application that interfaces with the above noted data store and/or an applet / application that is accessed via client applications 22, 24, 26, 28. In some implementations, the above noted data store may be, in whole or in part, distributed in a cloud computing topology. In this way, computer 12 and storage device 16 may refer to multiple devices, which may also be distributed throughout the network.
[0048] In some implementations, computer 12 may execute application 20 for authenticating a customer. In some implementations, system 100 and/or application 20 may be accessed via one or more of client applications 22, 24, 26, 28. In some implementations, system 100 may be a standalone application, or may be an applet / application / script / extension that may interact with and/or be executed within application 20, a component of application 20, and/or one or more of client applications 22, 24, 26, 28. In some implementations, application 20 may be a standalone application, or may be an applet / application / script / extension that may interact with and/or be executed within system 100, a component of system 100, and/or one or more of client applications 22, 24, 26, 28. In some implementations, one or more of client applications 22, 24, 26, 28 may be a standalone application, or may be an applet / application / script / extension that may interact with and/or be executed within and/or be a component of system 100 and/or application 20. Examples of client applications 22, 24, 26, 28 may include, but are not limited to, a standard and/or mobile web browser, an email application (e.g., an email client application), a textual and/or a graphical user interface, a customized web browser, a plugin, an Application Programming Interface (API), or a custom application. The instruction sets and subroutines of client applications 22, 24, 26, 28, which may be stored on storage devices 30, 32, 34, 36, coupled to user devices 38, 40, 42, 44, may be executed by one or more processors and one or more memory architectures incorporated into user devices 38, 40, 42, 44.
[0049] In some implementations, one or more of storage devices 30, 32, 34, 36, may include but are not limited to: hard disk drives; flash drives, tape drives; optical drives; RAID arrays; random access memories (RAM); and read-only memories (ROM). Examples of user devices 38, 40, 42, 44 (and/or computer 12) may include, but are not limited to, a personal computer (e.g., user device 38), a laptop computer (e.g., user device 40), a smart/data-enabled, cellular phone (e.g., user device 42), a notebook computer (e.g., user device 44), a tablet (not shown), a server (not shown), a television (not shown), a smart television (not shown), a media (e.g., video, photo, etc.) capturing device (not shown), and a dedicated network device (not shown). User devices 38, 40, 42, 44 may each execute an operating system, examples of which may include but are not limited to, Android®, Apple® iOS®, Mac® OS X®; Red Hat® Linux®, or a custom operating system.
[0050] In some implementations, one or more of client applications 22, 24, 26, 28 may be configured to effectuate some or all of the functionality of system 100 (and vice versa). Accordingly, in some implementations, system 100 may be a purely server-side application, a purely client-side application, or a hybrid server-side / client-side application that is cooperatively executed by one or more of client applications 22, 24, 26, 28 and/or system 100.
[0051] In some implementations, one or more of client applications 22, 24, 26, 28 may be configured to effectuate some or all of the functionality of application 20 (and vice versa). Accordingly, in some implementations, application 20 may be a purely server-side application, a purely client-side application, or a hybrid server-side / client-side application that is cooperatively executed by one or more of client applications 22, 24, 26, 28 and/or application 20. As one or more of client applications 22, 24, 26, 28, system 100, and application 20, taken singly or in any combination, may effectuate some or all of the same functionality, any description of effectuating such functionality via one or more of client applications 22, 24, 26, 28, system 100, application 20, or combination thereof, and any described interaction(s) between one or more of client applications 22, 24, 26, 28, system 100, application 20, or combination thereof to effectuate such functionality, should be taken as an example only and not to limit the scope of the disclosure.
[0052] In some implementations, one or more of users 46, 48, 50, 52 may access computer 12 and system 100 (e.g., using one or more of user devices 38, 40, 42, 44) directly through network 14 or through secondary network 18. Further, computer 12 may be connected to network 14 through secondary network 18, as illustrated with phantom link line 54. System 100 may include one or more user interfaces, such as browsers and textual or graphical user interfaces, through which users 46, 48, 50, 52 may access system 100.
[0053] In some implementations, the various user devices may be directly or indirectly coupled to network 14 (or network 18). For example, user device 38 is shown directly coupled to network 14 via a hardwired network connection. Further, user device 44 is shown directly coupled to network 18 via a hardwired network connection. User device 40 is shown wirelessly coupled to network 14 via wireless communication channel 56 established between user device 40 and wireless access point (i.e., WAP) 58, which is shown directly coupled to network 14. WAP 58 may be, for example, an IEEE 802.11a, 802.11b, 802.11g, Wi-Fi®, RFID, and/or Bluetooth™ (including Bluetooth™ Low Energy) device that is capable of establishing wireless communication channel 56 between user device 40 and WAP 58. User device 42 is shown wirelessly coupled to network 14 via wireless communication channel 60 established between user device 42 and cellular network / bridge 62, which is shown directly coupled to network 14.
[0054] In some implementations, some or all of the IEEE 802.11x specifications may use Ethernet protocol and carrier sense multiple access with collision avoidance (i.e., CSMA/CA) for path sharing. The various 802.11x specifications may use phase-shift keying (i.e., PSK) modulation or complementary code keying (i.e., CCK) modulation, for example. Bluetooth™ (including Bluetooth™ Low Energy) is a telecommunications industry specification that allows, e.g., mobile phones, computers, smart phones, and other electronic devices to be interconnected using a short-range wireless connection. Other forms of interconnection (e.g., Near Field Communication (NFC)) may also be used.
[0055] FIG. 2 is a block diagram of an example of a computing system 200 capable of implementing embodiments according to the present disclosure. Generally, as discussed herein, the computing system 200 is implemented for authenticating a customer. The present disclosure has been described in terms of implementation for a financial institution (banking) application, such as at an ATM or a store, for financial identity verification and transaction authorization. The term “authenticating” has been used in the context of the present disclosure to include confirming that a person is indeed a live person, and optionally also verifying that the same person is also a customer. With the said financial institution application being typically limited to having availability of only monoscopic cameras, such as those widely available in a smartphone of an employee in a store, or installed in ATM machines, the present disclosure achieves the objective of authenticating a customer with a monoscopic camera based implementation. In an example, the said financial institution application may be installed in an ATM of the financial institution for customer authentication to process transaction authorization. In another example, the said financial institution application may be implemented in a store to allow a store employee to authenticate a person as a customer of the financial institution, for instance, for extending a credit to the authenticated customer and the like. In yet another example, the said financial institution application may be implemented for onboarding a customer by confirming liveness of the person in a video recording or the like. It would be appreciated that although the present disclosure has been discussed in terms of authenticating a customer, herein the customer may be any user and the application may be any security application, without any limitations. Further, with the said system 100 of FIG. 1 being executed on a computer, such as the computing system 200, the two terms “system 100” and “computing system 200” have been used generally interchangeably hereinafter to represent means for authenticating a customer, without any limitations.
[0056] In present implementations, as illustrated in FIG. 2, the computing system 200 includes a processing arrangement 205 for running software applications (such as, the application 20 of FIG. 1) and optionally an operating system. Memory 210 stores applications and data for use by the processing arrangement 205. Storage 215 provides non-volatile storage for applications and data and may include fixed disk drives, removable disk drives, flash memory devices, and CD-ROM, DVD-ROM or other optical storage devices. An optional user input device 220 includes devices that communicate user inputs from one or more users to the computing system 200 and may include keyboards, mice, joysticks, touch screens, etc. A communication or network interface 225 is provided which allows the computing system 200 to communicate with other computer systems via an electronic communications network, including wired and/or wireless communication and including an Intranet or the Internet. In one embodiment, the computing system 200 receives instructions and user inputs from a remote computer through communication interface 225. Communication interface 225 can comprise a transmitter and receiver for communicating with remote devices. An optional display device 250 may be provided which can be any device capable of displaying visual information in response to a signal from the computing system 200. The components of the computing system 200, including the processing arrangement 205, the memory 210, the data storage 215, the user input devices 220, the communication interface 225, and the display device 250, may be coupled via one or more data buses 260.
[0057] Herein, the processing arrangement 205 is configured to execute one or more modules for implementation in authenticating a customer, as discussed later in detail. As used herein, the term module refers to an application specific integrated circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or group) and memory that execute one or more software or firmware programs, a combinational logic circuit, or other suitable components that provide the described functionality. In particular, the various modules implemented for the purposes of the present disclosure may be based on correspondingly relevant trained functions which encompass inference engines and other machine learning techniques. Further, the memory 210 and/or the storage 215 is implemented as a database which may include a customer dataset having at least one image of each of a plurality of customers with a pre-generated facial encoding vector thereof stored therein. It may be appreciated that the said at least one image of each of the customers may be obtained in a process of onboarding the customer.
[0058] In some implementations, as illustrated in FIG. 2, a graphics system 230 may be coupled with the data bus 260 and the components of the computing system 200. The graphics system 230 may include a physical graphics processing unit (GPU) 235 and graphics memory. The GPU 235 generates pixel data for output images from rendering commands. The physical GPU 235 can be configured as multiple virtual GPUs that may be used in parallel (concurrently) by a number of applications or processes executing in parallel. For example, mass scaling processes for rigid bodies or a variety of constraint solving processes may be run in parallel on the multiple virtual GPUs. Graphics memory may include a display memory 240 (e.g., a framebuffer) used for storing pixel data for each pixel of an output image. In another embodiment, the display memory 240 and/or additional memory 245 may be part of the memory 210 and may be shared with the processing arrangement 205. Alternatively, the display memory 240 and/or additional memory 245 can be one or more separate memories provided for the exclusive use of the graphics system 230. In another embodiment, graphics system 230 includes one or more additional physical GPUs 255, similar to the GPU 235. Each additional GPU 255 may be adapted to operate in parallel with the GPU 235. Each additional GPU 255 generates pixel data for output images from rendering commands. Each additional physical GPU 255 can be configured as multiple virtual GPUs that may be used in parallel (concurrently) by a number of applications or processes executing in parallel, e.g., processes that solve constraints. Each additional GPU 255 can operate in conjunction with the GPU 235, for example, to simultaneously generate pixel data for different portions of an output image, or to simultaneously generate pixel data for different output images. Each additional GPU 255 can be located on the same circuit board as the GPU 235, sharing a connection with the GPU 235 to the data bus 260, or each additional GPU 255 can be located on another circuit board separately coupled with the data bus 260. Each additional GPU 255 can also be integrated into the same module or chip package as the GPU 235. Each additional GPU 255 can have additional memory, similar to the display memory 240 and additional memory 245, or can share the memories 240 and 245 with the GPU 235. It is to be understood that the circuits and/or functionality of GPU as described herein could also be implemented in other types of processors, such as general-purpose or other special-purpose coprocessors, or within a CPU.
[0059] Referring to FIG. 3, illustrated is a flowchart listing steps involved in a method 300 for authenticating a customer, in accordance with one or more embodiments of the present disclosure. The various steps involved in the present method 300 have been depicted as blocks in the flowchart of FIG. 3, and the details for the same have been provided hereinafter. Also referring to FIG. 4, illustrated is a process diagram of a workflow 400 for authenticating a customer, in accordance with one or more embodiments of the present disclosure. The present workflow 400 provides a single process flow for authenticating a customer by utilizing both facial recognition and liveness detection techniques. The following description has been explained in reference to FIGS. 3 and 4 in combination. Various embodiments and variants disclosed hereinafter, with respect to the present method 300, apply mutatis mutandis to the aforementioned system 100; and vice-versa.
[0060] At step 302, the method 300 includes receiving, from an image source, a series of image frames, with each image frame of the series of image frames having captured at least a face of the customer therein. As shown in FIG. 4, an image source 402 is implemented to capture image frames of a customer (represented by reference numeral 404) therein. Herein, the image source 402 is a monoscopic camera device, as discussed earlier in the description. For the purpose of the present disclosure, this pipeline of capturing image frames of the customer works under certain ring-fenced conditions, including that multiple faces cannot be present in a single frame and the process only works for a single customer at a time; and that a proximity between the image source 402 (i.e., the camera device) and the customer is pre-determined, defined as per, but not limited to, one or more properties (like focal length) of the image source 402.
[0061] FIGS. 5 and 6A-6B further depict stages involved in the step of receiving (capturing) the series of image frames. As shown in FIG. 5, an image capturing process 500 is implemented operating under the constraints laid down earlier as per the ring-fenced conditions, with the customer 404 standing at a pre-determined distance from the image source 402. In an example, the said pre-determined distance is defined as a range of about 2 to 6 meters or the like, without any limitations. Further, as shown in FIGS. 6A-6B, for capturing the series of image frames, the process includes providing one or more instructions to the customer to move a head thereof according to a generally predefined movement. In an example, such instructions may be shown to the customer 404 in the form of a visual aid. Such visual aid instructions may be provided as an overlay over a display device (such as the output screen of the image source 402 itself), along with the face of the customer 404 as being captured by the image source 402. Further, such visual aid instructions may be in the form of an animation visible to the customer on a display device. In another example, such instructions may be in the form of audio commands, generated by any speaker, including any speaker associated with the image source 402.
[0062] In one or more embodiments, the predefined movement (i.e., the movement of the head of the customer 404 as instructed thereto) comprises one or more of a nod and a shake of the head by the customer 404. Herein, it may be appreciated that the nod movement 600A (as depicted in FIG. 6A) provides a change in a pitch value of the head of the customer 404. Further, the shake movement (as depicted in FIG. 6B) provides a change in a yaw value of the head of the customer 404. These changes in the pitch value and the yaw value of the head of the customer 404 can be utilized for face pose estimation for the purposes of the present disclosure, as discussed later in some more detail. It may be appreciated that the customer 404 is provided with said instructions (like the visual aid) on a display device which may mimic either the nod or the shake (as discussed herein).
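By way of example only, and not as a limitation of the claimed method, the following Python sketch illustrates how a pitch value and a yaw value may be estimated for a single image frame from a handful of 2D facial landmarks using OpenCV's solvePnP; the generic 3D reference points, the crude camera intrinsics, and the helper name estimate_pitch_yaw are illustrative assumptions introduced here purely for explanation:

    # Minimal sketch (not the claimed method itself): estimating pitch and yaw
    # for one image frame from six 2D facial landmarks using OpenCV's solvePnP.
    # The 3D model points and camera intrinsics below are generic assumptions.
    import cv2
    import numpy as np

    # Generic 3D reference points (nose tip, chin, eye corners, mouth corners), in mm.
    MODEL_POINTS = np.array([
        (0.0, 0.0, 0.0),          # nose tip
        (0.0, -330.0, -65.0),     # chin
        (-225.0, 170.0, -135.0),  # left eye outer corner
        (225.0, 170.0, -135.0),   # right eye outer corner
        (-150.0, -150.0, -125.0), # left mouth corner
        (150.0, -150.0, -125.0),  # right mouth corner
    ], dtype=np.float64)

    def estimate_pitch_yaw(landmarks_2d, frame_width, frame_height):
        """landmarks_2d: 6x2 pixel coordinates in the same order as MODEL_POINTS."""
        focal = frame_width  # crude focal-length approximation
        camera_matrix = np.array([[focal, 0, frame_width / 2],
                                  [0, focal, frame_height / 2],
                                  [0, 0, 1]], dtype=np.float64)
        dist_coeffs = np.zeros((4, 1))  # assume no lens distortion
        ok, rvec, _ = cv2.solvePnP(MODEL_POINTS,
                                   np.asarray(landmarks_2d, dtype=np.float64),
                                   camera_matrix, dist_coeffs,
                                   flags=cv2.SOLVEPNP_ITERATIVE)
        if not ok:
            return None, None
        rot, _ = cv2.Rodrigues(rvec)
        # Decompose the rotation matrix into Euler angles (degrees).
        pitch = np.degrees(np.arctan2(rot[2, 1], rot[2, 2]))
        yaw = np.degrees(np.arctan2(-rot[2, 0], np.hypot(rot[2, 1], rot[2, 2])))
        return pitch, yaw

In such a sketch, frame-to-frame changes in the returned pitch and yaw values would reflect the nod and shake movements discussed above, corresponding to the temporal pitch and yaw signal used later in the workflow.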
[0063] Referring back to FIG. 5, the image source 402 is activated to start capturing the image frames in conjunction with the movement of the head by the customer 404. Herein, the method 300 includes receiving, from the image source 402, the series of image frames, while the customer 404 is moving the head thereof as per the one or more instructions. In the present embodiments, the image source 402 is configured to capture the image frames for about 3-5 seconds. In an example, the image source 402 may capture the image frames for about 3 seconds. In another example, the image source 402 may capture the image frames for about 4 seconds. In yet another example, the image source 402 may capture the image frames for about 5 seconds. In one or more implementations, the image source 402 may be a camera configured to capture 30-60 frames per second (FPS). In an example, the image source 402 may capture the image frames at about 30 FPS. In another example, the image source 402 may capture the image frames at about 45 FPS. In yet another example, the image source 402 may capture the image frames at about 60 FPS. For explanation, for the configuration of capturing the image frames for 3 seconds at 30 FPS, it may be understood that a total of 90 (3 x 30) image frames may be captured. In the illustrated example of FIG. 5, three image frames have been exemplarily shown, which have been labelled as image frames 502A, 504A and 506A, with the image frame 502A being a first image frame (also sometimes referred to as the base image frame or simply the base frame) of the captured image frames 502A-506A. It may be appreciated that the present description using the three image frames is exemplary only and the process may involve a predefined number of image frames (like 90 image frames generated in 3 seconds with a 30 FPS camera). All image frames captured in this duration are sent to the pipeline for data processing in the workflow 400 of FIG. 4.
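Purely for illustration, a minimal Python sketch of such a capture step, assuming an OpenCV-accessible monoscopic camera and the exemplary 3 second / 30 FPS configuration discussed above (the device index and requested frame rate are assumptions, not requirements of the present disclosure), may be as follows:

    # Minimal capture sketch: ~3 s at ~30 FPS gives roughly 90 frames.
    import time
    import cv2

    def capture_series(duration_s=3.0, device_index=0):
        cap = cv2.VideoCapture(device_index)
        cap.set(cv2.CAP_PROP_FPS, 30)  # request ~30 FPS; actual rate is device dependent
        frames = []
        start = time.time()
        while time.time() - start < duration_s:
            ok, frame = cap.read()
            if ok:
                frames.append(frame)
        cap.release()
        return frames  # e.g. ~90 frames for 3 s at 30 FPS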
[0064] As shown in FIG. 4, the image frames 502A-506A are implemented in the workflow 400. Herein, the image frames 502A-506A are processed by a face detection module (as represented by block 406). The face detection module 406 implements computer vision techniques ranging from identifying specific people to marking key points on the face. In present embodiments, the face detection module 406 may implement any one of Haar cascades, dlib, a Multi-task Cascaded Convolutional Neural Network (MTCNN), and OpenCV’s DNN module for the purposes of the present disclosure. In a preferred embodiment, the face detection module 406 may implement the dlib library for face detection. Such techniques are known in the art and thus have not been described herein for the brevity of the present disclosure. FIG. 7 illustrates a schematic of a step of cropping a face from an image frame, in accordance with one or more embodiments of the present disclosure. As shown, one of the received image frames 502A-506A is processed by the face detection module 406 (herein, the first image frame 502A is shown to be processed by the face detection module 406) for cropping out the face of the customer therefrom to generate a processed image frame 502B. The other received image frames 504A and 506A are also processed in a similar manner by the face detection module 406 to generate processed image frame 504B and processed image frame 506B, respectively. Herein, in one or more implementations, the face detection module 406 may discard the rest of the portions of each of the received image frames 502A-506A to provide the processed image frames 502B-506B with only the corresponding facial features therein.
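As a non-limiting illustration of this cropping stage, and assuming the dlib frontal face detector stands in for the face detection module 406 (the helper name crop_face and the single-face check are illustrative), a minimal Python sketch may be:

    # Minimal face-cropping sketch using the dlib frontal face detector.
    import cv2
    import dlib

    detector = dlib.get_frontal_face_detector()

    def crop_face(frame_bgr):
        """Return the cropped face region, or None if zero or multiple faces are found."""
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        faces = detector(gray, 1)  # upsample once to help with small faces
        if len(faces) != 1:        # single-face constraint from the capture conditions
            return None
        f = faces[0]
        top, bottom = max(f.top(), 0), f.bottom()
        left, right = max(f.left(), 0), f.right()
        return frame_bgr[top:bottom, left:right]

    # cropped = [crop_face(fr) for fr in frames]  # frames 502A-506A -> 502B-506B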
[0065] Further, as shown in FIG. 4, the workflow 400 involves passing the processed image frames 502B-506B through a facial feature enhancement module (as represented by block 408). The facial feature enhancement module 408 is configured to enhance the quality of one or more image frames (in the present implementation, the processed image frames 502B-506B) of the series of image frames. Generally, due to variation in lighting conditions and various other environmental challenges, there is a chance that the captured image will be blurry, which poses a challenge for the subsequent steps. Therefore, the said enhancement of the quality of facial features in the one or more image frames 502B-506B is required to enable further processing of the image frames, such as for facial recognition as discussed later in the description. In an embodiment of the present disclosure, the facial feature enhancement module 408 implements a pre-trained generator-discriminator architecture for enhancing the quality of one or more image frames of the series of image frames 502B-506B. The generator-discriminator architecture is implemented as a generative adversarial network (GAN), which has two parts: a generator, which learns to generate plausible data, with the generated instances becoming negative training examples for the discriminator; and a discriminator, which learns to distinguish the generator's fake data from real data and penalizes the generator for producing implausible results. Both the generator and the discriminator are neural networks. The generator output is connected directly to the discriminator input. Through backpropagation, the discriminator's classification provides a signal that the generator uses to update its weights. To address the blur problem, an inference pipeline is set up to enhance the quality of the image through a pre-trained generator-discriminator architecture. This model is pre-trained on a dataset of high quality images and corresponding blurry images to optimize the restoration error. Using this pre-trained model ensures that an enhanced base image is pushed to the next stage of the pipeline. Such implementation may be understood by a person skilled in the art. FIG. 8 illustrates a schematic of a step of enhancing the quality of an image frame, in accordance with one or more embodiments of the present disclosure. Herein, each of the processed image frames 502B-506B is further processed by the facial feature enhancement module 408. In the illustrated example, the first processed image frame 502B is shown to be processed by the facial feature enhancement module 408 for enhancing the facial features of the face of the customer therein, to generate an enhanced image frame 502C. The other processed image frames 504B and 506B are also enhanced in a similar manner by the facial feature enhancement module 408 to generate enhanced image frame 504C and enhanced image frame 506C, respectively.
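A minimal, illustrative Python (PyTorch) sketch of such an inference pipeline is given below; the RestorationGenerator class, its layers, and the checkpoint path "restoration_gan.pt" are hypothetical placeholders standing in for the actual pre-trained generator, and are not part of the disclosed model itself:

    # Illustrative inference sketch: push a cropped face through a (pre-trained) generator.
    import torch
    import torch.nn as nn

    class RestorationGenerator(nn.Module):
        """Toy stand-in for the pre-trained generator of the GAN (module 408)."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
            )
        def forward(self, x):
            return self.net(x)

    def enhance(face_bgr_uint8, generator):
        """Run one cropped face (H x W x 3, uint8 numpy array) through the generator."""
        x = torch.from_numpy(face_bgr_uint8).float().permute(2, 0, 1).unsqueeze(0) / 255.0
        with torch.no_grad():
            y = generator(x)
        return (y.squeeze(0).permute(1, 2, 0).numpy() * 255).astype("uint8")

    generator = RestorationGenerator().eval()
    # generator.load_state_dict(torch.load("restoration_gan.pt"))  # hypothetical pre-trained weights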
[0066] Further, as shown in FIG. 4, the workflow 400 involves a facial recognition pipeline. Herein, the workflow 400 involves passing the enhanced image frames 502C-506C through a vector generation module (as represented by block 410). The vector generation module 410 is configured to generate facial encoding vectors for the one or more image frames (in the present implementation, the enhanced image frames 502C-506C) of the series of image frames. Herein, the facial encoding vectors are Fourier feature vectors obtained by encoding spatial relations between pixels constituting the image frames 502C-506C. In an embodiment, the vector generation module 410 implements a Siamese architecture based neural network optimized to minimize triplet loss and implemented to generate the facial encoding vector. A Siamese architecture based neural network consists of two identical neural networks, each taking one of two input images; the last layers of the two networks are then fed to a contrastive loss function, which calculates the similarity between the two images. The two networks are identical neural networks with the exact same weights. In the present embodiments, the vector generation module 410 is also implemented to pre-generate facial encoding vectors for each customer 404 by processing at least one available image for each customer 404, in a similar manner. Such available image(s) for the customer 404 may be obtained during the process of on-boarding the customer 404 in the present system 100.
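A minimal sketch of a triplet-loss embedding network with shared ("Siamese") weights follows; the backbone layers and margin are illustrative assumptions, while the 128-dimensional output matches the encoding size described later in the disclosure.

```python
# Sketch of a triplet-loss encoding network, one plausible realization of module 410.
import torch
import torch.nn as nn

class EncodingNet(nn.Module):
    """Maps a face crop to a 128-dimensional facial encoding vector."""
    def __init__(self, dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fc = nn.Linear(64, dim)
    def forward(self, x):
        v = self.fc(self.features(x))
        return nn.functional.normalize(v, dim=1)   # unit-norm encoding vector

net = EncodingNet()
triplet = nn.TripletMarginLoss(margin=0.2)

# Shared weights: the same network encodes the anchor, positive and negative images.
anchor   = net(torch.rand(1, 3, 112, 112))
positive = net(torch.rand(1, 3, 112, 112))   # same identity as the anchor
negative = net(torch.rand(1, 3, 112, 112))   # different identity
loss = triplet(anchor, positive, negative)    # minimized during training
```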
[0067] FIG. 9A illustrates a schematic of a step of generating facial encoding vectors for a plurality of customer images in a customer dataset (as represented by block 412 in FIG. 4), in accordance with one or more embodiments of the present disclosure. It may be appreciated that in the present banking application, the customer dataset 412 is prepared from a database of previously collected face images from KYC (Know Your Customer) based documents from the customer 404, such as Government-issued photo identification documents (for example, PAN card, Aadhar card, etc.). In one or more embodiments of the present disclosure, the vector generation module 410 implements a Siamese architecture based neural network optimized to minimize triplet loss and implemented to generate the facial encoding vector. Herein, the image received in the previous step is further pushed through another deep learning pipeline to generate a 128-dimensional facial encoding vector. This neural network is based on the Siamese architecture and is optimized to minimize triplet loss with an anchor, a positive and a negative image. As shown, the available images of the customer 404, which may be pre-processed (including cropping and enhancement thereof) and are represented by the reference numerals 902A-906A, are processed by the vector generation module 410 to generate facial encoding vectors represented by the reference numerals 902B-906B. It may be appreciated that for the purpose of this description, only three pre-generated facial encoding vectors have been depicted, but the customer dataset 412 may store hundreds or thousands (or even more) of said pre-generated facial encoding vectors, depending on the number of available images of the customer 404. In some implementations, the pre-generated facial encoding vectors may be stored back in the customer dataset 412. Further, according to embodiments of the present disclosure, the processing arrangement 205 is configured to process at least one image frame (usually, the first image frame after processing and enhancement, i.e., the enhanced image frame 502C) of the series of image frames to generate a facial encoding vector corresponding to the face of the customer 404. FIG. 9B illustrates a schematic of a step of generating a facial encoding vector for an image frame, in accordance with one or more embodiments of the present disclosure. Herein, each of the enhanced image frames 502C-506C is processed by the vector generation module 410. In the illustrated example, the first enhanced image frame 502C is shown to be processed by the vector generation module 410 to generate a facial encoding vector as represented by the reference numeral 502D. The other enhanced image frames 504C and 506C are also processed in a similar manner by the vector generation module 410 to generate a facial encoding vector 504D and a facial encoding vector 506D, respectively.
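As a simple sketch of the enrollment side of this step, the pre-generated encoding vectors (902B-906B) could be stored per customer as shown below; the in-memory storage layout, the `enroll_customer` helper and the stand-in encoder are all illustrative assumptions.

```python
# Illustrative sketch: pre-generating and storing encoding vectors for a customer's
# KYC images in a simple in-memory dataset standing in for the customer dataset 412.
import numpy as np

customer_dataset = {}   # customer_id -> array of pre-generated 128-d encoding vectors

def enroll_customer(customer_id, kyc_face_crops, encoder):
    """Encode each available (cropped, enhanced) KYC image and store the vectors."""
    vectors = np.stack([encoder(img) for img in kyc_face_crops])
    customer_dataset[customer_id] = vectors.astype(np.float32)

# Usage with a stand-in encoder (a real system would call the trained module 410):
dummy_encoder = lambda img: np.random.rand(128)
enroll_customer("customer_404", [np.zeros((112, 112, 3))] * 3, dummy_encoder)
```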
[0068] Referring back to FIG. 4, the facial recognition pipeline in the workflow 400 further involves, at block 414, comparing the generated facial encoding vector, that is, one of the facial encoding vectors 502D-506D (usually the first facial encoding vector 502D), with the pre-generated facial encoding vectors (i.e., the pre-generated facial encoding vectors 902B-906B) stored in the customer dataset 412. For this purpose, the processing arrangement 205 is configured to compare the generated facial encoding vector 502D with the pre-generated facial encoding vectors 902B-906B stored in the customer dataset 412. This step involves calculating Euclidean distances between the first facial encoding vector 502D and each of the pre-generated facial encoding vectors 902B-906B. Further, in the workflow 400 of FIG. 4, the facial recognition is confirmed for the customer 404 based on a nearest match for the Euclidean distance being within a predefined distance threshold. That is, if the smallest calculated Euclidean distance is within the said predefined distance threshold, the facial recognition is confirmed. Such predefined distance threshold may be defined based on a desired accuracy of the facial recognition process required for implementation of the embodiments of the present disclosure. If confirmed, the workflow 400 moves to the liveness detection pipeline (as discussed in the following paragraphs) for further authentication; and if not, the workflow 400 moves to block 418 and the process is terminated.
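A minimal sketch of this nearest-match comparison follows; the distance threshold value of 0.6 is a hypothetical placeholder for the predefined distance threshold, and the random vectors merely stand in for vector 502D and vectors 902B-906B.

```python
# Sketch of the comparison at block 414: nearest match by Euclidean distance.
import numpy as np

def recognize(query_vector, stored_vectors, distance_threshold=0.6):
    """Return True if the nearest pre-generated vector is within the threshold."""
    distances = np.linalg.norm(stored_vectors - query_vector, axis=1)
    return bool(distances.min() <= distance_threshold)

# Illustrative call with random stand-ins for vector 502D and vectors 902B-906B:
confirmed = recognize(np.random.rand(128), np.random.rand(3, 128))
```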
[0069] According to embodiments of the present disclosure, the workflow 400 implements liveness detection as part of the process for authentication of the customer 404 from the captured image frames 502A-506A. This stage of the liveness detection corresponds to step 304 of the present method 300 for authenticating the customer 404. The step 304 includes generating facial embeddings from at least one of temporal pitch value and temporal yaw value of the face for each image frame of the series of image frames 502A-506A. FIG. 10 illustrates a schematic of a step of generating facial embeddings, in accordance with one or more embodiments of the present disclosure. For this purpose, the system 100 implements an embeddings generation module 420. In particular, the step of generating facial embeddings includes identifying a plurality of facial landmarks (as represented by the reference numeral 422), as an intermediate output, from each image frame of the series of image frames (herein, the enhanced image frames 502C-506C). Further, the said step of generating facial embeddings includes generating the facial embeddings (as represented by the reference numeral 424) for each image frame 502C, 504C, 506C using the corresponding identified plurality of facial landmarks 422. Herein, each of the image frames 502C-506C is processed by the embeddings generation module 420. In the illustrated example, the first image frame 502C is shown to be processed by the embeddings generation module 420 to generate the facial embeddings 424. The other image frames 504C and 506C are also processed in a similar manner by the embeddings generation module 420 to generate corresponding facial embeddings 424.
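One possible realization of the intermediate landmark identification step 422 is sketched below, assuming dlib's 68-point shape predictor; the model file path is an assumption (the predictor weights must be available locally), and the disclosure does not mandate this particular landmark detector.

```python
# Sketch of per-frame facial landmark extraction (intermediate output 422) using
# dlib's 68-point shape predictor; the .dat file path is an assumed local asset.
import dlib
import numpy as np

detector  = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def facial_landmarks(frame_gray):
    """Return a (68, 2) array of landmark coordinates for the first detected face."""
    faces = detector(frame_gray, 1)
    if not faces:
        return None
    shape = predictor(frame_gray, faces[0])
    return np.array([(p.x, p.y) for p in shape.parts()])

# The per-frame landmark arrays feed the embeddings generation module 420.
```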
[0070] In one or more embodiments, the embeddings generation module 420 implements a transformer based machine learning model to generate the facial embeddings 424. Such a model is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input data. In particular, the facial embeddings 424 are computed using a transformer architecture neural network, specifically using an attention mechanism. In one or more implementations, the embeddings generation module 420 computes the relative orientation of the image source 402 with respect to the face of the customer 404. The input is passed through a ResNet50 model to collect encodings, which are then pushed through three custom fully connected layers to provide the three desired outputs at a final SoftMax layer, namely the pitch, yaw and roll parameters seen in the face. The embeddings generation module 420 that runs this inference is trained to minimize the yaw loss, roll loss and pitch loss on a pre-labelled dataset of face pose data. The output then provides three angles along the three dimensions of roll, pitch and yaw that help in triangulating the 3D orientation of a face in a 2D monocular vision setup of the image source 402.
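A hedged sketch of such a pose head is given below: a ResNet50 backbone with three custom fully connected heads and a final softmax yielding pitch, yaw and roll. The angle binning (66 bins of 3 degrees, with the expected angle taken over the softmax probabilities) is an illustrative assumption in the style of published head-pose estimators, not a detail stated in the disclosure; torchvision >= 0.13 is assumed for the `weights=None` argument.

```python
# Sketch of a ResNet50 backbone with three fully connected heads (pitch, yaw, roll).
import torch
import torch.nn as nn
from torchvision.models import resnet50

class PoseNet(nn.Module):
    def __init__(self, num_bins=66):
        super().__init__()
        backbone = resnet50(weights=None)
        backbone.fc = nn.Identity()          # keep the 2048-d encodings
        self.backbone = backbone
        self.fc_pitch = nn.Linear(2048, num_bins)
        self.fc_yaw   = nn.Linear(2048, num_bins)
        self.fc_roll  = nn.Linear(2048, num_bins)
        # Bin centers in degrees, used to turn softmax probabilities into angles.
        self.register_buffer("bins", torch.arange(num_bins).float() * 3 - 99)

    def forward(self, x):
        feat = self.backbone(x)
        angles = []
        for head in (self.fc_pitch, self.fc_yaw, self.fc_roll):
            probs = torch.softmax(head(feat), dim=1)
            angles.append((probs * self.bins).sum(dim=1))   # expected angle
        return torch.stack(angles, dim=1)    # (batch, 3): pitch, yaw, roll

pose_net = PoseNet()
angles = pose_net(torch.rand(1, 3, 224, 224))   # per-frame pitch/yaw/roll estimate
```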
[0071] Further, at step 306, the method 300 includes processing the facial embeddings 424, by implementing a trained liveness classifier, to compute a probability of liveness of the series of image frames 502A-506A. For this purpose, in the workflow 400 of FIG. 4, a maximum change in the pitch value and/or the yaw value is determined, depending on the movement of the head performed by the customer 404. In one or more embodiments, the step of processing the facial embeddings 424 includes determining a change in at least one of the temporal pitch values and the temporal yaw values of the face between consecutive image frames 502A-506A of the series of image frames due to the movement of the face, with the determined change corresponding to the computed probability of liveness. In the workflow 400 of FIG. 4, as shown, at block 426, it is checked if the change in the determined pitch value is above a corresponding predefined threshold. Further, at block 428, it is checked if the change in the determined yaw value is above a corresponding predefined threshold. Further, at block 416, it is checked if the determined landmark distance is above a corresponding predefined threshold. If any one of the said three conditions is not met, the workflow 400 moves to the block 418 and the authentication of the customer 404 is rejected. If all three conditions are met, the workflow 400 involves generating trajectory embeddings (as represented by block 430) which represent the change in at least one of the temporal pitch values and the temporal yaw values of the face between consecutive image frames 502A-506A of the series of image frames due to the movement of the face by the customer 404. For this purpose, the facial embeddings 424 are combined with the facial encoding vector 502D generated from the base image 502A. It may be understood that if a bad actor is trying to gain access by holding a victim’s image in a mobile device or a physical print, the rotation movement task to be performed at the data capture stage is instead performed by moving the image. So, in this case, at a conceptual level, the motion is translatory rather than rotational. Given the above examples, it should be clear that there has to be a certain threshold of change in the pitch value and/or the yaw value (depending on whether the task to be performed is a nod or a shake, respectively) from the base image frame 502A to the subsequent image frames 504A-506A. So, the maximum change in the pitch value and/or the yaw value is calculated and checked against a certain threshold. If this condition is satisfied, the workflow 400 moves to the next step; otherwise, it proceeds through the rejection workflow.
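A simplified sketch of the checks at blocks 426, 428 and 416 and the combination at block 430 is shown below; the threshold values, the helper names and the 32-dimensional embedding layout are illustrative assumptions, not values stated in the disclosure.

```python
# Sketch of the pitch/yaw/landmark checks and the trajectory-embedding concatenation.
import numpy as np

PITCH_THRESHOLD    = 10.0   # degrees, hypothetical predefined threshold (block 426)
YAW_THRESHOLD      = 10.0   # degrees, hypothetical predefined threshold (block 428)
LANDMARK_THRESHOLD = 5.0    # pixels, hypothetical predefined threshold (block 416)

def passes_movement_checks(pitch_series, yaw_series, landmark_distance):
    """All three measured changes must exceed their corresponding thresholds."""
    pitch_ok    = np.abs(np.diff(pitch_series)).max() > PITCH_THRESHOLD
    yaw_ok      = np.abs(np.diff(yaw_series)).max() > YAW_THRESHOLD
    landmark_ok = landmark_distance > LANDMARK_THRESHOLD
    return pitch_ok and yaw_ok and landmark_ok

def trajectory_embedding(face_encoding_502d, temporal_embedding_32d):
    """Block 430: combine the base-frame encoding with the temporal pitch/yaw embedding."""
    return np.concatenate([face_encoding_502d, temporal_embedding_32d])

# Illustrative call with per-frame pitch/yaw estimates from the pose model:
ok = passes_movement_checks([0.0, 8.0, 15.0], [1.0, 9.0, 14.0], landmark_distance=7.0)
```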
[0072] Further, a liveness classifier (as represented by block 432), which is a binary machine learning based model, is executed for further processing. Herein, the liveness classifier 432 is trained on data that discriminates between a live video and an imposter video. It may be noted that, herein, the liveness detection is being treated as a binary classification problem. The liveness classifier 432 is implemented as a machine learning algorithm optimized to minimize the log-loss of the liveness binary label. A pre-labelled dataset for liveness is used for training the liveness classifier 432. So, given an input image, the liveness classifier 432, which may be, for example, a Convolutional Neural Network (CNN), is trained to be capable of distinguishing real faces from fake/spoofed faces. In the present implementations, the liveness classifier 432 is executed as a custom trained classifier which, in the inference pipeline, is configured to provide a probability score of liveness. The input (X) variables for the liveness classifier 432 are the facial encoding vector 502D of the base image frame 502A, concatenated with a 32-dimensional vector for the facial embeddings 424 generated from the temporal yaw and pitch values of the sequential set of image frames 502A-506A.
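As a minimal sketch of such a log-loss-minimizing binary classifier, logistic regression is used below as a stand-in for whichever model is actually trained (the disclosure mentions a CNN as one example); the synthetic training data and feature dimensions (128 + 32) are purely illustrative.

```python
# Sketch of a binary liveness classifier trained to minimize log-loss (block 432).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# X: 128-d base-frame encoding vector concatenated with the 32-d temporal pitch/yaw embedding.
X_train = rng.normal(size=(200, 128 + 32))
y_train = rng.integers(0, 2, size=200)          # 1 = live, 0 = imposter (pre-labelled)

liveness_clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

x_query = np.concatenate([rng.normal(size=128), rng.normal(size=32)]).reshape(1, -1)
p_live = liveness_clf.predict_proba(x_query)[0, 1]   # probability score of liveness
```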
[0073] Further, at step 308, the method 300 includes confirming facial liveness for the customer 404 based on the computed probability of liveness being above a predefined movement threshold. That is, as shown in the workflow 400 of FIG. 4, if the probability of facial liveness as determined in the step 306 (described in the preceding paragraphs) is above the predefined movement threshold, then the liveness detection is confirmed, and with the facial recognition already confirmed, the process moves to block 434 to confirm authentication of the customer. If not, the process moves to the block 418 to reject authentication of the customer. Herein, it may be appreciated that the said predefined movement threshold may be defined based on a desired accuracy of the facial liveness detection process required for implementation of the embodiments of the present disclosure.
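The final decision at step 308 can be sketched as a simple comparison of the liveness probability against the predefined movement threshold, combined with the earlier recognition result; the threshold value and function name below are hypothetical placeholders.

```python
# Sketch of the final decision at step 308 (blocks 434 and 418).
MOVEMENT_THRESHOLD = 0.5   # hypothetical predefined movement threshold

def authenticate(recognition_confirmed, p_live, threshold=MOVEMENT_THRESHOLD):
    """Confirm authentication only when recognition is confirmed and liveness exceeds the threshold."""
    return bool(recognition_confirmed and p_live > threshold)
```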
[0074] The present disclosure provides a unique facial recognition system along with liveness detection in one single workflow in real time. The system and the method of the present disclosure implement widely available monoscopic cameras with a combination of onboard and remote computing systems for running artificial intelligence/machine learning based computations for image processing, face detection, recognition and liveness verification. The end-to-end workflow 400 as proposed in the present disclosure includes face detection, face recognition, and liveness detection through face pose estimation with multiple frames to disambiguate rotational and translational movement between live and fake faces, respectively. The present disclosure leverages facial features to compare against a database of customers to authenticate and verify the face match. This implementation can be applied to verify a customer to be onboarded to a banking product or to authorize a transaction for an existing customer. Thereby, the system and the method of the present disclosure ensure fraud prevention and an enhanced user experience in the banking sector with seamless biometric (facial) authentication, without any infrastructural change.
[0075] The foregoing descriptions of specific embodiments of the present disclosure have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the present disclosure to the precise forms disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiment was chosen and described in order to best explain the principles of the present disclosure and its practical application, to thereby enable others skilled in the art to best utilize the present disclosure and various embodiments with various modifications as are suited to the particular use contemplated.