Method And System For Face Detection And Recognition In Big Data Environment

Abstract: Face detection and face recognition are being used widely for various applications. The existing methods for face detection and recognition are complex and time consuming. Further, they involve huge amounts of data, which are difficult to handle in a traditional environment. A method and system for face detection and recognition of a person in an image in a big data environment are provided. The method involves building an exhaustive dynamic external database on a big data platform based on web-crawling and CNN techniques. Further, it uses deep-learning-based techniques for detecting and recognizing the face of the person, collecting data about the person from various web as well as internal sources, and then showing the person's linkages with incidences of interest in which the person was involved and entities of interest, i.e., his relatives and associated organizations.

Patent Information

Application #
201821045425
Filing Date
30 November 2018
Publication Number
23/2020
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
kcopatents@khaitanco.com
Parent Application
Patent Number
Legal Status
Grant Date
2024-02-29
Renewal Date

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th floor, Nariman point, Mumbai 400021

Inventors

1. SINGH, Divya
Tata Consultancy Services Limited Plot No. 2, 3, RGIP, Phase III, Hinjawadi-Maan, Pune 411057, Maharashtra, India

Specification

Claims:

WE CLAIM:

1. A method (200) for face detection and recognition of a person in an image in a big data environment, the method comprising the processor-implemented steps of:
extracting all the images of the person from a plurality of internal sources (202);
extracting information about the person from a plurality of external sources using a set of web crawling techniques (204);
creating an exhaustive dynamic database by collating all the extracted images from the internal sources and the extracted information (206);
extracting a plurality of features of the face of the person (208);
creating a model using the extracted plurality of features, wherein the model is created using a set of supervised machine learning techniques (210);
providing the image as an input to detect the presence of the person in the image (212);
detecting the presence of all the faces in the image using a convolutional neural network (CNN) technique (214);
querying all the detected faces as an input request in the exhaustive dynamic database (216);
recognizing all the detected faces across the exhaustive dynamic database using the created model (218); and
sharing a profile page of the person if recognized, wherein the profile page comprises linkages and various incidences of interest and entities of interest of the person (220).

2. The method of claim 1 further comprising the step of providing the results as the top closest matches to the person.

3. The method of claim 1 further comprising the step of creating a separate profile page for each of the detected faces in the image.

4. The method of claim 1 further comprising the step of classifying the recognized person using a support vector machine (SVM) classifier on the extracted features.

5. The method of claim 1, wherein the plurality of external sources comprises one or more of RSS feeds, social media or websites.

6. The method of claim 1, wherein the plurality of internal sources comprises one or more internal report sources, such as PDFs, MS Word documents, images or PPT documents.

7. The method of claim 1 wherein the CNN technique uses a maximum-margin object detector (MMOD) with CNN based features for face detection.

8. The method of claim 1, wherein the created model is stored in the exhaustive dynamic database.

9. A system (100) for face detection and recognition of a person in an image in a big data environment, the system comprises:

an input module (102) for providing the image as an input to detect the presence of the person in the image;
a memory (104); and
a processor (106) in communication with the memory, the processor further comprises:
an extraction module (108) for extracting all the images of the person from a plurality of internal sources;
a web crawler (110) for extracting information about the person from a plurality of external sources;
a database creation module (112) for creating an exhaustive dynamic database (126) by collating all the extracted images from the internal sources and the extracted information;
a feature extraction module (114) for extracting a plurality of features of the face of the person;
a model creation module (116) for creating a model using the extracted plurality of features, wherein the model is created using a set of supervised machine learning techniques;
a face detection module (118) for detecting the presence of all the faces in the image using a convolutional neural network (CNN) technique;
a querying module (120) for querying all the detected faces as an input request in the exhaustive dynamic database;
a face recognition module (122) for recognizing all the detected faces across the exhaustive dynamic database using the created model; and
a profile generation module (124) for sharing a profile page of the person if recognized, wherein the profile page comprises linkages and various incidences of interest and entities of interest of the person.
Description:

TECHNICAL FIELD

[001] The embodiments herein generally relate to the field of face recognition. More particularly, the invention provides a system and method for detecting and recognizing a face in an image in a big data environment.

BACKGROUND

[002] Face detection and face recognition have found relevance in various fields. They have been used in marketing, photography, surveillance, ID verification, national security against terrorism and various other applications. With the advent of the internet, a huge amount of data has become available on the web in various forms. This data may contain valuable information of interest. Among these tasks, detecting and recognizing faces has become an important field of research.
[003] Face detection and recognition is a complex problem due to the different variations in faces, the social media analytics taxonomy and the huge amount of data available on the internet. Due to the diversified range of data, traditional web crawling techniques that rely on the pattern of a URL and the document object model structure of a webpage may not prove very efficient. Further, customary face recognition algorithms based on template matching or eigenfaces may also not be efficient due to age, pose and illumination variations.
[004] In addition to this, the data is growing at a rapid pace, making it difficult to handle such a large amount of data using the traditional software tools available. Therefore, image processing techniques such as face detection and recognition are also being integrated into the big data environment. This imposes further challenges.

SUMMARY

[005] The following presents a simplified summary of some embodiments of the disclosure in order to provide a basic understanding of the embodiments. This summary is not an extensive overview of the embodiments. It is not intended to identify key/critical elements of the embodiments or to delineate the scope of the embodiments. Its sole purpose is to present some embodiments in a simplified form as a prelude to the more detailed description that is presented below.
[006] In view of the foregoing, an embodiment herein provides a system for face detection and recognition of a person in an image in a big data environment. The system comprises an input module, a memory and a processor in communication with the memory. The input module provides the image as an input to detect the presence of the person in the image. The processor further comprises an extraction module, a web crawler, a database creation module, a feature extraction module, a model creation module, a face detection module, a querying module, a face recognition module and a profile generation module. The extraction module extracts all the images of the person from a plurality of internal sources. The web crawler extracts information about the person from a plurality of external sources. The database creation module creates an exhaustive dynamic database by collating all the extracted images from the internal sources and the extracted information. The feature extraction module extracts a plurality of features of the face of the person. The model creation module creates a model using the extracted plurality of features, wherein the model is created using a set of supervised machine learning techniques. The face detection module detects the presence of all the faces in the image using a convolutional neural network (CNN) technique. The querying module queries all the detected faces as an input request in the exhaustive dynamic database. The face recognition module recognizes all the detected faces across the exhaustive dynamic database using the created model. The profile generation module shares a profile page of the person if recognized, wherein the profile page comprises linkages and various incidences of interest and entities of interest of the person.
[007] In another aspect, the embodiment herein provides a method for face detection and recognition of a person in an image in a big data environment. Initially, all the images of the person are extracted from a plurality of internal sources, and information about the person is also extracted from a plurality of external sources using a set of web crawling techniques. In the next step, an exhaustive dynamic database is created by collating all the extracted images from the internal sources and the extracted information. Further, a plurality of features are extracted from the face of the person. In the next step, a model is created using the extracted plurality of features, wherein the model is created using a set of supervised machine learning techniques. The image is then provided as an input to detect the presence of the person in the image. In the next step, the presence of all the faces in the image is detected using a convolutional neural network (CNN) technique. Next, all the detected faces are queried as an input request in the exhaustive dynamic database. All the detected faces are then recognized across the exhaustive dynamic database using the created model. And finally, a profile page of the person is shared if the person is recognized, wherein the profile page comprises linkages and various incidences of interest and entities of interest of the person.
[008] It should be appreciated by those skilled in the art that any block diagram herein represent conceptual views of illustrative systems embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudo code, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computing device or processor, whether or not such computing device or processor is explicitly shown.

BRIEF DESCRIPTION OF THE DRAWINGS

[009] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles.
[010] Fig. 1 illustrates a block diagram of a system for face detection and recognition of a person in an image in a big data environment according to an embodiment of the present disclosure;
[011] Fig. 2 shows a workflow of the system using the big data environment according to an embodiment of the disclosure; and
[012] Fig. 3A-3B is a flowchart illustrating the steps involved in face detection and recognition of a person in an image in a big data environment according to an embodiment of the present disclosure.

DETAILED DESCRIPTION

[013] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the spirit and scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope and spirit being indicated by the following claims.
[014] Referring now to the drawings, and more particularly to Fig. 1 through Fig. 3, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[015] According to an embodiment of the disclosure, a system 100 for face detection and recognition of a person in an image in a big data environment is shown in the block diagram of Fig. 1. The system 100 uses deep-learning-based techniques for detecting and recognizing the face of the person, collecting data about the person from various web as well as internal sources, and then showing the person's linkages with incidences of interest in which the person was involved and entities of interest, i.e., his relatives and associated organizations.
[016] According to an embodiment of the disclosure, the system 100 further comprises an input module 102, a memory 104 and a processor 106 as shown in the block diagram of Fig. 1. The processor 106 works in communication with the memory 104. The processor 106 further comprises a plurality of modules. The plurality of modules access the set of algorithms stored in the memory 104 to perform certain functions. The processor 106 further comprises an extraction module 108, a web crawler 110, a database creation module 112, a feature extraction module 114, a model creation module 116, a face detection module 118, a querying module 120, a face recognition module 122 and a profile generation module 124.
[017] According to an embodiment of the disclosure, the input module 102 is configured to provide the image as an input to the system 100 to detect the presence of the person in the image. The image is used by the system 100 to detect the presence of one or more persons. The image can be provided in any format, such as JPG, PDF, DOC, etc., and can also be taken from any other document. It should be appreciated that the image can also be provided in the form of a website link at which the image is present. The input module 102 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like, and can facilitate multiple communications within a wide variety of network (N/W) and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite.
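For illustration only, a minimal Python sketch of such an input step is given below, assuming the Pillow and requests libraries are available; the function name load_input_image is hypothetical and not part of the disclosure:

```python
from io import BytesIO

import requests
from PIL import Image


def load_input_image(source: str) -> Image.Image:
    """Load the input image from a local path or a website link."""
    if source.startswith(("http://", "https://")):
        # The image is provided in the form of a website link.
        response = requests.get(source, timeout=10)
        response.raise_for_status()
        return Image.open(BytesIO(response.content)).convert("RGB")
    # Otherwise treat the source as a local file, e.g. a .jpg
    # extracted from a PDF or Word document.
    return Image.open(source).convert("RGB")
```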
[018] According to an embodiment of the disclosure, the processor 106 further comprises the extraction module 108 and the web crawler 110. The extraction module 108 is configured to extract all the images of the person from a plurality of internal sources. The plurality of internal sources comprises one or more internal report sources, such as PDFs, MS Word documents, images or PPT documents. The web crawler 110 is configured to extract information about the person from a plurality of external sources. The plurality of external sources comprises one or more of RSS feeds, social media or websites.
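As a hedged sketch of one possible crawling step for the website sources (the disclosure does not prescribe a specific crawler), image URLs can be collected from a page with requests and BeautifulSoup; the function name is illustrative:

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def crawl_page_for_images(url: str) -> list[str]:
    """Collect absolute image URLs from a single external web page."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    image_urls = []
    for img in soup.find_all("img"):
        src = img.get("src")
        if src:
            # Resolve relative links against the page URL.
            image_urls.append(urljoin(url, src))
    return image_urls
```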
[019] According to an embodiment of the disclosure, the processor 106 further comprises the database creation module 112. The database creation module 112 is configured to create an exhaustive dynamic database 126 by collating all the extracted images from the internal sources and the extracted information from the plurality of external sources. The database 126 is a dynamic database, as it keeps being updated for different persons over a period of time; thus, the database 126 is created only once. Because of its huge size, the database 126 is hosted and executed on the Hadoop Distributed File System (HDFS), as explained in the later part of the disclosure.
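Purely as an illustrative sketch, collating an entry into an HDFS-hosted database could be done with the hdfs Python client over WebHDFS; the endpoint, paths and function name below are assumptions rather than the actual implementation:

```python
import json

from hdfs import InsecureClient

# Hypothetical WebHDFS endpoint and user; adjust for the actual cluster.
client = InsecureClient("http://namenode:9870", user="hadoop")


def add_to_database(person_id: str, local_image: str, info: dict) -> None:
    """Collate one extracted image and its information into the HDFS database."""
    client.upload(f"/face_db/{person_id}/images/{local_image}", local_image)
    client.write(
        f"/face_db/{person_id}/info.json",
        data=json.dumps(info),
        encoding="utf-8",
        overwrite=True,  # the dynamic database keeps being updated over time
    )
```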
[020] According to an embodiment of the disclosure, the processor 106 further comprises the feature extraction module 114 and the model creation module 116. The feature extraction module 114 is configured to extract a plurality of features of the face of the person. Normally, the image of the person from which the features are extracted should be a high-quality image, as this helps in generating good-quality features. The model creation module 116 is configured to create a model using the extracted plurality of features, wherein the model is created using a set of supervised machine learning techniques. The model identifies the plurality of features as vectors, and a bounding box is created around all possible faces.
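A minimal sketch of these two modules, assuming dlib's standard pre-trained models for 128-dimensional face descriptors and, in line with claim 4, an SVM classifier from scikit-learn as the supervised model (the .dat file names are the standard dlib downloads; the function names are illustrative):

```python
import dlib
import numpy as np
from sklearn.svm import SVC

# Standard pre-trained dlib models (downloadable from dlib.net).
shape_predictor = dlib.shape_predictor("shape_predictor_5_face_landmarks.dat")
face_encoder = dlib.face_recognition_model_v1(
    "dlib_face_recognition_resnet_model_v1.dat"
)


def extract_features(image: np.ndarray, face_rect: dlib.rectangle) -> np.ndarray:
    """Extract a 128-dimensional feature vector for one detected face."""
    landmarks = shape_predictor(image, face_rect)
    return np.array(face_encoder.compute_face_descriptor(image, landmarks))


def create_model(features: np.ndarray, person_labels: list) -> SVC:
    """Create a supervised model over the extracted feature vectors."""
    model = SVC(kernel="linear", probability=True)
    model.fit(features, person_labels)
    return model
```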
[021] According to an embodiment of the disclosure, the processor 106 further comprises the face detection module 118. The face detection module 118 is configured to detect the presence of all the faces in the image using a convolutional neural network (CNN) technique. The face detection module 118 uses a Maximum-Margin Object Detector (MMOD) with CNN-based features. The CNN-based detectors can detect faces at any angle, which makes the module more robust. Once faces are detected, an update or new entry is made for each document and its images in tables in Hive, with the document name (made unique by adding a timestamp) as the link. The original image or document, along with these images, is stored in HDFS with its reference in the Hive table.
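dlib ships a pre-trained MMOD CNN face detector, so one plausible, merely illustrative realization of this detection step, including the timestamped document name, is:

```python
import time

import dlib

# dlib's pre-trained MMOD CNN face detector; mmod_human_face_detector.dat
# is a standard download from dlib.net.
cnn_detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")


def detect_faces(image_path: str) -> list:
    """Detect all faces in the image; MMOD with CNN features handles any angle."""
    image = dlib.load_rgb_image(image_path)
    detections = cnn_detector(image, 1)  # upsample once to catch small faces
    return [d.rect for d in detections]


def unique_document_name(document_name: str) -> str:
    """Make the Hive table link unique by adding a timestamp."""
    return f"{document_name}_{int(time.time())}"
```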
[022] According to an embodiment of the disclosure, the processor 106 further comprises the querying module 120 and the face recognition module 122. The querying module 120 is configured to query all the detected faces as an input request in the exhaustive dynamic database 126. The face recognition module 122 is configured to recognize all the detected faces across the exhaustive dynamic database 126 using the created model.
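A minimal illustrative sketch of the query-and-recognize step, assuming the SVM model from the earlier sketch (probability=True enables predict_proba); the 0.5 confidence cutoff is an assumption, not a value from the disclosure:

```python
import numpy as np
from sklearn.svm import SVC


def recognize_faces(face_descriptors: list, model: SVC,
                    min_confidence: float = 0.5) -> list:
    """Recognize each detected face across the database using the created model."""
    recognized = []
    for descriptor in face_descriptors:
        probabilities = model.predict_proba([descriptor])[0]
        best = int(np.argmax(probabilities))
        if probabilities[best] >= min_confidence:
            recognized.append(model.classes_[best])
        else:
            recognized.append(None)  # face not found in the database
    return recognized
```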
[023] According to an embodiment of the disclosure, the processor 106 further comprises the profile generation module 124. The profile generation module 124 is configured to generate and share a profile page of the person if recognized. The profile page comprises linkages and various incidences of interest and entities of interest of the person. Linkages are built between internal and external data, which gives deeper insight and better analytics to visualize the person, entities of interest, and incidences of interest. It should be appreciated that the profile generation module 124 is configured to generate the profile page of all the faces which were detected in the image. In another embodiment, the profile page might be pre-generated, in which case the profile generation module 124 is configured to populate the profile page of the person.
[024] According to an embodiment of the disclosure, the system 100 can be implemented on the big data environment. Fig. 2 shows a schematic workflow of the system 100 implemented on a Hadoop cluster. As shown in the figure, various interrogation reports, incidences, free text and CDRs are provided as input. After some processing, these documents are stored in the database, also referred to as the landing area, present in the Hadoop Distributed File System (HDFS). These documents are then provided for image processing. After the processing, the output comes to the processing area in the form of .jpg and .log files and is provided to the HBase database management system. The documents in the database can then be searched using the Solr indexes.
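Purely as a sketch of how such a workflow might be wired together, assuming the happybase client for HBase and pysolr for Solr; the hosts, table, column family and field names below are all hypothetical:

```python
import happybase
import pysolr

# Hypothetical hosts, table and Solr core for the Hadoop cluster.
hbase = happybase.Connection("hbase-host")
documents = hbase.table("face_documents")
solr = pysolr.Solr("http://solr-host:8983/solr/face_documents", timeout=10)


def store_processed_document(doc_name: str, hdfs_jpg: str, hdfs_log: str) -> None:
    """Record the .jpg/.log output locations in HBase and index them in Solr."""
    documents.put(
        doc_name.encode("utf-8"),
        {b"cf:jpg_path": hdfs_jpg.encode("utf-8"),
         b"cf:log_path": hdfs_log.encode("utf-8")},
    )
    solr.add([{"id": doc_name, "jpg_path": hdfs_jpg, "log_path": hdfs_log}])


def search_documents(query: str) -> list:
    """Search the indexed documents using the Solr indexes."""
    return list(solr.search(f"id:*{query}*"))
```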
[025] In operation, a flowchart 200 illustrating the method for face detection and recognition of a person in the image in the big data environment is shown in Figs. 3A-3B. Initially, at step 202, all the images of the person are extracted from the plurality of internal sources, such as internal reports, documents, etc. Similarly, at step 204, the information about the person is extracted from the plurality of external sources using the set of web crawling techniques. The use of any web crawling technique is well within the scope of this disclosure.
[026] In the next step 206, the exhaustive dynamic database is created by collating all the extracted images from the internal sources and the extracted information. This database is prepared once and keeps being updated over a period of time. Because of its huge size, the database 126 is hosted on the Hadoop Distributed File System (HDFS). In the next step 208, the plurality of features are extracted from the face of the person. Then, at step 210, the model is created using the extracted plurality of features. The model is created using a set of supervised machine learning techniques.
[027] In the next step 212, the image is provided as the input to detect the presence of the person in the image. At step 214, the presence of all the faces in the image is detected using a convolutional neural network (CNN) technique. Then, at step 216, all the detected faces are queried as the input request in the exhaustive dynamic database 126.
[028] In the next step 218, all the detected faces are recognized across the exhaustive dynamic database 126 using the created model. And finally, the profile page of the person is shared if the person is recognized, wherein the profile page comprises linkages and various incidences of interest and entities of interest of the person. It should be appreciated that the profile page is generated for each of the detected faces and the results are given as the top closest matches to the person.
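Ranking the results as the top closest matches could, for instance, be done by sorting descriptor distances; this is a sketch under the assumptions of the earlier examples, with illustrative names throughout:

```python
import numpy as np


def top_closest_matches(query: np.ndarray,
                        db_descriptors: np.ndarray,
                        db_labels: list,
                        k: int = 5) -> list:
    """Return the top-k closest database matches for one detected face."""
    distances = np.linalg.norm(db_descriptors - query, axis=1)
    order = np.argsort(distances)[:k]  # smallest distance = closest match
    return [(db_labels[i], float(distances[i])) for i in order]
```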
[029] According to an embodiment of the disclosure, the system 100 can also be explained with the help of examples. In an example, the user is interested in knowing about person A. The hierarchical linkage analysis of the person can be generated. The hierarchical analysis will describe all of person A's linked entities, such as his father, incidents he was involved in, his organization, friends, places visited, etc.
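Such a hierarchical linkage analysis could, for instance, be represented as a graph; the sketch below uses networkx with entirely hypothetical entities and relations:

```python
import networkx as nx

# A hypothetical linkage graph for person A.
linkages = nx.Graph()
linkages.add_edge("Person A", "Person B", relation="father")
linkages.add_edge("Person A", "Incident 1", relation="involved_in")
linkages.add_edge("Person A", "Organization X", relation="member_of")
linkages.add_edge("Person A", "Person C", relation="friend")

# Hierarchical linkage analysis: enumerate all entities linked to person A.
for entity in linkages.neighbors("Person A"):
    print(entity, linkages["Person A"][entity]["relation"])
```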
[030] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[031] The embodiments of the present disclosure herein solve the difficulty related to efficient face detection and recognition. The disclosure provides a method and system for face detection and recognition of the person in the image in the big data environment.
[032] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
[033] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various modules described herein may be implemented in other modules or combinations of other modules. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[034] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[035] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[036] It is intended that the disclosure and examples be considered as exemplary only, with a true scope and spirit of disclosed embodiments being indicated by the following claims.

Documents

Application Documents

# Name Date
1 201821045425-STATEMENT OF UNDERTAKING (FORM 3) [30-11-2018(online)].pdf 2018-11-30
2 201821045425-REQUEST FOR EXAMINATION (FORM-18) [30-11-2018(online)].pdf 2018-11-30
3 201821045425-FORM 18 [30-11-2018(online)].pdf 2018-11-30
4 201821045425-FORM 1 [30-11-2018(online)].pdf 2018-11-30
5 201821045425-FIGURE OF ABSTRACT [30-11-2018(online)].jpg 2018-11-30
6 201821045425-DRAWINGS [30-11-2018(online)].pdf 2018-11-30
7 201821045425-DECLARATION OF INVENTORSHIP (FORM 5) [30-11-2018(online)].pdf 2018-11-30
8 201821045425-COMPLETE SPECIFICATION [30-11-2018(online)].pdf 2018-11-30
9 Abstract1.jpg 2019-01-21
10 201821045425-FORM-26 [11-02-2019(online)].pdf 2019-02-11
11 201821045425-Proof of Right (MANDATORY) [15-05-2019(online)].pdf 2019-05-15
12 201821045425-ORIGINAL UR 6(1A) FORM 26-130219.pdf 2019-12-02
13 201821045425-ORIGINAL UR 6(1A) FORM 1-160519.pdf 2020-01-01
14 201821045425-OTHERS [17-06-2021(online)].pdf 2021-06-17
15 201821045425-FER_SER_REPLY [17-06-2021(online)].pdf 2021-06-17
16 201821045425-DRAWING [17-06-2021(online)].pdf 2021-06-17
17 201821045425-COMPLETE SPECIFICATION [17-06-2021(online)].pdf 2021-06-17
18 201821045425-CLAIMS [17-06-2021(online)].pdf 2021-06-17
19 201821045425-FER.pdf 2021-10-18
20 201821045425-US(14)-HearingNotice-(HearingDate-28-12-2023).pdf 2023-11-06
21 201821045425-FORM-26 [29-11-2023(online)].pdf 2023-11-29
22 201821045425-Correspondence to notify the Controller [29-11-2023(online)].pdf 2023-11-29
23 201821045425-Annexure [29-11-2023(online)].pdf 2023-11-29
24 201821045425-Written submissions and relevant documents [03-01-2024(online)].pdf 2024-01-03
25 201821045425-PatentCertificate29-02-2024.pdf 2024-02-29
26 201821045425-IntimationOfGrant29-02-2024.pdf 2024-02-29

Search Strategy

1 2020-11-0616-39-44E_26-11-2020.pdf

ERegister / Renewals

3rd: 12 Mar 2024 (From 30/11/2020 To 30/11/2021)
4th: 12 Mar 2024 (From 30/11/2021 To 30/11/2022)
5th: 12 Mar 2024 (From 30/11/2022 To 30/11/2023)
6th: 12 Mar 2024 (From 30/11/2023 To 30/11/2024)
7th: 23 Oct 2024 (From 30/11/2024 To 30/11/2025)
8th: 29 Oct 2025 (From 30/11/2025 To 30/11/2026)