
A Deep Attention Pyramid Harmonic Network System For Face Image Recognition Using Occluded And Mask Face Images

Abstract: A system and method for face recognition from occluded and masked images are disclosed. Input images are enhanced using contrast limited adaptive histogram equalisation to restore local details lost under masks or poor lighting. Face regions are detected by a detector trained with an adaptive optimisation algorithm integrating dynamic step size and self-adaptive population updates. From the detected region, multiple features, including circular local binary patterns, local symmetric patterns, discrete cosine transform, convolutional neural network features and statistical descriptors, are extracted. These are fused end-to-end within a deep attention pyramid harmonic network comprising a deep high-order attention encoder, a multiple feature pyramid neck and harmonic analysis blocks. This joint spatial-frequency design captures complementary structural and spectral cues for robust recognition. The system outputs the recognised identity with high precision, recall and F-measure, maintaining real-time performance on edge devices and providing a unified pipeline for occluded and masked face recognition.


Patent Information

Application #
Filing Date
23 September 2025
Publication Number
43/2025
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Parent Application

Applicants

SR UNIVERSITY
ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA

Inventors

1. AITHA RAJESH
SR UNIVERSITY ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA
2. RAJCHANDAR K
SR UNIVERSITY ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA

Specification

Description:
FIELD OF THE INVENTION
This invention relates to computer vision and biometrics. More particularly, it concerns a system and method for face image recognition from occluded and masked faces using a deep attention pyramid harmonic network that integrates adaptive detection, multimodal feature fusion, and harmonic-enhanced attention pyramids for robust and accurate recognition under partial occlusion.
BACKGROUND OF THE INVENTION
Face image recognition using occluded and masked face images aims to recognize individuals even when sunglasses, masks, or other obstructions partially obscure their faces. This is crucial in real-world scenarios, such as the COVID-19 pandemic or crowded environments, enhancing surveillance, contactless identification, and security. However, existing techniques struggle with reduced accuracy, limited feature visibility, and varying conditions. This research introduces a novel approach, called the Deep Attention Pyramid Harmonic Network (DAPH-Net), to tackle these challenges. The process starts with enhancing the input occluded and masked face image using Contrast Limited Adaptive Histogram Equalization (CLAHE). Further, the You Only Look Once face detector YOLO-FaceV2, optimized by the proposed Adaptive Superb Fairy Optimization (Ada-SFO), is employed for face detection. Here, Ada-SFO is introduced by merging the Superb Fairy-wren Optimization Algorithm (SFOA) and the adaptive concept. Various features are extracted from the detected image utilizing Circular Local Binary Pattern (CLBP), Local Symmetric Pattern (LSP), Discrete Cosine Transform (DCT), Convolutional Neural Network (CNN), and statistical methods. The final recognition is performed employing the introduced DAPH-Net, which combines the Deep High-order Attention Neural Network (DHA-Net), Multiple Feature Pyramid Network (MFPN), and Harmonic Analysis. The introduced DAPH-Net recorded a recall of 95.988%, an F-measure of 94.977%, and a precision of 93.987%.
US20220067521: Methods and systems for enhancing a neural network include detecting an occlusion in an input image using a trained occlusion detection neural network. The detected occlusion is replaced in the input image with a neutral occlusion to prevent the detected occlusion from frustrating facial recognition to generate a modified input image. Facial recognition is performed on the modified input image using a trained facial recognition neural network.
US10984225: Embodiments of the present disclosure provide systems and methods for recognizing a masked face. According to the present disclosure, the disclosed systems and methods include features that provide augmentation of existing face recognition databases, real-time mask detection, and real-time masked face recognition. In embodiments, masked face recognition includes a multi-layered approach, which includes finding matching simulated masked faces in the database that match the masked face being analyzed, comparing the unmasked portion of the masked face to stored unmasked faces in a database to identify any matches, and executing face restoration algorithms in which the masked portion is reconstructed to generate an unmasked representation which may then be matched against unmasked faces in the database.
Existing face recognition systems perform poorly when faces are partially covered by masks, sunglasses, or other occlusions, which limit feature visibility and degrade recognition accuracy. Current solutions either rely on large backbones with high latency, treat occlusion as noise, or fuse handcrafted and deep features only at score level. This leads to reduced precision, recall, and generalisation. The present invention addresses these shortcomings by introducing a pipeline combining contrast restoration, an adaptively optimised face detector, multimodal feature extraction, and a deep attention pyramid harmonic network that fuses handcrafted and learned descriptors inside the network, yielding improved accuracy, speed, and robustness on occluded and masked face datasets.
SUMMARY OF THE INVENTION
This summary is provided to introduce a selection of concepts, in a simplified format, that are further described in the detailed description of the invention.
This summary is neither intended to identify key or essential inventive concepts of the invention, nor is it intended for determining the scope of the invention.
The invention provides a complete pipeline for face recognition under occlusion. Input occluded or masked face images are enhanced using contrast limited adaptive histogram equalisation (CLAHE) to restore local details lost under masks or poor lighting. Face detection is performed by a detector trained with an adaptive superb fairy optimisation (Ada-SFO) metaheuristic, accelerating convergence and improving detection accuracy on occluded faces.
From the detected face image, multimodal features are extracted, including circular local binary patterns (CLBP), local symmetric patterns (LSP), discrete cosine transform (DCT), convolutional neural network (CNN) features, and statistical descriptors. These features are fused end-to-end within a deep attention pyramid harmonic network (DAPH-Net).
DAPH-Net integrates a deep high-order attention network (DHA-Net), a multiple feature pyramid network (MFPN), and harmonic analysis blocks, embedding discrete harmonic filters inside an attention encoder and a multi-frequency pyramid neck. This joint spatial-frequency design captures complementary structural and spectral cues, enabling accurate face recognition even with severe occlusion.
By combining these components into one trainable graph, the system achieves high recall, precision and F-measure on occluded and masked face datasets while maintaining real-time throughput on edge devices.
To further clarify the advantages and features of the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail with the accompanying drawings.
The major intention of this work is to devise a novel technique termed DAPH-Net for face recognition from occluded/masked images. Initially, the occluded and masked face image is accumulated from the dataset and subjected to image enhancement using CLAHE [15]. Afterward, face detection is performed using YOLO-FaceV2 [5], where the detector is trained with the proposed Ada-SFO, formulated by integrating the SFOA [17] and the adaptive concept. Next, features such as CLBP [18], LSP [19], CNN [20], DCT [21], and statistical features are extracted from the detected face image. Finally, the face is recognized using the proposed DAPH-Net, which is developed through the integration of DHA-Net [22], MFPN [23], and harmonic analysis [29].
BRIEF DESCRIPTION OF THE DRAWINGS
The illustrated embodiments of the subject matter will be understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of devices, systems, and methods that are consistent with the subject matter as claimed herein, wherein:
FIGURE 1: SYSTEM ARCHITECTURE
The figures depict embodiments of the present subject matter for the purposes of illustration only. A person skilled in the art will easily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the disclosure described herein.
DETAILED DESCRIPTION OF THE INVENTION
The detailed description of various exemplary embodiments of the disclosure is described herein with reference to the accompanying drawings. It should be noted that the embodiments are described herein in such details as to clearly communicate the disclosure. However, the amount of details provided herein is not intended to limit the anticipated variations of embodiments; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the scope of the present disclosure as defined by the appended claims.
It is also to be understood that various arrangements may be devised that, although not explicitly described or shown herein, embody the principles of the present disclosure. Moreover, all statements herein reciting principles, aspects, and embodiments of the present disclosure, as well as specific examples, are intended to encompass equivalents thereof.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used herein, specify the presence of stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
It should also be noted that in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two steps shown in succession may, in fact, be executed concurrently or may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
In addition, the descriptions of "first", "second", “third”, and the like in the present invention are used for the purpose of description only, and are not to be construed as indicating or implying their relative importance or implicitly indicating the number of technical features indicated. Thus, features defining "first" and "second" may include at least one of the features, either explicitly or implicitly.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which example embodiments belong. It will be further understood that terms, e.g., those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The invention starts with input acquisition of occluded or masked face images from a dataset or real-time camera feed.
An image enhancement module applies contrast limited adaptive histogram equalisation to improve local contrast and recover features lost under masks or poor lighting.
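The enhancement step can be illustrated with a simplified, numpy-only sketch of per-tile clipped histogram equalisation. Full CLAHE additionally interpolates bilinearly between neighbouring tile mappings to avoid block artifacts; the clip limit and tile grid below are illustrative defaults, not values taken from the specification.

```python
import numpy as np

def clahe_simplified(gray, clip_limit=40, tiles=(4, 4)):
    """Simplified CLAHE on a uint8 grayscale image.

    Per-tile clipped histogram equalisation, assuming the image
    dimensions divide evenly by the tile grid. The bilinear blending
    of full CLAHE is omitted, so this is a teaching sketch only.
    """
    h, w = gray.shape
    th, tw = h // tiles[0], w // tiles[1]
    out = np.empty_like(gray)
    for i in range(tiles[0]):
        for j in range(tiles[1]):
            tile = gray[i*th:(i+1)*th, j*tw:(j+1)*tw]
            hist = np.bincount(tile.ravel(), minlength=256)
            # Clip the histogram and redistribute the excess uniformly,
            # which limits contrast amplification in flat regions.
            excess = np.maximum(hist - clip_limit, 0).sum()
            hist = np.minimum(hist, clip_limit) + excess // 256
            cdf = np.cumsum(hist).astype(np.float64)
            cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1) * 255
            out[i*th:(i+1)*th, j*tw:(j+1)*tw] = cdf[tile].astype(np.uint8)
    return out

# A flat, low-contrast gradient gains dynamic range after equalisation.
img = np.tile(np.linspace(100, 120, 64, dtype=np.uint8), (64, 1))
enhanced = clahe_simplified(img)
```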
A face detection module identifies the face region using a detector trained with an adaptive superb fairy optimisation metaheuristic. This training method integrates dynamic step size and self-adaptive population update rules to accelerate convergence and improve detection accuracy compared to conventional optimisers.
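The record does not give the Ada-SFO update equations, so the following is a generic population-based sketch showing only the two named ingredients: a dynamic step size that decays over iterations, and a self-adaptive (greedy) population update that accepts a move only when it improves fitness. It is an illustrative stand-in, not the claimed algorithm.

```python
import numpy as np

def adaptive_optimise(fitness, dim, pop_size=20, iters=100, seed=0):
    """Population metaheuristic sketch with dynamic step size and
    self-adaptive acceptance; minimises `fitness` over R^dim."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5, 5, size=(pop_size, dim))
    scores = np.array([fitness(x) for x in pop])
    for t in range(iters):
        step = 1.0 * (1 - t / iters)  # dynamic step size: shrinks over time
        best = pop[scores.argmin()]
        # Move each candidate toward the current best, plus random exploration.
        trial = pop + step * (best - pop) + step * rng.normal(size=pop.shape)
        trial_scores = np.array([fitness(x) for x in trial])
        improved = trial_scores < scores  # self-adaptive: keep improvements only
        pop[improved] = trial[improved]
        scores[improved] = trial_scores[improved]
    return pop[scores.argmin()], scores.min()

# Minimise the sphere function as a toy stand-in for a detector loss.
x_best, f_best = adaptive_optimise(lambda x: float(np.sum(x**2)), dim=3)
```

In the invention this role is played by Ada-SFO training the detector weights; the sphere function above merely stands in for the detection loss.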
Once the face region is detected, a feature extraction module computes multiple complementary descriptors: circular local binary patterns capture local texture, local symmetric patterns capture structural symmetry, discrete cosine transform captures frequency information, convolutional neural network features capture high-level representations, and statistical features provide additional quantitative measures.
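Two of the five feature branches, the DCT low frequencies and the statistical descriptors, can be sketched as follows; in a full implementation the CLBP, LSP, and CNN embeddings would be concatenated into the same vector.

```python
import numpy as np

def dct2(block):
    """2-D DCT-II of a square block, built from the 1-D cosine basis."""
    n = block.shape[0]
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    basis = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    return basis @ block @ basis.T

def extract_features(face, dct_keep=8):
    """Concatenate DCT low frequencies with statistical descriptors.

    Only two of the specification's five branches are sketched here.
    """
    face = face.astype(np.float64)
    coeffs = dct2(face)[:dct_keep, :dct_keep].ravel()  # low-frequency block
    stats = np.array([face.mean(), face.std(), face.min(), face.max()])
    return np.concatenate([coeffs, stats])

feat = extract_features(np.random.default_rng(0).random((32, 32)))
```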
These features are passed into the deep attention pyramid harmonic network for recognition. Inside this network, a deep high-order attention encoder directs focus to salient regions of the face, while a multiple feature pyramid neck aggregates multi-scale features from different layers.
Harmonic analysis blocks are embedded into the attention pyramid, applying discrete harmonic filters to extract spectral cues complementing spatial patterns. This combination enables the network to differentiate between occluded and non-occluded regions and to leverage frequency information to compensate for missing spatial data.
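A toy version of such a block, with a fixed low/high-frequency split and an energy-based spatial gate standing in for learned harmonic filters and attention heads, might look like this:

```python
import numpy as np

def harmonic_attention_block(feature_map, cutoff=0.25):
    """Split a feature map into low- and high-frequency parts with a
    single FFT, then reweight the high-frequency detail by a spatial
    gate derived from its local energy. The network's actual harmonic
    filters and attention weights are learned; this fixed split only
    illustrates the joint spatial-frequency idea."""
    h, w = feature_map.shape
    spectrum = np.fft.fft2(feature_map)
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    low_mask = (np.abs(fy) < cutoff) & (np.abs(fx) < cutoff)
    low = np.fft.ifft2(spectrum * low_mask).real    # structural (coarse) cue
    high = np.fft.ifft2(spectrum * ~low_mask).real  # spectral (detail) cue
    gate = 1 / (1 + np.exp(-np.abs(high)))          # sigmoid energy gate
    return low + gate * high

out = harmonic_attention_block(np.random.default_rng(1).random((16, 16)))
```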
End-to-end fusion of handcrafted and learned descriptors occurs inside the network rather than at score level, allowing backpropagation to learn optimal weighting between classic texture/shape evidence and deep embeddings.
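As a caricature of that learned weighting, a tiny logistic model over concatenated handcrafted and deep features lets gradient descent assign each family its own weights. This illustrates feature-level (rather than score-level) fusion only; it is not the DAPH-Net training procedure.

```python
import numpy as np

def train_fusion_weights(handcrafted, deep, labels, lr=0.1, epochs=200):
    """Logistic model over concatenated handcrafted + deep features.

    Gradient descent learns one weight per dimension, so the relative
    importance of the two feature families is learned rather than
    fixed in advance at score level."""
    x = np.hstack([handcrafted, deep])
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-(x @ w + b)))        # sigmoid predictions
        grad = x.T @ (p - labels) / len(labels)   # gradient of log loss
        w -= lr * grad
        b -= lr * np.mean(p - labels)
    return w, b

# Toy data: the deep feature is informative, the handcrafted ones are noise,
# so training should concentrate weight on the last (deep) dimension.
rng = np.random.default_rng(2)
labels = (rng.random(200) > 0.5).astype(float)
deep = labels[:, None] + 0.1 * rng.normal(size=(200, 1))
hand = rng.normal(size=(200, 3))
w, b = train_fusion_weights(hand, deep, labels)
```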
The system outputs the recognised identity or a similarity score. It also records metrics such as recall, precision, and F-measure for performance evaluation.
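The reported metrics follow the standard definitions, with the F-measure being the harmonic mean of precision and recall:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F-measure from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for illustration only.
p, r, f = precision_recall_f1(tp=90, fp=10, fn=5)
# p = 0.9, r ≈ 0.947, f ≈ 0.923
```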
Lightweight design choices, such as using a multi-frequency pyramid neck and single FFT harmonic kernels, maintain real-time throughput on edge devices without sacrificing accuracy.
This architecture is modular and can be adapted to other partially visible biometric modalities by retraining on appropriate data.
Security and privacy measures ensure that all face data are handled and stored securely in compliance with data protection standards.
Extensive experiments show that harmonic blocks contribute significant improvements over pure attention pyramids, Ada-SFO boosts detector accuracy, and feature-level fusion outperforms deep-only baselines on multiple occluded face datasets.
By integrating all components—contrast restoration, adaptive detection, multimodal feature fusion, and harmonic-enhanced recognition—the invention provides a robust, efficient solution for occluded and masked face recognition.
The preferred embodiment deploys the system as a software module running on an edge GPU or embedded device. Input occluded or masked face images are enhanced using contrast limited adaptive histogram equalisation. The face detection module, trained with adaptive superb fairy optimisation, identifies the face region. Multimodal features are extracted and fed into the deep attention pyramid harmonic network, which integrates attention, pyramid, and harmonic analysis components to produce the recognition result. This configuration achieves high accuracy under severe occlusion while maintaining real-time processing speed.
Harmonic-enhanced attention pyramid architecture (DAPH-Net).
• Unlike prior attention-based or pyramid-based face recognisers, DAPH-Net embeds discrete harmonic filters inside a Deep High-order Attention (DHA) encoder and a lightweight Multiple Feature Pyramid Network (MFPN) neck. This joint spatial-frequency design extracts complementary structural (pyramid) and spectral (harmonic) cues, making it the first network to fuse harmonic analysis with modern attention pyramids for occlusion-robust face recognition.
End-to-end fusion of handcrafted and learned descriptors.
• The framework concatenates CLBP, LSP, DCT, and low-order statistics with deep CNN embeddings inside the network, not merely at score level, so that DAPH-Net can learn how to weight classic texture/shape evidence against learned features during backpropagation. This tightly coupled, multimodal feature fusion has not been reported for masked face pipelines.
Adaptive Superb Fairy Optimisation (Ada-SFO) for detector training.
• YOLO-FaceV2 is trained with a new Ada-SFO metaheuristic that augments the original Superb Fairy-wren Optimisation with dynamic step size and self-adaptive population update rules. Compared with standard SGD, Adam, or even recent swarm optimisers, Ada-SFO accelerates convergence and consistently yields higher AP on occluded face benchmarks, representing the first application of SFO variants to face detector weight optimisation.
Unified pipeline that addresses the full masked/occluded face problem.
• The study integrates: (i) CLAHE-based contrast restoration to recover local details lost under masks or poor lighting, (ii) Ada-SFO-optimised YOLO-FaceV2 to isolate partially visible faces, (iii) multimodal feature fusion, and (iv) harmonic-augmented recognition, all in one trainable graph. Earlier works tackle only one or two of these stages in isolation.
Occlusion resilience without accuracy-latency trade-offs.
• By replacing heavyweight backbones with the MFPN and harmonic blocks (single-FFT kernels), the system maintains real-time throughput on edge GPUs and Jetson-class devices while sustaining accuracy under severe occlusion (more than 60% of the facial area). This contrasts with current SOTA masked face models that require larger backbones or cloud inference.
Comprehensive experimental evidence.
• Ablation studies show (a) harmonic blocks contribute up to +3.7% F1 over a pure attention pyramid, (b) Ada-SFO boosts YOLO-FaceV2 AP by +2.5 to +4.2% versus Adam, and (c) feature-level fusion of handcrafted and deep descriptors yields +2.1% over deep-only baselines on MAFA, RMFD, and custom low-light mask datasets. No previous work reports such systematic gains across all three components.
Claims:
1. A system for recognising faces from occluded and masked images comprising:
an image enhancement module configured to apply contrast limited adaptive histogram equalisation to input images;
a face detection module trained with an adaptive optimisation algorithm configured to detect face regions from the enhanced images;
a feature extraction module configured to compute circular local binary patterns, local symmetric patterns, discrete cosine transform features, convolutional neural network features and statistical descriptors from the detected face region;
a deep attention pyramid harmonic network comprising a deep high-order attention encoder, a multiple feature pyramid neck and harmonic analysis blocks configured to fuse the extracted features and recognise the face; and
an output module configured to present the recognition result and performance metrics to a user.
2. The system as claimed in claim 1, wherein the image enhancement module improves local contrast to recover details lost under masks or poor lighting.
3. The system as claimed in claim 1, wherein the face detection module uses an adaptive metaheuristic to accelerate convergence and improve detection accuracy on occluded faces.
4. The system as claimed in claim 1, wherein the harmonic analysis blocks apply discrete harmonic filters to extract spectral cues complementing spatial patterns.
5. The system as claimed in claim 1, wherein the deep attention pyramid harmonic network fuses handcrafted and learned descriptors end-to-end inside the network for improved weighting.
6. A method for recognising faces from occluded and masked images comprising:
applying contrast limited adaptive histogram equalisation to input occluded or masked face images;
detecting face regions from the enhanced images using a detector trained with an adaptive optimisation algorithm;
extracting circular local binary patterns, local symmetric patterns, discrete cosine transform features, convolutional neural network features and statistical descriptors from the detected face region;
processing the extracted features through a deep attention pyramid harmonic network comprising a deep high-order attention encoder, a multiple feature pyramid neck and harmonic analysis blocks to fuse features and recognise the face; and
outputting the recognition result and performance metrics to a user.
7. The method as claimed in claim 6, wherein the adaptive optimisation algorithm integrates dynamic step size and self-adaptive population update rules to improve detector training.
8. The method as claimed in claim 6, wherein harmonic analysis blocks embedded in the network extract frequency information to compensate for missing spatial data due to occlusion.
9. The method as claimed in claim 6, wherein handcrafted and learned descriptors are fused end-to-end inside the network to allow optimal weighting during backpropagation.
10. The method as claimed in claim 6, wherein the system maintains real-time throughput on edge devices while sustaining high accuracy under severe occlusion.

Documents

Application Documents

# Name Date
1 202541090703-STATEMENT OF UNDERTAKING (FORM 3) [23-09-2025(online)].pdf 2025-09-23
2 202541090703-REQUEST FOR EARLY PUBLICATION(FORM-9) [23-09-2025(online)].pdf 2025-09-23
3 202541090703-POWER OF AUTHORITY [23-09-2025(online)].pdf 2025-09-23
4 202541090703-FORM-9 [23-09-2025(online)].pdf 2025-09-23
5 202541090703-FORM FOR SMALL ENTITY(FORM-28) [23-09-2025(online)].pdf 2025-09-23
6 202541090703-FORM 1 [23-09-2025(online)].pdf 2025-09-23
7 202541090703-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [23-09-2025(online)].pdf 2025-09-23
8 202541090703-EVIDENCE FOR REGISTRATION UNDER SSI [23-09-2025(online)].pdf 2025-09-23
9 202541090703-EDUCATIONAL INSTITUTION(S) [23-09-2025(online)].pdf 2025-09-23
10 202541090703-DRAWINGS [23-09-2025(online)].pdf 2025-09-23
11 202541090703-DECLARATION OF INVENTORSHIP (FORM 5) [23-09-2025(online)].pdf 2025-09-23
12 202541090703-COMPLETE SPECIFICATION [23-09-2025(online)].pdf 2025-09-23