
Systems And Methods For Classifying Images For Privacy With Interpretable Textual Explanation

Abstract: Due to increased social media activity of users, there is a high proliferation of image sharing; it is becoming increasingly difficult for users to maintain their privacy, and the security of their sensitive data is a major concern. The system of the present disclosure implements an explainable privacy prediction model, a face recognition model, and a feature generator model for obtaining class(es) for an input image, generating a first output including face identification in the image based on facial embeddings, and generating a second output including a feature embedding of the input image for determining any similarity therein. The class(es), the first output and the second output are then used for classifying the input image as a specific image type with an associated interpretable textual explanation (AITE) for the classified image. Feedback is obtained for the classified image and the AITE for updating corresponding databases and retraining of the system. [To be published with FIG. 2A]


Patent Information

Application #:
Filing Date: 29 October 2021
Publication Number: 18/2023
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Parent Application:

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th Floor, Nariman Point Mumbai Maharashtra India 400021

Inventors

1. DAMMU, Preetam Prabhu Srikar
Tata Consultancy Services Limited GS2-60, Deccan Park, Plot No 1, Survey No. 64/2, Software Units Layout , Serilingampally Mandal, Madhapur, Hyderabad Telangana India 500081
2. SINGH, Ajeet Kumar
Tata Consultancy Services Limited Tata Research Development & Design Centre, Cubicle 271, 54-B, Hadapsar Industrial Estate, Hadapsar, Pune Maharashtra India 411013
3. CHALAMALA, Srinivasa Rao
Tata Consultancy Services Limited GS2-55, Deccan Park, Plot No 1, Survey No. 64/2, Software Units Layout , Serilingampally Mandal, Madhapur, Hyderabad Telangana India 500081

Specification

FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)
Title of invention:
SYSTEMS AND METHODS FOR CLASSIFYING IMAGES FOR PRIVACY WITH INTERPRETABLE TEXTUAL EXPLANATION
Applicant
Tata Consultancy Services Limited A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
Preamble to the description
The following specification particularly describes the invention and the manner in which it is to be performed.

TECHNICAL FIELD [001] The disclosure herein generally relates to privacy prediction systems, and, more particularly, to systems and methods for classifying images for privacy with interpretable textual explanation.
BACKGROUND [002] Due to increased social media activity of users, there is a high proliferation of image sharing; it is becoming increasingly difficult for users to maintain their privacy, and the security of their sensitive data is a major concern. Most users share their information, such as images, on various platforms assuming that it is utilized only for the intended purpose. But, in reality, there is a significant risk of the images getting exposed to wrong hands and eventually being used for malignant purposes without the users’ knowledge. It is therefore imperative to devise effective techniques to prevent such events and notify users to review content before any sensitive information is shared. Conventionally, several methods have been proposed to execute this task, yet most have shortcomings that might make them unsuitable for end-users.
SUMMARY [003] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one aspect, there is provided a processor implemented method for classifying images for privacy with interpretable textual explanation. The method comprises obtaining, via one or more hardware processors, an image as an input from a user; transmitting, via the one or more hardware processors, the obtained image to (i) an explainable privacy prediction model, (ii) a face recognition model, and (iii) a feature generating model; classifying, by using the explainable privacy prediction model via the one or more hardware processors, the obtained image to obtain one or more classes of the obtained image; detecting, by using the face recognition model via the one or more hardware processors, one or more faces of one or more corresponding users

in the obtained image and generating one or more associated facial embeddings of the one or more corresponding users; performing a first comparison of the one or more associated facial embeddings with one or more facial embeddings comprised in a facial embedding database to obtain a first output; generating, by using the feature generating model via the one or more hardware processors, a feature embedding for the obtained image; performing a second comparison of the feature embedding with one or more feature embeddings comprised in one or more feature embedding databases to obtain a second output; and classifying the obtained image based on the one or more classes of the obtained image, the first output, and the second output to obtain (i) a classified image, and (ii) an associated interpretable textual explanation for the classified image.
[004] In an embodiment, the one or more classes of the obtained image is at least one of a public image, and a private image.
[005] In an embodiment, the one or more feature embedding databases comprises at least one of a private image embedding database, and a public image embedding database.
[006] In an embodiment, the step of performing the first comparison of the one or more associated facial embeddings with one or more facial embeddings comprised in the facial embedding database to obtain the first output comprises determining the one or more faces of one or more corresponding users as one of (i) a face of the user, (ii) a face of personal acquaintance of the user, or (iii) a face of an unknown user.
[007] In an embodiment, the one or more feature embedding databases further comprises one or more misclassified instances of feature embeddings of one or more image types.
[008] In an embodiment, the method further comprises receiving user feedback on (i) the classified image, and (ii) the associated interpretable textual explanation for the classified image; performing one of: outputting (i) the classified image, and (ii) the associated interpretable textual explanation for the classified image based on the user feedback; or re-classifying the classified image and modifying the associated interpretable textual explanation for the classified image.

[009] In an embodiment, the method further comprises updating the one or more feature embedding databases based on the user feedback.
[010] In an embodiment, the method further comprises retraining the explainable privacy prediction model based on a comparison of (i) number of classified images being re-classified and (ii) a misclassification threshold.
[011] In an embodiment, the misclassification threshold is one of an empirically determined threshold, or a pre-defined misclassification threshold.
[012] In another aspect, there is provided a processor implemented system for classifying images for privacy with interpretable textual explanation. The system comprises: a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: obtain an image as an input from a user; transmit the obtained image to (i) an explainable privacy prediction model, (ii) a face recognition model, and (iii) a feature generating model; classify, by using the explainable privacy prediction model, the obtained image to obtain one or more classes of the obtained image; detect, by using the face recognition model via the one or more hardware processors, one or more faces of one or more corresponding users in the obtained image and generate one or more associated facial embeddings of the one or more corresponding users; perform a first comparison of the one or more associated facial embeddings with one or more facial embeddings comprised in a facial embedding database to obtain a first output; generate, by using the feature generating model via the one or more hardware processors, a feature embedding for the obtained image; perform a second comparison of the feature embedding with one or more feature embeddings comprised in one or more feature embedding databases to obtain a second output; and classify the obtained image based on the one or more classes of the obtained image, the first output, and the second output to obtain (i) a classified image, and (ii) an associated interpretable textual explanation for the classified image.
[013] In an embodiment, the one or more classes of the obtained image is at least one of a public image, and a private image.

[014] In an embodiment, the one or more feature embedding databases comprises at least one of a private image embedding database, and a public image embedding database.
[015] In an embodiment, the first comparison of the one or more associated facial embeddings with one or more facial embeddings comprised in the facial embedding database performed to obtain the first output comprises determining the one or more faces of one or more corresponding users as one of (i) a face of the user, (ii) a face of personal acquaintance of the user, or (iii) a face of an unknown user.
[016] In an embodiment, the one or more feature embedding databases further comprises one or more misclassified instances of feature embeddings of one or more image types.
[017] In an embodiment, the one or more hardware processors are further configured by the instructions to receive user feedback on (i) the classified image, and (ii) the associated interpretable textual explanation for the classified image; perform one of: outputting (i) the classified image, and (ii) the associated interpretable textual explanation for the classified image based on the user feedback; or re-classifying the classified image and modifying the associated interpretable textual explanation for the classified image.
[018] In an embodiment, the one or more hardware processors are further configured by the instructions to update the one or more feature embedding databases based on the user feedback.
[019] In an embodiment, the one or more hardware processors are further configured by the instructions to retrain the explainable privacy prediction model based on a comparison of (i) number of classified images being re-classified and (ii) a misclassification threshold.
[020] In an embodiment, the misclassification threshold is one of an empirically determined threshold, or a pre-defined misclassification threshold.
[021] In yet another aspect, there are provided one or more non-transitory machine-readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors cause a

method for classifying images for privacy with interpretable textual explanation. The method comprises obtaining, via the one or more hardware processors, an image as an input from a user; transmitting, via the one or more hardware processors, the obtained image to (i) an explainable privacy prediction model, (ii) a face recognition model, and (iii) a feature generating model; classifying, by using the explainable privacy prediction model via the one or more hardware processors, the obtained image to obtain one or more classes of the obtained image; detecting, by using the face recognition model via the one or more hardware processors, one or more faces of one or more corresponding users in the obtained image and generating one or more associated facial embeddings of the one or more corresponding users; performing a first comparison of the one or more associated facial embeddings with one or more facial embeddings comprised in a facial embedding database to obtain a first output; generating, by using the feature generating model via the one or more hardware processors, a feature embedding for the obtained image; performing a second comparison of the feature embedding with one or more feature embeddings comprised in one or more feature embedding databases to obtain a second output; and classifying the obtained image based on the one or more classes of the obtained image, the first output, and the second output to obtain (i) a classified image, and (ii) an associated interpretable textual explanation for the classified image.
[022] In an embodiment, the one or more classes of the obtained image is at least one of a public image, and a private image.
[023] In an embodiment, the one or more feature embedding databases comprises at least one of a private image embedding database, and a public image embedding database.
[024] In an embodiment, the step of performing the first comparison of the one or more associated facial embeddings with one or more facial embeddings comprised in the facial embedding database to obtain the first output comprises determining the one or more faces of one or more corresponding users as one of (i) a face of the user, (ii) a face of personal acquaintance of the user, or (iii) a face of an unknown user.

[025] In an embodiment, the one or more feature embedding databases further comprises one or more misclassified instances of feature embeddings of one or more image types.
[026] In an embodiment, the method further comprises receiving user feedback on (i) the classified image, and (ii) the associated interpretable textual explanation for the classified image; performing one of: outputting (i) the classified image, and (ii) the associated interpretable textual explanation for the classified image based on the user feedback; or re-classifying the classified image and modifying the associated interpretable textual explanation for the classified image.
[027] In an embodiment, the method further comprises updating the one or more feature embedding databases based on the user feedback.
[028] In an embodiment, the method further comprises retraining the explainable privacy prediction model based on a comparison of (i) number of classified images being re-classified and (ii) a misclassification threshold.
[029] In an embodiment, the misclassification threshold is one of an empirically determined threshold, or a pre-defined misclassification threshold.
[030] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS [031] The accompanying drawings, which are incorporated in and
constitute a part of this disclosure, illustrate exemplary embodiments and, together
with the description, serve to explain the disclosed principles:
[032] FIG. 1 depicts an exemplary system for classifying images for
privacy with interpretable textual explanation, in accordance with an embodiment
of the present disclosure.
[033] FIGS. 2A and 2B depict an exemplary high level block diagram of the system of FIG. 1 for classifying images for privacy with interpretable textual explanation, in accordance with an embodiment of the present disclosure.

[034] FIG. 3 depicts an exemplary flow chart illustrating a method for classifying images for privacy with interpretable textual explanation, using the systems of FIGS. 1-2B, in accordance with an embodiment of the present disclosure.
[035] FIGS. 4A through 4C depict classified images with an associated interpretable textual explanation, in accordance with an embodiment of the present disclosure.
[036] FIGS. 5A through 5D depict classified images with an associated interpretable textual explanation based on user feedback, in accordance with an embodiment of the present disclosure.
[037] FIGS. 6A through 6D depict classified images with an associated interpretable textual explanation based on user feedback, in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS [038] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
[039] In today’s highly connected world, any content shared on the internet without appropriate privacy settings can adversely affect the parties involved (e.g., individuals to entities such as corporations, small organizations, etc.). Studies have shown serious concerns over mismanaged personal content, and it is observed that most users fail to take appropriate measures and do not protect the privacy of their data, either due to lack of awareness or difficulties in managing privacy settings. Even users who proactively manage their privacy settings are still at risk due to inaccurate representation of their judgement.

[040] Since privacy preferences of individuals evolve over time and usually depend on location, culture, demography, etc., such preferences can only be incorporated into an appropriate system through user preferences or feedback collected at regular intervals. As per regulations and user policies, individuals and entities who are involved in processing and publishing data must take consent from the data owner (which could be implicit or explicit). Taking explicit consent along with individual preferences for privacy can reduce the chances of information leakage of sensitive content.
[041] With the above limitations and constraints, the privacy prediction system must also have the ability/facility to autocorrect any mishandling of personal data and overcome the inability to understand privacy nuances.
[042] It is understood that privacy is highly subjective in nature. For instance, an image can be construed as private by one user, while another user may tag it as a public image. Manual annotation of images is performed based on users’ perception, and the majority opinion is accordingly used as the training label.
[043] Therefore, even if a prediction model achieves 100% accuracy on the training data, there could be stakeholders who may not agree with the model’s prediction. Hence, a level of personalization of the models is needed. However, personalizing models poses significant challenges due to the requirement of large training data and resources for training and deployment, and the difficulties in accommodating uncertain changes in preferences.
[044] Embodiments of the present disclosure provide a system and method that address these problems yet require a minimal amount of additional data and computational resources. Additionally, the system and method of the present disclosure keep up with changing user requirements and update their behavior to reflect appropriate preferences.
[045] In most scenarios, users are not aware of the privacy implications of images, which leads to a difference between the user’s judgement and the predicted label. It is very common for users to overlook minute details that might leak private information (e.g., personal space and sensitive content such as computer screens or confidential documents might accidentally be captured in the photograph). Also, such instances may affect other people who are present in a picture when it is shared on various platforms. Most of the time, users may assume that the prediction is erroneous unless an explanation is provided along with the prediction, bringing the unnoticed details to the users’ attention. It is therefore imperative to provide explainability of privacy predictions to convince users about an apparent privacy violation. Otherwise, users may simply ignore the suggested privacy status due to a lack of confidence in the automated prediction, thus posing a huge risk. In this regard, embodiments of the present disclosure provide a system and method to generate an explanation along with each prediction. More specifically, the systems and methods described herein introduce multiple important characteristics to the privacy prediction process while still achieving state-of-the-art performance. The following are the challenges addressed by the system and method of the present disclosure:
1. Personalization realized by incorporating user feedback into the models.
2. Explanation for each prediction in real-time.
3. Configurability by allowing modifications to the system composition.
4. Customizable privacy settings for enforcing elevated security constraints in user-specified conditions.
5. Instantaneous updating of the user’s privacy preferences.
[046] The system of the present disclosure implements Modular Neural Networks (MNNs), which are biologically inspired by the modularization found naturally in the human brain. When humans assess the sensitivity of the content present in an image, several aspects are considered that pose relevant questions to help them arrive at a logical conclusion. This is implemented by the system wherein a model is trained to capture patterns applicable to the generic population, while providing individual users with the flexibility to modify the prediction system for better suitability in terms of perspectives, requirements, and personality traits.
[047] Referring now to the drawings, and more particularly to FIGS. 1 through 6D, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and

these embodiments are described in the context of the following exemplary system and/or method.
[048] FIG. 1 depicts an exemplary system 100 for classifying images for privacy with interpretable textual explanation, in accordance with an embodiment of the present disclosure. The system 100 may also be referred as explainable privacy prediction system and interchangeably used herein. In an embodiment, the system 100 includes one or more hardware processors 104, communication interface device(s) or input/output (I/O) interface(s) 106 (also referred as interface(s)), and one or more data storage devices or memory 102 operatively coupled to the one or more hardware processors 104. The one or more processors 104 may be one or more software processing components and/or hardware processors. In an embodiment, the hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) is/are configured to fetch and execute computer-readable instructions stored in the memory. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices (e.g., smartphones, tablet phones, mobile communication devices, and the like), workstations, mainframe computers, servers, a network cloud, and the like.
[049] The I/O interface device(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface device(s) can include one or more ports for connecting a number of devices to one another or to another server.
[050] The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic-random access memory (DRAM), and/or

non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, a database 108 is comprised in the memory 102, wherein the database 108 comprises images of users, or various scenes that are to be classified (or are classified). The database 108 further comprises one or more classes corresponding to various images, one or more faces of one or more corresponding users in obtained image, one or more associated facial embeddings of the one or more corresponding users, feature embedding(s) for the obtained image, associated interpretable textual explanation(s) for the classified image(s), and the like. The database 108 further comprises various models such as an explainable privacy prediction model, (ii) a face recognition model (also referred as face extractor and detector model and interchangeably used herein), and (iii) a feature generating model (also referred as feature generator and interchangeably used herein). The memory 102 further comprises (or may further comprise) information pertaining to input(s)/output(s) of each step performed by the systems and methods of the present disclosure. In other words, input(s) fed at each step and output(s) generated at each step are comprised in the memory 102 and can be utilized in further processing and analysis.
[051] FIGS. 2A and 2B depict an exemplary high level block diagram of the system 100 of FIG. 1 for classifying images for privacy with interpretable textual explanation, in accordance with an embodiment of the present disclosure.
[052] FIG. 3, with reference to FIGS. 1 through 2B, depicts an exemplary flow chart illustrating a method for classifying images for privacy with interpretable textual explanation, using the systems of FIGS. 1-2B, in accordance with an embodiment of the present disclosure. In an embodiment, the system(s) 100 comprises one or more data storage devices or the memory 102 operatively coupled to the one or more hardware processors 104 and is configured to store instructions for execution of steps of the method by the one or more processors 104. The steps of the method of the present disclosure will now be explained with reference to components of the system 100 of FIG. 1, the block diagram of the system 100 depicted in FIGS. 2A-2B, and the flow diagram as depicted in FIG. 3.

[053] In an embodiment, at step 202 of the present disclosure, the one or more hardware processors 104 obtain an image as an input from a user. In an embodiment, the image may be pre-stored in the database 108 of the system 100 and may be queried for retrieval and further processing. In an embodiment, at step 204 of the present disclosure, the one or more hardware processors 104 transmit the obtained image to (i) an explainable privacy prediction model, (ii) a face recognition model, and (iii) a feature generator model. As depicted in FIGS. 2A and 2B, the input image is transmitted to each of the models.
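By way of a non-limiting illustration, a minimal sketch of this dispatch step is shown below. The model objects and their predict/embed interfaces are hypothetical placeholders introduced only for illustration and are not prescribed by the present disclosure.

```python
# Sketch of steps 202-204: obtain an input image and hand it to the three
# models. The model interfaces below are hypothetical placeholders.
from dataclasses import dataclass
from typing import Any

@dataclass
class PipelineInputs:
    privacy_prediction: Any   # output of the explainable privacy prediction model
    face_result: Any          # faces + facial embeddings from the face recognition model
    feature_embedding: Any    # whole-image embedding from the feature generating model

def dispatch(image, privacy_model, face_model, feature_model) -> PipelineInputs:
    """Transmit the obtained image to the three models (step 204)."""
    return PipelineInputs(
        privacy_prediction=privacy_model.predict(image),
        face_result=face_model.detect_and_embed(image),
        feature_embedding=feature_model.embed(image),
    )
```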
[054] In an embodiment, at step 206 of the present disclosure, the one or more hardware processors 104 classify, by using the explainable privacy prediction model, the obtained image to obtain one or more classes of the obtained image. The obtained image may be classified as a public image, a private image, and the like. Within the private image class, the obtained image may further be classified as a moderately private image, a private image, or a highly private image (e.g., an image containing sensitive information or confidential information).
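As a non-limiting illustration, the snippet below sketches how the privacy model's raw scores could be mapped to such class labels. The label set and the softmax-style mapping are assumptions for illustration only.

```python
import numpy as np

# Hypothetical label set covering the classes mentioned above.
LABELS = ["public", "moderately_private", "private", "highly_private"]

def to_classes(logits: np.ndarray, top_k: int = 1):
    """Map the privacy prediction model's raw scores to one or more class labels."""
    probs = np.exp(logits - logits.max())   # numerically stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1][:top_k]
    return [(LABELS[i], float(probs[i])) for i in order]

# Example: to_classes(np.array([2.1, 0.3, 0.2, -1.0])) -> [("public", ...)]
```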
[055] In an embodiment, at step 208 of the present disclosure, the one or more hardware processors 104 detect, by using the face recognition model, one or more faces of one or more corresponding users in the obtained image and generate one or more associated facial embeddings of the one or more corresponding users.
[056] In an embodiment, at step 210 of the present disclosure, the one or more hardware processors 104 perform a first comparison of the one or more associated facial embeddings with one or more facial embeddings comprised in a facial embedding database to obtain a first output. The step of performing the first comparison of the one or more associated facial embeddings with one or more facial embeddings comprised in the facial embedding database to obtain the first output comprises determining the one or more faces of one or more corresponding users as one of (i) a face of the user, (ii) a face of personal acquaintance of the user, or (iii) a face of an unknown user. The face recognition model detects the presence of a face(s) in an image (e.g., the obtained image), and generates a facial embedding if present. When a user identifies faces that belong to either the user or any of their personal acquaintances, or a face in the obtained image is a face of an unknown user, these embeddings are stored in the face embedding database in the memory 102. A query may be performed to check if a given image contains the face of the user/personal acquaintance, to determine whether there is an existing face or a previously classified image (or face) that maps to objects in the obtained image.
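A minimal sketch of this first comparison is given below, assuming cosine similarity against stored embeddings; the database structure, the relationship tags, and the similarity threshold are illustrative assumptions, not the exact mechanism of the disclosure.

```python
import numpy as np

# Hypothetical face-embedding database: each entry holds an embedding and a
# relationship tag ("self" or "acquaintance") supplied earlier by the user.
FACE_DB = [
    {"embedding": np.random.rand(512), "relation": "self"},
    {"embedding": np.random.rand(512), "relation": "acquaintance"},
]

def first_comparison(face_embeddings, threshold: float = 0.7):
    """Compare detected facial embeddings against the facial embedding database.

    Returns one label per detected face: 'self', 'acquaintance', or 'unknown'.
    The cosine-similarity threshold is an assumed value.
    """
    results = []
    for emb in face_embeddings:
        best_relation, best_score = "unknown", -1.0
        for entry in FACE_DB:
            ref = entry["embedding"]
            score = float(np.dot(emb, ref) /
                          (np.linalg.norm(emb) * np.linalg.norm(ref) + 1e-9))
            if score > best_score:
                best_relation, best_score = entry["relation"], score
        results.append(best_relation if best_score >= threshold else "unknown")
    return results
```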
[057] The system 100 may be implemented with configuration settings, in one embodiment. In this configuration, feedback may be obtained from user(s) only by identifying faces of personal acquaintances. This configuration demonstrates how the predictions change if the user or an acquaintance of the user is present in the obtained image, or if the face is of an unknown user. Naturally, users are more concerned about the privacy of images in which they or their friends or family members are present; however, the existing methods do not take this into consideration while arriving at a prediction. The system 100 treats this information as a very strong indicator of privacy, which therefore cannot be overlooked and must be incorporated into the prediction process. Through experiments conducted (not shown in FIGS.), it is observed that identifying a single face image of a person is sufficient to treat all images of that person with higher importance with respect to privacy. By default, all images of identified persons are treated as private, but more complex rules can be created based on the user’s specifications.
[058] In the present disclosure, the system and method described herein implement FaceNet (e.g., a face recognition model) for generating the feature embeddings of user-identified faces (also referred to as face embeddings or facial embeddings of users), which are then stored in a face embeddings database. A multi-task cascaded convolutional network (MTCNN), a technique known in the art, is used in conjunction with FaceNet, where the MTCNN is used for detecting and localizing the facial region in an image. In FIGS. 2A and 2B, “Face Extractor + Detector” represents the MTCNN and FaceNet pair (together referred to herein as the face recognition model as implemented by the system 100 of the present disclosure).
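As a non-limiting sketch, one widely used open-source pairing of MTCNN and a FaceNet-style embedder is the facenet-pytorch package; its use here is an assumption for illustration, as the disclosure names only the MTCNN and FaceNet techniques and not a specific implementation.

```python
# Sketch of the "Face Extractor + Detector" pairing using the open-source
# facenet-pytorch package (an assumption; not necessarily the implementation
# used by the system 100).
import torch
from PIL import Image
from facenet_pytorch import MTCNN, InceptionResnetV1

mtcnn = MTCNN(image_size=160, keep_all=True)                # detects and localizes faces
facenet = InceptionResnetV1(pretrained="vggface2").eval()   # FaceNet-style embedder

def face_embeddings(path: str) -> torch.Tensor:
    """Return one 512-d embedding per detected face (empty tensor if no face)."""
    img = Image.open(path).convert("RGB")
    faces = mtcnn(img)                  # tensor of cropped, aligned faces, or None
    if faces is None:
        return torch.empty(0, 512)
    with torch.no_grad():
        return facenet(faces)
```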
[059] Referring to steps of FIG. 3, at step 212 of the present disclosure, the one or more hardware processors 104 generate, via the feature generator model, a feature embedding for the obtained image. The feature generator model generates the feature embedding for the entire image (e.g., the obtained image). If a user

identifies misclassified instances, their embeddings are stored in one or more feature embedding databases (e.g., private and/or public feature embeddings databases) to prevent similar mistakes/errors from happening again. When a new input image is given, a query is performed to check if a similar image has been tagged as private/public by the user earlier.
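A minimal sketch of such a feature generator is given below, using a ResNet backbone with its classification head removed so that the output is a whole-image embedding. The specific ResNet variant, the pretrained weights, and the preprocessing pipeline are assumptions for illustration (torchvision's weights API is assumed).

```python
# Sketch of the feature generating model: a ResNet backbone whose classifier is
# replaced by an identity layer, yielding a 2048-d whole-image embedding.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()   # drop the classifier; keep the feature vector
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def image_embedding(path: str) -> torch.Tensor:
    """Generate a feature embedding for the entire image."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return backbone(img).squeeze(0)   # 2048-d feature embedding
```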
[060] Referring to the steps of FIG. 3, at step 214 of the present disclosure, the one or more hardware processors 104 perform a second comparison of the feature embedding with one or more feature embeddings comprised in one or more feature embedding databases to obtain a second output. The one or more feature embedding databases comprises at least one of a private image embedding database, and a public image embedding database. The feature embedding of the obtained image is compared with the feature embeddings comprised in the feature embedding databases stored in the memory 102 to determine whether there are similar (or identical) embeddings. The similarity may also be determined by the system against the misclassified instances of private feature embeddings and/or public feature embeddings. In other words, the feature embedding generated by the feature generating model is compared with private and public feature embeddings to determine similarity or a match. Based on this determination, the resultant output serves as the second output. Further, in case of any mismatch during the comparison, such instances are stored as misclassified instances of feature embeddings of one or more image types (e.g., a private image or public image misclassified instance). In other words, the one or more feature embedding databases further comprises one or more misclassified instances of feature embeddings of one or more image types.
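The following is a minimal, non-limiting sketch of the second comparison as a nearest-neighbour lookup over the private and public embedding databases; the database representation and the similarity threshold are illustrative assumptions.

```python
import numpy as np

# Hypothetical feature-embedding databases, populated from user feedback on
# misclassified instances (private vs. public).
PRIVATE_DB: list = []
PUBLIC_DB: list = []

def second_comparison(embedding: np.ndarray, threshold: float = 0.8) -> str:
    """Return 'private', 'public', or 'no_match' depending on whether a
    sufficiently similar embedding exists in either database.
    The similarity threshold is an assumed value."""
    def best(db):
        scores = [float(np.dot(embedding, e) /
                        (np.linalg.norm(embedding) * np.linalg.norm(e) + 1e-9))
                  for e in db]
        return max(scores) if scores else -1.0

    best_private, best_public = best(PRIVATE_DB), best(PUBLIC_DB)
    if max(best_private, best_public) < threshold:
        return "no_match"
    return "private" if best_private >= best_public else "public"
```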
[061] Referring to steps of FIG. 3, at step 216 of the present disclosure, the one or more hardware processors 104 classify the obtained image based on (a) the one or more classes of the obtained image, (b) the first output, and (c) the second output to obtain (i) a classified image, and (ii) an associated interpretable textual explanation (AITE) for the classified image. FIGS. 4A through 4C, with reference to FIGS. 1 through 3, depict classified images with an associated interpretable textual explanation, in accordance with an embodiment of the present disclosure.

More specifically, in FIG. 4A, the obtained image is classified/predicted as a public image (classified image depicted as 402 in FIG. 4A) because there is 1 person in an indoor setting at the conference_center and the picture is SFW (safe for work). In FIG. 4B, the obtained image is classified/predicted as a private image (classified image depicted as 404 in FIG. 4B) because there is 1 person in an indoor setting at the nursing_home and the picture is SFW. Similarly, in FIG. 4C, the obtained image is classified/predicted to be Public (classified image depicted as 406 in FIG. 4C) because there is 1 person in an outdoor setting at the stadium_soccer and the picture is SFW. The images shown in FIGS. 4A through 4C are obtained from the publicly available Labeled Faces in the Wild (LFW) dataset (e.g., refer http://vis-www.cs.umass.edu/lfw/).
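As a non-limiting sketch, the decision step of combining the privacy class, the first output, and the second output into a final label and a textual explanation could look as follows; the combination rules and the explanation template are illustrative assumptions inferred from the examples above, not the exact rules of the disclosure.

```python
# Sketch of the decision operator (step 216): combine the privacy model's class,
# the face-recognition output, and the embedding-lookup output into a final
# label plus an interpretable textual explanation.
def decide(predicted_class: str, face_relations: list, similarity_result: str,
           scene: str = "unknown_scene", setting: str = "indoor") -> tuple:
    # User feedback on similar images overrides the generic prediction.
    if similarity_result in ("private", "public"):
        label = similarity_result
        reason = f"a similar image was previously tagged as {label} by the user"
    # Presence of the user or an acquaintance is a strong privacy indicator.
    elif any(r in ("self", "acquaintance") for r in face_relations):
        label = "private"
        reason = "the image contains the user or a personal acquaintance"
    else:
        label = predicted_class
        reason = (f"there is {len(face_relations)} person in an {setting} setting "
                  f"at the {scene}")
    explanation = f"The image is predicted to be {label} because {reason}."
    return label, explanation

# Example: decide("public", ["unknown"], "no_match", scene="conference_center")
```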
[062] Further, the hardware processors 104 receive user feedback on (i) the classified image, and (ii) the associated interpretable textual explanation for the classified image, and perform one of: outputting (i) the classified image, and (ii) the associated interpretable textual explanation for the classified image based on the user feedback; or re-classifying the classified image and modifying the associated interpretable textual explanation for the classified image. In other words, there could be images that are misclassified by the system 100, for which the system 100 receives user feedback. For instance, the user would have the option to indicate if a prediction contradicts their own judgement by identifying the misclassified image and providing the rectified label, which serves as the user feedback.
[063] Based on this user feedback, the system 100 may re-classify and store this information. FIGS. 5A through 5D, with reference to FIGS. 1 through 4C, depict classified images with an associated interpretable textual explanation based on user feedback, in accordance with an embodiment of the present disclosure. More specifically, in FIG. 5A, the first image is identified as a public image by the user (classified image depicted as 502 in FIG. 5A), as it is a picture of the user in a sports competition. As can be observed from the images in FIGS. 5A through 5D, the images contain sensitive content such as nudity, which would ordinarily indicate a private image. However, this image is an exception due to the context and venue. The privacy prediction system learns this preference and stores this information. It subsequently identifies other images (FIG. 5B - refer classified image 504, FIG. 5C - refer classified image 506, and FIG. 5D - refer classified image 508 respectively) similar to the one identified by the user as Public. The images shown in FIGS. 5A through 5D are obtained from the publicly available Labeled Faces in the Wild (LFW) dataset (e.g., refer http://vis-www.cs.umass.edu/lfw/). FIGS. 6A through 6D, with reference to FIGS. 1 through 5D, depict classified images with an associated interpretable textual explanation based on user feedback, in accordance with an embodiment of the present disclosure. The first image (FIG. 6A) is identified by the user as Public, as it is a sportsman playing in a public venue.
[064] For instance, the user would have the option to identify the presence of either their own or their personal acquaintance’s face in an image, or the face of an unknown user. This action needs to be performed only once for each face, in one embodiment of the present disclosure. The system 100 learns this preference and stores this information. It subsequently identifies other images (FIG. 6B - refer classified image 604, FIG. 6C - refer classified image 606, and FIG. 6D - refer classified image 608 respectively) similar to the one identified by the user as Public. The images shown in FIGS. 6A through 6D are obtained from the publicly available LFW dataset (e.g., refer http://vis-www.cs.umass.edu/lfw/).
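A minimal, non-limiting sketch of this feedback step is given below; the database structures, helper names, and label strings are assumptions introduced only for illustration.

```python
# Sketch of the user-feedback step: if the user disagrees with the prediction,
# the corrected label re-classifies the image, the misclassified instance is
# recorded, and the relevant embedding database is updated so that similar
# images are handled correctly in the future.
def apply_feedback(embedding, predicted_label: str, user_label: str,
                   private_db: list, public_db: list, misclassified: list) -> str:
    if user_label == predicted_label:
        return predicted_label                      # prediction accepted as-is
    # Record the misclassified instance and store its embedding under the
    # user-provided (corrected) label.
    misclassified.append({"embedding": embedding,
                          "predicted": predicted_label,
                          "corrected": user_label})
    (private_db if user_label == "private" else public_db).append(embedding)
    return user_label                               # re-classified label
```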
[065] Additionally, the system 100 may be integrated with other models such as object detection model(s), an obscene detection model, a license plate/number detection model, and the like. For instance, an object detection model may be stored in the memory 102 and invoked for execution to detect objects in the image, which may help in deciding whether sensitive content is present in an image or not. Similarly, an obscene/nudity detection model may be stored in the memory 102 and invoked for execution to detect whether any NSFW (not safe for work) content is present in an image. NSFW content is generally considered as sensitive. Likewise, a license plate/number detection model may be stored in the memory 102 and invoked for execution to detect the presence of visible license plates of vehicles, which can be considered as indirect PII (Personally Identifiable Information). Such examples of model integration into the system 100 shall not be construed as limiting the scope of the present disclosure.
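As a non-limiting sketch, such auxiliary detectors could be integrated through a simple registry whose outputs feed the decision operator; the registry and the detector interface below are assumptions for illustration only.

```python
# Sketch of how auxiliary detectors (object detection, NSFW detection, license
# plate detection, etc.) could be plugged into the system.
from typing import Callable, Dict

AUX_DETECTORS: Dict[str, Callable] = {}

def register_detector(name: str, fn: Callable) -> None:
    """Register an optional detector under a name."""
    AUX_DETECTORS[name] = fn

def run_aux_detectors(image) -> Dict[str, object]:
    """Run every registered detector; the outputs can feed the decision operator."""
    return {name: fn(image) for name, fn in AUX_DETECTORS.items()}

# Example registration of a hypothetical NSFW detector:
# register_detector("nsfw", lambda img: nsfw_model.predict(img))
```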

[066] Further, the system 100 or the models described herein (e.g., the explainable privacy prediction model, the face recognition model, the feature generator model, or other models if integrated) are (or may be) re-trained based on a comparison of (i) the number of classified images being re-classified and (ii) a misclassification threshold. In other words, say the misclassification threshold is 10; this means that the system 100 needs to be retrained if the classification and the associated interpretable textual explanation made by the system 100 appear to be incorrect when user feedback is obtained and compared. When such incorrect or misclassified instances occur more than 10 times (e.g., in this scenario the misclassification threshold is defined as 10), the system 100 may be re-trained (or undergoes retraining). This enables the system 100 to improve its classification accuracy along with correcting and providing an appropriate interpretable textual explanation for the image being classified.
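A minimal sketch of such a retraining trigger is given below; the class name and the method of handing accumulated instances to a fine-tuning routine are illustrative assumptions.

```python
# Sketch of the retraining trigger: count re-classified (misclassified) images
# and flag retraining of the privacy prediction model once the count crosses
# the misclassification threshold (10 in the example above).
class RetrainMonitor:
    def __init__(self, threshold: int = 10):
        self.threshold = threshold      # empirically determined or pre-defined
        self.misclassified = []         # stored misclassified instances

    def record(self, instance) -> bool:
        """Record a misclassified instance; return True if retraining is due."""
        self.misclassified.append(instance)
        return len(self.misclassified) > self.threshold

    def pop_training_batch(self):
        """Hand the accumulated instances to a fine-tuning routine and reset."""
        batch, self.misclassified = self.misclassified, []
        return batch
```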
[067] The user provides feedback inputs only on misclassified instances. The performance of the system is updated with every feedback input received from the user. It should be noted that the increase or reduction in performance is dependent on the user’s feedback. The purpose of this configuration is to demonstrate how the system 100 incorporates the user’s preferences based on corrective feedback provided by the user on misclassified instances. FIGS. 5A and 6A depict images corresponding to the misclassified instances identified by the user. Subsequent images shown in their respective columns are images which were incorrectly classified previously but are correctly classified now as a result of the corrective feedback (e.g., FIGS. 5B through 5D and FIGS. 6B through 6D).
[068] As demonstrated in FIGS. 1 through 6D, the input is simultaneously processed by the PrivMNN (privacy modular neural network or also referred as explainable privacy prediction model), the ResNet (Residual Networks) model (also referred as feature generating model and interchangeably used herein) and the face detector (or face recognition model and interchangeably used herein). Subsequently, the query results and PrivMNN’s prediction are fed to the decision operator which generates the final decision. If the user finds the result acceptable, no further actions are required. However, if the user finds the result is not

appropriate according to his/her perspective, he/she could provide feedback to the system 100. This feedback is used for creating customized rules in the decision ops as shown in FIG. 2A, and for updating the relevant databases. All misclassified instances are also stored in a separate database, and if the number of these instances is significant enough (e.g., refer misclassification threshold), the PrivMNN could be fine-tuned from these. However, it is unlikely that a single user would provide such a high number of misclassified instances as it is expected that the system would handle most of the user’s concerns after receiving a few feedback inputs from the user. It is to be understood by a person having ordinary skill in the art or person skilled in the art that the various models as implemented by the system 100 such as explainable privacy prediction model, face recognition model, feature generating model, and the like shall not be construed as limiting the scope of the present disclosure. In other words, any versions/variants of such models or equivalent or any other similar models may be implemented for classifying images with appropriate interpretable textual explanation and be used for privacy prediction.
[069] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[070] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable

gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
[071] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[072] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be

noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[073] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[074] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.

We Claim:
1. A processor implemented method, comprising:
obtaining, via one or more hardware processors, an image as an input from a user (202);
transmitting, via the one or more hardware processors, the obtained image to (i) an explainable privacy prediction model, (ii) a face recognition model, and (iii) a feature generating model (204);
classifying, by using the explainable privacy prediction model via the one or more hardware processors, the obtained image to obtain one or more classes of the obtained image (206);
detecting, by using the face recognition model via the one or more hardware processors, one or more faces of one or more corresponding users in the obtained image and generating one or more associated facial embeddings of the one or more corresponding users (208);
performing a first comparison of the one or more associated facial embeddings with one or more facial embeddings comprised in a facial embedding database to obtain a first output (210);
generating, by using the feature generating model via the one or more hardware processors, a feature embedding for the obtained image (211);
performing a second comparison of the feature embedding with one or more feature embeddings comprised in one or more feature embedding databases to obtain a second output (212); and
classifying the obtained image based on the one or more classes of the obtained image, the first output, and the second output to obtain (i) a classified image, and (ii) an associated interpretable textual explanation for the classified image (214).

2. The processor implemented method of claim 1, wherein the one or more classes of the obtained image is at least one of a public image, and a private image.
3. The processor implemented method of claim 1, wherein the one or more feature embedding databases comprises at least one of a private image embedding database, and a public image embedding database.
4. The processor implemented method of claim 1, wherein the step of performing the first comparison of the one or more associated facial embeddings with one or more facial embeddings comprised in the facial embedding database to obtain the first output comprises determining the one or more faces of one or more corresponding users as one of (i) a face of the user, (ii) a face of personal acquaintance of the user, or (iii) a face of an unknown user.
5. The processor implemented method of claim 1, wherein the one or more feature embedding databases further comprises one or more misclassified instances of feature embeddings of one or more image types.
6. The processor implemented method of claim 1, further comprising:
receiving user feedback on (i) the classified image, and (ii) the associated
interpretable textual explanation for the classified image; and
performing one of:
outputting (i) the classified image, and (ii) the associated interpretable textual explanation for the classified image based on the user feedback; or
re-classifying the classified image and modifying the associated interpretable textual explanation for the classified image.
7. The processor implemented method of claim 6, further comprising updating
the one or more feature embedding databases based on the user feedback.

8. The processor implemented method of claim 1, further comprising retraining the explainable privacy prediction model based on a comparison of (i) number of classified images being re-classified and (ii) a misclassification threshold.
9. The processor implemented method of claim 8, wherein the misclassification threshold is one of an empirically determined threshold, or a pre-defined misclassification threshold.
10. A system (100), comprising:
a memory (102) storing instructions;
one or more communication interfaces (106); and
one or more hardware processors (104) coupled to the memory (102) via the one or more communication interfaces (106), wherein the one or more hardware processors (104) are configured by the instructions to:
obtain an image as an input from a user;
transmit the obtained image to (i) an explainable privacy prediction model, (ii) a face recognition model, and (iii) a feature generating model;
classify, by using the explainable privacy prediction model, the obtained image to obtain one or more classes of the obtained image;
detect, by using the face recognition model, one or more faces of one or more corresponding users in the obtained image and generate one or more associated facial embeddings of the one or more corresponding users;
perform a first comparison of the one or more associated facial embeddings with one or more facial embeddings comprised in a facial embedding database to obtain a first output;
generate, by using the feature generating model, a feature embedding for the obtained image;

perform a second comparison of the feature embedding with one or more feature embeddings comprised in one or more feature embedding databases to obtain a second output; and
classify the obtained image based on the one or more classes of the obtained image, the first output, and the second output to obtain (i) a classified image, and (ii) an associated interpretable textual explanation for the classified image.
11. The system of claim 10, wherein the one or more classes of the obtained image is at least one of a public image, and a private image.
12. The system of claim 10, wherein the one or more feature embedding databases comprises at least one of a private image embedding database, and a public image embedding database.
13. The system of claim 10, wherein the first comparison of the one or more associated facial embeddings with one or more facial embeddings comprised in the facial embedding database performed to obtain the first output comprises determining the one or more faces of one or more corresponding users as one of (i) a face of the user, (ii) a face of personal acquaintance of the user, or (iii) a face of an unknown user.
14. The system of claim 10, wherein the one or more feature embedding databases further comprises one or more misclassified instances of feature embeddings of one or more image types.
15. The system of claim 10, wherein the one or more hardware processors are further configured by the instructions to:
receive user feedback on (i) the classified image, and (ii) the associated interpretable textual explanation for the classified image; and perform one of:

outputting (i) the classified image, and (ii) the associated interpretable textual explanation for the classified image based on the user feedback; or
re-classifying the classified image and modifying the associated interpretable textual explanation for the classified image.
16. The system of claim 15, wherein the one or more hardware processors are further configured by the instructions to update the one or more feature embedding databases based on the user feedback.
17. The system of claim 10, wherein the one or more hardware processors are further configured by the instructions to retrain the explainable privacy prediction model based on a comparison of (i) number of classified images being re-classified and (ii) a misclassification threshold.
18. The system of claim 17, wherein the misclassification threshold is one of an empirically determined threshold, or a pre-defined misclassification threshold.

Documents

Application Documents

# Name Date
1 202121049709-STATEMENT OF UNDERTAKING (FORM 3) [29-10-2021(online)].pdf 2021-10-29
2 202121049709-REQUEST FOR EXAMINATION (FORM-18) [29-10-2021(online)].pdf 2021-10-29
3 202121049709-FORM 18 [29-10-2021(online)].pdf 2021-10-29
4 202121049709-FORM 1 [29-10-2021(online)].pdf 2021-10-29
5 202121049709-FIGURE OF ABSTRACT [29-10-2021(online)].jpg 2021-10-29
6 202121049709-DRAWINGS [29-10-2021(online)].pdf 2021-10-29
7 202121049709-DECLARATION OF INVENTORSHIP (FORM 5) [29-10-2021(online)].pdf 2021-10-29
8 202121049709-COMPLETE SPECIFICATION [29-10-2021(online)].pdf 2021-10-29
9 Abstract1.jpg 2021-12-14
10 202121049709-Proof of Right [21-02-2022(online)].pdf 2022-02-21
11 202121049709-FORM-26 [20-04-2022(online)].pdf 2022-04-20
12 202121049709-FER.pdf 2023-10-25
13 202121049709-OTHERS [23-02-2024(online)].pdf 2024-02-23
14 202121049709-FER_SER_REPLY [23-02-2024(online)].pdf 2024-02-23
15 202121049709-CLAIMS [23-02-2024(online)].pdf 2024-02-23

Search Strategy

1 SearchStrategyMatrixE_15-09-2023.pdf