Abstract: Embodiments of the present disclosure relate to a system (100) and a method (600) of handwritten text analysis. The system (100) performs rotation- and scale-invariant handwritten feature extraction from grayscale document images using block-wise structural analysis and deep bit-level processing. An input image unit (104) receives a handwritten image which is converted into grayscale format. The image is divided into a matrix of fixed-size blocks, and average pixel intensities are computed to form a detail matrix. The system (100) decomposes this matrix into multiple binary layers using bit-plane slicing and extracts skeletal features from each layer using morphological skeletonization. Selected skeletal pixels serve as anchors for angular neighborhood analysis across eight directions. Directional texture features are computed from surrounding 3×3 blocks and aggregated into a rotation-aware vector. The normalized feature representation is stored in a storage unit (106) and used for classification using convolutional neural networks.
Description:
TECHNICAL FIELD
[0001] The present disclosure relates to the field of image processing technologies. More particularly, the present disclosure relates to a system and method of handwritten text analysis.
BACKGROUND
[0002] Conventional systems for handwritten text analysis often struggle with variations in writing orientation, scale, and style, leading to inconsistent feature extraction. Many existing methods rely heavily on global image characteristics, which are sensitive to noise and distortions. Bit-level structural details are typically ignored, resulting in loss of critical information from low-level pixel patterns. Rotation-invariant feature extraction is rarely addressed, making systems highly sensitive to angular deviations in handwriting. Conventional segmentation and pooling techniques often overlook local spatial relationships, reducing classification accuracy. Additionally, most models fail to generalize across diverse scripts and handwriting types due to limited training adaptability. Finally, current approaches lack an efficient mechanism to capture fine-grained directional textures, leading to poor discrimination of similar-looking characters.
[0003] To address these limitations, the present disclosure provides a novel system and method that overcomes shortcomings of the prior art.
OBJECTS OF THE PRESENT DISCLOSURE
[0004] It is an object of the present disclosure to provide a system that extracts invariant handwritten features using deep spatial and angular analysis.
[0005] It is another object of the present disclosure to provide a system that classifies handwritten characters across styles, scales, and orientations accurately.
SUMMARY
[0006] The present disclosure relates to the field of image processing technologies. More particularly, the present disclosure relates to a system and method of handwritten text analysis.
[0007] In an aspect, a system for handwritten text analysis is disclosed. The system may include a processor and a memory coupled to the processor. The memory may include processor-executable instructions, which, on execution, cause the processor to execute a sequence of tasks. The system may generate a matrix by segmenting an input image into a plurality of blocks. The input image may be a grayscale representation of handwritten text. Further, the system may extract one or more skeletal features from one or more bit-plane layers of the matrix by decomposing the matrix into a plurality of binary layers based on bit-plane slicing of pixel intensity values. Further, the system may identify a reference 3×3 block centered at each skeletal pixel obtained from the one or more extracted skeletal features. Further, the system may locate at least eight neighboring 3×3 blocks positioned at predefined angular directions around the identified reference 3×3 block. Further, the system may extract one or more rotation-invariant features by computing one or more directional texture features based on any or a combination of the reference 3×3 block and the at least eight neighboring 3×3 blocks. Further, the system may generate a scale-independent feature representation of the input image by normalization of the one or more extracted rotation-invariant features.
BRIEF DESCRIPTION OF DRAWINGS
[0008] FIG. 1 illustrates an exemplary block diagram representation of a system for handwritten text analysis, in accordance with an embodiment of the present disclosure.
[0009] FIG. 2 illustrates an exemplary representation of handwritten samples to be processed by the system, in accordance with an embodiment of the present disclosure.
[0010] FIG. 3 illustrates an exemplary representation of an angle-rotation feature extraction stage of the system, in accordance with an embodiment of the present disclosure.
[0011] FIG. 4 illustrates an exemplary representation of an angle-rotation functional logic of the system, in accordance with an embodiment of the present disclosure.
[0012] FIG. 5 illustrates an exemplary flowchart representation of an angle-rotation feature extraction stage of the system using 3×3 blocks, in accordance with an embodiment of the present disclosure.
[0013] FIG. 6 illustrates an exemplary flowchart representation of a method of handwritten text analysis, in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION
[0014] FIG. 1 illustrates an exemplary block diagram representation of a system for handwritten text analysis, in accordance with an embodiment of the present disclosure.
[0015] Illustrated in FIG. 1 is a block diagram representation of the system 100 for handwritten text analysis (hereinafter referred to as the system 100).
[0016] In an embodiment of the present disclosure, the system 100 may include a processing engine 102. The processing engine 102 may include a processor 102-2 operatively coupled with a memory 102-4. The processor 102-2 may include one or more microprocessors, Digital Signal Processors (DSPs), Application-Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), System-On-Chip (SoC) architectures, or any combination thereof, to execute machine-readable instructions stored in the memory 102-4. The processor 102-2 may enable accurate and invariant feature extraction from handwritten images for reliable analysis across variations in orientation, scale, and writing style.
[0017] In an embodiment of the present disclosure, the memory 102-4 may include one or more non-transitory, computer-readable storage media, including, but not limited to, Random-Access Memory (RAM), Read-Only Memory (ROM), flash memory, magnetic storage, optical storage, or any combination thereof. The memory 102-4 may store machine-readable instructions executable by the processor 102-2. The memory 102-4 may further include volatile and/or non-volatile storage elements operatively coupled to the processor 102-2 to facilitate real-time processing and data retention for operation. The processing engine 102 may be a central control unit of the system 100, responsible for managing and coordinating all connected components.
[0018] In an embodiment of the present disclosure, the processor 102-2 may be operatively coupled with an input image unit 104. The system 100 may utilize the input image unit 104 to receive and digitize handwritten documents into grayscale image matrices suitable for downstream processing. The input image unit 104 may perform pre-acquisition calibration and resolution normalization to ensure consistency across varying input formats. The acquired image may be subjected to adaptive histogram equalization or denoising filters prior to segmentation, enhancing the effectiveness of later feature extraction stages. The input image unit 104 may enable real-time image acquisition from scanners, cameras, or document repositories, feeding directly into Convolutional Neural Network (CNN) pipelines for handwritten region localization. In an exemplary aspect, the input image unit 104 may cooperate with transformer-based models for spatial alignment and input quality enhancement in challenging visual environments.
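By way of non-limiting illustration, the acquisition and pre-processing behavior described above may be sketched in Python as follows, assuming scikit-image is available; the particular filters, parameters, and the helper name acquire_grayscale are illustrative assumptions rather than prescribed components of the input image unit 104:

```python
# Illustrative sketch only: grayscale conversion, adaptive histogram
# equalization, and light denoising prior to segmentation. The choice
# of filters and parameters is an assumption for demonstration.
import numpy as np
from skimage import io, color, exposure, restoration

def acquire_grayscale(path: str) -> np.ndarray:
    """Load an image and return an 8-bit grayscale matrix."""
    img = io.imread(path)
    if img.ndim == 3:                        # color input -> grayscale
        img = color.rgb2gray(img)            # floats in [0, 1]
    img = exposure.equalize_adapthist(img)   # adaptive histogram equalization
    img = restoration.denoise_tv_chambolle(img, weight=0.05)  # denoising filter
    return (img * 255).astype(np.uint8)      # 8-bit grayscale image matrix
```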
[0019] In an embodiment of the present disclosure, the processor 102-2 may be operatively coupled with a storage unit 106. The system 100 may utilize the storage unit 106 to persistently store grayscale input images received from the input image unit 104, as well as intermediate and final feature representations. The storage unit 106 may support high-throughput read-write operations to enable seamless integration with convolutional neural network (CNN) models and transformer architectures during training or inference. Feature vectors generated from the angle-rotation adaptive pooling process may be indexed and cached within the storage unit 106 for rapid retrieval and classification. The storage unit 106 may maintain a structured repository of labelled handwritten datasets to facilitate continual learning, fine-tuning, or retraining of deployed models. In certain implementations, the storage unit 106 may interact with distributed memory systems or embedded flash storage to support edge-based or federated inference scenarios in the system 100.
[0020] In an embodiment of the present disclosure, the processor 102-2 may be operatively coupled with a communication unit 108 and a display interface 110. The system 100 may employ the communication unit 108 to transmit extracted feature vectors, classification outcomes, or preprocessed image data to external systems or cloud-based services for advanced analysis or archival. The communication unit 108 may implement low-latency protocols optimized for machine learning inference pipelines, allowing seamless integration with remote convolutional neural network (CNN) classifiers or transformer-based recognition services. The display interface 110 may render intermediate processing results, including, but not limited to, block segmentation patterns or skeletal overlays, directly from data accessed via the storage unit 106 or acquired through the input image unit 104. The display interface 110 may further visualize classification probabilities, attention maps, or rotation-invariant feature distributions generated during execution of deep learning models within the system 100. In an exemplary and non-limiting embodiment, the communication unit 108 and the display interface 110 may jointly support interactive feedback loops for model validation, user annotation, or semi-supervised correction of handwritten input interpretations.
[0021] In an embodiment of the present disclosure, the system 100 may initiate the processing pipeline by receiving a grayscale representation of handwritten text from the input image unit 104 and segmenting the image into a matrix of uniform blocks for localized structural analysis. Each block may capture localized pixel intensity information, enabling the processor to preserve spatial granularity critical for downstream feature extraction. The matrix may then be decomposed into multiple binary layers through bit-plane slicing, isolating individual bit-level structures across pixel values to facilitate fine-grained analysis of handwritten strokes.
[0022] In an embodiment of the present disclosure, the system 100 may extract skeletal features from selected binary layers by applying morphological skeletonization algorithms that minimize structural redundancy while preserving connectivity. For each skeletal pixel, the system 100 may identify a reference 3×3 block and locate at least eight neighboring 3×3 blocks arranged in predefined angular directions to encode local orientation patterns. Directional texture features may be computed based on any or a combination of the reference and surrounding blocks, forming the basis for rotation-invariant representation. These features may be passed through a normalization layer to produce a scale-independent feature vector that is resilient to variations in writing size, slant, or curvature. Convolutional neural network (CNN) models may utilize the normalized features for classification, while transformer-based models may leverage the spatial encoding for sequence modeling in multi-character recognition tasks. The resulting feature vectors and classification outputs may be stored in the storage unit 106 or transmitted via the communication unit 108 for further processing or archival. The display interface 110 may present the processed outputs including, but not limited to, classified character labels, attention overlays, or orientation visualizations to the end user or validation operator.
[0023] In an embodiment of the present disclosure, the system 100 may perform handwritten character classification by analyzing the scale-independent feature representation of the input image received from the input image unit 104. The processor 102-2 may apply convolutional neural network (CNN) models that have been trained on large datasets of labeled handwritten text, enabling robust mapping of extracted features to specific character classes. The CNN models may include multiple convolutional and pooling layers that identify spatial hierarchies in the normalized feature vector. This hierarchical feature extraction may allow the system 100 to distinguish between visually similar letters, digits, symbols, or language-specific scripts under diverse writing styles. The classification output may be stored in the storage unit 106 for later retrieval or transmitted via the communication unit 108 to external systems for document-level interpretation. The display interface 110 may be used to present predicted labels, classification confidence scores, or misclassification alerts to human operators. The system 100 may operate in both real-time and batch modes depending on the throughput and complexity of the CNN model deployed.
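A minimal sketch of such a classifier is shown below, assuming PyTorch and treating the scale-independent representation as a one-dimensional feature vector; the use of 1-D convolutions, the layer sizes, and the class count of 62 (digits plus upper- and lower-case letters) are illustrative assumptions, not the trained models of the disclosure:

```python
# Illustrative CNN over the normalized 1-D feature vector: stacked
# convolutional and pooling layers followed by a linear class head.
import torch
import torch.nn as nn

class HandwritingClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),                     # pooling layer
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),             # one response per channel
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.unsqueeze(1)                       # (batch, len) -> (batch, 1, len)
        x = self.features(x).squeeze(-1)         # -> (batch, 32)
        return self.head(x)                      # class logits

# Example: a batch of four length-30 feature vectors.
logits = HandwritingClassifier(num_classes=62)(torch.randn(4, 30))
```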
[0024] In an embodiment of the present disclosure, the system 100 may execute multiscale bit-plane feature extraction by decomposing the matrix representation of the input image into multiple binary layers, each representing a specific bit-plane of pixel intensity. The system 100 may apply bit-level decomposition techniques to isolate low-level binary patterns that are often ignored in conventional grayscale models. These binary layers may then be processed using morphological models trained on fine-grained pixel intensity variations, enabling the system 100 to capture textural and edge-level information from both dominant and subtle pixel transitions. The extracted features from each bit-plane may be treated independently to preserve hierarchical texture encoding across multiple resolutions. The system 100 may utilize binary morphological operations including, but not limited to, dilation, erosion, and thinning to refine structural representation within each layer. These intermediate outputs may be stored in the storage unit 106 and selectively retrieved based on layer informativeness. The extracted bit-plane features may serve as input for downstream processes such as skeletonization or angle-aware feature computation.
[0025] In an embodiment of the present disclosure, the system 100 may perform skeletal abstraction by reducing binary shapes within each bit-plane layer to their minimal structural form. The processor 102-2 may apply morphological skeletonization and shape-preserving thinning techniques to each binary layer independently to preserve the essential stroke geometry of handwritten characters. These techniques may be trained on datasets containing diverse handwriting styles to maintain robustness against stroke width variations and irregularities. The skeletal output may include thin, connected representations of strokes that retain topological and directional properties, making them ideal for subsequent neighborhood analysis. Each skeleton pixel may be indexed and stored in the storage unit 106 for reference during rotational block traversal. The system 100 may also apply noise suppression techniques to eliminate spurious branches or disconnected fragments in the skeletonized output. The resulting skeletal features may enhance the ability of the system 100 to focus on stroke directionality and layout for classification tasks.
[0026] In an embodiment of the present disclosure, the system 100 may execute rotational neighborhood modeling by analyzing spatial context around each skeletal pixel extracted in prior processing stages. The processor 102-2 may select a 3×3 reference block centered at each skeletal pixel and identify at least eight angularly displaced neighboring 3×3 blocks in fixed orientations including, but not limited to, 0°, 45°, 90°, and so on up to 315°. These angular positions may enable the system 100 to capture directionality and local curvature of strokes invariant to handwriting rotation. The rotational modeling technique may utilize orientation-mapping models trained on spatially transformed samples to ensure robustness across diverse angular distortions. Each group of blocks may be stored or processed as a unit to retain relative spatial dependencies. The resulting contextual data may be stored in the storage unit 106 and visualized through the display interface 110 for model transparency or debugging. The communication unit 108 may transmit these features to a remote classification engine for evaluation in distributed environments.
[0027] In an embodiment of the present disclosure, the system 100 may perform texture-aware feature encoding by computing directional gradients and local texture features from the reference block and its angularly neighboring blocks. The processor 102-2 may apply 3×3 convolutional kernels, which act as feature detectors, trained on localized handwriting patches across various scripts and styles. These patch-level CNN filters may capture micro-patterns such as stroke intersections, curvatures, or line ends that carry essential discriminative information. The extracted texture features may be rotation-aware, enabling the system 100 to handle orientation distortions in cursive or slanted handwriting. Feature responses may be selectively retained based on gradient strength or entropy thresholds, allowing the system 100 to suppress noisy or redundant components. The texture features may be stored in the storage unit 106 and used by classification models deployed locally or remotely. The display interface 110 may be used to render heatmaps or attention maps based on texture gradients for interpretation.
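For illustration only, the directional response of a single 3×3 block may be sketched with fixed Sobel-style kernels standing in for the trained patch-level filters described above; both the kernels and the magnitude/orientation measure are assumptions:

```python
# Sketch: gradient magnitude and orientation of one 3x3 block using
# fixed Sobel kernels as stand-ins for trained 3x3 feature detectors.
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def block_texture(block: np.ndarray) -> tuple[float, float]:
    """Return (gradient magnitude, orientation in degrees) for a 3x3 block."""
    gx = float((block * SOBEL_X).sum())          # horizontal response
    gy = float((block * SOBEL_Y).sum())          # vertical response
    magnitude = float(np.hypot(gx, gy))
    angle = float(np.degrees(np.arctan2(gy, gx)) % 360.0)
    return magnitude, angle
```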
[0028] In an embodiment of the present disclosure, the system 100 may generate feature vectors by aggregating directional and angular features from the localized 3×3 blocks through angle-rotation pooling techniques. The processor 102-2 may implement vector embedding models that transform spatial and directional cues into a compact numerical representation. These models may be trained on invariant spatial features from diverse writing styles to ensure robustness across handwriting variability. The resulting feature vectors may encode a combination of structural, textural, and directional information essential for downstream classification. Each vector may be stored in the storage unit 106 or passed to the communication unit 108 for model inference in a connected cloud or edge system. The display interface 110 may display low-dimensional projections of the feature vector space for clustering or verification analysis. The system 100 may perform this operation iteratively across all relevant skeletal regions to generate a holistic representation of the input image.
[0029] In an embodiment of the present disclosure, the system 100 may normalize the extracted feature vectors to remove size-related variance and produce scale-independent representations suitable for consistent inference. The processor 102-2 may apply statistical normalization techniques including, but not limited to, min-max scaling and Z-score transformation, to each vector component. These transformations may be computed using parameters derived from training data composed of handwritten characters of varying sizes and stroke widths. The scale-normalized feature vector may enhance model generalization, particularly when the input image unit 104 provides data from unconstrained sources. The normalized features may be stored in the storage unit 106 or streamed to the communication unit 108 for cloud-based classification. The display interface 110 may be employed to present differences between pre- and post-normalized outputs or allow manual inspection of normalization quality. This normalization stage may play a critical role in standardizing input representations before classification by deep learning models.
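A minimal sketch of the two named normalization techniques follows, assuming NumPy; in practice the parameters (bounds, mean, and standard deviation) would be derived from training data, as the text states, rather than from the single vector being normalized:

```python
# Sketch of the scale-normalization stage: min-max scaling and Z-score
# transformation applied to a feature vector with training-set parameters.
import numpy as np

def min_max_scale(v: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Map feature components into [0, 1] using training-set bounds."""
    return (v - lo) / (hi - lo)

def z_score(v: np.ndarray, mean: float, std: float) -> np.ndarray:
    """Standardize feature components using training-set statistics."""
    return (v - mean) / std
```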
[0030] In an embodiment of the present disclosure, the system 100 may perform differentiation of language-specific symbols by analyzing the contextual and sequential relationships between characters within the input image. The processor 102-2 may apply attention-based sequence modeling techniques that learn positional dependencies between symbols in languages including, but not limited to, Devanagari, Arabic, and Chinese. Transformer architectures may be employed to process the scale-normalized features in parallel, attending to key regions within the input matrix based on relevance weights. These models may be trained on annotated handwriting datasets containing multiple scripts and stylistic variations, enabling robust multilingual performance. The output from this differentiation stage may identify script boundaries, special characters, or language-specific ligatures with high precision. The resulting classifications may be stored in the storage unit 106 or transmitted through the communication unit 108 for downstream text interpretation engines. The display interface 110 may visualize the attention distribution across strokes, offering interpretability of the language classification process.
[0031] FIG. 2 illustrates an exemplary representation of handwritten samples to be processed by the system, in accordance with an embodiment of the present disclosure.
[0032] Illustrated in FIG. 2 is an exemplary representation 200 of handwritten samples to be processed by the system 100.
[0033] In an embodiment of the present disclosure, the system 100 may receive an input image from the input image unit 104 and convert the image into a grayscale format to simplify processing and reduce computational overhead. A predefined block size, including, but not limited to, 2×2 pixels, may be specified to enable localized structural analysis. The image may be traversed block by block using the defined dimensions to facilitate spatial decomposition. For each block, the processor 102-2 may compute an average pixel intensity value, which may serve as a simplified representation of that region. These average values may be systematically stored in a matrix, thereby reducing resolution while preserving essential local features. The resulting matrix, alternatively referred to as a detail image or detailing matrix, may be stored in the storage unit 106 and may represent a compact version of the original image. This matrix may be used as input for downstream processes including, but not limited to, texture-based feature extraction, image compression, or pattern recognition.
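By way of non-limiting illustration, the detail-matrix construction may be sketched in Python as follows, assuming NumPy and the 2×2 block size of the example; cropping any partial border blocks is an illustrative simplification:

```python
# Sketch: traverse the grayscale image block by block and store each
# block's average pixel intensity, producing the detail matrix.
import numpy as np

def detail_matrix(gray: np.ndarray, block: int = 2) -> np.ndarray:
    h, w = gray.shape
    h, w = h - h % block, w - w % block               # crop to a block multiple
    tiles = gray[:h, :w].reshape(h // block, block, w // block, block)
    return tiles.mean(axis=(1, 3))                    # average intensity per block
```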
[0034] In an embodiment of the present disclosure, upon receiving the detail image, the system 100 may decompose the matrix into at least eight binary layers, each representing a distinct bit-plane from the 8-bit grayscale image. This decomposition may be performed using bit-plane slicing, which may isolate individual bits across all pixel values, resulting in binary images where each layer corresponds to a specific significance level ranging from the least significant bit (LSB) to the most significant bit (MSB). Each binary layer may then undergo a pixel-wise traversal, where pixels having a value of ‘1’ may be considered significant, potentially indicating embedded structural data within that bit-plane. These identified pixels may be passed through a morphological skeletonization algorithm to extract minimal yet topologically connected representations. Skeletonization may allow the system 100 to reduce binary objects to skeletal remnants while retaining their essential shape and connectivity. The spatial indices, including row and column positions, of skeletal pixels having a final value of ‘1’ may be recorded and stored in the storage unit 106 for each layer independently. For downstream analysis, layers 5, 6, and 7 may be selected based on their structural richness and contribution to discriminative feature encoding.
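A minimal sketch of the bit-plane slicing and skeletal-index recording follows, assuming scikit-image's skeletonize; the selection of layers 5, 6, and 7 follows the text, while the helper name and return format are illustrative:

```python
# Sketch: decompose the detail matrix into bit-planes, skeletonize each
# selected binary layer, and record the (row, col) indices of skeletal
# pixels having a final value of 1.
import numpy as np
from skimage.morphology import skeletonize

def skeletal_indices(detail: np.ndarray, layers=(5, 6, 7)) -> dict:
    """Return {bit_plane: (row, col) indices of skeletal pixels}."""
    detail = detail.astype(np.uint8)
    indices = {}
    for b in layers:
        plane = (detail >> b) & 1                     # bit-plane slicing
        skeleton = skeletonize(plane.astype(bool))    # morphological skeletonization
        indices[b] = np.argwhere(skeleton)            # spatial indices per layer
    return indices
```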
[0035] FIG. 3 illustrates an exemplary representation of an angle-rotation feature extraction stage of the system, in accordance with an embodiment of the present disclosure.
[0036] Illustrated in FIG. 3 is a representation 300 of an angle-rotation feature extraction stage of the system 100.
[0037] In an embodiment of the present disclosure, upon completion of the mining process, the system 100 may retrieve index information for all extracted layers. The system 100 may iterate through each of the available layers by initializing a loop from index 0 to 8, thereby covering a total of nine layers, including the input layer. For each layer during iteration, the system 100 may identify the primary reference coordinate by extracting the row and column indices corresponding to a specific location within the layer. Once the reference index is determined, a 3×3-pixel block may be extracted, centered on the identified coordinate. This reference block may serve as the baseline for comparative neighborhood analysis. Subsequently, the system 100 may locate and aggregate eight neighboring blocks of the same 3×3 dimensions, each corresponding to specific angular orientations relative to the reference block. These neighboring blocks may be determined based on both spatial proximity and angular displacement, forming a circular pattern around the reference.
[0038] In an example embodiment of the present disclosure, let there be a reference block bounded by coordinates (4,4), (4,6), (6,4), and (6,6). The neighboring blocks (NB) may be identified as follows:
[0039] NB1 at index (7,4), oriented at an angular displacement of 270° (i.e., 360° − 90°)
[0040] NB2 at index (6,2), angle 225° (i.e., 360° − 135°)
[0041] NB3 at index (4,1), angle 180°
[0042] NB4 at index (2,2), angle 135°
[0043] NB5 at index (1,4), angle 90°
[0044] NB6 at index (2,6), angle 45°
[0045] NB7 at index (4,7), angle 0°
[0046] NB8 at index (6,6), angle 315° (i.e., 360° − 45°)
[0047] In an embodiment of the present disclosure, each of these surrounding 3×3 blocks may be systematically collected. Following the aggregation of the reference and neighboring blocks, the system 100 may initiate a feature extraction process focused specifically on textural characteristics within the collected regions. This extraction may be uniformly applied across all layers, thereby enabling comprehensive and layer-wise textural feature analysis for subsequent processing tasks, such as classification, segmentation, or pattern recognition. The same procedure may be performed on all layers of the mining-processed images.
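For illustration, locating the eight angular neighbors around a reference block centered at (r, c) may be sketched as below, assuming regular 45° displacements of one block width; the worked example above uses slightly irregular offsets, so this is a regularized approximation rather than the exact indexing of the disclosure:

```python
# Sketch: yield the eight neighboring 3x3 blocks arranged circularly
# around a reference block center, one per 45-degree orientation.
import numpy as np

ANGLES = (0, 45, 90, 135, 180, 225, 270, 315)

def neighbor_blocks(layer: np.ndarray, r: int, c: int, step: int = 3):
    """Yield (angle, 3x3 block) for each in-bounds angular neighbor of (r, c)."""
    for angle in ANGLES:
        rad = np.deg2rad(angle)
        nr = int(round(r - step * np.sin(rad)))       # rows grow downward
        nc = int(round(c + step * np.cos(rad)))
        if 1 <= nr < layer.shape[0] - 1 and 1 <= nc < layer.shape[1] - 1:
            yield angle, layer[nr - 1:nr + 2, nc - 1:nc + 2]
```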
[0048] Input: Let D ∈ ℝ^(H×W) be a grayscale image matrix (2D array), with a 3×3 window size, a sliding step of 1, and a directional multiplier constant f = 8.
[0049] Sliding Window Operation: For each sliding window W_{r,c} = D[r : r+3, c : c+3], where r ∈ [0, H−3] and c ∈ [0, W−3], a scalar rotation-based feature F_{r,c} may be computed using a custom transformation.
[0050] Final Feature Vector Construction: Let C_r = {F_{r,c1}, F_{r,c2}, …} be the set of nonzero feature values in row r. The value V_r stored in img_feat for row r may then be obtained by aggregating the values in C_r.
[0051] Output: img_feat = {V_0, V_1, …, V_{H−3}}. This is the final 1D feature vector extracted from the image based on the rotated-angle matrices of each 3×3 patch.
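A runnable sketch of this sliding-window stage follows. Because the custom transformation producing F_{r,c} is not specified in the text, a placeholder (the window sum scaled by the directional multiplier f) stands in for it, and summation is assumed for aggregating each row's nonzero features into V_r; both are illustrative assumptions:

```python
# Sketch of the sliding-window feature extraction: one 3x3 window per
# (r, c), step 1, nonzero features of each row aggregated into V_r.
import numpy as np

def img_feat(D: np.ndarray, f: int = 8) -> list:
    H, W = D.shape
    feats = []
    for r in range(H - 2):                            # r in [0, H-3]
        C_r = []
        for c in range(W - 2):                        # c in [0, W-3]
            window = D[r:r + 3, c:c + 3]              # 3x3 sliding window
            F_rc = f * float(window.sum())            # placeholder transformation
            if F_rc != 0.0:                           # keep nonzero features only
                C_r.append(F_rc)
        feats.append(sum(C_r))                        # V_r: assumed aggregation
    return feats                                      # {V_0, ..., V_{H-3}}
```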
Image Layer Feature matrix
5 [337.09636, 148.84164, 319.45104000000003, 306.25156, 174.38976, 347.62816, 213.81376000000006, 347.62816, 257.45476, 225.05536, 219.96099999999996, 362.88576, 197.84704000000005, 231.36100000000005, 337.09636, 359.04064, 231.36100000000005, 326.04100000000005, 310.91776000000004, 466.76223999999996, 211.23216, 303.16036, 225.05536, 347.62816, 337.09636, 326.04100000000005, 347.62816, 200.52483999999995, 216.22500000000002, 321.26224]
6 [164.34916, 63.00099999999999, 42.025, 125.88304000000001, 171.89315999999997, 95.23396, 49.28399999999999, 95.23396, 114.921, 205.20899999999995, 116.14464, 196.95844000000005, 307.80304, 201.24196000000006, 213.81376000000006, 297.24304, 231.36100000000005, 213.81376000000006, 347.62816, 231.36100000000005, 238.33923999999996, 231.36100000000005, 466.489, 467.30896, 359.04064, 205.20899999999995, 466.489, 213.81376000000006, 210.68099999999998, 83.86815999999999]
7 [101.25124, 100.74276000000003, 40.0, 179.43696000000006, 201.24196000000006, 95.23396, 179.43696000000006, 236.19599999999994, 113.56900000000002, 347.62816, 201.24196000000006, 179.43696000000006, 231.36100000000005, 142.43076, 245.61935999999997, 231.36100000000005, 297.24304, 321.26224, 347.62816, 347.62816, 359.04064, 347.62816, 347.62816, 359.04064, 359.04064, 359.04064, 347.62816, 231.36100000000005, 310.47184000000004, 307.80304]
Image Layer Normalized Feature matrix
5 [1.18186285, 0.52184012, 1.1199982, 1.07372071, 0.61141206, 1.21878744, 0.74963296, 1.21878744, 0.90263869, 0.78904611, 0.77118523, 1.27228072, 0.69365349, 0.81115374, 1.18186285, 1.2587997, 0.81115374, 1.14310266, 1.09008045, 1.63647259, 0.74058184, 1.06288294, 0.78904611, 1.21878744, 1.18186285, 1.14310266, 1.21878744, 0.70304188, 0.75808678, 1.12634829]
6 [0.76715005, 0.29407647, 0.19616456, 0.58759765, 0.80236398, 0.44453368, 0.23004817, 0.44453368, 0.53642897, 0.95787587, 0.54214069, 0.91936386, 1.43676498, 0.9393585, 0.99804123, 1.38747295, 1.07994834, 0.99804123, 1.62266093, 1.07994834, 1.11252142, 1.07994834, 2.17748032, 2.18130773, 1.67593218, 0.95787587, 2.17748032, 0.99804123, 0.98341811, 0.39148033]
7 [0.40670594, 0.40466347, 0.16067198, 0.7207623, 0.80834862, 0.38253573, 0.7207623, 0.94875199, 0.45618391, 1.39635264, 0.80834862, 0.7207623, 0.92933076, 0.57211581, 0.98660374, 0.92933076, 1.19396571, 1.29044603, 1.39635264, 1.39635264, 1.44219429, 1.39635264, 1.39635264, 1.44219429, 1.44219429, 1.44219429, 1.39635264, 0.92933076, 1.24710315, 1.23638312]
[0052] In an embodiment of the present disclosure, each basic unit in the image processed by the system 100 may be a 3×3 matrix block used for localized feature extraction. For each red-marked central 3×3 block, the system 100 may search the eight immediate surrounding 3×3 blocks positioned in fixed directions, including top, top-right, right, bottom-right, bottom, bottom-left, left, and top-left. During this traversal, if the extracted feature from a neighboring block, including, but not limited to, edge direction or gradient magnitude, is non-zero, the system 100 may proceed to evaluate the next directional block. If a neighboring block produces a zero feature response, indicating the absence of directional energy or structural cues, the search in that direction may be terminated to suppress noise and retain relevant information. This selective traversal mechanism may help isolate dominant local features by discarding irrelevant orientations. The circular arrangement of the surrounding blocks may facilitate the capture of rotational patterns and angular transitions formed across the spatial neighborhood. The system 100 may use this structure to extract rotation-aware spatial context, enhancing the robustness of the feature representation for classification or recognition tasks.
[0053] FIG. 4 illustrates an exemplary representation of an angle-rotation functional logic of the system, in accordance with an embodiment of the present disclosure.
[0054] Illustrated in FIG. 4 is a representation 400 of an angle-rotation functional logic of the system 100.
[0055] In an embodiment of the present disclosure, the system 100 may initiate the feature extraction process by identifying 3×3 blocks that may serve as anchor points for localized analysis. Around each central block, eight surrounding 3×3 blocks positioned in predefined directions, including N, NE, E, SE, S, SW, W, and NW, may be evaluated to form a spatial neighborhood. Directional traversal from the center block to its neighbors may be indicated by arrows representing the paths along which angular or rotational features may be extracted. Dashed circular boundaries may define the search region, encapsulating the central 3×3 block and its neighbors as a 3×3 meta-grid analysis zone. For each neighboring block, the system 100 may compute directional or angle-based features, and if a non-zero feature is detected, the search may proceed to the next block; if a zero feature is encountered, the traversal in that direction may be terminated to reduce noise. The valid directional responses collected during this traversal may be compiled into a rotation-aware feature vector. This process may be repeated iteratively across all anchor blocks throughout the image, allowing the system 100 to perform dense, localized rotation-invariant feature extraction.
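A minimal sketch of this selective traversal follows; block_feature is a placeholder for the directional or angle-based measure actually computed by the system 100, and evaluating a single block per direction is an illustrative simplification of the search:

```python
# Sketch: evaluate the eight fixed directions around an anchor block,
# dropping any direction whose feature response is zero, and compile
# the surviving (angle, feature) pairs into a rotation-aware vector.
import numpy as np

DIRS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1),
        180: (0, -1), 225: (1, -1), 270: (1, 0), 315: (1, 1)}

def block_feature(block: np.ndarray) -> float:
    """Placeholder directional measure: total horizontal variation."""
    return float(np.abs(np.diff(block, axis=1)).sum())

def rotation_aware_vector(layer: np.ndarray, r: int, c: int, step: int = 3):
    features = []
    for angle, (dr, dc) in DIRS.items():
        nr, nc = r + dr * step, c + dc * step
        if not (1 <= nr < layer.shape[0] - 1 and 1 <= nc < layer.shape[1] - 1):
            continue                                  # neighbor outside the image
        f = block_feature(layer[nr - 1:nr + 2, nc - 1:nc + 2])
        if f == 0.0:
            continue                                  # zero response: terminate direction
        features.append((angle, f))                   # keep feature with its angle
    return features
```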
[0056] FIG. 5 illustrates an exemplary flowchart representation of an angle-rotation feature extraction stage of the system using 3×3 blocks, in accordance with an embodiment of the present disclosure.
[0057] Illustrated in FIG. 5 is a flowchart representation 500 of an angle-rotation feature extraction stage of the system 100 using 3×3 blocks.
[0058] In an embodiment of the present disclosure, angular feature extraction may begin when the input image matrix is received by the system 100, potentially via the input image unit 104. A central 3×3 block, referred to as the red block, may be selected as the anchor for feature analysis. Around this anchor, the system 100 may initialize a rotational search in eight directions: top, top-right, right, bottom-right, bottom, bottom-left, left, and top-left. This search mechanism may allow the processor 102-2 to explore localized neighborhoods around the central block, thereby capturing angular and directional relationships. For each of the eight neighboring 3×3 blocks, the system 100 may attempt to extract a directional texture feature based on pixel orientation, edge gradient, or spatial variation. If a non-zero feature is extracted from a neighboring block, the system 100 may store the corresponding feature value along with its associated angular direction. This information may then contribute to forming a spatially structured representation of the neighborhood.
[0059] Conversely, if a neighboring block produces a feature value equal to zero, the system 100 may interpret this as an absence of meaningful directional content. In such cases, the processor 102-2 may terminate the traversal in that particular direction, thereby suppressing noise and improving computational efficiency. This approach may allow the system 100 to focus on meaningful regions and avoid unnecessary evaluation of homogeneous or low-informational zones. Once all directions around the central block have been evaluated, the system 100 may repeat the same process for additional anchor points across the image. As this process is repeated for every relevant 3×3 block, the processor may compile a complete set of angle-rotation features. These features may be aggregated into a compact, rotation-invariant vector representation. This vector may be normalized and then used as input for machine learning models, including, but not limited to, convolutional neural networks (CNNs) or transformer-based classifiers. The generated feature vector may be stored in the storage unit 106 for further use, transmitted to external systems via the communication unit 108, or displayed through the display interface 110 for analysis and interpretation. The system 100 may thus efficiently capture spatial and angular context by using block-based rotation-aware traversal and selective feature aggregation.
[0060] FIG. 6 illustrates an exemplary flowchart representation of a method of handwritten text analysis, in accordance with an embodiment of the present disclosure.
[0061] Illustrated in FIG. 6 is a representation of a method 600 of handwritten text analysis. The method 600 may begin with generating 602, by the processor 102-2, a matrix by segmenting an input image into a plurality of blocks. The method 600 may proceed with extracting 604, by the processor 102-2, one or more skeletal features from one or more bit-plane layers of the matrix by decomposing the matrix into a plurality of binary layers based on bit-plane slicing of pixel intensity values. The method 600 may proceed with identifying 606, by the processor 102-2, a reference 3×3 block centered at each skeletal pixel obtained from the one or more extracted skeletal features. The method 600 may proceed with locating 608, by the processor 102-2, at least eight neighboring 3×3 blocks positioned at predefined angular directions around the identified reference 3×3 block. The method 600 may proceed with extracting 610, by the processor 102-2, one or more rotation-invariant features by computing one or more directional texture features based on any or a combination of the reference 3×3 block and the at least eight neighboring 3×3 blocks. The method 600 may end with generating 612, by the processor 102-2, a scale-independent feature representation of the input image by normalization of the one or more extracted rotation-invariant features.
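By way of non-limiting illustration, the steps of the method 600 may be composed end to end as sketched below, reusing the hypothetical helpers introduced earlier (detail_matrix, skeletal_indices, neighbor_blocks, block_texture, min_max_scale); only the control flow is being illustrated, not an exact production pipeline:

```python
# Sketch: end-to-end composition of steps 602-612 from hypothetical
# helpers defined in the earlier sketches.
import numpy as np

def method_600(gray: np.ndarray) -> np.ndarray:
    detail = detail_matrix(gray, block=2)                   # generating (602)
    planes = skeletal_indices(detail, layers=(5, 6, 7))     # extracting (604)
    raw = []
    for b, pixels in planes.items():
        plane = ((detail.astype(np.uint8) >> b) & 1).astype(float)
        for r, c in pixels:                                 # identifying (606)
            for angle, nb in neighbor_blocks(plane, r, c):  # locating (608)
                magnitude, _ = block_texture(nb)            # extracting (610)
                if magnitude != 0.0:
                    raw.append(magnitude)
    v = np.asarray(raw, dtype=float)
    if v.size == 0:
        return v
    return min_max_scale(v, v.min(), v.max())               # generating (612)
```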
[0062] In an embodiment of the present disclosure, the system 100 may begin operation by receiving a handwritten document image through the input image unit 104, which may capture or load a grayscale representation of the text. This grayscale image may be divided into smaller regions using a segmentation process that generates a matrix of fixed-size blocks, allowing localized structural information to be preserved. The processor 102-2 may then compute average pixel intensity values within each block to generate a detail matrix that simplifies the input while retaining essential features. This matrix may be stored in the storage unit 106 and used as the foundation for further analysis. The system 100 may perform bit-plane slicing on the detail matrix to decompose the image into a series of binary layers, each representing a different level of pixel intensity significance. From each binary layer, the system 100 may extract skeletal features using morphological skeletonization techniques to reduce visual content to its topological core. Each skeletal point may be indexed by its row and column position and stored independently in the storage unit 106 for downstream angular analysis.
[0063] In an embodiment of the present disclosure, following skeleton extraction, the system 100 may select a 3×3 reference block centered on each skeletal pixel and initialize an angular search across eight neighboring 3×3 blocks arranged radially around the center. Each direction may be explored to determine whether a directional or texture-based feature exists. If the extracted feature from a neighbor block is non-zero, the system 100 may store that feature along with its angular position. In contrast, if the feature is zero, the system 100 may halt further traversal in that direction to reduce computational overhead and suppress noise. This block-wise traversal may be repeated across all relevant skeletal anchor points in the image. Directional features collected from surrounding blocks may then be compiled into a rotation-invariant feature vector that represents the spatial structure of the handwritten content. This feature vector may be normalized to ensure scale independence, enabling the processor to generalize across varying handwriting sizes and styles. The normalized vectors may be used as input to machine learning models, including, but not limited to, convolutional neural networks and transformer-based classifiers trained on handwritten character datasets.
[0064] In an embodiment of the present disclosure, final classification results may be saved in the storage unit 106 or transmitted to external servers or clients via the communication unit 108 for broader analysis. The system 100 may also display intermediate outputs including, but not limited to, skeleton maps, angle-direction overlays, or predicted class labels using the display interface 110. In certain implementations, the display interface 110 may also show attention maps or confidence scores associated with classification outputs to support human-in-the-loop verification. The system 100 may continue to iterate over multiple image inputs, enabling batch processing or real-time inference depending on application requirements. Feature vectors and classification outputs may be archived in structured form in the storage unit 106 to support future retrieval, analytics, or model retraining. In addition to text recognition, the extracted features may support downstream applications including, but not limited to, writer verification, symbol detection, or multilingual script segmentation. The system 100 may adapt the traversal logic based on learned feature importance or environmental conditions to optimize performance. Through this method of operation, the system 100 may achieve robust, rotation- and scale-invariant feature extraction for handwritten image analysis.
[0065] A use case of the system 100 is described herein. The system 100 may be deployed in a postal automation center to read handwritten addresses on mail parcels. The input image unit 104 may scan each envelope and generate a grayscale representation of the handwritten address region. This image may be segmented into localized blocks for structural decomposition, and the resulting detail matrix may be stored in the storage unit 106 for processing. The processor 102-2 may extract skeletal and rotation-invariant features from the scanned handwriting, enabling classification of characters regardless of slant or orientation. These features may be passed through trained convolutional neural network models to identify alphanumeric characters and language-specific postal codes. The final output may include the recognized address string, which may be transmitted via the communication unit 108 to a central logistics server for routing and tracking. The display interface 110 may present the recognized address and model confidence scores to a human operator for optional verification. Any ambiguous results may be highlighted visually to facilitate manual correction. The storage unit 106 may archive the input images, feature vectors, and classification logs for audit and compliance.
ADVANTAGES OF THE INVENTION
[0066] The present disclosure provides a system that enables robust handwritten text analysis across variations in scale and orientation.
[0067] The present disclosure provides a system that enhances feature extraction accuracy using bit-plane slicing and angular neighborhood analysis.
Claims:
1. A system (100) for handwritten text analysis, the system (100) comprising:
a processor (102-2); and
a memory (102-4) coupled to the processor (102-2), wherein the memory (102-4) comprises processor-executable instructions, which, on execution, cause the processor (102-2) to:
generate a matrix by segmenting an input image into a plurality of blocks, the input image being a grayscale representation of handwritten text;
extract one or more skeletal features from one or more bit-plane layers of the matrix by decomposing the matrix into a plurality of binary layers based on bit-plane slicing of pixel intensity values;
identify a reference 3×3 block centered at each skeletal pixel obtained from the one or more extracted skeletal features;
locate at least eight neighboring 3×3 blocks positioned at predefined angular directions around the identified reference 3×3 block;
extract one or more rotation-invariant features by computing one or more directional texture features based on any or a combination of the reference 3×3 block and the at least eight neighboring 3×3 blocks; and
generate a scale-independent feature representation of the input image by normalization of the one or more extracted rotation-invariant features.
2. The system (100) as claimed in claim 1, wherein the processor (102-2) performs handwritten character classification into any or a combination of letters, digits, symbols, or language-specific scripts by implementing Convolutional Neural Network (CNN) models on the scale-independent feature representation of the input image.
3. The system (100) as claimed in claim 1, wherein the processor (102-2) executes multiscale bit-plane feature extraction by implementing any or a combination of bit-level decomposition techniques and binary morphological models trained on fine-grained pixel intensity variations.
4. The system (100) as claimed in claim 1, wherein the processor (102-2) performs skeletal abstraction of bit-plane data by implementing any or a combination of morphological skeletonization techniques and shape-preserving thinning techniques trained on binary representations of handwritten strokes.
5. The system (100) as claimed in claim 1, wherein the processor (102-2) executes rotational neighborhood modeling by implementing any or a combination of angular window traversal techniques and orientation-mapping models trained on spatially transformed handwritten samples.
6. The system (100) as claimed in claim 1, wherein the processor (102-2) performs texture-aware feature encoding by implementing any or a combination of directional gradient computation techniques and 3×3 patch-based CNN kernels trained on localized handwriting patches.
7. The system (100) as claimed in claim 1, wherein the processor (102-2) performs feature vector generation by implementing angle-rotation aggregation techniques and vector embedding models trained on invariant spatial features.
8. The system (100) as claimed in claim 1, wherein the processor (102-2) executes scale normalization of extracted features by implementing any or a combination of min-max and Z-score normalization techniques and statistical transformation functions trained on diverse handwritten character corpora.
9. The system (100) as claimed in claim 1, wherein the processor (102-2) performs language-specific symbol differentiation by implementing any or a combination of attention-based sequence modeling techniques and transformer architectures trained on annotated cursive and printed text samples.
10. A method (600) for handwritten text analysis, the method (600) comprising steps of:
generating (602), by a processor (102-2), a matrix by segmenting an input image into a plurality of blocks, the input image being a grayscale representation of handwritten text;
extracting (604), by the processor (102-2), one or more skeletal features from one or more bit-plane layers of the matrix by decomposing the matrix into a plurality of binary layers based on bit-plane slicing of pixel intensity values;
identifying (606), by the processor (102-2), a reference 3×3 block centered at each skeletal pixel obtained from the one or more extracted skeletal features;
locating (608), by the processor (102-2), at least eight neighboring 3×3 blocks positioned at predefined angular directions around the identified reference 3×3 block;
extracting (610), by the processor (102-2), one or more rotation-invariant features by computing one or more directional texture features based on any or a combination of the reference 3×3 block and the at least eight neighboring 3×3 blocks; and
generating (612), by the processor (102-2), a scale-independent feature representation of the input image by normalization of the one or more extracted rotation-invariant features.