Abstract: The present subject matter discloses a device, system and method for providing vehicle surveillance. The device may be communicatively coupled with a video capturing unit for capturing a real time video data stream of vehicles. The video data stream may comprise a plurality of images. The plurality of images captured is processed by concurrently detecting a region of interest from the plurality of images, recognizing alphanumeric characters from the region of interest, and tracking the license number of each vehicle. The region of interest and the alphanumeric characters indicate the license number plate and the license number of the vehicle respectively. Further, metadata may be extracted from the plurality of images. The metadata extracted may be further transmitted to a server configured to provide a user-interface for facilitating vehicle surveillance to a user. Further, the server is configured for executing a server application enabling the user to log in and thereafter provide vehicle and driver details.
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
DEVICE, SYSTEM AND METHOD FOR PROVIDING VEHICLE SURVEILLANCE
APPLICANT:
Konnet Vian Private Limited
A company Incorporated in India under The Companies Act, 1956
Having address:
102, Shiv Shakti Complex
Baner Road, Opposite Food Bazar,
Pune 411045, Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to be performed.
CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY
[001] The present application claims priority to Indian Provisional Patent Application No. 2520/MUM/2014, filed on August 05th, 2014, the entirety of which is hereby incorporated by reference.
TECHNICAL FIELD
[002] The present subject matter described herein, in general, relates to vehicle surveillance and, more particularly, to a device, a system and a method for recognizing a license number of a vehicle.
BACKGROUND
[003] Vehicle surveillance is a major concern in today's traffic monitoring systems. In general, vehicle surveillance is performed through a set of cameras installed on roads. These cameras are capable of capturing images or videos of vehicles passing on the road. The images or videos captured are further processed/analyzed through different image/video processing techniques for recognizing a particular object from the captured video. For example, the particular object may be the license number of the vehicle which is to be recognized from the image captured.
[004] Some of the techniques available for recognizing the license number of vehicles are the Automatic License Plate Recognition (ALPR) or Automatic Number Plate Recognition (ANPR) techniques. The ALPR or ANPR techniques are being used by various companies and organizations as a means for identifying and recognizing vehicles. These techniques are also used to assist law enforcement departments in red light camera monitoring, speed camera monitoring, automatic identification of stolen vehicles, and other traffic monitoring applications.
[005] These law enforcement departments use the license number of a vehicle to identify the owner of the vehicle for taking appropriate legal action as and when required. Unfortunately, the images of the vehicles captured by the cameras sometimes become blurred and scattered due to naturally occurring factors such as weather, wind, visibility, time of day and the like. Thus, it becomes difficult to exactly recognize the license numbers from the captured images of the vehicles. Another factor which hinders recognizing the license number from the images of the vehicle is the font used or the writing style of the characters present on the license plate of the vehicle. Further, a trailer hitch, license plate cover, or other similar added component may also block or obscure an image of the license plate. Thus, identifying the license number accurately from the captured image becomes a challenge in the vehicle surveillance system.
[006] Further, the captured images/videos of the vehicles are transmitted in a continuous fashion as a video stream, through a network, to a centralized server for performing video analytics. Due to the continuous transmission of the videos, i.e., real time high resolution video streaming, network issues are prevalent. Also, there is a requirement for high configuration servers to which the videos shall be streamed and on which they shall be processed. Thus, vehicle surveillance becomes a challenging task with the above mentioned limitations.
SUMMARY
[007] This summary is provided to introduce aspects related to devices, systems and methods for providing vehicle surveillance, and the concepts are further described below in the detailed description. This summary is not intended to identify essential features of the claimed subject matter, nor is it intended for use in determining or limiting the scope of the claimed subject matter.
[008] In one implementation, a vehicle surveillance device for facilitating vehicle surveillance is disclosed. The device comprises a processor and a memory coupled to the processor. The processor executes a plurality of modules stored in the memory. The plurality of modules comprises a receiving module, a processing module, an extracting module, and a transmitting module. The receiving module may receive a real time video data stream captured by a video capturing unit. Further, the video data stream may comprise a plurality of images corresponding to vehicles. Further, the processing module may process the plurality of images by concurrently performing detecting, recognizing, and tracking. The processing module may detect a region of interest from the plurality of images. The region of interest may indicate a license number plate of a vehicle. The processing module may further recognize alphanumeric characters from the region of interest. Further, the alphanumeric characters may indicate a license number of the vehicle. Further, the processing module may track the license number of each vehicle in order to count number of vehicles crossing a predefined line in a vehicle surveillance area. Further, the extracting module may extract a metadata from the plurality of images. The metadata extracted may comprise the license number of the vehicle, an image of the vehicle, and timestamp and location details associated with the vehicle. Further, the transmitting module may transmit the metadata to a server configured to provide a user-interface for facilitating vehicle surveillance to a user.
[009] In another implementation, a method for providing vehicle surveillance is disclosed. The method may comprise a step of receiving, by a processor, a real time video data stream captured by a video capturing unit. Further, the video data stream may comprise a plurality of images corresponding to vehicles. The method may further comprise processing, by the processor, the plurality of images by concurrently performing the steps of detecting, recognizing, and tracking. Further, the step of detecting is performed to detect a region of interest from the plurality of images. The region of interest may indicate a license number plate of a vehicle. Further, the step of recognizing is performed to recognize alphanumeric characters from the region of interest. The alphanumeric characters may indicate a license number of the vehicle. Further, the step of tracking is performed to track the license number of each vehicle in order to count number of vehicles crossing a predefined line in a vehicle surveillance area. The method may further comprise extracting, by the processor, a metadata from the plurality of images. The metadata comprises the license number of the vehicle, an image of the vehicle, and timestamp and location details associated with the vehicle. The method may further comprise transmitting, by the processor, the metadata to a server configured to provide a user-interface for facilitating vehicle surveillance to a user.
[0010] Yet in another implementation a non-transitory computer readable medium embodying a program executable in a computing device for providing vehicle surveillance is disclosed. The program may comprise a program code for receiving a real time video data stream captured by a video capturing unit. Further, the video data stream may comprise a plurality of images corresponding to vehicles. The program may further comprise a program code for processing the plurality of images by concurrently performing the steps of detecting, recognizing, and tracking. Further, the step of detecting is performed to detect a region of interest from the plurality of images. The region of interest may indicate a license number plate of a vehicle. Further, the step of recognizing is performed to recognize alphanumeric characters from the region of interest. The alphanumeric characters may indicate a license number of the vehicle. Further, the step of tracking is performed to track the license number of each vehicle in order to count number of vehicles crossing a predefined line in a vehicle surveillance area. The program may further comprise a program code for extracting a metadata from the plurality of images. Further, the metadata may comprise the license number of the vehicle, an image of the vehicle, and timestamp and location details associated with the vehicle. Further, the program may comprise a program code for transmitting the metadata to a server configured to provide a user-interface for facilitating vehicle surveillance to a user.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to refer to like features and components.
[0012] Figure 1 illustrates a network implementation of a device and a server for providing vehicle surveillance, in accordance with an embodiment of the present subject matter.
[0013] Figure 2 illustrates the device, in accordance with an embodiment of the present disclosure.
[0014] Figure 3 illustrates an ANX application (3A) and a Security Information Management (SIM) server application (3B) implemented on the device and the server respectively, in accordance with an embodiment of the present subject matter.
[0015] Figure 4 illustrates a working of the server 104, in accordance with an embodiment of the present subject matter.
[0016] Figure 5 illustrates a method for providing vehicle surveillance, in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION
[0017] Devices, systems, and methods for providing vehicle surveillance are described. In general, physical security using closed-circuit television (CCTV) camera systems has improved in recent years. The advancements in computing technologies have enabled reliable and robust video analytic applications that can detect and recognize events viewed in streaming video. The present subject matter discloses the device, the system, and the method for enabling distributed computing in large city traffic surveillance projects, by performing video analytic computations. Further, the video analytic computations are performed by the device termed as an "ANX device" or "Automatic Number Plate Recognition Device". According to embodiments of the present disclosure, the ANX device may be connected to a video capturing unit, for example, a camera. Further, the camera may be an analog or an Internet Protocol (IP) camera installed at the road-side for capturing a real time video data stream comprising a plurality of images corresponding to vehicles passing through the road-side. According to embodiments of the present disclosure, any type of IP camera or analog camera may be used for a low light, high speed solution. Further, the ANX device analyses the real time video data stream for the vehicles passing by in order to detect and extract the license number from the license number plate associated with the vehicle, along with a time stamp and location data. Further, a plurality of images may be extracted from the real time video data stream of the vehicles. Each extracted image, of size 704x480, may be processed by the ANX device within a span of 1/20th of a second. Such a computing speed of the ANX device enables processing of the plurality of images of different sizes. The information extracted is further sent from the ANX device to a server. The server further provides the information extracted to different client browser applications. Thus, the present disclosure provides an end-to-end surveillance solution that enables traffic monitoring or traffic management. The present disclosure also provides a vehicle count that may be used to identify heavy traffic zones and be used for effective traffic management.
[0018] Referring to Figure 1, a network implementation 100 of a device 102 and a server 104 for providing vehicle surveillance is illustrated, in accordance with an embodiment of the present subject matter. In one embodiment, the device 102 facilitates recognition of license number associated with a vehicle 110.
[0019] Although the present subject matter is explained considering that the server 104 is implemented as a computing system, it may be understood that the server 104 may also be implemented as a variety of computing systems, such as a laptop computer, a desktop computer, a workstation, a mainframe computer, a server, and the like. In one implementation, the server 104 may be implemented in a cloud-based environment. According to an embodiment, the ANX device 102 may be installed at the road-side. Further, the device 102 may be coupled with a video capturing unit or a camera 108 focused for capturing a real time video data stream comprising a plurality of images of vehicle(s) 110 passing through the road-side. Further, the device 102 is configured for performing video analytics upon the plurality of images captured by the camera 108. Further, the device 102 is communicatively coupled with the server 104 through a network 106.
[0020] It will be understood that the server 104 may be accessed by multiple users through one or more user devices 112-1, 112-2…112-N, collectively referred to as user 112 hereinafter, or applications residing on the user devices 112. Examples of the user devices 112 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation. The user devices 112 are communicatively coupled to the server 104 through the network 106.
[0021] In one implementation, the network 106 may be a wireless network, a wired network or a combination thereof. The network 106 can be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, and the like. The network 106 may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further the network 106 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
[0022] Referring now to Figure 2, the device 102 is illustrated in accordance with an embodiment of the present disclosure. In one embodiment, the device 102 may include at least one processor 202, an input/output (I/O) interface 204, and a memory 206. The at least one processor 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the at least one processor 202 is configured to fetch and execute computer-readable instructions or modules stored in the memory 206. According to another embodiment, the server 104 may also include at least one processor, an input/output (I/O) interface, and a memory.
[0023] The I/O interface 204 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface may allow the device 102 to interact with a user directly or through the client devices 112. Further, the I/O interface 204 may enable the device 102 to communicate with other computing devices, such as web servers and external data servers (not shown). The I/O interface 204 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. The I/O interface may include one or more ports for connecting a number of devices to one another or to another server.
[0024] The memory 206 may include any computer-readable medium or computer program product known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks and SD cards. The memory 206 may include plurality of modules 208 and data 220.
[0025] The modules 208 include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types. In one implementation, the modules 208 may include a receiving module 210, a processing module 212, an extracting module 214, a transmitting module 216, and other modules 218. The other modules 218 may include programs or coded instructions that supplement applications and functions of the device 102.
[0026] The data 220, amongst other things, serves as a repository for storing data processed, received, and generated by one or more of the modules. The data may also include an image database 222 and other data 224.
[0027] Referring now to Figure 3, which illustrates an ANX application (3A) and a Security Information Management (SIM) server application (3B) implemented on the device 102 and the server 104 respectively, in accordance with an embodiment of the present subject matter. According to embodiments of the present disclosure, the receiving module 210 of the device 102 may receive a real time video data stream captured by the video capturing unit 108 in a continuous fashion. The video capturing unit 108 may be an analog camera or an Internet Protocol (IP) camera installed at the road-side for capturing videos of the vehicle(s) passing through the road-side. Further, the real time video data stream comprises a plurality of images corresponding to the vehicles. In case the plurality of images is captured by an IP camera, the IP camera encodes the plurality of images and then transmits the encoded form of the plurality of images to the device 102 for further processing. Further, a Gstreamer software application is available on the device 102 to convert the real time video data stream into raw color video data. Further, the device 102 may perform specific computing tasks (i.e., a video analytics algorithm) on the plurality of images received from the video capturing unit 108 to extract information for recognizing the license number of the vehicles 110. Further, the plurality of images may be stored in the image database 222 of the device 102.
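By way of illustration only, the following is a minimal sketch of such an ingest step, written in Python and assuming an IP camera exposing an RTSP stream and an OpenCV build compiled with GStreamer support; the pipeline string, camera URL and frame handling are illustrative assumptions and do not form part of the specification.

```python
import cv2

# Hypothetical RTSP URL of the road-side IP camera; not part of the specification.
RTSP_URL = "rtsp://192.168.1.64/stream1"

# GStreamer pipeline that decodes the encoded stream and hands raw BGR frames to
# OpenCV, mirroring the "encoded stream -> raw colour video" conversion above.
pipeline = (
    f"rtspsrc location={RTSP_URL} latency=0 ! "
    "rtph264depay ! avdec_h264 ! videoconvert ! "
    "video/x-raw,format=BGR ! appsink drop=true"
)

cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
while cap.isOpened():
    ok, frame = cap.read()   # one raw colour image of the passing vehicles
    if not ok:
        break
    # "frame" would now be handed to the processing module (detect / recognise / track)
cap.release()
```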
[0028] According to embodiments of the present disclosure, the processing module 212 of the device 102 may be employed for performing video analytics operations concurrently on the plurality of images. Further, the processing module 212 may comprise an ANX controller module and an ANPR analytics module. According to embodiments, the device 102 may further comprise various other processor executable programmed modules, as shown in 3A of Figure 3. The various processor executable modules may be an ANX watchdog module, a network capture module, an HTTP server module, an AN decisioning module, a GSM/GPRS 3G module, and a hardware accelerator (HWA ENC/DEC). The device 102 also comprises local data stored in the memory 206 of the device 102. The processing module 212 may process the plurality of images by concurrently performing the steps of detecting a region of interest from the plurality of images, recognizing alphanumeric characters from the region of interest, and tracking the license number of each vehicle in order to count the number of vehicles crossing a predefined line in a vehicle surveillance area. Further, the region of interest indicates a license number plate of a vehicle 110. Also, the alphanumeric characters indicate a license number of the vehicle 110. Thus, by performing all of the above three steps concurrently, the processing module 212 reduces the processing time of the device 102. This way, the processing capability of the device may be increased for processing the plurality of images for recognizing the license number of the vehicle 110. A detailed explanation of the above three steps and of the various other processor executable programmed modules of the processing module 212 is provided in subsequent paragraphs of the specification.
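The specification does not prescribe how the three concurrent steps are scheduled; the sketch below merely illustrates one possible arrangement in which the detection/recognition and tracking stages run on separate worker threads fed by queues. All function names (detect_roi, recognize_plate, track_plate) and the sample licence number are hypothetical placeholders.

```python
import queue
import threading

def detect_roi(frame):
    """Placeholder for the plate-region detection described later in this specification."""
    return frame  # in a real pipeline this would be the cropped number-plate region

def recognize_plate(roi):
    """Placeholder for the OCR stage described later in this specification."""
    return "MH12AB1234"  # hypothetical licence number

def track_plate(text, counted):
    """Placeholder tracker: count each licence number once when it crosses the line."""
    counted.add(text)
    return len(counted)

frames, plates, counted = queue.Queue(maxsize=8), queue.Queue(maxsize=8), set()

def detect_and_recognize():
    # Stages 1 and 2: locate the number-plate ROI and read its characters.
    while True:
        frame = frames.get()
        plates.put(recognize_plate(detect_roi(frame)))
        frames.task_done()

def track_and_count():
    # Stage 3: track each recognised licence number and maintain the vehicle count.
    while True:
        track_plate(plates.get(), counted)
        plates.task_done()

threading.Thread(target=detect_and_recognize, daemon=True).start()
threading.Thread(target=track_and_count, daemon=True).start()
```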
[0029] After receiving the plurality of images from the video capturing unit 108, the ANX controller module may further transfer the plurality of images to the ANPR analytic module for further processing. The plurality of images may be received by the ANPR analytic module in frames. According to embodiments of the present disclosure, the region of interest (ROI) in the frame may be set by the user during ANPR configuration. Further, the ANPR analytic module uses the ROI set by the user to identify a text region from the region of interest (ROI) in the frame. Further, the ANPR analytic module converts the frame into gray scale, and edge detection is then performed on the ROI. The gray scale version of the frame is shown below.
[0030] Further, from the gray scale image, edge detection is performed to locate the text region. In one approach, a single window of size W*H in which the edge strength is maximum is located, and thereafter a kernel method is used on that window to get the candidate locations for the text region. In another approach, multiple windows of size W*H in which the edge strength is more than a threshold are located, and thereafter the kernel method is used on those windows. The above mentioned steps for locating the text region are explained below in detail.
[0031] For a given input image and ROI, text locating may be done using various processor executable steps executed by the ANPR analytic module. In a first step, edges may be determined in the X direction using the Sobel operator. In a next step, the Y locations are identified for which the edge sums are more than a threshold. Further, in the next step, for each of the candidate Y locations, a spanning window of size W*H may be used to find the sum of edges, as shown below.
[0032] Further, in the next step, the locations, i.e., the X and Y coordinates having maximum edge strength, are identified. Further, on the selected window of size W*H, a spanning window of size M*N may be used to identify the text region. This may be done by using kernels in the spanning window.
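Purely as an illustration of the two preceding paragraphs, the following Python/OpenCV sketch computes Sobel edges in the X direction, selects candidate Y locations whose edge sum exceeds a threshold, and spans a W*H window to find the location of maximum edge strength. The values of W, H, the row threshold and the scan stride are illustrative assumptions.

```python
import cv2
import numpy as np

def locate_candidate_window(gray_roi, W=200, H=50, row_threshold=1500):
    """Sketch of the edge-based locating steps: Sobel edges in X, candidate Y
    locations whose edge sum exceeds a threshold, then a W*H spanning window.
    W, H and row_threshold are illustrative values, not from the specification."""
    # Step 1: edge strength in the X direction using the Sobel operator.
    edges = np.abs(cv2.Sobel(gray_roi, cv2.CV_32F, 1, 0, ksize=3))

    # Step 2: candidate Y locations where the per-row edge sum exceeds a threshold.
    row_sums = edges.sum(axis=1)
    candidate_ys = np.where(row_sums > row_threshold)[0]

    # Step 3: for each candidate Y, span a W*H window and keep the one with the
    # maximum sum of edges (the likely text region).
    best, best_xy = -1.0, (0, 0)
    for y in candidate_ys:
        if y + H > edges.shape[0]:
            continue
        strip = edges[y:y + H]
        for x in range(0, edges.shape[1] - W + 1, 8):   # coarse 8-pixel stride
            s = strip[:, x:x + W].sum()
            if s > best:
                best, best_xy = s, (x, int(y))
    return best_xy, best

# Usage on a grayscale ROI image (hypothetical file name):
# gray = cv2.imread("roi.png", cv2.IMREAD_GRAYSCALE)
# (x, y), strength = locate_candidate_window(gray)
```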
[0033] Further, in the next step, kernels of size Kx*Ky are spanned in the window of size M*N to get the sum of edges in each kernel window. If the sum of edges is within a threshold, the kernel is taken as a valid kernel. The total number of valid kernels in a row and the total number of valid kernels in the M*N window are counted. In the next step, the M*N window which has the maximum number of valid kernels and has valid kernels in the center of the M*N window is selected as the text location. This is to have the Number Plate/text region (NP) in the center of the window, as shown below.
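A simplified sketch of this kernel-counting step is given below; it spans Kx*Ky kernels over candidate M*N sub-windows of the edge image and keeps the window with the most "valid" kernels. The centre-preference rule described above is omitted for brevity, and all sizes and thresholds are illustrative assumptions.

```python
import numpy as np

def count_valid_kernels(edge_window, M=160, N=40, Kx=8, Ky=8, lo=50.0, hi=5000.0):
    """Span Kx*Ky kernels over M*N sub-windows of the W*H edge window and count
    kernels whose edge sum lies within a threshold band (a 'valid kernel')."""
    best_count, best_pos = -1, (0, 0)
    rows, cols = edge_window.shape
    for y in range(0, rows - N + 1, Ky):
        for x in range(0, cols - M + 1, Kx):
            sub = edge_window[y:y + N, x:x + M]
            valid = 0
            for ky in range(0, N - Ky + 1, Ky):
                for kx in range(0, M - Kx + 1, Kx):
                    s = sub[ky:ky + Ky, kx:kx + Kx].sum()
                    if lo < s < hi:          # kernel taken as a valid kernel
                        valid += 1
            if valid > best_count:           # prefer the M*N window with most valid kernels
                best_count, best_pos = valid, (x, y)
    return best_pos, best_count
```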
[0034] In case the valid kernels are not found in the center of the M*N window, the maximum valid kernel count is used to select the text location. The above mentioned steps are explained for text locating using a single window only. The subsequent paragraphs explain text locating using multiple windows. In the first step of the text locating, the edges are determined in the X direction using the Sobel operator. In the next step, the Y locations are identified for which the edge sums are more than a threshold. For each of the candidate Y locations, a spanning window of size W*H is used to find the sum of edges, as shown below.
[0035] Further, in the next step, the locations, i.e., the X and Y coordinates which have maximum edge strength at each candidate Y location, are identified, as shown below.
[0036] In the next step, clusters of locations based on the X coordinates and the Y coordinates are identified as shown below. Further, chains in each cluster are identified using a chain algorithm.
[0037] Further, in the next step, the sum of edges (in the X direction) for the chains is determined. Based on the sum of edges, C cluster locations (W*H windows) are chosen. For each of the chosen clusters, a kernel based operation is performed to get two lists, i.e., L1 and L2. The first list L1 indicates candidate locations where the NP is in the center of the M*N window, and the second list L2 indicates candidate locations where the NP is towards the edges of the M*N window. Based on a preset value of the number of candidate locations per cluster (W), the candidate locations are chosen as shown below. Further, a total of C*W text location candidates are obtained.
[0038] Further, in the case of the multiple candidate selection, two parameters may be defined, i.e., the Number of clusters (C) and the Number of windows per cluster (W). Further, the Number of clusters defines the W*H windows to be selected based on the sum of edges (X direction), and the Number of windows per cluster defines the M*N text windows to be selected from each cluster. Further, for the given C cluster positions, two lists L1 and L2 are prepared for each cluster. L1 contains the locations where the valid kernels are above a threshold and the NP is in the center of the M*N window. L2 contains the locations where the valid kernels are above a threshold and the NP is not in the center of the M*N window. Both the lists (L1 and L2) are sorted based on the total valid kernels in descending order. Further, for each cluster, W candidates are picked from the list L1. If sufficient candidates are not available in the list L1, the list L2 is referred to in order to get the remaining candidates. This way, the candidate with the NP in the center will get preference. Once C*W candidates have been obtained, an overlap between the candidates is checked. If the overlap is more than a certain threshold, one of the candidates can be removed.
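The specification only says that overlapping candidates above "a certain threshold" are removed; the sketch below assumes an intersection-over-union style overlap measure and an illustrative 0.5 threshold for pruning the C*W candidate windows.

```python
def overlap_ratio(boxA, boxB):
    """Intersection-over-union style overlap between two candidate text windows,
    each given as (x, y, w, h)."""
    ax, ay, aw, ah = boxA
    bx, by, bw, bh = boxB
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def drop_overlapping(candidates, threshold=0.5):
    """Keep the first of any pair of candidates whose overlap exceeds the threshold
    (0.5 is an illustrative value; the specification only says 'a certain threshold')."""
    kept = []
    for cand in candidates:
        if all(overlap_ratio(cand, k) <= threshold for k in kept):
            kept.append(cand)
    return kept
```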
[0039] After the text location is identified using the edge image and the angle of rotation, the ANPR analytics module performs the segmentation of the text region in the following processor executable steps. In the first step, the coordinates of the text location are mapped onto the grayscale ROI image, as shown below.
[0040] Further, the text location is binarized as shown below.
[0041] Further, the next step is performed for Connected Components based cleaning of the text region, wherein the image after performing the Connected Components based cleaning is shown below.
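A minimal sketch of the binarization and Connected Components based cleaning is given below; the specification does not name a particular thresholding method, so Otsu thresholding is assumed, and the blob-area limits are illustrative values only.

```python
import cv2
import numpy as np

def binarize_and_clean(gray_text_region, min_area=30, max_area_frac=0.4):
    """Binarize the grayscale text location, then drop connected components that
    are too small or too large to be characters (illustrative area limits)."""
    # Otsu binarisation (assumed; the specification does not name a method).
    _, binary = cv2.threshold(gray_text_region, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    cleaned = np.zeros_like(binary)
    max_area = max_area_frac * binary.size
    for i in range(1, n):                       # label 0 is the background
        area = stats[i, cv2.CC_STAT_AREA]
        if min_area <= area <= max_area:        # keep plausible character blobs
            cleaned[labels == i] = 255
    return binary, cleaned
```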
[0042] Further, the next step is performed to estimate the shearing angle on the cleaned up region. A step is further performed for de-shearing using the approximated angle of shearing. The image after performing the de-shearing is shown below. Further, it can be observed from the below image that it is slightly adjusted from a tilted position to a horizontally straight position.
[0043] According to embodiments of the present disclosure, it may be possible that the text is written in a normal font but appears to be slanted because of the camera positioning. The image is shown below with the text in a slanted direction after the 1st de-rotation.
[0044] This needs to be corrected further as it can affect segmentation and recognition. For the correction, various processor executable steps may be performed by the ANPR analytic module. In the first step, for various angles, black/white pixels are counted along the direction as shown below.
[0045] Further, for each angle, the number of positions is determined at which the number of black pixels along the chosen direction is nearly equal to the image height. The counts so determined are shown below.
[0046] Further, the angle is chosen where the number of peaks is maximum. The image after the 2nd de-rotation is shown below.
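One possible reading of this 2nd de-rotation step is sketched below: for each trial angle the binary text image is rotated, the columns whose text-pixel run is nearly as tall as the image are counted as "peaks", and the angle with the most peaks is kept. The angle range and the 0.8 height fraction are illustrative assumptions rather than values from the specification.

```python
import cv2
import numpy as np

def estimate_slant_angle(binary, angles=np.arange(-15.0, 15.5, 1.0)):
    """Return the trial rotation angle (degrees) that maximises the number of
    columns whose text-pixel count is close to the image height ('peaks')."""
    h, w = binary.shape
    centre = (w / 2.0, h / 2.0)
    best_angle, best_peaks = 0.0, -1
    for a in angles:
        M = cv2.getRotationMatrix2D(centre, float(a), 1.0)
        rotated = cv2.warpAffine(binary, M, (w, h), flags=cv2.INTER_NEAREST)
        column_heights = (rotated > 0).sum(axis=0)
        peaks = int((column_heights >= 0.8 * h).sum())   # 0.8 is an assumed fraction
        if peaks > best_peaks:
            best_peaks, best_angle = peaks, float(a)
    return best_angle
```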
[0047] Now referring back to the shearing angle estimation, according to embodiments of present disclosure, the angle of shearing is estimated using Hough Transform. Firstly, the edge detection in Y direction using Sobel operator is done on the number plate image, and thereafter, the Hough Transform is then applied on this edge image. The edge detection in Y direction is shown below.
[0048] The representation of a line in an orthogonal coordinate system is given as
y = ax + b, where 'a' is the slope of the line and 'b' is the intercept of the line. The line is then the set of all points [x, y] for which the equation is valid. The above equation can be re-written as
b = -ax + y, where x and y are the parameters.
[0049] Then the equation defines the set of all lines [a, b] which cross the point [x, y]. For each point in the 'XY' coordinate system, there is a line in the 'AB' coordinate system. This is called the Hough space. The 'XY' coordinate system and the 'AB' coordinate system are shown below.
XY coordinate system
[0050] According to embodiments of the present disclosure, consider W*H to be the dimensions of the input image. Then, the angle of shearing, i.e., θ, is estimated by performing the following processor executable steps. In the first step, the X, Y coordinates are converted into the range [-1, 1], i.e.,
x’ = 2x/W-1, y’ = 2y/H-1.
Further, in the next step, bins are initialized for all the possible values of (a, b), where a ∈ [0, W-1] and b ∈ [0, H-1].
Further, in the next step, for each possible value of a, convert it in the range [-1, 1], and estimate b’ using following equations:
a’ = 2a/W-1; and
b’ = -a’x’ + y’ where a’, x’, y’? [-1, 1]
If b’ is in the range of [-1, 1], increment the bin value for (a, b) i.e., b = (b’+1)*H/2.
Further, in the next step, the bin with the maximum count is obtained. The (am, bm) value corresponding to this bin gives the angle of shearing, i.e., θ = tan⁻¹(am).
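The steps above can be followed directly; as an illustration only, the sketch below bins the normalised Hough parameters for a Y-direction edge image and returns the shear angle. The mapping of the winning bin back to a normalised slope before taking the arctangent is an interpretation of the text, and the usage comment at the end assumes a hypothetical edge threshold.

```python
import numpy as np

def estimate_shear_angle(edge_y):
    """Estimate the shear angle (radians) from a binary Y-direction edge image,
    following the normalised-coordinate Hough binning steps described above."""
    H, W = edge_y.shape
    bins = np.zeros((W, H), dtype=np.int32)          # one bin per (a, b) pair

    ys, xs = np.nonzero(edge_y)                      # edge points [x, y]
    xn = 2.0 * xs / W - 1.0                          # x' in [-1, 1]
    yn = 2.0 * ys / H - 1.0                          # y' in [-1, 1]

    for a in range(W):                               # every possible slope bin
        an = 2.0 * a / W - 1.0                       # a' in [-1, 1]
        bn = -an * xn + yn                           # b' = -a'x' + y'
        valid = np.abs(bn) <= 1.0
        b = np.clip(((bn[valid] + 1.0) * H / 2.0).astype(int), 0, H - 1)
        np.add.at(bins[a], b, 1)                     # increment the (a, b) bins

    a_m = np.unravel_index(np.argmax(bins), bins.shape)[0]
    return float(np.arctan(2.0 * a_m / W - 1.0))     # theta = arctan(a'_m)

# Usage (hypothetical threshold of 50 on the Sobel response):
# edges_y = (np.abs(cv2.Sobel(plate, cv2.CV_32F, 0, 1)) > 50).astype(np.uint8)
# theta = estimate_shear_angle(edges_y)
```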
[0051] In the next stage, for correcting the number plate sheared by an angle θ, an affine transformation may be used to shear it by the negative angle -θ. For performing this transformation, the transformation matrix A is defined below.
A = [[1, Sy], [Sx, 1]], where Sx and Sy are shear factors. Sx = 0 as the number plate is sheared in the Y direction only, so that Sy = -tan θ.
[0052] Further, in the next step, let P be a vector representing a certain point,
P = [x, y]ᵀ.
[0053] Then, the new coordinates Ps of the point after shearing can be computed as Psᵀ = Pᵀ·A, i.e., xs = x and ys = -tan θ · x + y.
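As an illustrative sketch only, the shear correction above can be applied with a 2x3 affine matrix; the interpolation and border settings below are assumptions, not requirements of the specification.

```python
import cv2
import numpy as np

def deshear(plate_img, theta):
    """Apply the shear correction x_s = x, y_s = -tan(theta)*x + y,
    i.e. shear factors Sx = 0 and Sy = -tan(theta), as derived above."""
    h, w = plate_img.shape[:2]
    M = np.float32([[1.0, 0.0, 0.0],
                    [-np.tan(theta), 1.0, 0.0]])
    return cv2.warpAffine(plate_img, M, (w, h),
                          flags=cv2.INTER_LINEAR,
                          borderMode=cv2.BORDER_CONSTANT, borderValue=0)
```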
[0054] Further, the ANPR analytic module is configured to segment the characters from the image. For a given de-rotated image as an input, the characters are segmented using run-length of white pixels along the vertical direction as shown below.
[0055] Let M*N be the size of the de-rotated image. For each pixel position 'n' along the X-axis, calculate the sum, Wn, of the white pixels along the vertical direction. If Wn is greater than a threshold T, then column 'n' belongs to the background.
Wn = Σ(m = 0 to M-1) Pm, where Pm = 0 for a black pixel and Pm = 1 for a white pixel.
If Wn > T, the column is Background; else, it is Text.
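A minimal sketch of this column-projection segmentation is given below; the default threshold T of 90% of the image height is an illustrative assumption, since the specification leaves T unspecified.

```python
import numpy as np

def segment_characters(derotated, T=None):
    """Split a binary de-rotated plate image into character slices: columns whose
    white-pixel sum W_n exceeds T are background, the remaining runs are text."""
    M, N = derotated.shape                       # M rows, N columns
    Wn = (derotated > 0).sum(axis=0)             # W_n for every column n
    if T is None:
        T = int(0.9 * M)                         # assumed default threshold
    is_text = Wn <= T                            # background where W_n > T

    chars, start = [], None
    for n, text in enumerate(is_text):
        if text and start is None:
            start = n
        elif not text and start is not None:
            chars.append(derotated[:, start:n])
            start = None
    if start is not None:
        chars.append(derotated[:, start:])
    return chars
```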
[0056] The segmented characters are shown above. But, before the segmentation, it may be necessary to pre-process the input image so as to remove the non-character blobs. (A blob is defined as a segmented portion of the image that is continuously connected.) One method to do the same is the chaining algorithm. Details of the chaining algorithm are described in the subsequent paragraphs.
[0057] According to embodiments of the present disclosure, the blobs are obtained after Connected Component (CC) based cleaning. The chaining algorithm connects the blobs in a chain such that the blobs are aligned along the same direction. This pre-processing can help in removing vehicle headlamps which come into the text window but are far from the NP text. Also, the text window sometimes has non-text details such as a headlight, which may also get removed by the chaining process. Some examples of the detected text region before pre-processing, the blobs after CC based cleaning, and the detected text region after the pre-processing are shown below.
Example 1
Example 2
[0058] The chaining process is performed in three steps. In the first step, the blobs are sorted in ascending order of the top left x-coordinate of the blob. This ensures that chains move in the left to right direction. In the next step, for every blob, a check is performed against every other blob for the following conditions, i.e., (a) the dimensions are within a threshold; (b) the distance between their centers is within a threshold; and (c) the angle formed by the line joining the centers is within a threshold. If all the above conditions are satisfied, a pair of blobs is formed; thus all the chains will initially have 2 components, as shown below.
[0059] Once all the blob pairs have been formed, 2 chains (i, j) are merged if the angle formed by line joining the centers is within a threshold as shown below.
[0060] Further, all the chains with more than 2 blobs are considered as valid chains. Chain merging continues till no further change in chains is possible. According to embodiments of present disclosure, it may be possible to get overlapping or disconnected chains. The disconnected chains can happen when distance between blobs is large and no pairing happens based on thresholds set. In order to remove overlapping chains and connect the disconnected chains, another level of merging is done which is shown below.
Examples of Overlapping Chains (shown above)
Examples of Disconnected Chains (shown above)
[0061] To remove non-text blobs for Optical character recognition (OCR), for each of the chain formed, the blobs are identified which have at least 2 vertices in the bounding box of the chain. If there is any blob which does not lie in bounding box of any chain, then it is discarded for further processing.
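As an illustration of the chaining pre-processing described in the preceding paragraphs, a simplified sketch is given below; it forms blob pairs using dimension, distance and angle thresholds and then merges chains that share a blob. The threshold values are illustrative, and the angle-based second-level merge for disconnected chains is omitted for brevity.

```python
import math

def chain_blobs(blobs, dim_tol=0.5, dist_tol=60.0, angle_tol=20.0):
    """blobs are (x, y, w, h) bounding boxes; returns chains (sets of blob
    indices) with more than 2 blobs, which are taken as valid text chains."""
    def centre(b):
        x, y, w, h = b
        return (x + w / 2.0, y + h / 2.0)

    blobs = sorted(blobs, key=lambda b: b[0])        # left-to-right ordering
    chains = []
    for i in range(len(blobs)):
        for j in range(i + 1, len(blobs)):
            hi, hj = blobs[i][3], blobs[j][3]
            if abs(hi - hj) > dim_tol * max(hi, hj):             # similar dimensions
                continue
            (cxi, cyi), (cxj, cyj) = centre(blobs[i]), centre(blobs[j])
            dx, dy = cxj - cxi, cyj - cyi
            if math.hypot(dx, dy) > dist_tol:                    # centres close enough
                continue
            if abs(math.degrees(math.atan2(dy, dx))) > angle_tol:  # roughly horizontal
                continue
            chains.append({i, j})                                # 2-component chain

    merged = True                    # merge chains that share a blob (simplification)
    while merged:
        merged = False
        for a in range(len(chains)):
            for b in range(a + 1, len(chains)):
                if chains[a] & chains[b]:
                    chains[a] |= chains.pop(b)
                    merged = True
                    break
            if merged:
                break

    return [c for c in chains if len(c) > 2]   # chains with more than 2 blobs are valid
```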
[0062] In the next stage, the ANPR analytics module performs OCR on the image. For the given segmented characters, the ANPR analytics module recognizes each of them as one of 36 possible characters (A-Z) and (0-9). In this application, a normal font type is considered for recognition purposes. Optical character recognition may be performed in two steps, i.e., (1) Training; and (2) Testing/Recognition.
[0063] Training involves learning of character features by the algorithm. For all the 36 characters, a training set containing character images is created. Appropriate features for the character images are identified. For each of the training images, features are extracted. A classifier is built using these features.
[0064] A sample set of training examples consists of raw images for all the 36 characters. Although the number of characters is 36, the actual number of classes will be 39, as 3 extra classes are added for the characters 1, 3 and 4 since they can be represented in different forms. Another class of negative samples, which consists of random/noise images and does not contain any character, is also used; since this class has large variation, a large number of samples are added to this class. This will help in discarding features which are not useful in discriminating between the character classes and the noise class.
[0065] Further, a boosting based classifier uses a combination of weak classifiers to perform recognition. Therefore, common features can be shared across multiple classes to improve the recognition performance and reduce algorithmic complexity. The classifiers are jointly trained and used to differentiate between the different classes and the noise. This requires training that is done separately, where features extracted from character and non-character images are provided as an input. The character recognition algorithm shares features for recognition between multiple classes, thus reducing the computational load for online operation. The preprocessing of training vectors involves feature extraction. The extracted features along with the class labels are sent to a multi-class boosting based classifier.
[0066] During the feature extraction, the input to the learning algorithm is a set of features which are extracted from the character images. Further, the feature extraction may be performed in multiple steps. In the first step, a binary image (shown below) is provided as an input to the feature extractor. A bounding box is obtained around the character using Connected Components, and these parameters are used to resize the image to a size of 24*32.
[0067] The resized image is then thinned to obtain a single pixel thin image. This image is then passed to a stroke length calculator module to obtain the stroke length in each direction at each non-zero pixel location. Thereafter, using the thinned character image, Stroke Length features for an image of size 24x32 are extracted by performing the following steps.
[0068] In the first step, the image is sub-divided into 4x4 blocks. For each block, a count of pixels having a vertical/horizontal/left diagonal/right diagonal direction is maintained, i.e., cnt_hor, cnt_ver, cnt_RD, and cnt_LD, each initialized to 0. At each white pixel point, the run-length of consecutive white pixels in the vertical, horizontal, left diagonal and right diagonal directions is calculated. Further, the direction which has the longest run-length is identified. In case 2 directions have the same run-length, preference is given to the slant direction count. In the next step, the block to which this pixel belongs is determined, and the count for the identified direction is incremented for this block. For example, if the horizontal direction is identified, the count for that direction is incremented as "cnt_hor++". Further, in the next step, a feature vector of size 192 (48 blocks x 4 directions/block) is obtained, as shown below.
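A minimal sketch of this stroke-length feature extraction is given below, assuming the thinned character image is 24 columns wide and 32 rows high; the direction encoding, tie-breaking order and array layout are illustrative implementation choices.

```python
import numpy as np

DIRS = {"hor": (0, 1), "ver": (1, 0), "RD": (1, 1), "LD": (1, -1)}

def run_length(img, r, c, dr, dc):
    """Run of consecutive white pixels through (r, c) along direction (dr, dc)."""
    n = 1
    for sign in (1, -1):
        rr, cc = r + sign * dr, c + sign * dc
        while 0 <= rr < img.shape[0] and 0 <= cc < img.shape[1] and img[rr, cc]:
            n += 1
            rr, cc = rr + sign * dr, cc + sign * dc
    return n

def stroke_length_features(thinned):
    """4x4 blocks (48 blocks) x 4 direction counts = a 192-element feature vector.
    Ties between directions are resolved in favour of the diagonal directions."""
    img = (np.asarray(thinned) > 0)
    assert img.shape == (32, 24), "expects a 32-row x 24-column thinned character"
    counts = np.zeros((8, 6, 4), dtype=np.int32)   # block rows, block cols, directions
    order = ["RD", "LD", "ver", "hor"]             # diagonals first to win ties
    for r, c in zip(*np.nonzero(img)):
        lengths = {d: run_length(img, r, c, *DIRS[d]) for d in DIRS}
        best = max(order, key=lambda d: lengths[d])
        counts[r // 4, c // 4, order.index(best)] += 1
    return counts.reshape(-1)                      # feature vector of size 192
```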
[0069] A feature vector is formed for each sample in the training images. The feature vectors obtained are passed to the training of the classifier. For online operation, the same process is followed for the candidate image and the feature vector is provided to the recognizer module for a decision. Further, the thinned characters are projected in the vertical and horizontal dimensions to obtain extra features which are concatenated to the features extracted earlier. Thus, a total of 252 (196 + 24 (width) + 32 (height)) features are used for classification. An input feature vector v can then be used for classification. Further, boosting is a simple way of classifying character and non-character samples. The classifier H(v) is defined as the sum of the weak classifiers, i.e., H(v) = Σ(m = 1 to M) hm(v).
[0070] In the above equation, M indicates the number of rounds of boosting, or weak classifiers. Further, hm(v) indicates the mth weak classifier, which is a decision stump based classifier of the form hm(v) = a·δ(vf > θ) + b·δ(vf ≤ θ), where a and b are the optimal regression parameters for a threshold value θ, and f indicates the feature that is used for classification. The values of a, b, f, and θ are obtained by minimizing the squared classification error at each stage m. The algorithm for classifier formation is given below. Assume there are N training feature vectors vi, each having a class label yi = {-1, 1} associated with it, where yi = 1 indicates the object class and yi = -1 indicates the background class. Then, the mean squared error (weighted by wi) for the N training examples is given as:
J = Σ(i = 1 to N) wi (yi - hm(vi))²,
where wi are the example weights; the stage classifier hm is formed by minimizing J at each stage m. Further, the algorithm for boosting (i.e., the Gentleboost algorithm for a single class decision) is performed in the following steps:
1. Initialize the weights wi = 1 and set H(vi) = 0, for i = 1..N.
2. Repeat for m = 1, 2, ..., M:
a) Fit stump: hm(v) = a·δ(vf > θ) + b·δ(vf ≤ θ)
b) Update class estimates for examples i = 1..N: H(vi) := H(vi) + hm(vi)
c) Update weights for examples i = 1..N: wi := wi · e^(-yi·hm(vi)), where
vf is the fth feature of the input vector that has been selected, θ indicates the threshold for that particular feature, and a and b are the associated regression parameters that minimize the error for the corresponding threshold. The best feature f along with the best threshold θ which minimizes the error is selected for stage m as the particular weak classifier hm.
[0071] For the final classifier, we store the best feature f along with its threshold θ and the corresponding a and b for all M weak classifiers. The term boosting refers to the multiplicative update of the weights wi, which are increased in case of misclassification and vice-versa. In the case of multiple class classification, sharing of features helps to improve the performance, and the above classifier can be extended for the classification of C classes of objects.
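For illustration only, the following is a minimal single-class GentleBoost sketch consistent with the steps listed above; the exhaustive stump search over all unique feature values, the number of rounds M, and the weight normalisation are illustrative implementation choices, not requirements of the specification.

```python
import numpy as np

def fit_stump(V, y, w):
    """Fit one regression stump h(v) = a*[v_f > t] + b*[v_f <= t] by minimising
    the weighted squared error over every feature f and candidate threshold t."""
    best = None
    for f in range(V.shape[1]):
        for t in np.unique(V[:, f]):
            gt = V[:, f] > t
            wg, wl = w[gt].sum(), w[~gt].sum()
            a = (w[gt] * y[gt]).sum() / wg if wg > 0 else 0.0
            b = (w[~gt] * y[~gt]).sum() / wl if wl > 0 else 0.0
            err = (w * (y - np.where(gt, a, b)) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, f, t, a, b)
    return best[1:]          # (f, theta, a, b)

def gentleboost(V, y, M=10):
    """V is an N x D feature matrix, y in {-1, +1}. Returns the list of stumps."""
    N = len(y)
    w = np.ones(N)
    F = np.zeros(N)                       # running class estimates H(v_i)
    stumps = []
    for _ in range(M):
        f, t, a, b = fit_stump(V, y, w)
        h = np.where(V[:, f] > t, a, b)
        F += h                            # update class estimates
        w *= np.exp(-y * h)               # re-weight misclassified examples
        w /= w.sum()
        stumps.append((f, t, a, b))
    return stumps
```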
[0072] According to embodiments of the present disclosure, as in the single class classification, a classifier needs to be designed to obtain the important features, for all the C classes collectively or separately. The number of features required for classification will be much less than the features required by single object classifiers to achieve similar performance. The error to be minimized is given by the following equation:
Jwse = Σ(c = 1 to C) Σ(i = 1 to N) wi^c (zi^c - hm(vi, c))²,
where each training example i has a weight wi^c and a membership label zi^c ∈ {-1, +1} for every class c. For a chosen subset of classes (over the permutations and combinations of the classes), a shared regression stump can be fitted for each subset. For the classes not chosen in the subset, the weak classifier is designed to have a class specific constant k^c.
[0073] Further, each weak learner will have five parameters (a, b, f, θ and the chosen subset) for the positive classes, and class specific constant parameters k^c for the negative classes.
[0074] Further, an algorithm for the multiclass character classification training, i.e., with an exact search over the subsets, is as follows:
1. Initialize the weights wi^c = 1 and set H(vi, c) = 0, for i = 1..N and c = 1..C.
2. Repeat for m = 1, 2, ..., M:
a. Repeat for n = 1, 2, ..., 2^C - 1 (each subset S(n) of classes):
i. Fit shared stump: hm(v, c) = a·δ(vf > θ) + b·δ(vf ≤ θ) if c ∈ S(n), and hm(v, c) = k^c if c ∉ S(n)
ii. Evaluate error: Jwse(n) = Σc Σi wi^c (zi^c - hm(vi, c))²
b. Find best subset: n* = arg min(n) Jwse(n).
c. Update the class estimates: H(vi, c) := H(vi, c) + hm(vi, c)
d. Update the weights: wi^c := wi^c · e^(-zi^c·hm(vi, c))
[0075] Further, the error calculation for Joint Boosting is given for the selected feature f and its corresponding threshold θ at a selected node n.
[0076] Further, in the second step of the OCR, i.e., Recognition, the input to the ANPR analytic module will be the individual segmented characters (as explained above). The number plate is passed on, where first the feature extraction is performed, i.e., each character is resized to 24x32, then binarization and thinning are performed and the Stroke Lengths are obtained. The obtained feature vector v is provided to the trained classifier, which performs classification using the best features f and thresholds θ which were obtained from the boosting. The sum of the weak classifiers gives the final decision of the particular character class, which is shown below:
Class = arg max(c) H(v, c), where Class is the nearest class for the particular candidate; the corresponding score also needs to exceed a predefined threshold which indicates whether the candidate belongs to a character class. If it does exceed the threshold, then Class is the output of the classifier along with the confidence; else the candidate is treated as noise or a distorted sample.
[0077] Further, during the decision making, the ANPR analytic module uses the recognized number plate to improve the performance. It checks for the number plate format which is specific to the country and tries to minimize the ambiguous cases such as O/0, 1/I, 2/Z, B/8, 5/S, etc.
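Purely as an illustration of this format-based disambiguation, the sketch below assumes a hypothetical "LLDDLLDDDD" plate format (L = letter, D = digit) and a small remapping table for the ambiguous characters; both are assumptions and not part of the specification.

```python
# Hypothetical post-processing sketch for the decision-making step.
TO_LETTER = {"0": "O", "1": "I", "2": "Z", "8": "B", "5": "S"}
TO_DIGIT = {v: k for k, v in TO_LETTER.items()}

def apply_plate_format(raw, fmt="LLDDLLDDDD"):
    """Swap ambiguous characters so that the recognised string matches the
    expected country-specific format; unresolvable characters are kept as-is."""
    out = []
    for ch, kind in zip(raw, fmt):
        if kind == "L" and ch.isdigit():
            out.append(TO_LETTER.get(ch, ch))
        elif kind == "D" and ch.isalpha():
            out.append(TO_DIGIT.get(ch, ch))
        else:
            out.append(ch)
    return "".join(out)

# e.g. apply_plate_format("MH12A81234") -> "MH12AB1234" (the ambiguous 8 becomes B)
```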
[0078] According to the embodiments of the present disclosure, the extracting module 214 of the device 102 may extract a metadata from the plurality of images. The metadata comprises the license number of the vehicle, an image of the vehicle, and timestamp and location details associated with the vehicle 110. Further, the transmitting module 216 of the device 102 may transmit the metadata to the server 104 for further analysis. The device 102 may be connected with the server 104 via the network 106. Further, the metadata may also be termed as "ANPR data" associated with the license number of the vehicle 110. According to embodiments of the present disclosure, the device 102 may be interfaced with other hardware devices such as a vehicle detection loop, a boom barrier and the like, depending on the requirement of a customer/user. The interface may be provided via an RS 232/485 serial communication interface.
[0079] Now referring to 3B of Figure 3, which illustrates the server application executed on the server 104 disclosed in the present disclosure. The server application, running on the server 104, may provide a user-interface for interactively engaging the user in various activities. The server application may provide a login-interface enabling the user to log in to the application. The user may enter authentication data to log in to the server 104. After the login, the server application may list the cameras connected for a particular area/location for which the vehicle surveillance may be performed by the user.
[0080] In one example of the present disclosure (as shown in block 4A of Figure 4), the user may select "Area 2" as the area/location for which he/she may wish to perform the vehicle surveillance. After selecting the area as "Area 2", the corresponding cameras (Camera 1, Camera 2, Camera 3, and Camera 4 in this case) associated with "Area 2" are listed with their ON/OFF status. It can be observed from the block 4A that Camera 1 and Camera 3 are in the ON state, whereas Camera 2 and Camera 4 are in the OFF state.
[0081] Further, the server application may provide a registration module (as shown in block 4B of Figure 4) for enabling the user to register a license number and a driver image associated with a particular vehicle. According to embodiments, the server application may allow static registration by the user when the license number data along with the license number plate image, and the driver data along with the driver image, are available. Further, the user may register, with the server application, the license number details and driver details such as the vehicle's license number plate image in bmp format, the license number, the vehicle type, the driver name, the driver image in bmp format and the like. Further, the driver details and the vehicle details may be stored in a database of the server 104.
[0082] Now, referring back to block 4A of Figure 4, the server 104 may enable the user to select a particular camera in order to connect with the device 102 (connected with the selected camera). After selecting the camera and the device 102, the server 104 may further display the configuration rules in a drop-down box. According to embodiments of the present subject matter, the configurations include "line of crossing", "vehicle direction", "region/zone", "maximum number plate dimension", "minimum number plate dimension", "maximum character dimension", and "minimum character dimension". Once the analytics are configured, the ANPR execution may start between the device 102 and the server 104.
[0083] The server 104, communicatively coupled with the device 102, may be configured to switch ON or switch OFF the device 102. When the device 102 is powered ON, it may check its connection status and configuration settings with the server 104. Once the device is ON and configured, the device 102 initiates the ANPR functions and transmits the ANPR data to the server 104 through the network 106. The server 104 receives the ANPR data from the device 102. The ANPR data may be stored in a central database of the server 104. Further, the server 104 may validate the ANPR data with the vehicle data previously registered by the users. The server 104 may be further configured to send alerts wirelessly to the users, for example, through SMS or email. Further, the server application may be configured to view reports from the central database of the server 104.
[0084] Referring now to Figure 5, method for providing vehicle surveillance is shown, in accordance with an embodiment of the present disclosure. The method 500 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method 500 may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
[0085] The order in which the method 500 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method 500 or alternate methods. Additionally, individual blocks may be deleted from the method 500 without departing from the spirit and scope of the disclosure described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method 500 may be considered to be implemented in the above described device 102.
[0086] At block 502, a real time video data stream captured by a video capturing unit may be received. The real time video data stream may comprise a plurality of images.
[0087] At block 504, the plurality of images may be processed concurrently by performing the steps of the blocks 504A-504C.
[0088] At block 504A, a region of interest from the plurality of images may be detected. Further, the region of interest indicates a license number plate of a vehicle.
[0089] At block 504B, alphanumeric characters from the region of interest may be recognized. Further, the alphanumeric characters may indicate a license number of the vehicle.
[0090] At block 504C, the license number of each vehicle may be tracked in order to count number of vehicles crossing a predefined line in a vehicle surveillance area.
[0091] At block 506, a metadata from the plurality of images may be extracted. Further, the metadata extracted may comprise the license number of the vehicle, an image of the vehicle, and timestamp and location details associated with the vehicle.
[0092] At block 508, the metadata extracted may be transmitted to a server configured to provide a user-interface for facilitating vehicle surveillance to a user.
[0093] Although implementations of devices, methods and systems for providing the vehicle surveillance have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as examples of implementations for providing the vehicle surveillance for recognizing the license number of the vehicles.
CLAIMS:
WE CLAIM:
1. A method for providing a vehicle surveillance, the method comprising:
receiving, by a processor, a real time video data stream captured by a video capturing unit, wherein the video data stream comprises a plurality of images corresponding to vehicles;
processing, by the processor, the plurality of images by concurrently
detecting a region of interest from the plurality of images, wherein the region of interest indicates a license number plate of a vehicle,
recognizing alphanumeric characters from the region of interest, wherein the alphanumeric characters indicate a license number of the vehicle, and
tracking the license number of each vehicle in order to count number of vehicles crossing a predefined line in a vehicle surveillance area;
extracting, by the processor, a metadata from the plurality of images, wherein the metadata comprises the license number of the vehicle, an image of the vehicle, and timestamp and location details associated with the vehicle; and
transmitting, by the processor, the metadata to a server configured to provide a user-interface for facilitating vehicle surveillance to a user.
2. The method of claim 1, wherein the video capturing unit is one of an analog camera or an Internet Protocol (IP) camera.
3. The method of claim 2 further comprising digitizing the video data stream received from the analog camera.
4. The method of claim 1, wherein the user-interface provides a login-interface for enabling the user to log in to the server for tracking the vehicles in a specific location.
5. A vehicle surveillance device 102, comprising:
a processor 202;
a memory 206 coupled with the processor 202, wherein the processor 202 executes a plurality of modules 208 stored in the memory 206, and wherein the plurality of modules 208 comprises:
a receiving module 210 to receive a real time video data stream captured by a video capturing unit, wherein the video data stream comprises a plurality of images corresponding to vehicles;
a processing module 212 to process the plurality of images by concurrently
detecting a region of interest from the plurality of images, wherein the region of interest indicates a license number plate of a vehicle,
recognizing alphanumeric characters from the region of interest, wherein the alphanumeric characters indicate a license number of the vehicle, and
tracking the license number of each vehicle in order to count number of vehicles crossing a predefined line in a vehicle surveillance area;
an extracting module 214 to extract a metadata from the plurality of images, wherein the metadata comprises the license number of the vehicle, an image of the vehicle, and timestamp and location details associated with the vehicle; and
a transmitting module 216 to transmit the metadata to a server configured to provide a user-interface for facilitating vehicle surveillance to a user.
6. The vehicle surveillance device 102 of claim 5, wherein the video capturing unit is one of an analog camera or an Internet Protocol (IP) camera.
7. The vehicle surveillance device 102 of claim 6 further comprising an encoder for digitizing the video data received from the analog camera.
8. A non-transitory computer readable medium embodying a program executable in a computing device for providing a vehicle surveillance, the program comprising:
a program code for receiving a real time video data stream captured by a video capturing unit, wherein the video data stream comprises a plurality of images corresponding to vehicles;
a program code for processing the plurality of images by concurrently
detecting a region of interest from the plurality of images, wherein the region of interest indicates a license number plate of a vehicle,
recognizing alphanumeric characters from the region of interest, wherein the alphanumeric characters indicate a license number of the vehicle, and
tracking the license number of each vehicle in order to count number of vehicles crossing a predefined line in a vehicle surveillance area;
a program code for extracting a metadata from the plurality of images, wherein the metadata comprises the license number of the vehicle, an image of the vehicle, and timestamp and location details associated with the vehicle; and
a program code for transmitting the metadata to a server configured to provide a user-interface for facilitating vehicle surveillance to a user.