The present invention relates to a method of indexing a video taken by a mobile camera, wherein the video is in the form of a collection of frames each comprising a date and shooting information.
The indexing method is intended to index videos produced by moving sensors for intelligence purposes, for example for video surveillance or military intelligence based on images, for subsequent searches by image analysts. For example, this indexing method may be used to index videos taken from a drone.
In the case of observation videos made from drones, the mining of these videos is most often carried out in emergency situations where it is necessary to extract the relevant sequences as quickly as possible with a minimal consumption of bandwidth between the video storage facility and the video viewing and analysis station. In particular, this method may be used to index and analyze videos of very long duration, typically of a duration of about twenty hours, from which it is required to separate relevant video sequences.
Solutions exist for indexing videos to allow searching for sequences of interest. In particular, it is known to use indexing based on the date and some shooting information accompanying videos accessed from a database to facilitate future searches. Nevertheless, during a search, these solutions require that video sequences be downloaded in their entirety to determine whether they contain the sought location. This implies high bandwidth usage between the video storage facility and the video viewing and analysis station. In addition, some videos, although satisfying the criteria of date and location, are sometimes of low quality and therefore difficult to interpret, and thus require the transmission of another video, leading to additional consumption of bandwidth.
There is thus a need to improve the method of indexing videos while using low bandwidth when searching a database indexed according to this method.
For this purpose, the object of the invention is an indexing method of the aforementioned type, characterized in that the method comprises, for each frame:
- a step of selecting the date of the frame;
- a step of extracting the location of the frame from the shooting information of the frame;
- a step of assigning an interpretability to the frame, and
- a step of generating a metadata file separate from the video, comprising, for each frame of the video, the date, the location and the interpretability of the frame, wherein the metadata file is configured to be interrogated by a request comprising at least one criterion among at least the date, the location or the interpretability.
According to other advantageous aspects of the invention, the method of indexing a video comprises one or more of the following characteristics, taken separately or in any feasible combination:
- the step of extracting the location comprises:
o a substep of determining the relative positions of the frames relative to each other by tracking the path of the link points of the frame; and
o a sub-step of recalculating the fine-tuned shooting information of each of the frames in a world coordinate system, based on the relative positions of the frames among themselves and the shooting information of each frame;
- the step of extracting the location includes:
o the possible selection of the frame as a key frame and, if the frame is a key frame:
o a substep of selecting the shooting information of the key frame; and
o a sub-step of recalculating enhanced shooting information in a terrain coordinate system based on the frame shooting information, by comparing the content of the frame with a set of geotagged images derived from a reference mat independent of the referenced video;
- the step of extracting the location includes:
o a substep of identifying support points in the key frames of the video;
o a sub-step of searching for the coordinates, in a terrain coordinate system, of the support points of the key frames on the basis of the enhanced shooting information;
o a substep of searching for the coordinates, in the world coordinate system, of the support points of the key frames having enhanced shooting information;
o a sub-step of calculating a failover function from the coordinates of the support points of the key frames in the terrain coordinate system and in the world coordinate system; and
o a sub-step of determining the location in a terrain coordinate system from final shooting information obtained by applying the failover function to the fine-tuned shooting information;
- the step of assigning the interpretability of the frames comprises a substep of calculating an interpretability evaluation function, wherein the function applies to radiometric and geometric indices of the frames of the video;
- the step of assigning the interpretability further comprises:
o a substep of collecting at least one perceived interpretability, entered by at least one operator, and
o a substep of recalculating the evaluation function based on the function calculated by the calculation sub-step, and on at least one interpretability perceived by at least one operator for at least one video sequence.
The invention also relates to a method for searching video sequences contained in videos, wherein the videos are indexed according to the aforementioned indexing method, wherein the search method comprises:
- a step of creating a query comprising at least one criterion relating to at least the date, the location, or the interpretability;
- a search step, in the metadata files associated with the video, for frames of the video satisfying the, or each, criterion, and
- a step of providing at least one or more video sequences wherein each is reconstructed from temporally contiguous frames satisfying the, or each, criterion during the search step.
According "to a variant, the search method further comprises a step of providing at least one combination of at least two reconstructed video sequences.
Finally, the invention relates to a computer program comprising software instructions which, when implemented by a computer, implement the above indexing method and/or search method.
Other features and advantages of the invention will appear upon reading the following description of embodiments of the invention, given by way of example only and with reference to the drawings, wherein:
Fig. 1 shows a schematic representation of an architecture implementing the indexing method according to the invention and applied to data collected by a
mobile carrier, and implementing the search method according to the invention;
Fig. 2 shows a schematic representation of the means for implementing the step of extracting a location using the indexing method according to the invention;
Fig. 3 and 4 show operating flowcharts of the indexing method according to the invention; and
Fig. 5 shows an operating flowchart of the search method according to the invention.
Fig. 1 shows a mobile carrier such as a drone 10 that is provided with a camera capable of shooting videos during a flight phase.
A ground data center 20 includes a server 25 and a storage disk 30. Alternatively, the server 25 and the storage disk 30 may be located in two physically separated sites.
The data center 20 is connected to a query terminal 32 of the light client type manipulated by an operator 33, via a communication network 34.
The videos 35 shot from the drone are formed as frames 37 in a known manner, wherein each contains a date and shooting information specific to each frame of the video.
The collected videos are, for example, in STANAG 4609 format.
Each video 35 defines a collection of frames 37 (or images), wherein each frame comprises shooting information. The shooting information of a frame 37 comprises, for example, the location of the four corners of the frame 37 (latitude, longitude) and the image center. The shooting information also comprises the location and orientation of the sensor at the time of shooting each frame.
Each frame 37 also comprises geometric information of the frame. The geometric information is related to the configuration of the sensor and comprises, for example, the pixel resolution, the focal length, or the opening of the field of view. The geometric information makes it possible to create the shooting information of a frame using a reference mat comprising a digital terrain model, if it is not already present in the frame, or to enhance its accuracy when it does exist.
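By way of illustration only, the per-frame data described above could be represented as sketched below. The field names are hypothetical and serve merely to show the kind of information carried by each frame; the actual key names and encoding of a STANAG 4609 stream differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees

@dataclass
class ShootingInfo:
    corners: Tuple[LatLon, LatLon, LatLon, LatLon]  # locations of the four frame corners
    center: LatLon                                  # location of the image center
    sensor_position: Tuple[float, float, float]     # sensor latitude, longitude, altitude
    sensor_orientation: Tuple[float, float, float]  # e.g. roll, pitch, yaw at shooting time

@dataclass
class GeometricInfo:
    pixel_resolution_m: float   # ground resolution of a pixel
    focal_length_mm: float
    field_of_view_deg: float    # opening of the field of view

@dataclass
class Frame:
    date: datetime
    shooting: ShootingInfo
    geometry: GeometricInfo
```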
The server 25 comprises a computer 40 capable of recording the videos provided by the drone in the storage disk 30. It is also designed to carry out operations on the recorded videos 35.
The computer 40 comprises a module 50 for selecting the date of a frame 37, a module 55 for determining the location of a frame 37, and a calculation module 60 for the interpretability of a frame 37.
The date selection module 50 of a frame 37 is designed to select the date contained in a frame.
The module 55 for determining the location of a frame is able to determine a location for a video frame 37 from the shooting information and/or the geometric information of the frame 37.
The module 60 for calculating interpretability of the frame implements an autonomous function for calculating the interpretability.
The interpretability of a frame is defined as the ability of the image to provide usable information for a predefined need. An interpretability is defined for each frame and selected from a range of values. For example, the National Imagery Interpretability Rating Scale (NIIRS) is used. This scale contains 13 different levels, from level 2 for a frame that is very difficult to interpret up to level 14 for a video sequence of very good quality and whose interpretation is easy.
The autonomous interpretability calculation function is capable of calculating the interpretability of a frame 37 of a video sequence 35 from radiometric and geometric indices of the frames of the video sequence 35. The radiometric indices are, for example, the luminance, the contrast and the radiometric noise of the frame. The geometric indices are extracted from the geometric information of a frame and are, for example, the scale or resolution in the image, the inclination of the line of sight, and the speed of the image center. The geometric indices are evaluated from a reference mat with a digital terrain model.
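A minimal sketch of what such an autonomous evaluation function might look like is given below, assuming a simple weighted combination of normalized indices mapped onto a 2..14 scale. The weights, index names and normalization are illustrative assumptions, not the calibration prescribed by MISB ST 0901.

```python
def clamp(x, lo=0.0, hi=1.0):
    return max(lo, min(hi, x))

def interpretability(radiometric, geometric, weights=None):
    """Combine normalized radiometric indices (luminance, contrast, noise) and
    geometric indices (resolution, line-of-sight tilt, image-center speed) into
    a single level on an assumed 2..14 interpretability scale.

    `radiometric` and `geometric` are dicts of indices already normalized to
    [0, 1], where 1 means "favorable for interpretation"."""
    weights = weights or {
        "luminance": 1.0, "contrast": 2.0, "noise": 2.0,       # radiometric
        "resolution": 3.0, "tilt": 1.0, "center_speed": 1.0,   # geometric
    }
    indices = {**radiometric, **geometric}
    total = sum(weights[k] for k in indices)
    score = sum(weights[k] * clamp(v) for k, v in indices.items()) / total
    return 2 + 12 * score  # map [0, 1] onto the 2..14 range

# Example: a sharp, well-exposed, nearly nadir frame
level = interpretability(
    radiometric={"luminance": 0.8, "contrast": 0.7, "noise": 0.9},
    geometric={"resolution": 0.6, "tilt": 0.9, "center_speed": 0.8},
)
```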
The data center 20 comprises a storage base 61 containing, for each frame 37 of a video 35, a calculated interpretability and, possibly, one or more interpretabilities entered by the operator 33.
The module 60 is connected to a recalculation module 62 for the autonomous calculation function, on the basis of the information contained in the storage database 61 and, in particular, the interpretabilities entered by the operator 33.
The parameters to be taken into account in order to obtain an interpretability calculation function are described, for example, in the document MISB ST 0901 Video NIIRS (27 February 2014).
The server 25 further comprises a metadata repository 63 comprising metadata files 64. Each file 64 of the repository is associated with a video 35. The metadata files 64 are designed to structure the data they contain, for example the metadata files 64 are in JSON format.
Each metadata file 64 is specific to one video 35 and comprises, for each frame 37 of the video, video frame search information 37, i.e. the date of the video frame, the location of the video frame, and the interpretability of the video frame.
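For instance, a metadata file 64 in JSON format could be structured as sketched below; the key names and file name are illustrative assumptions, the point being that each frame record carries the three search criteria.

```python
import json

metadata_file = {
    "video": "mission_2014_05_17.ts",      # hypothetical video identifier
    "frames": [
        {
            "index": 0,
            "date": "2014-05-17T10:32:01.040Z",
            "location": {"lat": 48.8566, "lon": 2.3522},  # computed by module 55
            "interpretability": 7.5                        # computed by module 60
        },
        # ... one record per frame 37 of the video 35
    ],
}

with open("video_0001.json", "w") as f:
    json.dump(metadata_file, f, indent=2)
```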
The location of a frame 37 is a coordinate vector calculated by the module 55 for determining the location of a video frame.
With reference to Fig. 2, the module 55 for determining the location of a video frame is described more precisely.
The module 55 for determining the location of a video frame comprises an aerotriangulation sub-module 65 and a sub-module 70 for generating support points, capable of performing complementary processing of the shooting information of the same video 35. It further comprises a failover sub-module 75 designed to combine the processing from the sub-modules 65 and 70, in order to enhance the shooting information and/or the location of the frames of a video.
The aerotriangulation sub-module 65 is able to determine the positions of the frames 37 of a "related" video sequence 35, i.e. to determine the relative positions of the frames 37 relative to one another. For this purpose, the aerotriangulation sub-module comprises a calculator 76 implementing a KLT type algorithm making it possible to follow the path of link points extracted from the frames 37 in the video sequence 35.
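As an illustration of such KLT tracking of link points, a pyramidal Lucas-Kanade tracker such as the one in OpenCV could be used as sketched below; the detection and tracking parameters are assumptions, and the patent does not prescribe a particular implementation.

```python
import cv2

def track_link_points(prev_gray, next_gray, prev_pts=None):
    """Detect link points in the previous frame (if not provided) and track
    them into the next frame with the pyramidal KLT algorithm."""
    if prev_pts is None:
        prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                           qualityLevel=0.01, minDistance=10)
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray,
                                                   prev_pts, None,
                                                   winSize=(21, 21), maxLevel=3)
    ok = status.ravel() == 1
    return prev_pts[ok], next_pts[ok]   # matched link points in both frames
```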
The aerotriangulation sub-module 65 further comprises a calculator 78 capable of calculating fine-tuned shooting information for each frame 37 from the frame's shooting information and the relative positions between the frames. This calculation of fine-tuned shooting information is, for example, achieved by a bundle adjustment algorithm for reconstructing a three-dimensional structure from a succession of frames. The bundle adjustment algorithm is based, for example, on the Levenberg-Marquardt algorithm. The aerotriangulation calculation is made in a coordinate system linked to the geolocation of the video, called the "world coordinate system".
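A highly simplified sketch of such a bundle adjustment refinement, using SciPy's Levenberg-Marquardt solver, is given below. The 6-parameter pose packing and the user-supplied projection model are assumptions; real implementations are considerably more involved.

```python
import numpy as np
from scipy.optimize import least_squares

def reprojection_residuals(params, n_frames, observations, project):
    """params packs the per-frame shooting information (assumed 6 values per
    frame: position and orientation) followed by the 3-D link points.
    `observations` is a list of (frame_index, point_index, u, v) image
    measurements and `project` a user-supplied projection model."""
    poses = params[:6 * n_frames].reshape(n_frames, 6)
    points = params[6 * n_frames:].reshape(-1, 3)
    res = []
    for f, p, u, v in observations:
        uv = project(poses[f], points[p])
        res.extend([uv[0] - u, uv[1] - v])
    return np.asarray(res)

def bundle_adjust(initial_poses, initial_points, observations, project):
    """Refine poses and link points by Levenberg-Marquardt minimization of
    the reprojection error (fine-tuned shooting information)."""
    x0 = np.hstack([initial_poses.ravel(), initial_points.ravel()])
    result = least_squares(reprojection_residuals, x0, method="lm",
                           args=(len(initial_poses), observations, project))
    n = len(initial_poses)
    return result.x[:6 * n].reshape(n, 6), result.x[6 * n:].reshape(-1, 3)
```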
The sub-module 70 for generating support points is connected to a database containing a reference mat 79 comprising a terrain model and a set of geotagged images.
The reference mat 79 is of the Geobase Defense type (IGN, French National Geographic Institute) or, for example, of the Reference 3D type. The set of geotagged images of the reference mat 79 contains images in which the position of all the pixels is known by their geographical position in a terrain coordinate system linked to the reference mat 79, as well as by their accuracy. All of these geotagged images allow, for example, the readjustment of a given image by mapping the pixels. The recalibration operation consists in recalibrating the geographic information and the pixel accuracy of the image to be recalibrated based on the pixel positions in all the geotagged images of the reference mat 79.
The sub-module 70 comprises a selection unit 80 able to select key frames. The sub-module 70 further comprises a key frame recalibration unit 82 for calculating enhanced key frame shooting information from the reference mat 79.
The recalibration unit 82 uses a so-called "downhill simplex" algorithm to maximize the mutual information between the key frames and the geotagged images of the set of images of the reference mat 79.
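A minimal sketch of such a registration by mutual-information maximization with a downhill simplex (Nelder-Mead) search is given below. The two-parameter translation model and the histogram-based mutual information estimate are simplifying assumptions made only for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.ndimage import shift as nd_shift

def mutual_information(a, b, bins=64):
    """Histogram-based mutual information between two images of equal shape."""
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = hist / hist.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz])).sum())

def register_key_frame(key_frame, geo_image, x0=(0.0, 0.0)):
    """Find the image shift that maximizes the mutual information between the
    key frame and a geotagged image of the reference mat, using a downhill
    simplex (Nelder-Mead) search."""
    cost = lambda t: -mutual_information(nd_shift(key_frame, t, order=1), geo_image)
    result = minimize(cost, x0, method="Nelder-Mead")
    return result.x   # offset used to enhance the key-frame shooting information
```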
Finally, the sub-module 70 comprises a support point selection unit 84, able to extract support points from the key frames recalibrated by the recalibration unit 82. The support points are points whose coordinates are known both in the video 35 and in the reference mat 79. They are extracted from the pixels of the geotagged images of the set of images of the reference mat 79. The coordinates of the extracted support points are known in the coordinate system linked to the reference mat 79 called "digital terrain".
The failover sub-module 75 is able to look for the support points calculated by the sub-module 70 in the frames processed by the module 65.
The failover sub-module 75 is furthermore designed to determine a failover function by comparing the coordinates of the support points in the terrain coordinate system with the coordinates of the same support points in the world coordinate system for the key frames.
Finally, the submodule 75 is designed to apply this function to the shooting information of a frame in order to calculate enhanced shooting information for each frame 37.
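By way of example, if the failover function is modelled as an affine transformation between the two coordinate systems (the patent only requires a geometric transformation, so this particular model is an assumption), it could be estimated from the support point correspondences and then applied to the shooting information as follows.

```python
import numpy as np

def fit_failover(world_pts, terrain_pts):
    """Least-squares affine transform mapping support-point coordinates in the
    world coordinate system onto their coordinates in the terrain coordinate
    system (both arrays of shape (n, d), d = 2 or 3)."""
    world = np.asarray(world_pts, dtype=float)
    terrain = np.asarray(terrain_pts, dtype=float)
    A = np.hstack([world, np.ones((world.shape[0], 1))])   # homogeneous coordinates
    M, *_ = np.linalg.lstsq(A, terrain, rcond=None)        # (d+1, d) affine matrix
    return lambda pts: np.hstack([np.asarray(pts, dtype=float),
                                  np.ones((len(pts), 1))]) @ M

# Usage sketch: applying the fitted failover function to the fine-tuned
# shooting information (here reduced to point coordinates) yields the final
# shooting information in the terrain coordinate system.
```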
With reference to Fig. 1, the interrogation terminal 32 comprises an interface 86 and a search engine 88.
The interface 86 is capable of displaying video sequences obtained from searches carried out by the search engine 88 and relating to videos contained in the storage disk 30 of the data center 20.
A video sequence is formed by at least one frame extracted from a video 35.
The search engine 88 comprises an interface for entry by the operator 33 of the requests comprising criteria relating to at least one, and advantageously all, of the criteria, i.e. the date, the location and the interpretability.
It is designed to search the metadata file repository 63, wherein the metadata files 64 contain data satisfying the criteria. It is also designed to transfer from the storage disk 30 to the interrogation terminal 32 video sequences formed by frames satisfying the search criteria.
With reference to Fig. 3, the method of indexing a video will now be described.
During an acquisition step 90, the mobile carrier 10 acquires, by means of its camera, videos of a scene. Throughout its flight phase, the videos acquired by the mobile carrier are enriched with information such as the date and shooting information as well as geometric information. Then, the videos acquired by the mobile carrier are saved, for example, in a STANAG 4609 type format in a storage disk 30 of the data center 20 during a storage step 95.
The indexing method of the videos then applies to each frame 37 of a video 35. The following steps are implemented video by video.
During a selection step 100, the module 50 selects the date of each frame.
With reference to Fig. 4, the step 105 of extraction by the module 55 of the location of the frames 37 of the video 35 will now be described.
During a step 106, the module 76 determines, for each frame 37 of the video, the coordinates of link points in order to determine the relative positions of the frames with respect to each other by tracking the path of the same link points from one frame to another. The link points are determined automatically by the KLT algorithm implemented by the module 76. In this step, the coordinates of the points of a frame are defined in a coordinate system linked to the video, called the "world coordinate system".
Then, in a step 108, the aerotriangulation module 78 calculates fine-tuned shooting information for each frame based on the relative positions of the frames and the frame's shooting information.
In a step 110, the module 80 selects key frames. The key frames are chosen subjectively as a function of their content. The number of key frames to be selected depends on the length of the sequence and the redundancy of the frames of the video.
These key frames are recalibrated with respect to the set of geotagged images of the reference mat 79 during a step 112 which recalculates, only for the key frames, enhanced shooting information in a terrain coordinate system from the frame shooting information, by comparing the content of the frame with the geotagged images of the reference mat 79, which is independent of the referenced video. For this purpose, the shooting information is recalculated from the comparison of the content of the frames and the content of the geotagged images of the set of images of the reference mat 79. The recalculated shooting information is chosen, in a known manner, to maximize the mutual information between the key frames and the geotagged images by implementing a so-called "downhill simplex" algorithm.
Finally, during a step 114, support points P1 ... Pn are selected in the key frames. The coordinates C1,j ... Cn,j of the support points are evaluated in each key frame Tj. In this case, Ci,j defines the coordinates of the support point Pi in the key frame Tj of the video, in the terrain coordinate system linked to the reference mat 79.
During a step 116, the failover module 75 searches for the support points P1 ... Pn in the frames processed by the aerotriangulation step 108.
This step makes it possible, during the step 118, to determine, for the key frame Tj, the coordinates C'1,j ... C'n,j of the support points P1 ... Pn in the world coordinate system linked to the video processed by the aerotriangulation step 108.
In a step 120, the failover module performs a comparison of the coordinates C1,j ... Cn,j and C'1,j ... C'n,j to determine a failover function.
The failover function is a geometric transformation, for example a commonality, making it possible to convert between the world coordinate system linked to the relative geotagging of the video and the terrain coordinate system linked to the reference mat 79.
The failover function is such that, for the key frames, the image of the shooting information in the world coordinate system by the failover function is the shooting information in the terrain coordinate system. It is obtained from the recognition and the knowledge of the coordinates of the support points on the key images in the world coordinate system and in the terrain coordinate system.
In a step 122, the failover module applies this failover function to all of the shooting information of the frames 37 of the video 35 in order to calculate the final shooting information. For key frames, the final shooting information is enhanced shooting information. It is calculated from the failover function for frames that are not key frames.
The location of a frame 37 is calculated from this final shooting information.
With reference" to Fig. 2, the calculation module 60 in an assignment step 123 calculates an autonomous evaluation function of the interpretability. The interpretability value calculated by this autonomous function is stored in the storage database 61, which also possibly contains, for each frame, values of perceived interpretability as entered by the operator 33.
The date, the location, and the interpretability obtained during the steps 100, 105 and 123 are transmitted to the input of the metadata repository 63, which, during the step 124, creates, for each video sequence, a metadata file 64 comprising the date, location and interpretability of each frame. The files 64 are configured to be interrogated by a query with respect to one or more criteria relating to date, location, and interpretability.
With reference to Fig. 5, the search method will now be described.
During a creation step 126, the operator 33 enters a request in the search engine 88 of the interrogation terminal 32 in order to interrogate the data contained in the storage disk 30 of the data center 20. The request includes criteria relating to the date, location and interpretability. Upon submitting the request, a search step 128 is performed in the metadata files 64 of the videos 35 to find the frames 37 satisfying the criteria provided.
The frames 37 satisfying the search criteria are then selected, and recomposed to form video sequences during a video sequence recomposition step 130. The recomposition step uses computer pointers to determine whether two frames 37 obtained during the search step succeed each other temporally in a video 35. The recomposed video sequence is a video formed of frames satisfying the search criteria in the search step 128 and succeeding one another in the video 35 from which they are derived. In the case of an isolated frame, i.e. without a previous or following frame, the video sequence will then be reduced to this single isolated frame.
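A minimal sketch of how the search step 128 and the recomposition step 130 could operate on a metadata file is given below; the representation of the criteria (date interval, bounding box, minimum interpretability) and the field names are assumptions consistent with the metadata sketch given earlier.

```python
def matching_frames(metadata, date_range, bbox, min_interpretability):
    """Search step: return the indices of the frames satisfying every criterion."""
    lat_min, lat_max, lon_min, lon_max = bbox
    hits = []
    for frame in metadata["frames"]:
        loc = frame["location"]
        if (date_range[0] <= frame["date"] <= date_range[1]
                and lat_min <= loc["lat"] <= lat_max
                and lon_min <= loc["lon"] <= lon_max
                and frame["interpretability"] >= min_interpretability):
            hits.append(frame["index"])
    return hits

def recompose_sequences(indices):
    """Recomposition step: group temporally contiguous frame indices into video
    sequences; an isolated frame yields a one-frame sequence."""
    sequences, current = [], []
    for i in sorted(indices):
        if current and i != current[-1] + 1:
            sequences.append(current)
            current = []
        current.append(i)
    if current:
        sequences.append(current)
    return sequences
```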
The video sequences formed during the recomposition step 130 are then, during a step 132 of making them accessible, displayed on the interface 86 after transmission via the network 34.
Following the "step 130 making them accessible, the operator 33 advantageously enters from the interrogation terminal 32 the interpretability value as he perceives it, for each frame 37 of the sequence made available. The value entered is transmitted via the network 34 to the storage base 61 and assigned to the corresponding video sequence.
The recalculation module 62 selects from the storage base 61 the values of perceived interpretability for the frames, and recalculates the evaluation function of the module 60 from
a comparison made between the selected values of perceived interpretability and the calculated interpretability values obtained by the autonomous function. The recalculated evaluation function thus takes into account the interpretabilities entered by the operator 33, and is applied to recalculate the calculated interpretabilities of all the frames.
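One simple way such a recalculation could be performed (an assumption, since the patent does not specify the regression model) is a least-squares calibration that maps the autonomously calculated values onto the values perceived by the operator.

```python
import numpy as np

def recalibrate(calculated, perceived):
    """Fit a linear correction perceived ~ a * calculated + b and return a
    function that recalculates the interpretability of any frame."""
    calculated = np.asarray(calculated, dtype=float)
    perceived = np.asarray(perceived, dtype=float)
    a, b = np.polyfit(calculated, perceived, deg=1)
    return lambda value: a * value + b

# Example: build the corrected function from operator feedback, then apply it
# to the stored calculated values of all frames.
recalc = recalibrate([6.0, 7.5, 9.0], [5.5, 7.0, 9.5])
new_values = [recalc(v) for v in (6.2, 8.1, 8.8)]
```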
According to one variant, the interrogation terminal comprises a frame combining module that is designed to combine the frames of at least two video sequences transmitted by the network 34 from the storage disk 30. During the step of making them accessible, the combining module combines a sub-selection of the frames of the video sequences made accessible and generates resultant objects. For example, the combination obtained is a summary video combining the frames corresponding to a predefined location within a chosen date range. In another example, the combination obtained consists of combining, for a given date, frames corresponding to different points of view at a given geographical location to form a 3D scene using a stereoscopic type of process.
Such an indexing method makes it possible, as a result of the determination of the location of the frame and of the precise calculation of the interpretability, to carry out queries on fine-tuned data.
Such an indexing method allows only the relevant results to be obtained rapidly from the light client, which represents a significant saving of time and bandwidth in a context of massification of the data.
Such a method makes it possible, as a result of the calculation of the interpretability, to select only relevant images for analysis.
Such a method makes it possible, as a result of the possibility of combining the criteria of date, location, interpretability, to conduct a very effective search in the videos.
CLAIMS
1. Method of indexing a video taken with a mobile camera, wherein the video is
formed of a collection of frames (37) each comprising a date and shooting information,
wherein the method comprises for each frame (37):
- a step (100) for selecting the date of the frame (37);
- a step (105) for extracting a location of the frame from the shooting information of the frame (37);
- a step (123) for assigning interpretability to the frame (37), and
- a step (124) for generating a metadata file (64) separate from the video, comprising, for each frame of the video, the date, the location and the interpretability of the frame (37), and wherein it is designed to be interrogated by a query comprising at least one criterion among at least the date, the location or the interpretability.
2. Method according to claim 1, wherein the step (105) for extraction of the location
comprises:
- a substep (106) for determining the relative positions of the frames (37) with respect to one another by tracking the path of the frame link points; and
- a sub-step (108) for recalculating the fine-tuned shooting information of each of the frames (37) in a world coordinate system, based on the relative positions of the frames (37) with respect to each other and the shooting information of each frame (37).
3. Method according to claim 1 or 2, wherein the step (105) for extracting the location
comprises:
- the selection (110) of the frame (37) as a key frame and, if the frame (37) is a key frame:
- a substep of selecting the shooting information of the key frame (37); and
- a sub-step (112) of recalculating enhanced shooting information in a terrain coordinate system from the shooting information of the frame (37) by comparing the content of the frame (37) and a set of geotagged images from a reference mat (79) independent of the referenced video.
4. Method according to claims 2 and 3 taken together, wherein the step (105) for
extracting the location comprises:
- a sub-step (114) for identifying support points in the key frames of the video;
- a sub-step (114) for searching the coordinates, in a terrain coordinate system, of the support points of the key frames having enhanced shooting information;
- a substep (118) for finding the coordinates, in the world coordinate system, of the support points of the key frames having enhanced shooting information;
- a substep (120) for calculating a failover function based on the coordinates of the support points of the key frames in the terrain coordinate system and in the world coordinate system; and
- a sub-step (122) for determining the location in a terrain coordinate system from final shooting information obtained by applying the failover function to the fine-tuned shooting information.
5. Method according to any one of the preceding claims, wherein the frame
interpretability assignment step (123) comprises a substep of calculating an interpretability
evaluation function, wherein this function applies to radiometric and geometric indices of
video frames.
6. Method according to claim 5 wherein the step (123) of assigning the interpretability
further comprises:
- a substep of collecting at least one perceived interpretability that is entered by at least one operator (33), and
- a sub-step of recalculating the evaluation function from the function calculated by the calculation sub-step, and at least one interpretability perceived by at least one operator (33) for at least one video sequence.
7. Method for searching video sequences contained in videos, wherein the videos are
indexed according to the method of any one of the preceding claims, wherein the search
method comprises:
- a step (120) for creating a query comprising at least one criterion relating to at least the date, the location or the interpretability;
- a search step (125) in the metadata files (64) associated with the video, for frames (37) of the video satisfying the, or each, criterion, and
- a step (130) of making accessible at least one or more video sequences respectively reconstituted from temporally-contiguous frames satisfying the, or each, criterion during the search step.
8. Search method according to claim 7, further comprising a step of providing at least
one combination of at least two reconstituted video sequences.
9. Computer program product comprising software instructions which, when
implemented by a computer, implement the method according to any one of the preceding
claims.