Resilient Image Feature Description through Evolution

Abstract: Feature description is an important stage in many vision algorithms. Image features detected by various detectors can be described using descriptors with either a binary or a floating-point structure. This study presents the use of evolutionary algorithms, namely Genetic Algorithms (GA), to improve the robustness of feature descriptors against increasing levels of photographic distortions such as noise or JPEG compression. Original feature descriptors were evolved so as to reduce the descriptor distance for the mentioned test cases. Results, tested using a statistical framework, suggest that the evolved descriptors offer better matching performance for two state-of-the-art descriptors.


Introduction
A feature is an image primitive that contains valuable information about the content of the image. Every feature appearing in an image corresponds to a real-world object. A feature can take the form of a corner [1], an edge [2], a small region (blob) [3] or a segment [4]. Features are represented using descriptors, which are calculated from the pixel information around the feature using a variety of methods: a small patch of surrounding pixels can comprise the descriptor, or a more complex description can be used, such as an oriented gradient histogram [5]. The literature presents many different feature detectors and descriptors; evaluations of many feature detectors can be found in [6][7][8]. Based on the review given therein, a good feature detector should detect features that are geometrically stable under different viewing conditions [9,10], that present a significant amount of variation in their neighbourhood so that they are prominent and provide useful information, and that offer good localization accuracy [11]. It is also important for the detector to find such features in a reasonable amount of time, a vital requirement for real-time applications.

It is known that the quality of images can be seriously affected by various imaging conditions, including motion, diffusive environments, high compression or low signal-to-noise ratios [12]. The descriptors mentioned above display a significant amount of robustness, or invariance, against such distortions. The methods inherent in the descriptor extraction algorithms employ a number of approaches to achieve high performance; for instance, robustness against noise is achieved using Gaussian smoothing. The main research question tackled in this paper is whether algorithms can be developed, using sophisticated approaches, to enhance the results already achieved by these inherent methods. Evolutionary computation methods such as GA are a good candidate for improving the robustness of the descriptors.
There are very few studies focused on the use of evolutionary algorithms [13] for enhancing the performance of features used in computer vision [14]. The work by Chen et al. [15] used Ant Colony Optimization to reduce the image features (not key-points as proposed in this paper) used in classification, such as first/second order moments, entropy, etc. A different study conducted in [16] aimed to select the optimal key-points extracted by SIFT for use in face recognition; this optimal set of key-points comprised the ones producing the best matches. That work presented a theoretical approach rather than solid experimental results. A very recent study [17] employed GA to improve the coverage of image features across the image, using a robust technique based on spatial statistics as proposed in [7].

A broad classification of the descriptors can be made by looking at their structure: binary and floating-point descriptors. The binary descriptors include BRIEF [18], ORB [19], BRISK [20], FREAK [21] and LATCH [22], whereas the integer and floating-point descriptors include SIFT [23], SURF [24], KAZE [25] and AKAZE [26]. The work presented in this paper aims to evolve the original descriptors generated by both binary and floating-point methods in order to improve robustness against a variety of changes in imaging conditions, such as noise, blur and JPEG compression. A GA was used here to improve the robustness of the image descriptors under severe cases of each condition. A detailed statistical analysis using McNemar's test [27] showed that there are statistically significant differences between the performances of the original and the evolved descriptors.

The rest of the paper is structured as follows: Section 2 gives details of the evolutionary approach for descriptors with binary and floating-point structure. The experimental design is presented in Section 3, followed by Section 4, where the results are presented with a detailed statistical analysis.
Finally, conclusions are drawn in Section 5.

Evolutionary Approach for Feature Descriptors
A feature descriptor is essentially a vector, either binary or floating-point, describing a local neighbourhood around a feature. This structure makes a descriptor very suitable for representation as a chromosome in the GA context. The following sections elaborate on the chromosome representation, the matching methods and the genetic operators used in the evolutionary process that evolves original descriptors into a form more robust against a variety of distortions.

Two binary descriptors are compared using the Hamming distance. Let $n_{ij}$ denote the number of bit positions at which the first descriptor holds the value $i$ and the second descriptor holds the value $j$, with $i, j \in \{0, 1\}$. Based on these quantities, the normalized Hamming distance is defined as follows:

$$d_H = \frac{n_{01} + n_{10}}{n_{00} + n_{01} + n_{10} + n_{11}} \qquad (2)$$

This distance can be computed efficiently using options provided by modern processor instruction sets (e.g. the XOR operation) [28].
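As an illustrative sketch (assuming descriptors stored as NumPy arrays of individual bits, rather than the packed bytes a production matcher would use), the normalized Hamming distance amounts to an XOR followed by a bit count:

```python
import numpy as np

def hamming_distance(d1: np.ndarray, d2: np.ndarray) -> float:
    """Normalized Hamming distance between two binary descriptors.

    d1, d2: equal-length arrays of 0/1 values (e.g. 256 bits for ORB).
    Returns the fraction of bit positions in which the descriptors disagree.
    """
    assert d1.shape == d2.shape
    disagreements = np.bitwise_xor(d1, d2)  # 1 wherever the bits differ
    return float(disagreements.sum()) / d1.size

a = np.array([1, 0, 1, 1, 0, 0, 1, 0])
b = np.array([1, 1, 1, 0, 0, 0, 1, 1])
print(hamming_distance(a, b))  # 3 of 8 bits differ -> 0.375
```

Real implementations pack the bits into bytes and rely on hardware population-count instructions, as noted above.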

Chromosome representation
A binary feature descriptor is a bit string of constant size, and hence can be represented as a binary chromosome:

$$C = (g_1, g_2, \ldots, g_n), \qquad g_i \in \{0, 1\}$$

where $n$ is the descriptor size, which can be 64, 128 or 256 bits for current binary descriptors. The number of genes for the ORB descriptor was chosen as 256.
Starting with an initial population of 20 individuals, the evolutionary operations create new individuals that are added to a population capped at a maximum of 100 individuals.
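The population set-up can be sketched as follows. The text does not state how the initial individuals are seeded, so this sketch simply draws random 256-bit strings; in practice the population would presumably be seeded from the original descriptor:

```python
import numpy as np

rng = np.random.default_rng(42)

DESCRIPTOR_BITS = 256   # gene count used for the ORB descriptor
INITIAL_POP_SIZE = 20   # initial population
MAX_POP_SIZE = 100      # population cap during evolution

def random_binary_chromosome(n_bits: int = DESCRIPTOR_BITS) -> np.ndarray:
    """One individual: a bit string of constant size."""
    return rng.integers(0, 2, size=n_bits, dtype=np.uint8)

population = [random_binary_chromosome() for _ in range(INITIAL_POP_SIZE)]
print(len(population), population[0].size)  # 20 256
```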

Genetic operators
Selection: The GA selects the most suitable (fittest) individuals to create new individuals via crossover. Selection was implemented using tournament selection [13]: 10 tournaments are performed, each with a tournament size of 8. Randomly selected individuals fill each tournament, and a mating pool is then created by selecting the fittest individual of each tournament.

Recombination: An important stage in evolutionary computing is the phase where new individuals are added to the population, called recombination (or crossover). The new individuals carry genetic information inherited from the parents chosen during selection. Recombination was implemented using two-point crossover for the binary descriptors, with the length of the exchanged chromosome region (i.e. the cut-length) set to 128 bits.

Mutation: The third evolutionary operator in the GA is mutation, where newly created individuals are subject to random changes in their chromosome structure. For a binary descriptor this change is trivially performed by negating (flipping) a bit.
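The three operators can be sketched as below; this is a hypothetical illustration, and the mutation rate is a free parameter not specified above:

```python
import random

random.seed(7)

def tournament_selection(population, fitnesses, n_tournaments=10, size=8):
    """Fill each tournament with randomly chosen individuals; the fittest
    individual of each tournament enters the mating pool."""
    pool = []
    for _ in range(n_tournaments):
        contestants = random.sample(range(len(population)), size)
        winner = max(contestants, key=lambda i: fitnesses[i])
        pool.append(population[winner])
    return pool

def two_point_crossover(p1, p2, cut_length=128):
    """Exchange a contiguous cut_length-bit region between two parents."""
    start = random.randrange(len(p1) - cut_length + 1)
    c1, c2 = list(p1), list(p2)
    c1[start:start + cut_length] = p2[start:start + cut_length]
    c2[start:start + cut_length] = p1[start:start + cut_length]
    return c1, c2

def flip_mutation(chromosome, rate=0.01):
    """Negate (flip) each bit independently with probability `rate`."""
    return [1 - g if random.random() < rate else g for g in chromosome]

parent_a, parent_b = [0] * 256, [1] * 256
child_a, child_b = two_point_crossover(parent_a, parent_b)
print(sum(child_a) + sum(child_b))  # 256: set bits are conserved by the swap
```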

Matching method
The matching method used for comparing two floating-point descriptors $p$ and $q$ is the Euclidean distance between the two vectors:

$$d_E(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$

where $n = 128$ is the number of bins of the floating-point KAZE descriptor.
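A minimal sketch of this matching metric:

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two floating-point descriptors,
    e.g. n = 128 bins for the KAZE descriptor used here."""
    assert len(p) == len(q)
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(euclidean_distance([0.0, 3.0], [4.0, 0.0]))  # 5.0
```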

Chromosome representation
The chromosome representation for the floating-point descriptors follows the same notation as the binary descriptor, except that the genes are drawn from the real number domain:

$$C = (g_1, g_2, \ldots, g_n), \qquad g_i \in \mathbb{R}$$

Genetic operators
Selection: The selection mechanism for deciding the mating pool that will be used in the recombination phase follows the same structure described for the binary descriptors.
Recombination: The gene structure of the floating-point descriptors differs in nature from the binary descriptors, and hence requires different genetic operators. Whole arithmetic recombination was proposed in [29] for chromosomes with a floating-point structure:

$$c_1 = \alpha \, p_1 + (1 - \alpha) \, p_2, \qquad c_2 = \alpha \, p_2 + (1 - \alpha) \, p_1$$

which uses $\alpha$ as an aggregation weight for combining the information residing in the alleles of both parents $p_1$ and $p_2$. Here $\alpha \neq 0.5$, since that value would transfer the very same genetic information to both children.

Mutation: Individual diversity was ensured by a mutation operator that generates a random value between zero and one to be used as an offset. This offset is added to a randomly selected gene position, based on the mutation rate.
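The two operators can be sketched as follows; the choice α = 0.25 and the mutation rate are illustrative values, not taken from the text:

```python
import random

def whole_arithmetic_recombination(p1, p2, alpha=0.25):
    """Blend every gene of both parents with weight alpha.
    alpha must not be 0.5, otherwise both children are identical."""
    c1 = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]
    c2 = [alpha * b + (1 - alpha) * a for a, b in zip(p1, p2)]
    return c1, c2

def offset_mutation(chromosome, rate=0.05):
    """Add a random offset in [0, 1) to gene positions selected with
    probability `rate`."""
    return [g + random.random() if random.random() < rate else g
            for g in chromosome]

c1, c2 = whole_arithmetic_recombination([1.0, 2.0], [3.0, 6.0], alpha=0.25)
print(c1, c2)  # [2.5, 5.0] [1.5, 3.0]
```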

Fitness Definition
The fitness of a chromosome, regardless of the binary or floating-point structure, is based on the matching performance of the descriptor for a number of distortions described in the experimental framework in Section 3. The fitness of an individual in the GA is defined so as to minimize the total descriptor distance over different levels of distortions such as noise, blur and JPEG compression. Here, the Hamming distance was employed for the binary descriptors and the Euclidean distance for the floating-point descriptors. Let $d_N^l$, $d_B^l$ and $d_J^l$ be the distances between the original descriptor and those extracted from images subjected to noise, Gaussian blur and JPEG compression respectively; the fitness function $f$ for an individual in the GA is designed as

$$f = \frac{1}{\sum_{l} \left( d_N^{\,l} + d_B^{\,l} + d_J^{\,l} \right)}$$

where $l$ is the distortion level described in Section 3.1. This fitness function yields a lower fitness value for an evolved descriptor as the total distance over the distortion cases increases. This drives the descriptors to evolve towards lower distance values under different levels of a variety of distortions.
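One plausible reading of this fitness definition, taking the fitness as the reciprocal of the total distance summed over all distortion levels (an assumption, since the exact equation is reconstructed here):

```python
def fitness(d_noise, d_blur, d_jpeg):
    """Fitness of an evolved descriptor: reciprocal of the total descriptor
    distance summed over all distortion levels, so a larger total distance
    yields a lower fitness.

    d_noise, d_blur, d_jpeg: per-level distances between the original
    descriptor and those extracted under each distortion.
    """
    total = sum(d_noise) + sum(d_blur) + sum(d_jpeg)
    return 1.0 / total if total > 0 else float("inf")

print(fitness([1.0, 2.0], [1.0, 1.0], [2.0, 3.0]))  # 1 / 10 = 0.1
```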

Experimental Design
The following sections elaborate on the experimental framework, first presenting the dataset created for different distortion types and levels, and then describing the measures used in the evaluation along with the statistical test employed.

Dataset
The original Oxford database (available at http://www.robots.ox.ac.uk/~vgg/research/affine/) was used as the basis of the dataset for testing the presented evolutionary approach. A new database was developed here by adding different levels of distortion, as shown in Figure 1.
There are 3 main distortion types, namely noise, blur and JPEG compression, each applied at 5 different levels. Details of these levels are presented in Table 1. For JPEG compression, a lower value on the 0−100 quality scale indicates a higher amount of compression and hence more distortion in the image, as can be seen in Figure 1. The original database includes 8 datasets: bark, bikes, boat, graf, leuven, trees, wall and ubc. The newly created database therefore includes a total of 8 × 3 × 5 = 120 images, a database large enough for drawing statistically meaningful conclusions [8].
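Two of the three distortions can be sketched with plain NumPy as below; JPEG re-encoding at a chosen quality would additionally require an image codec (e.g. OpenCV's imencode or Pillow) and is omitted here, and the stand-in image, sigma values and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(img: np.ndarray, sigma: float) -> np.ndarray:
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur via two 1-D convolutions (greyscale image)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    out = img.astype(np.float64)
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, "same"), 1, out)
    out = np.apply_along_axis(lambda c: np.convolve(c, kernel, "same"), 0, out)
    return np.clip(out, 0, 255).astype(np.uint8)

img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # stand-in image
noisy = add_gaussian_noise(img, sigma=10.0)
blurred = gaussian_blur(img, sigma=1.5)
print(img.shape == noisy.shape == blurred.shape)  # True
```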

Performance Measures
The measures used for evaluating the results are the same measures used for finding feature matches: the Hamming distance was used to measure the performance of the binary descriptor, and the Euclidean distance was employed for the floating-point descriptor.

Evaluation
The evaluation method adopted for comparing the original descriptors with the evolved descriptors is McNemar's test, a non-parametric variant of the χ² test. The test builds 2 × 2 contingency tables in order to compute Z scores. The test has a published history of usage in the medical research community [30]; it was recently used for performance comparison in computer vision for the first time by Clark [7], and later for machine learning by Bostanci [27]. The test is significantly robust against Type-I error, i.e. detecting a difference when there is none.
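A sketch of the Z score computation from the two discordant cells of the 2 × 2 contingency table, using the standard continuity-corrected form of McNemar's test:

```python
import math

def mcnemar_z(n01: int, n10: int) -> float:
    """Z score of McNemar's test with continuity correction.

    n01: cases where only the first method succeeded,
    n10: cases where only the second method succeeded
    (the two discordant cells of the 2x2 contingency table).
    """
    if n01 + n10 == 0:
        return 0.0
    return max(abs(n01 - n10) - 1, 0) / math.sqrt(n01 + n10)

print(round(mcnemar_z(30, 10), 3))  # 3.004, well past the 95% confidence level
```

The concordant cells (cases where both methods succeed or both fail) do not enter the score, which is why the test is robust when the two methods agree on most samples.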

Results
This section presents results of the statistical analysis performed in order to compare the performances of the original descriptors with the evolved descriptors. Figure 2 demonstrates the effect of the evolutionary process for the binary and the floating-point descriptors.
In the plots given above, the eight datasets created by adding varying levels of distortion were tested for both descriptors. The change from the original descriptor structure after the evolutionary process is demonstrated here. The binary descriptor underwent more changes in its gene structure as the level of distortion increased. Note that this distance is normalized by the total number of bins used by the descriptor, hence the normalized distance values. It is interesting to see that the floating-point descriptor required less change in its gene structure through the evolutionary process; lower levels of distortion created a greater need for change in the descriptor.

Tables 2 and 3 show the actual descriptor distances between matched images for each pair of images in the datasets, e.g. matching the first image of the bark dataset against the remaining five images in the same dataset. The benefit of the evolutionary process presented in this work is clear from these results: the evolved descriptors result in smaller descriptor distances for both the binary and the floating-point descriptors.

The results of McNemar's test are given in Tables 5 and 6. For both the binary and the floating-point descriptor, the evolution resulted in smaller distances (shown in the plots). The statistical significance of these comparisons is represented with Z scores. Note that these results should be interpreted using Table 4 (given in [27,31]), which presents the confidence levels as an indicator of statistical significance based on the Z scores. In terms of statistical significance, the binary case produced significant results for 3 out of 8 datasets, while the floating-point descriptor yielded statistically significant results for the complete dataset. Table 6 reports McNemar's test results for the floating-point descriptor: the number of times the original and the evolved descriptor resulted in a smaller Euclidean distance, together with the Z score measuring the statistical significance of that count.
Cases in which the performance difference was statistically significant for one and two-tailed predictions with confidence levels above 90% are denoted with *.

Conclusion
This paper presented an evolutionary approach for increasing the robustness of binary and floating-point descriptors. The test cases comprised photographic distortions such as noise, blur and JPEG compression of varying severity. A comparison was conducted to analyse the performance of the evolved descriptors against the original descriptors, using a new dataset derived from a popular one.
Results show that the evolved descriptors differ from the original descriptors, since they were evolved to accommodate differences in local image regions caused by the various types of distortion. It was shown that the performance of the evolved descriptor is better than the original to a statistically significant degree. Smaller distances between the evolved descriptors and the ones extracted from the datasets suggest better matching performance under different conditions.
This study has shown that current state-of-the-art feature description algorithms still have room for improvement towards completely robust feature descriptors. The evolutionary algorithms require a significant amount of time to generate satisfactory results, whereas real-time applications require fast detection and description of image features. Future work will investigate how the lessons learned from these results can be incorporated into the descriptor extraction process.