# Nearest feature line

Post-publication activity

Curator: Stan Z Li

The nearest feature line (NFL) method extends the classification capability of the nearest neighbor (NN) method by taking advantages of multiple (more than one) templates per class. It effectively improves the classification performance especially when the number of templates (sample size) per class is small, a problem frequently encountered in many applications such as in face recognition problems.

It is suggested in (Li and Lu, 1998) that the ability to generalize better with small sample size is due to the feature lines' ability to expand the representational capacity of the available points, accounting for new conditions not represented by the original set. There have been attempts of theoretical analysis explaining why nearest feature line works (Zhou et al., 2000).

Figure 1: Modified from Li and Lu (1998) with permission. Generalization of two feature points $$\mathbf{x_1}$$ and $$\mathbf{x_2}$$ to the feature line $$\overline{\mathbf{x_1}\mathbf{x_2}}\ .$$ The feature point $$\mathbf{x}$$ of a query face is projected onto the line as point $$\mathbf{p}\ .$$.

## Motivation

An important part of pattern classification methods is the definition of a distance metric. Examples include Euclidean distance, Mahalanobis distance, Cosine distance, Hamming distance. Such distances are defined between two feature points (FPs) which are vectors in a feature space. The nearest neighbor (NN) classification method calculates the distance from the query FP to each template FP of a class, and choose the minimum of those distances as the query-to-class distance.

In many applications, multiple template FPs are available for a class. Such information may be used to improve classification performance. This has been ignored by the NN type of methods.

## Overview of Nearest Feature Line

The nearest feature line (NFL) method (Li and Lu, 1998), originally proposed for face recognition and subsequently used in many applications, takes advantages of multiple FPs per class to infer intra-class variations from them and extend their capacity to represent pattern classes.

The nearest feature line method assumes that that multiple template FPs are available per class. Refer to Figure 1. Consider a pair of template FPs belonging to the same class, and the line passing through the two FPs. This line is called a feature line (FL). The nearest feature line uses the FL to infer intra-class variation, by interpolating and extrapolating the two FPs. Thus the FL may approximate variation of the class beyond the FPs, generalizing the representational capacity of the two FPs. The nearest feature line classification rule is based on the nearest feature line distance, namely, the distance between the query and its projection onto the FL generated between FPs.

## Definitions

### Feature Line

Consider a variation in the image space from point $$\mathbf{z_1}$$ to $$\mathbf{z_2}$$ and the incurred variation in the feature space from $$\mathbf{x_1}$$ to $$\mathbf{x_2}\ .$$ The degree of the change may be measured by $$\delta\mathbf{z}=\|\mathbf{z_2}-\mathbf{z_1}\|$$ or $$\delta\mathbf{x}=\|\mathbf{x_2}-\mathbf{x_1}\|\ .$$ When $$\delta\mathbf{z}\to\mathbf{0}$$ and thus $$\delta\mathbf{x}\to\mathbf{0}\ ,$$ the locus of $$\mathbf{x}$$ due to the change can be approximated well enough by a straight line segment between $$\mathbf{x_1}$$ and $$\mathbf{x_2}$$ when the variation is small enough. Thus any change between the two can be interpolated by a point on the line. A further small change beyond $$\mathbf{x_2}$$ can be extrapolated using the linear model.

The straight line passing through $$\mathbf{x_1}$$ and $$\mathbf{x_2}$$ of the same class, denoted $$\overline{\mathbf{x_1}\mathbf{x_2}}\ ,$$ is called a FL of that class. The query feature point $$\mathbf{x}$$ is projected onto an FL as point $$\mathbf{p}$$ (Figure 1).

Assuming that there are $$N_c>1$$ FPs available for the class $$c\ ,$$ a number of $$K_c=\frac{N_c(N_c-1)}{2}$$ lines can be constructed to represent the class. For example, $$N_c=5$$ feature points are expanded by their $$K_c=10$$ feature lines. The total number of feature lines for a number of $$M$$ classes is $$N_{total}=\sum_{c=1}^M K_c\ .$$

### FL Subspace

The FL approximates linear variants of the class derived from the two FPs, and virtually provides an infinite number of FPs of the class that the two FPs belong to. For example, in face recognition, a face image is subject to illumination, pose, expression, and so on. The FL would derive possible variations from the two face images. All FLs constitute a FL subspace of the class (FLCS). The capacity of the FP set is thus expanded beyond the original FPs. A query FP is classified to the nearest FLCS by the classification rule.

However, it should be noticed that the locus of the feature point of a face image under a perceivable variation in illumination, pose, or expression is highly nonconvex and complex (Bichsel and Pentland,1994), and can hardly be precisely described by a straight line. To obtain a more accurate description of the variations, one may suggest that a higher order curve, such as splines (Murase and Nayar, 1995), should be used. This requires (i) that there should be at least three FP points for every class, and (ii) that these points should be ordered to account for relative variations described by only one parameter.

### FL Distance

The FL distance between $$\mathbf{x}$$ to $$\overline{\mathbf{x_1}\mathbf{x_2}}$$ is derived as $d(\mathbf{x},\overline{\mathbf{x_1}\mathbf{x_2}})=\|\mathbf{x}-\mathbf{p}\|$ where $$\|\cdot\|$$ is some norm.

The projection point can be computed as $$\mathbf{p}=\mathbf{x_1}+\mu(\mathbf{x_2}-\mathbf{x_1})$$ where $$\mu\in\mathcal{R}\ ,$$ called the position parameter can be calculated from $$\mathbf{x},\mathbf{x_1}$$ and $$\mathbf{x_2}$$ as follows: Because the line segment $$\overline{\mathbf{p}\mathbf{x}}$$ is perpendicular to $$\overline{\mathbf{x_2}\mathbf{x_1}}\ ,$$ we have $$(\mathbf{p}-\mathbf{x})\cdot(\mathbf{x_2}-\mathbf{x_1})= [\mathbf{x_1}+\mu(\mathbf{x_2}-\mathbf{x_1})-\mathbf{x}]\cdot(\mathbf{x_2}-\mathbf{x_1})=0$$ where "$$\cdot$$" stands for dot product, and thus $$\mu={(\mathbf{x}-\mathbf{x_1})\cdot(\mathbf{x_2}-\mathbf{x_1}) \over (\mathbf{x_2}-\mathbf{x_1})\cdot(\mathbf{x_2}-\mathbf{x_1})}\ .$$ The parameter $$\mu$$ describes the position of $$\mathbf{p}$$ relative to $$\mathbf{x_1}$$ and $$\mathbf{x_2}\ .$$ When $$\mu=0\ ,$$ $$\mathbf{p}=\mathbf{x_1}\ .$$ When $$\mu=1\ ,$$ $$\mathbf{p}=\mathbf{x_2}\ .$$ When $$0<\mu<1\ ,$$ $$\mathbf{p}$$ is an interpolating point between $$\mathbf{x_1}$$ and $$\mathbf{x_2}\ .$$ When $$\mu>1\ ,$$ $$\mathbf{p}$$ is a forward extrapolating point on the $$\mathbf{x_2}$$ side. When $$\mu<0\ ,$$ $$\mathbf{p}$$ is a backward extrapolating point on the $$\mathbf{x_1}$$ side.

## Nearest Feature Line Classification Algorithm

Let $$\mathbf{x}^c_i$$ and $$\mathbf{x}^c_j$$ be two distinct FPs belonging to class $$c\ .$$

1. Calculate the FL distance between $$\mathbf{x}$$ of the query and each pair of feature line $$\overline{\mathbf{x}^c_i\mathbf{x}^c_j}\ ,$$ for each class $$c\ ,$$ and each pair $$i\neq j\ .$$ This yields a number of $$N_{total}$$ distances
2. Sort the distances in ascending order, each being associated with a class identifier, the two FPs and the corresponding $$\mu$$ value.
3. Define the nearest feature line distance as the rank 1 distance $d(\mathbf{x},\overline{\mathbf{x}^{c^*}_{i^*}\mathbf{x}^{c^*}_{j^*}})= \min_{1\le c\le M}\min_{1\le i < j\le N_c} d(\mathbf{x},\overline{\mathbf{x}^c_i\mathbf{x}^c_j})$

The rank 1 result, composed of the best matched class $$c^*\ ,$$ the nearest feature line distance, and the two best matched FPs $$i^*$$ and $$j^*$$ of the class, gives the nearest feature line classification. The position parameter $$\mu^*$$ of the first rank match, which indicates the position of the projection $$\mathbf{p}$$ of the query relative to $$\mathbf{x}^{c^*}_{i^*}$$ and $$\mathbf{x}^{c^*}_{j^*}\ ,$$ may be used to infer the relative position of $$\mathbf{x}\ .$$

## Results

Nearest feature line based methods have been successfully applied to various pattern classification applications, including face recognition (Li and Lu, 1999), image retrieval (Li et al., 2000), audio classification (Li, 2000), speaker recognition (Chen et al., 2002), iris recognition (Ma et al., 2002), prediction of protein locations (Gao and Wang, 2005). It improves classification performance, as compared to NN, especially when the number of templates per class is small. The results reported in (Li and Lu, 1999) (Li et al., 2000) (Li, 2000) are summarized below whereas other results can be found in the respective references.

For face recognition (Li and Lu, 1999), the nearest feature line is compared to the NN in terms of error rates on a combination of several face databases, using eigenface features (Turk and Pentland, 1991). The error rate of the nearest feature line method is between 55.6% and 65.4% of that of the NN method for one protocol, and between 43.7% and 48.3% of the latter for the other protocol.

For image retrieval (Li et al., 2000), an extensive experimental evaluation is carried out using the Brodatz texture database (Brodatz, 1966) and a general color image database. The evaluation shows that the nearest feature line produces consistently superior results over the NN-type search. For example, the nearest feature line achieves 90.00% retrieval efficiency as opposed to 74.37% of the NN search as reported in Manjunath and Ma (Manjunath and Ma, 1996) for the Brodatz database (when top 15 matches are considered). For the color image database, the nearest feature line has a 75% of increase in retrieval efficiency over the NN method (when top 20 matches are considered).

(Zhou and Chee 2005) reports experimental results comparing nearest feature line with NN and k-NN. The experiments are done on a simulated data set and 15 real-life benchmark data sets from the UCI machine learning repository. The simulated data set assumes 16 classes randomly generated from Gaussian distributions. The 15 real-life benchmark data sets from UCI machine learning repository are balance, image, ionosphere, iris, lenses, liver, newthyroid, pendigits, pima, sonar, soybean, spect, waveform, yeast, and zoo. The value k for k-NN are set to 5, 10 and 15. The evaluations are done by using leave-one-out tests. In all experiments, nearest feature line yields consistently higher accuracy rates of classifications than NN and k-NN.

## Extensions of Nearest Feature Line

Recently, several extensions of nearest feature line have been proposed. In a straightforward way, the concept of FLs can be generalized to nearest feature combinations (Li, 1998), where any number of intra-class FPs can be linearly combined, for example, feature planes determined by triplets of three FPs, with FLs as a special case of combining pairs of FPs. In (Chien and Wu, 2002), nearest feature line is extended to nearest feature space (NFS). In (Liu et al., 2004), the nearest feature line distance is extended by a nearest intra-class subspace derived by a regularized principal component analysis. In (Zhang et al., 2004), nearest feature line is extended to nearest manifold (NM). In (He, 2006), kernel method and subspace analysis are introduced to extend nearest feature line to nonlinear nearest feature line subspace. In (Pangetal., 2007), subspaces are constructed by nearest feature line distance of intra-class to achieve a desirable discriminating ability.

Theoretical analysis of nearest feature line is also developed from probability perspectives. Given infinite training data, the classification error rate of nearest feature line is equal to that of nearest neighbor (NN) method, and is always less than or equal to twice the Bayes rate (Zhou et al., 2000) (Orozco-Alzate and Castellanos-Domingue, 2007). It is shown that if the dimension of feature space is large enough, the error rate of nearest feature line is smaller than that of NN (Zhou et al., 2000).

## Previous Work

The nearest feature line has a relationship with the linear combination approach (Ullman and Basri, 1991), the latter being a shape-based approach for recognizing 3D objects from 2D images. It makes use of a linear combination of two instance templates in a feature space, whereas in (Ullman and Basri, 1991) a 3D object is represented by a linear combination of 2D boundary maps of the object. An object in the image is classified as belonging to a model object if it can be expressed as a linear combination of the views of the object for some range of coefficients.

A theory of view-based object recognition is presented in (Poggio and Edelman, 1990). It is based on the observation that the views of a shape-based 3D rigid object undergoing transformation such as rotation reside in a smooth low-dimensional manifold embedded in the space of coordinates of points attached to the object; and for the object, there exists a smooth transformation function which can map any perspective view into another view of the object. Further, it is also demonstrated that this transformation function can be approximated from a small number of views of the object. The theory is further demonstrated in (Edelman and Duvdevani-Bar, 1997) on a variety of objects, and its application is extended from recognition to categorization. However, object recognition in those studies is based on shape information alone; variations in illumination and texture of objects and non-rigid shape changes, crucial issues for face recognition, are not dealt with.

In (Vetter and Poggio, 1997), a technique is presented to synthesize a new image of an object from a single 2D view of the object using a linear combination of images of template objects of the same class, provided that the object belongs to linear object classes. This approach avoids the use of 3D models for the view synthesis and is capable of generating a new view of a 3D object from a single 2D view of the object, using both shape and texture information. The technique requires correspondence between all landmarks of template images and between the new image and one of the template images.

It is proven in (Shashua, 1992) that with ideal point light sources, the brightness of a new image at a point can be expressed as a linear combination of the brightness of three template images at the corresponding point, when the viewpoint is fixed and the images are subject to variations in illumination only. This suggests that variations in illumination can be compensated for prior to recognition, by finding the underlying linear combination according to the brightness at the corresponding point in the images, expressing the new image as the linear combination of the three images, and then matching along all the non-shadowed corresponding points.

The feature line can be considered as a simpler version of the spline type manifold of the parametric appearance representation (Murase and Nayar, 1995). There, the appearance manifold of an object is constructed from images of the object taken on a turnable (parameterized by a single parameter) under carefully controlled lighting (parameterized by another single parameter). However, such strictly controlled conditions are difficult to meet in acquiring face images. The nearest feature line provides a simple yet effective solution.

## References

• Bichsel, M. and Pentland, A.P. (1994). "Human face recognition and the face image set's topology". CVGIP: Image Understanding, 59:254-261.
• Brodatz, P. (1966). "Textures: A photographic album for artists and designers". Dover Publications, Inc., New York.
• Chen, K., Wu, T. Y., and Zhang, H. J. (2002). "On the use of nearest feature line for speaker identification". Pattern Recognition Letters, 23:41735-1746.
• Chien, J. T. and Wu, C. C. (2002). "Discriminant waveletfaces and nearest feature classifiers for face recognition". IEEE Transactions on Pattern Analysis and Machine Intelligence,(12):1644-1649.
• Edelman, S. and Duvdevani-Bar, S. (1997). "A model of visual recognition and categorization". Proceedings of Royal Society, London, B-352:1191-1202.
• Gao, Q. B. and Wang, Z. Z. (2005). "Using nearest feature line and tunable nearest neighbor methods for prediction of protein subcellular locations". Comput. Biol. Chem, pages 388-392.
• He, Y. H. (2006). "Face recognition using kernel nearest feature classifiers". In Proceedings of Intl. Conf. on Computational Intelligence and Security, pages 678-683.
• Li, S. Z. (1998). "Face recognition based on nearest linear combinations". In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 839-844, Santa Barbara, CA.
• Li, S. Z. (2000). "Content-based classification and retrieval of audio using the nearest feature line method". IEEE Transactions on Speech and Audio Processing.
• Li, S. Z., Chan, K. L., and Wang, C. L.(2000). "Performance evaluation of the nearest feature line method in image classification and retrieval". IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11)1335:1339.
• Li, S. Z. and Lu, J. (1998). "Generalizing capacity of face database for face recognition". In Proceedings of Third IEEE International Conference on Automatic Face and Gesture Recognition,pages 402-405, Nara, Japan.
• Li, S. Z. and Lu,J. (1999). "Face recognition using the nearest feature line method". IEEE Transactions on Neural Networks,10(2):439-443.
• Liu, W., Wang, Y. H., Li, S. Z., and Tan, T. N.(2004). "Nearest intra-class space classifier for face recognition". In Proceedings of International Conference on Pattern Recognition, pages 495- 498.
• Ma, L., Wang, Y., and Tan, T. (2002). "Iris recognition using circular symmetric filters". In Proceedings of International Conference on Pattern Recognition, pages 414-417.
• Manjunath, B. S. and Ma, W. Y. (1996). "Texture features for browsing and retrieval of image data". IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8):837-842.
• Murase, H. and Nayar, S. K. (1995). "Visual learning and recognition of 3-D objects from appearance". International Journal of Computer Vision, 14:5-24.
• Orozco-Alzate, M. and Castellanos-Domingue, G. (2007). "Nearest Feature Rules and Dissimilarity Representations for Face Recognition Problems", pages 337-356. ISBN 978-3-902613-03-5,Vienna, Austria.
• Pang, Y., Yuan, Y., and Li, X. (2007). "Generalized nearest feature line for subspace learning". IEE Electronics Letters, pages 1079-1080.
• Poggio, T. and Edelman, S. (1990). "A network that learn to recognize three-dimensional objects". Nature, 343:263-266.
• Shashua, A. (1992). "Illumination and view position in 3D visual recognition". InMoody,J., Hanson,S.J., and Lippman,R.L., editors, Neural Information Processing Systems, volume 4, pages 404-411, SanMateo, CA. Morgan Kaufmann.
• Turk, M. A. and Pentland, A. P. (1991). "Eigenfaces for recognition". Journal of Cognitive Neuroscience, 3(1):71-86.
• Ullman, S. and Basri, R. (1991). "Recognition by linear combinations of models". IEEE Transactions on Pattern Analysis and Machine Intelligence, 13:992-1006.
• Vetter, T. and Poggio, T. (1997). "Linear object classes and image synthesis from a single example image". IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):733-742.
• Zhang, J., Li, S. Z., and Wang, J. (2004). "Nearest manifold approach for face recognition". In Proceedings of Sixth IEEE International Conference on Automatic Face and Gesture Recognition, pages 223- 228, Seoul, Korea.
• Zhou, Z. L., Li, S. Z., and Chan, K. (2000). "A theoretical justification of nearest feature line method". In Proceedings of International Conference on Pattern Recognition, pages 759-762.
• Zhou, Z. L. and Chee, K. K.(2005) "The Nearest Feature Midpoint - A Novel Approach for Pattern Classification", International Journal of Information Technology, Vol.11 No.1, page 1-15.

Internal references