Pattern Separation and Prediction via Linear and Semidefinite Programming

Xing LIU
NewDay Financial LLC 8171
Maple Lawn Blvd, Suite 300 Fulton, MD 20759, USA

Florian A. POTRA
Department of Mathematics and Statistics
University of Maryland Baltimore County, 1000 Hilltop Circle, Baltimore, MD 22150, USA

This paper is dedicated to Neculai Andrei on his sixtieth birthday

Abstract: We present several optimization methods for separating two sets of points in the n-dimensional space that have nondisjoint convex closures. We give five methods based on linear programming techniques, and two methods based on semidefinite programming techniques. For predictive purposes, we construct two parallel hyperplanes, using linear programming, or two similar concentric ellipsoids, using semidefinite programming, so that the intersection of the convex hulls of the two sets is contained between the two hyperplanes, or the two ellipsoids, and the rest of the two sets, are separated completely. We then construct another hyperplane (or ellipsoid) situated between the two constructed hyperplanes (or ellipsoids), in order to achieve a superior pattern separation. We illustrate our methods on two breast cancer databases.

Keywords: Pattern separation, data mining, linear programming, semidefinite programming.

Xing Liu is a senior research scientist at Newday financial LLC located in Fulton, Maryland. He is leading the analytical effort in the marketing department, in charge of developing the core analytical tools, mathematical models, and information driven marketing platforms. Dr. Xing Liu has a Ph.D. degree in numerical optimization from the University of Maryland, Baltimore County, and has published several research papers in professional journals.

Florian A. Potra is a Professor of Mathematics and Statistics at the University of Maryland, Baltimore County, and a faculty appointee at the National Institute of Standards and Technology. He has published over 100 research papers in professional journals, and he serves on the editorial board of four well known mathematical journals. He was invited to give talks and work with research groups in different countries all over the world, including Germany, France, Switzerland, Italy, Belgium, Spain, the Netherlands, Australia, Singapore, Japan and Canada.

>>Full text
CITE THIS PAPER AS:
Xing LIU, Florian A. POTRA, Pattern Separation and Prediction via Linear and Semidefinite Programming, Studies in Informatics and Control, ISSN 1220-1766, vol. 18 (1), pp. 71-82, 2009.

1. Introduction

Pattern separation, also known as pattern recognition, is one of the main machine learning problems, originally developed by Vapnik and co-workers [16,17]. It is also one of the fundamental problems in data mining [2,3,7]. The support vector machine approach, and sophisticated optimization techniques, have been proved to be very efficient for solving this problem, as shown by the pioneering work of O. L. Mangasarian and others [2,4,7,8,9,10,11]. In the machine

learning process, we need to consider set separation problems where each set contains only one pattern, which is assigned by the supervisor. It is known that the multi-set separation problem can be tackled by solving a sequence of two-set separation problems [14], so that in this paper, we limit our attention to the two-set separation problem, which can be stated as follows: Suppose we have two sets of points and in the -dimensional real space . We want to find a partition of , that separates the points in X from the points in Y. This separation is then used for predictive purposes. Our numerical experiments have been performed on two breast cancer data sets. In those examples, X represents patients with malignant tumors, and Y represents patients with benign tumors. The coordinates of the vectors in X and Y represent different measurements. For example in the data set [1,11,12,20,18] considered in section 5 of the present paper, we have 9 such attributes: Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli, and Mitosis, which makes our problem a two-set separation problem in a 9-dimensional real space.

An elegant approach to separating X and Y, is by finding a separating hyperplane in, such that all vectors in X lie on one side of this hyperplane and all the vectors in Y lie on the opposite side. A separating hyperplane is defined by a pair (a, ), where a is an n-dimensional vector and is a scalar, such that

for all (1)

for all (2)

If the convex hulls of X and Y are disjoint, such a separating hyperplane always exists. Finding the vector a and the scalar may be important for predictive purposes. Relations (1) and (2) say that a certain linear combination of the coordinates of all vectors in X is below a certain threshold, while the same linear combination of the coordinates of the vectors in Y is above this threshold. For the first breast cancer example, this would imply that a certain linear combination of the 9 attributes can separate malignant from benign tumors.

The coefficients of the linear combination (i.e., the coordinates of the vector a) may reveal important facts for cancer research. Once a and are obtained from the existing data base, they can be used for predictive purposes: after determining the attributes z of a new patient we may deduce that the patient is likely to have a malignant tumor if , and a benign tumor if .

We have to mention that conditions (1) and (2) do not determine the separating hyperplane (a, ) in a unique way. An optimal way of determining a and , suggested in previous work [4,7,16], is to compute them as the solution of the following minimization problem

(3)

for all (4)

for all (5)

The solution of this optimization problem, which is known as the maximal margin classifier [16,4], provides the strongest possible separation, in a certain sense. It maximizes the distance, in the dual norm ‘, between the hyperplanes (a, ) and (a, ), which is the geometric margin [4]. If we choose to be the -norm (and therefore ‘ the -norm), the exact solution of (3) can be obtained by solving 2n linear programs.

Unfortunately, in most practical applications the convex hulls of X and Y intersect, so that no separating hyperplane exists. Generalizations of the above approach were made in order to deal with this situation [9,16]. A hyperplane (a, ) is determined in such a way that most points of X satisfy (2), most points of Y satisfy (1), and the distance of the misclassified points (i.e., the points of X that do not satisfy (1) and the points of Y that do not satisfy (2)) to the hyperplane is minimized. A similar approach is used in dealing with the formulation (3)-(5), by adding to the objective function in (3) a multiple of the distance of the misclassified points (i.e., the points of X that do not satisfy (4) and the points of Y that do not satisfy (5)) to the corresponding hyperplane).

In the present paper, rather than directly trying to minimize the number of the misclassified points to a single hyperplane, we propose to separate the sets X and Y by using two parallel hyperplanes and with the following properties:

(P1) All points of X lie on one side of H₁.

(P2) All points of Y lie on the opposite side of H₂.

(P3) The intersection of the convex hulls of X and Y is contained in region C between the hyperplanes H₁ and H₂.

The points in C are called unclassified points. We would like to determine the hyperplanes and such that the number of unclassified points is small. Mangasarian [11] and Falk [5] proposed similar approaches for the two-set separation problem, with hyperplanes having the same partition properties. In the present paper, the two hyperplane separation is only used for predictive purposes: after taking measurement z of a new patient, we conclude that the patient is likely to have a malignant tumor if z is on one side of , or we conclude that the patient is likely to have a benign tumor if z is on the other side of . If z is between and , we conclude that the patient needs further investigation. For the purpose of pattern separation, we construct a hyperplane , parallel to (and lying between) and , such that most of the points of X lie on the same side () of and most of the points of Y lie on the opposite side () of . A point that fails to do so, i.e., belonging to , is called a misclassified point. We want to construct so that the number of misclassified points is minimized.

In section 3 we propose several approaches for obtaining hyperplanes satisfying (P1)–(P3), and the hyperplane lying between that minimizes the number of misclassified points. All our formulations are solvable and require the solution of at most 2n linear programs [15] plus a linear search procedure, where n is the dimension of the underlying space.

In section 4 we investigate the possibility of using ellipsoids, instead of hyperplanes, for solving the separation problem. More precisely, in the first part of our construction, which is used for predictive purposes, we want to determine two similar ellipsoids (in the sense that they have the same center and shape) , such that either all points of X lie inside and all points of Y lie outside , or all points of Y lie inside and all points of X lie outside . In either case, the intersection of the convex hulls of X and Y is contained between and . The points lying between and belong to either X or Y and are called unclassified points. The objective is to determine the ellipsoids and such that the number of unclassified points is small. The second part of our construction, which is used for pattern separation, is to find a third ellipsoid, with the same shape and center, lying between the two ellipsoids found in the first part, so that the number of misclassified points is minimized. Francois Glineur, in his master thesis [6], had a similar approach for the two ellipsoids separation problem. He claimed that his formulation is not convex and cannot be solved using SQL conic programming. Then, he used some elegant transformations to change the it into a n+1-dimensional problem using homogeneous ellipsoids, and solved the problem. In the present paper, we use a different formulation of the ellipsoids, and reformulate the problem in the first part of our construction into a semidefinite problem, which can be solved in polynomial time by interior point methods. The availability of polynomial time solvers for the resulting problem, was in fact our main motivation for using ellipsoids for the separation problem. Moreover, the availability of reliable interior point methods for semidefinite programming [21], makes this approach feasible for moderate to high dimensional problems. For the second part of our construction, we use linear search techniques to obtain the third ellipsoid, for which the number of misclassified points is minimized.

We note that an ellipsoid can be determined by a triple (A,a, ), where A is a positive semidefinite matrix, a is a vector, and is a scalar. A point x is inside if

If A has entries, and a has coordinates, then the above relation can be written as

If the matrix A is the zero matrix, then the ellipsoid reduces to a hyperplane. Thus, separation by ellipsoids is more general. The coefficients and may contain important information about our data. As mentioned before, we will choose and to be similar in the sense that and , with . Thus, separating by ellipsoids implies that a certain quadratic combination of most vectors in X is less than (or greater than) the same quadratic combination of most vectors in Y.

We should also notice that according to our constructions, the problem is always feasible. The two hyperplanes (or ellipsoids), to be used for prediction purposes, always exist, since in the worst case scenario all the points from both X and Y are between the hyperplanes (or the ellipsoids). Once we have the two hyperplanes (or the two ellipsoids), it is obvious that there is an optimal solution between them, which can be found by linear search.

We note that by using the classical support vector machine approach, we could obtain different types of separation surfaces (decision boundaries). However, we feel that it is very important to be able to give an intuitive meaning of the separation surface for predictive purposes. We choose to work on hyperplane and ellipsoid separations, not only because they can be solved efficiently by sophisticated modern optimization techniques, but also because the meaning of these separation surfaces is easier understood by doctors and biologists. A high value for a coefficient in the vector defining the separating hyperplane may imply that the corresponding feature of the patient is relevant for breast cancer diagnosis. Similarly a high value for the coefficient in the matrix defining the separating ellipsoid, may mean that the interaction between features i and j is relevant for breast cancer.

In the last section of this paper we will apply both separation by hyperplanes and ellipsoids to two breast cancer data sets. The cross-validation results for all the formulations will be given.

REFERENCES

K. P. Bennett and O. L. Mangasarian, Robust linear programming discrimination of two linearly inseparable sets, In Optimization Methods and Software 1, Gordon & Breach Science Publishers, 1992, pp. 23-34.
P. S. Bradley, Usama M. Fayyad, O. L. Mangasarian, Mathematical programming for data mining: formulations and challenges, INFORMS J. Comput., 11(3):217-238, 1999.
P.S. Bradley, O. L. Mangasarian, and W. N. Street, Feature selection via mathematical programming, INFORMS J. Comput., 10(2):209-217, 1998.
N. Cristianini and J. Shawe-Taylor, Support Vector Machines, Cambridge University Press, Cambridge, United Kingtom, 2000.
J. E. Falk and E. Lopez-Cardona, The surgical separation of sets, INFORMS J. Comput., 11:433-462, 1997.
F. Glineur, Pattern separation via ellipsoids and conic programming, Mémoire de D.E.A. (Master’s thesis), Faculté Polytechnique de Mons, Belgium, September, 1998.
Y.-J. Lee, O.L. Mangasarian, and W. H. Wolberg, Breast cancer survival and chemotherapy: a support vector machine analysis, In Discrete mathematical problems with medical applications (New Brunswick, NJ, 1999), pages 1-10. Amer. Math. Soc., Providence, RI, 2000.
O. L. Mangasarian, Misclassifi-cation minimization. J. Global Optim., 5(4):309-323, 1994.
O. L. Mangasarian, Arbitrary-norm separating plane, Oper. Res. Lett., 24(1-2):15-23, 1999.
O. L. Mangasarian, Support vector machine classification via parameterless robust linear programming, Optim. Methods Softw., 20(1):115-125, 2005.
O. L. Mangasarian, R. Setiono, and W. H. Wolberg, Pattern recognition via linear programming: theory and application to medical diagnosis, In Large-scale numerical optimization (Ithaca, NY, 1989), SIAM, Philadelphia, PA, 1990 pp. 22-31.
O. L. Mangasarian and W. H. Wolberg, Cancer diagnosis via linear programming, SIAM News, 23(5):1&18, September, 1990.
J. F. Sturm, Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones, Optim. Methods Software, 11 / 12 (1-4) : 625-653, 1999. Interior point methods.
J. Ullman, Pattern Recognition Techniques, Crane, London, 1973.
R. J. Vanderbei, Linear programming: foundations and extensions, volume 4 of International Series in Operations Research & Management Science, Kluwer Academic Publishers, Boston, MA, 1996.
V. N. Vapnik, The nature of statistical learning theory, Springer-Verlag, New York, 1995.
V. N. Vapnik. Statistical learning theory, Adaptive and Learning Systems for Signal Processing, Communications, and Control. John Wiley & Sons Inc., New York, 1998. A Wiley-Interscience Publication.
W. H. Wolberg, Wisconsin breast cancer database, 1991.
W. H. Wolberg and O. L. Mangasarian, Wisconsin diagnostic breast cancer data set, 1995.
W. H. Wolberg and O. L. Mangasarian, Multisurface method of pattern separation for medical diagnosis applied to breast cytology, In Proceedings of the National Academy of Sciences, U.S.A., volume 87, December, 1990, pp. 9193-9196.
H. Wolkowicz, R. Saigal, and L. Vandenberghe, editors. Handbook of semidefinite programming, International Series in Operations Research & Management Science, 27. Kluwer Academic Publishers, Boston, MA, 2000. Theory, algorithms, and applications.