Locally linear embedding algorithm : extensions and applications
Kayo, Olga (2006-04-25)
Raw data sets taken with various capturing devices are usually multidimensional and need to be preprocessed before applying subsequent operations, such as clustering, classification, outlier detection, noise filtering etc. One of the steps of data preprocessing is dimensionality reduction. It has been developed with an aim to reduce or eliminate information bearing secondary importance, and retain or highlight meaningful information while reducing the dimensionality of data.
Since the nature of real-world data is often nonlinear, linear dimensionality reduction techniques, such as principal component analysis (PCA), fail to preserve a structure and relationships in a highdimensional space when data are mapped into a low-dimensional space. This means that nonlinear dimensionality reduction methods are in demand in this case. Among them is a method called locally linear embedding (LLE), which is the focus of this thesis. Its main attractive characteristics are few free parameters to be set and a non-iterative solution avoiding the convergence to a local minimum. In this thesis, several extensions to the conventional LLE are proposed, which aid us to overcome some limitations of the algorithm. The study presents a comparison between LLE and three nonlinear dimensionality reduction techniques (isometric feature mapping (Isomap), self-organizing map (SOM) and fast manifold learning based on Riemannian normal coordinates (S-LogMap) applied to manifold learning. This comparison is of interest, since all of the listed methods reduce high-dimensional data in different ways, and it is worth knowing for which case a particular method outperforms others.
A number of applications of dimensionality reduction techniques exist in data mining. One of them is visualization of high-dimensional data sets. The main goal of data visualization is to find a one, two or three-dimensional descriptive data projection, which captures and highlights important knowledge about data while eliminating the information loss. This process helps people to explore and understand the data structure that facilitates the choice of a proper method for the data analysis, e.g., selecting simple or complex classifier etc. The application of LLE for visualization is described in this research.
The benefits of dimensionality reduction are commonly used in obtaining compact data representation before applying a classifier. In this case, the main goal is to obtain a low-dimensional data representation, which possesses good class separability. For this purpose, a supervised variant of LLE (SLLE) is proposed in this thesis.
- Avoin saatavuus