Clustering by graph density variation analysis (GDVA) with density-based cluster validity indices (DVI)

Le, Thang Viet

doi:doi:10.7282/T3610ZD8

RUcore: Rutgers University Community Repository

Search
- All
- Text
- Images
- Audio
- Video
Advanced Search | Help

Search all content in all RUcore collections.
Services
Collections

Help Contact Us My Account

Home

Resource

Clustering by graph density variation analysis (GDVA) with density-based cluster validity indices (DVI)

PDF

PDF format is widely accepted and good for printing.

Plug-in required

PDF-1(1.74 MB)

Citation & Export

View Usage Statistics

Staff View

Citation & Export
Hide

Simple citation

Le, Thang Viet. Clustering by graph density variation analysis (GDVA) with density-based cluster validity indices (DVI). Retrieved from https://doi.org/doi:10.7282/T3610ZD8

Export

Click here for information about Citation Management Tools at Rutgers.

Statistics
Hide

Description

TitleClustering by graph density variation analysis (GDVA) with density-based cluster validity indices (DVI)

NameLe, Thang Viet (author); Kulikowski, Casimir (chair); Elgammal, Ahmed (internal member); Pavlovic, Vladimir (internal member); Statnikov, Alexander (outside member); Rutgers University; Graduate School - New Brunswick

Date Created2011

Other Date2011-10 (degree)

SubjectComputer Science, Cluster analysis—Computer programs

Extentx, 112 p. : ill.

DescriptionCluster analysis is an important problem in data mining and machine learning. In reality, clustering has been a useful exploratory technique for many different applications such as image segmentation, document grouping, gene expression data analysis, paralog/ortholog detection, etc. Typically, clustering algorithms seek to partition a data set into subsets such that data objects in each subset are very similar, while data objects from different subsets are not so similar. Graph-based clustering methods partition proximity graphs, where nodes represent objects and edge weights indicate pairwise similarities, into subgraphs such that nodes of each subgraph are strongly connected, while nodes of different subgraphs are weakly connected. This thesis presents a computationally efficient clustering method based on graph density variation analysis, where the density of an object with respect to a set of objects is defined as its average similarity to the objects of the set. The method partitions a proximity graph using the assumption that each cluster has a core region composed of high-density nodes which are strongly interconnected by edges that have a high and comparable weight. Cores of clusters are identified by extracting nodes from the core regions, and then expanded by assigning non-core nodes to the most similar cluster. Novel density-based cluster validity indices are incorporated to measure the quality of clustering solutions so that parameter values can be determined automatically. For a direct and objective evaluation and comparison with other graph clustering techniques, random graphs are generated as input data and the results of clustering are matched against a known ground truth. The method is robust to noise and has a low complexity with two main parameters which can be used to adjust the granularity of results. Experiments show that the method is less dependent on parameters if clusters in the data are well-separated or have a dense cluster core. Good results on heterogeneous data such as multi-dimensional data, images, text and gene expression data demonstrate the effectiveness and versatility of the method.

NotePh.D.

NoteIncludes bibliographical references

Noteby Thang Viet Le

Genretheses, ETD doctoral

Persistent URLhttps://doi.org/doi:10.7282/T3610ZD8

Languageeng

CollectionGraduate School - New Brunswick Electronic Theses and Dissertations

Organization NameRutgers, The State University of New Jersey

RightsThe author owns the copyright to this work.

Version 8.5.5

Citation & ExportHide

Simple citation

Export

StatisticsHide

Description

Citation & Export
Hide

Statistics
Hide