Probabilistic distance clustering

Iyigun, Cem

doi:10.7282/T3JW8F81

RUcore: Rutgers University Community Repository

Simple citation

Iyigun, Cem. Probabilistic distance clustering. Retrieved from https://doi.org/doi:10.7282/T3JW8F81

Description

Uniform Title: Probabilistic distance clustering

Name: Iyigun, Cem (author); Prekopa, Andras (chair); Boros, Endre (internal member); Ben-Israel, Adi (dissertation committee member); Chaovalitwongse, W. Art (internal member); Arav, Marina (outside member); Rutgers University; Graduate School - New Brunswick

Date Created: 2008

Other Date: 2008-01 (degree)

Subject: Operations Research, Cluster analysis

Extent: xiii, 124 pages

Description: We present a new iterative method for probabilistic clustering of data. Given clusters, their centers, and the distances of data points from these centers, the probability of cluster membership at any point is assumed inversely proportional to the distance from (the center of) the cluster in question. This assumption is our working principle.
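As a worked illustration of the working principle, the inverse proportionality can be turned into explicit membership probabilities. The notation here ($K$ clusters, distances $d_k(x)$, probabilities $p_k(x)$, and the point-dependent constant $D(x)$) is assumed for this sketch and does not appear in the abstract.

```latex
% Assumed notation: d_k(x) > 0 is the distance of point x from the center of cluster k,
% p_k(x) is the probability that x belongs to cluster k, and D(x) depends on x only.
p_k(x)\, d_k(x) = D(x), \quad k = 1, \dots, K,
\qquad \sum_{k=1}^{K} p_k(x) = 1
\;\Longrightarrow\;
p_k(x) = \frac{1/d_k(x)}{\sum_{j=1}^{K} 1/d_j(x)} .
% Example with K = 2, d_1(x) = 1, d_2(x) = 3:
% p_1(x) = \frac{1}{1 + 1/3} = \frac{3}{4}, \qquad p_2(x) = \frac{1/3}{1 + 1/3} = \frac{1}{4}.
```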
The method is a generalization, to several centers, of the Weiszfeld method for solving the Fermat-Weber location problem. At each iteration, the distances (Euclidean, Mahalanobis, etc.) from the cluster centers are computed for all data points, and the centers are updated as convex combinations of these points, with weights determined by the above principle. Computations stop when the centers stop moving.
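The iteration described above might be sketched as follows. This is a minimal illustration rather than the thesis's implementation: it assumes Euclidean distances, and it assumes weights of the form `p**2 / d` for the center update, a choice the abstract does not state. The function name `pdc_sketch` and its parameters are hypothetical.

```python
import numpy as np

def pdc_sketch(X, centers, tol=1e-6, max_iter=200, eps=1e-12):
    """Illustrative probabilistic distance clustering loop (Euclidean distances)."""
    X = np.asarray(X, dtype=float)
    C = np.asarray(centers, dtype=float).copy()
    for _ in range(max_iter):
        # Distances of every point from every center, shape (n_points, n_clusters).
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        d = np.maximum(d, eps)  # guard against a point sitting exactly on a center
        # Membership probabilities inversely proportional to distance (rows sum to 1).
        inv = 1.0 / d
        p = inv / inv.sum(axis=1, keepdims=True)
        # Assumed weights u = p**2 / d; normalizing each column makes every
        # new center a convex combination of the data points.
        u = p ** 2 / d
        w = u / u.sum(axis=0, keepdims=True)
        C_new = w.T @ X
        shift = np.linalg.norm(C_new - C)
        C = C_new
        if shift < tol:  # stop when the centers stop moving
            break
    # p holds the probabilities computed in the last iteration performed.
    return C, p
```

With a single center the probabilities are all 1 and the assumed weights reduce to `1 / d`, which is the classical Weiszfeld update for the Fermat-Weber problem.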
Progress is monitored by the joint distance function (JDF), a measure of distance from all cluster centers, that evolves during the iterations, and captures the data in its low contours.
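One way to picture the JDF is as the common value of probability times distance at each point, which works out to the reciprocal of the summed inverse distances. The sketch below assumes that form and Euclidean distances; the function name `joint_distance` is hypothetical.

```python
import numpy as np

def joint_distance(X, centers, eps=1e-12):
    """Assumed JDF form: D(x) = 1 / sum_k (1 / d_k(x)), evaluated at each row of X."""
    X = np.asarray(X, dtype=float)
    C = np.asarray(centers, dtype=float)
    # Euclidean distances, shape (n_points, n_clusters), floored to avoid division by zero.
    d = np.maximum(np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2), eps)
    return 1.0 / (1.0 / d).sum(axis=1)

# For two centers this equals d1 * d2 / (d1 + d2); it is near zero at any center,
# so data concentrated around the centers sit in the low contours.
values = joint_distance([[0.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [3.0, 0.0]])
```

Summing or averaging these values over the data set gives a single number that can be tracked from one iteration to the next.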
There are problems where the cluster sizes are given (as in capacitated facility location problems) and problems where the cluster sizes are unknowns to be estimated. The probabilistic distance clustering approach works well in both cases. The probabilistic distance clustering method adjusted for cluster size (the PDQ method) is described and applied to location problems and to mixtures of distributions, where it is a viable alternative to the EM method.
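A rough sketch of how cluster sizes could enter the membership probabilities is given below. It assumes the size-adjusted principle takes the form "probability times distance divided by cluster size is constant at each point", which the abstract does not spell out; the function name `pdq_probabilities` and the arguments `d` (distances) and `q` (cluster sizes) are hypothetical. Estimating unknown sizes is not covered here.

```python
import numpy as np

def pdq_probabilities(d, q):
    """Assumed size-adjusted rule: p_k proportional to q_k / d_k."""
    d = np.asarray(d, dtype=float)  # distances from each center, last axis = clusters
    q = np.asarray(q, dtype=float)  # cluster sizes (given, or current estimates)
    r = q / d
    return r / r.sum(axis=-1, keepdims=True)

# Equal distances: the larger cluster is favored in proportion to its size.
p_sized = pdq_probabilities([1.0, 1.0], [3.0, 1.0])
# Equal sizes: the rule falls back to plain inverse-distance probabilities.
p_plain = pdq_probabilities([1.0, 3.0], [1.0, 1.0])
```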
The method is simple, fast (requiring a small number of cheap iterations) and insensitive to outliers.
An important issue in clustering is the "right" number of clusters that best fits a data set. The JDF is used successfully to settle this issue and determine the correct number of clusters for a given data set.

Note: Ph.D.

Note: Includes bibliographical references (p. 117-122).

Genre: theses, ETD doctoral

Persistent URL: https://doi.org/doi:10.7282/T3JW8F81

Language: English

Collection: Graduate School - New Brunswick Electronic Theses and Dissertations

Organization Name: NjNbRU

Rights: The author owns the copyright to this work.
