DISTANCE_MEASURE
The DISTANCE_MEASURE function computes the pairwise distances between a set of items or observations. It is designed to be used with the CLUSTER_TREE function.
This routine is written in the IDL language. Its source code can be found in the file distance_measure.pro in the lib subdirectory of the IDL distribution.
Syntax
Result = DISTANCE_MEASURE( Array [, /DOUBLE] [, /MATRIX] [, MEASURE=value] [, POWER_MEASURE=value] )
Return Value
The Result is a vector of m*(m-1)/2 elements containing the distance matrix in compact form. Given a distance between two items, D_{i,j}, the distances within Result are returned in the order: [D_{0,1}, D_{0,2}, ..., D_{0,m-1}, D_{1,2}, ..., D_{m-2,m-1}].
If MATRIX is set, then the Result is an m-by-m symmetric array containing the full distance matrix, with zeroes down the diagonal.
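The compact ordering can be made concrete with a small index calculation. The following is a minimal sketch, not part of the IDL library: compact_index is a hypothetical helper that returns the position of the distance between items i and j (with i < j) inside the compact vector for m items.

```idl
; Hypothetical helper (not part of IDL): position of D[i,j], i < j,
; in the compact vector returned for m items.
FUNCTION compact_index, i, j, m
  COMPILE_OPT IDL2
  ; Rows 0..i-1 contribute (m-1) + (m-2) + ... entries before row i.
  RETURN, i*m - (i*(i + 1))/2 + (j - i - 1)
END

; Main-level program: compare the compact and full forms.
data = [[1, 1], [1, 3], [2, 2.2], [4, 1.75]]   ; 2-by-4: four points
m = 4
d = DISTANCE_MEASURE(data)                     ; m*(m-1)/2 = 6 elements
dfull = DISTANCE_MEASURE(data, /MATRIX)        ; 4-by-4 symmetric array
k = compact_index(1, 3, m)                     ; k is 4 for this pair
PRINT, d[k], dfull[1, 3]                       ; the two values should agree
END
```

Saved in a .pro file and run, both printed values should be the distance between items 1 and 3.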
Arguments
Array
An n-by-m array representing the coordinates (in an n-dimensional space) of m items. For example, a set of m points in a two-dimensional Cartesian space would be passed in as a 2-by-m array.
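As an illustration of this layout, the sketch below builds a 2-by-3 array holding three points in a plane. The variable name points and the coordinates are made up for the example.

```idl
; Three points in a two-dimensional space: each [x, y] pair is one item.
points = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
PRINT, SIZE(points, /DIMENSIONS)   ; 2 columns (n) by 3 rows (m)
d = DISTANCE_MEASURE(points)
PRINT, d                           ; Euclidean distances: 5, 10, 5
```

If the data are stored the other way around (m-by-n), TRANSPOSE can be applied before the call.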
Keywords
DOUBLE
Set this keyword to perform computations using double-precision arithmetic and to return a double-precision result. Set DOUBLE=0 to use single-precision for computations and to return a single-precision result. The default is /DOUBLE if Array is double precision, otherwise the default is DOUBLE=0.
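A quick way to check which precision is in effect is to inspect the type of the result. This is a small sketch with made-up input; the type names in the comments are what the description above implies.

```idl
pts = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]           ; single-precision input
PRINT, SIZE(DISTANCE_MEASURE(pts), /TNAME)           ; FLOAT (default)
PRINT, SIZE(DISTANCE_MEASURE(pts, /DOUBLE), /TNAME)  ; DOUBLE (forced)
PRINT, SIZE(DISTANCE_MEASURE(DOUBLE(pts)), /TNAME)   ; DOUBLE (follows input)
```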
MATRIX
Set this keyword to return the distance matrix as an m-by-m symmetric array. If this keyword is not set then the distance matrix is returned in compact vector form.
MEASURE
Set this keyword to an integer giving the distance measure (the metric) to use. Possible values are:
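In the definitions below, x and y are two items, each with n coordinates x_{i} and y_{i}, and D is the distance between them.
Value    Distance
0        (Default): Euclidean distance. The Euclidean distance is defined as: D = \sqrt{\sum_{i} (x_{i} - y_{i})^{2}}
1        CityBlock (Manhattan) distance. The CityBlock distance is defined as: D = \sum_{i} |x_{i} - y_{i}|
2        Chebyshev distance. The Chebyshev distance is defined as: D = \max_{i} |x_{i} - y_{i}|
3        Correlative distance. The correlative distance, where r is the correlation coefficient between two items, is defined as: D = \sqrt{(1 - r)/2}
4        Percent disagreement. This distance is useful for categorical data and is defined as: D = (Number of x_{i} ≠ y_{i})/n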

Note: This keyword is ignored if POWER_MEASURE is set.
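To see how the choice of metric changes the result, the first three measures can be compared on the same data. This is a minimal sketch with made-up points; the expected values in the comments follow from the standard definitions of the Euclidean, CityBlock, and Chebyshev distances.

```idl
; Three made-up points: (0,0), (3,4), (6,8)
pts = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
PRINT, DISTANCE_MEASURE(pts, MEASURE=0)   ; Euclidean: 5, 10, 5
PRINT, DISTANCE_MEASURE(pts, MEASURE=1)   ; CityBlock: 7, 14, 7
PRINT, DISTANCE_MEASURE(pts, MEASURE=2)   ; Chebyshev: 4, 8, 4
```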
POWER_MEASURE
Set this keyword to a scalar or a two-element vector giving the parameters p and r to be used in the power distance, defined as:
D = (\sum_{i} |x_{i} - y_{i}|^{p})^{1/r}
If POWER_MEASURE is a scalar then the same value is used for both p and r (this is also known as the Minkowski distance).
Note: POWER_MEASURE=1 is the same as the CityBlock distance, while POWER_MEASURE=2 is the same as Euclidean distance.
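The equivalences in the note can be checked directly. The sketch below uses made-up points and assumes that a two-element vector is ordered [p, r], with each coordinate difference raised to the power p and the sum raised to the power 1/r. Under that assumption, POWER_MEASURE=[2, 1] gives the squared Euclidean distance.

```idl
pts = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
PRINT, DISTANCE_MEASURE(pts, POWER_MEASURE=1)       ; should match MEASURE=1
PRINT, DISTANCE_MEASURE(pts, MEASURE=1)
PRINT, DISTANCE_MEASURE(pts, POWER_MEASURE=2)       ; should match MEASURE=0
PRINT, DISTANCE_MEASURE(pts, MEASURE=0)
; Assumed ordering [p, r]: p=2, r=1 leaves the sum of squares unrooted.
PRINT, DISTANCE_MEASURE(pts, POWER_MEASURE=[2, 1])
```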
Example
DATA = [ $
[1, 1], $
[1, 3], $
[2, 2.2], $
[4, 1.75], $
[4, 4], $
[5, 1], $
[5.5, 3]]
DISTANCE = DISTANCE_MEASURE(data)
i1 = [0,0,0,0,0,0, 1,1,1,1,1, 2,2,2,2, 3,3,3, 4,4, 5]
i2 = [1,2,3,4,5,6, 2,3,4,5,6, 3,4,5,6, 4,5,6, 5,6, 6]
PRINT, 'Item# Item# Distance'
PRINT, TRANSPOSE([[i1],[i2],[distance]]), $
FORMAT = '(I3, I7, F10.2)'
PLOT, data[0,*], data[1,*], PSYM = 6, SYMSIZE = 2, $
XRANGE = [0,6], YRANGE = [0,5], $
TITLE='Distance between each point'
FOR i = 0,N_ELEMENTS(distance)-1 DO $
PLOTS, data[*, [i1[i], i2[i]]], linestyle = 1
AVG = 0.5*(data[*, i1] + data[*, i2])
XYOUTS, avg[0,*], avg[1,*], ALIGN = 0.5, $
STRTRIM(STRING(distance, format = '(F7.2)'),2)
When this code is run, IDL prints:
Item# Item# Distance
0 1 2.00
0 2 1.56
0 3 3.09
0 4 4.24
0 5 4.00
0 6 4.92
1 2 1.28
1 3 3.25
1 4 3.16
1 5 4.47
1 6 4.50
2 3 2.05
2 4 2.69
2 5 3.23
2 6 3.59
3 4 2.25
3 5 1.25
3 6 1.95
4 5 3.16
4 6 1.80
5 6 2.06
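Because DISTANCE_MEASURE is designed to feed CLUSTER_TREE, the compact vector from this example can be passed on without conversion. The following is a minimal sketch, assuming the calling form CLUSTER_TREE(Pairdistance, Linkdistance) and reusing the distance variable computed above. The names clusters and linkdistance are chosen for illustration.

```idl
; Build the hierarchical cluster tree from the compact distance vector.
clusters = CLUSTER_TREE(distance, linkdistance)
PRINT, clusters        ; pairs of item or cluster indices that were merged
PRINT, linkdistance    ; distance at which each merge took place
```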
Version History
See Also
CLUSTER_TREE, DENDROGRAM