de.folt.similarity
Class LevenshteinSimilarity

java.lang.Object
  extended by de.folt.similarity.LevenshteinSimilarity

public class LevenshteinSimilarity
extends java.lang.Object

This class computes the Levenshtein distance and similarity of two strings. The main method to use is
int levenshteinSimilarity(String sKey, String sPattern).
It returns a percentage, where 100 (%) means the strings are identical.
The code is partially based on merriampark.

Author:
Klemens Waldhör

Constructor Summary
LevenshteinSimilarity()
           
 
Method Summary
static boolean bCharValueDifference(java.lang.String source, java.lang.String match, int percent)
          bCharValueDifference returns true if the character sum difference between the two strings is greater than the given percentage.
static int getLevenshteinDistance(java.lang.String string1, java.lang.String string2)
          getLevenshteinDistance computes the Levenshtein distance.
static int getLevenshteinDistance(java.lang.String string1, java.lang.String string2, int minPercent)
          getLevenshteinDistance computes the Levenshtein distance.
static int levenshteinSimilarity(java.lang.String compareString1, java.lang.String compareString2)
          levenshteinSimilarity computes the Levenshtein similarity of two strings.
static int levenshteinSimilarity(java.lang.String compareString1, java.lang.String compareString2, int minPercent)
          levenshteinSimilarity computes the Levenshtein similarity of two strings as a percentage: percent = 100 - (dlw * 100) / maxlwlm, where dlw is the Levenshtein edit distance and maxlwlm is the maximum of the lengths of the two strings.
static int levenshteinWordBasedSimilarity(java.lang.String compareString1, java.lang.String compareString2, int minPercent)
          levenshteinWordBasedSimilarity computes the Levenshtein similarity of two strings on a word basis.
static void main(java.lang.String[] args)
          Test function (LevenTest); prints test output.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LevenshteinSimilarity

public LevenshteinSimilarity()
Method Detail

bCharValueDifference

public static boolean bCharValueDifference(java.lang.String source,
                                           java.lang.String match,
                                           int percent)
bCharValueDifference returns true if the character sum difference between the two strings is greater than the given percentage. A very simple comparison function.

Parameters:
source - string 1
match - string 2
percent - the percentage difference threshold between the two strings
Returns:
true if match, false otherwise
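The character-sum comparison can be pictured with the following minimal sketch. It is not the library's code: the class name CharSumFilter and the method names charSum and charSumDiffersBy are hypothetical, and it assumes the difference is measured as a percentage of the larger of the two character sums, which the documentation above does not specify.

```java
/**
 * Hypothetical sketch of a character-sum pre-filter in the spirit of
 * bCharValueDifference. Assumption: the difference is expressed as a
 * percentage of the larger of the two character sums.
 */
final class CharSumFilter {

    private CharSumFilter() {
    }

    // Sums the UTF-16 code unit values of all characters in s.
    static long charSum(String s) {
        long sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    // Returns true if the absolute difference of the two character sums
    // exceeds `percent` percent of the larger sum (assumed normalization).
    static boolean charSumDiffersBy(String source, String match, int percent) {
        long a = charSum(source);
        long b = charSum(match);
        long max = Math.max(a, b);
        if (max == 0) {
            return false; // both strings are empty: no difference
        }
        long diffPercent = (Math.abs(a - b) * 100) / max;
        return diffPercent > percent;
    }
}
```

Such a check is cheap (linear in the string lengths) and can discard obviously different pairs before running the quadratic edit-distance computation; note that anagrams have identical sums, so it can only rule pairs out, never confirm a match.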

getLevenshteinDistance

public static int getLevenshteinDistance(java.lang.String string1,
                                         java.lang.String string2)
getLevenshteinDistance computes the Levenshtein distance, i.e. the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. The raw value is hard to interpret on its own, so levenshteinSimilarity is recommended instead because it returns a percentage.

Parameters:
string1 - String 1
string2 - String 2
Returns:
the Levenshtein distance 0 = identical strings
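For illustration, here is a minimal sketch of the standard dynamic-programming (Wagner-Fischer) computation of the Levenshtein distance, keeping only two rows of the table. The class name Levenshtein and the method name distance are hypothetical; this is not necessarily how getLevenshteinDistance is implemented.

```java
/** Hypothetical sketch of the classic two-row Levenshtein distance. */
final class Levenshtein {

    private Levenshtein() {
    }

    static int distance(String s, String t) {
        int n = s.length();
        int m = t.length();
        int[] prev = new int[m + 1]; // row i - 1 of the DP table
        int[] curr = new int[m + 1]; // row i of the DP table
        // Row 0: turning "" into t's first j chars takes j insertions.
        for (int j = 0; j <= m; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i; // deleting the first i chars of s
            char si = s.charAt(i - 1);
            for (int j = 1; j <= m; j++) {
                int cost = (si == t.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1,  // insertion
                                 prev[j] + 1),     // deletion
                        prev[j - 1] + cost);       // substitution or match
            }
            int[] tmp = prev; // reuse arrays: the current row becomes the previous one
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }
}
```

For example, "kitten" and "sitting" are at distance 3 (two substitutions and one insertion). Keeping two rows reduces memory from O(n·m) to O(m) while the running time stays O(n·m).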

getLevenshteinDistance

public static int getLevenshteinDistance(java.lang.String string1,
                                         java.lang.String string2,
                                         int minPercent)
getLevenshteinDistance computes the Levenshtein distance, i.e. the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. The raw value is hard to interpret on its own, so levenshteinSimilarity is recommended instead because it returns a percentage.

Parameters:
string1 - String 1
string2 - String 2
minPercent - minimum percentage to be used; 100% means the strings have to be identical, -1 ignores this parameter
Returns:
the Levenshtein distance 0 = identical strings
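One plausible way a minPercent argument can speed up the distance computation is sketched below; the actual library may do something different. The class BoundedLevenshtein, its method distance, and the convention of returning limit + 1 when the cut-off triggers are all hypothetical. The limit is derived from the percentage formula used by levenshteinSimilarity, and the loop stops as soon as every entry of a DP row exceeds it, because the final distance can never be smaller than the minimum of any row.

```java
/** Hypothetical sketch: Levenshtein distance with an early cut-off. */
final class BoundedLevenshtein {

    private BoundedLevenshtein() {
    }

    // Returns the edit distance, or a value greater than the internal limit
    // once the similarity is known to fall below minPercent.
    // minPercent == -1 (any negative value) disables the cut-off.
    static int distance(String s, String t, int minPercent) {
        int n = s.length();
        int m = t.length();
        int maxLen = Math.max(n, m);
        if (maxLen == 0) {
            return 0; // two empty strings are identical
        }
        // Largest d with 100 - (d * 100) / maxLen >= minPercent under
        // integer division, i.e. d * 100 < (101 - minPercent) * maxLen.
        int limit = (minPercent < 0)
                ? Integer.MAX_VALUE
                : ((101 - minPercent) * maxLen - 1) / 100;
        if (Math.abs(n - m) > limit) {
            return limit + 1; // the length gap alone already exceeds the limit
        }
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= m; j++) {
                int cost = (s.charAt(i - 1) == t.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            if (rowMin > limit) {
                return limit + 1; // every alignment already costs too much
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }
}
```

With minPercent = 100 the limit is 0 for strings of up to 100 characters, so any length difference or row without an exact prefix match ends the computation immediately.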

levenshteinSimilarity

public static int levenshteinSimilarity(java.lang.String compareString1,
                                        java.lang.String compareString2)
levenshteinSimilarity computes the Levenshtein similarity of two strings.

Parameters:
compareString1 - String 1
compareString2 - String 2
Returns:
the Levenshtein similarity (100 = exact match) in %

levenshteinSimilarity

public static int levenshteinSimilarity(java.lang.String compareString1,
                                        java.lang.String compareString2,
                                        int minPercent)
levenshteinSimilarity computes the Levenshtein similarity of two strings. The similarity in % is computed as:
 percent = 100 - (dlw * 100) / maxlwlm;

where dlw is the Levenshtein edit distance and maxlwlm is the maximum of the lengths of the two strings.

Parameters:
compareString1 - String 1
compareString2 - String 2
minPercent - the minimum percentage to be used; can be used to optimize the similarity computations
Returns:
the Levenshtein similarity (100 = exact match) in %
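The percentage formula can be checked with a small sketch that takes an already computed edit distance. The class SimilarityFormula and the method percent are hypothetical, and returning 100 for two empty strings is an assumption (the formula itself would divide by zero there).

```java
/** Hypothetical sketch of percent = 100 - (dlw * 100) / maxlwlm. */
final class SimilarityFormula {

    private SimilarityFormula() {
    }

    // dlw: Levenshtein edit distance; length1/length2: the two string lengths.
    // Uses Java integer division, so the result is truncated.
    static int percent(int dlw, int length1, int length2) {
        int maxlwlm = Math.max(length1, length2);
        if (maxlwlm == 0) {
            return 100; // assumption: two empty strings count as identical
        }
        return 100 - (dlw * 100) / maxlwlm;
    }
}
```

Worked example: "kitten" and "sitting" have distance 3 and the longer string has 7 characters, so percent(3, 6, 7) = 100 - 300 / 7 = 100 - 42 = 58. Dividing by the longer length keeps the result between 0 and 100, since the distance never exceeds that length.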

levenshteinWordBasedSimilarity

public static int levenshteinWordBasedSimilarity(java.lang.String compareString1,
                                                 java.lang.String compareString2,
                                                 int minPercent)
levenshteinWordBasedSimilarity computes the Levenshtein similarity of two strings on a word basis.

Parameters:
compareString1 - String 1
compareString2 - String 2
minPercent - the minimum percentage to be used; can be used to optimize the similarity computations
Returns:
the Levenshtein similarity (100 = exact match) in %
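The documentation does not say how words are delimited or compared, so the following is only a sketch under stated assumptions: the strings are split on runs of whitespace, each whole word is treated as one symbol in the edit-distance table, and the same percentage formula is applied to the word counts. The class WordLevenshtein and its methods are hypothetical, and the minPercent optimization is not modeled.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a word-level Levenshtein similarity. */
final class WordLevenshtein {

    private WordLevenshtein() {
    }

    // Assumption: words are separated by runs of whitespace.
    static List<String> words(String s) {
        String trimmed = s.trim();
        return trimmed.isEmpty() ? List.of() : Arrays.asList(trimmed.split("\\s+"));
    }

    // Edit distance where each whole word counts as a single symbol.
    static int wordDistance(List<String> a, List<String> b) {
        int m = b.size();
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.size(); i++) {
            curr[0] = i;
            for (int j = 1; j <= m; j++) {
                int cost = a.get(i - 1).equals(b.get(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }

    // Same percentage formula as the character-based version, over word counts.
    static int wordSimilarity(String s1, String s2) {
        List<String> a = words(s1);
        List<String> b = words(s2);
        int maxWords = Math.max(a.size(), b.size());
        if (maxWords == 0) {
            return 100; // assumption: two blank strings count as identical
        }
        return 100 - (wordDistance(a, b) * 100) / maxWords;
    }
}
```

Under these assumptions "the quick brown fox" and "the quick red fox" differ by one word out of four, giving 100 - 100 / 4 = 75, whereas a character-based comparison would also penalize the individual letters of "brown" versus "red".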

main

public static void main(java.lang.String[] args)
Test function (LevenTest); prints test output.