labm8.text

Text utilities.

exception labm8.text.Error

Module-level error.

exception labm8.text.TruncateError

Raised when a string cannot be truncated.

labm8.text.diff(s1, s2)

Return a normalised Levenshtein distance between two strings.

The distance is normalised by dividing the Levenshtein distance of the two strings by max(len(s1), len(s2)), the length of the longer string.

Examples

>>> text.diff("foo", "foo")
0.0
>>> text.diff("foo", "fooo")
0.25
>>> text.diff("foo", "")
1.0
>>> text.diff("1234", "1 34")
0.25
Parameters:
  • s1 (str) – Argument A.
  • s2 (str) – Argument B.
Returns:

Normalised distance between the two strings.

Return type:

float
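The normalisation step can be checked by hand. The sketch below is not the library's code: normalise_distance is a hypothetical helper that takes an already computed edit distance and divides it by the length of the longer string. The distances 0, 1, 3 and 1 are the ones listed for labm8.text.levenshtein in this document. How the library handles two empty strings is not documented, so returning 0.0 in that case is an assumption.

```python
def normalise_distance(distance, s1, s2):
    """Divide an edit distance by the length of the longer string.

    Hypothetical helper for illustration, not part of labm8.text.
    """
    longest = max(len(s1), len(s2))
    if longest == 0:
        # Assumption: two empty strings count as identical. The
        # documented behaviour for this case is unknown.
        return 0.0
    return distance / longest


print(normalise_distance(0, "foo", "foo"))    # 0 / 3 -> 0.0
print(normalise_distance(1, "foo", "fooo"))   # 1 / 4 -> 0.25
print(normalise_distance(3, "foo", ""))       # 3 / 3 -> 1.0
print(normalise_distance(1, "1234", "1 34"))  # 1 / 4 -> 0.25
```

A result of 0.0 means the strings are identical, and 1.0 means every position of the longer string had to be edited.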

labm8.text.get_substring_idxs(substr, string)

Return a list of the start indices at which substr occurs in string. If substr is not found, the list is empty.

Parameters:
  • substr (str) – Substring to match.
  • string (str) – String to match in.
Returns:

Start indices of substr.

Return type:

list of int
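One way to collect the start indices is a loop over str.find. This is a minimal sketch, not the library's implementation. The name substring_starts_sketch is hypothetical, and two behaviours are assumptions because the documentation does not specify them: overlapping occurrences are reported, and an empty substr yields an empty list.

```python
def substring_starts_sketch(substr, string):
    """Return every index at which substr starts in string.

    Hypothetical stand-in for illustration only.
    """
    if not substr:
        # Assumption: an empty pattern produces no matches.
        return []
    starts = []
    idx = string.find(substr)
    while idx != -1:
        starts.append(idx)
        # Resume one character later so overlapping matches are found.
        idx = string.find(substr, idx + 1)
    return starts


print(substring_starts_sketch("a", "banana"))    # [1, 3, 5]
print(substring_starts_sketch("ana", "banana"))  # [1, 3]
print(substring_starts_sketch("x", "banana"))    # []
```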

labm8.text.levenshtein(s1, s2)

Return the Levenshtein distance between two strings.

Implementation of Levenshtein distance, one of a family of edit distance metrics.

Based on: https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python

Examples

>>> text.levenshtein("foo", "foo")
0
>>> text.levenshtein("foo", "fooo")
1
>>> text.levenshtein("foo", "")
3
>>> text.levenshtein("1234", "1 34")
1
1
Parameters:
  • s1 (str) – Argument A.
  • s2 (str) – Argument B.
Returns:

Levenshtein distance between the two strings.

Return type:

int
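The distance counts the fewest single-character insertions, deletions and substitutions needed to turn one string into the other. The following is a sketch of the common two-row dynamic-programming formulation, written for illustration under the name levenshtein_sketch (hypothetical). It is not claimed to match the library's source line for line.

```python
def levenshtein_sketch(s1, s2):
    """Edit distance using two rows of the dynamic-programming table."""
    # previous[j] is the distance between the empty prefix of s1 and
    # the first j characters of s2, i.e. j insertions.
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        # Turning the first i characters of s1 into "" takes i deletions.
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            current.append(min(
                previous[j] + 1,               # delete c1
                current[j - 1] + 1,            # insert c2
                previous[j - 1] + (c1 != c2),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein_sketch("foo", "foo"))    # 0
print(levenshtein_sketch("foo", "fooo"))   # 1
print(levenshtein_sketch("foo", ""))       # 3
print(levenshtein_sketch("1234", "1 34"))  # 1
```

Keeping only two rows reduces memory use from O(len(s1) * len(s2)) to O(len(s2)), while the running time stays proportional to the product of the two lengths.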

labm8.text.truncate(string, maxchar)

Truncate a string to a maximum number of characters.

If the string is longer than maxchar, the excess characters are removed and an ellipsis is appended.

Parameters:
  • string (str) – String to truncate.
  • maxchar (int) – Maximum length of string in characters. Must be >= 4.
Returns:

Truncated string, of length <= maxchar.

Return type:

str

Raises:

TruncateError – If truncation fails, for example because maxchar does not satisfy the >= 4 requirement.
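The behaviour described above can be sketched as follows. The names truncate_sketch and TruncateErrorSketch are hypothetical stand-ins. Two details are assumptions that fit the "maxchar >= 4" requirement but are not stated in the documentation: the ellipsis is three ASCII dots, and the error is raised when maxchar is below 4.

```python
class TruncateErrorSketch(ValueError):
    """Hypothetical stand-in for labm8.text.TruncateError."""


def truncate_sketch(string, maxchar):
    """Shorten string to at most maxchar characters."""
    if maxchar < 4:
        # Assumption: this is the condition that triggers the error.
        raise TruncateErrorSketch("maxchar must be >= 4")
    if len(string) <= maxchar:
        return string
    # Assumption: the ellipsis is "...", so three characters of the
    # budget are reserved for it.
    return string[:maxchar - 3] + "..."


print(truncate_sketch("hello world", 8))  # hello...
print(truncate_sketch("short", 8))        # short
```

Under these assumptions a limit of 4 leaves room for one original character plus the ellipsis, which would explain the lower bound on maxchar.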