SimHashUtils
class SimHashUtils
SimHashUtils contains a set of static methods to serialize and normalize person names and titles. The method getSimHash1 calculates the inter hash of a Resource.
Constants
SINGLE_LETTER |
DEFAULT_LAST_FIRST_NAMES | By default, all author and editor names are in "Last, First" order.
PERSON_NAME_DELIMITER | The delimiter used for separating person names.
FIRSTNAME_LASTNAME_DELIMITER | The delimiter used for separating first and last name.
Methods
No description
No description
No description
No description
No description
Normalizes a collection of persons by normalizing their names and sorting them.
Used for "sloppy" hashes, i.e., the inter hash.
No description
Details
at line 135
static
getSimHash1(Resource $resource)
at line 144
static
getSimHash2(Resource $resource)
at line 162
static string|null
serializePersonNames(array $persons, bool $lastFirstNames = self::DEFAULT_LAST_FIRST_NAMES, string $delimiter = self::PERSON_NAME_DELIMITER)
at line 186
static null|string
serializePersonName(Person $person, bool $lastFirstName)
at line 220
static string
getNormalizedTitle(string $string)
at line 234
static string
getNormalizedPersons(array|ArrayList $persons)
at line 247
static string
normalizePersonList(array|ArrayList $persons)
Normalizes a collection of persons by normalizing their names and sorting them.
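The normalize-then-sort step can be illustrated with a minimal sketch. It is not the library's implementation: the function name sketchNormalizePersonList, the plain-string input (names that are already normalized, for example to the x.last form) and the "," join delimiter are all assumptions made for illustration. The point it shows is that sorting makes the resulting string independent of the order in which the persons were given.

```php
<?php
/**
 * Hypothetical helper, not part of SimHashUtils.
 * Takes names that are ALREADY normalized and returns one sorted, joined string.
 * The "," delimiter is an assumption of this sketch.
 */
function sketchNormalizePersonList(array $normalizedNames, string $delimiter = ','): string
{
    $names = array_values($normalizedNames);
    // Sorting removes any dependence on the original author order.
    sort($names, SORT_STRING);
    return implode($delimiter, $names);
}

// Both orderings yield the same string.
echo sketchNormalizePersonList(['l.marinho', 'd.knuth']), PHP_EOL;
echo sketchNormalizePersonList(['d.knuth', 'l.marinho']), PHP_EOL;
```

Because both calls produce the same string, a hash computed over it would not change when the same authors are listed in a different order.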
at line 282
static string
normalizePerson(Person $person)
Used for "sloppy" hashes, i.e., the inter hash.
The person name is normalized according to the following scheme: x.last, where x is the first letter of the first name and last is the last name.
Example:
Donald E. Knuth --> d.knuth
D.E. Knuth --> d.knuth
Donald Knuth --> d.knuth
Knuth --> knuth
Knuth, Donald --> d.knuth
Knuth, Donald E. --> d.knuth
Maarten de Rijke --> m.rijke
Balby Marinho, Leandro --> l.marinho
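The x.last scheme can be sketched as follows. This is an illustration under stated assumptions, not the code behind normalizePerson: the real method takes a Person object, whereas the hypothetical function sketchNormalizePersonName below works on a raw name string and splits it itself. It assumes that a comma marks "Last, First" order, that otherwise the final whitespace-separated token is the last name, and that only the final token of a multi-part last name is kept. It also uses byte-oriented string functions, so it is only reliable for ASCII names.

```php
<?php
/**
 * Hypothetical helper, not part of SimHashUtils.
 * Reduces a raw name string to "x.last" (or just "last" if no first name is present).
 */
function sketchNormalizePersonName(string $name): string
{
    $name = trim($name);
    if ($name === '') {
        return '';
    }

    if (strpos($name, ',') !== false) {
        // Assumed "Last, First" order.
        [$last, $first] = array_map('trim', explode(',', $name, 2));
    } else {
        // Assumed "First Last" order: the final token is the last name.
        $tokens = preg_split('/\s+/', $name);
        $last = array_pop($tokens);
        $first = implode(' ', $tokens);
    }

    // Keep only the final token of the last name ("Balby Marinho" becomes "marinho").
    $lastTokens = preg_split('/\s+/', trim($last));
    $lastPart = strtolower((string) end($lastTokens));

    if ($first === '') {
        return $lastPart;
    }

    // First letter of the first name, then a dot, then the last name.
    return strtolower(substr($first, 0, 1)) . '.' . $lastPart;
}

echo sketchNormalizePersonName('Knuth, Donald E.'), PHP_EOL;
echo sketchNormalizePersonName('Maarten de Rijke'), PHP_EOL;
```

Collapsing every spelling of a name to the same short key is what makes the resulting hash "sloppy": entries that differ only in how an author's name is written still compare as equal.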