Product: MergeOmatic & ImportOmatic
Description: This solution provides information on the Omatic Similarity Score Algorithm
The Omatic Similarity Score Algorithm is used in MergeOmatic & ImportOmatic (version 3.0 and higher). Users can change the Scoring Weights in MergeOmatic (Configuration / Advanced), but not in ImportOmatic, which always uses the defaults. Users can change the Deductions in both MergeOmatic & ImportOmatic.
Similarity Score Overview
To calculate the confidence of a match between two records (the “Similarity Score”), individual data fields are compared between the two records, and each field comparison produces a value between 0 and 1. Some field comparisons are exact and can only result in a score of either 0 or 1; other fields are scored on the similarity of the values, resulting in a value anywhere between 0 and 1. To calculate the numeric similarity of two text values, an algorithm called the Levenshtein Distance is used.
The resulting value is then multiplied by the Scoring Weight assigned to that field (in MergeOmatic, these weights are configurable in Configuration / Advanced). The resulting values for matched or partially matched fields are adjusted by a ‘diminishing returns’ algorithm, which lets the first few matching fields raise the similarity score quickly but reduces the effect of each successive matching value, providing more granularity at the top end of the scale.
These adjusted individual field scores are summed and divided by the Total Weight Points to provide a result in a range of 1 to 100. Some fields are not included in the “Total Weight Points” under the assumption that the presence of this data provides affirmation of a match, but the absence of it does not mean the score should be lower. Finally, configurable scoring deductions (e.g., “Remove 10 points if these records have different Suffixes”) are subtracted to calculate the final score.
Similarity Score Process
Partial match comparisons
For example, by default the First Name field has a Scoring Weight of 600 (Configuration / Advanced).
If the First Name is an exact match (value of 1) then it is assigned 600 points. Example: Jamie & Jamie
1 x 600 = 600
If the First Name is a partial match (value of .75) then it is assigned 450 points. Example: Jaime & Jamie
.75 x 600 = 450
Fields that are partial match comparisons: First Name, Middle Name, Last Name, Organization Name, Address, City and Email
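The partial match calculation can be sketched as follows. This is a minimal illustration, not Omatic's implementation: the article names the Levenshtein Distance but does not say how the distance is converted to a value between 0 and 1, so the normalization used here (1 minus the distance divided by the longer length, ignoring case) is an assumption, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: minimum inserts, deletes and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """ASSUMED normalization: 1 - distance / longer length, case-insensitive."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a and not b:
        return 1.0  # assumption: two empty values are treated as identical
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def field_points(a: str, b: str, weight: int) -> float:
    """Multiply the 0..1 comparison value by the field's Scoring Weight."""
    return similarity(a, b) * weight


print(field_points("Jamie", "Jamie", 600))  # exact match: full 600 points
print(similarity("Jaime", "Jamie"))         # partial match: between 0 and 1
```

Under this assumed normalization, Jaime and Jamie are two edits apart and score 0.6 rather than the .75 used in the example above, so the product's actual normalization presumably differs; the .75 figure should be read as illustrative.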
Exact match comparisons
For example, Phone numbers (after removing all non-numeric characters) are compared only as an exact match (value of 1) or not an exact match (value of 0). By default, Phones have a Scoring Weight of 300 (Configuration / Advanced). If the Phones are an exact match (value of 1), then they are assigned 300 points.
1 x 300 = 300
Fields that are exact match comparisons: Suffix, Birth Year, Birth Month, Birth Day, State, Zip and Phone
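As a rough sketch of the exact match comparison for Phones: strip every non-numeric character, then compare the remaining digits. The function names are hypothetical, and treating a blank phone as a non-match is an assumption that the article does not confirm.

```python
import re


def normalize_phone(raw: str) -> str:
    """Remove all non-numeric characters, as described for Phone comparisons."""
    return re.sub(r"\D", "", raw)


def phone_points(a: str, b: str, weight: int = 300) -> int:
    """Return the full weight on an exact match, otherwise 0."""
    digits_a, digits_b = normalize_phone(a), normalize_phone(b)
    if not digits_a or not digits_b:
        return 0  # assumption: a missing phone never counts as a match
    value = 1 if digits_a == digits_b else 0  # exact comparisons are 0 or 1 only
    return value * weight


print(phone_points("(843) 555-0100", "843.555.0100"))  # same digits, full weight
print(phone_points("843-555-0100", "843-555-0199"))    # different digits, 0
```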
First Name Points: 450 points (partial match)
Last Name Points: 600 points (exact match)
Phone Points: 300 points (exact match)
Total Scored Points = 1,350
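The arithmetic behind this total can be written out as a short sketch. The dictionary keys and the function name are hypothetical; the comparison values and weights are the ones from the example (First Name .75 x 600, Last Name 1 x 600, Phone 1 x 300), and the diminishing-returns adjustment is not applied at this stage.

```python
# Scoring Weights used in the example (defaults from Configuration / Advanced).
WEIGHTS = {"first_name": 600, "last_name": 600, "phone": 300}

# Comparison values from the example: partial First Name, exact Last Name and Phone.
VALUES = {"first_name": 0.75, "last_name": 1.0, "phone": 1.0}


def total_scored_points(values: dict, weights: dict) -> float:
    """Sum of (comparison value x Scoring Weight) over the compared fields."""
    return sum(values[field] * weights[field] for field in values)


print(total_scored_points(VALUES, WEIGHTS))  # 450 + 600 + 300
```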
The Total Scored Points are compared to the Total Weight Points. The Total Weight Points are calculated using the following fields:
- Individuals: First Name 600 points + Last Name 600 points + Address 400 points + City 100 points + State 55 points + ZIP 190 points + Email 755 points = 2,700 points
- Organizations: Organization Name 1,200 points + Address 400 points + City 100 points + State 55 points + ZIP 190 points + Email 755 points = 2,700 points
Middle Name, Suffix, Birth Year, Birth Month, Birth Day and Phone fields are not considered in the Total Weight Points, but they are considered in the Total Scored Points. The assumption is that the presence of this data provides affirmation of a match, but the absence of it does not mean the score should be lower.
Total Weight Points (by default) = 2,700 points
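To make the default totals easy to check, the following sketch sums the weights listed above for each record type. The dictionary and function names are hypothetical; the excluded fields (Middle Name, Suffix, Birth Year, Birth Month, Birth Day and Phone) are simply left out of the dictionaries.

```python
# Default weights that count toward the Total Weight Points.
INDIVIDUAL_WEIGHTS = {
    "first_name": 600, "last_name": 600, "address": 400,
    "city": 100, "state": 55, "zip": 190, "email": 755,
}
ORGANIZATION_WEIGHTS = {
    "organization_name": 1200, "address": 400,
    "city": 100, "state": 55, "zip": 190, "email": 755,
}


def total_weight_points(weights: dict) -> int:
    """Sum the Scoring Weights of the fields included in the denominator."""
    return sum(weights.values())


print(total_weight_points(INDIVIDUAL_WEIGHTS))    # 2700
print(total_weight_points(ORGANIZATION_WEIGHTS))  # 2700
```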
The scored points for each field (in this case totaling 1,350 points) are adjusted using our Diminishing Returns algorithm, then summed together and divided by the Total Weight Points (2,700 points). This creates the raw Similarity Score. As a general rule, the higher the Total Scored Points, the higher the Similarity Score.
Adjusted Total Scored Points divided by Total Weight Points = Raw Similarity Score
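Using our example, 1,350 Total Scored Points corresponds to a raw Similarity Score of 71. This is higher than the 50 that an unadjusted 1,350 / 2,700 would give, because the Diminishing Returns adjustment lets the first matching fields raise the score quickly.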
After the scoring is performed, we subtract the Deductions defined in Configuration / Advanced. This gives us the final Similarity Score:
Raw Similarity Score – Deductions = Similarity Score
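The last two steps can be sketched as below. The Diminishing Returns algorithm itself is not described in this article, so the sketch takes the already adjusted points as an input instead of guessing at it. Scaling the ratio by 100 is an assumption based on the stated score range, and the function names are hypothetical.

```python
def raw_similarity_score(adjusted_points: float,
                         total_weight_points: float = 2700) -> float:
    """Adjusted Total Scored Points / Total Weight Points, scaled to 100 (assumed)."""
    return adjusted_points / total_weight_points * 100


def final_similarity_score(raw_score: float, deductions: list) -> float:
    """Raw Similarity Score minus every configured deduction that applies."""
    return raw_score - sum(deductions)


# The example's raw score of 71, with the 10-point deduction for different Suffixes.
print(final_similarity_score(71, [10]))  # 61
```

Whether the product clamps the final score at a lower bound when deductions exceed the raw score is not stated here, so the sketch leaves the result unclamped.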