Start a new topic

RegEx Dictionary Removing # after Apt. Text

We have been using a RegEx dictionary for our addresses...I believe it was copied from a post/posts in this forum many years ago.  Lately it's been removing the first digit that comes after "apartment" in the address line.  It's doing this:


Incoming Address:  123 Any Road., Apt.101

IOM reads address as:  123 Any Road., Apt.01


In this address the apartment number comes immediately after the "." - there are no spaces.  (e.g., "Apt.101" vs "Apt. 101")  If a space was introduced after the "." then the expression would remove the blank space, moving the apartment number to the left.  (e.g., from "Apt. 101" to "Apt.101")  


To see an example of this and the RegEx dictionary entry which may be causing the error, please see the images below.  (Specifically, Omatic_Long Dictionary_Ex1.)  


I noticed the error only occurred while accessing the Dictionary Test window and selecting Test: Whole Dictionary.  Choosing the Test: Selected Item option did not result in a positive hit (of the error) for any entry in the entire dictionary.  (See image Omatic_Long Dictionary_Ex2).


Thank you so much in advance, for your help.


Is there another entry in the dictionary that has a value to search on as "Apt" (separate from the entries in the Road replacement value?


Hi Matt,


In a regular expression, a "." is considered "any character". To enforce the "." to be considered a period, you can enter a "\" in front of it. 


For example, you can replace (?i)(?<= )(road Apt.) with  (?i)(?<= )(road Apt\.) 


Try updating the values that have a period, and let me know if that resolves your issue.


Best,


Kristen Gajdica

Associate Product Manager

Barbara - yes, we have  a couple other entries in the dictionary that search for the value "Apt." These are Street, Boulevard, and Avenue.  The Values to Match On string in each of these has been coded the same as described above (but "Road" is replaced with "Avenue," etc.) and I encounter the same issue with these variants as well.

Hi Kristen,


Thanks for the explanation of the "." in the expression!  That kind of makes sense as to why it was removing the character.  


Implementing your suggestion works!  The apartment number was no longer removed.  However, when I tested the new strings against the Whole Dictionary, the address format was not updated to match the expression. (An example of the expected results is below.  Bold text identifies words that were not updated in the computed address.)  Strangely, when I tested the address against the Selected String, the output was correct. 


For example:


Incoming Address:         123 Any Rd Apt.101

IOM reads address as:  123 Any Rd Apt101


Whereas the output should read: 123 Any Road, Apt. 101.


But, perhaps I am wanting to do too much through these RegEx's.  The addresses we're importing are supplied through a vendor, and lately their formatting has been inconsistent.  (i.e., some apartment values are submitted as: "Apt. 101" and others: "Apt.101".)  And that's what has been causing issues with the import.   Maybe it would be easier for me to update the source file, what do you think?

Hi Matt,


The dictionary I am using to test is slightly different than yours, however, it does handle both of those scenarios:


If you'd like to export your dictionary to excel and attach it here, I'd be happy to take a look! It might be another field that is impacting this one. However, if you are able to update the source file, that works too!


Best,

Kristen

Very interesting...a dictionary that could handle these irregularities would be amazing.  So, I will take you up on your very generous offer!  


The doc is attached, let me know what you find.  I'm very much new at regex and appreciate the help.

xlsx

Hi Matt,

It looks like you just need another line that handles the scenario of "rd apt" coming in without any punctuation:



Best,

Kristen

Matt - I just updated my previous response with the image. It didn't go through the first time. Best,

Kristen

Kristen - Your suggestions fixed several road type and apartment combinations!  Thank you!  


There is one more that I can't figure out, though: it's when the road type is spelled out and there is no comma between the road and apartment, like so:


Example 1

123 Any Road Apt.101


The regex for this combination is (?i)(?<= )(road Apt.).  And it translates to: 



123 Any Road, Apt.01.  (Here the issue is the first apt digit.)


Using your suggestion, I updated the regex to: (?i)(?<= )(road Apt\.).  Unfortunately this gave me:


Example 2

123 Any Road Apt101


It did not update the format to match the regex value (Road, Apt.).  Seems like it removed the "." from Apt and passed the remaining values through.  


I found another entry for "road" in the dictionary.  Is this causing the issue? Here's the line:

(?i)(?<= )(road\.?$|road\.?(?=,)|road\.?(?= #)|rd\.?$|rd\.?(?=,)|rd\.?(?= #))


Attached are a couple screen shots of the dictionary entry and tests for examples 1 and 2. 

Hi Matt,


The other dictionary isn't impacting this, which you can tell by testing the specific dictionary, and selecting "Selected Item" instead of "whole dictionary".


The replacement value of "Road, Apt." doesn't include the addition of a space following the period. In order to account for addresses that need a space added, you would need to have another replacement value of "Road, Apt. ". Then the values to match on would need to be split out depending on the ones that need spaces and the ones that don't, and you would need updated code to determine if a space is needed.


I think this type of cleanup might be rare enough, that it's worth just creating a cleanup query to check once a month, instead of updating the dictionary.


Ok, thanks for your help, Kristen!

Login or Signup to post a comment