Start a new topic

RegEx Dictionary Removing # after Apt. Text

We have been using a RegEx dictionary for our addresses...I believe it was copied from a post/posts in this forum many years ago.  Lately it's been removing the first digit that comes after "apartment" in the address line.  It's doing this:


Incoming Address:  123 Any Road., Apt.101

IOM reads address as:  123 Any Road., Apt.01


In this address the apartment number comes immediately after the "." - there are no spaces.  (e.g., "Apt.101" vs "Apt. 101")  If a space was introduced after the "." then the expression would remove the blank space, moving the apartment number to the left.  (e.g., from "Apt. 101" to "Apt.101")  


To see an example of this and the RegEx dictionary entry which may be causing the error, please see the images below.  (Specifically, Omatic_Long Dictionary_Ex1.)  


I noticed the error only occurred while accessing the Dictionary Test window and selecting Test: Whole Dictionary.  Choosing the Test: Selected Item option did not result in a positive hit (of the error) for any entry in the entire dictionary.  (See image Omatic_Long Dictionary_Ex2).


Thank you so much in advance, for your help.


Is there another entry in the dictionary that has a value to search on as "Apt" (separate from the entries in the Road replacement value?


Hi Matt,


In a regular expression, a "." is considered "any character". To enforce the "." to be considered a period, you can enter a "\" in front of it. 


For example, you can replace (?i)(?<= )(road Apt.) with  (?i)(?<= )(road Apt\.) 


Try updating the values that have a period, and let me know if that resolves your issue.


Best,


Kristen Gajdica

Associate Product Manager

Barbara - yes, we have  a couple other entries in the dictionary that search for the value "Apt." These are Street, Boulevard, and Avenue.  The Values to Match On string in each of these has been coded the same as described above (but "Road" is replaced with "Avenue," etc.) and I encounter the same issue with these variants as well.

Hi Kristen,


Thanks for the explanation of the "." in the expression!  That kind of makes sense as to why it was removing the character.  


Implementing your suggestion works!  The apartment number was no longer removed.  However, when I tested the new strings against the Whole Dictionary, the address format was not updated to match the expression. (An example of the expected results is below.  Bold text identifies words that were not updated in the computed address.)  Strangely, when I tested the address against the Selected String, the output was correct. 


For example:


Incoming Address:         123 Any Rd Apt.101

IOM reads address as:  123 Any Rd Apt101


Whereas the output should read: 123 Any Road, Apt. 101.


But, perhaps I am wanting to do too much through these RegEx's.  The addresses we're importing are supplied through a vendor, and lately their formatting has been inconsistent.  (i.e., some apartment values are submitted as: "Apt. 101" and others: "Apt.101".)  And that's what has been causing issues with the import.   Maybe it would be easier for me to update the source file, what do you think?

Hi Matt,


The dictionary I am using to test is slightly different than yours, however, it does handle both of those scenarios:


If you'd like to export your dictionary to excel and attach it here, I'd be happy to take a look! It might be another field that is impacting this one. However, if you are able to update the source file, that works too!


Best,

Kristen

Very interesting...a dictionary that could handle these irregularities would be amazing.  So, I will take you up on your very generous offer!  


The doc is attached, let me know what you find.  I'm very much new at regex and appreciate the help.

xlsx

Hi Matt,

It looks like you just need another line that handles the scenario of "rd apt" coming in without any punctuation:



Best,

Kristen

Matt - I just updated my previous response with the image. It didn't go through the first time. Best,

Kristen

Login or Signup to post a comment