Start a new topic

RegEx to Format and Abbreviate Street Names

Here is a large set of regular expressions designed to replace a street name with its standardized USPS abbreviation. (See the "Street Suffixes" and "Secondary Units" tabs). It takes an address such as "15800 Countrydrive Avenue blvd., apt. 1506" and formats it to "15800 Countrydrive Ave Blvd Apt 1506". This isn't perfect as it mistakes an intended part of the street name as an abbreviation. Hence "Avenue" becomes "Ave" and some of the meaning is lost. However if the street name includes a possible abbreviation (such as "Countrydrive") then the abbreviation is appropriately NOT applied.
Regular Expressions - Street Suffixes & Secondary Unit Designators.xlsx (UPDATED BY A LATER POST)

The list includes 202 Street Suffixes and 24 Secondary Unit Designators. The "Use Regular Expressions" check box must be painstakingly ticked for each item. The form needs a "Check All" button or link.

image


There are two regular expressions that should also be included, BLANK and PO Box. They are:

1. BLANK --> , AND \. (two match-on entries)
2. PO Box --> (?i)\b((P\.O\.)|(P\. O\.)|(PO BOX)|(POBOX)|(BOX)|(POB)|(P O BOX))\b

Hopefully someone can modify the regular expression format to restrict alterations to only the last word before the Secondary Unit Designator. That would be helpful.

1 person has this question

Hi Patrick,

That is a great list! Thanks so much for sharing with us.

Best,

Callie
Do you have one that works the other way around? Since you need to follow post office standards to do bulk mail (and you should anyway if you want to ensure delivery) it would be the other way around your clients should be more concerned about.
Hi Melissa,

One of our clients has shared a spreadsheet with regular expressions that formats the addresses as per the postal service standards.

http://www.omaticsoftware.com/Forums/tabid/108/aft/288/Default.aspx#775

We are currently working on one that would format the addresses in a more formal fashion. We will let you know as soon as that is available.

Best,

Callie

THE FILES HERE HAVE BEEN UPDATED BY A LATER POST. 

Attached is an update to the Street Abbreviations and Secondary Unit translations. There are two files: Address Abbreviations (Long) and Address Abbreviations (Short). The Long version takes 'St' to 'Street' and the Short version flips that around, 'Street' becomes 'St'.

Address Abbreviations (Long).xlsx

Address Abbreviations (Short).xlsx

Directions are now considered which means NW (among others) can translate to Northwest.

The problem mentioned in the first post has been fixed, as well as several other issues. For instance "140 St. Mary's Street Ofc. 204" now translates in long form to "140 St Mary's Street Office 204".

 

image

You'll need to create two separate dictionaries, one for the Long form and one for the Short form. Open the spreadsheet and select all of the cells with data (including TRUE) and Copy (Ctrl + C) the entire block. Next go to IOM Configuration -> Dictionaries -> click Add -> click No. Right click in Replacement Values and choose Paste.

These are the titles and descriptions I used:

Address Abbreviations (Long)
Alter Arcane Address Abbreviations. Translate address abbreviations to their long form.

Address Abbreviations (Short)
Alter Arcane Address Abbreviations. Translate address abbreviations to their short form.

These are some mixed long/short examples used in testing:

123 West Hampton Drive Unit 1506
123 W Hampton Dr Unit 1506
123 W. Hampton Dr. Unit 1506
123 W. Hampton Dr. Apt. 1506
123 nw. St. Hampton Dr., Ofc #1506
123 NorthWest. St. Hampton Circle, Office 1506
123 sw St. Hampton Circle, Office 1506
123 St Main Street Po box 124
15800 Countrydrive Avenue blvd., apt. 1506

You will probably find some limitations with with this regular expression set. If you do and can fix it, post your solution back here!

Thanks,

Patrick

Hi Patirick,

Thank you so much for these files with Reg Expression coding. They have been VERY helpful. I did run into a few problems with directions. For NE and SE, it replaced my street addres of "7200 Newton Ave South" into "7200 NEwton Ave S". Now, I know this isn't too big a deal, but is there some way I can format the Reg Expression not to capitalize NE within a word?

Also, with Court and Courts. The Reg expression set was turning my Court into Crts. I changed the coding of "Cts" to add the "s" to read this: (?i)(?
Thanks,
Christina

Can one of you explain the (?

Also, for the directions I was using:

N from (?i)(n(\s|\.)|north(\s|\.))

N from (?i)(n|north$)

Does that make sense?  That works for the NE in Newton as well since I only take ne if it's followed by a space or . or end of line.

NE from (?i)(ne(\s|\.)|northeast(\s|\.))

NE from (?i)(ne$|northeast$)

it looks like the editor took out my tags in the last one.

The first N has a space in front of it, and second N does not.  Same with the NE.

Wayne,

(?

That's what I thought, but I was confused until just now when I copy and pasted the lines from the file.

He is using (?

Thanks so much for posting this. It's incredibly timely as I've got about 15,000 records to import and ALL of the address information is abbreviated! However, we're having trouble with the RegEx for Avenue. No matter what we do, it seems to be broken. Has anyone come up with a fix for it? We had two relatively mathematical brains looking at it but couldn't figure it out.

Thanks in advance!
Kerri,

Can you provide an example address?
Sure. We're trying to convert "123 Easy Ave." to "123 Easy Avenue". When we use the RegEx list that Patrick provided above, we get an error message. The regex string for avenue in that spreadsheet is: (?i)(?
Hi Kerri,

I am sorry you are encountering a more complicated issue! In order to properly assist you with this particular issue, I will need to examine the following files: the data file, the error.txt (if applicable), and the IOM profile being used for this import. Please send these files to Support@omaticsoftware.com so that I can better assist you with your issue. When I have a resolution, I will post to this thread.

To send your IOM profile, please follow these steps:

* Go to IOM Configuration
* Highlight the name of the profile you are using
* Right click on it and choose to export
* Send the .xml file that is created
* It will be named in this fashion: IOM_Profile_Export_.xml

Thanks!
Amanda
Omatic Support

Hi Christine,

You're welcome and I am glad you have found the expressions useful! 

Below is an update to the Street Abbreviations, Secondary Unit, and Street Directions translations. As before there are two files: Address Abbreviations (Long) and Address Abbreviations (Short). The Long version takes 'St' to 'Street' and the Short version flips that around, 'Street' becomes 'St'. 

Address Abbreviations (Long).xlsx

Address Abbreviations (Short).xlsx

The directions regular expressions have been tuned to not affect letters other than followed by a space (N ), period (N.), or at the end of a line. 

image

There were two changes made: the directions issue that Christine mentioned is now fixed and the 'Court' to 'Courts' problem is gone.

As before if you find a problem or anything that could be improved upon either let us know or post your solution here.

Thanks!

Patrick

Login or Signup to post a comment