Start a new topic

Creating Dictionaries Using Regular Expressions

I was very excited to see the recent option to use regular expression with dictionaries in Import-O-Matic.  We strive to keep strict formatting standards on addresses in The Raiser's Edge and would spend quite a bit of time "manually" making changes to addresses in our import file.

I'm a novice with regular expressions but have found it really worth the time to build our address dictionaries.  Below are some screen shots of a few of them.  I'm wondering if other users would be willing to share some of their dictionaries?  Especially if you have written more complicated and/or efficient expressions.

image


image


image

image

image


image
















Hi Julie,

This is great work! We're excited to see people using the Regular Expression functionality in IOM. Here are a few of my favorites (though we highly recommend that each organization evaluate RegEx formulas to make sure they are a good fit. These are provided as examples only, with no warranty that they're a good fit for any particular purpose!).

This one is great if you ever have Excel eating leading zeros on ZIP codes:

image



Here's one that converts "St" or "St." to "Street", but only when the "St" is at the end of a string or followed by a comma [here's the formula: "(?i)(?

image




Finally, here are some PO Box cleanup formulas:

image


Those are "^POB(?= )|^P.O.B." and "^P.O. Box".

I hope to hear from some other users on what formulas they've developed!

-Jeff
I have also found the RegEx formulas to be very helpful. They save us a great deal of time by eliminating the need to clean up addresses before import. I am adding more formulas as we come across new things that need to be handled, but wanted to offer some of the ones I have worked on so far.

I added the Street formula that Jeff lists above because I like the way it handles addresses with a ‘saint’ in them by not replacing the St. for saint, but replacing it for street.

I created a number of other formulas using the street formula as a base to handle other common address abbreviations such as Road, Lane, Boulevard, etc. To do so I just replaced the ‘St.’ in the formula with the appropriate abbreviation. So, here is a list of the ones I set up. To save space I am not going to insert screen shots for each, but am just listing the ‘Replacement value’ before the hyphen and then the formula for the ‘Values to match on’ after the hyphen:

 

Road - (?i)(?<= )(Rd\.$|Rd$|Rd.(?=,)|Rd(?=,)) 
Drive - (?i)(?<= )(Dr\.$|Dr$|Dr.(?=,)|Dr(?=,)) 
Circle - (?i)(?<= )(Cir\.$|Cir$|Cir.(?=,)|Cir(?=,)) 
Lane - (?i)(?<= )(Ln\.$|Ln$|Ln.(?=,)|Ln(?=,)) 
Avenue- (?i)(?<= )(Ave\.$|Ave$|Ave.(?=,)|Ave(?=,)) 
Parkway -(?i)(?<= )(pkwy\.$|pkwy$|pkwy.(?=,)|pkwy(?=,)) 
Boulevard - (?i)(?<= )(Blvd\.$|Blvd$|Blvd.(?=,)|Blvd(?=,)) 

 



Formulas Created for Apartments and Units in Address Line:

I did come across one issue with these formulas because they only update the abbreviations if they are at the very end of the line or if they are followed by a comma.

For instance, the formula will correctly spell out street in these situations:

1 St. Mary St.
1 St. Mary St., Apartment 1

But it will not update street in this situation since the St. is not at the end of the line or followed by a comma:

1 St. Mary St. Apartment 1

To handle this, as well as some other cleanup I identified around apartments being listed in the same line, I created a new ‘Replacement Value’ and corresponding ‘Values to match on’. Instead of stringing them all together as they are in Jeff’s examples, I have listed them individually like Julie did. I find this to be a little easier to keep them straight and make changes as necessary, but that's just a personal preference. So here is how I set up the formulas to spell out both street and apartment with a comma in between if they are abbreviated:

Replacement Value:

Street, Apartment

Values to match on:

 

(?i)(?<= )(st\. Apt.) 
(?i)(?<= )(st\. Apartment) 
(?i)(?<= )(street Apartment) 
(?i)(?<= )(street Apt.) 
(?i)(?<= )(street, Apt.) 
(?i)(?<= )(st Apartment) 
(?i)(?<= )(st Apt.) 

 

I came across the same issues with units being listed in the address line, so I created these formulas for unit:

Replacement Value:

Street, Unit

Values to match on:

 

(?i)(?<= )(st\. Unit) 
(?i)(?<= )(street Unit) 
(?i)(?<= )(st Unit) 
(?i)(?<= )(st, Unit) 

 

I then created corresponding formulas for road, apartment; lane, apartment, etc. by replacing the street values with the appropriate abbreviations like I did for my first examples.



Formulas Created for Apartment on Its Own Line:

I also came across situations where apartments were listed in the second line of the address and started that line. For example, I have addresses like this:

1 St. Mary St.
Apt. 1

I created the following formulas to handle updating the ‘Apt.’ in that case (the street in the first line will be handled by the street example Jeff listed above):

Replacement Value:

Apartment

Values to match on:

(?i)(? ^apt\.
^apt
^Apt\.
^Apt




Formulas Created for P.O. Boxes:

We handle P.O. Box a little differently than the formula that Jeff listed above because we include the periods after P and O so I updated it slightly and added some more ‘Values to match on’. Here is how I set up our P.O. Box formulas:

Replacement Value:

P.O. Box

Values to match on:

^POB(?=)
^P.O.B
^PO Box
^Po Box
^po box
^po Box
^box



Help on Formulas to Handle # in Address Line:

One thing that I am struggling with is when there is a # in an address line. For example, I have addresses that look like this:

1 Main Street, #4

I would like to replace the # with Apartment so that the address would appear as:

1 Main Street, Apartment 4

However, when I try to replace the # with Apartment I end up with the apartment number right after the word apartment this because there is no space between the # and the number, so it looks like this:

1 Main Street, Apartment4

If I set up my replacement value with a space following ‘Apartment’ I end up getting an error message when I save and close the dictionary. Any ideas of how to insert a space between Apartment and the number or how to create a formula to handle this?


So that is where I am so far. I also use the zip code example that Jeff lists above. This is just a start, I know there are many more ways that I will be using this, so I will provide updates as I find more uses. Hopefully this is helpful, but let me know if you have any questions about what I have posted. As Jeff notes in his post, these formulas might no be right for everyone depending on how you format addresses so make sure you test them with the dictonary test before you implement them.

- Nick
Thanks so much for sharing, Nick.  I am sure that everyone will find these regular expressions very useful.

Callie
Has anyone else had trouble using the regex for the leading zeros problem? I set it up exactly like the screenshot, but it hasn't worked for me when I run the test. Any ideas what I'm doing wrong? I am using version 1.8.6.4. Thanks, Honey
Hi Honey,

With the replacement value selected, did you check the box at the bottom that says "Use Regular Expressions"?

Jeff
Yes, when the replacement value is selected, I checked the Use Regular Expressions box and it added the RegEx symbol to the replacement value, but the test doesn't work for some reason.
Hi Honey,

I don't know, that should do it. Can you post a screenshot of how you have it set up? Or copy and paste the replacement and matching values in a post? Just to make sure there is nothing weird going on, I would recommend creating a brand new test dictionary, then adding it in there fresh, making sure the RegEx box is checked, save it, reopen it and test it and see if it makes a difference.

Jeff

Hi there,
Wondering if anyone has come across this problem.
We have both "Road" and "Boulevard" in a dictionary of street types as below.

 

(?i)(?<=)(rd\.$|rd$|rd(?=,)|rd(?=,))
(?i)(?<=)(RD\.$|RD$|RD(?=,)|RD(?=,))
(?i)(?<=)(rd.\.$|rd.$|rd.(?=,)|rd.(?=,))
(?i)(?<=)(road\.$|road$|road(?=,)|road(?=,))
(?i)(?<=)(ROAD\.$|ROAD$|ROAD(?=,)|ROAD(?=,))
(?i)(?<=)(Rd\.$|Rd$|Rd(?=,)|Rd(?=,))
(?i)(?<=)(Rd.\.$|Rd.$|Rd.(?=,)|Rd.(?=,))
(?i)(?<=)(RD.\.$|RD.$|RD.(?=,)|RD.(?=,))

(?i)(?<=)(Blvd\.$|Blvd$|Blvd.(?=,)|Blvd(?=,))
(?i)(?<=)(BOULEVARD\.$|BOULEVARD$|BOULEVARD(?=,)|BOULEVARD(?=,))
(?i)(?<=)(boulevard\.$|boulevard$|boulevard(?=,)|boulevard(?=,))
(?i)(?<=)(Blvd.\.$|Blvd.$|Blvd.(?=,)|Blvd.(?=,))
(?i)(?<=)(Blvd\.$|Blvd$|Blvd(?=,)|Blvd(?=,))
(?i)(?<=)(bvd.\.$|bvd.$|bvd.(?=,)|bvd.(?=,))
(?i)(?<=)(bvd\.$|bvd$|bvd(?=,)|bvd(?=,))
(?i)(?<=)(blvd\.$|blvd$|blvd(?=,)|blvd(?=,))
(?i)(?<=)(blvd.\.$|blvd.$|blvd.(?=,)|blvd.(?=,))
(?i)(?<=)(BLVD\.$|BLVD$|BLVD(?=,)|BLVD(?=,))
(?i)(?<=)(BLVD.\.$|BLVD.$|BLVD.(?=,)|BLVD.(?=,))
(?i)(?<=)(BVD.\.$|BVD.$|BVD.(?=,)|BVD.(?=,))
(?i)(?<=)(BVD\.$|BVD$|BVD(?=,)|BVD(?=,))

 


When I test boulevard the result is BoulevaRoad


If I remove the Road replacement value completely from the dictionary then it works so there is something with having both Road and Boulevard in the same dictionary.

Am at a loss on how to fix this. Any ideas guys?
Working the weekend Paul? :)

I can't tell for sure because I'm viewing it from my phone, but it looks like you do not have a space before your first "RD" in each line. Therefore it will match an "4#" anywhere in a string. You'll need to do the same for "blvd". Also, if memory serves, you should not have to create an upper and lower case version of each line because the "(?i)" at the beginning of each line should be telling it to be case insensitive.
Fantastic thank you! That fixed it.
Paul-

I tried using your regular expressions for our addresses, and I found that the abbreviations with the period on the end combined with the word before it (i.e. 4 Main St.-->4 MainStreet)  whereas the abbreviations without the period were fine (i.e. 4 Main St--> 4 Main Street).

These are the RegExs I have setup:

 

(?i)(?<=)( St\.$|St$|St.(?=,)|St(?=,))
(?i)(?<=)( street\.$|street$|street(?=,)|street(?=,))
(?i)(?<=)( st.\.$|st.$|st.(?=,)|st.(?=,))
(?i)(?<=)( St.\.$|St.$|St.(?=,)|St.(?=,))
(?i)(?<=)( st\.$|st$|st(?=,)|st(?=,))

 

Any guidance you can give would be greatly appreciated.

Thanks!

Madeline
Hi Amy,

You are essentially correct. I've attached a screenshot that shows the buttons you would click. You must first add a "replacement value" (1) then press the "+" button (2) to add it. When you have a replacement value you want to add more "values to match on" to, select the replacement value from the list (3), enter the additional "value to match on" (4), and press the "+" button (5). If the dictionary is to use Regular Expression pattern matching, be sure to check the "Use Regular Expression" checkbox (6). You can test your regular expression by selecting the Replacement value you wish to test (3) and clicking the "Test" button (7). If you want to import a lot of dictionary entries all at once, you can copy two columns of data from Excel (replacement column and value to match column) then right-click on the list of "values to match on" and select the option "Paste values from Excel".

image


By the way, to embed screenshots in posts here on the forum, click on the "reply" button to the post you are replying to rather than using the "quick reply" box at the bottom of the thread. That will take you to the wysiwyg post editor which has an "attachments" option at the bottom. If you're asking about specific formulas though, might be good to just copy and paste the text into the post so other users can try it without retyping it/keying errors.

I used Nick's formula for PKWY in the original post on this thread and that seemed to do the trick (you should always evaluate formulas for the specific needs of your organization, of course!). Here's that one:

(?i)(?  
(?i)(?<= )(pkwy\.$|pkwy$|pkwy.(?=,)|pkwy(?=,))

 

-Jeff

Hi Madeline,

In all your formulas it looks like you are missing a space (highlighted in yellow below) and need to delete a space (highlighted in purple below):

 

(?i)(?<= )( st\.$|st$|st.(?=,)|st(?=,))

 

 

Your updated formula should look something like this:

 

(?i)(?<= )(st\.$|st$|st.(?=,)|st(?=,))

 

This is just an example, you should always evaluate formulas for the specific needs of your organization, of course!

Let me know if that doesn't work.

Jeff
Hi Amy,

I copied your values, put them into Excel, highlighted and copied both columns, then created a new dictionary, right-clicked in the "values to replace" box on the right and selected the menu option to paste the values from Excel. It worked like a charm for me and I was able to save it no problem. (I did have to go through and manually check each of them as "Use regular expression" - may need to see if we can find a solution to that in the future.)

I've attached the csv file I opened in Excel. Please give it a shot and let us know what you find.

Thanks,
Jeff
zip
(868 Bytes)
Login or Signup to post a comment