Start a new topic

RegEx 'Catch-All' Formula

Was hoping someone help think through the construct of a data dictionary and a regex statement.  I've searched the forums and not found anything that matches this scenario exactly (or else I'm having a hard time translating solutions in the forum to my particular need)


Example

I want to track A (Apples) and C (Celery). My data has rows with Apples, Celery and tons of other varieties of fruits/vegetables.  I want to avoid building a regex statement that lists every single type of fruit or vegetable known for 'Other'.  I want to basically say the following:

A=Apple

C=Celery

(RegEx)=If not A or C, then 'Other' for any other possible value


Many thanks, hope I've given enough information for someone to be helpful!



Thanks Wayne! I actually poured over this thread, but either coudn't conceptualize how to modify some of the suggested formulas or they didn't seem to accomplish what I was looking for. In some testing, I was only able to get it to replace (using my example above) EVERYTHING with 'Other'. So, if there was an 'A' in my data, it would 'skip' over the dictionary entry to replace 'A' with 'Apple' and instead rely on the RegEx expression and replace 'A' with 'Other'.

This is the salient part of the thread for you I think


I have come across another way to get the elusive "match anything except X" to work by using the negative lookahead function.

The expression looks like:^(?!(regex)$).* where regex is any valid regular expression.

For example, if you wanted to make anything that was not "Male" or "Female" become "Unknown" you can use:
Replacement Value: Unknown
Value to Match: ^(?!(Male|Female)$).*
Many thanks, Wayne. You were right!

It wasn't working for me, initially, because I was including the original value, not the replacement value in the RegEx statement.

Many thanks for your help, and have a great weekend.

I have used this (thanks!) and it works. BUT...


Would it be possible to use it for 2 values?


I have 2 replacement values (2014 Dinner and 1415CONFERENCE). They I am trying to add a default, so that anything that doesn't get replaved by one or the other is replacd by OTHER.


The following does NOT work: /^((?!.*Dinner.*|.*CONFERENCE.*).)*$/


Basically I am wondering if I've messed up the logic of the "either" when using a negative look-ahead.

I've found that as long as I don't use regex for the terms I want to exclude, the formula works.

^((?!(EVENTS|OTHER)).)*$ works, for instance, if I need to match on anything aside from EVENTS or OTHER. But I would love to be able to regex those two terms.
Dictionaries work in order from top to bottom.

Make sure that the "everything else" option is last.
Login or Signup to post a comment