Start a new topic

RegEx to parse middle name from a First/Middle field?

One of our data sets combines the first and middle fields into one column. This means that we have values in that field like "Mary", "Mary Catherine", and "Mary Ann Catherine". I've been able to find Regular Expression code to parse out the first name or first two names (in the case of three-word strings), but I'm having trouble returning a blank value when the string doesn't have a middle name at all (basically, no space in the string, like "Mary").

This is what I have for two- and three-word strings, respectively:

$3    (\w.*)(\s)(\w.*)

$5    (\w.*)(\s)(\w.*)(\s)(\w.*)

Any idea how to handle a one-word string ("Mary")? I want the replacement value in that situation to be blank.

1 Comment

Bethany, I personally would not use regex for this - if for no other reason I cannot guarantee the quality or consistency of the incoming data.  Here's the general steps I would use to write a bit of code and probably put it in the beforedictionary section of the omatic api.

1. Remove all invalid char's from the string (ex. double spaces, non-ascii, etc)

2. Split it into separate pieces (ex. name_array=split(full_name," ") )

3. check each piece to find any titles (i.e. Mr) or suffixes (i.e. Jr or III)

4. Assume the reminder is their name with the first piece being first name, etc

5. If there is only one name (ex. Mary) I would set the last name as blank

6. Would want to also address anything like "J K Rowling" or "J Paul Getty" as part of the business logic.

Thanks, Dennis

Login or Signup to post a comment