
Friday, August 17, 2012

InfoPath Regular Expressions (RegEx)

Soon after you start building your first InfoPath forms you realise that you need more advanced validation than SharePoint columns offer. Then you discover InfoPath validation rules, which let you compare what a user enters into a field with a specific text or number.

That's more than enough in the beginning, but soon you find yourself asking for more. Then there are two ways to go: either write code-behind or use regular expressions (regex). Both are fine, but I personally suggest avoiding custom code inside forms when a more elegant, out-of-the-box solution exists.

Regex in InfoPath is awesome! It allows you to do so much in one string. For example, this one checks that a value has the basic shape of an email address (some text, "@", some text, a dot, some more text) and won't allow anything that doesn't match the pattern:

.+@.+(\.).+

It does look cumbersome at first and you may even start to hate it in case you haven't seen regex before, but that's only until you write your first string that solves a real-world validation problem you've been fighting for days.
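To get a feel for what the email pattern above accepts, here is a minimal sketch in Python. It is only an approximation: Python's `re` module is not the engine InfoPath uses, and the sketch assumes InfoPath tests the whole field value, which `re.fullmatch` imitates. The function name `looks_like_email` and the sample values are made up for illustration.

```python
import re

# The email pattern from the post, unchanged.
EMAIL_PATTERN = re.compile(r".+@.+(\.).+")


def looks_like_email(value: str) -> bool:
    """Return True if the whole value matches the pattern.

    fullmatch is used on the assumption that InfoPath checks the
    entire field value rather than searching inside it.
    """
    return EMAIL_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for sample in ["user@example.com", "user@example", "@example.com", "a b@c.d"]:
        print(repr(sample), looks_like_email(sample))
```

Note that "a b@c.d" passes, because "." matches any character, including a space. The pattern checks the overall shape only, not whether the address is really valid.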


How it works

InfoPath itself offers very little regex help. The only help I've managed to find inside the program was in the "Data Entry Pattern" dialog, i.e. the same window that you use for building your string:



Yes, it shows you how to insert codes for letters and digits, and gives some tips on syntax, but what about general rules, metacharacters or escape characters? You can google for them and browse through millions of forums just like I did, but really everything you need is here:



Unlike programming, regex only requires you to understand the syntax - everything else is a matter of combining codes for characters.

I'm not going to explain in detail how to build specific strings, but I'll give a few examples that you may find useful.


Validating domain and username

Say you are developing a site request form which provides new sites to users from specific domains only. The following string rejects any username that is not in one of the specified domains (DOMAIN1, DOMAIN2, etc.) or that contains spaces or any special characters other than an underscore, dash or dot, none of which can come first or last.

(DOMAIN1|DOMAIN2|DOMAIN3)\\[A-Za-z0-9]+[_\-\.]?[A-Za-z0-9]+
  • "|" - stands for the OR operator, so any one of the listed domains will match.
  • "\\" - stands for a backslash. You have to type two of them, because a single "\" is a reserved metacharacter (the escape character).
  • "[A-Za-z0-9]" - matches any single letter or digit.
  • "+" - means that there can be one or more occurrences of the preceding expression, i.e. "[A-Za-z0-9]", which allows a combination of letters and/or digits of any length.
  • "_" - simply an underscore.
  • "\-" - defines a dash.
  • "\." - defines a dot.
  • "?" - zero or one occurrence of the preceding expression, i.e. "[_\-\.]", so a username can contain at most one of these special characters, somewhere in between.
  • the final "+" - one or more occurrences of the preceding "[A-Za-z0-9]", so the username always ends with a letter or digit and is at least two characters long.
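The pieces listed above can be tried out together in a short Python sketch. It is an approximation only: it assumes InfoPath matches the whole field value (imitated with `re.fullmatch`), and the helper name `is_valid_account` and the sample usernames are hypothetical.

```python
import re

# The domain\username pattern from the post, unchanged.
ACCOUNT_PATTERN = re.compile(
    r"(DOMAIN1|DOMAIN2|DOMAIN3)\\[A-Za-z0-9]+[_\-\.]?[A-Za-z0-9]+"
)


def is_valid_account(value: str) -> bool:
    """Return True if the whole value matches the pattern."""
    return ACCOUNT_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    samples = [
        r"DOMAIN1\john.smith",  # listed domain, one dot in the middle
        r"DOMAIN4\john",        # domain not in the list
        r"DOMAIN1\john smith",  # contains a space
        r"DOMAIN1\.john",       # special character first
        r"DOMAIN1\john.",       # special character last
        r"DOMAIN1\a.b.c",       # more than one special character
    ]
    for sample in samples:
        print(sample, is_valid_account(sample))
```

Only the first sample is accepted. The last one shows a side effect of "?": a username with two dots is rejected, which may or may not be what you want.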



Validating a site name

Another example is useful when you want to validate site names, so users cannot request sites that have any special characters except for underscore and dash, which again cannot go first or last and cannot repeat.

[^_\-][A-Za-z0-9]+[A-Za-z0-9_\-]*[A-Za-z0-9]+

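  • "^" - when it comes first inside square brackets, it excludes the characters that follow it, i.e. "[^_\-]" won't allow "_" or "-" at the beginning. Note that it matches any other single character, so a first character such as "#" still gets through.
  • "*" - zero or more occurrences of the previous expression.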
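A quick way to see how the site name pattern behaves is to run it against a few sample names. The Python sketch below is an approximation that assumes whole-value matching (imitated with `re.fullmatch`); the helper name `is_valid_site_name` and the sample names are hypothetical.

```python
import re

# The site name pattern from the post, unchanged.
SITE_PATTERN = re.compile(r"[^_\-][A-Za-z0-9]+[A-Za-z0-9_\-]*[A-Za-z0-9]+")


def is_valid_site_name(value: str) -> bool:
    """Return True if the whole value matches the pattern."""
    return SITE_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    samples = [
        "my-site",     # dash in the middle
        "_site",       # underscore first
        "site_",       # underscore last
        "my----site",  # repeated dashes
        "#site",       # special character first, but not "_" or "-"
    ]
    for sample in samples:
        print(sample, is_valid_site_name(sample))
```

"_site" and "site_" are rejected as intended, but "my----site" and "#site" are both accepted: "*" puts no limit on repeats, and "[^_\-]" matches any character that is not an underscore or a dash.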

Not everything is fine with this regex - it will still allow you to enter repeating underscores and dashes, e.g. "----". You can limit them in the expression itself or you can add extra validation using other InfoPath rules, for example:



Not as elegant and still prone to issues, i.e. users can enter five or more repeating characters, but it works for me. At least for now :)
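For comparison, limiting repeats inside the expression itself could look something like the sketch below. The stricter pattern is a suggestion, not something tested in InfoPath: it uses only groups and "*", but whether it behaves identically there is an assumption. The names `STRICT_SITE_PATTERN` and `is_strict_site_name` are hypothetical, and whole-value matching is again imitated with `re.fullmatch`.

```python
import re

# Hypothetical stricter pattern: runs of letters/digits, separated by
# exactly one underscore or dash at a time.
STRICT_SITE_PATTERN = re.compile(r"[A-Za-z0-9]+([_\-][A-Za-z0-9]+)*")


def is_strict_site_name(value: str) -> bool:
    """Return True if the whole value matches the stricter pattern."""
    return STRICT_SITE_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for sample in ["my-site", "my_new-site", "my----site", "my-_site", "_site", "site_"]:
        print(sample, is_strict_site_name(sample))
```

Because every underscore or dash must be followed by at least one letter or digit, separators can never repeat and can never come first or last. Unlike the original pattern, this one also accepts one- and two-character names.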

Actually, it's a good example showing that regex is powerful, but not ideal and still has some limitations which you can either easily work around or spend hours trying to find a better way. The choice is yours :)



That's it for now, but I'll definitely be coming back to this exciting topic later. I'm sure I'll be using more regular expressions in the future, so I hope to share a bit more.

Comments:

  1. Thanks for sharing! I found this post very useful :)

  2. Typo in "Validating a site name"
    "*" - one or more occurrences of the previous expression.
    Not "one or more" but "zero or more"
    Nice post
