Splitting Strings in Ruby with Regular Expressions

Sometimes you need to split a string into parts, and splitting on a substring just won’t cut it. One such example I ran across the other day involved splitting a string containing file paths. The given string would contain a number of file paths (such as folder/cuke.feature) and would be split at each space. However, a file path could itself contain spaces, which resulted in an incorrect array of files.

"folder/cuke.feature folder two/cuke.feature".split
# => [ "folder/cuke.feature", "folder", "two/cuke.feature" ]

This caused problems when Cucumber went looking for feature files that didn’t exist.

In Ruby, split can take a regular expression as a parameter. Passing the following pattern to the split above yields the correct file paths. Note that because this pattern matches at the very start of the string, the result also contains an empty string “” (the text before the first match). The empty string can be discarded, and the results can be trimmed to remove excess white space.

"folder/cuke.feature folder two/cuke.feature".split(/(.*?\.feature.*?) /).collect(&:strip).reject(&:empty?)
# => [ "folder/cuke.feature", "folder two/cuke.feature" ]
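To see where the empty string comes from, it can help to look at the raw result of the split before any cleanup. The sketch below breaks the one-liner into steps; the variable names (paths, raw, cleaned) are only for illustration.

```ruby
# The same input string used above.
paths = "folder/cuke.feature folder two/cuke.feature"

# Raw result of the split, before any cleanup. The first match starts at
# index 0, so the text "before" it is an empty string.
raw = paths.split(/(.*?\.feature.*?) /)
p raw
# => ["", "folder/cuke.feature", "folder two/cuke.feature"]

# Trimming each entry and dropping the empty ones leaves just the paths.
cleaned = raw.collect(&:strip).reject(&:empty?)
p cleaned
# => ["folder/cuke.feature", "folder two/cuke.feature"]
```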

The capture group in the regular expression is actually important. When the pattern contains a capture group, split includes the captured text in the result. Without it, the matched substring containing the file path is treated as a plain delimiter and discarded, so keep this in mind if your split result is missing things:

"folder/cuke.feature folder two/cuke.feature".split(/.*?\.feature.*? /)
# => ["", "folder two/cuke.feature"]
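The same behaviour is easier to see on a smaller string. This is a minimal sketch using a made-up comma-separated input, not part of the original problem; the variable names are only for illustration.

```ruby
# Without a capture group, the matched delimiters are dropped.
without_group = "a,b,c".split(/,/)
p without_group
# => ["a", "b", "c"]

# With a capture group, each captured delimiter is kept in the result.
with_group = "a,b,c".split(/(,)/)
p with_group
# => ["a", ",", "b", ",", "c"]
```

In the file path example, the "delimiter" being captured is the path itself, which is why the group is what keeps it in the output.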