Ruby bindings for the rust/regex library.
Install Rust via rustup or by any other method.
Add as a dependency:
# In your Gemfile
gem "rust_regexp"
# Or without Bundler
gem install rust_regexp
Include in your code:
require "rust_regexp"
Regular expressions should be pre-compiled before use:
re = RustRegexp.new('p.t{2}ern*')
# => #<RustRegexp:...>
Tip
Note the use of single quotes when passing the regular expression as a string to rust/regex, so that the backslashes aren't interpreted as escapes.
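The quoting point in the tip can be checked with plain Ruby string literals, without the gem. The snippet below is only a sketch of how Ruby parses the two literal styles; the variable names are illustrative.

```ruby
# Single-quoted literals keep backslashes as-is (only \\ and \' are special).
single = '\bruby:\d+'

# Double-quoted literals process escapes first, so every backslash must be doubled
# to produce the same pattern text.
double = "\\bruby:\\d+"

puts single           # prints \bruby:\d+
puts single == double # prints true

# Without doubling, "\b" silently becomes a backspace character (byte 0x08)
# instead of the two characters of the word-boundary escape.
puts '\b'.length # prints 2
puts "\b".length # prints 1
puts "\b".ord    # prints 8
```

With double quotes the pattern would therefore reach the regex engine already altered, which is why the single-quoted form is the safer default.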
To find a single match in the haystack:
RustRegexp.new('\w+:\d+').match("ruby:123, rust:456")
# => ["ruby:123"]
RustRegexp.new('(\w+):(\d+)').match("ruby:123, rust:456")
# => ["ruby", "123"]
To find all matches in the haystack:
RustRegexp.new('\w+:\d+').scan("ruby:123, rust:456")
# => ["ruby:123", "rust:456"]
RustRegexp.new('(\w+):(\d+)').scan("ruby:123, rust:456")
# => [["ruby", "123"], ["rust", "456"]]
To check whether there is at least one match in the haystack:
RustRegexp.new('\w+:\d+').match?("ruby:123")
# => true
RustRegexp.new('\w+:\d+').match?("ruby")
# => false
Inspect original pattern:
RustRegexp.new('\w+:\d+').pattern
# => "\\w+:\\d+"
Warning
rust/regex regular expression syntax differs from Ruby's built-in Regexp library; see the official syntax page for more details.
RustRegexp::Set represents a collection of regular expressions that can be searched for simultaneously. Calling RustRegexp::Set#match will return an array containing the indices of all the patterns that matched.
set = RustRegexp::Set.new(["abc", "def", "ghi", "xyz"])
set.match("abcdefghi") # => [0, 1, 2]
set.match("ghidefabc") # => [0, 1, 2]
Note
Matches arrive in the order the constituent patterns were declared, not the order they appear in the haystack.
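To illustrate the ordering rule, here is a minimal sketch that emulates the index semantics with Ruby's built-in Regexp. It is not how the gem is implemented, and `matching_indices` is a hypothetical helper introduced only for this example.

```ruby
# Hypothetical helper: returns the indices of the patterns that match the
# haystack, in declaration order. Emulation only, using the built-in Regexp.
def matching_indices(patterns, haystack)
  patterns.each_index.select { |i| Regexp.new(patterns[i]).match?(haystack) }
end

patterns = ["abc", "def", "ghi", "xyz"]

# "ghi" appears first in the haystack, but indices follow the pattern list.
indices = matching_indices(patterns, "ghidefabc")
p indices # prints [0, 1, 2]

# Indices can be mapped back to the pattern strings with Array#values_at.
p patterns.values_at(*indices) # prints ["abc", "def", "ghi"]
```

The same `values_at` idea should carry over to the gem by combining the result of `RustRegexp::Set#match` with `RustRegexp::Set#patterns`.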
To check whether at least one pattern from the set matches the haystack:
RustRegexp::Set.new(["abc", "def"]).match?("abc")
# => true
RustRegexp::Set.new(["abc", "def"]).match?("123")
# => false
Inspect original patterns:
RustRegexp::Set.new(["abc", "def"]).patterns
# => ["abc", "def"]
Currently, rust_regexp expects the haystack to be a UTF-8 string.
By default, it also supports haystacks that contain invalid UTF-8 byte sequences. This is achieved by using regex::bytes instead of plain regex under the hood, so any byte sequence can be matched. The output match is encoded as a UTF-8 string.
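To make "invalid UTF-8" concrete, the sketch below builds such a haystack with core Ruby only. It does not call the gem; it merely shows what kind of input the paragraph above refers to and how it can be inspected or sanitized beforehand if desired.

```ruby
# The byte 0xFF can never occur in well-formed UTF-8, so this string is
# tagged as UTF-8 but contains an invalid byte sequence.
haystack = "ruby:123\xFF rust:456"

puts haystack.encoding == Encoding::UTF_8 # prints true
puts haystack.valid_encoding?             # prints false

# String#scrub replaces invalid bytes (with U+FFFD by default for UTF-8),
# yielding a well-formed copy.
clean = haystack.scrub
puts clean.valid_encoding?                # prints true
puts clean == "ruby:123\uFFFD rust:456"   # prints true
```

Scrubbing is optional here, since the library is described as accepting arbitrary bytes; it is shown only as one way to normalize input.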
If Unicode awareness of the matchers should be disabled, both RustRegexp and RustRegexp::Set support the unicode: false option:
RustRegexp.new('\w+').match('ю٤夏')
# => ["ю٤夏"]
RustRegexp.new('\w+', unicode: false).match('ю٤夏')
# => []
RustRegexp::Set.new(['\w', '\d', '\s']).match("ю٤\u2000")
# => [0, 1, 2]
RustRegexp::Set.new(['\w', '\d', '\s'], unicode: false).match("ю٤\u2000")
# => []
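For comparison, Ruby's built-in Regexp treats `\w` and `\d` as ASCII-only, while its bracket classes such as `[[:word:]]` and `[[:digit:]]` are Unicode-aware. The sketch below uses only the built-in Regexp (not the gem) and shows one concrete way the defaults differ from the Unicode-aware results above.

```ruby
word  = "ю٤夏" # Cyrillic letter, Arabic-Indic digit, CJK ideograph
digit = "٤"    # Arabic-Indic digit four (Unicode category Nd)

# Built-in \w and \d only cover ASCII characters.
ascii_word  = word.match?(/\w/)
ascii_digit = digit.match?(/\d/)

# Bracket classes in the built-in Regexp are Unicode-aware.
unicode_word  = word.match?(/\A[[:word:]]+\z/)
unicode_digit = digit.match?(/[[:digit:]]/)

p ascii_word    # prints false
p ascii_digit   # prints false
p unicode_word  # prints true
p unicode_digit # prints true
```

Patterns ported from the built-in Regexp may therefore match more text with the Unicode-aware default, which is one case where the unicode: false option could be worth considering.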
bin/setup # install deps
bin/console # interactive prompt to play around
rake compile # (re)compile extension
rake spec # run tests
Bug reports and pull requests are welcome on GitHub at https://github.com/ocvit/rust_regexp.
The gem is available as open source under the terms of the MIT License.