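I wanted a regular expression that matches URIs, so I took the one from the article "URLにマッチする真の正規表現 – RFC3986定義のURIの話" ("A true regular expression that matches URLs: URIs as defined in RFC 3986") as a reference, then simplified and refactored it.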
I left out the ipv6address and ipvFuture forms of the host: they are long, and for now I cannot think of a situation where I would need them. The code is written in CoffeeScript.

    ###
    Regular expression that matches a URL.

    Usage:
      result = string.match(urlRegExp)
      result[0] - URL
      result[1] - scheme
      result[2] - userInfo
      result[3] - host
      result[4] - port
      result[5] - path
      result[6] - query
      result[7] - fragment

    ipLiteral (ipv6address or ipvFuture) is not supported.
    reference: http://swatmac.info/?p=1064
    ###

    # In the character classes below, the range &-, covers the sub-delims & ' ( ) * + ,
    pchar   = '(?:[-.0-9_a-z~]|%[0-9a-f][0-9a-f]|[!$&-,;=:@])'
    pcharNC = '(?:[-.0-9_a-z~]|%[0-9a-f][0-9a-f]|[!$&-,;=@])' # no colon
    vchar   = '(?:[-.0-9_a-z~]|%[0-9a-f][0-9a-f]|[!$&-,;=:@/?])'
    vString = "(#{vchar}*)"

    # The hyphen goes last so that "+-." is not read as a range that also admits ","
    scheme   = '([a-z][a-z0-9+.-]*)'
    userInfo = '((?:[-.0-9_a-z~]|%[0-9a-f][0-9a-f]|[!$&-,:;=])*)'

    ###
    ipv6address =
    ipvFuture =
    ipLiteral = "\\[(?:#{ipv6address}|#{ipvFuture})\\]"
    ###

    # Longest alternatives first, so that an octet such as 100 is not cut short at 1
    decOctet    = '(?:25[0-5]|2[0-4]\\d|1\\d{2}|[1-9]\\d|\\d)'
    # The dots are escaped (\\.) so that they match a literal "." only
    ipv4address = "(?:#{decOctet}\\.#{decOctet}\\.#{decOctet}\\.#{decOctet})"
    regNameChar = '(?:[-.0-9_a-z~]|%[0-9a-f][0-9a-f]|[!$&-,;=])'
    regName     = "#{regNameChar}+"

    # host = "(#{ipLiteral}|#{ipv4address}|#{regName})"
    # The whole host is one capturing group, so result[3] is set for IPv4 hosts too.
    # The lookahead makes ipv4address yield to regName when more host characters follow.
    host = "((?:#{ipv4address}(?!#{regNameChar}))|#{regName})"
    port = '(\\d+)'
    authority = "(?:(?:#{userInfo}@)?#{host}(?::#{port})?)"

    pathAbempty  = "(?:/#{pchar}*)*"
    pathAbsolute = "/(?:#{pchar}+(?:/#{pchar}*)*)?"
    pathRootless = "#{pcharNC}+(?:/#{pchar}*)*"
    # pathAbempty can match the empty string and is tried first,
    # so the other two alternatives are never reached in practice.
    hierPart = "//#{authority}(#{pathAbempty}|#{pathAbsolute}|#{pathRootless})"

    query    = vString
    fragment = vString

    urlRegExp = new RegExp "#{scheme}:#{hierPart}(?:\\?#{query})?(?:##{fragment})?", 'i'

    # The sample sentence means "This is a test of <URL>"
    console.log 'これは http://foo@hostname:4567/path?query=query#fragment のテストです'.match urlRegExp
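
The numbered groups listed in the usage comment are easy to mix up, so it may help to map them to named fields. The following is only a minimal sketch, not part of the original code: `parseUrl` and `FIELDS` are hypothetical names, and the pattern is a condensed copy so that the snippet runs on its own. The condensed copy assumes a reg-name host captured as a whole (which also covers dotted-decimal IPv4 text) and a path in the path-abempty form only.

```coffeescript
# Condensed copy of the pattern (an assumption of this sketch, not the original listing)
unreserved = '[-.0-9_a-z~]'
pctEncoded = '%[0-9a-f]{2}'
subDelims  = '[!$&-,;=]'   # the range &-, covers & ' ( ) * + ,
pchar      = "(?:#{unreserved}|#{pctEncoded}|#{subDelims}|[:@])"
vchar      = "(?:#{pchar}|[/?])"
scheme     = '([a-z][a-z0-9+.-]*)'
userInfo   = "((?:#{unreserved}|#{pctEncoded}|#{subDelims}|:)*)"
host       = "((?:#{unreserved}|#{pctEncoded}|#{subDelims})+)"   # reg-name only
port       = '(\\d+)'
authority  = "(?:#{userInfo}@)?#{host}(?::#{port})?"
path       = "((?:/#{pchar}*)*)"
urlRegExp  = new RegExp "#{scheme}://#{authority}#{path}(?:\\?(#{vchar}*))?(?:#(#{vchar}*))?", 'i'

# Hypothetical helper: field names in the same order as the capture groups
FIELDS = ['url', 'scheme', 'userInfo', 'host', 'port', 'path', 'query', 'fragment']

# Returns an object with one property per field, or null when no URL is found.
# Optional parts that are absent from the URL come back as undefined.
parseUrl = (text) ->
  result = text.match urlRegExp
  return null unless result?
  parts = {}
  parts[name] = result[i] for name, i in FIELDS
  parts

parts = parseUrl 'これは http://foo@hostname:4567/path?query=query#fragment のテストです'
console.log parts.host, parts.port, parts.path   # prints: hostname 4567 /path
```

Returning `null` for text without a URL mirrors what `String.prototype.match` does, so callers can keep the same check they would use with the raw match result.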