URIChunk (Class)

In: app/models/chunks/uri.rb
Parent: Chunk::Abstract

This wiki chunk matches arbitrary URIs, using patterns from the Ruby URI modules. It parses out a variety of fields that renderers could use to format the links in various ways (shortening domain names, hiding email addresses). It also matches email addresses and host.com.au-style domains that lack a scheme (http://), adding the scheme as required.

The heuristic used to match a URI is designed to err on the side of caution: it is more likely to leave a URI unlinked than to accidentally autolink something that is not a URI. The reasoning is that it is easier to force a URI link by prefixing ‘http://’ to it than it is to escape an incorrectly marked-up non-URI.

I’m using a part of the [ISO 3166-1 Standard][iso3166] for country name suffixes. The generic names are from www.bnoack.com/data/countrycode2.html.

  [iso3166]: http://geotags.com/iso3166/

Methods

escaped_text   new   pattern   unmask  

Constants

GENERIC = '(?:aero|biz|com|coop|edu|gov|info|int|mil|museum|name|net|org)'
COUNTRY = '(?:au|at|be|ca|ch|de|dk|fr|hk|in|ir|it|jp|nl|no|pt|ru|se|sw|tv|tw|uk|us)'
TLDS = "\\.(?:#{GENERIC}|#{COUNTRY})"
  These are needed; otherwise HOST will match almost anything.
USERINFO = "(?:[#{UNRESERVED};:&=+$,]|#{ESCAPED})+"
  Redefine USERINFO so that it must have non-zero length
URI_ENDING = '[)!]'
  Pattern of legal URI endings, used to stop interference with some Textile markup (images: !URI!) and other punctuation, e.g. (wiki.com/).
URI_PATTERN =
    "(?:(#{SCHEME})://)?" +       # Optional scheme://             (\1|\8)
    "(?:(#{USERINFO})@)?" +       # Optional userinfo@             (\2|\9)
    "(#{HOSTNAME}#{TLDS})" +      # Mandatory host eg, HOST.com.au (\3|\10)
    "(?::(#{PORT}))?" +           # Optional :port                 (\4|\11)
    "(#{ABS_PATH})?" +            # Optional absolute path         (\5|\12)
    "(?:\\?(#{QUERY}))?" +        # Optional ?query                (\6|\13)
    "(?:\\#(#{FRAGMENT}))?"
  The basic URI expression as a string
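
To make the group numbering and the role of TLDS concrete, here is a minimal sketch that assembles a pattern of the same shape. GENERIC, COUNTRY and TLDS are copied from the constants above. SCHEME, USERINFO, HOSTNAME, PORT, ABS_PATH, QUERY and FRAGMENT are simplified stand-ins (assumptions), much looser than the real URI::REGEXP::PATTERN definitions, and the UriSketch module name is hypothetical.

```ruby
module UriSketch
  GENERIC = '(?:aero|biz|com|coop|edu|gov|info|int|mil|museum|name|net|org)'
  COUNTRY = '(?:au|at|be|ca|ch|de|dk|fr|hk|in|ir|it|jp|nl|no|pt|ru|se|sw|tv|tw|uk|us)'
  TLDS    = "\\.(?:#{GENERIC}|#{COUNTRY})"

  # Simplified stand-ins, NOT the real URI::REGEXP::PATTERN constants.
  SCHEME   = '[a-zA-Z][a-zA-Z0-9+.-]*'
  USERINFO = '[\w.+-]+'
  HOSTNAME = '[a-zA-Z0-9](?:[a-zA-Z0-9.-]*[a-zA-Z0-9])?'
  PORT     = '\d+'
  ABS_PATH = '/[^\s?#]*'
  QUERY    = '[^\s#]*'
  FRAGMENT = '\S*'

  URI_PATTERN =
    "(?:(#{SCHEME})://)?" +    # group 1: optional scheme
    "(?:(#{USERINFO})@)?" +    # group 2: optional userinfo
    "(#{HOSTNAME}#{TLDS})" +   # group 3: mandatory host ending in a known TLD
    "(?::(#{PORT}))?" +        # group 4: optional port
    "(#{ABS_PATH})?" +         # group 5: optional absolute path
    "(?:\\?(#{QUERY}))?" +     # group 6: optional query
    "(?:\\#(#{FRAGMENT}))?"    # group 7: optional fragment

  URI_RE = Regexp.new(URI_PATTERN)
end

full = UriSketch::URI_RE.match('http://user@example.com:8080/a/b?x=1#top')
p full.captures   # scheme, user, host, port, path, query, fragment

bare = UriSketch::URI_RE.match('see wiki.example.com.au now')
p bare[0]         # the scheme-less host is still found

p UriSketch::URI_RE.match('foo.xyz')   # unknown TLD, so no match
```

Because the host group must end in one of the listed TLDs, a dotted word such as foo.xyz is left alone, which is the cautious behaviour described in the class overview.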

Attributes

fragment  [R] 
host  [R] 
link_text  [R] 
path  [R] 
port  [R] 
query  [R] 
scheme  [R] 
uri  [R] 
user  [R] 

Included Modules

URI::REGEXP::PATTERN

Public Class methods

[Source]

    # File app/models/chunks/uri.rb, line 55
    def initialize(match_data)
      super(match_data)
      # Since the URI_PATTERN is tried twice, there are two sets of
      # groups, one from \1 to \7 and the second from \8 to \14.
      # The fields are set by whichever group matches.
      @scheme   = match_data[1] || match_data[8]
      @user     = match_data[2] || match_data[9]
      @host     = match_data[3] || match_data[10]
      @port     = match_data[4] || match_data[11]
      @path     = match_data[5] || match_data[12]
      @query    = match_data[6] || match_data[13]
      @fragment = match_data[7] || match_data[14]

      # If there is no scheme, add an appropriate one, otherwise
      # set the URI to the matched text.
      @text_scheme = scheme
      @uri = (scheme ? match_data[0] : nil)
      @scheme = scheme || (user ? 'mailto' : 'http')
      @delimiter = (scheme == 'mailto' ? ':' : '://')
      @uri ||= scheme + @delimiter + match_data[0]

      # Build up the link text. Schemes are omitted unless explicitly given.
      @link_text = ''
      @link_text << "#{@scheme}#{@delimiter}" if @text_scheme
      @link_text << "#{@user}@" if @user
      @link_text << "#{@host}" if @host
      @link_text << ":#{@port}" if @port
      @link_text << "#{@path}" if @path
      @link_text << "?#{@query}" if @query
    end
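
The scheme defaulting and link-text rules in the constructor can be tried in isolation. The sketch below restates them as a standalone function; describe_link and its keyword parameters are hypothetical names introduced for illustration and are not part of URIChunk.

```ruby
# Hypothetical helper mirroring the constructor's scheme and link-text logic.
# `matched` stands for match_data[0], the full text that was matched.
def describe_link(matched, scheme: nil, user: nil, host: nil, port: nil, path: nil, query: nil)
  text_scheme = scheme                          # scheme as typed by the author, if any
  scheme ||= (user ? 'mailto' : 'http')         # userinfo without a scheme implies email
  delimiter = (scheme == 'mailto' ? ':' : '://')
  uri = text_scheme ? matched : scheme + delimiter + matched

  link_text = String.new
  link_text << "#{scheme}#{delimiter}" if text_scheme
  link_text << "#{user}@" if user
  link_text << host.to_s if host
  link_text << ":#{port}" if port
  link_text << path.to_s if path
  link_text << "?#{query}" if query

  { uri: uri, link_text: link_text }
end

p describe_link('bob@example.com', user: 'bob', host: 'example.com')
p describe_link('example.com.au/docs', host: 'example.com.au', path: '/docs')
p describe_link('http://example.com/', scheme: 'http', host: 'example.com', path: '/')
```

In the first two calls the link target gains a mailto: or http:// prefix while the visible text stays exactly as typed. In the third, the scheme was given explicitly, so it appears in both.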

[Source]

    # File app/models/chunks/uri.rb, line 44
    def self.pattern()
      # This pattern first tries to match the URI_PATTERN that ends with
      # punctuation that is a valid URI character (eg, ')', '!'). If
      # such a match occurs, there should be no backtracking (hence the ?> ).
      # If the string cannot match a URI ending with URI_ENDING, then a second
      # attempt is tried.
      Regexp.new("(?>#{URI_PATTERN}(?=#{URI_ENDING}))|#{URI_PATTERN}", Regexp::EXTENDED, 'N')
    end
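
The effect of the two-attempt pattern is easiest to see with a toy URI expression. In the sketch below, TOY_URI and TOY_ENDING are hypothetical, heavily simplified stand-ins for URI_PATTERN and URI_ENDING. Only the overall shape, an atomic group with a lookahead followed by a plain fallback, follows the method above.

```ruby
# Toy stand-ins (assumptions): a ".com" host plus an optional path whose
# characters may include ')' and '!', as real URI paths can.
TOY_URI    = '[a-z]+\.com(?:/[\w/()!]*)?'
TOY_ENDING = '[)!]'

single  = Regexp.new(TOY_URI)
two_try = Regexp.new("(?>#{TOY_URI}(?=#{TOY_ENDING}))|#{TOY_URI}")

# A single attempt swallows the closing punctuation as part of the path.
p '(wiki.com/)'[single]

# The first alternative stops just before the trailing ')' or '!'.
p '(wiki.com/)'[two_try]
p '!wiki.com/logo!'[two_try]

# With no such ending available, the second alternative matches as usual.
p 'wiki.com/page'[two_try]
```

With the toy pattern, the closing parenthesis and the Textile image marker stay outside the match, so the surrounding markup is left intact.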

Public Instance methods

If there is no hostname in the URI, do not render it. It probably contains only the scheme, e.g. ‘something:’.

[Source]

    # File app/models/chunks/uri.rb, line 96
    def escaped_text() ( host.nil? ? @uri : nil )  end

If the text should be escaped then don’t keep this chunk. Otherwise only keep this chunk if it was substituted back into the content.

[Source]

    # File app/models/chunks/uri.rb, line 89
    def unmask(content)
      return nil if escaped_text
      return self if content.sub!( Regexp.new(mask(content)), "<a href=\"#{uri}\">#{link_text}</a>" )
    end
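
As a rough illustration of the substitution step, the sketch below replaces a placeholder token with the rendered anchor. MASK_TOKEN is a hypothetical placeholder value and unmask_sketch is a hypothetical helper. The real token comes from the inherited mask method, whose format is not shown on this page.

```ruby
# Hypothetical placeholder; the real value is produced by mask(content).
MASK_TOKEN = 'chunk42urichunk'

# Returns true if the placeholder was replaced in `content` (mutated in
# place), or nil if the chunk should be dropped.
def unmask_sketch(content, uri:, link_text:, escaped_text: nil)
  return nil if escaped_text
  anchor = "<a href=\"#{uri}\">#{link_text}</a>"
  content.sub!(Regexp.new(Regexp.escape(MASK_TOKEN)), anchor) ? true : nil
end

content = 'See chunk42urichunk for details.'.dup
p unmask_sketch(content, uri: 'http://example.com', link_text: 'example.com')
puts content
```

Since sub! returns nil when nothing was replaced, a chunk whose placeholder no longer appears in the content is dropped, matching the description above.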
