#-*-mode: org;-*-
* Purpose
"Oh, Ducks!" is an extension to cl-unification that makes it easy to
parse structured documents using CSS selectors.
* Installation
** Prerequisites
 + cl-unification
 + cl-ppcre
 + split-sequence
 + alexandria
 + asdf-system-connections
 * closure-html
 * cxml
 * named-readtables
[+] Mandatory  [*] Optional
** Loading
Loading "Oh, Ducks!" is just like loading any other ASDF system.
However, because it does not mandate a particular HTML or XML parser,
it is generally not useful until you have also loaded an HTML/XML
parsing library such as cxml or closure-html.

Start with:
: (asdf:oos 'asdf:load-op :oh-ducks)
If you would like to use the built-in support for parsing via
closure-html (which you almost certainly do), also load closure-html:
: (asdf:oos 'asdf:load-op :closure-html)
And, if you want to use the DOM objects provided by cxml:
: (asdf:oos 'asdf:load-op :cxml)
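On newer ASDF releases, =asdf:load-system= is the usual shorthand for
the =asdf:oos= forms above.  A minimal sketch, assuming ASDF 2 or later
and that all three systems are visible to ASDF:

#+BEGIN_SRC lisp
;; Assumes ASDF 2 or later, where asdf:load-system is available.
(asdf:load-system :oh-ducks)      ; the template machinery itself
(asdf:load-system :closure-html)  ; optional: HTML parser, LHTML and PT models
(asdf:load-system :cxml)          ; optional: DOM objects
#+END_SRC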
** Load-order Caveats
closure-html and cl-unification each define a competing reader macro on
#t.  To avoid load-order issues that leave #t bound to an indeterminate
reader, you'll probably want to add
: #.(set-dispatch-macro-character #\# #\T 'unify::|sharp-T-reader|)
or
: (unify:enable-template-reader)
or
: (named-readtables:in-readtable unify:template-readtable)
to the top of any file which uses cl-unification's reader templates.
(The latter two currently only work if you have cl-unification from my
darcs repo.)

Please feel free to submit patches to closure-html and cl-unification
to fix this problem.
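
If you would rather not use =#.=, the same reader can be installed with
an ordinary top-level form.  This is a sketch rather than something the
library prescribes: it assumes cl-unification is already loaded when the
file is read, and it wraps the call in =eval-when= so that it also takes
effect while the file is being compiled.

#+BEGIN_SRC lisp
;; Sketch of a file header.  Assumes cl-unification is loaded before this
;; file is read, so that the UNIFY package already exists.
(eval-when (:compile-toplevel :load-toplevel :execute)
  ;; Install cl-unification's reader on #T before the rest of the file
  ;; is read, whichever library happened to be loaded last.
  (set-dispatch-macro-character #\# #\T 'unify::|sharp-T-reader|))
#+END_SRC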
** Depending Upon in ASDF Systems
Before long, the easiest way to manage your dependencies on ASDF
systems is to define an ASDF system for your own project.  Note that,
in addition to depending upon oh-ducks, you'll also want to depend upon
whichever library provides your desired object model and parser.

For example,
: :depends-on (:oh-ducks :closure-html :cxml)
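In context, that clause might sit in a system definition like the
following sketch.  The system name =my-scraper= and the file
=scraper.lisp= are hypothetical placeholders.

#+BEGIN_SRC lisp
;; my-scraper.asd: a hypothetical project.  Only the :depends-on list
;; comes from the text above.
(asdf:defsystem :my-scraper
  :description "Example project that parses HTML with oh-ducks."
  :depends-on (:oh-ducks      ; selector templates
               :closure-html  ; HTML parser and the LHTML/PT models
               :cxml)         ; DOM object model
  :components ((:file "scraper")))
#+END_SRC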
** Differentiating between LHTML lists and XMLS lists
While it would, in theory, be possible to inspect lists and determine whether
they are LHTML or XMLS lists, this is not currently done.  You can, however,
choose which type you'd like to work with by pushing =:lists-are-xmls= or
=:lists-are-lhtml= to =*features*= before loading "Oh, Ducks!".

Unfortunately, this means you can only expect to use one list type in a single
Lisp image.  Patches to either automatically detect the list type or provide
layered functions are welcome.
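
A minimal sketch of selecting LHTML lists, assuming it is evaluated (at
the REPL or in an init file) before the system is loaded for the first
time:

#+BEGIN_SRC lisp
;; Choose the list flavour first; use :lists-are-xmls instead for XMLS.
(pushnew :lists-are-lhtml *features*)

;; Only then load the library, so the feature is already present.
(asdf:oos 'asdf:load-op :oh-ducks)
#+END_SRC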
* Usage
The combination of oh-ducks and closure-html provides an HTML template
for use with cl-unification.  It has the following syntax:

  (match (#t(html [(:model <model>)]
                  <selectors>+)
          <document>)
    &body)
  selectors := (<selector> . <binding>) |
               (<selector> . <template>) |
               (<selector> <selectors>+)
  document := <parsed-document> | <document-to-be-parsed>

:model is only necessary for unparsed documents (e.g., a pathname or string).
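
As an illustration of the third selector form, the sketch below nests
one selector inside another.  It is an untested reading of the grammar
above.  It assumes closure-html is loaded, cl-unification's =#T= reader
is active, and =match= is accessible in the current package; the
variable name =?emphasis= is arbitrary.

#+BEGIN_SRC lisp
;; Untested sketch: the inner selector is assumed to apply within the
;; elements matched by the outer "div" selector.
(match (#T(html (:model lhtml)
                ("div" ("i" . ?emphasis)))
        "<div>I <i>like</i> cheese.</div>")
  emphasis)
#+END_SRC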

** Examples

(match (#T(html (:model lhtml)
                ("#id" . ?div))
        "<div id=\"id\">I <i>like</i> cheese.</div>")
  (car div)) =>
  (:div ((:id "id")) "I " (:i () "like") " cheese.")

(match (#T(html (:model dom)
                ("i" . #t(list ?j ?i))
                ("span>i" . ?span))
        "<div>I do <i>not</i> like cheese.</div><div><span>I like <i>cheese</i>.</span></div>")
  (values i span)) =>
  #<ELEMENT i "not">,
  (#<ELEMENT i "cheese">)

** Selectors
The goal is to support all CSS level 3 selectors.  See the section
[[*improve selector support][To Do > Improve Selector Support]] for a list of currently unsupported
simple selectors and combinators.

Each selector should match the same elements that the equivalent CSS
selector would.  That is,
  #id => elements with id of "id"
  .foo.bar => elements with both "foo" and "bar" classes
  div => all <div>s
and so forth.
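
For instance, the class selector from the list above could be used like
this.  The sketch is modelled on the examples in this document and
assumes the LHTML model and an active =#T= reader; the markup and the
variable name =?both= are made up for illustration.

#+BEGIN_SRC lisp
;; Only the first <div> carries both classes, so ?both should be bound
;; to a list holding just that element.
(match (#T(html (:model lhtml)
                (".foo.bar" . ?both))
        "<div class=\"foo bar\">yes</div><div class=\"foo\">no</div>")
  (length both))
#+END_SRC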

NOTE: selectors are currently bound in parallel.  That is, given
  #t(html (<selector-1> ...)
          (<selector-2> ...))
selector-1 and selector-2 do not interact.  If they are both "foo", they'll
return identical results.  I often find myself wanting to also say something
like:
  #t(html (<selector-1> ...)
          (<element-after-selector-1> ...))
Ideas for a syntax to distinguish between the two cases are welcome: (:mode
parallel) vs. (:mode sequential), perhaps?  (Or even adjacent, sibling?)

*** Limitations

Currently, selector terms are limited to alphanumeric characters and
do not support CSS-style character escapes.  Patches welcome!

** Included Object Models
*** LHTML (closure-html)
A list-based structure provided by closure-html.  Cannot be used with
selectors which require asking about parent or sibling objects.
*** PT (closure-html)
A tree of structure objects provided by closure-html.
*** DOM (cxml)
DOM objects as provided by cxml and defined by the W3C.
* Extending
** Adding an object model
While the supported models should generally be sufficient, you can add
your own fairly easily.  All models are expected to implement the
generic functions in <traversal/interface.lisp>.  See the other files
under the traversal/ directory for examples.

You might also want to see chtml.lisp and cxml.lisp.
** Adding a selector or combinator
See <selectors.lisp>.  Generally, you should add a class which is a
subclass of combinator or simple-selector, augment parse-selector with
an appropriate regular expression, and define a method on
subject-p.

I also recommend submitting a patch.  Other people might want to use
that selector, too!
* To Do
** working lhtml/xmls support [2/2]
 * [X] non-descendant cases (class, id, etc.)
 * [X] selectors involving descendants
   CAUTION: Won't produce sane results if the document tree is
            modified or you use nested (match)es.
** write documentation
** improve selector support
*** positional selectors [11/11]
 * [X] :nth-child
 * [X] :nth-last-child
 * [X] :first-child
 * [X] :last-child
 * [X] :nth-of-type
 * [X] :nth-last-of-type
 * [X] :first-of-type
 * [X] :last-of-type
 * [X] :only-child
 * [X] :only-of-type
 * [X] :empty
*** attribute selectors [2/7]
 * [X] attribute-present  [att]
 * [X] attribute-equal    [att=val]
 * [ ] attribute-member   [att~=val]
 * [ ] attribute-lang     [att|=val]
 * [ ] attribute-begins   [att^=val]
 * [ ] attribute-ends     [att$=val]
 * [ ] attribute-contains [att*=val]
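
For reference while implementing the unchecked items, the sketch below
spells out what each operator means in CSS level 3, as plain string
predicates.  The function names are hypothetical and are not part of
oh-ducks; =value= stands for the attribute's value and =val= for the
operand written in the selector.

#+BEGIN_SRC lisp
(defun whitespace-char-p (char)
  "True if CHAR separates words in an attribute value."
  (member char '(#\Space #\Tab #\Newline #\Return #\Page)))

(defun split-on-whitespace (string)
  "Return the non-empty whitespace-separated words of STRING."
  (let ((words '())
        (start nil))
    (dotimes (i (length string))
      (if (whitespace-char-p (char string i))
          (when start
            (push (subseq string start i) words)
            (setf start nil))
          (unless start (setf start i))))
    (when start (push (subseq string start) words))
    (nreverse words)))

(defun attribute-member-p (value val)
  "[att~=val]: VAL is one of the whitespace-separated words in VALUE."
  (and (member val (split-on-whitespace value) :test #'string=) t))

(defun attribute-begins-p (value val)
  "[att^=val]: VALUE starts with VAL.  An empty VAL never matches."
  (and (plusp (length val))
       (<= (length val) (length value))
       (string= val value :end2 (length val))))

(defun attribute-ends-p (value val)
  "[att$=val]: VALUE ends with VAL.  An empty VAL never matches."
  (and (plusp (length val))
       (<= (length val) (length value))
       (string= val value :start2 (- (length value) (length val)))))

(defun attribute-contains-p (value val)
  "[att*=val]: VAL occurs somewhere in VALUE.  An empty VAL never matches."
  (and (plusp (length val))
       (search val value)
       t))

(defun attribute-lang-p (value val)
  "[att|=val]: VALUE is exactly VAL, or starts with VAL and a hyphen."
  (or (string= value val)
      (attribute-begins-p value (concatenate 'string val "-"))))
#+END_SRC

For example, =(attribute-lang-p "en-US" "en")= is true while
=(attribute-lang-p "english" "en")= is not.  Wiring these into
subject-p methods is left out here because it depends on the interface
in selectors.lisp.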
*** :not(...)
*** any others?
** namespace support(?)
** Submit patch to cl-unification to add (enable/disable-template-reader) functions
Submitted.  Was it ever accepted?  Man, I don't remember.
** Submit patch to closure-html to add (enable/disable-reader) functions
** non-css templates (e.g., for matching on text of element)?
Maybe special-case string/regexp-templates, so for example
: #t(html ("div" (#t(regexp "f(o+)bar") . ?div)))
would match [<div>foooobar</div>]?

: #t(html ("div" . #t(regexp "f(o+)bar" (?o))))
might cause some difficulty, however.  We should get a list of matched elements
for the div selector, but the regexp variable (?o) can only match once (without
some wacky environment merging, anyway).
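
To make the difficulty concrete, here is what the regular expression
captures when applied directly with cl-ppcre (already a prerequisite).
This only illustrates the regexp itself, not the proposed template
syntax.

#+BEGIN_SRC lisp
;; Assumes cl-ppcre is loaded.  The single register group captures the
;; run of o's, so one match yields exactly one value for a variable
;; like ?o.
(cl-ppcre:register-groups-bind (o)
    ("f(o+)bar" "foooobar")
  o)
;; => "oooo"
#+END_SRC

With several matching <div>s there would be one such value per element,
which is why a single binding for ?o is awkward.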
** Element structure templates
For instance, sometimes it'd be nice to stuff the value of an attribute into a
variable, like so:
:  (match (#t(attr ("href" ?href) ("name" ?name)) "<a href='url' name='link'></a>")
:     (values href name)) =>
:    "url", "link"
While it's certainly easy enough to do that using, say, XMLS-style lists, a
general object-model-agnostic method would seem to be preferable.
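
For comparison, the list-based route looks roughly like this with an
LHTML-shaped element (the shape shown in the first example under
Usage, rather than XMLS).  The helper name =attribute-value= is
hypothetical and not part of oh-ducks.

#+BEGIN_SRC lisp
(defun attribute-value (element name)
  "Return the value of attribute NAME in an LHTML-style ELEMENT, or NIL.
ELEMENT is assumed to look like (tag ((name value) ...) child ...)."
  (second (assoc name (second element))))

(let ((anchor '(:a ((:href "url") (:name "link")))))
  (values (attribute-value anchor :href)
          (attribute-value anchor :name)))
;; => "url", "link"
#+END_SRC

This works only because the caller knows the list layout, which is the
model-specific knowledge a general attr template would hide.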
** Layered functions so LHTML vs. XMLS support can be switched at runtime