Add support for XMLS-style lists, conflicting with LHTML-style lists
Annotate for file /notes
2009-11-18 pix 1 #-*-mode: org;-*-
2009-11-23 pix 2 * Purpose
13:13:22 ' 3 "Oh, Ducks!" is an extension to cl-unification to make parsing
' 4 structured documents easy, using CSS selectors.
' 5 * Installation
' 6 ** Prerequisites
' 7 + cl-unification
' 8 + cl-ppcre
' 9 + split-sequence
' 10 + alexandria
' 11 + asdf-system-connections
' 12 * closure-html
' 13 * cxml
2010-02-07 pix 14 * named-readtables
2009-11-23 pix 15 [+] Mandatory [*] Optional
13:13:22 ' 16 ** Loading
' 17 Loading "Oh, Ducks!" is just like loading any other ASDF system.
' 18 However, because it does not mandate a particular HTML or XML parser,
2009-12-13 pix 19 it does not generally become useful until you have also loaded an
2009-11-23 pix 20 HTML/XML parsing library such as cxml or closure-html.
13:13:22 ' 21
' 22 Start with:
2009-12-28 pix 23 : (asdf:oos 'asdf:load-op :oh-ducks)
2009-11-23 pix 24 If you would like to use the built-in support for parsing via
13:13:22 ' 25 closure-html (which you almost certainly do), you'll also want to load
' 26 closure-html:
2009-12-28 pix 27 : (asdf:oos 'asdf:load-op :closure-html)
2009-11-23 pix 28 And, if you want to use DOM objects provided by cxml:
2009-12-28 pix 29 : (asdf:oos 'asdf:load-op :cxml)
2009-11-23 pix 30 ** Load-order Caveats
13:13:22 ' 31 closure-html and cl-unification each define competing readers on #t.
' 32 To avoid load-order issues resulting in an indeterminate reader on #t,
' 33 you'll probably want to add
2009-12-28 pix 34 : #.(set-dispatch-macro-character #\# #\T 'unify::|sharp-T-reader|)
2010-02-07 pix 35 or
09:21:16 ' 36 : (unify:enable-template-reader)
' 37 or
' 38 : (named-readtables:in-readtable unify:template-readtable)
2009-11-23 pix 39 to the top of any file which uses cl-unification's reader templates.
2010-02-07 pix 40 (The latter two currently only work if you have cl-unification from my
09:21:16 ' 41 darcs repo.)
2009-11-23 pix 42
13:13:22 ' 43 Please feel free to submit patches to closure-html and cl-unification
' 44 to fix this problem.
2009-12-20 pix 45 ** Depending Upon in ASDF Systems
08:23:57 ' 46 Before long, the easiest way to manage your dependencies on ASDF
' 47 systems is to define an ASDF system for whatever project you're
' 48 working on. Note that, in addition to depending upon oh-ducks,
' 49 your system must also depend upon whichever library provides your
' 50 desired object model and parser.
' 51
' 52 For example,
2009-12-28 pix 53 : :depends-on (:oh-ducks :closure-html :cxml)
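A fuller sketch of such a system definition follows. The system name =my-scraper= and the file names are hypothetical placeholders, not part of oh-ducks; only the =:depends-on= list is taken from the example above.

#+BEGIN_SRC lisp
;; my-scraper.asd -- hypothetical project; all names are placeholders.
(asdf:defsystem :my-scraper
  :description "Example project that scrapes HTML with oh-ducks."
  ;; oh-ducks itself, plus the parser (closure-html) and the
  ;; DOM object model (cxml) it should work with.
  :depends-on (:oh-ducks :closure-html :cxml)
  :components ((:file "package")
               (:file "scraper" :depends-on ("package"))))
#+END_SRC

If you only work with LHTML or PT, =:cxml= can presumably be left out, since it is listed as optional under Prerequisites.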
2011-07-03 pix 54 ** Differentiating between LHTML lists and XMLS lists
08:25:45 ' 55 While it would, in theory, be possible to inspect lists and determine if they
' 56 are LHTML or XMLS lists, this is not currently done. You can, however, choose
' 57 which type you'd like to work with by pushing =:lists-are-xmls= or
' 58 =:lists-are-lhtml= to =*features*= before loading "Oh, Ducks!".
2009-12-20 pix 59
2011-07-03 pix 60 Unfortunately, this means you can only expect to use one list type in a single
08:25:45 ' 61 Lisp image. Patches to either automagically detect the list type or to provide
' 62 layered functions are welcome.
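As a minimal sketch, selecting XMLS-style lists might look like the following. =pushnew= is used rather than =push= only so that re-evaluating the form does not add the keyword twice; the load form is the one shown under Loading.

#+BEGIN_SRC lisp
;; Choose the list flavour *before* oh-ducks is compiled and loaded.
;; Use :lists-are-lhtml instead to get LHTML-style lists.
(pushnew :lists-are-xmls *features*)

;; Then load the library as usual (requires oh-ducks to be installed):
;; (asdf:oos 'asdf:load-op :oh-ducks)
#+END_SRC

Because the feature is presumably consulted when the library is compiled, switching flavours later likely needs a forced recompile rather than a plain reload.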
2009-11-23 pix 63 * Usage
13:13:22 ' 64 The combination of oh-ducks and closure-html provides an HTML template
' 65 for use with cl-unification, and has the following syntax:
' 66
' 67 (match (#t(html [(:model <model>)]
' 68 <selectors>+)
' 69 <document>)
' 70 &body)
' 71 selectors := (<selector> . <binding>) |
' 72 (<selector> . <template>) |
' 73 (<selector> <selectors>+)
' 74 document := <parsed-document> | <document-to-be-parsed>
' 75
' 76 :model is only necessary for unparsed documents (e.g., a pathname or string).
' 77
' 78 ** Examples
' 79
' 80 (match (#T(html (:model lhtml)
' 81 ("#id" . ?div))
' 82 "<div id=\"id\">I <i>like</i> cheese.</div>")
' 83 (car div)) =>
' 84 (:div ((:id "id")) "I " (:i () "like") " cheese.")
' 85
' 86 (match (#T(html (:model dom)
' 87 ("i" . #t(list ?j ?i))
' 88 ("span>i" . ?span))
2010-02-07 pix 89 "<div>I do <i>not</i> like cheese.</div><div><span>I like <i>cheese</i>.</span></div>")
2009-11-23 pix 90 (values i span)) =>
13:13:22 ' 91 #<ELEMENT i "not">,
' 92 (#<ELEMENT i "cheese">)
' 93
' 94 ** Selectors
2009-12-20 pix 95 The goal is to support all CSS-level-3 selectors. See the section
08:23:57 ' 96 [[*improve selector support][To Do > Improve Selector Support]] for a list of currently unsupported
' 97 simple selectors and combinators.
2009-11-23 pix 98
13:13:22 ' 99 Each selector should result in the same elements which would be
' 100 affected by the same CSS selector. That is,
2009-12-13 pix 101 #id => elements with id of "id"
2009-11-23 pix 102 .foo.bar => elements with both "foo" and "bar" classes
13:13:22 ' 103 div => all <div>s
' 104 and so forth.
' 105
2009-12-13 pix 106 NOTE: selectors are currently bound in parallel. That is, given
05:32:46 ' 107 #t(html (<selector-1> ...)
' 108 (<selector-2> ...))
' 109 selector-1 and selector-2 do not interact. If they are both "foo", they'll
' 110 return identical results. I often find myself wanting to also say something
' 111 like:
' 112 #t(html (<selector-1> ...)
' 113 (<element-after-selector-1> ...))
' 114 Ideas for a syntax to distinguish between the two cases are welcome:
' 115 (:mode parallel) vs. (:mode sequential), perhaps? (Or even adjacent, sibling?)
' 116
2009-11-23 pix 117 *** Limitations
13:13:22 ' 118
' 119 Currently, selector terms are limited to alphanumeric characters, and
' 120 do not support CSS-style character escapes. Patches welcome!
' 121
' 122 ** Included Object Models
' 123 *** LHTML (closure-html)
' 124 A list-based structure provided by closure-html. Cannot be used by
' 125 selectors which require asking about parent or sibling objects.
' 126 *** PT (closure-html)
' 127 A structure-based parse tree provided by closure-html.
' 128 *** DOM (cxml)
' 129 DOM objects as provided by cxml and defined by the W3C.
' 130 * Extending
' 131 ** Adding an object model
' 132 While the supported models should generally be sufficient, you can add
' 133 your own fairly easily. All models are expected to implement the
' 134 generic functions in <traversal/interface.lisp>. See the other files
' 135 under the traversal/ directory for examples.
' 136
' 137 You might also want to see chtml.lisp and cxml.lisp.
' 138 ** Adding a selector or combinator
' 139 See <selectors.lisp>. Generally, you should add a class which is a
' 140 subclass of combinator or simple-selector, augment parse-selector with
' 141 an appropriate regular expression, and define a method on
2010-01-04 pix 142 subject-p.
2009-11-23 pix 143
13:13:22 ' 144 I also recommend submitting a patch. Other people might want to use
' 145 that selector, too!
2009-11-18 pix 146 * To Do
2010-01-04 pix 147 ** working lhtml/xmls support [2/2]
2009-11-21 pix 148 * [X] non-descendant cases (class, id, etc.)
2010-01-04 pix 149 * [X] selectors involving descendants
07:06:50 ' 150 CAUTION: Won't produce sane results if the document tree is
' 151 modified or you use nested (match)es.
2009-11-18 pix 152 ** write documentation
10:23:22 ' 153 ** improve selector support
2010-01-04 pix 154 *** positional selectors [11/11]
06:36:34 ' 155 * [X] :nth-child
' 156 * [X] :nth-last-child
2009-11-23 pix 157 * [X] :first-child
2010-01-04 pix 158 * [X] :last-child
2010-01-04 pix 159 * [X] :nth-of-type
06:36:34 ' 160 * [X] :nth-last-of-type
2010-01-04 pix 161 * [X] :first-of-type
06:32:07 ' 162 * [X] :last-of-type
2010-01-04 pix 163 * [X] :only-child
2010-01-04 pix 164 * [X] :only-of-type
2010-01-04 pix 165 * [X] :empty
2010-02-10 pix 166 *** attribute selectors [2/7]
2010-02-10 pix 167 * [X] attribute-present [att]
2010-02-10 pix 168 * [X] attribute-equal [att=val]
2009-11-19 pix 169 * [ ] attribute-member [att~=val]
06:25:36 ' 170 * [ ] attribute-lang [att|=val]
' 171 * [ ] attribute-begins [att^=val]
' 172 * [ ] attribute-ends [att$=val]
' 173 * [ ] attribute-contains [att*=val]
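For reference, the five unchecked attribute selectors above have the following meanings in the CSS level 3 Selectors specification. The sketch below expresses them as plain string predicates over an attribute value. All function names are hypothetical and none of this is oh-ducks code; a real implementation would instead add selector classes and =subject-p= methods as described under Extending.

#+BEGIN_SRC lisp
;; Hypothetical helpers illustrating CSS3 attribute-selector semantics.
(defun split-on-whitespace (string)
  "Return the whitespace-separated words of STRING as a list."
  (let ((words '())
        (start nil))
    (dotimes (i (length string))
      (if (member (char string i) '(#\Space #\Tab #\Newline #\Return #\Page))
          (when start
            (push (subseq string start i) words)
            (setf start nil))
          (unless start
            (setf start i))))
    (when start
      (push (subseq string start) words))
    (nreverse words)))

(defun attribute-member-p (value word)        ; [att~=val]
  "True if WORD is one of the whitespace-separated words in VALUE."
  (and (plusp (length word))
       (member word (split-on-whitespace value) :test #'string=)
       t))

(defun attribute-begins-p (value prefix)      ; [att^=val]
  "True if VALUE starts with the non-empty string PREFIX."
  (and (plusp (length prefix))
       (<= (length prefix) (length value))
       (string= prefix value :end2 (length prefix))))

(defun attribute-ends-p (value suffix)        ; [att$=val]
  "True if VALUE ends with the non-empty string SUFFIX."
  (and (plusp (length suffix))
       (<= (length suffix) (length value))
       (string= suffix value :start2 (- (length value) (length suffix)))))

(defun attribute-contains-p (value substring) ; [att*=val]
  "True if the non-empty string SUBSTRING occurs anywhere in VALUE."
  (and (plusp (length substring))
       (search substring value)
       t))

(defun attribute-lang-p (value lang)          ; [att|=val]
  "True if VALUE is exactly LANG, or starts with LANG followed by a hyphen."
  (or (string= value lang)
      (attribute-begins-p value (concatenate 'string lang "-"))))
#+END_SRC

Note that an empty =val= never matches for =~==, =^==, =$== or =*==, which is why each of those predicates checks the length first.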
' 174 *** :not(...)
2009-11-18 pix 175 *** any others?
2009-11-19 pix 176 ** namespace support(?)
2009-11-23 pix 177 ** Submit patch to cl-unification to add (enable/disable-template-reader) functions
2011-06-05 pix 178 Submitted. Was it ever accepted? Man, I don't remember.
2009-11-23 pix 179 ** Submit patch to closure-html to add (enable/disable-reader) functions
13:13:22 ' 180 ** non-css templates (e.g., for matching on text of element)?
2009-12-13 pix 181 Maybe special-case string/regexp-templates, so for example
2009-12-28 pix 182 : #t(html ("div" (#t(regexp "f(o+)bar") . ?div)))
2009-12-13 pix 183 would match [<div>foooobar</div>]?
05:32:46 ' 184
2009-12-28 pix 185 : #t(html ("div" . #t(regexp "f(o+)bar" (?o))))
2009-12-13 pix 186 might cause some difficulty, however--we should get a list of matched elements
05:32:46 ' 187 for the div selector, but the regexp variable (?o) can only match once (without
' 188 some wacky environment merging, anyway).
2011-06-05 pix 189 ** Element structure templates
21:44:21 ' 190 For instance, sometimes it'd be nice to stuff the value of an attribute into a
' 191 variable, like so:
' 192 : (match (#t(attr ("href" ?href) ("name" ?name)) "<a href='url' name='link'></a>")
' 193 : (values href name)) =>
' 194 : "url", "link"
' 195 While it's certainly easy enough to do that using, say, XMLS-style lists, a
' 196 general object-model-agnostic method would seem to be preferable.
' 197 ** Layered functions so LHTML vs. XMLS support can be switched at runtime