Add support for XMLS-style lists, conflicting with LHTML-style lists
Annotate for file /notes
#-*-mode: org;-*-
* Purpose
"Oh, Ducks!" is an extension to cl-unification to make parsing
structured documents easy, using CSS selectors.
* Installation
** Prerequisites
   + cl-unification
   + cl-ppcre
   + split-sequence
   + alexandria
   + asdf-system-connections
   * closure-html
   * cxml
   * named-readtables
   [+] Mandatory [*] Optional
** Loading
Loading "Oh, Ducks!" is just like loading any other ASDF system.
However, because it does not mandate a particular HTML or XML parser,
it does not generally become useful until you have also loaded an
HTML/XML parsing library such as cxml or closure-html.

Start with:
: (asdf:oos 'asdf:load-op :oh-ducks)
If you would like to use the built-in support for parsing via
closure-html (which you almost certainly do), you'll also want to load
closure-html:
: (asdf:oos 'asdf:load-op :closure-html)
And, if you want to use DOM objects provided by cxml:
: (asdf:oos 'asdf:load-op :cxml)
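On ASDF 2 and later, =asdf:load-system= is the usual shorthand for the
=asdf:oos= calls above. A minimal sketch, assuming all three systems are
already visible to ASDF:

#+begin_src lisp
;; Load "Oh, Ducks!" first, then both optional parsing libraries,
;; in the same order as the individual commands above.
(dolist (system '(:oh-ducks :closure-html :cxml))
  (asdf:load-system system))
#+end_src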
** Load-order Caveats
closure-html and cl-unification each define competing readers on #t.
To avoid load-order issues resulting in an indeterminate reader on #t,
you'll probably want to add
: #.(set-dispatch-macro-character #\# #\T 'unify::|sharp-T-reader|)
or
: (unify:enable-template-reader)
or
: (named-readtables:in-readtable unify:template-readtable)
to the top of any file which uses cl-unification's reader templates.
(The latter two currently only work if you have cl-unification from my
darcs repo.)

Please feel free to submit patches to closure-html and cl-unification
to fix this problem.
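If you would rather not change the global readtable, one possible variation
on the first form is to switch to a copy of the readtable before installing
the reader. This is only a sketch: the package name =my-scraper= is
hypothetical, and it relies on the standard behaviour that =load= and
=compile-file= rebind =*readtable*= for the duration of a file.

#+begin_src lisp
(in-package :my-scraper) ; hypothetical package name

(eval-when (:compile-toplevel :load-toplevel :execute)
  ;; LOAD and COMPILE-FILE rebind *READTABLE*, so assigning a fresh copy
  ;; here is undone when processing of this file ends.
  (setf *readtable* (copy-readtable))
  ;; Install cl-unification's #T reader in the copy only.
  (set-dispatch-macro-character #\# #\T 'unify::|sharp-T-reader|))
#+end_src

Code in other files then keeps whichever #T reader was installed globally.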
** Depending Upon in ASDF Systems
Before long, the easiest way to manage your dependencies upon ASDF
systems is to define an ASDF system for whatever project you're
currently engaged in. It's important to note that, in addition to
depending upon oh-ducks, you'll also want to depend upon whichever
library provides your desired object model and parser.

For example,
: :depends-on (:oh-ducks :closure-html :cxml)
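In context, a complete system definition might look like the following
sketch. The system name, file names, and description are hypothetical; only
the =:depends-on= list comes from the example above.

#+begin_src lisp
;;;; my-scraper.asd -- hypothetical system and component names
(asdf:defsystem :my-scraper
  :description "Scrapes HTML using Oh, Ducks! selectors."
  :depends-on (:oh-ducks      ; the selector templates
               :closure-html  ; HTML parser; LHTML and PT object models
               :cxml)         ; DOM object model
  :components ((:file "package")
               (:file "scraper" :depends-on ("package"))))
#+end_src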
** Differentiating between LHTML lists and XMLS lists
While it would, in theory, be possible to inspect lists and determine if they
are LHTML or XMLS lists, this is not currently done. You can, however, choose
which type you'd like to work with by pushing =:lists-are-xmls= or
=:lists-are-lhtml= to =*features*= before loading "Oh, Ducks!".

Unfortunately, this means you can only expect to use one list type in a single
lisp image. Patches to either automagically detect the list type, or to provide
layered functions are welcome.
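A minimal sketch of selecting the list type, for instance from an init file
or a build script:

#+begin_src lisp
;; Choose the list flavour first: the feature must be present
;; before "Oh, Ducks!" is loaded.
(pushnew :lists-are-xmls *features*)   ; or :lists-are-lhtml
(asdf:load-system :oh-ducks)
#+end_src

If oh-ducks was previously compiled with the other feature, a forced rebuild
such as =(asdf:load-system :oh-ducks :force t)= may be needed. That is an
assumption about how the feature is consumed, not something stated above.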
* Usage
The combination of oh-ducks and closure-html provides an HTML template
for use with cl-unification, and has the following syntax:

  (match (#t(html [(:model <model>)]
             <selectors>+)
          <document>)
    &body)
  selectors := (<selector> . <binding>) |
               (<selector> . <template>) |
               (<selector> <selectors>+)
  document := <parsed-document> | <document-to-be-parsed>

:model is only necessary for unparsed documents (e.g., a pathname or string).
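As an illustration of the =<parsed-document>= case, the sketch below parses
the document up front and omits =:model=. It assumes that closure-html and
cxml are both loaded, that cl-unification's #T reader is active, and that
=match= and =html= are accessible from the current package.

#+begin_src lisp
;; Parse into cxml DOM objects first, then match against the result.
(let ((document (chtml:parse "<div><span>I like <i>cheese</i>.</span></div>"
                             (cxml-dom:make-dom-builder))))
  (match (#T(html ("span>i" . ?emphasized)) ; no :model for a parsed document
          document)
    ;; ?emphasized is referred to as EMPHASIZED in the body.
    (length emphasized)))
#+end_src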

** Examples

  (match (#T(html (:model lhtml)
             ("#id" . ?div))
          "<div id=\"id\">I <i>like</i> cheese.</div>")
    (car div)) =>
  (:div ((:id "id")) "I " (:i () "like") " cheese.")

  (match (#T(html (:model dom)
             ("i" . #t(list ?j ?i))
             ("span>i" . ?span))
          "<div>I do <i>not</i> like cheese.</div><div><span>I like <i>cheese</i>.</span></div>")
    (values i span)) =>
  #<ELEMENT i "not">,
  (#<ELEMENT i "cheese">)

** Selectors
The goal is to support all CSS-level-3 selectors. See the section
[[*improve selector support][To Do > Improve Selector Support]] for a list of currently unsupported
simple selectors and combinators.

Each selector should result in the same elements which would be
affected by the same CSS selector. That is,
  #id      => elements with id of "id"
  .foo.bar => elements with both "foo" and "bar" classes
  div      => all <div>s
and so forth.
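
As a rough illustration of those rules, the sketch below binds two selectors
against a small, made-up document using the LHTML model. If the selectors
behave like their CSS counterparts, =both= should hold only elements carrying
both classes, and =divs= every <div>. The result has not been verified, so
none is shown.

#+begin_src lisp
(match (#T(html (:model lhtml)
            (".foo.bar" . ?both)   ; elements with both classes
            ("div" . ?divs))       ; all <div>s
        "<div class=\"foo bar\">a</div><div class=\"foo\">b</div><p class=\"bar\">c</p>")
  ;; Each variable is bound to a list of matching elements.
  (values (length both) (length divs)))
#+end_src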

NOTE: selectors are currently bound in parallel. That is, given
  #t(html (<selector-1> ...)
          (<selector-2> ...))
selector-1 and selector-2 do not interact. If they are both "foo", they'll
return identical results. I often find myself wanting to also say something
like:
  #t(html (<selector-1> ...)
          (<element-after-selector-1> ...))
Ideas for a syntax to distinguish between the two cases are welcome:
(:mode parallel) vs. (:mode sequential), perhaps? (Or even adjacent, sibling?)

*** Limitations

Currently, selector terms are limited to alphanumeric characters, and
do not support CSS-style character escapes. Patches welcome!

** Included Object Models
*** LHTML (closure-html)
A list-based structure provided by closure-html. Cannot be used by
selectors which require asking about parent or sibling objects.
*** PT (closure-html)
A parse tree built from structure objects, provided by closure-html.
*** DOM (cxml)
DOM objects as provided by cxml and defined by the W3C.
* Extending
** Adding an object model
While the supported models should generally be sufficient, you can add
your own fairly easily. All models are expected to implement the
generic functions in <traversal/interface.lisp>. See the other files
under the traversal/ directory for examples.

You might also want to see chtml.lisp and cxml.lisp.
** Adding a selector or combinator
See <selectors.lisp>. Generally, you should add a class which is a
subclass of combinator or simple-selector, augment parse-selector with
an appropriate regular expression, and define a method on
subject-p.

I also recommend submitting a patch. Other people might want to use
that selector, too!
* To Do
** working lhtml/xmls support [2/2]
   * [X] non-descendant cases (class, id, etc.)
   * [X] selectors involving descendants
     CAUTION: Won't produce sane results if the document tree is
     modified or you use nested (match)es.
** write documentation
** improve selector support
*** positional selectors [11/11]
    * [X] :nth-child
    * [X] :nth-last-child
    * [X] :first-child
    * [X] :last-child
    * [X] :nth-of-type
    * [X] :nth-last-of-type
    * [X] :first-of-type
    * [X] :last-of-type
    * [X] :only-child
    * [X] :only-of-type
    * [X] :empty
*** attribute selectors [2/7]
    * [X] attribute-present [att]
    * [X] attribute-equal [att=val]
    * [ ] attribute-member [att~=val]
    * [ ] attribute-lang [att|=val]
    * [ ] attribute-begins [att^=val]
    * [ ] attribute-ends [att$=val]
    * [ ] attribute-contains [att*=val]
*** :not(...)
*** any others?
** namespace support(?)
** Submit patch to cl-unification to add (enable/disable-template-reader) functions
Submitted. Was it ever accepted? Man, I don't remember.
** Submit patch to closure-html to add (enable/disable-reader) functions
** non-css templates (e.g., for matching on text of element)?
Maybe special-case string/regexp-templates, so for example
: #t(html ("div" (#t(regexp "f(o+)bar") . ?div)))
would match [<div>foooobar</div>]?

: #t(html ("div" . #t(regexp "f(o+)bar" (?o))))
might cause some difficulty, however--we should get a list of matched elements
for the div selector, but the regexp variable (?o) can only match once (without
some wacky environment merging, anyway).
** Element structure templates
For instance, sometimes it'd be nice to stuff the value of an attribute into a
variable, like so:
: (match (#t(attr ("href" ?href) ("name" ?name)) "<a href='url' name='link'></a>")
:   (values href name)) =>
: "url", "link"
While it's certainly easy enough to do that using, say, XMLS-style lists, a
general object-model-agnostic method would seem to be preferable.
** Layered functions so LHTML vs. XMLS support can be switched at runtime