Sun Jul 3 08:25:45 UTC 2011 pix@kepibu.org
* Add support for XMLS-style lists, conflicting with LHTML-style lists
Sun Jul 3 07:55:18 UTC 2011 pix@kepibu.org
* Minimal support for attribute-starts-with selector
Sun Jun 5 21:44:21 UTC 2011 pix@kepibu.org
* Update notes file
Tue Apr 5 00:14:51 UTC 2011 pix@kepibu.org
* depend-on cl-unification-lib to work with stock cl-unification
Wed Feb 10 08:50:16 UTC 2010 pix@kepibu.org
* Add attribute-equal selector
Wed Feb 10 08:28:34 UTC 2010 pix@kepibu.org
* Add attribute-present selector
Wed Feb 10 08:27:56 UTC 2010 pix@kepibu.org
* Serialize returned tags so it's easier to see what was returned
Wed Feb 10 08:26:34 UTC 2010 pix@kepibu.org
* Formatting.
Wed Feb 10 08:26:25 UTC 2010 pix@kepibu.org
* Use named-readtables instead of set-dispatch-macro-character
Wed Feb 10 08:20:45 UTC 2010 pix@kepibu.org
* Return NIL if attribute was not present
Sun Feb 7 09:21:16 UTC 2010 pix@kepibu.org
* Update notes to reflect updates to cl-unification.
Mon Jan 4 07:11:36 UTC 2010 pix@kepibu.org
* element-parent now works in lhtml
Mon Jan 4 07:06:50 UTC 2010 pix@kepibu.org
* Support for asking about ancestors under lhtml
Mon Jan 4 06:58:51 UTC 2010 pix@kepibu.org
* Don't need &allow-other-keys here
Mon Jan 4 06:36:34 UTC 2010 pix@kepibu.org
* Don't count an+b|b|odd|even as separate items
Mon Jan 4 06:32:27 UTC 2010 pix@kepibu.org
* :empty selector
Mon Jan 4 06:32:07 UTC 2010 pix@kepibu.org
* Add *of-type selectors
Mon Jan 4 05:59:48 UTC 2010 pix@kepibu.org
* "lispier" regexps, l*last-child stuff
Probably against best practices to commit monolithic patches, but this
is still an unreleased library, so I don't care.
Not really sure I care for the sexp-based regexps, but they do make it
easy to use the same regexp bits across several places, and I don't
have a lexer/parser handy, so they'll have to do for now.
Mon Jan 4 01:07:02 UTC 2010 pix@kepibu.org
* subject-p makes more sense as (selector, element)
For future reference, I used the following code to do this automatically, plus a
few minor manual edits (e.g., swapping rcurry and curry):
(defun seek-forward (term)
  (let ((p (search-forward term nil t)))
    (when p
      (goto-char p))))
(defun swap-args ()
  (interactive)
  (save-excursion
    (while (seek-forward "defmethod subject-p (")
      (forward-sexp)
      (transpose-sexps 1)))
  (save-excursion
    (while (seek-forward "(subject-p")
      (forward-sexp)
      (transpose-sexps 1))))
Mon Jan 4 01:04:12 UTC 2010 pix@kepibu.org
* Bring element-matches-p more in line with CSS terms as subject-p
Mon Jan 4 01:03:10 UTC 2010 pix@kepibu.org
* Make subjects-of use subjects-in-list
Mon Jan 4 00:11:25 UTC 2010 pix@kepibu.org
* Rename some functions to better match CSS terminology
Sat Jan 2 09:45:37 UTC 2010 pix@kepibu.org
* Add fixme
Sat Jan 2 08:38:38 UTC 2010 pix@kepibu.org
* &allow-other-keys is not actually necessary
Fri Jan 1 05:06:19 UTC 2010 pix@kepibu.org
* Patch went in to cl-unification, so no longer need warning
Mon Dec 28 10:00:30 UTC 2009 pix@kepibu.org
* Another nth-last-child
Mon Dec 28 09:59:18 UTC 2009 pix@kepibu.org
* Minor syntactic changes
To make more modern org-modes happy. Woo.
Sun Dec 20 08:23:57 UTC 2009 pix@kepibu.org
* Update notes
Sun Dec 13 07:28:56 UTC 2009 pix@kepibu.org
* Export xml, too
Sun Dec 13 05:32:46 UTC 2009 pix@kepibu.org
* Add some notes
Sun Dec 13 05:24:52 UTC 2009 pix@kepibu.org
* Add element-content as a prereq to matching on an element's textual content
Sun Dec 13 05:23:23 UTC 2009 pix@kepibu.org
* Ugly unbreaking of lhtml--man I hate this bit
Sat Dec 5 07:23:38 UTC 2009 pix@kepibu.org
* Better method to do this in
Sat Dec 5 07:18:05 UTC 2009 pix@kepibu.org
* implicit-element is a better name than root
Also add a bit of support for sibling combinators when dealing with the
implicit element, and note a problem that crops up when dealing with
selections on a non-root element (should a simple-selector select the
element, or is there an implicit descendant combinator?).
Fri Dec 4 05:16:28 UTC 2009 pix@kepibu.org
* Fix an odd clisp compile issue
Fri Dec 4 04:47:58 UTC 2009 pix@kepibu.org
* Make descendant combinators work with an implicit parent
Thu Dec 3 03:26:59 UTC 2009 pix@kepibu.org
* declare ignored variables
Thu Dec 3 02:41:36 UTC 2009 pix@kepibu.org
* Better messages
Thu Dec 3 00:12:02 UTC 2009 pix@kepibu.org
* Add sibling and adjacent combinators
Thu Dec 3 00:07:44 UTC 2009 pix@kepibu.org
* Fix copy-paste issue.
Mon Nov 30 05:04:24 UTC 2009 pix@kepibu.org
* Record idea
Mon Nov 30 05:04:06 UTC 2009 pix@kepibu.org
* Work for spaces between [+-] and B
Mon Nov 30 04:48:22 UTC 2009 pix@kepibu.org
* Combine nth-child variants
Mon Nov 30 04:10:09 UTC 2009 pix@kepibu.org
* Fix bug in element-children for sgml:pt model
Mon Nov 30 04:09:10 UTC 2009 pix@kepibu.org
* Add :nth-child selector
Mon Nov 23 13:25:26 UTC 2009 pix@kepibu.org
* Whoops
Mon Nov 23 13:19:59 UTC 2009 pix@kepibu.org
* Fail unification if no match for a selector
Mon Nov 23 13:14:45 UTC 2009 pix@kepibu.org
* add FIXME test
Mon Nov 23 13:14:00 UTC 2009 pix@kepibu.org
* Error when unable to parse CSS selector
Mon Nov 23 13:13:22 UTC 2009 pix@kepibu.org
* Start documentation
Mon Nov 23 13:02:12 UTC 2009 pix@kepibu.org
* Import when-let*, too
Mon Nov 23 13:01:50 UTC 2009 pix@kepibu.org
* Indentation
Mon Nov 23 11:54:01 UTC 2009 pix@kepibu.org
* Cut out a few warnings from cl-unification
Mon Nov 23 11:38:12 UTC 2009 pix@kepibu.org
* No longer needed
Mon Nov 23 11:36:20 UTC 2009 pix@kepibu.org
* Don't return a dom:document as parent
Mon Nov 23 11:33:15 UTC 2009 pix@kepibu.org
* :first-child and :nth-child(n) selectors
Mon Nov 23 10:24:02 UTC 2009 pix@kepibu.org
* Status commit; fix unification
Sat Nov 21 18:31:09 UTC 2009 pix@kepibu.org
* Tired, probably doing stupid things
Sat Nov 21 16:12:13 UTC 2009 pix@kepibu.org
* Status commit
Fri Nov 20 13:09:18 UTC 2009 pix@kepibu.org
* Add fixme
Thu Nov 19 06:25:36 UTC 2009 pix@kepibu.org
* Moar CSS selectors, fewer explicit lambdas
Wed Nov 18 10:25:48 UTC 2009 pix@kepibu.org
* Try to set a sensible default for *default-parser*
Wed Nov 18 10:23:22 UTC 2009 pix@kepibu.org
* Add notes file
Wed Nov 18 10:23:05 UTC 2009 pix@kepibu.org
* Status commit; split to avoid absolute dependency on cxml and closure-html
Wed Nov 18 08:57:44 UTC 2009 pix@kepibu.org
* status commit; add cxml:dom support
diff -rN -u old-Oh, Ducks!/chtml.lisp new-Oh, Ducks!/chtml.lisp
--- old-Oh, Ducks!/chtml.lisp 1970-01-01 00:00:00.000000000 +0000
+++ new-Oh, Ducks!/chtml.lisp 2013-07-29 07:25:25.000000000 +0000
@@ -0,0 +1,24 @@
+(in-package #:oh-ducks)
+
+;; avoid conflicting with 'sgml:pt
+(eval-when (:compile-toplevel :load-toplevel :execute)
+ (import 'closure-html:pt))
+(eval-when (:compile-toplevel :load-toplevel :execute)
+ (export 'pt)
+ (export 'lhtml))
+
+(defclass html-template (css-selector-template) ())
+
+(add-handler 'pt 'chtml:make-pt-builder)
+(add-handler 'lhtml 'chtml:make-lhtml-builder)
+
+(unless *default-parser*
+ (setf *default-parser* (rcurry #'chtml:parse (get-handler-for-model 'pt))))
+
+(defmethod make-template ((kind (eql 'html)) (spec cons))
+ (destructuring-bind (&key parser model)
+ (append (when (%spec-includes-opts spec) (second spec))
+ (list :model 'pt))
+ (make-instance 'html-template
+ :parser (or parser (rcurry #'chtml:parse (get-handler-for-model model)))
+ :spec spec)))
diff -rN -u old-Oh, Ducks!/cxml.lisp new-Oh, Ducks!/cxml.lisp
--- old-Oh, Ducks!/cxml.lisp 1970-01-01 00:00:00.000000000 +0000
+++ new-Oh, Ducks!/cxml.lisp 2013-07-29 07:25:25.000000000 +0000
@@ -0,0 +1,20 @@
+(in-package #:oh-ducks)
+
+(eval-when (:compile-toplevel :load-toplevel :execute)
+ (export 'dom)
+ (export 'xml))
+
+(defclass xml-template (css-selector-template) ())
+
+(add-handler 'dom 'cxml-dom:make-dom-builder)
+
+(unless *default-parser*
+ (setf *default-parser* (rcurry #'cxml:parse (get-handler-for-model 'dom))))
+
+(defmethod make-template ((kind (eql 'xml)) (spec cons))
+ (destructuring-bind (&key parser model)
+ (append (when (%spec-includes-opts spec) (second spec))
+ (list :model 'dom))
+ (make-instance 'xml-template
+ :parser (or parser (rcurry #'cxml:parse (get-handler-for-model model)))
+ :spec spec)))
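Note on the make-template methods in chtml.lisp and cxml.lisp above: both append the default (:model ...) pair after the user-supplied options instead of putting it first. This appears to rely on the standard Common Lisp rule that the leftmost occurrence of a repeated keyword wins. A minimal sketch of that idiom in plain Common Lisp follows; resolve-options is a hypothetical name used only for illustration and is not part of oh-ducks.

```lisp
;; Hypothetical helper, not part of oh-ducks: merges user options with a
;; default the same way the make-template methods appear to.
(defun resolve-options (user-options)
  (destructuring-bind (&key parser model)
      ;; The default goes last. For a repeated keyword the leftmost pair
      ;; is used, so a user-supplied :model takes precedence.
      (append user-options (list :model 'pt))
    (list :parser parser :model model)))

(resolve-options '())               ; no user options: MODEL falls back to PT
(resolve-options '(:model lhtml))   ; user-supplied LHTML overrides the default
```

Appending the default keeps the merge to a single expression, at the cost of leaving the shadowed duplicate pair in the list.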
diff -rN -u old-Oh, Ducks!/notes new-Oh, Ducks!/notes
--- old-Oh, Ducks!/notes 1970-01-01 00:00:00.000000000 +0000
+++ new-Oh, Ducks!/notes 2013-07-29 07:25:25.000000000 +0000
@@ -0,0 +1,197 @@
+#-*-mode: org;-*-
+* Purpose
+"Oh, Ducks!" is an extension to cl-unification to make parsing
+structured documents easy, using CSS selectors.
+* Installation
+** Prerequisites
+ + cl-unification
+ + cl-ppcre
+ + split-sequence
+ + alexandria
+ + asdf-system-connections
+ * closure-html
+ * cxml
+ * named-readtables
+[+] Mandatory [*] Optional
+** Loading
+Loading "Oh, Ducks!" is just like loading any other ASDF system.
+However, because it does not mandate a particular HTML or XML parser,
+it does not generally become useful until you have also loaded an
+HTML/XML parsing library such as cxml or closure-html.
+
+Start with:
+: (asdf:oos 'asdf:load-op :oh-ducks)
+If you would like to use the built-in support for parsing via
+closure-html (which you almost certainly do), you'll also want to load
+closure-html:
+: (asdf:oos 'asdf:load-op :closure-html)
+And, if you want to use DOM objects provided by cxml:
+: (asdf:oos 'asdf:load-op :cxml)
+** Load-order Caveats
+closure-html and cl-unification each define competing readers on #t.
+To avoid load-order issues resulting in an indeterminate reader on #t,
+you'll probably want to add
+: #.(set-dispatch-macro-character #\# #\T 'unify::|sharp-T-reader|)
+or
+: (unify:enable-template-reader)
+or
+: (named-readtables:in-readtable unify:template-readtable)
+to the top of any file which uses cl-unification's reader templates.
+(The latter two currently only work if you have cl-unification from my
+darcs repo.)
+
+Please feel free to submit patches to closure-html and cl-unification
+to fix this problem.
+** Depending Upon in ASDF Systems
+It doesn't take long before managing your dependencies upon ASDF
+systems becomes easiest by creating an ASDF system for whatever
+project you're currently engaged in. It's important to note that, in
+addition to depending upon oh-ducks, you'll also want to depend upon
+whichever library provides your desired object model and parser.
+
+For example,
+: :depends-on (:oh-ducks :closure-html :cxml)
+** Differentiating between LHTML lists and XMLS lists
+While it would, in theory, be possible to inspect lists and determine if they
+are LHTML or XMLS lists, this is not currently done. You can, however, choose
+which type you'd like to work with by pushing =:lists-are-xmls= or
+=:lists-are-lhtml= to =*features*= before loading "Oh, Ducks!".
+
+Unfortunately, this means you can only expect to use one list type in a single
+lisp image. Patches to either automagically detect the list type, or to provide
+layered functions are welcome.
+* Usage
+The combination of oh-ducks and closure-html provides an HTML template
+for use with cl-unification, and has the following syntax:
+
+ (match (#t(html [(:model <model>)]
+ <selectors>+)
+ <document>)
+ &body)
+ selectors := (<selector> . <binding>) |
+ (<selector> . <template>) |
+ (<selector> <selectors>+)
+ document := <parsed-document> | <document-to-be-parsed>
+
+:model is only necessary for unparsed documents (e.g., a pathname or string).
+
+** Examples
+
+(match (#T(html (:model lhtml)
+ ("#id" . ?div))
+ "<div id=\"id\">I <i>like</i> cheese.</div>")
+ (car div)) =>
+ (:div ((:id "id")) "I " (:i () "like") " cheese.")
+
+(match (#T(html (:model dom)
+ ("i" . #t(list ?j ?i))
+ ("span>i" . ?span))
+ "<div>I do <i>not</i> like cheese.</div><div><span>I like <i>cheese</i>.</span></div>")
+ (values i span)) =>
+ #<ELEMENT i "not">,
+ (#<ELEMENT i "cheese">)
+
+** Selectors
+The goal is to support all CSS-level-3 selectors. See the section
+[[*improve selector support][To Do > Improve Selector Support]] for a list of currently unsupported
+simple selectors and combinators.
+
+Each selector should result in the same elements which would be
+affected by the same CSS selector. That is,
+ #id => elements with id of "id"
+ .foo.bar => elements with both "foo" and "bar" classes
+ div => all <div>s
+and so forth.
+
+NOTE: selectors are currently bound in parallel. That is, given
+ #t(html (<selector-1> ...)
+ (<selector-2> ...))
+selector-1 and selector-2 do not interact. If they are both "foo", they'll
+return identical results. I often find myself wanting to also say something
+like:
+ #t(html (<selector-1> ...)
+ (<element-after-selector-1> ...))
+Ideas for a syntax to distinguish between the two cases are welcome: (:mode
+parallel) vs. (:mode sequential), perhaps? (Or even adjacent, sibling?)
+
+*** Limitations
+
+Currently, selector terms are limited to alphanumeric characters, and
+do not support CSS-style character escapes. Patches welcome!
+
+** Included Object Models
+*** LHTML (closure-html)
+A list-based structure provided by closure-html. Cannot be used by
+selectors which require asking about parent or sibling objects.
+*** PT (closure-html)
+A tree of structure objects provided by closure-html.
+*** DOM (cxml)
+DOM objects as provided by cxml and defined by the W3C.
+* Extending
+** Adding an object model
+While the supported models should generally be sufficient, you can add
+your own fairly easily. All models are expected to implement the
+generic functions in <traversal/interface.lisp>. See the other files
+under the traversal/ directory for examples.
+
+You might also want to see chtml.lisp and cxml.lisp.
+** Adding a selector or combinator
+See <selectors.lisp>. Generally, you should add a class which is a
+subclass of combinator or simple-selector, augment parse-selector with
+an appropriate regular expression, and define a method on
+subject-p.
+
+I also recommend submitting a patch. Other people might want to use
+that selector, too!
+* To Do
+** working lhtml/xmls support [2/2]
+ * [X] non-descendant cases (class, id, etc.)
+ * [X] selectors involving descendants
+ CAUTION: Won't produce sane results if the document tree is
+ modified or you use nested (match)es.
+** write documentation
+** improve selector support
+*** positional selectors [11/11]
+ * [X] :nth-child
+ * [X] :nth-last-child
+ * [X] :first-child
+ * [X] :last-child
+ * [X] :nth-of-type
+ * [X] :nth-last-of-type
+ * [X] :first-of-type
+ * [X] :last-of-type
+ * [X] :only-child
+ * [X] :only-of-type
+ * [X] :empty
+*** attribute selectors [2/7]
+ * [X] attribute-present [att]
+ * [X] attribute-equal [att=val]
+ * [ ] attribute-member [att~=val]
+ * [ ] attribute-lang [att|=val]
+ * [ ] attribute-begins [att^=val]
+ * [ ] attribute-ends [att$=val]
+ * [ ] attribute-contains [att*=val]
+*** :not(...)
+*** any others?
+** namespace support(?)
+** Submit patch to cl-unification to add (enable/disable-template-reader) functions
+Submitted. Was it ever accepted? Man, I don't remember.
+** Submit patch to closure-html to add (enable/disable-reader) functions
+** non-css templates (e.g., for matching on text of element)?
+Maybe special-case string/regexp-templates, so for example
+: #t(html ("div" (#t(regexp "f(o+)bar") . ?div)))
+would match [<div>foooobar</div>]?
+
+: #t(html ("div" . #t(regexp "f(o+)bar" (?o))))
+might cause some difficulty, however--we should get a list of matched elements
+for the div selector, but the regexp variable (?o) can only match once (without
+some wacky environment merging, anyway).
+** Element structure templates
+For instance, sometimes it'd be nice to stuff the value of an attribute into a
+variable, like so:
+: (match #t(attr ("href" ?href) ("name" ?name)) "<a href='url' name='link'></a>"
+: (values href name)) =>
+: "url", "link"
+While it's certainly easy enough to do that using, say, XMLS-style lists, a
+general object-model-agnostic method would seem to be preferable.
+** Layered functions so LHTML vs. XMLS support can be switched at runtime
diff -rN -u old-Oh, Ducks!/oh-ducks.asd new-Oh, Ducks!/oh-ducks.asd
--- old-Oh, Ducks!/oh-ducks.asd 2013-07-29 07:25:25.000000000 +0000
+++ new-Oh, Ducks!/oh-ducks.asd 2013-07-29 07:25:25.000000000 +0000
@@ -1,26 +1,48 @@
+#+(or fixme todo)
+(cerror "Continue anyway."
+ "The author of \"Oh, ducks!\" tends to use #+FIXME and #+TODO to ~
+ mark things as being in-progress. At least one of these exists ~
+ in *features*, which may cause unusual behavior.")
+
+(eval-when (:compile-toplevel :load-toplevel :execute)
+ (asdf:operate 'asdf:load-op 'asdf-system-connections))
+
(defpackage #:oh-ducks.system
(:use #:cl #:asdf))
(in-package #:oh-ducks.system)
-(asdf:defsystem oh-ducks
+(defsystem oh-ducks
:version "0"
:description "cl-unification templates using CSS-style selectors"
:maintainer "pinterface <pix@kepibu.org>"
:author "pinterface <pix@kepibu.org>"
:licence "BSD-style"
- ;; TODO: submit a patch for cl-unification to use
- ;; asdf-system-connections. Getting an unmodified version of
- ;; cl-unification to load the cl-ppcre stuff is a PITA.
- :depends-on (:cl-unification :cl-ppcre :cxml :closure-html :split-sequence)
+ :depends-on (:cl-unification-lib :cl-unification :cl-ppcre :split-sequence :alexandria)
:serial t
- ;; FIXME: ordering
:components ((:file "package")
(:file "regexp-template")
- #+(or) (:file "tests")
- (:module traversal
- :components
- ((:file "interface")
- (:file "lhtml" :depends-on ("interface"))
- (:file "pt" :depends-on ("interface"))))
+ (:module "traversal"
+ :components ((:file "interface")))
(:file "selectors")
- (:file "unification-templates")))
+ (:file "templates")
+ (:file "unify")
+ #+FIXME (:file "tests")))
+
+(defsystem-connection ducks+closure-html
+ :requires (:oh-ducks :closure-html)
+ :components ((:file "chtml")
+ (:module "traversal"
+ :components (#-lists-are-xmls (:file "lhtml")
+ (:file "pt")))))
+
+(defsystem-connection ducks+cxml
+ :requires (:oh-ducks :cxml)
+ :components ((:file "cxml")
+ (:module "traversal"
+ :components ((:file "dom")
+ #-lists-are-lhtml (:file "xmls")))))
+
+;; In case you're wondering, we check the inverse of the :lists-are-* keywords
+;; so that, if you load only cxml (or only closure-html) and don't specify which
+;; format lists are expected to take, you get the appropriate list operations by
+;; default.
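Note on the #-lists-are-xmls / #-lists-are-lhtml conditionals in oh-ducks.asd above: the effect of testing the inverse feature can be checked at the REPL without loading anything. The sketch below reads a small list under a given *features* value. The function name list-traversal-files and the keyword stand-ins :lhtml and :xmls are hypothetical and used only for illustration.

```lisp
;; Hypothetical helper for illustration only: reads a component list the
;; way the Lisp reader would see it under the given *features*.
(defun list-traversal-files (features)
  (let ((*features* features))
    ;; VALUES keeps only the primary value (the list that was read).
    (values
     (read-from-string
      "(#-lists-are-xmls :lhtml #-lists-are-lhtml :xmls)"))))

(list-traversal-files '())                  ; neither feature: both entries survive
(list-traversal-files '(:lists-are-xmls))   ; only :XMLS survives
(list-traversal-files '(:lists-are-lhtml))  ; only :LHTML survives
```

In the real system definition the two entries live in separate system connections, so whether a file is actually loaded also depends on which parser library is present. With neither feature set and both libraries loaded, both list traversal files would be candidates.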
diff -rN -u old-Oh, Ducks!/package.lisp new-Oh, Ducks!/package.lisp
--- old-Oh, Ducks!/package.lisp 2013-07-29 07:25:25.000000000 +0000
+++ new-Oh, Ducks!/package.lisp 2013-07-29 07:25:25.000000000 +0000
@@ -1,3 +1,27 @@
+(defpackage #:oh-ducks.functional
+ (:import-from #:alexandria . #1=(
+ #:compose
+ #:curry
+ #:rcurry
+ #:when-let*))
+ (:export . #1#))
+
+(defpackage #:oh-ducks.traversal
+ (:use #:cl #:oh-ducks.functional)
+ (:export #:element-children
+ #:element-parent
+ #:element-attribute
+ #:element-type
+ #:element-content
+
+ #:element-id
+ #:element-classes
+ #:element-type-equal
+ #:element-ancestors))
+
(defpackage #:oh-ducks
- (:use #:cl #:unify)
- (:export #:lhtml))
+ (:use #:cl #:unify #:oh-ducks.functional #:oh-ducks.traversal)
+ (:export ;; template machinery
+ #:*default-parser*
+ #:html
+ ))
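Note on the oh-ducks.functional package definition above: the #1= label and the #1# reference are standard reader syntax, so the :import-from and :export clauses end in the same list object and the two symbol lists cannot drift apart. A minimal sketch that checks this with the reader alone follows; shared-symbol-list-p is a hypothetical name, not part of oh-ducks.

```lisp
;; Hypothetical helper for illustration only: reads a form shaped like the
;; defpackage clauses and checks that both clauses share one tail.
(defun shared-symbol-list-p ()
  (let ((form (read-from-string
               "((:import-from #:alexandria . #1=(#:compose #:curry))
                 (:export . #1#))")))
    (destructuring-bind ((import-kw package . imported)
                         (export-kw . exported))
        form
      (declare (ignore import-kw package export-kw))
      ;; EQ, not EQUAL: the reader returns the very same list for #1#.
      (eq imported exported))))
```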
diff -rN -u old-Oh, Ducks!/regexp-template.lisp new-Oh, Ducks!/regexp-template.lisp
--- old-Oh, Ducks!/regexp-template.lisp 2013-07-29 07:25:25.000000000 +0000
+++ new-Oh, Ducks!/regexp-template.lisp 2013-07-29 07:25:25.000000000 +0000
@@ -31,11 +31,18 @@
(declare (ignore re-kwd))
(make-instance 'unify::regular-expression-template
:spec (list* 'unify::regexp
- (concatenate 'string "^(.*?)" regexp "$")
+ (cond
+ ((stringp regexp)
+ (concatenate 'string "^(.*?)" regexp "$"))
+ ((listp regexp)
+ `(:sequence :start-anchor
+ (:register (:non-greedy-repetition 0 nil :everything))
+ ,@regexp
+ :end-anchor))
+ (t (error "Unknown regexp format.")))
(append '(?&rest) vars)
keys))))
-
;; (match (#t(regexp+ "^f(o+)" (?o)) "fooooooobar") (values o &rest))
;; => "ooooooo", "bar"
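Note on the regexp-template.lisp hunk above: the new cond accepts either a regexp string or a list of parse-tree elements and wraps it in the same anchors and leading non-greedy register. The sketch below restates only that wrapping step as a standalone function so the two result shapes can be inspected. The name anchor-regexp is hypothetical, and the sketch builds data only; it does not call cl-ppcre.

```lisp
;; Hypothetical standalone restatement of the wrapping step shown in the
;; hunk; it only builds the spec and never compiles a regexp.
(defun anchor-regexp (regexp)
  (cond
    ;; String form: wrap textually.
    ((stringp regexp)
     (concatenate 'string "^(.*?)" regexp "$"))
    ;; List form: splice the elements into a parse-tree-style sequence.
    ((listp regexp)
     `(:sequence :start-anchor
                 (:register (:non-greedy-repetition 0 nil :everything))
                 ,@regexp
                 :end-anchor))
    (t (error "Unknown regexp format: ~s" regexp))))

(anchor-regexp "f(o+)")        ; string in, string out
(anchor-regexp '(#\# name))    ; list in, parse tree out
```

Splicing with ,@ is what lets selectors.lisp write a template as a flat list of pieces such as (#\# $name) and reuse the same named pieces across many patterns.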
diff -rN -u old-Oh, Ducks!/selectors.lisp new-Oh, Ducks!/selectors.lisp
--- old-Oh, Ducks!/selectors.lisp 2013-07-29 07:25:25.000000000 +0000
+++ new-Oh, Ducks!/selectors.lisp 2013-07-29 07:25:25.000000000 +0000
@@ -1,49 +1,8 @@
-#||
-Okay, here's how I figure selectors should work:
-* breadth-first traversal through the document
-* collect nodes (elements) which match the selector(s)
-
-Matching selectors:
-- The original plan was to start with the first selector in our
- list and work our way into the document.
-- Another plan might be to start with the last selector in our
- list and work our way up the document tree.
-- Yet another option would be to utilize the recursive structure
- of the document in our search, keeping track of which nodes
- match which selectors as we traverse into the document.
- Though, by that description, I'm not sure I'm clever enough to
- actually make it work.
-We have to work our way through the entire document structure
-anyway, which means starting from the outside and working our way
-in won't gain us any efficiency, as I had originally thought.
-
-For example, given a structure of
- (html
- (body
- (p ((class "foo")) "text")
- (p () (span ((class "bar")) "more text"))))
-and a selector of
- html p>span.bar
-we would walk the document tree asking first
- "Does this element have class 'bar'?"
-and only if that is true, continuing to ask
- "Is this a 'span' element?"
- "Is this element a child of a 'p' element?"
- "Is that 'p' element a descendant of an 'html' element?"
-
-I note, however, that a fully-reversed ordering should not be strictly
-necessary--we really only need reverse at the combinators. So we
-could also ask:
- "Is this a 'span' element?"
- "Is it of the 'bar' class?"
- "Is it a child of a 'p' element?"
- "Is that 'p' element a descendant of an 'html' element?"
-
-Hrm... how does ScrAPI do this? Or any of the other projects which
-offer element selection by CSS selector?
-||#
(in-package #:oh-ducks)
+(defvar *implicit-element* nil
+ "The element to be considered as an implicit element to be matched by combinators without a leading qualifier. E.g., \"> a\" will match <a> tags directly under *implicit-element*, and \"+ a\" will match <a> tags directly following *implicit-element*.")
+
#.(set-dispatch-macro-character #\# #\T 'unify::|sharp-T-reader|)
(defclass selector (unify::string-template)
@@ -66,67 +25,306 @@
(:method ((ob t)) nil))
(defmethod print-object ((selector combinator) stream)
- (format stream "#<combinator>"))
+ (format stream "#<~s ~s>" (class-name (class-of selector)) (matcher selector)))
(defclass child-combinator (combinator) ())
(defclass descendant-combinator (combinator) ())
(defclass adjacent-combinator (combinator) ())
(defclass sibling-combinator (combinator) ())
-#+FIXME ; is this the right name?
(defclass universal-selector (simple-selector) ())
(defclass type-selector (simple-selector) ())
(defclass id-selector (simple-selector) ())
(defclass class-selector (simple-selector) ())
+(defclass nth-child-selector (simple-selector) ())
+(defclass nth-last-child-selector (nth-child-selector) ())
+(defclass nth-of-type-selector (nth-child-selector) ())
+(defclass nth-last-of-type-selector (nth-of-type-selector) ())
+(defclass empty-selector (simple-selector) ())
+
+(defclass attribute-selector (simple-selector)
+ ((val :reader attribute-value :initarg :value)))
+(defclass attribute-present-selector (attribute-selector) ())
+(defclass attribute-equal-selector (attribute-selector) ())
+(defclass attribute-starts-with-selector (attribute-selector) ())
+
+(defmethod initialize-instance :after ((selector nth-child-selector)
+ &key (asign "+") a
+ (bsign "+") b
+ namedp)
+ (setf (slot-value selector 'arg)
+ (if namedp
+ (cons 2 (if (string-equal "odd" b) 1 0))
+ (cons (parse-integer (format nil "~a~a" asign (or a 1)))
+ (parse-integer (format nil "~a~a" bsign (or b 0)))))))
+
+(defmethod print-object ((selector universal-selector) stream)
+ (format stream "#<universal-selector>"))
(defmethod initialize-instance :after ((template combinator) &key)
(unless (slot-boundp template 'matcher)
(let ((selector (template-spec template)))
(setf (slot-value template 'matcher) (parse-selector (string-trim " " selector))))))
+(defclass %implicit-element-selector (selector) ())
+(defparameter %implicit-element-selector (make-instance '%implicit-element-selector))
+
+(defmethod print-object ((selector %implicit-element-selector) stream)
+ (print-unreadable-object (selector stream :type t)))
+
+(cl-ppcre:define-parse-tree-synonym \s*
+ (:non-greedy-repetition 0 nil :whitespace-char-class))
+(cl-ppcre:define-parse-tree-synonym \s+
+ (:greedy-repetition 1 nil :whitespace-char-class))
+(cl-ppcre:define-parse-tree-synonym sign
+ (:char-class #\+ #\-))
+(cl-ppcre:define-parse-tree-synonym sign?
+ (:greedy-repetition 0 1 sign))
+(cl-ppcre:define-parse-tree-synonym integer
+ (:greedy-repetition 1 nil :digit-class))
+(cl-ppcre:define-parse-tree-synonym name
+ (:greedy-repetition 1 nil (:char-class :word-char-class #\-)))
+(cl-ppcre:define-parse-tree-synonym $name
+ (:register name))
+(cl-ppcre:define-parse-tree-synonym an+b
+ (:sequence
+ (:register sign?) (:greedy-repetition 0 1 (:register integer))
+ #\n \s*
+ (:register sign?) \s* (:greedy-repetition 0 1 (:register integer))))
+(cl-ppcre:define-parse-tree-synonym b
+ (:register (:sequence sign? integer)))
+(cl-ppcre:define-parse-tree-synonym odd/even
+ (:register (:alternation "odd" "even")))
+
+;; FIXME: proper parsing (e.g., by using the W3C's provided FLEX and YACC bits).
(defun parse-selector (selector)
(match-case (selector)
;; combinators
- (#T(regexp$ "[ ]*[>][ ]*" ()) (list (make-instance 'child-combinator :matcher (parse-selector &rest))))
- (#T(regexp$ "[ ]+" ()) (list (make-instance 'descendant-combinator :matcher (parse-selector &rest))))
- ;; simple selector
- (#T(regexp$ "[#](\\w+)" (?id)) (cons (make-instance 'id-selector :arg id) (parse-selector &rest)))
- (#T(regexp$ "[\\.](\\w+)" (?class)) (cons (make-instance 'class-selector :arg class) (parse-selector &rest)))
- (#T(regexp$ "(\\w+)" (?type)) (cons (make-instance 'type-selector :arg type) (parse-selector &rest)))
- #+(or)
- (#T(regexp$ "\\*" ()) (cons (make-instance 'universal-selector) (parse-selector &rest)))))
+ (#T(regexp$ (\s* #\~ \s*) ())
+ (list (make-instance 'sibling-combinator :matcher (or (parse-selector &rest) %implicit-element-selector))))
+ (#T(regexp$ (\s* #\+ \s*) ())
+ (list (make-instance 'adjacent-combinator :matcher (or (parse-selector &rest) %implicit-element-selector))))
+ (#T(regexp$ (\s* #\> \s*) ())
+ (list (make-instance 'child-combinator :matcher (or (parse-selector &rest) %implicit-element-selector))))
+ (#T(regexp$ (\s+) ())
+ (list (make-instance 'descendant-combinator :matcher (or (parse-selector &rest) %implicit-element-selector))))
+ ;; simple selectors
+ ;; attribute selectors
+ (#T(regexp$ ("[" $name "]") (?attribute))
+ (cons (make-instance 'attribute-present-selector :arg attribute)
+ (parse-selector &rest)))
+ (#T(regexp$ ("[" $name "=" $name "]") (?attribute ?value))
+ (cons (make-instance 'attribute-equal-selector :arg attribute :value value)
+ (parse-selector &rest)))
+ (#T(regexp$ ("[" $name "^=" $name "]") (?attribute ?value))
+ (cons (make-instance 'attribute-starts-with-selector :arg attribute :value value)
+ (parse-selector &rest)))
+ ;; cyclic (An+B, n+B)
+ (#T(regexp$ (":nth-child(" \s* an+b \s* ")")
+ (?asign ?a ?bsign ?b))
+ (cons (make-instance 'nth-child-selector
+ :asign asign :a a
+ :bsign bsign :b b)
+ (parse-selector &rest)))
+ (#T(regexp$ (":nth-last-child(" \s* an+b \s* ")")
+ (?asign ?a ?bsign ?b))
+ (cons (make-instance 'nth-last-child-selector
+ :asign asign :a a
+ :bsign bsign :b b)
+ (parse-selector &rest)))
+ (#T(regexp$ (":nth-of-type(" \s* an+b \s* ")")
+ (?asign ?a ?bsign ?b))
+ (cons (make-instance 'nth-of-type-selector
+ :asign asign :a a
+ :bsign bsign :b b)
+ (parse-selector &rest)))
+ (#T(regexp$ (":nth-last-of-type(" \s* an+b \s* ")")
+ (?asign ?a ?bsign ?b))
+ (cons (make-instance 'nth-last-of-type-selector
+ :asign asign :a a
+ :bsign bsign :b b)
+ (parse-selector &rest)))
+ ;; absolute (B)
+ (#T(regexp$ (":nth-child(" \s* b \s* ")")
+ (?b))
+ (cons (make-instance 'nth-child-selector :a 0 :b b)
+ (parse-selector &rest)))
+ (#T(regexp$ (":nth-last-child(" \s* b \s* ")")
+ (?b))
+ (cons (make-instance 'nth-last-child-selector :a 0 :b b)
+ (parse-selector &rest)))
+ (#T(regexp$ (":nth-of-type(" \s* b \s* ")")
+ (?b))
+ (cons (make-instance 'nth-of-type-selector :a 0 :b b)
+ (parse-selector &rest)))
+ (#T(regexp$ (":nth-last-of-type(" \s* b \s* ")")
+ (?b))
+ (cons (make-instance 'nth-last-of-type-selector :a 0 :b b)
+ (parse-selector &rest)))
+ ;; named (odd, even)
+ (#T(regexp$ (":nth-child(" \s* odd/even \s* ")")
+ (?which))
+ (cons (make-instance 'nth-child-selector :namedp t :b which)
+ (parse-selector &rest)))
+ (#T(regexp$ (":nth-last-child(" \s* odd/even \s* ")")
+ (?which))
+ (cons (make-instance 'nth-last-child-selector :namedp t :b which)
+ (parse-selector &rest)))
+ (#T(regexp$ (":nth-of-type(" \s* odd/even \s* ")")
+ (?which))
+ (cons (make-instance 'nth-of-type-selector :namedp t :b which)
+ (parse-selector &rest)))
+ (#T(regexp$ (":nth-last-of-type(" \s* odd/even \s* ")")
+ (?which))
+ (cons (make-instance 'nth-last-of-type-selector :namedp t :b which)
+ (parse-selector &rest)))
+ ;; Everybody else
+ (#T(regexp$ (":first-child") ())
+ (cons (make-instance 'nth-child-selector :a 0 :b 1)
+ (parse-selector &rest)))
+ (#T(regexp$ (":last-child") ())
+ (cons (make-instance 'nth-last-child-selector :a 0 :b 1)
+ (parse-selector &rest)))
+ (#T(regexp$ (":only-child") ())
+ (list* (make-instance 'nth-child-selector :a 0 :b 1)
+ (make-instance 'nth-last-child-selector :a 0 :b 1)
+ (parse-selector &rest)))
+ (#T(regexp$ (":first-of-type") ())
+ (cons (make-instance 'nth-of-type-selector :a 0 :b 1)
+ (parse-selector &rest)))
+ (#T(regexp$ (":last-of-type") ())
+ (cons (make-instance 'nth-last-of-type-selector :a 0 :b 1)
+ (parse-selector &rest)))
+ (#T(regexp$ (":only-of-type") ())
+ (list* (make-instance 'nth-of-type-selector :a 0 :b 1)
+ (make-instance 'nth-last-of-type-selector :a 0 :b 1)
+ (parse-selector &rest)))
+ (#T(regexp$ (":empty") ())
+ (cons (make-instance 'empty-selector) (parse-selector &rest)))
+ (#T(regexp$ (#\# $name) (?id))
+ (cons (make-instance 'id-selector :arg id) (parse-selector &rest)))
+ (#T(regexp$ (#\. $name) (?class))
+ (cons (make-instance 'class-selector :arg class) (parse-selector &rest)))
+ (#T(regexp$ ($name) (?type))
+ (cons (make-instance 'type-selector :arg type) (parse-selector &rest)))
+ (#T(regexp$ (#\*) ())
+ (cons (make-instance 'universal-selector) (parse-selector &rest)))
+ (t (unless (string= selector "")
+ (error "Unable to parse selector: ~s" selector)))))
+
+(defun subjects-in-list (selector element-list)
+ (reduce #'nconc
+ (mapcar (curry #'subjects-of selector)
+ element-list)))
-(defgeneric find-matching-elements (selector elements))
-
-(defmethod find-matching-elements (selector (elements list))
+(defun subjects-of (selector element)
(nconc
- (remove-if-not (lambda (el) (element-matches-p el selector)) elements)
- (reduce #'nconc
- (remove-if #'null
- (mapcar (lambda (element) (find-matching-elements selector (element-children element)))
- elements)))))
-
-(defmethod find-matching-elements (selector (elements t))
- (find-matching-elements selector (list elements)))
+ (when (subject-p selector element) (list element))
+ (subjects-in-list selector (element-children element))))
-(defgeneric element-matches-p (element selector))
+(defgeneric subject-p (selector element))
-(defmethod element-matches-p (element (selector type-selector))
+(defmethod subject-p ((selector type-selector) element)
(element-type-equal element (selector-arg selector)))
-(defmethod element-matches-p (element (selector id-selector))
+(defmethod subject-p ((selector id-selector) element)
(string= (element-id element) (selector-arg selector)))
-(defmethod element-matches-p (element (selector class-selector))
- (member (selector-arg selector)
- (element-classes element)
- :test #'string=))
+(defun an+b? (a b element siblings)
+ (when-let* ((idx (position element siblings :test #'eq)) (pos (1+ idx)))
+ ;; pos = An + B
+ (cond
+ ;; pos = 0n + B
+ ((= 0 a) (= b pos))
+ ;; (pos - B)/A = n
+ (t (and (zerop (mod (- pos b) a))
+ (not (minusp (/ (- pos b) a))))))))
+
+(defmethod subject-p ((selector nth-child-selector) element)
+ (when-let* ((arg (selector-arg selector))
+ (parent (element-parent element)))
+ (an+b? (car arg) (cdr arg) element (element-children parent))))
+
+(defmethod subject-p ((selector nth-last-child-selector) element)
+ (when-let* ((arg (selector-arg selector))
+ (parent (element-parent element)))
+ (an+b? (car arg) (cdr arg) element (reverse (element-children parent)))))
+
+(defmethod subject-p ((selector nth-of-type-selector) element)
+ (when-let* ((arg (selector-arg selector))
+ (parent (element-parent element)))
+ (an+b? (car arg) (cdr arg) element
+ (remove-if-not (rcurry #'element-type-equal (element-type element))
+ (element-children parent)))))
+
+(defmethod subject-p ((selector nth-last-of-type-selector) element)
+ (when-let* ((arg (selector-arg selector))
+ (parent (element-parent element)))
+ (an+b? (car arg) (cdr arg) element
+ (reverse
+ (remove-if-not (rcurry #'element-type-equal (element-type element))
+ (element-children parent))))))
-(defmethod element-matches-p (element (selector list))
- (every (lambda (s) (element-matches-p element s)) selector))
+(defmethod subject-p ((selector empty-selector) element)
+ (= 0 (length (element-children element))))
-(defmethod element-matches-p (element (selector child-combinator))
- (element-matches-p (element-parent element) (matcher selector)))
+(defmethod subject-p ((selector class-selector) element)
+ (member (selector-arg selector)
+ (element-classes element)
+ :test #'string=))
-(defmethod element-matches-p (element (selector descendant-combinator))
- (some (lambda (a) (element-matches-p a (matcher selector))) (element-ancestors element)))
+(defmethod subject-p ((selector universal-selector) element)
+ (declare (ignore element selector))
+ t)
+
+(defmethod subject-p ((selector attribute-present-selector) element)
+ (element-attribute (selector-arg selector) element))
+
+(defmethod subject-p ((selector attribute-equal-selector) element)
+ (when-let* ((val (element-attribute (selector-arg selector) element)))
+ (string= val (attribute-value selector))))
+
+(defmethod subject-p ((selector attribute-starts-with-selector) element)
+ (when-let* ((val (element-attribute (selector-arg selector) element)))
+ (alexandria:starts-with-subseq (string-downcase (attribute-value selector)) (string-downcase val))))
+
+(defmethod subject-p ((selector %implicit-element-selector) element)
+ (eq element *implicit-element*))
+
+(defmethod subject-p ((selector list) element)
+ (every (rcurry #'subject-p element) selector))
+
+(defmethod subject-p ((selector child-combinator) element)
+ (subject-p (matcher selector) (element-parent element)))
+
+(defmethod subject-p ((selector descendant-combinator) element)
+ (some (curry #'subject-p (matcher selector)) (element-ancestors element)))
+
+(defmethod subject-p ((selector adjacent-combinator) element)
+ (let* ((parent (element-parent element))
+ (siblings (element-children parent))
+ (ourpos (position element siblings :test #'eq)))
+ (and ourpos
+ (> ourpos 0)
+ (subject-p (matcher selector) (elt siblings (1- ourpos))))))
+
+(defmethod subject-p ((selector sibling-combinator) element)
+ (let* ((parent (element-parent element))
+ (siblings (element-children parent))
+ (ourpos (position element siblings :test #'eq)))
+ (and ourpos
+ (> ourpos 0)
+ (find-if (curry #'subject-p (matcher selector)) siblings :end ourpos))))
+
+;; Hello excessively long name
+(defun terminating-implicit-sibling-combinator-p (selector)
+ (typecase selector
+ ((or sibling-combinator adjacent-combinator)
+ (typecase (matcher selector)
+ (%implicit-element-selector t)
+ (list (terminating-implicit-sibling-combinator-p (car (last (matcher selector)))))))
+ (combinator (terminating-implicit-sibling-combinator-p (matcher selector)))
+ (selector nil)
+ (null nil)
+ (list (terminating-implicit-sibling-combinator-p (car (last selector))))
+ (t nil)))
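Note on the an+b test used by the nth-* selectors in the hunk above: a 1-based sibling position pos matches when pos = a*n + b for some integer n >= 0. The following is a minimal standalone sketch of that arithmetic only, assuming nothing beyond standard Common Lisp. The name an+b-position-p is hypothetical and is not part of this patch, which looks the element up among its siblings instead.

```lisp
;; Hypothetical helper, not part of the patch: it tests a bare 1-based
;; position rather than finding the element among its siblings.
(defun an+b-position-p (a b pos)
  "True when POS = A*n + B for some integer n >= 0."
  (if (zerop a)
      ;; 0n + B matches only position B
      (= pos b)
      (let ((diff (- pos b)))
        ;; n = (pos - B)/A must be a non-negative integer
        (and (zerop (mod diff a))
             (not (minusp (/ diff a)))))))

;; :nth-child(2n+1), i.e. "odd", checked over positions 1..6
(remove-if-not (lambda (pos) (an+b-position-p 2 1 pos))
               '(1 2 3 4 5 6))
```

A negative A works the same way: with a = -1 and b = 3 the first three positions match and the fourth does not, because n would be negative.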
diff -rN -u old-Oh, Ducks!/templates.lisp new-Oh, Ducks!/templates.lisp
--- old-Oh, Ducks!/templates.lisp 1970-01-01 00:00:00.000000000 +0000
+++ new-Oh, Ducks!/templates.lisp 2013-07-29 07:25:25.000000000 +0000
@@ -0,0 +1,59 @@
+(in-package #:oh-ducks)
+
+(defclass css-selector-template (unify::expression-template)
+ ((parser :initarg :parser :initform nil) ;; subtype generally determines parser
+ (specifiers :reader specifiers) ;; list of (specifier . variable) and (specifier . template)
+ ))
+
+(defvar *model-handler-map* nil "A mapping between model types and handler functions.")
+(defun add-handler (model handler)
+ (push (cons model handler) *model-handler-map*))
+(defun get-handler-for-model (model)
+ (let ((handler (cdr (assoc model *model-handler-map*))))
+ (typecase handler
+ (null nil)
+ (function (funcall handler))
+ (symbol (funcall (symbol-function handler)))
+ (t handler))))
+
+(defvar *default-parser* nil "Determines the default parser when none is specified.")
+
+(defun %spec-includes-opts (spec)
+ (keywordp (first (second spec))))
+
+(defun combine-selectors (selector parent)
+ (let ((combinator (car (last selector))))
+ (cond
+ ((null parent)
+ selector)
+ ((combinator-p combinator)
+ (setf (slot-value combinator 'matcher) parent)
+ selector)
+ (t
+ (nconc selector (list (make-instance 'descendant-combinator :matcher parent)))))))
+
+(defun parse-specifiers (specs template parent)
+ (loop :for (css-specifier . rest) :in specs
+ :for selector = (combine-selectors (parse-selector css-specifier) parent)
+ :collect (cons selector
+ (cond
+ ((unify::template-p rest) rest)
+ ((unify::variablep rest) rest)
+ ((consp rest)
+ (make-instance (class-of template)
+ :spec (list* (first (template-spec template)) rest)
+ :css-specifiers rest
+ :parent selector))))))
+
+(defmethod initialize-instance :after ((template css-selector-template) &key css-specifiers parent)
+ (let* ((spec (template-spec template))
+ (specifiers-and-vars (or css-specifiers (if (%spec-includes-opts spec)
+ (cddr spec)
+ (rest spec)))))
+ (setf (slot-value template 'specifiers)
+ (parse-specifiers specifiers-and-vars template parent))))
+
+;; Don't bother trying to save :parser when compiling
+(defmethod make-load-form ((object css-selector-template) &optional env)
+ (declare (ignore env))
+ `(make-template ',(first (template-spec object)) ',(template-spec object)))
diff -rN -u old-Oh, Ducks!/tests.lisp new-Oh, Ducks!/tests.lisp
--- old-Oh, Ducks!/tests.lisp 2013-07-29 07:25:25.000000000 +0000
+++ new-Oh, Ducks!/tests.lisp 2013-07-29 07:25:25.000000000 +0000
@@ -1,39 +1,169 @@
(in-package #:oh-ducks)
+(named-readtables:in-readtable template-readtable)
;; FIXME: the switch to chtml:pt nodes means our #'equalp no longer
;; works.
+#+(or) (setq *default-parser* 'pt)
+
(equalp '(:div ((:id "id")) "I " (:i () "like") " cheese.")
- (match (#T(html ("#id" . ?div))
- "<div id=\"id\">I <i>like</i> cheese.</div>")
+ (match (#T(html (:model lhtml) ("#id" . ?div))
+ "<div id=\"id\">I <i>like</i> cheese.</div>")
;; FIXME: learn to distinguish between when there should only be one
;; result and when there should be many?
(car div)))
(equalp '((:div ((:class "red fish")) "one fish")
(:div ((:class "blue fish")) "two fish"))
- (match (#T(html (".fish" . ?divs)
+ (match (#T(html (:model lhtml)
+ (".fish" . ?divs)
(".pig" . ?pig))
"<div class='pig'>bricklayer</div><div class='red fish'>one fish</div><div class='blue fish'>two fish</div>")
;; pig doesn't affect the equalp...but does show separate things are separate
(values divs pig)))
(equalp '((:i () "not") (:i () "cheese"))
- (match (#T(html ("div" ("i" . ?i)))
- "<div>I do <i>not</i> like cheese.</div><div>I like <i>cheese</i>.</div>")
+ (match (#T(html (:model lhtml)
+ ("div" ("i" . ?i)))
+ "<div>I do <i>not</i> like cheese.</div><div>I like <i>cheese</i>.</div>")
i))
(equalp '((:i () "not"))
- (match (#T(html ("div>i" . ?i))
- "<div>I do <i>not</i> like cheese.</div><div><span>I like <i>cheese</i>.</span></div>")
+ (match (#T(html (:model lhtml)
+ ("div>i" . ?i))
+ "<div>I do <i>not</i> like cheese.</div><div><span>I like <i>cheese</i>.</span></div>")
i))
(equalp '((:i () "not"))
- (match (#T(html ("div" ("> i" . ?i)
+ (match (#T(html (:model lhtml)
+ ("div" (">i" . ?i)
+ ;("i" . #t(list ?j ?i))
("span>i" . ?span)))
- "<div>I do <i>not</i> like cheese.</div><div><span>I like <i>cheese</i>.</span></div>")
+ "<div>I do <i>not</i> like cheese.</div><div><span>I like <i>cheese</i>.</span></div>")
(values i span)))
-#+LATER
-(match (#t(lhtml ("div::content" . #t(regexp+ "^f(o+)" (?o))))
+(defun make-dom-document (child-node)
+ (make-instance 'rune-dom::document :children (rune-dom::make-node-list (list child-node))))
+
+(defun serialize (object)
+ (let ((document
+ (etypecase object
+ (rune-dom::document object)
+ (rune-dom::element (make-dom-document object))
+ (chtml:pt object)
+ (list object))))
+ (etypecase document
+ (rune-dom::document
+ (dom:map-document (cxml:make-string-sink :omit-xml-declaration-p t)
+ document))
+ (chtml:pt
+ (chtml:serialize-pt document (chtml:make-string-sink)))
+ (list (mapcar #'serialize document)))))
+
+(defmacro serialize-values (form)
+ `(let ((values (multiple-value-list ,form)))
+ (values-list (mapcar #'serialize values))))
+
+(equal '("<i>cheese</i>" "<i>cheese</i>")
+ (serialize-values
+ (match (#T(html (:model dom)
+ ("i" . #t(list ?j ?i))
+ ("span>i" . ?span))
+ "<div>I do <i>not</i> like cheese.</div><div><span>I like <i>cheese</i>.</span></div>")
+ (values i span))))
+
+(serialize-values
+ (match (#T(html (:model dom)
+ ("div:first-child" . ?div)
+ ("i:nth-child(1)" . ?i))
+ "<div>I do <i>not</i> <i>like</i> cheese.</div><div><span>I like <i>cheese</i>.</span></div>")
+ (values div i)))
+
+(serialize-values
+ (match (#T(html (:model dom)
+ ("div:nth-last-child(1)" . ?div)
+ ("div:last-child" . ?d2))
+ "<div>I do <i>not</i> <i>like</i> cheese.</div><div><span>I like <i>cheese</i>.</span></div>")
+ (values div d2)))
+
+(serialize-values
+ (match (#t(html (:model dom)
+ (":nth-last-of-type(2)" . ?first)
+ (":nth-of-type(2)" . ?last))
+ "<div><span>1</span><i>i</i><span>2</span><i>i</i></div>")
+ (values first last)))
+
+(match (#T(html (:model dom)
+ ("q" . ?div))
+ "<div>I do <i>not</i> <i>like</i> cheese.</div><div><span>I like <i>cheese</i>.</span></div>")
+ (values div))
+
+;; throws 'unification-failure
+(serialize-values
+ (match (#T(html (:model dom)
+ ("i:only-child" . ?i)
+ ("i:only-of-type" . ?i-type))
+ "<div>I do <i>not</i> <i>like</i> cheese.</div><div><span><i>I</i> like <i>cheese</i>.</span></div>")
+ (values i i-type)))
+
+(serialize-values
+ (match (#T(html (:model dom)
+ ("b + i" . ?i))
+ "<div>I <b>really</b> <i>like</i> cheese. Do you not <i>dislike</i> cheese?</div>")
+ (values i)))
+
+(serialize-values
+ (match (#T(html (:model dom)
+ ("b ~ i" . ?i))
+ "<div>I <i>really</i> <b>like</b> cheese. Do you <i>not</i> <i>dislike</i> cheese?</div>")
+ (values i)))
+
+(serialize-values
+ (match (#T(html (:model pt)
+ ("body :empty" . ?empty))
+ "<div><p><br></p><p>testing<i>i</i>testing</p></div>")
+ (values empty)))
+
+;; Sometimes, you want to match a thing inside a thing, in which case
+;; combinators should implicitly assume an unspecified right side means
+;; "whatever element I gave you".
+(serialize-values
+ (match (#T(html (:model dom)
+ ("q" . ?q))
+ "<div><i>ham</i> foo <q>bar <i>baz</i></q> quuz <i>spam</i></div>")
+ (match (#t(html ("> i" . ?i))
+ (first q))
+ i)))
+
+;; siblings will also match, thanks to a bit of ugly code
+(serialize-values
+ (match (#T(html (:model dom)
+ ("q" . ?q))
+ "<div><i>ham</i> foo <q>bar <i>baz</i></q> quuz <i>spam</i><q></q><i>not match</i></div>")
+ (match (#t(html ("+ i" . ?i))
+ (first q))
+ i)))
+
+(serialize-values
+ (match (#T(html (:model dom)
+ ("q" . ?q))
+ "<div> foo <q>outer q <i>baz <q>inner q</q></i></q> quuz</div>")
+ (match (#t(html ("q" . ?i))
+ (first q))
+ i)))
+
+(serialize-values
+ (match (#T(html (:model dom)
+ ("[id]" . ?ids))
+ "<div><i id=''>blank id</i>foo<b>no id</b>bar<i id='id'>id id</i></div>")
+ ids))
+
+(serialize-values
+ (match (#T(html (:model dom)
+ ("[id=foo]" . ?id))
+ "<div><i id='bar'>bar id</i><i>no id</i><i id='foo'>foo id</i></div>")
+ id))
+
+#+LATER?
+(match (#t(html ("div::content" . #t(regexp+ "^f(o+)" (?o))))
"<div>barbaz</div><div>fooooooobar</div>")
(values o &rest))
diff -rN -u old-Oh, Ducks!/traversal/dom.lisp new-Oh, Ducks!/traversal/dom.lisp
--- old-Oh, Ducks!/traversal/dom.lisp 1970-01-01 00:00:00.000000000 +0000
+++ new-Oh, Ducks!/traversal/dom.lisp 2013-07-29 07:25:25.000000000 +0000
@@ -0,0 +1,45 @@
+(in-package #:oh-ducks.traversal)
+
+(defmethod unify::occurs-in-p ((var symbol) (pat dom:element) env)
+ (declare (ignore var pat env))
+ nil)
+
+(defmethod unify:unify ((template oh-ducks::css-selector-template)
+ (document dom:document)
+ &optional (env (unify:make-empty-environment))
+ &key)
+ (unify:unify template (dom:document-element document) env))
+
+;;; general accessors
+
+(defmethod element-children ((element dom:element))
+ (remove-if-not #'dom:element-p (coerce (dom:child-nodes element) 'list)))
+
+(defmethod element-parent ((element dom:element))
+ (let ((parent (dom:parent-node element)))
+ (unless (typep parent 'dom:document)
+ parent)))
+
+(defmethod element-attribute ((attribute symbol) (element dom:element))
+ (element-attribute (string-downcase (symbol-name attribute)) element))
+(defmethod element-attribute ((attribute string) (element dom:element))
+ (when-let* ((attribute-node (dom:get-attribute-node element attribute)))
+ (dom:value attribute-node)))
+
+(defmethod element-type ((element dom:element))
+ (dom:tag-name element))
+
+(defmethod element-content ((element dom:element))
+ (mapcar (lambda (node)
+ (typecase node
+ (dom:element node)
+ (dom:text (dom:data node))
+ (t (error "Unsure what to do."))))
+ (coerce (dom:child-nodes element) 'list)))
+
+;;; special accessors in case something special needs to happen
+(defmethod element-id ((element dom:element))
+ (element-attribute "id" element))
+
+(defmethod element-classes ((element dom:element))
+ (split-sequence:split-sequence #\Space (element-attribute "class" element) :remove-empty-subseqs t))
diff -rN -u old-Oh, Ducks!/traversal/interface.lisp new-Oh, Ducks!/traversal/interface.lisp
--- old-Oh, Ducks!/traversal/interface.lisp 2013-07-29 07:25:25.000000000 +0000
+++ new-Oh, Ducks!/traversal/interface.lisp 2013-07-29 07:25:25.000000000 +0000
@@ -1,32 +1,39 @@
;;;; type-defines-accessors
;;;; Under this implementation strategy, elements would need only implement
;;;; accessors for traversing the node graph.
-(in-package #:oh-ducks)
+(in-package #:oh-ducks.traversal)
;;; general accessors
(defgeneric element-children (element)
- (:documentation "Returns a sequence of element's element-children."))
+ (:documentation "Returns a sequence of element's child tags."))
(defgeneric element-parent (element)
- (:documentation "Returns element's element-parent element."))
-(defgeneric element-attribute (element-attribute element)
- (:documentation "Returns the value of the element-attribute of element, or nil if no such element-attribute exists."))
+ (:documentation "Returns element's parent element."))
+(defgeneric element-attribute (attribute element)
+ (:documentation "Returns the value of the attribute of element, or nil if no such attribute exists."))
(defgeneric element-type (element)
- (:documentation "Returns the tag name (element-type) of element."))
+ (:documentation "Returns the tag name (type) of element."))
+(defgeneric element-content (element)
+ (:documentation "Returns a string containing the contents of the element, if it contains only textual nodes, or a sequence containing all of the element's child nodes (textual nodes as strings, tag nodes as whatever they'd be under #'element-children).")
+ (:method :around ((element t))
+ (let ((val (call-next-method)))
+ (if (every #'stringp val)
+ (reduce (curry #'concatenate 'string) val)
+ val))))
;;; special accessors in case something special needs to happen
(defgeneric element-id (element)
- (:documentation "Equivalent in spirit to (element-attribute :element-id element).")
+ (:documentation "Equivalent in spirit to (element-attribute :id element).")
(:method (element) (element-attribute :id element)))
(defgeneric element-classes (element)
- (:documentation "Equivalent in spirit to (element-attribute :class element), except it returns a sequence of individual element-classes.")
+ (:documentation "Equivalent in spirit to (element-attribute :class element), except it returns a sequence of individual classes.")
(:method (element)
(split-sequence:split-sequence #\Space (element-attribute :class element) :remove-empty-subseqs t)))
(defgeneric element-type-equal (element type)
- (:documentation "Equivalent in spirit to (string-equal (element-type element) element-type), but not obligated to work under the assumption of string-designators.")
+ (:documentation "Equivalent in spirit to (string-equal (element-type element) type), but not obligated to work under the assumption of string-designators.")
(:method (element type) (string-equal type (element-type element))))
(defgeneric element-ancestors (element)
diff -rN -u old-Oh, Ducks!/traversal/lhtml.lisp new-Oh, Ducks!/traversal/lhtml.lisp
--- old-Oh, Ducks!/traversal/lhtml.lisp 2013-07-29 07:25:25.000000000 +0000
+++ new-Oh, Ducks!/traversal/lhtml.lisp 2013-07-29 07:25:25.000000000 +0000
@@ -1,32 +1,54 @@
;;; WARNING: lhtml will conflict with any handler which also uses lists.
;;; xmls, for instance (though I think that's at least
;;; structurally compatible). Sorry, but that's the way it goes.
-(in-package #:oh-ducks)
+(in-package #:oh-ducks.traversal)
+
+(defvar *lhtml-family-tree* nil)
+
+(defun in-hash (key hash)
+ (multiple-value-bind (val present-p) (gethash key hash)
+ (declare (ignore val))
+ present-p))
+
+(defun %mark-parents (parent children)
+ (dolist (item children)
+ (setf (gethash item *lhtml-family-tree*) parent)
+ (%mark-parents item (element-children item))))
+
+;; WARNING: This won't produce sane results for nested (match)es, because we
+;; have no way to bind in a large enough scope.
+(defmethod unify:unify ((template oh-ducks::css-selector-template)
+ (element list)
+ &optional (env (unify:make-empty-environment))
+ &key)
+ (if (and *lhtml-family-tree*
+ (in-hash element *lhtml-family-tree*))
+ (call-next-method)
+ (let ((*lhtml-family-tree* (make-hash-table :test 'eq)))
+ (%mark-parents nil (list element))
+ (%mark-parents element (element-children element))
+ (call-next-method))))
;;; general accessors
(defmethod element-children ((element list))
- (cddr element))
+ (remove-if-not (lambda (x) (and (listp x) (keywordp (car x))))
+ (cddr element)))
-;; FIXME: bleh... may not even be worth trying to support this
-#+FIXME
(defmethod element-parent ((element list))
- (let ((parent (car *ancestors*)))
- (if (some (alexandria:curry #'eq element) (element-children parent))
+ (multiple-value-bind (parent present?)
+ (gethash element *lhtml-family-tree*)
+ (if present?
parent
(error "unable to determine parent"))))
-(defmethod element-parent ((element list))
- (error "cannot get parent"))
-#+FIXME
-(defmethod element-ancestors ((element list))
- *ancestors*)
-(defmethod element-ancestors ((element list))
- (error "cannot get ancestors"))
-
-(defmethod element-attribute ((element-attribute symbol) (element list))
- (cadr (assoc element-attribute (cadr element))))
-(defmethod element-attribute ((element-attribute string) (element list))
- (element-attribute (intern (string-upcase element-attribute) :keyword) element))
+
+(defmethod element-attribute ((attribute symbol) (element list))
+ (cadr (assoc attribute (cadr element))))
+(defmethod element-attribute ((attribute string) (element list))
+ (element-attribute (intern (string-upcase attribute) :keyword) element))
(defmethod element-type ((element list))
(car element))
+
+(defmethod element-content ((element list))
+ (cddr element))
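Note on the parent lookup added in the hunk above: LHTML nodes are plain lists with no back-pointer, so the parent of each node is recorded in an EQ hash table before matching starts. A rough standalone sketch of that idea, assuming LHTML-shaped lists of the form (tag attributes . children). The names lhtml-element-p and record-parents are hypothetical; the patch itself keeps the table in *lhtml-family-tree* and walks element-children.

```lisp
;; Hypothetical names throughout; this only illustrates the technique.
(defun lhtml-element-p (x)
  "True for list nodes whose head is a keyword tag name."
  (and (consp x) (keywordp (car x))))

(defun record-parents (parent children table)
  "Map each child element (by EQ identity) to PARENT, recursively."
  (dolist (child children table)
    (when (lhtml-element-p child)
      (setf (gethash child table) parent)
      (record-parents child (cddr child) table))))

(let* ((inner (list :i '() "like"))
       (root  (list :div '((:id "id")) "I " inner " cheese."))
       (table (record-parents nil (list root)
                              (make-hash-table :test 'eq))))
  ;; The root maps to NIL, so the second value of GETHASH is what
  ;; distinguishes "has no parent" from "was never recorded".
  (values (eq root (gethash inner table))
          (nth-value 1 (gethash root table))))
```

Because the table is keyed on identity, two structurally equal nodes in different places keep separate parents, which an EQUAL table would not guarantee.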
diff -rN -u old-Oh, Ducks!/traversal/pt.lisp new-Oh, Ducks!/traversal/pt.lisp
--- old-Oh, Ducks!/traversal/pt.lisp 2013-07-29 07:25:25.000000000 +0000
+++ new-Oh, Ducks!/traversal/pt.lisp 2013-07-29 07:25:25.000000000 +0000
@@ -1,18 +1,32 @@
-(in-package #:oh-ducks)
+(in-package #:oh-ducks.traversal)
+
+(defmethod unify::occurs-in-p ((var symbol) (pat chtml:pt) env)
+ (declare (ignore var pat env))
+ nil)
;;; general accessors
(defmethod element-children ((element chtml:pt))
- (chtml:pt-children element))
+ (remove-if (compose (rcurry #'member '(:pcdata :comment) :test #'eq) #'chtml:pt-name)
+ (chtml:pt-children element)))
(defmethod element-parent ((element chtml:pt))
(chtml:pt-parent element))
(defmethod element-attribute ((element-attribute symbol) (element chtml:pt))
- (unless (eq :pcdata (chtml:pt-name element))
- (getf (chtml:pt-attrs element) element-attribute)))
+ (getf (chtml:pt-attrs element) element-attribute))
(defmethod element-attribute ((element-attribute string) (element chtml:pt))
(element-attribute (intern (string-upcase element-attribute) :keyword) element))
(defmethod element-type ((element chtml:pt))
(chtml:pt-name element))
+
+(defmethod element-content ((element chtml:pt))
+ (mapcar (lambda (node)
+ (cond
+ ((eq :pcdata (chtml:pt-name node))
+ (chtml:pt-attrs node))
+ (t node)))
+ (remove-if (curry #'eq :comment)
+ (chtml:pt-children element)
+ :key #'chtml:pt-name)))
diff -rN -u old-Oh, Ducks!/traversal/xmls.lisp new-Oh, Ducks!/traversal/xmls.lisp
--- old-Oh, Ducks!/traversal/xmls.lisp 1970-01-01 00:00:00.000000000 +0000
+++ new-Oh, Ducks!/traversal/xmls.lisp 2013-07-29 07:25:25.000000000 +0000
@@ -0,0 +1,58 @@
+;;; WARNING: This conflicts with lhtml.
+(in-package #:oh-ducks.traversal)
+
+(defvar *xmls-family-tree* nil)
+
+(defun in-hash (key hash)
+ (multiple-value-bind (val present-p) (gethash key hash)
+ (declare (ignore val))
+ present-p))
+
+(defun %mark-parents (parent children)
+ (dolist (item children)
+ (setf (gethash item *xmls-family-tree*) parent)
+ (%mark-parents item (element-children item))))
+
+;; WARNING: This won't produce sane results for nested (match)es, because we
+;; have no way to bind in a large enough scope.
+(defmethod unify:unify ((template oh-ducks::css-selector-template)
+ (element list)
+ &optional (env (unify:make-empty-environment))
+ &key)
+ (if (and *xmls-family-tree*
+ (in-hash element *xmls-family-tree*))
+ (call-next-method)
+ (let ((*xmls-family-tree* (make-hash-table :test 'eq)))
+ (%mark-parents nil (list element))
+ (%mark-parents element (element-children element))
+ (call-next-method))))
+
+(defmethod unify:unify ((document list) (template oh-ducks::css-selector-template)
+ &optional (env (unify:make-empty-environment))
+ &key)
+ (unify:unify template document env))
+
+;;; general accessors
+
+(defmethod element-children ((element list))
+ (remove-if-not (lambda (x) (and (listp x) (stringp (car x))))
+ (cddr element)))
+
+(defmethod element-parent ((element list))
+ (multiple-value-bind (parent present?)
+ (gethash element *xmls-family-tree*)
+ (if present?
+ parent
+ (error "unable to determine parent"))))
+
+#+(or)
+(defmethod element-attribute ((attribute symbol) (element list))
+ (cadr (assoc attribute (cadr element))))
+(defmethod element-attribute ((attribute string) (element list))
+ (cadr (assoc attribute (cadr element) :test #'string=)))
+
+(defmethod element-type ((element list))
+ (car element))
+
+(defmethod element-content ((element list))
+ (cddr element))
diff -rN -u old-Oh, Ducks!/unification-templates.lisp new-Oh, Ducks!/unification-templates.lisp
--- old-Oh, Ducks!/unification-templates.lisp 2013-07-29 07:25:25.000000000 +0000
+++ new-Oh, Ducks!/unification-templates.lisp 1970-01-01 00:00:00.000000000 +0000
@@ -1,88 +0,0 @@
-(in-package #:oh-ducks)
-;; FIXME: rather than having separate
-;; #t(pt-html ...)
-;; #t(lhtml ...)
-;; etc.
-;; syntaxes for every possible parser, have a single
-;; #t(html [(:parser parser-function)] ...)
-;; which uses the value of :parser to handle parsing. Or, if no
-;; parser is specified, requires an already-parsed document be passed
-;; in.
-
-(defclass css-selector-template (unify::expression-template)
- (#+(or)
- (parser :reader parser) ;; subtype determines parser
- (handler :reader handler) ;; cxml/closure-html handler
- (specifiers :reader specifiers) ;; list of (specifier . variable) and (specifier . template)
- ))
-
-(defclass xml-template (css-selector-template) ()) ;; parses using closure-xml
-
-(defclass html-template (css-selector-template) ()) ;; parses using closure-html
-
-(defclass lhtml-template (html-template) ())
-(defclass pt-template (html-template) ())
-
-(defmethod make-template ((kind (eql 'lhtml)) (spec cons))
- (make-instance 'lhtml-template :spec spec))
-
-(defmethod make-template ((kind (eql 'html)) (spec cons))
- (make-instance 'pt-template :spec spec))
-
-(defmethod initialize-instance :after ((template css-selector-template) &key css-specifiers parent &allow-other-keys)
- (let ((specifiers-and-vars (or css-specifiers (rest (template-spec template)))))
- (setf (slot-value template 'specifiers)
- (parse-specifiers specifiers-and-vars template parent))))
-
-(defun combine-selectors (selector parent)
- (let ((combinator (car (last selector))))
- (cond
- ((null parent)
- selector)
- ((combinator-p combinator)
- (setf (slot-value combinator 'matcher) parent)
- selector)
- (t
- (nconc selector (list (make-instance 'descendant-combinator :matcher parent)))))))
-
-(defun parse-specifiers (specs template parent)
- (loop :for (css-specifier . rest) :in specs
- :for selector = (combine-selectors (parse-selector css-specifier) parent)
- :collect (cons selector
- (cond
- ((unify::template-p rest) rest)
- ((unify::variablep rest) rest)
- ((consp rest)
- (make-instance (class-of template)
- :spec (list* (first (template-spec template)) rest)
- :css-specifiers rest
- :parent selector))))))
-
-(defmethod unify ((a css-selector-template) (b css-selector-template)
- &optional (env (make-empty-environment))
- &key &allow-other-keys)
- (error 'unification-failure
- :format-control "Do not know how to unify the two css-selector-templates ~S and ~S."
- :format-arguments (list a b)))
-
-(defmethod unify ((template css-selector-template) document
- &optional (env (make-empty-environment))
- &key &allow-other-keys)
- (loop :for (css-specifier . template) :in (specifiers template)
- :do
- (let ((val (find-matching-elements css-specifier document)))
- (cond
- ((unify::template-p template) (unify template val env))
- ((unify::variablep template) (unify::extend-environment template val env))
- (t (error "whoops: ~s, ~s" css-specifier template)))))
- env)
-
-(defmethod unify ((template lhtml-template) (document string)
- &optional (env (make-empty-environment))
- &key &allow-other-keys)
- (unify template (chtml:parse document (chtml:make-lhtml-builder)) env))
-
-(defmethod unify ((template pt-template) (document string)
- &optional (env (make-empty-environment))
- &key &allow-other-keys)
- (unify template (chtml:parse document (chtml:make-pt-builder)) env))
diff -rN -u old-Oh, Ducks!/unify.lisp new-Oh, Ducks!/unify.lisp
--- old-Oh, Ducks!/unify.lisp 1970-01-01 00:00:00.000000000 +0000
+++ new-Oh, Ducks!/unify.lisp 2013-07-29 07:25:25.000000000 +0000
@@ -0,0 +1,69 @@
+(in-package #:oh-ducks)
+
+(defmethod unify ((a css-selector-template) (b css-selector-template)
+ &optional (env (make-empty-environment))
+ &key)
+ (declare (ignore env))
+ (error 'unification-failure
+ :format-control "Do not know how to unify the two css-selector-templates ~S and ~S."
+ :format-arguments (list a b)))
+
+(defmethod unify ((template css-selector-template) document
+ &optional (env (make-empty-environment))
+ &key)
+ (declare (optimize debug))
+ (loop :for (css-specifier . template) :in (specifiers template)
+ :do (typecase template
+ ;; CSS selectors work backwards, not forwards
+ (css-selector-template
+ (unify template document env))
+ (t
+ (let* ((*implicit-element* document)
+ ;; FIXME: this is UGLY!
+ (val (cond
+ ((terminating-implicit-sibling-combinator-p css-specifier)
+ ;; search remaining siblings
+ (subjects-in-list
+ css-specifier
+ (rest
+ (member document
+ (when-let* ((parent (element-parent document)))
+ (element-children parent))
+ :test #'eq))))
+ ;; search subelements
+;;; FIXME: this assumes if someone passes us a node they want to find
+;;; subelements of that node. In the case of nested matches, that's probably
+;;; true, but it hardly seems fair to assume it. Really we want some sort of
+;;; descendant combinator to be sure, but the general one (#\Space) doesn't
+;;; exactly show up all that well. Somebody might assume " b" was the same as
+;;; "b" and get confused.
+ ((element-parent document)
+ (subjects-in-list css-specifier (element-children document)))
+ ;; root element includes itself
+ (t (subjects-of css-specifier document)))))
+ (cond
+ ((null val)
+ (error 'unification-failure
+ :format-control "Unable to unify ~s and ~s"
+ :format-arguments (list css-specifier template)))
+ ((unify::template-p template)
+ (unify template val env))
+ ((unify::variablep template)
+ (unify::var-unify template val env))
+ (t (error "Don't know what to do with selector ~s and template ~s." css-specifier template)))))))
+ env)
+
+(defmethod unify (document (template css-selector-template)
+ &optional (env (make-empty-environment))
+ &key)
+ (unify template document env))
+
+(defmethod unify ((template css-selector-template) (document string)
+ &optional (env (make-empty-environment))
+ &key)
+ (unify template (funcall (slot-value template 'parser) document) env))
+
+(defmethod unify ((template css-selector-template) (document pathname)
+ &optional (env (make-empty-environment))
+ &key)
+ (unify template (funcall (slot-value template 'parser) document) env))