Having successfully parsed HTML text elements, we move on to a more complex element: comments.
Comments in HTML are interesting because they end with a three-character sequence: -->. If we read from the stream character by character instead of holding a whole line, we would have the rare fun of looking ahead two characters. Since we work on a string, we can simply restore the pointer to the current character when a match fails.
Creating a comment node is very simple:
(defun make-comment-node ()
  (make-instance 'comment-node))
So, to parse a comment, what is the first thing to do? The piano! Just kidding. First of all, we remember the current value of index in the string (as oldindex) so that we can roll back to it.
The beginning of the comment is easy to define, it is just the sequence ["!--" something] [1], but the closing --> is not so simple.
We cannot use the sequence [$@(any-text ch) "-->"], because the repeated comparison with any character, $@(any-text ch), will simply absorb the entire string without giving us a chance to detect -->.
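To make the problem concrete, here is a minimal sketch in plain Common Lisp, without the META reader syntax. The function greedy-then-terminator is hypothetical (it is not part of toy-engine), and it assumes that any-text accepts every character:

```lisp
;; Hypothetical helper, not part of toy-engine.
;; Mimics a greedy "any character" repetition followed by a literal "-->".
(defun greedy-then-terminator (str)
  (let ((index 0))
    ;; Step 1: the repetition accepts every character (assumption),
    ;; so it only stops at the end of the string.
    (loop while (< index (length str))
          do (incf index))
    ;; Step 2: try to match "-->" at the current position.
    ;; Nothing is left to compare with, so this is always NIL.
    (let ((end (+ index 3)))
      (and (<= end (length str))
           (string= "-->" str :start2 index :end2 end)
           end))))

(greedy-then-terminator "!-- note --> tail") ; returns NIL
```

Whatever the input, the repetition has already moved past the terminator by the time we look for it.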
The repeated alternative ${"-->" @(any-text ch)} is not an option either: we can now detect the end of the comment, but we cannot leave the repetition, because a successful match of "-->" simply starts the next iteration.
For this to work, the comparison with --> must itself fail. That is, on finding --> we record the detection in the variable eoc-found [2] and then report that the comparison failed, using !nil. The other branch consumes characters one after another only while --> has not been found.
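(parse-comment ()
  "<!-- ??? -->"
  (let (ch eoc-found (oldindex index))
    ;; Success: "!--", then characters up to and including "-->".
    ;; The first branch records "-->" in eoc-found and deliberately fails;
    ;; the second consumes a character only while eoc-found is still nil.
    (or (and (matchit
              ["!--"
               {${ ["-->" !(setf eoc-found t) !nil]
                   [!(not eoc-found) @(any-text ch)]
                 } !eoc-found}])
             (make-comment-node))
        ;; Failure: roll index back to where we started and return nil.
        (progn (setf index oldindex) nil))))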
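For readers who find the META notation dense, the same control flow can be sketched in plain Common Lisp. This is only an illustration under stated assumptions: scan-comment and its local helper looking-at are hypothetical names, the function works on its own local index instead of the parser's shared one, and it treats every character as acceptable comment text.

```lisp
;; Hypothetical stand-alone sketch; scan-comment is not a toy-engine function.
;; Returns the index just past "-->", or NIL when STR does not hold a
;; complete comment beginning at START.
(defun scan-comment (str &optional (start 0))
  (flet ((looking-at (prefix pos)
           ;; True when PREFIX occurs in STR exactly at position POS.
           (let ((end (+ pos (length prefix))))
             (and (<= end (length str))
                  (string= prefix str :start2 pos :end2 end)))))
    (let ((index start)
          (eoc-found nil))
      (when (looking-at "!--" index)
        (incf index 3)
        (loop until (or eoc-found (>= index (length str)))
              do (if (looking-at "-->" index)
                     ;; Terminator seen: consume it and raise the flag,
                     ;; which ends the loop.
                     (progn (incf index 3) (setf eoc-found t))
                     ;; Otherwise consume one ordinary character.
                     (incf index))))
      ;; Succeed only if the terminator was found. On failure the caller's
      ;; position is untouched, which plays the role of the rollback.
      (and eoc-found index))))

(scan-comment "!-- a -> b --> rest")  ; returns 14
(scan-comment "!-- never closed")     ; returns NIL
```

Note that the lone "->" in the first call is skipped as ordinary text; only the full three-character terminator ends the scan.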
Let’s check by replacing the call to parse-tex in parse-html with a call to (cons (parse-comment) (princ index)):
* (ql:quickload 'toy-engine)
To load "toy-engine":
Load 1 ASDF system:
toy-engine
; Loading "toy-engine"
(TOY-ENGINE)
* (in-package :toy-engine)
#<PACKAGE "TOY-ENGINE">
* (defparameter *str* "!-- ''' This is a text< kj-- -> --> 123")
*STR*
* (length *str*)
42
* (parse-html *str*)
38
(#<COMMENT-NODE {1005025E03}> . 38)
* (pp->dot "comment-node.dot" (lambda () (pp-dom (car *))))
"}"
*
As index = 38 shows, the parser correctly consumed the entire body of the comment together with the closing -->: of the 42 characters, only the trailing " 123" (4 characters) remains.
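One way to cross-check a reported index independently of the parser is to compute where the first terminator ends using the standard search function. The helper expected-comment-end below is a hypothetical name used only for this check, and the sample string is made up for the illustration:

```lisp
;; Hypothetical check helper, not part of toy-engine.
;; Returns the index right after the first "-->" that follows the
;; 3-character opener "!--", or NIL if there is no terminator.
(defun expected-comment-end (str)
  (let ((pos (and (>= (length str) 3)
                  ;; Start at 3 so the terminator cannot overlap the opener.
                  (search "-->" str :start2 3))))
    (and pos (+ pos 3))))

(expected-comment-end "!-- a -> b --> rest") ; "-->" starts at 11, so this returns 14
```

If the parser behaves as described, the value it prints for index should match the result of this helper on the same string.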