I also wanted a more formal check on the XML produced. How could I
integrate the definition of an XML file into this? An XML Schema file
describes what an XML file can and cannot contain. So I added (though not
really in this release) one more declarative instruction, XMLSchema:
XMLSchema [<path to .xsd file>] [path <simple xPath expression> (, ....)*] [ignore case]
The XMLSchema declaration tells the pre-compiler which schema file to check
against. Thus, as the tags and attributes are parsed, the pre-compiler can
check that they follow the schema. At some point I will have to add a DTD
instruction to point to DTDs, or rely on one of the many DTD-to-XSD
translators.
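To make the declaration syntax above concrete, here is a minimal sketch of how it could be split into its three optional parts. It is written in Python rather than in the PHP dialect, and the names `DECLARATION` and `parse_declaration` are hypothetical; the actual pre-compiler grammar may well differ.

```python
import re

# Hypothetical pattern for: XMLSchema ["file.xsd"] [path 'a' (, 'b')*] [ignore case]
DECLARATION = re.compile(
    r"""^XMLSchema
        (?:\s+(?P<quote>["'])(?P<xsd>.+?)(?P=quote))?   # optional schema file
        (?:\s+path\s+(?P<paths>.+?))?                   # optional list of path expressions
        (?P<ignore_case>\s+ignore\s+case)?              # optional flag
        \s*;?\s*$""",
    re.VERBOSE,
)


def parse_declaration(text):
    """Split an XMLSchema declaration into schema file, paths and flag."""
    match = DECLARATION.match(text.strip())
    if match is None:
        raise ValueError(f"not an XMLSchema declaration: {text!r}")
    raw_paths = match.group("paths") or ""
    # Each path expression is a quoted string; drop the quotes.
    paths = [p.strip().strip("'\"") for p in raw_paths.split(",") if p.strip()]
    return {
        "xsd": match.group("xsd"),  # None means "use the global schema"
        "paths": paths,
        "ignore_case": match.group("ignore_case") is not None,
    }
```

A missing schema file is reported as `None`, which a caller could treat as "fall back to the schema declared globally".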
The XMLSchema declaration can be either a separate statement, usually
placed at the beginning of the file or, at least, before the first tag is
output, or a modifier after a function declaration. Functions within a
program usually generate just segments of XML, not full documents: a table
row, for example. You wouldn't expect such a function to start with the root
element; instead, you would want to declare that it generates a <tr> tag.
You indicate that by using the path clause. This declaration serves two
purposes: to check that the tags generated within the function are of the
right types, and to check that the function is called at a point in the
enclosing source where a <tr> is expected. Several XPath expressions can be
declared, since a single function can generate, for example, either a <td>
or a <th> tag, and both would be acceptable. The XPath expression will often
be partial, for example "td", "th", rather than a full path from the
document root, since a tag might be embedded an arbitrary number of levels
deep within optional tags.
The <path to .xsd file> may be omitted if the path clause is given, meaning
that the global schema is used, which is usually the case for functions. A
source file may look like this:
    <?php
    XMLSchema "my/own/html.xsd" ignore case;

    function tableRow($record) XMLSchema path 'tr' {
        <tr {
            <td ? $record['fieldn'];
            // other columns
        }
    }

    <html {
        <body {
            &onLoad = 'setTimeout("updateClock()",1000)';
            // ... etc
            // query the database
            <table {
                while ($record = mysql_fetch_assoc($recordSet)) {
                    tableRow($record);
                }
            }
        }
    }
    ?>
In this example, the whole PHP page is expected to follow an imaginary
schema for HTML. The function tableRow is expected to produce just a single
row under the same schema, ignoring upper or lower case just as in the
global declaration. That function is used further down, and it has to be
called within a <table> tag or the pre-compiler will complain.
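The call-site half of that check can be sketched roughly as follows. This is Python, not the PHP dialect, and it assumes a toy hand-written content model (`ALLOWED_CHILDREN`) where a real pre-compiler would read the .xsd file. The function name `check_call` is hypothetical, as is the assumption that every tag named in the path clause must be acceptable where the function is called.

```python
# Toy content model: which child tags each parent allows.
# Assumption: a real pre-compiler would derive this from the schema file.
ALLOWED_CHILDREN = {
    "html": {"head", "body"},
    "body": {"table", "p"},
    "table": {"tr"},
    "tr": {"td", "th"},
}


def check_call(enclosing_tag, declared_paths, ignore_case=False):
    """Return warnings for calling a function inside enclosing_tag.

    declared_paths lists the tags named in the function's path clause.
    """
    parent = enclosing_tag.lower() if ignore_case else enclosing_tag
    allowed = ALLOWED_CHILDREN.get(parent, set())
    warnings = []
    for tag in declared_paths:
        name = tag.lower() if ignore_case else tag
        if name not in allowed:
            warnings.append(f"<{tag}> is not expected inside <{enclosing_tag}>")
    return warnings
```

With this model, `check_call("table", ["tr"])` returns an empty list, while calling the same function directly inside `<body>` produces one warning.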
So far, the examples given deal mostly with a sequential flow through the
program. Things start getting interesting with include files, conditionals
and loops. What is sequential in the source file might not be so at
execution time. For example, though the else part of an if() statement comes
after the then part in the source, the two are at the same level at
execution time. The number of occurrences is also important: a tag within an
if() block must be declared with minOccurs="0" in the schema, while a tag
within a while() loop must also have maxOccurs="unbounded". For a
do ... while() loop, maxOccurs="unbounded" is still needed, but a missing
minOccurs or minOccurs="1" is fine, since the body always runs at least once.
I doubt it will make sense to produce more than warning messages, since the
final outcome of a piece of code cannot always be determined from the
source. It wouldn't make sense to abort the pre-compiler because a tag
produced within a while() loop was not declared with minOccurs="0", since
you might already know that the loop will always execute at least once.
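The occurrence rules can be summarized in a short sketch. It is Python rather than the PHP dialect, the function names are hypothetical, and it only models the three control structures mentioned here. It relies on the XML Schema default of 1 for both minOccurs and maxOccurs when they are missing, and it returns warnings instead of raising errors, in line with the reasoning above.

```python
def required_occurrences(control_structures):
    """Return the (minOccurs, maxOccurs) a schema must allow for a tag
    nested inside the given control structures."""
    min_occurs, max_occurs = 1, 1  # XML Schema defaults
    for kind in control_structures:
        if kind in ("if", "while"):
            min_occurs = 0  # the body may never run
        if kind in ("while", "do-while"):
            max_occurs = "unbounded"  # the body may run many times
    return min_occurs, max_occurs


def occurrence_warnings(tag, control_structures, declared_min=1, declared_max=1):
    """Compare what the code may produce with what the schema declares."""
    need_min, need_max = required_occurrences(control_structures)
    warnings = []
    if declared_min > need_min:
        warnings.append(f"<{tag}> may be absent but schema has minOccurs={declared_min}")
    if need_max == "unbounded" and declared_max != "unbounded":
        warnings.append(f"<{tag}> may repeat but schema has maxOccurs={declared_max}")
    return warnings
```

For a `<tr>` emitted inside a while() loop and declared with only maxOccurs="unbounded", the sketch reports a single warning about minOccurs, which the programmer is free to ignore.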
Of course, none of this checking is possible if the tag or attribute name is given as an expression, since its value can seldom be predicted by analyzing the source code. This is one more reason to discourage generic expressions in tag names.
It is common practice to have a couple of functions or small include files
that generate the header and trailer of a document. For HTML, a header()
function or include file would provide the beginning of the document up to
the <body> tag, while a footer() function would take over from the </body>
tag to the end of the document. These two functions would violate the block
structure, since each opens or closes tags that the other must balance.
Nevertheless, solving this problem is not hard. Either you call a function
that provides the enclosing sections, passing it a callback function that
provides the body contents, or, if programming OO-style, you inherit from a
Document class and override its bodyContents() method. In either case, the
block structure is preserved.
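Both alternatives can be sketched in a few lines. The example is in Python and builds plain strings rather than using the tag syntax of the PHP dialect. The names `document`, `Document`, `body_contents` and `ClockPage` are illustrative assumptions, not part of the pre-compiler.

```python
def document(title, body_contents):
    """Emit the enclosing sections and call body_contents for the middle,
    so every tag is opened and closed within this one function."""
    parts = ["<html><head><title>", title, "</title></head><body>"]
    parts.append(body_contents())
    parts.append("</body></html>")
    return "".join(parts)


class Document:
    """OO variant: subclasses override body_contents()."""

    title = "Untitled"

    def body_contents(self):
        return ""

    def render(self):
        # Reuse the callback variant, passing the bound method.
        return document(self.title, self.body_contents)


class ClockPage(Document):
    title = "Clock"

    def body_contents(self):
        return "<p>12:00</p>"
```

Calling `document("Clock", lambda: "<p>12:00</p>")` and `ClockPage().render()` produce the same page; in neither case does any function leave a tag open for another to close.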