Modifying the tree

So, finally, we reach the part that processes the extended instructions into standard PHP. The source is located in the files pht.h and pht.cpp.

I won't comment on all the code, since the source files already carry their own comments, just on a couple of methods that give a good taste of how it is done.

The Tree_visitor and Tree_transform classes are provided by PHC. The first one allows the full tree to be visited in an ordered way. Each class has methods to override, one for each kind of node. While Tree_visitor lets you visit each node and modify its contents, Tree_transform lets you go further and change the tree itself, not just the contents of each node. I use both: in a first pass I gather some information about each PHP function defined in the file being pre-compiled, while in the second I have to restructure the tree.

The Convert_xml_elements class is the main conversion class and it inherits from Tree_transform. A section of the declarations of the class is shown here:

class Convert_xml_elements : public Tree_transform {
private:
	enum State {
		ANY_TAG,
		START_TAG_OPEN,
		START_TAG_CLOSED,
		TAG_CLOSED,   // never used, when the tag is closed, the stack is popped so it is never stored
		PI_OPEN,  // processing instruction
		PI_CLOSED  // never used, when the tag is closed, the stack is popped so it is never stored
	};
	stack<State> s;
.....

public:
	void pre_xml_element(AST_xml_element* in, AST_statement_list* out);
	void post_xml_element(AST_xml_element* in, AST_statement_list* out);
.....
};

I'll just show the methods that process an xml_element, that is, a < tag instruction. Tree_transform provides hooks to visit a node before visiting the nodes further down the tree and after visiting them. In both cases, the first argument is a pointer to the node being visited. The second argument is not a node of the same kind but an AST_statement_list. This provides great flexibility: if nothing is copied from in to out, the node is deleted; if you just copy (actually, push) the in node into the out list, nothing changes; and you can also add any other valid statements to the out list, either before, after or instead of the in node.
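
The three outcomes can be tried out in isolation. The following is only a sketch: Statement, StatementList, Hook, drop, keep, wrap and run_hook are hypothetical stand-ins invented for this example, not PHC classes, and the way run_hook splices each out list into the result is my assumption about what the tree walker does with the lists the hooks fill in.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for this sketch only: a statement is a plain string.
using Statement = std::string;
using StatementList = std::vector<Statement>;
using Hook = std::function<void(const Statement& in, StatementList& out)>;

// Push nothing: the node disappears from the tree.
void drop(const Statement&, StatementList&) {}

// Push the node itself: the tree is left unchanged.
void keep(const Statement& in, StatementList& out) { out.push_back(in); }

// Push extra statements before and after the node.
void wrap(const Statement& in, StatementList& out) {
    out.push_back("echo \"<!-- begin -->\";");
    out.push_back(in);
    out.push_back("echo \"<!-- end -->\";");
}

// Runs the hook on every statement and splices each out list into the result
// (assumed behaviour of the walker, simplified to a flat list).
StatementList run_hook(const StatementList& body, const Hook& hook) {
    assert(hook && "a hook must be supplied");
    StatementList result;
    for (const Statement& in : body) {
        StatementList out;
        hook(in, out);
        result.insert(result.end(), out.begin(), out.end());
    }
    return result;
}
```

Calling run_hook with drop yields an empty list, with keep it yields the same list back, and with wrap every statement ends up between two extra echoes.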

Notice the stack<State> s; which keeps track of the state of the XML part. The AST fully represents the PHP code, but the structure of the XML part is not fully reflected in the tree, and what is there is lost during the transformation. The stack therefore indicates the state of the underlying XML, in particular which kind of tag it is and whether we are within the start tag, fully within the element or out of it. Actually, when we are out of an element we never get to record that: once an element is closed, we pop the stack and return to the previous state, so those two 'closed' states are formally there but never used. PI stands for 'Processing Instruction'.

void Convert_xml_elements::pre_xml_element(AST_xml_element* in, AST_statement_list* out) {
	
	// close any previous open start tag
	if (s.top() == START_TAG_OPEN) close_start_tag(out);
	// push the new state of the tags
	s.push(START_TAG_OPEN);
	
	// The start tag is left open since attributes might follow.
	out->push_back(echo("<"));
	print_element_name(in->xml_element_name,out);

	out->push_back(in);
}

So, in this method we are pre-processing the node corresponding to an XML element. First we check the top of the stack and, if we see the previous tag is still open (presumably waiting for further attributes), we close it. The close_start_tag private method inserts an echo ">"; into the out statement list and updates the state at the top of the stack. Then we push the new state that we are about to enter and add a statement to echo a "<". The function print_element_name deals with the instructions to output the element name.
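
The source of close_start_tag is not reproduced in this article, so here is a minimal sketch of what the description above amounts to. It is a guess, not the actual method: the stack is passed in explicitly instead of being a class member, and StatementList is a hypothetical stand-in that holds PHP text instead of AST nodes.

```cpp
#include <cassert>
#include <stack>
#include <string>
#include <vector>

enum State { ANY_TAG, START_TAG_OPEN, START_TAG_CLOSED, TAG_CLOSED, PI_OPEN, PI_CLOSED };

// Hypothetical stand-in: the real out list holds AST statements, not strings.
using StatementList = std::vector<std::string>;

// Emits the pending ">" and records that the start tag is now complete.
// The stack depth does not change; only the state on top is replaced.
void close_start_tag(std::stack<State>& s, StatementList& out) {
    assert(!s.empty() && s.top() == START_TAG_OPEN);
    out.push_back("echo \">\";");
    s.top() = START_TAG_CLOSED;
}
```

The point of replacing the top instead of popping it is that the element itself is still open; only its start tag is finished.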

Once that is finished, the in node is pushed into the out statement list. This leaves the tree a little redundant: it contains both the original in node and the instructions that should have replaced it. That will be taken care of in the 'post' call, but the 'post' method will only be called if the node is still there, so we leave it as a placeholder to ensure the post method gets called, and take it out then.

Here is the post method.

void Convert_xml_elements::post_xml_element(AST_xml_element* in, AST_statement_list* out) {
	
	// if the statement list was empty, it is an empty element, such as <br /> or <hr />
	// notice that if the statement_list originally contained xml attributes, these have been output as echoes
	// and taken out of the statement list.
	if (in->statements->size() == 0) {
		// if it is still open, close it
		if (s.top() == START_TAG_OPEN) {
			out->push_back(echo("/>"));
			s.pop();
			// no need to do anything else
			return;
		}
	// if the statement list is not empty, then append the statements to the initial echo of the start tag.
	} else {
		out->push_back_all(in->statements);
	}
	
	// just to be sure, there shouldn't be any start tag open this far, meaning something like <html  
	if (s.top() == START_TAG_OPEN) close_start_tag(out);
	// likewise, there should be a full tag to close, like <html>
	switch(s.top()) {
		case START_TAG_CLOSED:
			break;  // This is fine, nothing to do
		case ANY_TAG:
			printf("\n<!-- %s[%d]:  ***** Warning: there might not be a  tag to close -->\n", in->get_filename()->c_str(),in->get_line_number());
			break;
		default:
			printf("\n<!-- %s[%d]:  ***** Error: no  tag to close -->\n", in->get_filename()->c_str(),in->get_line_number());
	}
	// So, just close the tag and pop whatever flags were there
	// TODO warning, if the tag name is given as a variable, the value might have changed in the statements in between
	// and the closing tag would not match the initial one
	out->push_back(echo("</"));
	print_element_name(in->xml_element_name,out);
	out->push_back(echo(">"));
	s.pop();
}

Remember that the < instruction is followed by a tag name and then a statement. The statement can be an empty statement (a single semicolon), a simple statement or a compound statement enclosed in curly braces. They are all represented within the AST as an AST_statement_list, which might be empty (size == 0), contain a single element or contain more than one. So, the first thing is to check whether it was an empty statement. If it was and the start tag is still open, we close it with a "/>". That would be the case for a <br />. Remember that in the 'pre' method we left the in node in the tree; now that we know it contains no further statements we need nothing else from it, so we return after popping from the stack the state of the element we just closed.

If the size of the statement list is not zero, we push the statements right behind the echoes that we made in the 'pre' method. Those statements are in in->statements, a member of the AST_xml_element object created automatically from the phc.tea source. By now these statements, if they had any extended instructions, have already been converted.

If the start tag is still open, it needs to be closed before the end tag (for reasons a little too long to explain here, this never happens). On the other hand, if there is no start tag in the stack, it's an error. The stack can also indicate a state called ANY_TAG. This is because a PHP function might have to close a tag opened elsewhere before it was called. Every time the pre-compiler finds a function definition, it sets the stack to ANY_TAG, because we can't know what the stack might contain when that function is called, and whatever it contains when the function is defined does not matter at all. In that case the pre-compiler issues a warning which, at this moment, differs from the error only in its text.

Finally, since we are done with this tag, we close the element by calling print_element_name surrounded by statements to echo "</" and ">", and then we pop the state from the stack.

Notice that we have pushed several echoes and the statements under the tag statement, but not the node representing the tag statement itself. That node had been left behind in the 'pre' method, so it is effectively deleted from the tree.
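
To see the 'pre' and 'post' steps working together, here is a self-contained miniature that produces the flat list of PHP statements for a small tree. It is a sketch under several assumptions: Node, Output, echo and convert are hypothetical names for this example, echo returns PHP text instead of an AST node, attributes and processing instructions are not modelled, and the rule that a plain statement closes a start tag that is still open is my guess at why the open state never survives until the 'post' step.

```cpp
#include <cassert>
#include <stack>
#include <string>
#include <vector>

enum State { ANY_TAG, START_TAG_OPEN, START_TAG_CLOSED };

// Hypothetical miniature node: a plain PHP statement when element_name is
// empty, otherwise an XML element with a body of nested statements.
struct Node {
    std::string text;
    std::string element_name;
    std::vector<Node> statements;
};

using Output = std::vector<std::string>;

std::string echo(const std::string& s) { return "echo \"" + s + "\";"; }

void close_start_tag(std::stack<State>& s, Output& out) {
    out.push_back(echo(">"));
    s.top() = START_TAG_CLOSED;
}

void convert(const Node& n, std::stack<State>& s, Output& out) {
    assert(!s.empty() && "the stack starts with a base state such as ANY_TAG");
    if (n.element_name.empty()) {
        // Assumption: ordinary content ends a start tag that is still open.
        if (s.top() == START_TAG_OPEN) close_start_tag(s, out);
        out.push_back(n.text);
        return;
    }
    // 'pre' step: close the parent's start tag if needed, then open ours.
    if (s.top() == START_TAG_OPEN) close_start_tag(s, out);
    s.push(START_TAG_OPEN);
    out.push_back(echo("<"));
    out.push_back(echo(n.element_name));
    // The body is converted between the two steps.
    for (const Node& child : n.statements) convert(child, s, out);
    // 'post' step: an empty element is closed in place, like <br />.
    if (n.statements.empty() && s.top() == START_TAG_OPEN) {
        out.push_back(echo("/>"));
        s.pop();
        return;
    }
    if (s.top() == START_TAG_OPEN) close_start_tag(s, out);
    out.push_back(echo("</"));
    out.push_back(echo(n.element_name));
    out.push_back(echo(">"));
    s.pop();
}
```

Starting from a stack holding only ANY_TAG, an empty br element produces three echoes and leaves the stack as it found it, while a p element with content produces the start tag, the converted body and the end tag, with no trace of the original element node in the output.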

The private method print_element_name is used by both the element and the attribute methods, since they follow the same naming conventions. If the xml_namespace member is NULL, there is no namespace. If there is one, the method checks the flag ns_is_var: if it is true, it assembles the tree nodes to echo a variable with the given name; if not, it simply echoes the text of the name. In either case it then echoes a colon. For the tag name, it does basically the same.

void Convert_xml_elements::print_element_name(AST_xml_element_name* name,AST_statement_list* out) {
	if (name->xml_namespace) {
		if (name->ns_is_var) {
			out->push_back(echo(new AST_variable(new Token_variable_name(name->xml_namespace->value))));
		} else {
			out->push_back(echo(name->xml_namespace->value));
		}
		out->push_back(echo(":"));
	}
	if (name->is_var) {
		out->push_back(echo(new AST_variable(new Token_variable_name(name->tag_name->value))));
	} else {
		out->push_back(echo(name->tag_name->value));
	}
}
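
To make the four combinations concrete, here is a string-based sketch of the same logic. ElementName, echo_text, echo_var and element_name_echoes are hypothetical names for this example; an empty string stands in for the NULL namespace, and I am assuming the stored variable name does not include the leading $, so the sketch adds it when printing.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical flattened version of AST_xml_element_name.
struct ElementName {
    std::string xml_namespace;  // empty means "no namespace" (NULL in the original)
    bool ns_is_var;
    std::string tag_name;
    bool is_var;
};

std::string echo_text(const std::string& text) { return "echo \"" + text + "\";"; }
std::string echo_var(const std::string& name) { return "echo $" + name + ";"; }

// Returns the PHP statements that print the (optionally namespaced) name.
std::vector<std::string> element_name_echoes(const ElementName& name) {
    assert(!name.tag_name.empty() && "an element always has a tag name");
    std::vector<std::string> out;
    if (!name.xml_namespace.empty()) {
        out.push_back(name.ns_is_var ? echo_var(name.xml_namespace)
                                     : echo_text(name.xml_namespace));
        out.push_back(echo_text(":"));
    }
    out.push_back(name.is_var ? echo_var(name.tag_name) : echo_text(name.tag_name));
    return out;
}
```

A literal name with no namespace gives a single echo, a namespaced name gives three (namespace, colon, tag name), and either part can independently come from a variable.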