Primitive operations revisited
delete nodes
Syntax:
delete node location
delete nodes location
The expression location
represents a sequence of nodes which are marked for deletion (the actual number of nodes does not need to match the keyword node or nodes).
insert nodes
Syntax:
insert (node|nodes) items into location
insert (node|nodes) items as first into location
insert (node|nodes) items as last into location
insert (node|nodes) items before location
insert (node|nodes) items after location
he expression location
must point to a single target node.
The expression items
must yield a sequence of items to insert relatively to the target node.
Notice that even though the keyword node or nodes is used, the inserted items can be non-node items. What happens actually is that the string values of non-node items are concatenated to form text nodes.
If either form of into is used, then the target node must be an element or a document. The items to insert are treated exactly as the contents of an element constructor.
For example if
$target
points to an empty element<CONT/>
,insert nodes (attribute A { 2.1 }, <child1/>, "text", 1 to 3) into $target
yields:
<CONT A="2.1"><child1/>text 1 2 3</CONT>
Therefore the same rules as in constructors apply: item order is preserved, a space is inserted between consecutive non-node items, inserted nodes are copied first, attribute nodes are not allowed after other item types, etc.
When the keywords as first (resp. as last) are used, the items are inserted before (resp. after) any existing children of the element.
For example if
$target
points to an element<parent><kid></parent>
insert node <elder/> as first into $target
yields:
<parent><elder/><kid></parent>
When the only keyword into is used, the resulting position is implementation dependent. It is only guaranteed that as first into and as last into have priority over into.
If before or after are used, any node type is allowed for the target node.
Attributes are a special case: regardless of the before or after keyword used, attributes are always inserted into the parent element of the target. The order of inserted attributes is unspecified. Name conflicts can generate errors.
replace node
Syntax:
replace node location with items
The expression location
must point to a single target node.
The expression items
must yield a sequence of items that will replace the target node.
Except for
document
andattribute
node types, the target node can be replaced by any sequence of items. The replacing items are treated exactly as the contents of an element/document constructor.For example if
$target
points to an element<P><kid/>some text</P>
,
replace node $target/kid with "here is"
yields:
<P>here is some text</P>
Attributes are a special case: they can only be replaced by an attribute node. Name conflicts can generate errors.
replace value of node
Syntax:
replace value of node location with items
Here the identity of the target node is preserved. Only its value or contents (for an element or a document) is replaced.
If the target is an element or a document node, then all its former children are removed and replaced. The replacing items are treated exactly as the contents of a text constructor (so all node items are replaced by their string-value).
For example if
$target
points to an element<P><kid/>some text</P>
,replace value of node $target with (<text>let's count: </text>, 1 to 3, "...")
yields:
<P>let's count: 1 2 3 ...</P>
So the element contents are replaced by a text node whose value is the concatenation of the string values of replacing items.
If the target node is a leaf node (attribute, text, comment, processing-instruction) then its string value is replaced by the concatenation of the string values of replacing items.
For example if $target
points to an element <P order="old">some text</P>
,
replace value of node $target/@order with (1 to 3, <ell>...</ell>)
yields:
<P order="1 2 3...">some text</P>
rename node
Syntax:
rename node location as name-expression
The expression location
must point to a single target element, attribute or processing-instruction.
The expression name-expression
must yield a single QName or string item.
For example if $target
points to an element <CONT A="a">some text</CONT>
rename node $target as qName("some.namespace", "CONTAINER"), rename node $target/A as "NEWA"
yields:
<ns1:CONTAINER NEWA="a" xmlns:ns1="some.namespace">some text</ns1:CONTAINER>
transform
Syntax:
copy $var := node [, $var2 := node2 ...] modify updating-expression return expression
Each node
expression is copied (at least virtually) and bound to a variable.
The updating-expression
contains or invokes one or several update primitives. These primitives are allowed to act only upon the copied XML trees, pointed by the bound variables. Therefore the transform expression has no side effect.
Before the return
expression is evaluated, all updates are applied to the copied trees. Typically the return
expression would be a bound variable, or a node constructor involving the bound variables, so it will yield the updated tree(s).
For example if $target
points to an element
copy $target := <CONT id="s1">some text</CONT> modify ( rename node $target as "SECTION", insert node <TITLE>The title</TITLE> as first into $target ) return element DOC { $target }
returns:
<DOC><SECTION id="s1"><TITLE>The title</TITLE>some text</SECTION></DOC>
The Problem of Invisible Updates
The fact that updates are applied only at the end of a script execution has two consequences on programming, one disturbing, one pleasant:
The disturbing consequence is that you don't see your updates until the end, therefore you cannot build on your changes to make other changes.
An example: suppose you have elements named PERSON. Inside a PERSON there can be a list of BID elements (representing bids made by this person), and you want the BID elements to be wrapped in a BIDS element. But initially the PERSON has no BIDS child.
Initially:
<PERSON id="p0234"> <NAME>Joe</NAME> </PERSON>
We want to insert <BID id="b0012">data</BID>
to obtain:
<PERSON id="p0234"> <NAME>Joe</NAME> <BIDS> <BID id="b0012">data</BID> <BIDS> </PERSON>
Classically, for example using the DOM, we would proceed in two steps:
If there is no BIDS element inside PERSON, then create one
then insert the BID element inside the BIDS element
In XQuery Update this would (incorrectly) be written like this:
declare updating function insert-bid($person, $bid) { if(empty($person/BIDS)) then insert node <BIDS/> into $person else (), insert node $bid as last into $person/BIDS }
Don't try that: it won't work! Why? Because the BIDS element will be created only at the very end, therefore the instruction insert ... as last into $person/BIDS
will not find any node matching $person/BIDS
, hence an execution error.
So what is a correct way of doing ? We need a self-sufficient solution for each of the two cases:
declare updating function insert-bid($person, $bid) { if(empty($person/BIDS)) then insert node <BIDS>{$bids}</BIDS> into $person else insert node $bid as last into $person/BIDS }
The pleasant consequence is that the document(s) on which you are working are stable during execution of your script. You can rest assured that you are not sawing the branch you are sitting on. For example you can quietly write:
for $x in collection(...)//X return delete node $x
This is perfectly predictable and won't stop prematurely. Or you can replicate an element after itself without risking looping forever:
for $x in collection(...)//X return insert node $x after $x
Mixing Updating and Non-updating Expressions
Updating Expressions are XQuery expressions that encompass the 5 updating primitives.
There are rules about mixing Updating and Non-updating Expressions:
First of all, let us remember that Updating Expressions do not return any value. They simply add an update request to a list. Eventually the updates in the list are applied at the end of a script execution (or at the end of the modify clause in the case of the transform expression).
Updating Expressions are therefore not allowed in places where a meaningful value is expected. For example the condition of a if, the right hand-side of a let :=, the in part of a for and so on.
Mixing Updating and Non-updating Expressions is not allowed in a sequence (the comma operator). Though technically feasible, it would not make much sense to mix expressions that return a value and expressions that don't (remember that the sequence operator returns the concatenation of the sequences returned by its components).
The
fn:error()
function and the empty sequence()
are special as they can appear both in Updating and in non-updating expressions.In the same way, the branches of a if or a typeswitch must be consistent: both Updating or both Non-updating. If both branches are Updating then the if itself is considered Updating, and conversely.
If the body of a function is an Updating Expression, then the function must be declared with the updating keyword. Example:
declare updating function insert-id($elem, $id-value) { insert node attribute id { $id-value } into $elem }
A call to such a function is itself considered an Updating Expression. Logically enough, an updating function returns no value and therefore is not allowed to declare a return type.
Order and Conflicts
Another consequence of the "Pending Updates" mechanism is that the order in which updates are specified is not important. In the following example you can without any issue delete the attribute Id
(pointed by $idattr
), and after use $idattr/..
(the parent ITEM element) for inserting! Or you could insert first and delete after.
for $idattr in doc("data.xml")//ITEM/@Id (: selection :) return ( (: updates :) delete node $idattr, insert node <NID>{string($idattr)}</NID> as first into $idattr/.. )
But because of that, some conflicting changes can produce unpredictable results. For example two rename of the same node are conflicting, because we do not know in which order they would be applied. Other ambiguous operations: two replace of the same node, two replace value (or contents) of the same node.
The XQUF specifications take care of forbidding such ambiguous updates. An error is generated (during the apply-updates stage) when such a conflict is detected.
A bit ironically, no error is generated for meaningless but non ambiguous conflicts, for example both renaming and deleting the same node (delete node has priority over other operations).