This proposal provides higher order functions in XQuery. It is based on the presentation by Michael Kay, with detailed fleshed out and modifications according to experience implementing a prototype of this functionality.
1 Function Items
1.1 Literal Function Items
1.2 Inline Functions
1.3 Dynamic Function Invocation
2 Type Checking
2.1 SequenceType Matching
2.2 Function Item Coercion
2.3 Static Typing
3 Additional Functions
3.1 fn:function-name
3.2 fn:function-arity
3.3 fn:partial-apply
4 Additional Modifications to XQuery 1.0
4.1 Modifications to Effective Boolean Value
4.2 Modifications to Atomization
4.3 Content of an Element Constructor
4.4 Modifications to fn:string()
4.5 Modifications to fn:deep-equal()
5 Use Cases
5.1 Sorting
5.2 Highest
5.3 Subsequence Before First
5.4 Analyze String
5.5 Checking for Cycles
5.6 Sum
5.7 Grouping
5.8 Sorted Map
A Full Grammar for Higher Order Functions
B Glossary (Non-Normative)
C Revision Log
The data model is extended with a function type as a direct subtype of item(). Subtypes of this function type
are partitioned into discrete types, each representing
function items that accept a particular number of arguments. Function
types which have the same arity can be subtypes of each other based on their
argument and return types. For example, the type function(item()) as xs:integer is subsumed by the type
function(item()) as item().
[Definition: A function item
is a value in the XQuery data model that represents a function. Function items are a subtype of item().]
[Definition: Function items can be
invoked, which is the act of calling the function that the function item represents.]
item()
|
+-------------+-------------+
| | |
node() xs:anyAtomicType function()
|
+----------------------+-----------------------------+-----------------
| | |
function() as item()* function(item()*) as item()* function(item()*, item()*) as item()* ...
|
+----------------+------------+----------------------
| |
function(item()*) as node()* function(item()*) as xs:anyAtomicType* ...
| |
function(item()*) as node()+ ...
|
...
Function items are returned using a literal function item, or an inline function. Function items are invoked using a dynamic function invocation.
PrimaryExpr ::=
...
| FunctionItemExpr
| DynamicFunctionInvocation
FunctionItemExpr ::= LiteralFunctionItem | InlineFunction
A function item consists of the following information:
Note:
For the moment we will assume that a function is not allowed to contain a reference to a mutable variable, although there can be good use cases for being able to do so.
LiteralFunctionItem ::= QName "#" IntegerLiteral
[Definition: A literal function item creates a function item that represents a named function.] [Definition: A named function is a function defined in the static context for the query. To uniquely identify a particular named function, both its name as a QName and its arity are required.]
If the QName in the literal function item has no namespace prefix, it is considered to be in the default function namespace.
If the expanded QName and arity in a literal function item do not match the name and arity of a function signature in the static context, a static error is raised [err:XPST0017].
The result of a literal function item is a single function item with the following properties:
Certain functions in the [Functions and Operators] specification have the apparent privilege of being polymorphic. These are denoted as accepting parameters of "numeric" type, or returning "numeric" type. Here "numeric" is a pseudonym for the four primitive numeric types xs:decimal, xs:integer, xs:float, and xs:double. The functions in question are:
fn:abs()
fn:ceiling()
fn:floor()
fn:round()
fn:round-half-to-even()
Note:
An implementation that supports static typing can choose to model the types of these functions more accurately if desired.
InlineFunction ::= "function" "(" ParamList? ")" ("as" SequenceType)? EnclosedExpr
[Definition: An inline function creates a function item that represents an anonymous function defined directly in the inline function itself.] An inline function specifies the names and SequenceTypes of the parameters to the function, the SequenceType of the result, and the body of the function.
Note:
The use of the "function(" notation for an inline function means that an unprefixed function call to a function with a local name of "function" will no longer be legal. This can be worked around by always calling such a function using a prefix.If a function parameter is declared using a name but no type, its default type is item()*. If the result type is omitted from a function declaration, its default result type is item()*.
The parameters of a function declaration are considered to be variables whose scope is the function body. It is a static error [err:XQST0039] for a function declaration to have more than one parameter with the same name.
The static context for the function body is inherited from the location of the inline function expression, with the exception of the static type of the focus (context item, context position, and context size) which is undefined.
The variables in scope for the function body include all variables representing the function parameters, as well as all variables that are in scope for the inline function expression.
Note:
Function parameter names can mask variables that would otherwise be in scope for the function body.The result of an inline function is a single function item with the following properties:
DynamicFunctionInvocation ::= FunctionItem "(" (ExprSingle ("," ExprSingle)*)? ")"
FunctionItem ::= FilterExpr
[Definition: A dynamic function invocation invokes a function item, calling the function it represents.] A dynamic function invocation consists of an expression that returns the function item and a parenthesized list of zero or more arguments.
If the function item expression does not return a sequence consisting of a single function item with the same arity as the number of specified arguments, a type error is raised [err:TBD].
A dynamic function invocation is evaluated as follows:
Note:
These rules are derived from the rules for function calls defined in Section 3.1.5 Function CallsXQ except for the addition of a rule to deal with the use of the variable values from the closure.There could be many ways to specify type checking against function items, and many of them have been discussed and explored during the process of writing this proposal. In the end I have decided to stick with the principal that SequenceType matching should only be defined against sequences of items, rather than against other SequenceTypes. SequenceTypes are only ever approximations of the real runtime type, so using a SequenceType subtype relationship would necessarily introduce inaccuracy or pessimism into the otherwise accurate and optimistic type system.
To allow queries to check for function item types, SequenceType notation is extended with a function type.
ItemType ::= KindTest | "item" "(" ")" | AtomicType | FunctionType | ParenthesizedItemType
FunctionType ::= "function" "(" ")" | "function" "(" (SequenceType ("," SequenceType)*)? ")" "as" SequenceType
ParenthesizedItemType ::= "(" ItemType ")"
The function type can specify the parameter types and return type of the function item.
SequenceType matching as defined in Section 2.5.4 SequenceType MatchingXQ is extended as follows:
function() matches any single function item.
In addition, wherever SequenceType matching occurs against a function ItemType an auxilliary transformation called function item coercion is subsequently applied. [Definition: Function item coercion wraps a function item in a new inline function with signature the same as the SequenceType being matched. This effectively delays the checking of the argument and return types until the function item is invoked.]
Function item coercion is always applied after SequenceType matching, and only applied if the SequenceType is a function ItemType other than function().
For each function item, $function, in the input sequence, function item coercion returns a new function item with the following properties:
$function.
$function.
$function with the arguments that were specifed at the new
function's invocation.
If the result of invoking the new function item would necessarily result in a type error, that error can be raised during function coercion. It is implementation dependent whether this happens or not.
These rules have the following consequences:
declare function local:filter($s as item()*, $p as function(xs:string) as xs:boolean) as item()*
{
$s[$p(.)]
};
let $f := function($a) { starts-with($a, "E") }
return
local:filter(("Ethel", "Enid", "Gertrude"), $f)
The function item $f has a static type of function(item()*) as item()*. When the local:filter() function
is called, the following occurs to the function item:
function(xs:string) as xs:boolean. Since $f is a single function item with an arity of 1,
SequenceType matching succeeds.
function(xs:string) as xs:boolean.
xs:string.
xs:string, which returns a xs:boolean.
xs:boolean.
xs:boolean is returned as the result of $p.
Note:
Although the semantics of function item coercion are specified in terms of wrapping the function items, static typing should be able to reduce the number of places where this is actually necessary.
A supplied function item with a signature function(A1, A2) as R obviously matches an expected
function type function(a1, a2) as r if A1 :> a1, A2 :> a2, and R <: r. When these conditions
are met it is unnecessary to perform function item coercion.
When these conditions are not satisfied, static typing has three options from which I have no special preference:
The SequenceType matching and function item coercion rules are defined in such a way that static typing implementations can continue to apply function conversion rules at compile time based on the static type of the function item.
fn:function-name($function as function()) as xs:QName?
Summary: Returns the name of the function item argument.
fn:function-arity($function as function()) as xs:integer
Summary: Returns the arity of the function item argument.
fn:partial-apply($function as function(), $arg as item()*) as function()
fn:partial-apply($function as function(), $arg as item()*, $argNum as xs:integer) as function()
Summary: Returns a function item derived from $function by binding the argument specified by $argNum to $arg.
If only two arguments are provided, $argNum defaults to 1. Parameters are numbered from 1 to the arity of the function item.
If $argNum is less than 1 or $function has an arity less than $argNum a dynamic error is raised [err:TBD].
The result of the function is a single function item with the following properties:
$function.
$function minus one.
$p, with a signature the same as $function, but removing the parameter specified by
$argNum. The result of calling the new function is the result of invoking $function
with the arguments from $p's invocation, inserting $arg as argument $argNum.
A few concepts and functions in XQuery 1.0 needs to be augmented to handle the addition of function items to the XQuery data model.
Although already clear in the XQuery 1.0 definition, calling fn:boolean() with an argument sequence that starts with a function item results in a type error [err:FORG0006].
It is not possible to meaningfully atomize a function item; therefore if the argument sequence to fn:data() contains a function item a type error is raised [err:TBD].
It is a type error for a content expression of an element constructor to contain a function item [err:TBD].
The fn:string() function should raise a type error if its argument contains a function item [err:TBD].
The fn:deep-equal() function raises a type error [err:TBD] if either of its argument sequences contain a function item.
Note:
Although most cases are covered by changing the rules for atomization and effective boolean value, we'll need to do a thorough scan of the specifications to check for anything else that would become under-specified with the introduction of function items.This simple use case implements a function that sorts a sequence using a user provided function to determine the sort key. This shows the basic value proposition for higher order functions - you can use a higher order function to parameterize any place that expects an expression.
declare function local:sort(
$seq as item()*,
$key as function(item()) as xs:anyAtomicType
) as item()*
{
for $a in $seq
order by $key($a)
return $a
};
local:sort(tokenize("The quick brown fox jumps over the lazy dog", " "),
function($a) {lower-case($a)})
Find the employees earning the highest salary.
Solution without higher order functions:
let $max := max(emp/salary) return emp[salary = $max]
Difficulty:
Higher-order solution:
In this case we've coded the higher-order function to use the same two-pass algorithm as the original.
declare function local:highest(
$input as item()*,
$f as function() as xs:anyAtomicType
) as item()*
{
let $max := max($input/$f())
return $input[$f() = $max]
};
let $emp :=(
<emp id="1"><salary>5</salary></emp>,
<emp id="2"><salary>15</salary></emp>,
<emp id="3"><salary>10</salary></emp>,
<emp id="4"><salary>15</salary></emp>
)
return
local:highest($emp/salary, number#0)/..
However, it is easy to re-implement the same generic function to use a more efficient recursive implementation if the need arises.
declare function local:highest_impl(
$input as item()*,
$f as function() as xs:anyAtomicType,
$max as item()*,
$f_max as xs:anyAtomicType*
) as item()*
{
if(empty($input)) then $max
else
let $head := $input[1]
let $tail := subsequence($input, 2)
let $f_head := $head/$f()
return
if($f_head > $f_max) then
local:highest_impl($tail, $f, $head, $f_head)
else if($f_head = $f_max) then
local:highest_impl($tail, $f, ($max, $head), ($f_max, $f_head))
else
local:highest_impl($tail, $f, $max, $f_max)
};
declare function local:highest(
$input as item()*,
$f as function() as xs:anyAtomicType
) as item()*
{
let $head := $input[1]
let $tail := subsequence($input, 2)
let $f_head := $head/$f()
return
local:highest_impl($tail, $f, $head, $f_head)
};
let $emp :=(
<emp id="1"><salary>5</salary></emp>,
<emp id="2"><salary>15</salary></emp>,
<emp id="3"><salary>10</salary></emp>,
<emp id="4"><salary>15</salary></emp>
)
return
local:highest($emp/salary, number#0)/..
Function items that act on the context item, position or size can easily be used to define predicates. For example this function will find the subsequence of a sequence that occurs before a caller specified predicate holds:
declare function local:before-first(
$input as item()*,
$pred as function() as item()*
) as item()*
{
subsequence($input, 1,
(for $i at $p in $input
where $i[$pred()]
return $p)[1] - 1)
};
local:before-first((<h1/>, <h1/>, <h2/>, <h3/>), function() { name() = 'h2' })
Suppose we want to take an input string "See [1] and [2]" and turn it into the tree equivalent of "See <c>1</c> and <c>2</c>". This can be achieved in XSLT using xsl:analyze-string (which is effectively a higher-order expression with custom syntax). It's not at all easy to achieve in XQuery. Without adding new syntax, we could provide a higher-order function analyze-string:
declare function analyze-string($in as xs:string, $regex as xs:string,
$action as function(xs:string, xs:boolean) as item()*) external;
The supplied function is called once to process each substring of the input: either a substring that matches the regex, or a substring that does not match, with the boolean argument distinguishing the two cases.
The above problem can then be solved with the call
analyze-string($input, "\[.*?\]",
function($substring as xs:string, $matches as xs:boolean) as item()* {
if ($matches) then <c>{$substring}</c>
else text{$substring}
}
)
Adding analyze-string to the code library would give XQuery a regex capability equivalent to XSLT's xsl:analyze-string instruction without the expense of defining new custom syntax. (As proposed here it doesn't provide the equivalent of regex-group() to analyze matched groups, but that could be added using another function callback).
Suppose you have an parts explosion in which parts have subparts; a part can have many subparts, and a subpart can be used in many parts, but a part cannot use itself. How do we check that there are no cycles in the data?
Let's suppose this is done using ID and IDREFS:
<part id="a123"> <subpart ref="x123" quantity="5"/> <subpart ref="p567" quantity="2"/> </part>
We can do this currently with a pair of functions like this:
declare function subparts($part as element(part)) as element(part)* {
for $s in $part/subpart/@ref return id($s)
}
declare function transitiveReferences($part as element(part),
$subpart as element(part),
$route as element(part)*) as
xs:boolean {
let $direct := subparts($part)
return
exists($direct intersect $subpart) or
(some $C in ($direct except $route)
satisfies transitiveReferences($C, $subpart, ($route, $C)))
}
//part[transitiveReferences(., ., ())]
The problem here is that the difficult part of the code (looking for cycles) is a general-purpose algorithm, but it is not reusable. We can make it reusable (so it can look for cycles in other kinds of data) by parameterizing it. The parameter is the function that follows a relationship. We can rewrite it as:
declare function transitiveReferences($from as node(),
$to as node(),
$route as node()*
$arc as function(node()) as node()*) as
xs:boolean {
let $direct := $arc($from)
return
exists($direct intersect $to) or
(some $C in ($direct except $route)
satisfies transitiveReferences($C, $to, ($route, $C), $arc))
}
declare function local:find-template($t as node()) as node()
{
for $c in $t//xsl:call-template/@name
return $template//xsl:template[@name = $c]
};
transitiveReferences($template, $template, (), local:find-template#1)
Continuing the theme of reusing generic algorithms, this use case shows a method of
implementing a sum function using the well-known tail-recursive generic function foldl().
It also show a good use case for the fn:partial-apply() function, which is used to specialize
generic algorithms to perform specific tasks.
declare function local:foldl($f, $z, $l)
{
if(empty($l)) then $z
else local:foldl($f, $f($z, $l[1]), subsequence($l, 2))
};
let $sum :=
fn:partial-apply(
fn:partial-apply(local:foldl#3, function($a, $b) { $a + $b }),
0)
return $sum(1 to 100)
Inline functions can also be useful for returning "sequences of sequences" by wrapping the inner sequences in a function item. With the flexibility of closures this can be extremely powerful - for example this function performs simple grouping, returning a sequence of function items to represent the groups:
declare function local:group(
$seq as item()*,
$key as function(item()) as xs:anyAtomicType
) as function() as item()**
{
for $a in distinct-values(for $a in $seq return $key($a))
return function() { $seq[$key(.) = $a] }
};
for $a in local:group((1, 6, 1, 2, 3, 4, 3, 2, 3), function($a) { $a mod 2 })
return <group>{ $a() }</group>
The following use case is an implementation of a sorted map structure. Both map insertion and map searching use an O(log n) binary search algorithm (assuming an XQuery implementation that uses constant time random access XDM sequences).
It is interesting to note how these functions use 0 arity function items to represent the sorted map as a "sequence of sequences". It is also interesting that through the use of function item arguments, the general purpose map processing algorithm map:process() can be used to implement map:contains(), map:get() and map:put() - in this way the binary search algorithm only has to be implemented once.
declare namespace map = "http://snelson.org.uk/functions/map";
declare function map:pair($key as item(), $value as item()*)
as function() as item()+
{
function() { $key, $value }
};
declare function map:key($pair as function() as item()+) as item()
{
$pair()[1]
};
declare function map:value($pair as function() as item()+) as item()*
{
subsequence($pair(), 2)
};
declare function map:process(
$map as (function() as item()+)*,
$key as item(),
$found as function(function() as item()+) as item()*,
$notfound as item()*,
$unused as function((function() as item()+)*) as item()*
) as item()*
{
if(empty($map)) then $notfound
else
let $length := count($map)
let $middle := $length idiv 2 + 1
let $pair := $map[$middle]
let $pair_key := $pair()[1]
return
if($pair_key eq $key) then (
$unused(subsequence($map, 1, $middle - 1)),
$found($pair),
$unused(subsequence($map, $middle + 1))
)
else if($pair_key gt $key) then (
map:process(subsequence($map, 1, $middle - 1), $key,
$found, $notfound, $unused),
$unused(subsequence($map, $middle))
)
else (
$unused(subsequence($map, 1, $middle)),
map:process(subsequence($map, $middle + 1), $key,
$found, $notfound, $unused)
)
};
declare function map:contains($map as (function() as item()+)*, $key as item())
as xs:boolean
{
map:process($map, $key, function($a) { true() }, false(),
function($a) { () })
};
declare function map:get($map as (function() as item()+)*, $key as item())
as item()*
{
map:process($map, $key, map:value#1, (), function($a) { () })
};
declare function map:put(
$map as (function() as item()+)*,
$key as item(),
$value as item()*
) as (function() as item()+)+
{
let $pair := map:pair($key, $value)
return
map:process($map, $key, function($a) { $pair }, $pair,
function($a) { $a })
};
let $map := map:put(map:put(map:put(map:put(map:put(map:put((),
"a", "aardvark"),
"z", "zebra"),
"e", ("elephant", "eagle")),
"o", "osterich"),
"t", "terrapin"),
"a", "antelope")
return (
map:get($map, "o"),
for $m in $map
return concat("key: ", map:key($m), ", value: (",
string-join(map:value($m), ", "), ")")
)
PrimaryExpr ::=
...
| FunctionItemExpr
| DynamicFunctionInvocation
FunctionItemExpr ::= LiteralFunctionItem | InlineFunction
LiteralFunctionItem ::= QName "#" IntegerLiteral
InlineFunction ::= "function" "(" ParamList? ")" ("as" SequenceType)? EnclosedExpr
DynamicFunctionInvocation ::= FunctionItem "(" (ExprSingle ("," ExprSingle)*)? ")"
FunctionItem ::= FilterExpr
ItemType ::= KindTest | "item" "(" ")" | AtomicType | FunctionType | ParenthesizedItemType
FunctionType ::= "function" "(" ")" | "function" "(" (SequenceType ("," SequenceType)*)? ")" "as" SequenceType
ParenthesizedItemType ::= "(" ItemType ")"
The combination of the variable values needed to execute a function and a method of invoking that function is often called a closure.
A dynamic function invocation invokes a function item, calling the function it represents.
A function item is a value in the XQuery data model that represents a function. Function items are a subtype of item().
Function item coercion wraps a function item in a new inline function with signature the same as the SequenceType being matched. This effectively delays the checking of the argument and return types until the function item is invoked.
An inline function creates a function item that represents an anonymous function defined directly in the inline function itself.
Function items can be invoked, which is the act of calling the function that the function item represents.
A literal function item creates a function item that represents a named function.
A named function is a function defined in the static context for the query. To uniquely identify a particular named function, both its name as a QName and its arity are required.