The Template-Based extractor also supports advanced templates through a feature called template injection, which lets you reuse a defined template element multiple times within the same template. A named template element can be injected anywhere in the template by its name, which also reduces the overall size of the template.
The prerequisites for injecting a template element into another template element are:
- Use the name property to assign a name to the template element that is intended to be injected in another template.
- The template element that needs to be injected must be declared before the template element where it is intended to be injected.
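The two prerequisites above can be checked mechanically before a template is handed to the extractor. The following is a minimal sketch in Python, not part of the Template-Based extractor itself: the function name check_injection_order is hypothetical, and the sketch assumes that any element carrying a name property counts as declared from the point where it appears in the document.

```python
import xml.etree.ElementTree as ET


def check_injection_order(template_xml: str) -> list[str]:
    """Return a list of problems; an empty list means both prerequisites hold.

    Hypothetical helper for illustration only, not an extractor API.
    """
    root = ET.fromstring(template_xml)
    declared = set()  # names seen so far, in document order
    problems = []
    # iter() visits elements in document order, so "declared before"
    # becomes "name already seen".
    for element in root.iter("element"):
        ref = element.get("template")
        if ref is not None and ref not in declared:
            problems.append(f"'{ref}' is injected before it is declared")
        name = element.get("name")
        if name is not None:
            declared.add(name)
    return problems
```

An empty result means every value used in a template property refers to a name that was assigned earlier in the template.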
The following properties are relevant when using template injection:
- template: Injects a predefined template element into another template element. The value of the template property must match the name of the template element to be injected, and that element must be declared before any element that references it through the template property.
- rootElement: The name of the structure (that is, the specific template element) for which all possible instances are to be extracted. This property can only be defined in the first XML element of the template specification. If it is not defined, that first element is itself assumed to be the root template element, and the Template-Based Extractor uses it to extract as many instances as possible from the input source.
Let's take an example to understand the template injection feature and its properties.
The text input source is as follows:
Hi John,
Following are the personal details of the new joinees.
name = "Owen Hans",
age = 24,
residency address {country: "Portugal", zip code: 47007, street: "Rua da Veiga", door number: 64} ,
work address {country: "France", zip code: 61200, street: "Rue aux chats", door number: 38}, owen64@hotmail.com ,
phone numbers: 9169083840 9233438121 9338731232
The following people have also been recruited and they will be joining us shortly..
name = "Robert King",
residency address {country: "Nueva York", zip code: 10128, street: "Walnutwood Street", door number: 180} ,
age = 25,phone numbers: 9064083841 9211437124,
work address {country: "Nueva York", zip code: 10009, street: "Berkshire St.", door number: 45}
Regards,
Lucy
Suppose we create a template element ‘AddressStruct’ that represents an address in this text input, and we want to use it more than once inside another template element, ‘EmployeeStruct’, to define and extract an employee's work and residency addresses. Instead of defining ‘AddressStruct’ twice in the same template, we can define it once and refer to it multiple times through template injection.
The following advanced template definition shows template injection in use: the ‘AddressStruct’ template element is injected into the ‘EmployeeStruct’ template element.
<template rootElement="EmployeeStruct">
  <element name="AddressStruct" startingRegex="{" endingRegex="}" childrenSeparatorRegex=",">
    <element name="country" startingRegex="country\s*:" extractFormat="quotedString"/>
    <element name="zipcode" startingRegex="zip code\s*:" extractFormat="regex:[0-9]{5}-?([0-9]{4})?"/>
    <element name="street" startingRegex="street\s*:" extractFormat="quotedString"/>
    <element name="door number" startingRegex="door number\s*:" extractFormat="int"/>
  </element>
  <element name="EmployeeStruct" childrenSeparatorRegex=",">
    <element name="name" startingRegex="name\s*=" extractFormat="quotedString"/>
    <element name="age" startingRegex="age\s*=" extractFormat="int"/>
    <element name="residency address" startingRegex="residency address">
      <element template="AddressStruct"/>
    </element>
    <element name="work address" startingRegex="work address">
      <element template="AddressStruct"/>
    </element>
    <element name="phone numbers" startingRegex="phone numbers\s*:">
      <element name="phone number" extractFormat="regex:[0-9]{10}" occurs="1-*"/>
    </element>
    <element name="email" extractFormat="email" occurs="0-1"/>
  </element>
</template>
From the above template definition, we can observe the following points:
- “AddressStruct” is declared before the “EmployeeStruct” template element so that it can be injected into it.
- “AddressStruct” is injected into the “EmployeeStruct” template element through the template property.
- The rootElement property is set to “EmployeeStruct”, so the Template-Based extractor extracts instances of the structure that defines an employee.
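One way to picture what injection saves is to expand the template by hand: each element that carries a template property is replaced by a copy of the named element. The Python sketch below does this with the standard xml.etree.ElementTree module. It is an illustration only; the function name expand_injections is hypothetical, and how the Template-Based extractor resolves injections internally is not described here, so the expanded form is assumed to be equivalent only conceptually.

```python
import copy
import xml.etree.ElementTree as ET


def expand_injections(template_xml: str) -> str:
    """Return the template with every template="..." reference inlined.

    Hypothetical helper for illustration only, not an extractor API.
    """
    root = ET.fromstring(template_xml)
    declared = {}  # name -> element, registered once its declaration is complete

    def visit(parent):
        for index, child in enumerate(list(parent)):
            ref = child.get("template")
            if ref is not None:
                if ref not in declared:
                    raise ValueError(f"'{ref}' must be declared before it is injected")
                clone = copy.deepcopy(declared[ref])
                clone.tail = child.tail  # keep the surrounding whitespace
                parent[index] = clone
                continue
            visit(child)
            name = child.get("name")
            if name is not None:
                declared[name] = child

    visit(root)
    return ET.tostring(root, encoding="unicode")
```

Applied to the template above, the result would contain the four address child elements twice, once under “residency address” and once under “work address”, which is the duplication that injection avoids. The standalone “AddressStruct” declaration stays in place in this sketch.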
Note: Template injection requires changes only in the template definition. The approach for setting up and using the TemplateBasedExtractor class remains the same.
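The regular expressions used in the extractFormat properties of the template can be tried against values from the sample input before running the extractor. The sketch below uses Python's re module and assumes that the extractor's regex dialect agrees with Python's for these simple patterns, which is not confirmed here.

```python
import re

# Patterns copied from the extractFormat="regex:..." values in the template.
ZIP_CODE = re.compile(r"[0-9]{5}-?([0-9]{4})?")
PHONE_NUMBER = re.compile(r"[0-9]{10}")

# Values copied from the sample input.
zip_codes = ["47007", "61200", "10128", "10009"]
phone_line = "phone numbers: 9169083840 9233438121 9338731232"

# Every sample zip code should match the pattern in full.
all_zip_codes_match = all(ZIP_CODE.fullmatch(z) for z in zip_codes)

# occurs="1-*" asks for one or more phone numbers;
# findall collects each 10-digit run on the line.
phone_numbers = PHONE_NUMBER.findall(phone_line)

print(all_zip_codes_match, phone_numbers)
```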
The following image shows the parsed result in JSON string format after applying the template: