Description: The tag will be replaced with the content of the file that has been specified in the
identifier attribute.
NONE
Old format: #~NONE~#
New format:
Description: The tag is migrated for historical reasons only and will be replaced by an
empty string.
2.7.2 Database migration
Some of the tags described in the previous section are used as parameter identifiers for
database tables or fields. The old FEP templating engine referenced the Fulcrum DBS
database directly. The ICADC engine uses a relative mapping instead of direct table or field
names.
To understand the database mapping used to migrate to the new database, it is necessary to
identify the major databases or information structures of the system and the data transfer
steps between them:
Invitation to tender N° AO – 10017-annexes
Page 23 / 31
[Figure: CPS (Oracle) feeds both the Dissemination Files, which are loaded into DBS (Fulcrum), and ICA (Oracle)]
Figure 13 – Database relationships and data transfer
Where:
− CPS – the initial Oracle database, used for update operations.
− DBS – the search engine database built on Fulcrum. The old template engine uses
this database. Its information is accessible only through a limited SQL dialect; for
example, tables cannot be joined.
− Dissemination Files – files generated from CPS as input for DBS.
− ICA – the new database, updated from the same initial CPS database. This is the
database used by the ICADC application.
To update the database references from DBS to ICA, it is necessary to understand the
intermediate mapping steps, starting with the mapping references in the FEP templates (see
Figure 4):
− Templates to DBS. This produces the list of mappings for which an ICA
correspondence must be found
− DBS – Dissemination Files
− Dissemination Files – CPS
− CPS – ICA
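The chain of mappings listed above can be pictured as a composition of lookup tables. The following is only a minimal sketch in Python, chosen for brevity: every identifier in the tables (file names, CPS and ICA field names) is hypothetical and does not come from the real schemas.

```python
from typing import Optional

# Hypothetical lookup tables, one per mapping step. None of these
# identifiers come from the real DBS, CPS or ICA structures.
DBS_TO_DISSEMINATION = {"EN_TABLE1.FIELD1": "table1_en.txt:field1"}
DISSEMINATION_TO_CPS = {"table1_en.txt:field1": "CPS.TABLE1.FIELD1_EN"}
CPS_TO_ICA = {"CPS.TABLE1.FIELD1_EN": "TABLE1.FIELD1"}

# Order matters: each step consumes the output of the previous one.
STEPS = [DBS_TO_DISSEMINATION, DISSEMINATION_TO_CPS, CPS_TO_ICA]


def trace_reference(dbs_reference: str) -> Optional[str]:
    """Follow a DBS reference through every step; None if any step has no entry."""
    current = dbs_reference
    for step in STEPS:
        if current not in step:
            return None
        current = step[current]
    return current
```

A reference that cannot be traced through all steps returns None, which flags a template reference that still needs a manual ICA correspondence.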
The syntax of the database references from the old templates has a simple format:
TABLE_NAME.FIELD_NAME
This notation in the FEP templates refers to the old, simple Fulcrum flat tables. It cannot be
redirected to the new ICA database as it stands, because of the complexity of the
entity data model behind ICA.
To allow database access, a business layer will offer data retrieval with a syntax similar
to the old DBS references and will hide the mapping logic.
[Figure: New Templates Engine → Business Layer → Hibernate (Java API) → ICA (Oracle); Business Layer → Lucene (Java API) → Lucene Index; Apache Tomcat]
Figure 14 – New engine database access
As Figure 14 shows, the new ICADC templating engine uses frameworks to access
stored data. Unlike the old Fulcrum engine, which used the same source for both search and
full data retrieval, the new engine separates the two as follows:
− Lucene for fast search functionality (Lucene creates its own index structure,
which is persisted on the file system)
− Hibernate to access the ICA Oracle RDBMS structure.
The business layer hides the logic specific to Lucene and Hibernate and offers
data access that is functionally similar to the old Fulcrum mappings, in order to minimize the
template migration effort.
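The separation between search and retrieval can be illustrated with a small facade. This is a sketch under stated assumptions, not the ICADC design: the class and method names (BusinessLayer, SearchIndex, RecordStore, results_list, field) are invented, and Python stands in for the Java APIs of Lucene and Hibernate, which are not called here.

```python
from typing import Dict, List, Protocol


class SearchIndex(Protocol):
    """Role played by Lucene: return identifiers of matching records."""
    def search(self, category: str, query: str) -> List[str]: ...


class RecordStore(Protocol):
    """Role played by Hibernate over ICA: load one full record."""
    def load(self, category: str, record_id: str) -> Dict[str, str]: ...


class BusinessLayer:
    """Single entry point for templates; hides which backend answers."""

    def __init__(self, index: SearchIndex, store: RecordStore) -> None:
        self._index = index
        self._store = store

    def results_list(self, category: str, query: str) -> List[Dict[str, str]]:
        # Search the index first, then fetch full data for each hit.
        ids = self._index.search(category, query)
        return [self._store.load(category, rid) for rid in ids]

    def field(self, category: str, record_id: str, name: str) -> str:
        # Mirrors the old TABLE_NAME.FIELD_NAME style of access.
        return self._store.load(category, record_id).get(name, "")


# In-memory stand-ins so the sketch runs without Lucene or Oracle.
class FakeIndex:
    def search(self, category: str, query: str) -> List[str]:
        return ["1"] if query == "news" else []


class FakeStore:
    def load(self, category: str, record_id: str) -> Dict[str, str]:
        return {"ID": record_id, "TITLE": "Sample"}
```

Because templates only see the facade, either backend could be replaced without touching the templates.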
The tool used to update the templates will use an XML configuration file to translate
each old Fulcrum reference into a new, ICA-based one. The differences between the new
mapping and the old one relate to language interpretation and optimization. The language is
no longer part of the category name. The new engine determines automatically how to
retrieve data from the general category name and the language.
The root element of this file is categories. For every category there is a category element
with three attributes:
− dbs – the name of the old DBS category
− ica – the name of the ICA category
− language – the identifier of the language of the ICA category that corresponds to the
old DBS reference
Every category element contains one element named field for each mapped field, with the
attributes:
− dbs – the name of the old DBS field
− ica – the name of the ICA field
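To make the structure concrete, the sketch below loads a mapping file of this shape and translates an old TABLE_NAME.FIELD_NAME reference. It is an illustration only: the sample XML is hypothetical (the field name FIELD1 is invented), and the function names load_mapping and translate are assumptions, not part of the migration tool.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Hypothetical mapping file following the structure described above.
# FIELD1 is an invented field name used only for this example.
SAMPLE = """
<categories>
  <category dbs="EN_TABLE1" ica="TABLE1" language="EN">
    <field dbs="FIELD1" ica="FIELD1"/>
  </category>
  <category dbs="DE_TABLE1" ica="TABLE1" language="DE">
    <field dbs="FIELD1" ica="FIELD1"/>
  </category>
</categories>
"""

Mapping = Dict[str, Tuple[str, str, Dict[str, str]]]


def load_mapping(xml_text: str) -> Mapping:
    """Index the mapping by old DBS category name."""
    mapping: Mapping = {}
    for cat in ET.fromstring(xml_text).findall("category"):
        fields = {f.get("dbs"): f.get("ica") for f in cat.findall("field")}
        mapping[cat.get("dbs")] = (cat.get("ica"), cat.get("language"), fields)
    return mapping


def translate(reference: str, mapping: Mapping) -> Tuple[str, str]:
    """Turn an old TABLE_NAME.FIELD_NAME reference into (ICA reference, language)."""
    table, field = reference.split(".", 1)
    ica_category, language, fields = mapping[table]
    return f"{ica_category}.{fields[field]}", language
```

In this sketch two language-specific DBS tables resolve to the same ICA category, and the language travels separately from the category name.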
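[Figure: sample mapping file with a categories root, category elements (dbs, ica, language) and nested field elements (dbs, ica)]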
Figure 15 – DBS-ICA mapping configuration
As Figure 15 shows, an old EN_TABLE1 DBS table name will be migrated to a TABLE1 category,
and an old DE_TABLE1 DBS table will be migrated to the same TABLE1 category in the
ICADC application. To identify the interface language, the new system looks for it in
several contexts, in the following order of priority:
1. The request parameter UPL
2. The UPL parameter in the specific CALLER entry of the configuration file
3. The DBS-to-ICA mapping configuration file
4. The global section of the CALLERS configuration file
5. If the UPL is not defined in any of the previous contexts, the interface language is
considered to be English (EN)
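The priority rule above amounts to a first-match lookup. A minimal sketch, assuming each context is available as a simple key/value mapping; the function name resolve_language and its parameter names are hypothetical:

```python
from typing import Mapping, Optional

DEFAULT_LANGUAGE = "EN"


def resolve_language(
    request_params: Mapping[str, str],
    caller_config: Mapping[str, str],
    mapping_language: Optional[str],
    global_config: Mapping[str, str],
) -> str:
    """Return the first interface language found, in priority order."""
    candidates = (
        request_params.get("UPL"),  # 1. request parameter
        caller_config.get("UPL"),   # 2. specific CALLER entry
        mapping_language,           # 3. DBS-to-ICA mapping file
        global_config.get("UPL"),   # 4. global section of CALLERS
    )
    for value in candidates:
        if value:
            return value
    return DEFAULT_LANGUAGE         # 5. fallback: English
```

Empty strings are treated like missing values here, which is a choice made for the sketch rather than something the rule states.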
2.7.3 Migration of the CALLERS configuration file
The FEP templating engine used a configuration file to hold specific settings for each
possible caller, plus a global section with default values for the parameters that a specific
caller does not define.
This configuration file contains a very long list of callers. For the ICADC engine, only a subset
of the initial list needs to be migrated dynamically. Consequently, only the callers that
need to be migrated will appear in the new configuration file. At the same time,
the files related to each migrated caller entry will be extracted into a new
structure, in order to reduce the size and complexity of FEP’s
file structure. The files that will be migrated are:
− Starting point search forms (if available)
− Templates for results list
− Templates for document level details
− Files referred by FILELINK tag
Consider a sample entry in the old CALLERS configuration file:
[MSS_NEWS_FR_FR]
TABLENAME=FR_NEWS
TEMPLATEPREFIX=MSS/FR/FR/
#SEARCH_PAGE=xxxx
SEARCH_TYPE=advanced
ACTION=R
DOC=1
RECORDS_DISPLAYED=10
RL_TMPL_TERM=FR_NEWS
LANGUAGE=FRENCH
DOC_TMPL_TERM=FR_NEWS
QM_EN_PGA_A=MS-FR C
USR_SORT=EN_QVD_A CHAR DESC
………….
Figure 16 – Caller entry in the old configuration file
This entry will be migrated to an XML format. The main element is CALLER, whose name
attribute holds the name of the caller from the old configuration file. The child elements use
the names of the keys from the old configuration file.
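<CALLER name="MSS_NEWS_FR_FR">
<TABLENAME>NEWS</TABLENAME>
<TEMPLATEPREFIX>MSS/FR/FR/</TEMPLATEPREFIX>
<ACTION>R</ACTION>
<RECORDS_DISPLAYED>10</RECORDS_DISPLAYED>
<RL_TMPL_TERM>FR_NEWS</RL_TMPL_TERM>
<LANGUAGE>FRENCH</LANGUAGE>
<DOC_TMPL_TERM>FR_NEWS</DOC_TMPL_TERM>
<QM_EN_PGA_A>MS-FR C</QM_EN_PGA_A>
<USR_SORT>QVD_A CHAR DESC</USR_SORT>
……………………
</CALLER>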
Figure 17 – Caller entry in the new configuration file
During the migration process, only the parameters that are still necessary will be migrated.
For example, SEARCH_TYPE refers to an older search method that was later replaced by a
more advanced engine.
The reference to the table related to this caller will also be
migrated, using the mechanism described in the previous section. The same applies to
other values that refer to tables or fields (for example,
EN_QVD_A in the USR_SORT parameter will be migrated to QVD_A).
The commented-out field in the old file (#SEARCH_PAGE=xxxx) will not appear in the new
one. The global section of the configuration file will be migrated in the same
way as any other caller entry, but its main element will be named GLOBAL.
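The rules of this section can be sketched as a small converter. This is an illustration under assumptions, not the migration tool itself: the wrapping CALLERS root element, the sets OBSOLETE_KEYS and REFERENCE_KEYS, and the REFERENCE_MAP table are hypothetical, the sample is a shortened copy of the entry shown earlier, and the handling of the global section is left out.

```python
import configparser
import xml.etree.ElementTree as ET

OLD_CONFIG = """
[MSS_NEWS_FR_FR]
TABLENAME=FR_NEWS
#SEARCH_PAGE=xxxx
SEARCH_TYPE=advanced
RECORDS_DISPLAYED=10
USR_SORT=EN_QVD_A CHAR DESC
"""

# Assumptions made for the example only.
OBSOLETE_KEYS = {"SEARCH_TYPE"}                 # parameters no longer needed
REFERENCE_KEYS = {"TABLENAME", "USR_SORT"}      # values holding table/field refs
REFERENCE_MAP = {"FR_NEWS": "NEWS", "EN_QVD_A": "QVD_A"}


def migrate_value(key: str, value: str) -> str:
    """Rewrite table/field references token by token; leave other values alone."""
    if key not in REFERENCE_KEYS:
        return value
    return " ".join(REFERENCE_MAP.get(tok, tok) for tok in value.split(" "))


def migrate_callers(ini_text: str) -> ET.Element:
    """Convert every caller section into a CALLER element."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the original upper-case key names
    parser.read_string(ini_text)  # lines starting with '#' are skipped as comments
    root = ET.Element("CALLERS")  # hypothetical wrapper element
    for section in parser.sections():
        caller = ET.SubElement(root, "CALLER", name=section)
        for key, value in parser[section].items():
            if key in OBSOLETE_KEYS:
                continue
            ET.SubElement(caller, key).text = migrate_value(key, value)
    return root
```

Calling ET.tostring(migrate_callers(OLD_CONFIG), encoding="unicode") would serialize the result for inspection.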
2.7.4 Migration of HTML page code to XHTML
JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. It can be used
as a tool for cleaning up malformed and faulty HTML. The parser checks the validity of the
HTML code supplied as input and automatically tries to correct it. JTidy reads through the
input file and, if it finds any mismatched or missing end tags, corrects them and outputs a
well-formed XML document. JTidy does not generate a cleaned-up version when it encounters
problems that it cannot be sure how to handle. This tool may be used to automate the
migration, but it will require operator attention. During the migration the tool reports
errors or warnings, depending on the situation. An operator needs to analyze these events
to confirm that the migration was performed successfully.
A few examples of how JTidy works:
− Missing or mismatched end tags are detected and corrected
<h1>heading
<h2>subheading</h3>
It will be mapped to
<h1>heading</h1>
<h2>subheading</h2>
− End tags in the wrong order are corrected
<p>here is a<b><i>special</b></i> paragraph.
It will be mapped to
<p>here is a <b><i>special</i></b> paragraph.
− Recovers from mixed up tags
<i><h1>heading</h1></i>
<p>new paragraph <b>bold text
<p>some more bold text
It will be mapped to
<h1><i>heading</i></h1>
<p>new paragraph <b>bold text</b>
<p><b>some more bold text</b>
− Getting the <hr> in the right place:
<h1><hr>heading</h1>
<h2>sub<hr>heading</h2>
It will be mapped to
<hr>
<h1>heading</h1>
<h2>sub</h2>
<hr>
<h2>heading</h2>
− Adding the missing "/" in end tags for anchors:
<a href="#refs">References<a>
It will be mapped to
<a href="#refs">References</a>
− Missing quotes around attribute values are added
− Unknown/proprietary attributes are reported
− Tags lacking a terminating '>' are spotted
The major limitations of JTidy are:
− It has limited support for XML
− It cannot recognize CDATA sections
− It cannot recognize DTD subsets
These limitations will not affect the FEP templates, because the old HTML template
files are simple and contain no advanced tags. However useful it is, the tool may still cause
design issues because of the way historical HTML interpreters behaved, and it requires
operator attention. For that reason, extensive testing is envisaged.
2.8 Migration and testing
As described before, the migration from the FEP cannot be done directly to the final standard
of templates. If all aspects of the engine were tested simultaneously, there would be a
risk of not correctly identifying the source of an error. To avoid that, a two-step
procedure has been defined:
The first step will allow testing of components that replace the functionality of the old ones:
− Migration of database references from DBS to ICA
− Custom tags migration (from old format to new format)
− Configuration file migration
− Request parameters interpretation for Document Detail
− Direct access to the database for Document Detail
− Lucene index creation
− Request parameters interpretation for Results List
− Search functionalities and results list processing
− Lucene index updating
The second step will allow testing of extensibility to new features:
− Templates transformation from HTML to XHTML
− Extensibility using XSLT
− Integration with other external components
After the first step of migration, the engine will provide functionality similar to the FEP
application. The proof that the system was migrated successfully will be that, after the first
step, the new engine works exactly like the old system. This step may be done fully
automatically, without operator attention.
The second step requires operator attention. The tasks isolated in this second step require
modifications or improvements that cannot be fully automated. Each CALLER that was
migrated successfully will have the MIGRATION_LEVEL parameter configured to activate the
extended functionality.
For new templates, the second level will be the default. The first step is necessary
only to test the migration of old templates from the old engine. In this way, the testing period
will be kept to a minimum.
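How MIGRATION_LEVEL might gate the extended functionality can be sketched as follows. The text names the parameter but not its values, so the numeric levels 1 and 2, the feature names, the first-level default for migrated callers and both function names are assumptions made for this example.

```python
from typing import Mapping

# Assumed encoding: level 1 = FEP-equivalent behaviour, level 2 = extended
# features. The specification does not state the actual values.
FIRST_STEP = 1
SECOND_STEP = 2

# Hypothetical names for the second-step features listed in this section.
EXTENDED_FEATURES = {"xhtml_templates", "xslt", "external_components"}


def migration_level(caller_config: Mapping[str, str], migrated_from_fep: bool) -> int:
    """New templates default to the second level; migrated ones to the first."""
    default = FIRST_STEP if migrated_from_fep else SECOND_STEP
    return int(caller_config.get("MIGRATION_LEVEL", default))


def is_enabled(feature: str, level: int) -> bool:
    """Extended features need the second level; everything else is always on."""
    return feature not in EXTENDED_FEATURES or level >= SECOND_STEP
```

With this arrangement a migrated caller keeps FEP-equivalent behaviour until its entry is explicitly raised to the second level.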
2.9 Future migration to ICA2
The current architecture of this engine is modular and allows future improvements. To
anticipate the major improvement scheduled for the CORDIS architecture, the migration
to ICA2, it is important to understand the development effort involved.
[Figure components: Web browser (Request / Response), HTTP Server, Apache Tomcat, Cocoon, Templates + config files, Business Layer, Hibernate Layer, Lucene Engine, ICA2 Connector Interface, ICA2 Content Services, ICA Database, Lucene Index]
Figure 18 – ICA2 alternative architecture
As seen above, migrating to the new architecture requires reengineering only the database
layer. The other components will remain the same.