Source tasks

Source tasks are required in every workflow, since they are the only tasks able to build punnets. They are the starting point of a workflow. A workflow (map) can have multiple source tasks.

Source tasks extract raw data, which can come in many shapes: CSV files, TIFF images, Excel files, IBM or Alfresco repositories, and more.

Our connectors retrieve the information stored in your ECMs. Check their functional purposes here.

- A SOURCE TASK CANNOT BE PRECEDED BY ANY OTHER TASK!

Alfresco extractor using CMIS technology

Using an SQL query, this Alfresco extractor relies on the CMIS technology to fetch the content, metadata and annotations of your documents from a given Alfresco repository.

Mandatory settings

| Key | Type | Description |
|---|---|---|
| Alfresco connection provider | AlfrescoCMISConnectionProvider | CMIS version must be 1.1 |

Optional settings

| Key | Type | Description | Default value |
|---|---|---|---|
| Property Helper | PropertyHelper | | |
| SQL query to extract documents | String | Ex/ SELECT * FROM cmis:document | |
| Number of items per result page | Integer | Maximum number of results provided per page | 1 |
| Extract document properties | Boolean | | true |
| Number of documents per punnet | Integer | | 1 |
| Keep folder structure within document | Boolean | Requires extractProperties to be true | true |
| Extract document content | Boolean | Does not work asynchronously | false |

Complete extractor module from AWS S3

This AWS extractor extracts your document content from a list of sources. Several options (prefix, suffix, etc.) let you specify precisely which documents to take into account.

Mandatory settings

| Key | Type | Description |
|---|---|---|
| AWS connection provider | AWSConnectionProvider | Must have the AmazonS3FullAccess permission |
| Source buckets | String list | Buckets where the folders are stored |

Optional settings

| Key | Type | Description |
|---|---|---|
| AWS start-after key | String | Absolute path (key) of the S3 object to start after |
| ARN key for KMS encryption | String | Ex/ arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab |
| AWS suffix | String | An S3 object is extracted if its key has this suffix |
| Source folders | String list | Folders in the S3 bucket(s) containing the files to migrate |
| AWS prefix | String | An S3 object is extracted if its key has this prefix |

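
To picture how the prefix, suffix and start-after options narrow down the extraction, here is a minimal Python sketch that filters a list of S3 object keys locally. It assumes the three options are combined with a logical AND and that start-after behaves as in the S3 ListObjectsV2 API (only keys sorting strictly after it are returned). The select_keys helper is hypothetical and is not part of Fast2.

```python
from typing import Iterable, List, Optional


def select_keys(keys: Iterable[str],
                prefix: Optional[str] = None,
                suffix: Optional[str] = None,
                start_after: Optional[str] = None) -> List[str]:
    """Hypothetical helper: keep the S3 keys an extraction with these options would consider."""
    selected = []
    for key in sorted(keys):  # S3 lists keys in ascending order
        if start_after is not None and key <= start_after:
            continue  # start-after is exclusive: skip keys up to and including it
        if prefix and not key.startswith(prefix):
            continue
        if suffix and not key.endswith(suffix):
            continue
        selected.append(key)
    return selected


keys = ["archive/2019/a.pdf", "archive/2020/b.pdf", "archive/2020/c.tif", "other/d.pdf"]
print(select_keys(keys, prefix="archive/", suffix=".pdf", start_after="archive/2019/a.pdf"))
# → ['archive/2020/b.pdf']
```
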
Empty punnet generator

This source builds a list of punnets containing one or more empty documents. Each document only contains its identifier (documentId). These punnets can then be enriched by later steps of the processing chain.

Mandatory settings

| Key | Type | Description |
|---|---|---|
| Document IDs | DocumentIdList | Source list of documents to extract from their IDs |

Optional settings

| Key | Type | Description | Default value |
|---|---|---|---|
| Document per punnet | Integer | Number of documents each punnet must carry. Ex/ The input file includes 10 lines, meaning 10 document identifiers to extract. By setting this value to 2, Fast2 will create 5 punnets, each containing 2 documents | 1 |

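
The splitting described in the example above (10 identifiers, 2 per punnet, 5 punnets) can be sketched as a simple chunking loop. This is a minimal illustration, not the Fast2 implementation: build_empty_punnets is a hypothetical name, punnets and documents are modeled as plain lists and dictionaries, and it assumes, which the documentation does not state, that a last, smaller punnet holds the remaining identifiers when their count is not a multiple of the setting.

```python
from typing import Dict, List, Sequence


def build_empty_punnets(document_ids: Sequence[str],
                        docs_per_punnet: int = 1) -> List[List[Dict[str, str]]]:
    """Hypothetical sketch: split document IDs into punnets of empty documents."""
    if docs_per_punnet < 1:
        raise ValueError("docs_per_punnet must be at least 1")
    punnets = []
    for start in range(0, len(document_ids), docs_per_punnet):
        chunk = document_ids[start:start + docs_per_punnet]
        # Each empty document only carries its identifier
        punnets.append([{"documentId": doc_id} for doc_id in chunk])
    return punnets


ids = [f"DOC-{n:02d}" for n in range(1, 11)]  # 10 lines in the input file
punnets = build_empty_punnets(ids, docs_per_punnet=2)
print(len(punnets))  # → 5
print(punnets[0])    # → [{'documentId': 'DOC-01'}, {'documentId': 'DOC-02'}]
```
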
Complete extraction module from a CMOD environment

This task extracts documents from the Content Manager OnDemand (CMOD) ECM. One CMOD document corresponds to one punnet containing one document. Indexes, optional content and annotations are also extracted. A WAL request is made to find the corresponding documentId in Image Services, and the metadata extraction is then carried out. The related data are stored in each document of the punnet being processed.

Note: All Image Services properties are exported systematically.

This task is not a real source task. The documents to extract are identified by a BlankSource task that generates a set of empty punnets, i.e. punnets containing only documents, each bearing the number (documentId) of a document to extract.

This task relies on the 'libCMOD.dll' library, which must be in a directory of the Windows PATH. In the wrapper.conf or hmi-wrapper.conf file, enable this library with: wrapper.java.library.path.<increment> = ../libCMOD/dll32

For the moment, only 32-bit libraries are configured.

Mandatory settings

| Key | Type | Description |
|---|---|---|
| CMOD connection provider | CMODConnectionProvider | |

Optional settings

| Key | Type | Description | Default value |
|---|---|---|---|
| SQL query to extract documents | String | These queries are run on the indexes of CMOD documents. Ex/ SELECT * FROM exampleTable WHERE Date = '2012-11-14' | |
| Extract document annotations | Boolean | The document annotations are extracted during the process | false |
| Extract document content | Boolean | The document content is extracted during the process | false |
| Maximum results count | Integer | | 2000 |
| Folders to extract | String list | | |

Complete extractor from Content Manager solution

Mandatory settings

| Key | Type | Description |
|---|---|---|
| CM connection provider | CMConnectionProvider | |

Optional settings

| Key | Type | Description | Default value |
|---|---|---|---|
| Extract standard system properties | Boolean | | false |
| Extract advanced system properties from DKDDO object | Boolean | | false |
| Maximum results returned by the query | Integer | Set to 0 to disable the limit on the number of results | 0 |
| SQL query | String | Select precisely the documents you want to extract through a classic SQL query | |
| Extract custom properties | Boolean | | false |
| Query type (integer) | Integer | See com.ibm.mm.beans.CMBBaseConstant for further details | XPath (7) |

CSV file parser

This task can be used to start a migration from a CSV file. By default, the first line of your file is considered to hold the column headers. Whether or not the column values are surrounded with double quotes ("), the CSVSource task processes them.
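
The parsing rules described above can be sketched with Python's standard csv module: only the first character of the configured separator is used, quoted and unquoted values are both accepted, column names come from the first line unless new names are provided, and the first lines are skipped. This is a minimal sketch, not the CSVSource implementation; the parse_csv function is hypothetical, and the way it combines "Number of lines to skip" with the header line is an assumption.

```python
import csv
import io
from typing import Dict, List, Optional


def parse_csv(text: str, separator: str = ";", lines_to_skip: int = 1,
              column_names: Optional[List[str]] = None) -> List[Dict[str, str]]:
    """Hypothetical sketch of CSV-to-records parsing, not the CSVSource implementation."""
    # Only the first character of the configured separator is used
    reader = csv.reader(io.StringIO(text), delimiter=separator[0], quotechar='"')
    rows = list(reader)
    if not rows:
        return []
    # Column names come from the first line unless explicit names are provided
    headers = column_names or rows[0]
    # Skipped lines (by default, only the header line) are not processed
    return [dict(zip(headers, row)) for row in rows[lines_to_skip:]]


sample = 'id;title\n1;"Invoice; March"\n2;Contract\n'
for record in parse_csv(sample):
    print(record)
# → {'id': '1', 'title': 'Invoice; March'}
# → {'id': '2', 'title': 'Contract'}
```
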

Mandatory settings

| Key | Type | Description |
|---|---|---|
| File scanner | FileScanner | Recovers your CSV files |

Optional settings

| Key | Type | Description | Default value |
|---|---|---|---|
| CSV file path metadata | String | Punnet property name containing the CSV file path. Set to empty or null to disable | |
| File name for error CSV file | String | Useful when you need a specific file name in which to record the lines of your CSV file that are in error. The name can be hard-coded or built from workflow properties surrounded with ${...} (ex/ campaign, punnetId, etc.). Warning: this value can be overridden by the Associate CSV-errors file with original CSV filename option | lines_in_error.csv |
| New column names to set | String list | If empty, populated from the first line | |
| Folder path for error CSV file | String | The error file is stored on your system, and this field sets where. Here as well, you can set the path either with workflow properties (${...}) or hard-code it | ./csv_errors/ |
| Number of lines to skip | Integer | Skipped lines are not processed. By default, only the 1st line is skipped, since it is assumed to be the header row. Ex/ In a file of 10 lines, entering '3' skips the 1st, 2nd and 3rd lines | 1 |
| Generate hash of CSV content | Boolean | The hash of the content is generated and stored in the punnet in a property named hashData | false |
| Column headers in first CSV file only | Boolean | Only read column definitions from the first parsed CSV file | false |
| CSV encoding character set | String | | UTF-8 |
| CSV separator | String | Only the first character is considered | ; |
| Associate CSV-errors file with original CSV filename | Boolean | Matches your error file with your original CSV file by suffixing the original name with _KO. That way, if you use multiple files, the lines in error are grouped by file name. This option overrides File name for error CSV file, but can still be combined with Folder path for error CSV file | false |
| Stop at first error in CSV | Boolean | Fast2 stops automatically at the first error encountered in the CSV | false |
| Document property name containing CSV file path | String | Set to empty or null to disable | |
| Move to path when finished | String | Consider using ${variable} syntax | |
| Document per punnet | Integer | Number of documents each punnet must carry. Ex/ By setting this value to 2, each punnet created will contain 2 documents | 1 |
| Extra columns | String list | List of the form target=function:arg1:arg2:… | |

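
The interplay between the error-file options can be pictured with the following sketch, which resolves ${...} placeholders from workflow properties with Python's string.Template and applies the _KO suffix when the error file is associated with the original CSV filename. The error_file_path helper is hypothetical, and placing _KO before the file extension is an assumption; check the file names Fast2 actually produces.

```python
from pathlib import PurePosixPath
from string import Template
from typing import Dict, Optional


def error_file_path(properties: Dict[str, str],
                    folder: str = "./csv_errors/",
                    file_name: str = "lines_in_error.csv",
                    original_csv: Optional[str] = None) -> str:
    """Hypothetical helper computing where rejected CSV lines would be written."""
    if original_csv is not None:
        # "Associate CSV-errors file with original CSV filename" overrides file_name
        original = PurePosixPath(original_csv)
        file_name = f"{original.stem}_KO{original.suffix}"  # assumed: _KO before the extension
    # ${...} placeholders come from workflow properties; a missing one raises KeyError
    resolved_folder = Template(folder).substitute(properties)
    resolved_name = Template(file_name).substitute(properties)
    # PurePosixPath collapses the leading "./" of the folder
    return str(PurePosixPath(resolved_folder) / resolved_name)


props = {"campaign": "2024-Q1"}
print(error_file_path(props, folder="./csv_errors/${campaign}", file_name="${campaign}_errors.csv"))
# → csv_errors/2024-Q1/2024-Q1_errors.csv
print(error_file_path(props, original_csv="/data/in/customers.csv"))
# → csv_errors/customers_KO.csv
```
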
Complete extractor from FileNet 3.5

The FileNet35Source task retrieves existing documents from the FileNet P8 3.5 ECM through a query. The resulting punnets contain the metadata of the retrieved documents, their content and their annotations.

Mandatory settings

| Key | Type | Description |
|---|---|---|
| FileNet 3.5 connection provider | FileNet35ConnectionProvider | Connection parameters to the FileNet instance |
| SQL query | String | SQL query corresponding to the list of documents to extract |

Optional settings

| Key | Type | Description | Default value |
|---|---|---|---|
| Attribute used for Document IDs | String | Name of the FileNet P8 3.5 attribute corresponding to the values retrieved in the Document IDs list | Id |
| Empty punnet when no result | Boolean | An empty punnet is created even if the query returns no result | false |
| Documents per punnet | Integer | Number of documents each punnet must carry. Ex/ By setting this value to 2, each punnet created will contain 2 documents | 1 |
| Document IDs | DocumentIdList | Source list of documents to extract from their IDs | |

Complete extractor from FileNet P8

The FileNetSource task retrieves existing documents from the FileNet P8 5.x ECM through an SQL query. The resulting punnets contain the metadata of the retrieved documents, their content, security information and parent folders.

Mandatory settings

| Key | Type | Description | Default value |
|---|---|---|---|
| SQL query | String | SQL query corresponding to the list of documents to extract | |
| FileNet connection provider | FileNetConnectionProvider | Connection parameters to the FileNet instance | |
| Object store name | String | Name of the repository to extract from | OS1 |

Optional settings

| Key | Type | Description | Default value |
|---|---|---|---|
| Number of entries per result page | Integer | Number of results returned per page by the FileNet P8 query | 1000 |
| Documents per punnet | Integer | Number of documents each punnet must carry. Ex/ By setting this value to 2, each punnet created will contain 2 documents | 1 |
| Extract object type properties | Boolean | The FileNet P8 metadata of the document that are of Object type are saved at the punnet level | false |
| Extract FileNet system properties | Boolean | System metadata are saved at the punnet level during extraction | false |
| Properties to extract | String list | List of the document's FileNet P8 metadata to save at the punnet level. If left empty, all properties are extracted into the document | |
| Extract documents instance informations | Boolean | The fetchInstance method makes a round trip to the server to retrieve the property values of the ObjectStore object | false |
| Extract FileNet security | Boolean | The security of the document is saved at the punnet level | false |
| Extract folders absolute path | Boolean | The absolute path of the folder inside the FileNet instance is extracted during the process | false |
| Extract document content | Boolean | The document content is extracted during the process | true |

ImageServices WAL JNI-bridged Extractor

This task extracts documents (indexes, optional content and annotations) from the Panagon Image Services ECM, producing one punnet of one document for each ECM document. However, it is not a real source task: the documents to extract are identified by a [BlankSource](#BlankSource) task that generates a set of empty punnets, i.e. punnets containing only documents, each bearing the number (documentId) of a document to extract.

Mandatory settings

| Key | Type | Description |
|---|---|---|
| Username | String | Login with scope to access the docbase with proper rights |
| Password | String | Password of the username above |
| Connection organization | String | Organization name for the connection |
| Connection domain | String | Domain name of the connection |

Optional settings

| Key | Type | Description | Default value |
|---|---|---|---|
| Annotations in ARender format | Boolean | Convert annotations to ARender format | false |
| Annotation converter | ParseISAnnotation | Specific converter from the IS format. Allows resizing the extracted annotations | |
| Annotations in raw format | Boolean | Save annotation contents in raw format inside the punnet | false |
| Version of libIDMIS | String | This task is based on the WAL library and on the specific Fast2 library 'libIDMIS.dll', which must be in a directory of the Windows PATH. In the wrapper.conf or hmi-wrapper.conf file, enable this library with: wrapper.java.library.path.<increment> = ../libIDMIS/w32. For the moment, only 32-bit libraries are configured | libIDMIS-1.0.15 |
| Test scenarios | Boolean | Use an empty testing stub instead of libIDMIS | false |
| Connection terminal | String | Terminal name for the connection | |
| Use opacity for annotations | Boolean | | false |
| Unrecognized annotation file path | String | Path of the alternative annotation XML file used for unrecognized annotations. If not specified, the punnet goes into exception | |
| Extract document content | Boolean | The document is extracted with its content | true |
| Extract document annotation | Boolean | The associated annotations are extracted | true |

A generic broker for wildcarded punnet lists

This task searches a defined path for local files and analyzes them.

Mandatory settings

| Key | Type | Description |
|---|---|---|
| File scanner | FileScanner | Recovers your files |

Optional settings

| Key | Type | Description | Default value |
|---|---|---|---|
| Fallback XML parsing | Boolean | If true, the file is added as document content in the punnet when XML parsing fails, i.e. it is handled as a regular file rather than an XML file | false |
| Skip parse exceptions | Boolean | The task does not throw an error when XML parsing fails; it keeps parsing and moves on to the next candidate | false |
| XSL Stylesheet path | String | The XSL stylesheet file to use when parsing XML files | |
| Number of files per punnet | Integer | If the files are not in XML format, each punnet contains as many documents as defined by this option | 1 |
| Skip XML parsing | Boolean | The XML file is not parsed before being added to the punnet. Not recommended in most cases | false |
| Allow any kind of file | Boolean | All types of files can be added. Otherwise, only XML-based punnet descriptions are allowed | true |

Complete extractor from mail box

The MailSource task extracts messages from a mailbox. Each extracted message corresponds to one punnet containing one document.

Mandatory settings

| Key | Type | Description |
|---|---|---|
| MailBox connection provider | MailBoxProvider | |

Optional settings

| Key | Type | Description | Default value |
|---|---|---|---|
| Header names | String list | List of header names (case-sensitive) to retrieve from the mail. Message-Id, Subject, From, To, Cc and Date are added by default | |
| Start Id | Integer | Index of the first message to extract | 1 |
| Search criterion | SearchTerm | Search criteria to filter the messages to extract. If filled, the 'Start Id' and 'Maximum number of mail to extract' fields are not used | |
| Update document with mail root folder name | String | Name of the metadata to add to the document. If filled, the full name of the source folder is indexed in this metadata. Set to null or empty to disable updating | |
| Folders to scan | String list | List of folders to scan in the mailbox. If filled, overrides the root folder name from the MailBox connection provider configuration | |
| Maximum number of mail to extract | Integer | By default, Integer.MAX_VALUE (2147483647) | Integer.MAX_VALUE |
| Forbidden characters | String | List of characters to remove from the Message-Id when building the DocumentId | <>:"/\|?* |

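
Two of these settings are easy to illustrate: removing the forbidden characters from a Message-Id to obtain a DocumentId, and selecting messages from a start index up to a maximum count. The sketch below is hypothetical (the function names are not Fast2's), and it assumes that 'Start Id' is a 1-based index, as its default value of 1 suggests; it does not model the Search criterion, which replaces both fields when filled.

```python
from typing import List, Sequence

FORBIDDEN_CHARACTERS = '<>:"/|?*'  # default value of the "Forbidden characters" option
INTEGER_MAX_VALUE = 2**31 - 1      # Java Integer.MAX_VALUE, i.e. 2147483647


def message_id_to_document_id(message_id: str, forbidden: str = FORBIDDEN_CHARACTERS) -> str:
    """Hypothetical sketch: delete every forbidden character from a Message-Id."""
    return message_id.translate(str.maketrans("", "", forbidden))


def select_messages(messages: Sequence[str], start_id: int = 1,
                    max_count: int = INTEGER_MAX_VALUE) -> List[str]:
    """Hypothetical sketch: start at a 1-based index and keep at most max_count messages."""
    first = start_id - 1
    return list(messages[first:first + max_count])


print(message_id_to_document_id("<CAF12.34@mail.example.com>"))
# → CAF12.34@mail.example.com
print(select_messages(["m1", "m2", "m3", "m4"], start_id=2, max_count=2))
# → ['m2', 'm3']
```
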
Random punnet generator

Randomly produces punnets containing documents, metadata, content...

Mandatory settings

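| Key | Type | Description | Default value |
|---|---|---|---|
| Maximum punnet number | Integer | Upper bound, excluded | 1000 |
| Minimum punnet number | Integer | Lower bound, included | 1 |

Optional settings

| Key | Type | Description | Default value |
|---|---|---|---|
| Maximum document number | Integer | Upper bound, excluded | 1 |
| Minimum content number | Integer | Lower bound, included | |
| Minimum metadata number | Integer | Lower bound, included | 1 |
| Maximum number of metadata values | Integer | Upper bound, included | 6000 |
| Minimum number of metadata values | Integer | Lower bound, included | 0 |
| Maximum content number | Integer | Upper bound, excluded | |
| Maximum metadata number | Integer | Upper bound, excluded | 10 |
| Minimum document number | Integer | Lower bound, included | 1 |
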
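
The "Included" and "Excluded" bounds map directly onto half-open and closed integer ranges. The following minimal sketch, with a hypothetical draw_count helper, shows how such bounds could be drawn with Python's random module: randrange never returns its upper bound, so an included maximum, as for the number of metadata values, needs one added to it. It illustrates the bound semantics only, not how Fast2 generates its punnets.

```python
import random
from typing import Optional


def draw_count(minimum: int, maximum: int, max_included: bool = False,
               rng: Optional[random.Random] = None) -> int:
    """Hypothetical helper: draw a count between the configured bounds.

    The minimum is always included; the maximum is excluded unless max_included is True.
    """
    if rng is None:
        rng = random.Random()
    # randrange(a, b) never returns b, so an included maximum needs b + 1
    upper = maximum + 1 if max_included else maximum
    return rng.randrange(minimum, upper)


rng = random.Random(42)  # fixed seed for a reproducible run
punnets = draw_count(1, 1000, rng=rng)  # punnet bounds: 1 included, 1000 excluded
values = draw_count(0, 6000, max_included=True, rng=rng)  # metadata values: both included
print(1 <= punnets < 1000, 0 <= values <= 6000)  # → True True
```
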
Complete extractor from SQL database

Mandatory settings

| Key | Type | Description |
|---|---|---|
| SQL connection provider | SQLQueryGenericCaller | |
| SQL query | String | Select precisely the documents you want to extract through a classic SQL query |

Optional settings

| Key | Type | Description |
|---|---|---|
| Property name to group by document | String | Column used to group lines by document |
| SQL mapping for punnet | String/String map | Mapping of SQL properties to punnet metadata. Use 'punnetId' for the Punnet Id |
| Property name to group by punnet | String | Column used to group lines by punnet |
| SQL mapping for document | String/String map | Mapping of SQL properties to document metadata. Use 'documentId' for the Document Id |
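
The grouping options can be illustrated with this sketch, which turns flat SQL result lines (modeled as dictionaries) into punnets and documents: lines sharing the punnet grouping column go to the same punnet, lines sharing the document grouping column within it are merged into one document, and columns are renamed through the document mapping. The group_rows function, the column names and the merge rule (distinct values collected into lists) are assumptions made for illustration, not the actual behavior of the task.

```python
from typing import Dict, List


def group_rows(rows: List[Dict[str, str]], punnet_column: str, document_column: str,
               document_mapping: Dict[str, str]) -> Dict[str, Dict[str, Dict[str, List[str]]]]:
    """Hypothetical sketch: punnet key -> document key -> metadata name -> values."""
    punnets: Dict[str, Dict[str, Dict[str, List[str]]]] = {}
    for row in rows:
        # Lines sharing the punnet column value land in the same punnet...
        documents = punnets.setdefault(row[punnet_column], {})
        # ...and lines sharing the document column value are merged into one document
        metadata = documents.setdefault(row[document_column], {})
        for sql_column, metadata_name in document_mapping.items():
            values = metadata.setdefault(metadata_name, [])
            if row[sql_column] not in values:  # joined lines often repeat the same value
                values.append(row[sql_column])
    return punnets


rows = [
    {"BATCH": "B1", "DOC": "D1", "AUTHOR": "Alice", "TAG": "invoice"},
    {"BATCH": "B1", "DOC": "D1", "AUTHOR": "Alice", "TAG": "urgent"},
    {"BATCH": "B1", "DOC": "D2", "AUTHOR": "Bob", "TAG": "contract"},
]
mapping = {"DOC": "documentId", "AUTHOR": "author", "TAG": "tags"}
punnets = group_rows(rows, punnet_column="BATCH", document_column="DOC", document_mapping=mapping)
print(punnets["B1"]["D1"])
# → {'documentId': ['D1'], 'author': ['Alice'], 'tags': ['invoice', 'urgent']}
```
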