AlfrescoRestSource

Alfresco extractor using Alfresco REST protocol

This task relies on the Alfresco public REST API (with v1.0.4 of the Alfresco REST client) to retrieve documents and metadata from a given Alfresco instance.

Mandatory settings

Key	Type	Description
CMIS query	String	Query used to retrieve the objects from Alfresco. Ex/ SELECT * FROM cmis:document WHERE cmis:name LIKE 'test%'
Alfresco connection provider	AlfrescoRESTConnectionProvider

Optional settings

Key	Type	Description	Default value
Max item to return per call	Integer	Paging threshold: the maximum number of Alfresco objects to retrieve per call	100
Fields to extract	String	The fewer the better: only the ‘id’ is necessary to start the migration workflow. Separate the different values with a comma, no space. Use properties from the com.alfresco.client.api.common.constant.PublicAPIConstant library. Ex/ id,name	id
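
The paging threshold and the field list described above can be illustrated with a minimal sketch. It does not call Alfresco: `fake_fetch_page`, `NODES` and the `skip_count` / `max_items` parameter names are hypothetical stand-ins, and the stop condition (a page shorter than the threshold) is an assumption about how a paged client might detect the end of the results.

```python
from typing import Callable, Iterator, List


def fetch_all(fetch_page: Callable[[int, int], List[dict]],
              max_items: int = 100) -> Iterator[dict]:
    """Yield every entry by requesting pages of at most `max_items` entries."""
    skip_count = 0
    while True:
        page = fetch_page(skip_count, max_items)
        yield from page
        if len(page) < max_items:  # a short page means nothing is left
            return
        skip_count += len(page)


def select_fields(entry: dict, fields: str) -> dict:
    """Keep only the comma-separated fields (no spaces), e.g. 'id,name'."""
    return {key: entry[key] for key in fields.split(",")}


# Hypothetical in-memory stand-in for a repository holding 250 nodes.
NODES = [{"id": f"node-{i}", "name": f"test{i}.txt"} for i in range(250)]
calls = []


def fake_fetch_page(skip_count: int, max_items: int) -> List[dict]:
    """Record the call and return one slice of the fake repository."""
    calls.append((skip_count, max_items))
    return NODES[skip_count:skip_count + max_items]


results = [select_fields(entry, "id") for entry in fetch_all(fake_fetch_page)]
```

With a threshold of 100, the 250 fake nodes are retrieved in three calls. Requesting only `id` keeps each returned entry as small as possible, which is the point of the "fewer the better" advice.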

AlfrescoSource

Alfresco extractor using CMIS technology

Through an SQL query, this Alfresco extractor uses CMIS to fetch the content, the metadata and the annotations of your documents from a given Alfresco repository.

Mandatory settings

Key	Type	Description
SQL query to extract documents	String	Fast2 will retrieve all documents, folders, references, items and metadata matching this query. If the query exhaustively specifies the data to extract, uncheck ‘Extract document properties’. The cmis:objectId property is mandatory. Ex/ SELECT * FROM cmis:document
Alfresco connection provider	AlfrescoCMISConnectionProvider	CMIS version must be 1.1

Optional settings

Key	Type	Description	Default value
Property Helper	PropertyHelper
Number of items per result page	Integer	Maximum number of results provided	1
Extract document properties	Boolean		true
Number of documents per punnet	Integer		1
Keep folder structure within document	Boolean	requires extractProperties to be true	true
Extract document content	Boolean	Does not work asynchronously	false

AWSSource

Complete extractor module from AWS S3

This AWS extractor extracts the content of your documents from a list of sources. Several options (suffix, prefix…) let you narrow down the documents to take into account.

Mandatory settings

Key	Type	Description
AWS connection provider	AWSConnectionProvider	Must have AmazonS3FullAccess permission
Source buckets	String list	Buckets where folders are stored

Optional settings

Key	Type	Description
AWS start-after key	String	Absolute path of S3 object to start after
ARN key for KMS encryption	String	Ex/ arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab
AWS suffix	String	An S3 object will be extracted if its key has this suffix
Source folders	String list	Folders in the S3 bucket(s) containing the files to migrate
AWS prefix	String	An S3 object will be extracted if its key has this prefix
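
The effect of the start-after, prefix and suffix options can be sketched on a plain list of keys. This is an illustration only, not the task's implementation: `select_keys` is a hypothetical helper, no AWS call is made, and combining prefix and suffix with a logical AND is an assumption.

```python
from typing import Iterable, List, Optional


def select_keys(keys: Iterable[str], prefix: str = "", suffix: str = "",
                start_after: Optional[str] = None) -> List[str]:
    """Return the keys that would be kept, in lexicographic order."""
    selected = []
    for key in sorted(keys):
        # Skip everything up to and including the start-after key.
        if start_after is not None and key <= start_after:
            continue
        # Assumption: both the prefix and the suffix must match.
        if key.startswith(prefix) and key.endswith(suffix):
            selected.append(key)
    return selected


keys = [
    "invoices/2023/a.pdf",
    "invoices/2023/b.pdf",
    "invoices/2023/c.txt",
    "reports/2023/d.pdf",
]
selected = select_keys(keys, prefix="invoices/", suffix=".pdf",
                       start_after="invoices/2023/a.pdf")
```

Here only `invoices/2023/b.pdf` remains: `a.pdf` is excluded by the start-after key, `c.txt` by the suffix and `reports/2023/d.pdf` by the prefix.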

BlankSource

Empty punnet generator

This source builds a punnet list containing one or more empty documents. Each document will only contain its identifier: documentId. These punnets can then be enriched by other steps in the processing chain.

Mandatory settings

Key	Type	Description
Document IDs	DocumentIdList	Source list of documents to extract from their IDs

Optional settings

Key	Type	Description	Default value
Document per punnet	Integer	Number of documents each punnet must carry. Ex/ The input file includes 10 lines, meaning 10 document identifiers to extract. By setting this value to 2, Fast2 will create 5 punnets, each containing 2 documents	1
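
The example above (10 identifiers, 2 documents per punnet, 5 punnets) can be reproduced with a short sketch. The dictionary-based punnet and document shapes and the `build_punnets` helper are hypothetical, chosen only to show the batching arithmetic.

```python
from typing import List


def build_punnets(document_ids: List[str],
                  documents_per_punnet: int = 1) -> List[List[dict]]:
    """Group identifiers into punnets of empty documents."""
    if documents_per_punnet < 1:
        raise ValueError("documents_per_punnet must be at least 1")
    return [
        # Each empty document carries nothing but its identifier.
        [{"documentId": doc_id}
         for doc_id in document_ids[start:start + documents_per_punnet]]
        for start in range(0, len(document_ids), documents_per_punnet)
    ]


ids = [f"doc-{n}" for n in range(1, 11)]  # 10 document identifiers
punnets = build_punnets(ids, documents_per_punnet=2)
```

If the number of identifiers is not a multiple of the punnet size, this sketch leaves the last punnet partially filled.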

CMODSource

Complete extraction module from a CMOD environment

This task extracts documents from the Content Manager OnDemand (CMOD) ECM. One CMOD document is equivalent to 1 punnet of 1 document. Indexes, optional content and annotations will also be extracted. A WAL request is made to find the corresponding documentId in ImageServices. The metadata extraction is then carried out. Relative data are stored in each document of the punnet being processed.

Note: All Image Services properties are exported systematically. This task is not a real source task. The documents to be extracted are identified by a BlankSource task generating a set of empty punnets, i.e. containing only documents each bearing a document number (documentId) to extract.

This task relies on the ‘libCMOD.dll’ library. This library must be in a directory of the Windows PATH. In the wrapper.conf or hmi-wrapper.conf file, activate the use of this library: wrapper.java.library.path. = ../libCMOD/dll32

For the moment, only 32-bit libraries are configured.

Mandatory settings

Key	Type	Description
CMOD connection provider	CMODConnectionProvider
Folders to extract	String list	List of CMOD folders which will be scanned. Additional level(s) of filter can be used with the SQL query down below.

Optional settings

Key	Type	Description	Default value
SQL query to extract documents	String	Enter here the WHERE clause used to filter documents. Since this request is made on the indexes of CMOD documents, the property used to filter the documents needs to be indexed in CMOD prior to any extraction. Ex/ WHERE Date = '2012-11-14'
Extract document annotations	Boolean	The document annotations will be extracted during the process	false
Number of documents per punnet	Integer		1
Extract document content	Boolean	The document content will be extracted during the process	false
Maximum results count	Integer		2000

CMSource

Complete extractor from Content Manager solution

Mandatory settings

Key	Type	Description
CM connection provider	CMConnectionProvider
SQL query	String	Select precisely documents you want to extract through a classic SQL query

Optional settings

Key	Type	Description	Default value
Extract advanced system properties from DKDDO object	Boolean		false
Extract standard system properties	Boolean		false
Maximum results returned by the query	Integer	Set to 0 to disable limiting number of results	0
Extract custom properties	Boolean		false
Query type	Integer	See com.ibm.mm.beans.CMBBaseConstant for further details. Default value is XPath (7)	7

CSVSource

CSV file parser

This task can be used to start a migration from a CSV file. By default, the first line of your file is treated as the column headers. The CSVSource task processes column values whether or not they are surrounded with double quotes ("). If you need to force the document ID for the whole process, use the metadata documentId.

Mandatory settings

Key	Type	Description
CSV paths	String list	List of paths to CSV files to be parsed. Check out the following examples for allowed formats

Ex/
C:/samples/myDocument.csv
C:\samples\myDocument.csv
C:\\samples\\myDocument.csv
"C:\samples\myDocument.csv"
C:/samples/${map}.csv
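
The last example path contains a `${...}` placeholder. As a rough illustration of how such a placeholder could be resolved from a set of properties, the sketch below uses Python's `string.Template`, which happens to share the `${name}` syntax. How Fast2 actually resolves the placeholder, and what `map` refers to, is not stated here; the value `invoices` is invented for the example.

```python
from string import Template


def resolve_path(pattern: str, properties: dict) -> str:
    """Replace every ${name} placeholder with the matching property value."""
    # substitute() raises KeyError when a placeholder has no value,
    # which surfaces configuration mistakes early.
    return Template(pattern).substitute(properties)


# Hypothetical property value, used for illustration only.
resolved = resolve_path("C:/samples/${map}.csv", {"map": "invoices"})
```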

Optional settings

Key	Type	Description	Default value
CSV file path metadata	String	Punnet property name containing the CSV file path. Set to empty or null to disable
File name for error CSV file	String	Name of the file in which the lines in error of your CSV file are recorded. The name can either reference workflow properties surrounded with ${...} (ex/ campaign, punnetId, etc.) or be hard-coded. Warning: this value can be overridden by the ‘Associate CSV-errors file with original CSV filename’ option	lines_in_error.csv
New column names to set	String list	If empty, populated from first line
Folder path for error CSV file	String	Folder in which the error file will be stored. Here too, the path can either reference workflow properties (${...}) or be hard-coded	./csv_errors/

Number of lines to skip	Integer	Skips lines, meaning their data will not be processed. By default, only the 1st line is skipped, since it is assumed to be the header row. Ex/ In a file of 10 lines, entering ‘3’ will skip the 1st, 2nd and 3rd lines	1
Generate hash of CSV content	Boolean	The hash of the content will be generated and stored in the punnet in a property named hashData	false

Continue on fail	Boolean	If enabled, the following errors will not trigger an exception:
- CSV file does not exist
- CSV file is empty (no line)
- CSV file has only headers and no line for documents.

Note that if you give 5 CSV paths and the third one is in error, only the Fast2 logs will provide information about the failing CSV file.

Column headers in first CSV file only	Boolean	Only read column definitions from the first parsed CSV file	false
File encoding	String	CSV encoding character set	UTF-8
CSV separator	String	Only the first character will be considered	;
Associate CSV-errors file with original CSV filename	Boolean	Names the error file after your original CSV file, suffixing the original name with ‘_KO’. That way, if you use multiple files, all the lines in error will be grouped by file name. This option overrides ‘File name for error CSV file’, but can still be combined with ‘Folder path for error CSV file’	false
Stop at first error in CSV	Boolean	Fast2 will automatically be stopped at the first error encountered in the CSV	false
File scanner (Deprecated)	FileScanner	THIS OPTION IS DEPRECATED, consider using ‘CSV paths’ instead.
Column of document ID	String	Column header of the metadata to set as the document ID	documentId
Document property name containing CSV file path	String	Set to empty or null to disable
Move to path when finished	String	Consider using ${variable} syntax

Document per punnet	Integer	Number of documents each punnet will carry. Ex/ By setting this value to 2, each punnet created will contain 2 documents	1
Extra columns	String list	List of the form target=function:arg1:arg2:…
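
Several of the options above (separator, number of lines to skip, optional double quotes) interact when a file is read. The sketch below mimics that behaviour with Python's standard `csv` module. It is not the task's implementation: `parse_csv` is a hypothetical helper, and it assumes the headers are always read from the first line while data starts after the skipped lines.

```python
import csv
import io
from typing import List


def parse_csv(text: str, separator: str = ";",
              lines_to_skip: int = 1) -> List[dict]:
    """Turn CSV text into one dictionary of metadata per data line."""
    delimiter = separator[0]  # only the first character is considered
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    headers = rows[0]  # assumption: headers come from the first line
    # Quoted and unquoted values are handled the same way by csv.reader.
    return [dict(zip(headers, row)) for row in rows[lines_to_skip:]]


SAMPLE = 'documentId;title\n"doc-1";"First"\ndoc-2;Second\n'
documents = parse_csv(SAMPLE)
```

With the default of one skipped line, the header row is excluded from the data and both the quoted and the unquoted line produce a document.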

DctmSource

Complete extractor from Documentum

This connector will extract basic information from the source Documentum repository. Since Documentum architecture involves particular port and access management, a worker should be started on the same server where Documentum is running.

Make sure to check the basic requirements in the Documentum setup section of the official Fast2 documentation.

Mandatory settings

Key	Type	Description
Connexion information to Documentum Repository	DctmConnectionProvider
The DQL Query to run to fetch documents	String	The fewer attributes you fetch, the faster the query will be executed on the Documentum side. Ex/ SELECT r_object_id FROM dm_document WHERE ...

Optional settings

Key	Type	Description	Default value
Batch size	Integer	If size is <1, the size will be defined from the Documentum server-side.	50

FileNet35Source

Complete extractor from FileNet 3.5

The FileNet35Source retrieves existing documents from the FileNet P8 3.5 ECM through a query. Each punnet will contain the metadata of the recovered document, its content and annotations.

Mandatory settings

Key	Type	Description
FileNet 3.5 connection provider	FileNet35ConnectionProvider	Connection parameters to the FileNet instance
SQL query	String	SQL query corresponding to the list of documents to extract

Optional settings

Key	Type	Description	Default value
Attribute used for Document IDs	String	Name of the FileNet P8 3.5 attribute corresponding to the values retrieved in the Document IDs list	Id
Empty punnet when no result	Boolean	An empty punnet will be created even if the query returns no result	false
Documents per punnet	Integer	Number of documents each punnet must carry. Ex/ By setting this value to 2, each punnet created will contain 2 documents	1
Document IDs	DocumentIdList	Source list of documents to extract from their IDs

FileNetSource

Complete extractor from FileNet P8

The FileNetSource retrieves existing documents from the FileNet P8 5.x ECM through an SQL query. Each punnet will contain the metadata of the recovered document, security information and parent folders.

Mandatory settings

Key	Type	Description
Object store name	String list	Name of the repository to extract from
SQL query	String	SQL query corresponding to the list of documents to extract
FileNet connection provider	FileNetConnectionProvider	Connection parameters to the FileNet instance

Optional settings

Key	Type	Description	Default value
Documents per punnet	Integer	Number of documents each punnet must carry. Ex/ By setting this value to 2, each punnet created will contain 2 documents	1
Number of entries per result page	Integer	Number of results returned per page by the FileNet P8 query	1000
Extract object type properties	Boolean	The FileNet P8 metadata of the document which are of Object type will be saved at the punnet level	false
Extract FileNet system properties	Boolean	System metadata will be extracted and saved at the punnet level	false
Properties to extract	String list	Exhaustive list of FileNet metadata to extract. If empty, all properties will be extracted.
Extract documents instance informations	Boolean	The fetchInstance method makes a round trip to the server to retrieve the property values of the ObjectStore object	false
Extract FileNet security	Boolean	The security of the document will be saved at the punnet level	false
Extract folders absolute path	Boolean	The absolute path of the folder inside the FileNet instance will be extracted during the process	false
Throw error if no result	Boolean	Throw an exception when the SQL query finds no result.

LocalSource

A generic broker for wildcarded punnet lists

This task searches for local files from a defined path and analyzes them.

Mandatory settings

Key	Type	Description
File scanner	FileScanner	Recovers your files

Optional settings

Key	Type	Description	Default value
Fallback XML/Json parsing	Boolean	If true, the file will be added as document content in the punnet when XML parsing fails. Consider adding this file as a regular file (not an XML)	false
Skip parse exceptions	Boolean	The task does not throw an error when XML parsing fails. Do not stop parsing and resume to next candidate	false
XSL Stylesheet path	String	The XSL stylesheet file to use when parsing XML files
Number of files per punnet	Integer	If the files are not in XML format, the punnet will contain as many documents as defined in this option	1
Allow any kind of file	Boolean	All types of files can be added. Otherwise, only XML-based Punnet descriptions are allowed	true
Skip XML parsing	Boolean	The XML file will not be parsed before being added to the punnet. Not recommended in most cases	false
Maximum number of files scanned	Integer	If this field is completed, the number of files scanned will not exceed the value filled in. Leave empty to retrieve all files matching input pattern filter

MailSource

Complete extractor from mail box

The MailSource task extracts messages from an e-mail box. Each extracted message will correspond to a punnet, one document per punnet

Mandatory settings

Key	Type	Description
MailBox connection provider	MailBoxProvider

Optional settings

Key	Type	Description	Default value
Search in Headers	String	Enter a header and the pattern to search for, separated by a colon (:). Ex/ cc:copy
Header names	String list	List of header names (case-sensitive) to retrieve from the mail. Message-Id, Subject, From, To, Cc and Date are added by default
Start Id	Integer	Index from which the first message should be extracted	1
Update document with mail root folder name	String	Name of the metadata to add to the document. If filled, the full name of the source folder is indexed in this metadata. Set to null or empty to disable updating
Folders to scan	String list	List of folders to scan in the mailbox. If filled, overrides the root folder name from the MailBox connection provider configuration
AND condition for search	Boolean	Checking this option will only retrieve messages matching all search conditions (unread messages, text in header, body or subject). If unchecked, the ‘OR’ operand will be applied.
Forbidden characters	String	List of characters to remove from Message-Id when building the DocumentId	<>:"/|?*
Only unread messages	Boolean
Search in Subject	String
Search in Body	String
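
Three of the options above lend themselves to a small illustration: stripping forbidden characters from a Message-Id, splitting the header search setting, and combining search conditions with AND or OR. The helper names are hypothetical and the sample Message-Id is invented. The sketch only mirrors the behaviour described in the table, not the task's actual code.

```python
from typing import Iterable, Tuple

FORBIDDEN_CHARACTERS = '<>:"/|?*'  # default value listed in the table


def build_document_id(message_id: str,
                      forbidden: str = FORBIDDEN_CHARACTERS) -> str:
    """Remove every forbidden character from a Message-Id."""
    return message_id.translate({ord(char): None for char in forbidden})


def parse_header_search(setting: str) -> Tuple[str, str]:
    """Split a 'header:pattern' setting at the first colon."""
    header, _, pattern = setting.partition(":")
    return header, pattern


def message_matches(condition_results: Iterable[bool], use_and: bool) -> bool:
    """Combine the individual search conditions with AND or OR."""
    results = list(condition_results)
    return all(results) if use_and else any(results)


document_id = build_document_id("<abc:123@mail.example.com>")
header, pattern = parse_header_search("cc:copy")
```

Splitting at the first colon only means a pattern may itself contain colons, which is a design choice of this sketch rather than documented behaviour.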

RandomSource

Random punnet generator

Randomly produces punnets containing documents, metadata, content…

Mandatory settings

Key	Type	Description	Default value
Number of punnet to generate	Integer	If ‘minimum punnet number’ is set, this value here will be considered as the higher threshold	1000

Optional settings

Key	Type	Description	Default value
Maximum document number	Integer	Excluded	1
Minimum content number	Integer	Included
Minimum metadata number	Integer	Included	1
Minimum punnet number	Integer	If not set, the number of generated punnets will be exactly the number set at ‘Number of punnets to generate’
Maximum number of metadata values	Integer	Included	6000
Minimum number of metadata values	Integer	Included	0
Maximum content number	Integer	Excluded
Maximum metadata number	Integer	Excluded	10
Minimum document number	Integer	Included	1
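
The "Included" and "Excluded" mentions above describe half-open ranges: the minimum can be drawn, the maximum cannot. A minimal sketch, assuming a uniform draw (the table does not state the distribution), maps this directly onto `random.randrange`. The `draw_count` helper is hypothetical.

```python
import random


def draw_count(rng: random.Random, minimum: int, maximum: int) -> int:
    """Draw an integer with `minimum` included and `maximum` excluded."""
    # randrange(a, b) returns a <= n < b, matching the bounds above.
    return rng.randrange(minimum, maximum)


rng = random.Random(0)  # seeded only to make the sketch repeatable
# Metadata number: minimum 1 (included), maximum 10 (excluded).
metadata_counts = [draw_count(rng, 1, 10) for _ in range(1000)]
```

Every drawn value lies between 1 and 9. Note that `randrange` rejects an empty range, so the maximum must be strictly greater than the minimum in this sketch.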

SQLSource

Complete extractor from SQL database

Extracts rows from an SQL database and maps the specified properties to the punnet or document layout.

Mandatory settings

Key	Type	Description
SQL connection provider	SQLQueryGenericCaller
SQL query	String	Select precisely documents you want to extract through a classic SQL query

Optional settings

Key	Type	Description	Default value
Property name to group by document	String	Column used to group lines by document. If used, add an ‘ORDER BY’ clause to your SQL query
SQL mapping for punnet	String/String map	Mapping of SQL properties to punnet metadata. Use ‘punnetId’ for Punnet Id
Allow duplicates data	Boolean
Property name to group by punnet	String	Column used to group lines by punnet. If used, add an ‘ORDER BY’ clause to your SQL query
SQL mapping for document	String/String map	Mapping of SQL properties to document metadata. Use ‘documentId’ for Document Id, otherwise the first column will be used as documentId
Push remaining, non-mapped columns as document properties	Boolean		true
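
The advice to sort the query results when grouping is used can be illustrated with `itertools.groupby`, which only merges consecutive rows sharing the same key. Whether the task groups consecutive rows in the same way is an assumption inferred from the ‘ORDER BY’ recommendation. The rows and the `doc_id` column are invented for the example.

```python
from itertools import groupby
from operator import itemgetter
from typing import List, Tuple


def group_rows(rows: List[dict], column: str) -> List[Tuple[str, List[dict]]]:
    """Group consecutive rows that share the same value in `column`."""
    return [(key, list(group))
            for key, group in groupby(rows, key=itemgetter(column))]


# Rows as they would come back from a query sorted on doc_id.
sorted_rows = [
    {"doc_id": "A", "page": 1},
    {"doc_id": "A", "page": 2},
    {"doc_id": "B", "page": 1},
]
grouped = group_rows(sorted_rows, "doc_id")

# The same rows without sorting: document A is split into two groups.
unsorted_rows = [sorted_rows[0], sorted_rows[2], sorted_rows[1]]
fragmented = group_rows(unsorted_rows, "doc_id")
```

With sorted rows the sketch yields two groups (A with two lines, B with one). Without sorting it yields three, because the lines of document A are no longer adjacent.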
