Content source tasks

Content source tasks are used alongside source tasks. A source task first scans and identifies all the documents and data to extract. A content source task then extracts the content itself. Because this step is resource-demanding, it can be given more dedicated threads or workers so that it runs faster.


Alfresco content extractor using CMIS technology

This Alfresco extractor uses CMIS to fetch document content from a given Alfresco repository.

Mandatory settings

Key | Type | Description
Alfresco connection provider | AlfrescoCMISConnectionProvider | CMIS version must be 1.1

Optional settings

Key | Type | Description | Default value
Property Helper | PropertyHelper | |
Extract document content | Boolean | | true

Basic content extractor from Content Manager

This class is dedicated to extracting content from the Content Manager solution. It can also extract annotations, custom properties, and logs.

Mandatory settings

Key | Type | Description
CM connection provider | CMConnectionProvider |

Optional settings

Key | Type | Description | Default value
Extract standard system properties | Boolean | | true
Extract advanced system properties from DKDDO object | Boolean | | true
Extract document annotation | Boolean | | false
Extract note logs | Boolean | | false
Default page height, used when extracting annotations | Float | | 842.0f
Extract custom properties | Boolean | | true
Extract note logs as annotations | Boolean | | false
Annotation converter | CMAnnotationConverter | |
Extract history logs | Boolean | | true
Default page width, used when extracting annotations | Float | | 595.0f
Save annotations as XFDF instead of raw CM format | Boolean | | true
Extract document content | Boolean | | true

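The default page size of 595.0 by 842.0 matches an A4 page expressed in PostScript points (1/72 inch). The unit is not stated in the settings above, so treating these values as points is an assumption. The conversion can be checked with a short sketch:

```python
# Assumption: page dimensions are expressed in PostScript points (1 pt = 1/72 inch).
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72.0


def mm_to_points(mm: float) -> float:
    """Convert a length in millimetres to PostScript points."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


# ISO 216 A4 is 210 mm x 297 mm.
a4_width_pt = mm_to_points(210.0)
a4_height_pt = mm_to_points(297.0)

print(round(a4_width_pt), round(a4_height_pt))
```

If the annotated documents use another page format, the same conversion gives the values to set for the page width and height.
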
Basic CMOD content extractor

Mandatory settings

Key | Type | Description
CMOD Connection Settings | CMODConnectionProvider |

Optional settings

Key | Type | Description | Default value
Pattern to store resource files | String | | ${resourceId}

Extract content from FileNet 3.5

Use this task to retrieve the content of the documents to extract from a given FileNet instance. This task must be preceded by a FileNet35Source task.

Mandatory settings

Key | Type | Description
FileNet 3.5 connection provider | FileNet35ConnectionProvider | Connection parameters to the FileNet instance

Optional settings

Key | Type | Description | Default value
Ignore documents with zero-sized content | Boolean | Documents without any content will not be processed | false

Extract document content from FileNet P8

This task is not a real source task. The documents to be extracted are identified by a BlankSource task that generates a set of 'empty' punnets, i.e. punnets containing only documents that each carry the identifier (documentId) of the document to extract.

Mandatory settings

Key | Type | Description | Default value
FileNet connection provider | FileNetConnectionProvider | Connection parameters to the FileNet instance |
Object store name | String | Name of the repository to extract from | OS1

Optional settings

Key | Type | Description | Default value
Extract FileNet system properties | Boolean | System metadata is saved at the punnet level during extraction | false
Properties to extract | String list | List of FileNet P8 metadata of the document to be saved at the punnet level. If left empty, all properties are extracted into the document |
Extract FileNet security | Boolean | The security of the document will be saved at the punnet level | false
Extract folders absolute path | Boolean | The absolute path of the folder inside the FileNet instance will be extracted during the process | false
Extract object type properties | Boolean | The FileNet P8 metadata of the document that are of Object type will be saved at the punnet level | false
Extract document content | Boolean | The document content will be extracted during the process | true

Parse FWTF (Fixed Width Text File) with external content to a punnet description

An MDO file is a flat file in which each line corresponds to a document and contains information about that document.

The extraction of information from each line is based on a CSV configuration file, which provides the name of each metadata to insert into the punnet document, as well as its characteristics. It consists of the following columns, separated by commas:

- Field: name of the metadata to add
- Length: length of the metadata. If the value is longer than this length, it is truncated. If it is shorter, it is padded with spaces on the right
- Offset: position in the MDO file
- Mandatory: Y / N
- Occurs: number of occurrences allowed for the field. The successive values of the field are then added to the values of the metadata (respecting the Length parameter for each one)
- Type: type of metadata to add to the punnet document

The MDOParserExternalContent task is used to retrieve external content for each document. To do this, the name of the column defining the content path is specified in the task settings.
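The column layout described above can be illustrated with a minimal sketch. It is not the task's actual implementation. The sample specification, the presence of a header row in the CSV file, 0-based offsets, and the idea that repeated occurrences sit next to each other in the line are all assumptions made for the example.

```python
import csv
import io

# Hypothetical specification file content. The header row and the 0-based
# offsets are assumptions, not something the task documentation confirms.
SPEC_CSV = """Field,Length,Offset,Mandatory,Occurs,Type
docId,6,0,Y,1,String
title,10,6,N,1,String
tag,3,16,N,2,String
"""


def load_spec(text):
    """Read the CSV specification into a list of column descriptions."""
    return [
        {
            "field": row["Field"],
            "length": int(row["Length"]),
            "offset": int(row["Offset"]),
            "mandatory": row["Mandatory"].strip().upper() == "Y",
            "occurs": int(row["Occurs"]),
            "type": row["Type"],  # kept for reference, not used for conversion here
        }
        for row in csv.DictReader(io.StringIO(text))
    ]


def parse_line(line, spec):
    """Slice one fixed-width line into metadata values, one list per field."""
    document = {}
    for col in spec:
        values = []
        for i in range(col["occurs"]):
            # Assumption: successive occurrences are stored contiguously.
            start = col["offset"] + i * col["length"]
            value = line[start:start + col["length"]].rstrip(" ")
            if value:
                values.append(value)
        if col["mandatory"] and not values:
            raise ValueError(f"missing mandatory field: {col['field']}")
        document[col["field"]] = values
    return document


spec = load_spec(SPEC_CSV)
print(parse_line("000042Invoice   abcxyz", spec))
```

Each value is cut to exactly Length characters and the right-hand padding is stripped, which mirrors the truncation and padding rules given for the Length column.
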

Mandatory settings

Key | Type | Description
MDO format specification file path | String | Absolute path of the CSV configuration file containing the MDO format specification

Optional settings

Key | Type | Description | Default value
File scanner | FileScanner | Recovers your files |
Date format | String | Date format used in the MDO file. Must be the same for each line of the document | yyyy-MM-dd
Property name containing path content | String | Name of the field in the configuration file that contains the path to the content. If not filled, the content will not be saved in the punnet |
Dataline property name | String | Name of the metadata that will contain the MDO line read. If not specified, the line read will not be saved in the punnet |
Create one punnet for each document of FWTF | Boolean | If true, a punnet with one document will be created for each entry in the MDO file. Otherwise, one punnet will be created containing as many documents as there are entries in the MDO file | false
contentLocationAbsolute | Boolean | |
Last punnet property name | String | Name of the data marking the last punnet. If null, this data is not added to the punnet. Applies to the multi-punnet case only |

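The effect of the "Create one punnet for each document of FWTF" setting can be sketched as follows. Punnets are modelled as plain dictionaries, and the function and parameter names are hypothetical. How the last-punnet marker is stored (here a True value on the final punnet only) is also an assumption.

```python
def build_punnets(entries, one_punnet_per_document=False, last_punnet_property=None):
    """Group parsed MDO entries into punnets, modelled here as plain dicts."""
    if not one_punnet_per_document:
        # Default behaviour: a single punnet holding every document.
        return [{"documents": list(entries), "data": {}}]

    # Multi-punnet case: one punnet per MDO entry.
    punnets = [{"documents": [entry], "data": {}} for entry in entries]
    if last_punnet_property and punnets:
        # Assumption: only the final punnet receives the marker.
        punnets[-1]["data"][last_punnet_property] = True
    return punnets


entries = [{"docId": "A"}, {"docId": "B"}, {"docId": "C"}]
single = build_punnets(entries)
multiple = build_punnets(entries, one_punnet_per_document=True,
                         last_punnet_property="isLast")
print(len(single), len(multiple))
```

With the default value (false), the three entries end up in one punnet. With the setting enabled, three punnets are produced and only the last one carries the marker.
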
FWTF (Fixed Width Text File) parser with internal content

Like the MDOParserExternalContent task, this source parses each line of the MDO file into a punnet. The difference between the two tasks is that here the content is stored inside the MDO file itself. The start and end of the content are defined by a tag specified in the task settings.
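As a rough illustration of content embedded in the file, the sketch below splits a sequence of lines into data lines and their content. The layout is an assumption made for the example: each data line is followed by its content lines, and a line equal to the end tag closes the entry. The tag value and the function name are hypothetical.

```python
def split_entries(lines, end_tag):
    """Yield (data_line, content) pairs from an MDO-like sequence of lines."""
    data_line = None
    content = []
    for line in lines:
        if data_line is None:
            # First line of an entry: the fixed-width data line.
            data_line = line
        elif line == end_tag:
            # End tag reached: emit the entry and start a new one.
            yield data_line, "\n".join(content)
            data_line, content = None, []
        else:
            content.append(line)
    # An entry that is never closed by the end tag is not emitted.


sample = [
    "000042Invoice   ",
    "First line of text",
    "Second line of text",
    "<<END>>",
    "000043Contract  ",
    "Only line",
    "<<END>>",
]
entries = list(split_entries(sample, "<<END>>"))
print(len(entries))
```

Each data line can then be parsed with the fixed-width rules of the specification file, while the collected text becomes the document content.
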

Mandatory settings

Key | Type | Description
MDO format specification file path | String | Absolute path of the CSV configuration file containing the MDO format specification

Optional settings

Key | Type | Description | Default value
File scanner | FileScanner | Recovers your files |
Date format | String | Date format used in the MDO file. Must be the same for each line of the document | yyyy-MM-dd
End tag | String | End tag property name signifying the end of the content |
Dataline property name | String | Name of the metadata that will contain the MDO line read. If not specified, the line read will not be saved in the punnet |
Create one punnet for each document of FWTF | Boolean | If true, a punnet with one document will be created for each entry in the MDO file. Otherwise, one punnet will be created containing as many documents as there are entries in the MDO file | false
Last punnet property name | String | Name of the data marking the last punnet. If null, this data is not added to the punnet. Applies to the multi-punnet case only |
Original text content property name | String | Name of the data containing the original text content. If null, this data is not added to the punnet |