Talend: How to use tFileOutputMSDelimited (multi-schema output)

This is a simple guide to show how you can use Talend to generate a complex delimited file (e.g, combination of parent and related child records) by using the tFileOutputMSDelimited component. In this scenario, the sample output that we will be generated is:

 

Parent Row 1,P1,This is sample description,2013-08-01
Child Row 1,C1,P1
Child Row 2,C2,P1

Parent Row 2,P2,This is sample description,2013-09-01
Child Row 3,C3,P2
Child Row 4,C4,P2

 

Step 1:

Before we use the tFileOutputMSDelimited component to generate the designated output, we will need to prepare 2 files (we assume that you already know how to do this using Talend 😀 ):

 

One that contains the unique parent rows, for example:

Parent Name,Parent ID,Description,Date
Parent Row 1,P1,This is sample description,2013-08-01
Parent Row 2,P2,This is sample description,2013-09-01

 

And one that contains all the related child rows with parent id, for example:

Child Name,Child ID,Parent Ref ID
Child Row 1,C1,P1
Child Row 2,C2,P1
Child Row 3,C3,P2
Child Row 4,C4,P2

 

Please note that, the Parent ID in the child file is very important as it will be used to match the parent row.

 

Step 2:

Create a job to read the Parent file and Child file by using the tFileInputDelimited component as shown below.

 

These are the schemas that we configured in the parentFile and childFile tFileInputDelimited component respectively

 

 

Step 3:

Add a tFileOutputMSDelimited component to the job and link the main output of the parentFile and childFile component to it.

 

 

Step 4:

Now, let’s configure the tFileOutputMSDelimited component to look like this:

 

 

The first schema which appear as “tFileOutputMSDelimited_1” is referring to the parentFile schema. Below will be the explanation of the configuration:

  • Parent row – you should leave this as empty because this schema will be the parent data
  • Parent key column – you should leave this as empty because this schema will be the parent data
  • Key column – this will be the Parent ID column name that is specified in the parentFile schema shown in Step 3. This is important as it will be used as the unique identifier for child row.

 

The second schema which appear as “row2” is the childFile schema. Below will be the explanation of the configuration:

  • Parent row – set it to row1, this is to tell Talend to park all the related child record under the parent record from row1
  • Parent key column – this will be the same as the key column of the parent row
  • Key column – this should be the Parent Ref ID field in the child schema instead of the Child ID. The reason being is that Talend will use this as the unique key to group all the related child into a list.

 

Once you have done the configuration above, you are good to go.

Get in touch today to see how WDCi can help your business.