Talend Tips: Processing Large Data when Using tUniqRow

In Talend, tUniqRow is a useful component that filters a data set down to its distinct (unique) rows. When the data to be processed is very large (millions of rows), you may run into a Java heap space error (java.lang.OutOfMemoryError).

One possible solution is to increase the JVM heap size available to the job (the -Xmx argument) so that more records can be processed in memory.

[Image: JVM settings]

However, this setting is limited by the physical memory available on the system. Setting it too high will cause performance issues for the whole system.
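
To confirm that a new heap setting has actually taken effect, it can help to print the limits the JVM is running with. The following is a minimal sketch using the standard java.lang.Runtime API; the class name HeapCheck and the helper toMiB are made up for illustration, and the example arguments (-Xms1024M -Xmx4096M) are placeholder values, not a recommendation.

```java
public class HeapCheck {

    // Converts a byte count to whole mebibytes (illustrative helper).
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();

        // Upper bound the JVM will try to use for the heap (driven by -Xmx).
        System.out.println("Max heap (MiB):   " + toMiB(rt.maxMemory()));

        // Heap currently reserved by the JVM, and the unused part of it.
        System.out.println("Total heap (MiB): " + toMiB(rt.totalMemory()));
        System.out.println("Free heap (MiB):  " + toMiB(rt.freeMemory()));
    }
}
```

Running it as, for example, java -Xms1024M -Xmx4096M HeapCheck should report a maximum heap close to the -Xmx value. The exact figure can differ slightly depending on the JVM and garbage collector.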

Another way is to enable the "Use of disk" option in the tUniqRow advanced settings. With this option, data is stored temporarily on the local disk drive and Talend processes it from those files instead of holding everything in memory.

This approach is more memory-efficient and prevents Talend from using excessive system memory to process the data.
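
To illustrate why spilling to disk keeps memory use low, here is a rough sketch of one common disk-backed deduplication technique: hash-partition the rows into temporary bucket files, then remove duplicates one bucket at a time. This is not Talend's actual implementation, which is not described here. The class DiskDedupSketch, the method dedupe and the bucket count are hypothetical, and each whole line is treated as the uniqueness key.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DiskDedupSketch {

    // Hypothetical helper (not Talend code): writes the distinct lines of
    // `input` to `output` and returns how many distinct lines were written.
    public static long dedupe(Path input, Path output, int buckets) throws IOException {
        if (buckets < 1) {
            throw new IllegalArgumentException("buckets must be >= 1");
        }
        Path tmpDir = Files.createTempDirectory("dedup");
        List<Path> bucketFiles = new ArrayList<>();
        List<BufferedWriter> writers = new ArrayList<>();
        try {
            for (int i = 0; i < buckets; i++) {
                Path p = tmpDir.resolve("bucket-" + i + ".tmp");
                bucketFiles.add(p);
                writers.add(Files.newBufferedWriter(p, StandardCharsets.UTF_8));
            }
            // Pass 1: route every row to a bucket file chosen by its hash.
            // Identical rows always land in the same bucket.
            try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
                String line;
                while ((line = in.readLine()) != null) {
                    BufferedWriter w = writers.get(Math.floorMod(line.hashCode(), buckets));
                    w.write(line);
                    w.newLine();
                }
            }
        } finally {
            for (BufferedWriter w : writers) {
                w.close();
            }
        }

        // Pass 2: deduplicate one bucket at a time, so only a single
        // bucket's distinct rows are held in memory at any moment.
        long distinct = 0;
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path p : bucketFiles) {
                Set<String> seen = new HashSet<>();
                try (BufferedReader in = Files.newBufferedReader(p, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        if (seen.add(line)) { // add() returns false for a duplicate
                            out.write(line);
                            out.newLine();
                            distinct++;
                        }
                    }
                }
                Files.delete(p);
            }
        }
        Files.delete(tmpDir);
        return distinct;
    }
}
```

Note that this sketch writes rows grouped by bucket, so the output order differs from the input order. The trade-off is extra disk I/O in exchange for a much smaller in-memory working set.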

[Image: tUniqRow settings]