Quantium

Joiner

The Joiner module combines multiple data files into a single consolidated file. It is most useful when data is split across several files, or when results from different sources or processing operations need to be consolidated before further processing.

Options

File or folder containing the files you want to join together.

part1.txt
part2.txt
part3.txt

When selecting a folder, Joiner will combine all compatible files within that folder.

Usage Guide

Follow these steps to effectively use the Joiner module:

Select Source Files

Choose the file or folder containing the data you want to combine. If you select a folder, all compatible files in it are combined.

Configure Deduplication

Decide whether to enable deduplication:

  • Enable Dedup if you want to ensure no duplicate entries exist in the output
  • Disable Dedup for faster processing when duplicates are acceptable or unlikely

Execute Joining

Run the module to combine the selected files into a single output file.

Verify Results

Check the output directory for the combined file and verify it contains all the expected data.
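
One quick sanity check, assuming line-oriented text files and a join run without deduplication, is that the combined file holds exactly as many lines as the inputs put together. The Python sketch below is independent of Joiner itself; `count_lines` and `verify_join` are hypothetical helper names chosen for illustration.

```python
def count_lines(path):
    """Count lines in a file without loading it all into memory."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def verify_join(input_paths, output_path):
    """Return (expected, actual) line counts for a join done without dedup."""
    expected = sum(count_lines(p) for p in input_paths)
    actual = count_lines(output_path)
    return expected, actual
```

If the two numbers differ after a plain join, a likely cause is an input file that does not end with a newline, so its last line merged with the first line of the next file. With deduplication enabled, the actual count can legitimately be lower than the expected one.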

Example Use Cases

When working with files that have been split (e.g., from the Splitter module):

  1. Select the folder containing all split parts
  2. Enable Dedup if necessary (typically not needed for previously split files)
  3. Run the module

The result will be a reconstructed file containing all the data from the split parts.
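
When reconstructing a split file, the order in which the parts are joined matters. How Joiner orders the files in a folder is not stated here, so the following Python sketch only illustrates the general concern: plain alphabetical sorting places `part10.txt` before `part2.txt`, whereas a natural sort keeps numeric suffixes in order. `natural_key` is a hypothetical helper name.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so that 2 sorts before 10."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]


parts = ["part10.txt", "part2.txt", "part1.txt"]

print(sorted(parts))                   # ['part1.txt', 'part10.txt', 'part2.txt']
print(sorted(parts, key=natural_key))  # ['part1.txt', 'part2.txt', 'part10.txt']
```

If the parts come out in the wrong order, zero-padding the part numbers (for example `part01.txt`) is a simple workaround that makes alphabetical and numeric order agree.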

Format Consistency

For best results, ensure that all files being joined use the same format and structure. Combining files with different formats may lead to inconsistent data in the output.
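
A simple pre-check can catch mixed formats before joining. The Python sketch below assumes line-oriented, delimiter-separated text; the `:` delimiter and the helper names `field_counts` and `same_structure` are illustrative assumptions, not part of Joiner. It tallies how many fields the lines of each file contain and flags files whose most common field count differs.

```python
from collections import Counter


def field_counts(path, delimiter=":", sample=1000):
    """Tally fields per line over the first `sample` non-empty lines."""
    counts = Counter()
    seen = 0
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue  # blank lines say nothing about structure
            counts[line.count(delimiter) + 1] += 1
            seen += 1
            if seen >= sample:
                break
    return counts


def same_structure(paths, delimiter=":"):
    """True if every non-empty file shares the same dominant field count."""
    dominant = set()
    for path in paths:
        counts = field_counts(path, delimiter)
        if counts:
            dominant.add(counts.most_common(1)[0][0])
    return len(dominant) <= 1
```

Sampling only the first lines keeps the check fast on large files, at the cost of missing format changes that appear later in a file.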

Performance Considerations

  • Without Deduplication: Joining files without deduplication is very fast and uses minimal memory, as it simply concatenates files.
  • With Deduplication: Enabling deduplication requires additional memory and processing time, proportional to the total size of the input files.
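
The difference between the two modes can be illustrated with a minimal Python sketch. It is not Joiner's actual implementation; it assumes line-oriented files, and `concat_files` and `join_dedup` are hypothetical names. Plain concatenation streams bytes in fixed-size chunks, so memory use stays constant, while deduplication has to remember every distinct line seen so far.

```python
import shutil


def concat_files(input_paths, output_path):
    """Append each input to the output in chunks; memory use is constant."""
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def join_dedup(input_paths, output_path):
    """Write each distinct line once, keeping first-seen order.

    The `seen` set grows with the number of unique lines, which is
    where the extra memory goes.
    """
    seen = set()
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                for line in src:
                    key = line.rstrip(b"\r\n")
                    if key not in seen:
                        seen.add(key)
                        out.write(key + b"\n")
    return len(seen)
```

In this sketch the deduplicating version also does more work per line (stripping, hashing, and a set lookup), which is why it is slower as well as more memory-hungry.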

Memory Usage Tip

If you're joining very large files with deduplication enabled and your system has limited RAM, consider setting the deduplication strategy in the main settings to "LowMemory".
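
What the "LowMemory" strategy does internally is not described here. As a general illustration of how deduplication can trade CPU time for memory, the Python sketch below remembers a fixed-size 16-byte digest of each line instead of the line itself. The function name `join_dedup_hashed` is hypothetical, and the approach is an assumption made for illustration only.

```python
import hashlib


def join_dedup_hashed(input_paths, output_path):
    """Deduplicate lines while storing only a 16-byte digest per unique line."""
    seen = set()
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                for line in src:
                    key = line.rstrip(b"\r\n")
                    digest = hashlib.blake2b(key, digest_size=16).digest()
                    if digest not in seen:
                        seen.add(digest)
                        out.write(key + b"\n")
    return len(seen)
```

This only saves memory when lines are noticeably longer than 16 bytes. With a 128-bit digest, two different lines hashing to the same value is extremely unlikely, but it is not impossible, so a colliding line would be dropped silently.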

Related Modules

  • Splitter - For dividing large files into smaller parts
  • AntiPublic - For more advanced deduplication across databases
