Joiner
The Joiner module combines multiple data files into a single consolidated file. It is a simple way to reassemble fragmented datasets or to merge the results of separate processing operations.
Joiner is particularly useful when working with data split across multiple files or when consolidating results from different data sources before further processing.
Options
File or folder containing the files you want to join.
When selecting a folder, Joiner will combine all compatible files within that folder.
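The documentation does not define which files count as "compatible". As a rough sketch only, assuming compatibility simply means a matching file extension, gathering the inputs from a file or folder could look like this in Python. `collect_inputs` and the `.txt` default are hypothetical and not part of Joiner.

```python
from pathlib import Path

# Hypothetical assumption: "compatible" means one of these file extensions.
DEFAULT_EXTENSIONS = frozenset({".txt"})


def collect_inputs(source, extensions=DEFAULT_EXTENSIONS):
    """Return the files to join from a single file or a folder path."""
    path = Path(source)
    if path.is_file():
        return [path]
    # Only direct children of the folder; sorted for a deterministic join order.
    return sorted(
        p for p in path.iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )
```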
Usage Guide
Follow these steps to use the Joiner module effectively:
Select Source Files
Choose the file or folder containing the data you want to combine.
When selecting a folder, all compatible files in that folder will be combined.
Configure Deduplication
Decide whether to enable deduplication:
- Enable Dedup if you want to ensure no duplicate entries exist in the output
- Disable Dedup for faster processing when duplicates are acceptable or unlikely
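A minimal Python sketch of the join step with an optional deduplication switch, assuming line-based files with one entry per line. `join_files` is a hypothetical helper, and keeping an in-memory set of seen lines is just one common way to deduplicate; Joiner's actual implementation is not documented here.

```python
def join_files(inputs, output, dedup=False):
    """Concatenate line-based files into one output file (hypothetical helper).

    Assumes one entry per line. With dedup=True, only the first occurrence
    of each line is written, tracked with an in-memory set.
    Returns the number of lines written.
    """
    seen = set()
    written = 0
    with open(output, "w", encoding="utf-8") as out:
        for path in inputs:
            with open(path, "r", encoding="utf-8", errors="replace") as f:
                for line in f:
                    entry = line.rstrip("\r\n")
                    if dedup:
                        if entry in seen:
                            continue  # duplicate: skip it
                        seen.add(entry)
                    # Re-terminate every entry so files without a trailing
                    # newline don't merge into the next file's first line.
                    out.write(entry + "\n")
                    written += 1
    return written
```

The `seen` set is what makes deduplication cost extra memory: it grows with the number of unique entries, while the non-dedup path holds only one line at a time.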
Execute Joining
Run the module to combine the selected files into a single output file.
Verify Results
Check the output directory for the combined file and verify it contains all the expected data.
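One way to check the output programmatically, again assuming one entry per line: without deduplication the output should equal the inputs concatenated in order, and with deduplication it should contain every input entry exactly once. `verify_join` is a hypothetical helper for illustration.

```python
def read_entries(path):
    """Read a line-based file into a list of entries without line endings."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\r\n") for line in f]


def verify_join(inputs, output, dedup=False):
    """Check a joined file against its inputs (hypothetical helper)."""
    expected = [e for p in inputs for e in read_entries(p)]
    actual = read_entries(output)
    if dedup:
        # No repeated entries, and nothing missing or extra.
        return len(actual) == len(set(actual)) and set(actual) == set(expected)
    # Plain join: exact concatenation in input order.
    return actual == expected
```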
Example Use Cases
When working with files that have been split (e.g., from the Splitter module):
- Select the folder containing all split parts
- Enable Dedup if necessary (typically not needed for previously split files)
- Run the module
The result will be a reconstructed file containing all the data from the split parts.
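If the order of entries matters, the parts need to be joined in their original sequence. Assuming the parts carry a numeric suffix (the names below are hypothetical; Splitter's actual naming scheme is not described here), a plain alphabetical sort would place `part10` before `part2`, so a natural sort key is safer:

```python
import re


def natural_key(name):
    """Split a name into text and number chunks so 'part2' sorts before 'part10'."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so odd positions are always digit runs and can be compared as integers.
    return [int(tok) if i % 2 else tok.lower()
            for i, tok in enumerate(re.split(r"(\d+)", name))]


def order_parts(names):
    """Return split-part file names in natural (numeric-aware) order."""
    return sorted(names, key=natural_key)


# Hypothetical part names; plain sorted() would give part1, part10, part2.
print(order_parts(["data_part10.txt", "data_part2.txt", "data_part1.txt"]))
# → ['data_part1.txt', 'data_part2.txt', 'data_part10.txt']
```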
Format Consistency
For best results, ensure that all files being joined use the same format and structure. Combining files with different formats may lead to inconsistent data in the output.
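As a rough pre-flight check, assuming delimiter-separated data where "same structure" means every line has the same number of fields, the following hypothetical Python helpers flag inputs that disagree. The comma delimiter is an assumption; adjust it to your data.

```python
def field_counts(paths, delimiter=","):
    """Map each file path to the set of per-line field counts found in it."""
    result = {}
    for path in paths:
        counts = set()
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            for line in f:
                line = line.rstrip("\r\n")
                if line:  # blank lines are ignored
                    counts.add(len(line.split(delimiter)))
        result[str(path)] = counts
    return result


def formats_consistent(paths, delimiter=","):
    """Return True if every non-blank line in every file has the same field count."""
    all_counts = set().union(*field_counts(paths, delimiter).values())
    return len(all_counts) <= 1
```

`field_counts` also shows which file is the odd one out when the check fails.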
Performance Considerations
- Without Deduplication: Joining is very fast and uses minimal memory, because the files are simply concatenated.
- With Deduplication: Joining requires additional memory and processing time, roughly proportional to the total size of the input files.
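To illustrate why plain concatenation is cheap, here is a Python sketch (the hypothetical `concat_streaming`, not Joiner's code) that copies each file through a fixed-size buffer, so memory use stays roughly constant however large the inputs are:

```python
import os
import shutil


def concat_streaming(inputs, output, chunk_size=1024 * 1024):
    """Concatenate files byte-for-byte through a fixed-size buffer."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as f:
                # Copies at most chunk_size bytes at a time.
                shutil.copyfileobj(f, out, chunk_size)
                # If the file was non-empty and lacked a trailing newline,
                # add one so its last line doesn't merge with the next file.
                if f.tell() > 0:
                    f.seek(-1, os.SEEK_END)
                    if f.read(1) != b"\n":
                        out.write(b"\n")
```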
Memory Usage Tip
If you're joining very large files with deduplication enabled and your system has limited RAM, consider setting the deduplication strategy in the main settings to "LowMemory".
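The documentation does not say how the "LowMemory" strategy works internally. One common low-memory technique, shown here purely as an illustration and not as Joiner's implementation, is hash partitioning: lines are first spread across temporary bucket files by a stable hash, so duplicates always land in the same bucket, and each bucket is then deduplicated on its own. Only one bucket's unique lines are held in memory at a time. `dedup_low_memory` and its `buckets` parameter are hypothetical.

```python
import os
import tempfile
import zlib


def dedup_low_memory(inputs, output, buckets=16):
    """Deduplicate lines via hash partitioning (hypothetical helper).

    Pass 1 writes each line to a bucket file chosen by a stable CRC32 hash,
    so identical lines always share a bucket. Pass 2 deduplicates each
    bucket with a set. Returns the number of unique lines written.
    """
    with tempfile.TemporaryDirectory() as tmp:
        bucket_paths = [os.path.join(tmp, f"bucket_{i}.txt") for i in range(buckets)]
        handles = [open(p, "w", encoding="utf-8") for p in bucket_paths]
        try:
            for path in inputs:
                with open(path, "r", encoding="utf-8", errors="replace") as f:
                    for line in f:
                        entry = line.rstrip("\r\n")
                        idx = zlib.crc32(entry.encode("utf-8")) % buckets
                        handles[idx].write(entry + "\n")
        finally:
            for h in handles:
                h.close()

        written = 0
        with open(output, "w", encoding="utf-8") as out:
            for p in bucket_paths:
                seen = set()  # holds one bucket's unique lines only
                with open(p, "r", encoding="utf-8") as f:
                    for line in f:
                        if line not in seen:
                            seen.add(line)
                            out.write(line)
                            written += 1
        return written
```

The trade-off is extra disk I/O and an output order grouped by bucket rather than by input position.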
Related Modules
- Splitter - For dividing large files into smaller parts
- AntiPublic - For more advanced deduplication across databases