Filter
The Filter module extracts or excludes lines from files based on a pattern you provide. Patterns are regular expressions, so you can isolate exactly the lines you need from large datasets.
This module is particularly useful when you need to extract specific types of data from mixed sources or clean files by removing entries that match certain patterns.
Options
File or folder containing the files you want to filter.
Usage Guide
Follow these steps to effectively use the Filter module:
Select Source Files
Choose the file or folder containing the data you want to filter.
Define Filter Pattern
Enter a regular expression that matches the lines you want to extract (or the lines you want to exclude, if Invert is enabled).
Choose Filter Mode
Decide whether to:
- Keep matching lines (default behavior)
- Keep non-matching lines (enable the Invert option)
Execute Filtering
Run the module to process your files according to the specified pattern and mode.
Verify Results
Check the output directory for the filtered files and verify they contain the expected data.
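The steps above can be sketched in Python as follows. This only illustrates the keep/invert logic and is not the module's actual implementation: the function name filter_lines and its parameters are hypothetical, and the sketch assumes a line counts as matching when the pattern is found anywhere in it.

```python
import re
from typing import Iterable, List


def filter_lines(lines: Iterable[str], pattern: str, invert: bool = False) -> List[str]:
    """Keep lines that match `pattern`, or lines that do not when `invert` is True.

    Hypothetical helper for illustration; assumes "match" means the pattern
    occurs anywhere in the line (re.search semantics).
    """
    regex = re.compile(pattern)
    kept = []
    for line in lines:
        matched = regex.search(line) is not None
        # A line is kept when its match status differs from `invert`:
        # matching lines in normal mode, non-matching lines in inverted mode.
        if matched != invert:
            kept.append(line)
    return kept


sample = [
    "alice@gmail.com:secret1",
    "bob@example.org:secret2",
    "carol@yahoo.com:secret3",
]

# Default mode: keep only the lines that match.
print(filter_lines(sample, r"@example\.org"))

# Inverted mode: keep everything except the matching lines.
print(filter_lines(sample, r"@example\.org", invert=True))
```

Running both modes on a small sample like this is a quick way to confirm a pattern behaves as intended before filtering real files.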
Example Use Cases
To extract lines containing specific email domains:
- Set Filter to (gmail|yahoo|hotmail)\.com
- Leave Invert disabled
- Run the module
This will extract only lines containing "gmail.com", "yahoo.com", or "hotmail.com".
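As a rough check of the pattern in this example, the sketch below uses Python's re module (an assumption; the regex engine used by the module is not stated here) on a few made-up sample lines. It also shows why the dot before "com" is best escaped: an unescaped dot matches any character, so it can let through lines that do not contain a literal ".com".

```python
import re

# Unescaped dot: "." matches any single character.
unescaped = re.compile(r"(gmail|yahoo|hotmail).com")
# Escaped dot: "\." matches only a literal period.
escaped = re.compile(r"(gmail|yahoo|hotmail)\.com")

# Made-up sample data for illustration only.
lines = [
    "user1@gmail.com:pass1",
    "user2@hotmail.com:pass2",
    "user3@gmailXcom.net:pass3",
    "user4@proton.me:pass4",
]

loose = [line for line in lines if unescaped.search(line)]
strict = [line for line in lines if escaped.search(line)]

print(len(loose))   # the unescaped pattern also accepts "gmailXcom"
print(len(strict))  # the escaped pattern accepts only literal ".com" domains
```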
Regular Expression Resources
If you're not familiar with regular expressions, consider using online resources like regex101.com to build and test your patterns before applying them to large datasets.
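A pattern can also be tried locally on a handful of sample lines before it is applied to a large dataset. The sketch below is one way to do that in Python; check_pattern is a hypothetical helper, and Python's regex syntax may differ in small ways from the engine the module uses.

```python
import re
from typing import List, Optional


def check_pattern(pattern: str, samples: List[str]) -> Optional[List[bool]]:
    """Return one match result per sample, or None if the pattern is invalid.

    Hypothetical helper for illustration only.
    """
    try:
        regex = re.compile(pattern)
    except re.error as exc:
        # Report syntax problems (for example an unclosed group) up front.
        print(f"Invalid pattern: {exc}")
        return None
    return [regex.search(sample) is not None for sample in samples]


# A valid pattern: one True/False result per sample line.
print(check_pattern(r"gmail\.com", ["a@gmail.com:pw", "b@yahoo.com:pw"]))

# An invalid pattern (unclosed group): reported instead of raising.
print(check_pattern("(gmail", ["a@gmail.com:pw"]))
```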
Related Modules
- UlpSorter - For more advanced pattern-based sorting
- UlpCleaner - For specialized cleaning of url:login:password files