Python - Search words using multiprocessing and mmap


  1. End result. File names containing search words.

    data/sample7.txt, data/sample9.txt, data/sample3.txt
  2. Create data folder and sample text files in the folder.

  3. Create a new python file and paste below code. Update url to the webhook eg)

    import mmap
    from multiprocessing import Pool
    from pathlib import Path
    from typing import List
    search_words=['Taehee Choi', 'taeheechoi']
    def file_list() -> List[str]:
        return [str(path) for path in Path().glob('data/*.*')] # path for all files in data folder
    def search_word(file: str) -> str:
        with open(file) as f:
            with mmap.mmap(f.fileno(), length=0, access=mmap.ACCESS_READ) as mm:
                for word in search_words:
                    if mm.find(bytes(word, 'utf-8')) != -1:
                        return # return name only <_io.TextIOWrapper name='data/sample9.txt' mode='r' encoding='utf-8'>
    def main():
        file_found = []
        with Pool(4) as pool: # 4 worker processes
            file_found +=, file_list())
        file_name_found = ', '.join([file for file in file_found if file]) 
    if __name__ == '__main__':
  4. Run Ctrl + F5 or Right click >> "Run Python File in Terminal" within VS Code.



taeheechoi © 2023