python - Prioritizing Greenlet workers for parallel reads/writes/db access
I need to read 3 CSV files of about 10 GB each and write parts of them out to 3 output files. In the middle there are some minor conditions involved, plus a MongoDB query (6-billion-document collection, indexed) for each row.
I am thinking of using gevent pools for the task, but I am not sure how to prioritize the read tasks over the writes, i.e. how to ensure the reads are finished before the writers exit.
I don't want to block the writers until the reads are finished.
- I can spawn 3 readers that put rows into a read queue.
- I can spawn 20-25 io-processors that read from that queue, make the MongoDB call, and put the result into a writer queue.
- I can spawn 3 writers that read from the write queue and write the files.
- I can joinall on the pool. A minimal sketch of this layout follows the list.
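
Here is a rough sketch of that layout, assuming hypothetical input/output file names and a placeholder process_row() standing in for the Mongo lookup and the conditions; it only shows the shape of the pipeline, not a final implementation:

```python
from gevent import monkey
monkey.patch_all()          # lets pymongo's socket I/O yield to other greenlets

import csv
from gevent.pool import Pool
from gevent.queue import Queue

# Bounded queues give back-pressure: readers pause when processors fall behind,
# so memory stays bounded even with 10 GB inputs.
read_q = Queue(maxsize=10000)
write_q = Queue(maxsize=10000)

def reader(path):
    with open(path, newline='') as f:
        for row in csv.reader(f):
            read_q.put(row)          # blocks (cooperatively) when the queue is full

def process_row(row):
    # Placeholder for the per-row Mongo query and the minor conditions.
    return row

def io_processor():
    for row in read_q:               # iterating a gevent Queue blocks until items arrive;
        out = process_row(row)       # it stops when StopIteration is put into the queue
        if out is not None:
            write_q.put(out)

def writer(path):
    with open(path, 'w', newline='') as f:
        w = csv.writer(f)
        for row in write_q:
            w.writerow(row)

reader_pool, proc_pool, writer_pool = Pool(3), Pool(25), Pool(3)

for p in ('in1.csv', 'in2.csv', 'in3.csv'):      # hypothetical file names
    reader_pool.spawn(reader, p)
for _ in range(25):
    proc_pool.spawn(io_processor)
for p in ('out1.csv', 'out2.csv', 'out3.csv'):   # hypothetical file names
    writer_pool.spawn(writer, p)
```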
Now, I can keep a queue timeout in the io-processors and writers. But how do I ensure that all of the readers have put their complete data into the queue? Or is it possible to put a join on the readers at the end of the io-processors? (A sketch of that second idea is below.)
In short, I want to learn whether there is an optimal approach to perform this task efficiently.
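
To illustrate the second idea, continuing the hypothetical sketch above: instead of queue timeouts, the main greenlet could join each stage and then feed sentinel values into the next stage's queue, so the writers only exit after every upstream row has been drained.

```python
# Continuing the sketch above: enforce the ordering with joins and sentinels
# instead of queue timeouts.
reader_pool.join()              # all readers have put their last row
for _ in range(25):
    read_q.put(StopIteration)   # one sentinel per io_processor ends its loop
proc_pool.join()                # every row has been queried and queued for writing
for _ in range(3):
    write_q.put(StopIteration)  # one sentinel per writer ends its loop
writer_pool.join()              # output files are complete
```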