python - Prioritizing Greenlet workers for parallel reads/writes/DB access


I need to read 3 CSV files of about 10 GB each and write parts of them out to 3 files. In the middle there are some minor conditions involved and a MongoDB query (against a 6-billion-document, indexed collection) for each row.

I am thinking of using gevent pools for the task, but I am not sure how to prioritize the read tasks over the writes while ensuring that the reads are finished before the writers exit.

I don't want to block the writers until the reads are finished.

  • I can spawn 3 readers that put rows into a read queue.
  • I can spawn 20-25 IO processors that read from that queue, make the MongoDB call, and put the result into a writer queue.
  • I can spawn 3 writers that read from the write queue and write the output files.
  • I can joinall on the pool (a rough sketch of this structure follows below).
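
Here is a minimal sketch of the structure I have in mind, using a sentinel object to shut the stages down in order. The file names, the Mongo database/collection names, the 'key' field and the per-row condition are placeholders, not my real code:

```python
from gevent import monkey; monkey.patch_all()

import csv
import gevent
from gevent.pool import Pool
from gevent.queue import Queue
from pymongo import MongoClient

read_queue = Queue(maxsize=10000)   # rows waiting for the Mongo lookup
write_queue = Queue(maxsize=10000)  # processed rows waiting to be written
DONE = object()                     # sentinel meaning "no more items"

def reader(path):
    with open(path, newline='') as f:
        for row in csv.reader(f):
            read_queue.put(row)     # blocks when the queue is full (back-pressure)

def processor(collection):
    while True:
        row = read_queue.get()
        if row is DONE:
            break
        doc = collection.find_one({'key': row[0]})  # indexed lookup, placeholder filter
        if doc is not None:                         # the "minor condition" goes here
            write_queue.put((row, doc))

def writer(path):
    with open(path, 'w', newline='') as f:
        out = csv.writer(f)
        while True:
            item = write_queue.get()
            if item is DONE:
                break
            row, doc = item
            out.writerow(row + [doc.get('value')])

collection = MongoClient()['mydb']['mycollection']
io_pool = Pool(20)

readers = [gevent.spawn(reader, p) for p in ('in1.csv', 'in2.csv', 'in3.csv')]
procs   = [io_pool.spawn(processor, collection) for _ in range(20)]
writers = [gevent.spawn(writer, p) for p in ('out1.csv', 'out2.csv', 'out3.csv')]

gevent.joinall(readers)      # 1. wait until every reader has finished
for _ in procs:
    read_queue.put(DONE)     # 2. one sentinel per processor
io_pool.join()               # 3. wait until every row has been processed
for _ in writers:
    write_queue.put(DONE)    # 4. one sentinel per writer
gevent.joinall(writers)      # 5. writers exit only after all reads/lookups are done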

Now, can I keep a queue timeout in the IO processors and the writers to ensure that all of the readers have put their complete data into the queue? Or is it possible to put a join on the readers at the end of the IO processors?
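
This is roughly what I mean by the timeout option, reusing read_queue, write_queue and the readers list from the sketch above; the 5-second timeout is an arbitrary guess:

```python
from gevent.queue import Empty

def processor_with_timeout(collection):
    while True:
        try:
            row = read_queue.get(timeout=5)
        except Empty:
            if all(r.dead for r in readers):  # queue drained AND all readers finished
                break
            continue                          # readers still running, keep waiting
        doc = collection.find_one({'key': row[0]})
        if doc is not None:
            write_queue.put((row, doc))
```

I am not sure whether this polling approach is better or worse than joining the readers and pushing sentinels as in the first sketch.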

In short, I want to learn whether there is an optimal approach to perform this task efficiently.

