|
1
|
- Plan only as far as PRAGMA9 for now
- Assume there will be 4 applications in total
- Biogrid, QMMD, Savannah, NewApplication
- If your compute nodes have only private IP addresses, exclude Biogrid
(3 applications in total)
- Divide the compute nodes equally among the applications
- Use ACLs (user access lists) to dedicate nodes to each application
- Give the Biogrid application driver the names of its dedicated nodes (so
it can submit jobs to those nodes directly)
- Until NewApplication is ready to run, attach the ACLs of all the
fault-tolerant applications to the nodes allocated for
NewApplication
- See the illustrated example of the initial setup on the next slide
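
The allocation rules above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not a scheduler configuration: the function names, the 12-queue list rocks-53.q to rocks-64.q, and the use of application names as ACL names are all assumptions made for the example. Treating QMMD and Savannah as the fault-tolerant applications is inferred from the reallocation slide, not stated here.

```python
def allocate_nodes(nodes, apps):
    """Split nodes into equal contiguous shares, one share per application.

    Returns (allocation, unassigned); unassigned holds any nodes left over
    when the node count is not a multiple of the application count.
    """
    share = len(nodes) // len(apps)
    if share == 0:
        raise ValueError("fewer nodes than applications")
    allocation = {
        app: nodes[i * share:(i + 1) * share] for i, app in enumerate(apps)
    }
    return allocation, nodes[len(apps) * share:]


def initial_acls(allocation, pending_app, fault_tolerant_apps):
    """Map each node to the set of ACLs attached to it at initial setup.

    Nodes reserved for pending_app carry the ACLs of the fault-tolerant
    applications until pending_app is ready to run.
    """
    acls = {}
    for app, app_nodes in allocation.items():
        for node in app_nodes:
            if app == pending_app:
                acls[node] = set(fault_tolerant_apps)
            else:
                acls[node] = {app}
    return acls


# Hypothetical cluster with 12 queues, rocks-53.q through rocks-64.q.
apps = ["Biogrid", "QMMD", "Savannah", "NewApplication"]
nodes = [f"rocks-{n}.q" for n in range(53, 65)]

allocation, unassigned = allocate_nodes(nodes, apps)
acls = initial_acls(allocation, "NewApplication", ["QMMD", "Savannah"])

for node in nodes:
    print(node, sorted(acls[node]))
```

With these assumed inputs each application gets 3 queues, and the 3 queues held for NewApplication are open to QMMD and Savannah in the meantime. A site with only private IP addresses would pass the 3-application list instead.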
|
|
2
|
|
|
3
|
- Remove QMMD and Savannah ACLs from rocks-62.q, rocks-63.q, rocks-64.q
- Attach the NewApplication ACL to these 3 nodes (queues)
- Kill any jobs still running on these nodes (queues)
- See the changed setup on the next slide
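
The handover steps above can be sketched as a small Python function. It only models the ACL bookkeeping and does not call any scheduler command. The function name, the job IDs, and the job-to-queue mapping are hypothetical, and using application names as ACL names is an assumption.

```python
def hand_over(acls, nodes, old_apps, new_app, running_jobs):
    """Reassign nodes from old_apps to new_app.

    acls:         dict mapping node name to a set of ACL names
    running_jobs: dict mapping job ID to the node it is running on
    Returns (updated_acls, jobs_to_kill) without modifying the inputs.
    """
    updated = {node: set(names) for node, names in acls.items()}
    for node in nodes:
        updated[node] -= set(old_apps)   # step 1: remove the old ACLs
        updated[node].add(new_app)       # step 2: attach the new ACL
    # Step 3: any job still running on a reassigned node must be killed.
    jobs_to_kill = [job for job, node in running_jobs.items() if node in nodes]
    return updated, jobs_to_kill


# ACL state before the handover (rocks-61.q is included for contrast).
acls = {
    "rocks-61.q": {"Savannah"},
    "rocks-62.q": {"QMMD", "Savannah"},
    "rocks-63.q": {"QMMD", "Savannah"},
    "rocks-64.q": {"QMMD", "Savannah"},
}
# Hypothetical running jobs: job ID -> queue.
running_jobs = {"job-101": "rocks-62.q", "job-102": "rocks-61.q"}

updated, jobs_to_kill = hand_over(
    acls,
    ["rocks-62.q", "rocks-63.q", "rocks-64.q"],
    ["QMMD", "Savannah"],
    "NewApplication",
    running_jobs,
)
print(updated)
print(jobs_to_kill)
```

Only the job on a reassigned queue is selected for killing; the job on rocks-61.q is left alone because that queue keeps its Savannah ACL.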
|
|
4
|
|
|
5
|
- I think this should work for all our current applications. If you see
any problems, please raise them for discussion.
- I think this can be implemented on our clusters (rocks-52.sdsc.edu and
rocks-47.sdsc.edu). Can your site implement this? If not, is there an
alternative setup that can accomplish the same effect?
- All comments are welcome. Thanks! :)
|