MultiGrid Interoperation TDDFT/Ninf-G Experiment
- TDDFT/Ninf-G Overview
- TDDFT Requirements
- MultiGrid Resources
- Account application
- The user already had a PRAGMA account before the Multi-Grid experiment.
- The user requested a TeraGrid account on the Web.
- The account was created only on the UC/ANL site, for the demo, without the formal process.
- Deployment
- The user wrote the system requirements for the application.
- The GOC sent the requirements to the PRAGMA sites and to UC/ANL.
- After being notified, the user confirmed that the requirements were met on each site.
- The user uploaded an input data file (212 MB) to each site.
- The user uploaded the application program, written in Fortran 90.
- The user compiled the program on each site.
- Test on each site
- Application level test
- The user checked that the program ran correctly on a single node.
- Local scheduler level test
- The user tested a simple job submission to a local scheduler.
- Globus level test
- The user tested a simple GRAM job submission from the client node to the cluster.
- Ninf-G level test
- The user tested the TDDFT/Ninf-G/Globus application using the client node at AIST and a single node on the site.
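One way to read the four test levels is as a ladder: each level depends on the ones listed before it (application, local scheduler, Globus, Ninf-G), so a problem is easiest to localize by running them in that order and stopping at the first failure. The Python sketch below illustrates only that ordering. It assumes each level is wrapped as a zero-argument check function; the name `first_failing_level` and the check bodies are hypothetical and are not part of Ninf-G or Globus.

```python
from typing import Callable, List, Optional, Tuple

# A check returns True if its level works on the site.
# Hypothetical: on a real site the checks would run the program on one node,
# submit a job to the local scheduler, submit a simple GRAM job, and run a
# one-server Ninf-G call from the AIST client.
Check = Callable[[], bool]


def first_failing_level(levels: List[Tuple[str, Check]]) -> Optional[str]:
    """Run the checks in order and return the name of the first level that
    fails, or None if all pass. Later checks are skipped after a failure,
    since each level relies on the ones before it."""
    for name, check in levels:
        if not check():
            return name
    return None


if __name__ == "__main__":
    levels = [
        ("application", lambda: True),
        ("local scheduler", lambda: True),
        ("Globus", lambda: False),  # pretend the GRAM submission failed
        ("Ninf-G", lambda: True),
    ]
    print(first_failing_level(levels))
```

Stopping early means a Globus failure is not misreported as a Ninf-G problem.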
- Experiments
- The client always ran on ume.hpcc.jp (AIST).
- Target molecule for TDDFT: ligand-protected Au_13
- 122 tasks per loop
- Input data per task: 4.87 MB
- Output data per task: 3.25 MB
- 2/10 10:34 JST - 10:56 (22 minutes)
- 6 servers: 4 SDSC, 2 UC/ANL, 2 AIST
- 1220 successful RPCs and 2 errors
- The server programs on UC/ANL went down after 15 minutes because the user had not set max_wall_time.
- 2/11 00:06 JST - 2/16 01:13 (5 days, 1 hour, and 7 minutes)
- 14 servers: 6 SDSC, 6 UC/ANL, 2 AIST
- 464162 successful RPCs and 6 errors
- The UC/ANL server jobs were queued for three and a half hours at the beginning.
- Ninf-G's job_startTimeout was disabled.
- The server programs on UC/ANL went down after running for 50 hours because they reached max_wall_time.
- 8/16 17:44 JST - 8/17 02:11 (8 hours and 27 minutes)
- 4 servers: 4 OSG
- 1220 successful RPCs and no errors
- This was a test run on the OSG cluster and the result was successful.
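The experiment figures support some quick arithmetic: the data moved per loop follows from the per-task sizes, and each run's duration and throughput follow from its timestamps and RPC count. The Python sketch below computes these. The slides give no year, so `ASSUMED_YEAR` is a placeholder; none of the intervals crosses February 29, so its value does not affect the results. All function and constant names are illustrative.

```python
from datetime import datetime

# Figures from the slides. ASSUMED_YEAR is a placeholder (the slides omit the year).
ASSUMED_YEAR = 2001
TASKS_PER_LOOP = 122
INPUT_MB_PER_TASK = 4.87
OUTPUT_MB_PER_TASK = 3.25

# (label, start JST, end JST, successful RPCs)
RUNS = [
    ("2/10", "02-10 10:34", "02-10 10:56", 1220),
    ("2/11-2/16", "02-11 00:06", "02-16 01:13", 464162),
    ("8/16-8/17", "08-16 17:44", "08-17 02:11", 1220),
]


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two 'MM-DD HH:MM' timestamps in the assumed year."""
    fmt = "%Y-%m-%d %H:%M"
    t0 = datetime.strptime(f"{ASSUMED_YEAR}-{start}", fmt)
    t1 = datetime.strptime(f"{ASSUMED_YEAR}-{end}", fmt)
    return int((t1 - t0).total_seconds() // 60)


def per_loop_transfer_mb() -> tuple:
    """Return (input MB, output MB) moved for one loop of 122 tasks."""
    return (TASKS_PER_LOOP * INPUT_MB_PER_TASK,
            TASKS_PER_LOOP * OUTPUT_MB_PER_TASK)


if __name__ == "__main__":
    inp, out = per_loop_transfer_mb()
    print(f"per loop: {inp:.2f} MB in, {out:.2f} MB out")
    for label, start, end, rpcs in RUNS:
        minutes = minutes_between(start, end)
        print(f"{label}: {minutes} min, {rpcs / (minutes / 60):.0f} successful RPCs/hour")
```

The computed durations (22, 7267, and 507 minutes) match the slides' 22 minutes; 5 days, 1 hour, and 7 minutes; and 8 hours and 27 minutes. One loop moves about 594 MB of input and about 397 MB of output.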