Start Multi-Grid Interoperation Experiment with TDDFT Run: Lessons Learned
Many scientists work across different grid projects, and their applications should not be restricted by grid boundaries. Expanding collaboration among grids is important and beneficial to the global scientific community. But how can different grids interoperate, and how can we make it easier for scientists to use multiple grids? To learn the issues and solutions, PRAGMA and TeraGrid took the first step in a multi-grid interoperation experiment, under the umbrella of the Multi-grid Interoperation activity (https://forge.gridforum.org/projects/mgi; charter document by C. Catlett and M. Satsuoka) in the Global Grid Forum.
As a principle within PRAGMA, we let applications drive interoperation. After some discussions between PRAGMA and TeraGrid, we selected TDDFT (Time-Dependent Density Functional Theory), a quantum chemistry application, as the first application to run across PRAGMA Grid and TeraGrid.
Within a week, we were able to start a TDDFT run on 4 heterogeneous sites across both grids, thus achieving interoperation. Our experience shows that a level of interoperability is neither automatic nor unattainable.
Through this experiment, we have learned many valuable lessons as grid infrastructure supporters and grid application users. We also gained useful insights as middleware developers, which will help to improve and advance future grid middleware development. In addition, the Multi-Grid Testbed brought different grids together to work with and learn from each other. All grids involved benefited greatly from this experiment.
We summarize our experiences and lessons learned in 3 sections below: People Involved, Process and Time, and Lessons Learned. For more details about the Multi-Grid Interoperation testbed and experiment, please see http://pragma-goc.rocksclusters.org/pragma-doc/multigrid.html.
We envision this as the first step to broader interoperation experiences with production grids involved in the Multi-grid Interoperation activity. We believe that the lessons we learned in this first step will help us more easily engage other grids.
People Involved
PRAGMA Grid
UCSD/SDSC, USA: Peter Arzberger, Phil Papadopoulos, Mason Katz
AIST, Japan: Yoshio Tanaka, Yusuke Tanimura
KU, Thailand: Putchong Uthayopas, Somsak Sriprayoonsakul
TeraGrid
ANL, USA: Charlie Catlett, Dane Skow, JP Navarro
Process and Time
How did we start and conduct this experiment? Here are the steps and time frames:
01/27/06 - 01/31/06: Application drivers prepare and publish application requirements
02/03/06 - 02/04/06: Application drivers apply for user accounts on each grid
01/31/06 - 02/04/06: Each site sets up user accounts
01/31/06 - 02/08/06: Each site implements the application requirements
02/01/06 - 02/08/06: Application drivers test user account access on each site
02/01/06 - 02/08/06: Application drivers deploy then test the application on each site
02/09/06 - 02/09/06: Application drivers start the application run (Interoperation)
02/09/06 - : Start discussion and deployment of grid monitoring software (SCMSWeb) for cross-grid monitoring
Lessons Learned
Human Communication
Issues: Team members reside in 3 different time zones with up to 17 hours of time difference, so real-time communication is difficult and costly.
Solutions: Change working hours (U.S. team members work at night). Use Skype and email as the main communication tools.
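To make the scheduling problem concrete, here is a minimal Python sketch. It assumes the three zones are UTC-8 (U.S. West Coast in February), UTC+7 (Thailand), and UTC+9 (Japan); the source does not list the zones, so these offsets and the 9:00 to 17:00 working day are assumptions chosen for illustration.

```python
from datetime import date, datetime, timedelta, timezone
from itertools import combinations

# Assumed fixed UTC offsets (not given in the source); no daylight saving in February.
SITES = {
    "USA (UCSD/SDSC)": timezone(timedelta(hours=-8)),
    "Thailand (KU)": timezone(timedelta(hours=7)),
    "Japan (AIST)": timezone(timedelta(hours=9)),
}


def largest_gap_hours(sites, when):
    """Largest pairwise difference between the sites' UTC offsets, in hours."""
    return max(
        abs((a.utcoffset(when) - b.utcoffset(when)).total_seconds()) / 3600
        for a, b in combinations(sites.values(), 2)
    )


def shared_working_hours(sites, day, start=9, end=17):
    """UTC hours on `day` at which every site is inside its local working day."""
    shared = []
    for hour in range(24):
        instant = datetime(day.year, day.month, day.day, hour, tzinfo=timezone.utc)
        if all(start <= instant.astimezone(tz).hour < end for tz in sites.values()):
            shared.append(hour)
    return shared


day = date(2006, 2, 6)
noon_utc = datetime(2006, 2, 6, 12, tzinfo=timezone.utc)
print(largest_gap_hours(SITES, noon_utc))   # 17.0
print(shared_working_hours(SITES, day))     # []
```

Under these assumptions there is no hour at which all three sites are within a normal working day, which is why one team had to shift its hours.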
Trust and Access
Issues: Trust between different certificate authorities is not readily established, and formally changing the policy may take time.
Solution: Establish localized trust as a temporary solution.
Lessons: IGTF needs to take effect for all the major grid operations: not just trust agreements at a high level, but an easy way for all trusting entities to install all IGTF-trusted certificates. A grid that is interested in interoperating with another grid should make sure that its trust policy and practice allow it to trust the other grid.
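One way a site could verify that the other grid's certificate authorities are actually installed is sketched below in Python. It assumes the Globus-style layout in which each trusted CA appears in a certificates directory (commonly /etc/grid-security/certificates) as `<hash>.0` plus `<hash>.signing_policy`. The function name and the hash values in the usage comment are hypothetical; the source does not describe how trust was checked.

```python
from pathlib import Path


def missing_ca_files(cert_dir, ca_hashes, suffixes=(".0", ".signing_policy")):
    """Return the CA file names expected in cert_dir that are not present.

    ca_hashes: iterable of CA subject-hash strings the site needs to trust.
    """
    cert_dir = Path(cert_dir)
    missing = []
    for ca_hash in ca_hashes:
        for suffix in suffixes:
            candidate = cert_dir / f"{ca_hash}{suffix}"
            if not candidate.is_file():
                missing.append(candidate.name)
    return missing


# Example call (placeholder hashes, not real CA hashes):
# missing_ca_files("/etc/grid-security/certificates", ["aaaa1111", "bbbb2222"])
```

An empty result means every listed CA has both its certificate and its signing policy in place; anything else is the list of files still to install.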
Application Requirements
Issues: Software stacks and versions differ among grids and from application requirements. As the pool of interoperable grids grows, we are likely to run into conflicting requirements between grid applications.
Solution: Applications should minimize assumptions and prerequisites, or bundle their prerequisites as much as possible. On the other hand, software stack providers need to adapt their stacks to multi-grid environments: do not assume Linux, RPM, or only one version of a package at a time, and try to meet as many requirements as possible.
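A published requirements list can be checked against each site before deployment. The Python sketch below is illustrative only: the package names and versions are made up, and a site is modeled as allowing several versions of a package side by side, in line with the recommendation above.

```python
def check_requirements(required, installed):
    """Compare application requirements with one site's software stack.

    required:  {package: set of acceptable versions, or None for "any version"}
    installed: {package: set of versions available on the site}
    Returns {package: problem description}; empty when the site qualifies.
    """
    problems = {}
    for package, accepted in required.items():
        available = installed.get(package, set())
        if not available:
            problems[package] = "missing"
        elif accepted is not None and not (available & accepted):
            problems[package] = (
                f"needs one of {sorted(accepted)}, found {sorted(available)}"
            )
    return problems


# Hypothetical requirements and site stacks, for illustration only.
required = {"example-middleware": {"2.4"}, "example-compiler": None}
site_a = {"example-middleware": {"2.3", "2.4"}, "example-compiler": {"8.1"}}
site_b = {"example-middleware": {"2.3"}}

print(check_requirements(required, site_a))  # {}
print(check_requirements(required, site_b))
```

Running such a check per site gives system administrators a concrete to-do list instead of discovering gaps during deployment.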
Issues: For new applications, users may learn new details about what their applications really need during the deployment process. A new grid environment may require modification and clarification of the application requirements.
Solution: Provide committed support, ready to install and upgrade software as needed. Keep close communication between the system administrators and users, to understand how the application will work in a different grid environment so that the software is deployed correctly.
Differences among Grids
Issues: Different system configurations and software stacks present difficulties for users and middleware developers alike, and may cause problems for some applications.
Sample cases:
"job_type = multiple" behaves differently (solved via testing)
Calculation results are different (under investigation)
Different count attributes are required on different grids (solved by modifying middleware code; we have the middleware developer on the team)
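The count-attribute case above can be handled by keeping one base job description and applying per-site overrides when the job request is generated. The Python sketch below renders a dictionary as an RSL-style string of the form `&(name=value)...`. The site names, the executable path, and the choice of `count` versus `host_count` are hypothetical, since the source does not say which attributes each grid required, and this is not the middleware change the team actually made.

```python
def build_rsl(attributes):
    """Render a dict as an RSL-style conjunction: &(name=value)(name=value)...

    Values containing whitespace are wrapped in double quotes; other special
    characters are not handled in this sketch.
    """
    parts = []
    for name, value in attributes.items():
        text = str(value)
        if any(ch.isspace() for ch in text):
            text = '"' + text.replace('"', '""') + '"'
        parts.append(f"({name}={text})")
    return "&" + "".join(parts)


# Hypothetical base job and per-site differences, for illustration only.
BASE_JOB = {"executable": "/path/to/app", "job_type": "multiple", "count": 4}
SITE_OVERRIDES = {
    "site-a": {},                 # base description is enough
    "site-b": {"host_count": 2},  # this site also needs a host count
}


def rsl_for_site(site):
    """Merge the base job with one site's overrides and render it."""
    merged = {**BASE_JOB, **SITE_OVERRIDES.get(site, {})}
    return build_rsl(merged)


for site in SITE_OVERRIDES:
    print(site, rsl_for_site(site))
```

Keeping the differences in one table makes them visible to users and gives middleware developers a single place to extend when a new grid joins.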
Solution: Provide timely help and support to aid users with application deployment and execution.
Lessons Learned: Users need help initially to learn the main differences. Further collaborative research and development is required to extend and standardize grid interfaces to meet broader requirements and to enable interoperation among different grids.
Issues: Different grid monitoring software and lack of interfaces among them prevent easy cross-grid monitoring.
Lessons Learned: Further collaborative research and development is required to build and standardize a grid monitoring interface, in order to bridge different grid monitoring software and to enable interoperation among different grids.