BGW Consortium Day at Watson
Objective:
Create an opportunity for members of the Consortium to run both Applications and Computer Science projects on a 20-rack Blue Gene system.
- Potential to do breakthrough science
- Potential for improving scaling, performance and software development
Timing:
Quarterly
- First session October 26 & 27, 2005
- Second session March 29 & 30, 2006
- Third session July 19 & 20, 2006
- Fourth session October 18 & 19, 2006
- Fifth session March 21 & 22, 2007
- Sixth session July 25 & 26, 2007
- Seventh session October 31 & November 1, 2007
Schedule for the seventh session:
- Call for nominations 09/18/07
- Final submission date 10/02/07
- Selection of candidates 10/05/07
- User preparations begin 10/22/07
- Applications Day 10/31/07
- Computer Science 11/01/07
Opportunity:
IBM will provide two days of access to Blue Gene Watson for Application scaling runs and Computer Science projects at IBM’s Watson Research Center. (Blue Gene at Watson: 114 Tflops, 20 racks.)
Day 1 (October 31) is for Application scaling runs; day 2 (November 1) is for Computer Science projects.
Candidate selection:
Up to four proposals each will be selected for the Applications and the Computer Science sessions.
Time Line:
Approximately ten days prior to the user’s session, the user will be contacted by an IBM advisor to undertake workshop preparation. The user’s programs and data will be downloaded onto the user’s BGW account; the user’s storage and I/O needs will be verified. As needed, other preparations will be undertaken to ensure that the user is ready to run on the day of the event.
On the day of the event, users will begin with test runs that scale their programs to run on 1, 4, and 16 racks. Users will be on-site at IBM’s Watson Research Center where IBM advisors will be present to assist with any problems encountered. All test runs will be completed during the morning of the user’s session. Assuming the test runs are successful, each candidate will then be given a three-hour allocation on 20 racks.
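One way to summarize the morning test runs before the 20-rack allocation is a strong-scaling efficiency table. The sketch below is illustrative only: the function name and the timings are hypothetical, and it assumes the same problem size is run at each rack count, with the smallest run as the baseline.

```python
def scaling_efficiency(timings):
    """Return {racks: efficiency} relative to the smallest rack count.

    timings maps rack count -> wall-clock seconds for the same problem size.
    An efficiency of 1.0 means the run sped up in proportion to the racks added.
    """
    if not timings:
        raise ValueError("timings must not be empty")
    base_racks = min(timings)
    base_seconds = timings[base_racks]
    result = {}
    for racks, seconds in sorted(timings.items()):
        speedup = base_seconds / seconds   # measured speedup over the baseline
        ideal = racks / base_racks         # speedup if scaling were perfect
        result[racks] = speedup / ideal
    return result


# Hypothetical wall-clock times (seconds) for the 1-, 4-, and 16-rack test runs.
example_timings = {1: 1600.0, 4: 420.0, 16: 125.0}
for racks, eff in scaling_efficiency(example_timings).items():
    print(f"{racks:2d} racks: efficiency {eff:.2f}")
```

A sharp drop in efficiency between two rack counts is the kind of problem worth raising with the on-site advisors before the 20-rack run.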
Candidates who are successful on the 20-rack runs will be given an opportunity to apply for additional time. Proposals for the additional time will be submitted after the initial runs are complete.
Restrictions:
All BGW Days participants must have a signed and executed BGW Access Agreement in place to participate; the participant’s employer and IBM will be the signatories of this agreement. Sample agreements are available from Fred Mintzer, mintzer@us.ibm.com.
IBM is not able to accept any ITAR (International Traffic in Arms Regulations) materials onto BGW. See http://www.pmddtc.state.gov/itar_index.htm . Proposals which would entail placing ITAR materials on BGW will be rejected; please do not submit any such proposal.
Applications:
1. Proposers must have attended an Applications workshop and ported their codes to BG.
- Candidates may also qualify by attending the SDSC, NIWS or Edinburgh workshops.
- Members who have not attended these sessions must have ported and run their codes on one of the Consortium systems (ANL, SDSC, LLNL, NIWS, Edinburgh, Astron, NCAR).
2. Proposers fill out the Applications proposal form and submit it for peer review.
3. Peer review of proposals to select four Application proposals for the runs on BGW.
* User must run on the ANL 1000-node system to demonstrate that their codes have been ported properly and will run.
* User will be expected to sign an end user agreement with IBM.
* Users must agree to publish the results. This responsibility includes preparing a short report that may be posted on the BG/L Consortium’s web site.
Applications Form: http://www.bgconsortium.org/bgwconsortium.htm
Computer Science:
1. Proposers fill out the Computer Science proposal form.
2. Peer review to select four Computer Science proposals for operation on BGW.
* Projects can be for scaling, performance (I/O, file systems, MPI, etc.) and for tools (TAU, PETSc, Jumpshot, etc.).
* User must run on the ANL 1000-node system to demonstrate that their codes have been ported properly and will run.
* User will be expected to sign an end user agreement with IBM.
* Users must agree to publish the results. This responsibility includes preparing a short report that may be posted on the BG/L Consortium’s web site.
Computer Science Applications Form: http://www.bgconsortium.org/bgwconsortium.htm
Peer review:
The peer review committee will consist of ANL and IBM employees, equally represented.
Project Assistance:
Once proposals are chosen, an advisor will be assigned to each selected participant. The advisors will be from Argonne National Laboratory and will be supported by IBM advisors, depending on the Application or Computer Science project chosen.
Access to IBM system:
For remote BGW access, the user will need to join a knowledgeable IBM employee in that employee’s office. The employee will provide a workstation with software prepared for BGW access and will provide consulting on access to, and use of, BGW. The community of knowledgeable BGW users, remote to IBM Watson, is small; opportunities for remote access are similarly small.
Access to BGW will require the user to have three different IBM user accounts: one for security, one for remote connection and one for BGW usage. IBM will set up the accounts in advance.
Data may be entered through the employee’s workstation if it is on DVD, then transferred into the user’s BGW workspace. Alternately, data may be transferred into the user’s BGW workspace via a staging server; a knowledgeable IBM employee will be required to do the transfer. This should be done in advance of the workshop to avoid any transfer bottlenecks.
For local BGW access at IBM’s Watson Research Center, the user will join a knowledgeable IBM employee in the employee’s office. The employee will provide a workstation with software prepared for BGW access and will provide consulting on access to, and use of, BGW.
Access to BGW will require the user to have two different IBM user accounts: one for security and one for BGW usage. IBM will set up the accounts in advance.
Data may be entered through the employee’s workstation if it is on DVD, then transferred into the user’s BGW workspace. Alternately, data may be transferred into the user’s BGW workspace via a staging server; a knowledgeable IBM employee will be required to do the transfer. This should be done in advance of the workshop to avoid any transfer bottlenecks.
Local access at Watson is strongly preferred. However, please note that IBM provides no travel support to BGW Days participants.
The following is the IBM BGW system configuration:
Hardware:
Blue Gene Watson (BGW) is a twenty-rack Blue Gene system located at IBM’s Thomas J. Watson Research Center in Yorktown Heights, NY. There are 1024 nodes per rack, with each node containing two (2) 700 MHz processors and 512 MB of memory. Each rack has 16 I/O nodes.
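The per-rack figures above can be rolled up into system totals, which is useful when sizing a job. The sketch below is a back-of-the-envelope calculation, not an official specification: the constant and function names are illustrative, and the value of 4 floating-point operations per cycle per processor is an assumption that does not appear in this document (it is chosen because it reproduces the quoted figure of about 114 Tflops).

```python
RACKS = 20
NODES_PER_RACK = 1024
PROCESSORS_PER_NODE = 2
CLOCK_HZ = 700e6              # 700 MHz per processor
MEMORY_PER_NODE_MB = 512
IO_NODES_PER_RACK = 16
FLOPS_PER_CYCLE = 4           # ASSUMPTION: not stated in this document


def bgw_totals(racks=RACKS):
    """Aggregate node, processor, memory and peak-flops totals for `racks` racks."""
    nodes = racks * NODES_PER_RACK
    processors = nodes * PROCESSORS_PER_NODE
    return {
        "nodes": nodes,
        "processors": processors,
        "io_nodes": racks * IO_NODES_PER_RACK,
        # 1 TB taken as 1024 * 1024 MB
        "memory_tb": nodes * MEMORY_PER_NODE_MB / (1024 * 1024),
        "peak_tflops": processors * CLOCK_HZ * FLOPS_PER_CYCLE / 1e12,
    }


# Totals for the rack counts used in the test runs and the full-system run.
for racks in (1, 4, 16, 20):
    print(racks, bgw_totals(racks))
```

Under these assumptions the full 20-rack system has 20,480 nodes, 40,960 processors and 10 TB of aggregate memory.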
BGW has 32 TB of storage for user-generated data. IBM will earmark 4 TB of storage for workshop attendees. However, since the remote data access is bandwidth-limited, IBM recommends each user limit data in or out to 20 GB. Storage used by workshop attendees will be reclaimed when the workshop has completed.
The network interface supported for applications is MPI.
BGW can only be accessed through IBM’s internal computer network, which is managed to strict security guidelines; external access must use approved IBM software. The bandwidth for remote access into the local network that supports BGW is 1 Gb/second.
The front-end nodes for compiling and job scheduling are IBM PowerPC servers running Linux.
BGW standard software includes IBM Linux CNK, IBM C/C++ and Fortran compilers and run-time facility, MPI (MPICH), the GNU toolchain, and POSIX-compliant system calls.
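To plan data staging against the recommended 20 GB limit and the remote-access bandwidth quoted above, a rough transfer-time estimate can help. The sketch below rests on stated assumptions: the quoted bandwidth is read as 1 gigabit per second, 1 GB is taken as 10^9 bytes, and the `utilization` parameter (the fraction of the link actually achieved) is hypothetical. Real transfers through the staging server may be considerably slower.

```python
def transfer_seconds(gigabytes, link_gbit_per_s=1.0, utilization=1.0):
    """Estimate seconds needed to move `gigabytes` of data over the link.

    ASSUMPTIONS: bandwidth is in gigabits per second, 1 GB = 1e9 bytes,
    and `utilization` is the fraction of the link the transfer achieves.
    """
    if gigabytes < 0:
        raise ValueError("gigabytes must be non-negative")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = gigabytes * 1e9 * 8
    return bits / (link_gbit_per_s * 1e9 * utilization)


# Recommended per-user limit at full line rate and at a hypothetical 25% of it.
print(f"20 GB at line rate:   {transfer_seconds(20) / 60:.1f} minutes")
print(f"20 GB at 25% of rate: {transfer_seconds(20, utilization=0.25) / 60:.1f} minutes")
# The full 4 TB workshop allotment, for comparison.
print(f"4 TB at line rate:    {transfer_seconds(4000) / 3600:.1f} hours")
```

Even at full line rate, moving the whole 4 TB workshop allotment would take hours, which is consistent with the advice to keep transfers small and to stage data in advance.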
Administrative Detail:
Information about IBM’s Watson Research Center (directions, lodging and eating) is given on its web site: http://www.watson.ibm.com/general_info_ykt.shtml . Use the information for the Yorktown facility; this is where BGW is located.
- The closest hotel is the Mount Kisco Holiday Inn.
- Contact Fred Mintzer if additional information is required.