Compilers I
Mathematical Applications
Compilers II
Signal and Image Processing
Collective Communication
Memory Hierarchy and I/O
Algorithms I
Operating Systems and Scheduling
Algorithms II
Multiprocessor Performance Evaluation
Databases and Sorting
Performance Prediction and Evaluation
Software Distributed Shared Memory
Scientific Simulation
Fault Tolerance
Performance and Debugging Tools
Distributed Systems
Industrial Track - Reconfigurable Systems
Industrial Track - Environments, Tools, and Evaluation Methods

Keynote Speakers:
David E. Culler, University of California at Berkeley - What Is So Different About Cluster Architectures?
Jim Gray, Microsoft Research - Parallel Data Access and Parallel Execution in a World of CyberBricks
Greg Papadopoulos, Sun Microsystems - The Future of Scalable Systems: The Interplay of Architecture and Management

Panel Discussion:
Data Intensive vs. Scientific Computing: Will the Twain Meet for Parallel Processing?
Moderator: Vipin Kumar, University of Minnesota


Nearly Optimal Algorithms for Broadcast on d-Dimensional All-Port and Wormhole-Routed Torus
Jyh-Jong Tsay, Wen-Tsong Wang, National Chung Cheng University

Minimizing Total Communication Distance of a Time-Step Optimal Broadcast in Mesh Networks
Songluan Cang, Jie Wu, Florida Atlantic University

Hiding Communication Latency in Data Parallel Applications
Vivek Garg, David E. Schimmel, Georgia Institute of Technology

Protocols for Non-Deterministic Communication over Synchronous Channels
Erik D. Demaine, University of Waterloo

Broadcast-Efficient Algorithms on the Coarse-Grain Broadcast Communication Model with Few Channels
Koji Nakano, Nagoya Institute of Technology, Stephan Olariu, James L. Schwing, Old Dominion University

Optimal All-to-Some Personalized Communication on Hypercubes
Y. Charlie Hu, Rice University

Compilers I

Compiler Optimization of Implicit Reductions for Distributed Memory Multiprocessors
Bo Lu, John Mellor-Crummey, Rice University

Local Enumeration Techniques for Sparse Algorithms
Gerardo Bandera, Pablo P. Trabado, Emilio L. Zapata, University of Malaga - Campus of Teatinos

Optimizing Data Scheduling on Processor-In-Memory Arrays
Yi Tian, Edwin H.-M. Sha, Chantana Chantrapornchai, Peter M. Kogge, University of Notre Dame

An Expression-Rewriting Framework to Generate Communication Sets for HPF Programs with Block-Cyclic Distribution
Gwan-Hwan Hwang, Jenq Kuen Lee, National Tsing-Hua University

A Generalized Framework for Global Communication Optimization
M. Kandemir, Syracuse University, P. Banerjee, A. Choudhary, Northwestern University, J. Ramanujam, Louisiana State University, N. Shenoy, Northwestern University

Evaluation of Compiler and Runtime Library Approaches for Supporting Parallel Regular Applications
Dhruva R. Chakrabarti, Northwestern University, Antonio Lain, Hewlett Packard Labs, Prithviraj Banerjee, Northwestern University

Mathematical Applications

Preliminary Results from a Parallel MATLAB Compiler
Michael J. Quinn, Alexey Malishevsky, Nagajagadeswar Seelam, Yan Zhao, Oregon State University

Jacobi Orderings for Multi-Port Hypercubes
Dolors Royo, Antonio Gonzalez, Miguel Valero-Garcia, Universitat Politecnica de Catalunya

Automatic Differentiation for Message-Passing Parallel Programs
Paul Hovland, Christian Bischof, Argonne National Laboratory

Processor Lower Bound Formulas for Array Computations and Parametric Diophantine Systems
Peter Cappello, Omer Egecioglu, University of California at Santa Barbara

A Flexible Class of Parallel Matrix Multiplication Algorithms
John Gunnels, Calvin Lin, Greg Morrow, Robert van de Geijn, University of Texas at Austin

Caching-Efficient Multithreaded Fast Multiplication of Sparse Matrices
Peter D. Sulatycke, Kanad Ghose, State University of New York (Binghamton)


Permutation Capability of Optical Multistage Interconnection Networks
Yuanyuan Yang, University of Vermont, Jianchao Wang, GTE Laboratories, Yi Pan, University of Dayton

HIPIQS: A High-Performance Switch Architecture using Input Queuing
Rajeev Sivaram, Ohio State University, Craig B. Stunkel, IBM T.J. Watson Research Center, Dhabaleswar K. Panda, Ohio State University

On the Bisection Width and Expansion of Butterfly Networks
Claudson F. Bornstein, Carnegie Mellon University, Ami Litman, Technion, Bruce M. Maggs, Carnegie Mellon University, Ramesh K. Sitaraman, University of Massachusetts, Tal Yatzkar, Technion

Multiprocessor Architectures Using Multi-Hop Multi-OPS Lightwave Networks and Distributed Control
David Coudert, Afonso Ferreira, LIP ENS Lyon, Xavier Munoz, UPC

Distributed, Dynamic Control of Circuit-Switched Banyan Networks
Chuck Salisbury, Rami Melhem, University of Pittsburgh

A Case for Aggregate Networks
Raymond R. Hoare, Henry G. Dietz, Purdue University

Compilers II

An Enhanced Co-Scheduling Method Using Reduced MS-State Diagrams
R. Govindarajan, Supercomputer Education and Research Center and Indian Institute of Science, N.S.S. Narasimha Rao, Indian Institute of Science, E.R. Altman, IBM T.J. Watson Research Center, Guang R. Gao, University of Delaware

Predicated Software Pipelining Technique for Loops with Conditions
Dragan Milicev, Zoran Jovanovic, University of Belgrade

The Generalized Lambda Test
Weng-Long Chang, Chih-Ping Chu, Jesse Wu, National Cheng Kung University

Experimental Study of Compiler Techniques for Scalable Shared Memory Machines
Yunheung Paek, New Jersey Institute of Technology, David A. Padua, University of Illinois at Urbana-Champaign

Register-Sensitive Software Pipelining
Amod K. Dani, Indian Institute of Science, V. Janaki Ramanan, R. Govindarajan, Supercomputer Education and Research Center and Indian Institute of Science

Analyzing the Individual/Combined Effects of Speculative and Guarded Execution on a Superscalar Architecture
M. Srinivas, Silicon Graphics Inc., Alexandru Nicolau, University of California at Irvine

Signal and Image Processing

NOW Based Parallel Reconstruction of Functional Images
Frank Munz, T. Stephan, U. Maier, T. Ludwig, A. Bode, S. Ziegler, S. Nekolla, P. Bartenstein, M. Schwaiger, Nuklearmedizinische Klinik und Poliklinik des Klinikums rechts der Isar

An Improved Output-size Sensitive Parallel Algorithm for Hidden-Surface Removal for Terrains
Neelima Gupta, Sandeep Sen, Indian Institute of Technology (New Delhi)

Design, Implementation and Evaluation of Parallel Pipelined STAP on Parallel Computers
Alok Choudhary, Northwestern University, Wei-keng Liao, Donald Weiner, Pramod Varshney, Syracuse University, Richard Linderman, Mark Linderman, Air Force Research Laboratory

The VEGA Moderately Parallel MIMD, Moderately Parallel SIMD, Architecture for High Performance Array Signal Processing
Mikael Taveniku, Ericsson Microwave Systems AB and Chalmers University of Technology, Anders Ahlander, Ericsson Microwave Systems AB and Halmstad University, Magnus Jonsson, Halmstad University, Bertil Svensson, Halmstad University and Chalmers University of Technology

Medical Image Processing and Visualization on Heterogeneous Clusters of Symmetric Multiprocessors using MPI and POSIX Threads
Christoph Giess, Achim Mayer, Harald Evers, Hans-Peter Meinzer, Deutsches Krebsforschungszentrum

A Quantitative Code Analysis of Scientific Systolic Programs: DSP Vs. Matrix Algorithms
R. Sernec, BIA D.o.o., M. Zajc, J.F. Tasic, University of Ljubljana

Collective Communication

Tree-Based Multicasting in Wormhole-Routed Irregular Topologies
Ran Libeskind-Hadas, Dominic Mazzoni, Ranjith Rajagopalan, Harvey Mudd College

NoWait-RPC: Extending ONC RPC to a Fully Compatible Message Passing System
Thomas Hopfner, Franz Fischer, Georg Faerber, Technische Universitat Munchen

Efficient Barrier Synchronization Mechanism for the BSP Model on Message-Passing Architectures
Jin-Soo Kim, Soonhoi Ha, Chu Shik Jhon, Seoul National University

Performance and Experience with LAPI -- a New High-Performance Communication Library for the IBM RS/6000 SP
Gautam Shah, IBM Power Parallel Systems, Jarek Nieplocha, Pacific Northwest National Laboratory, Jamshed Mirza, Chulho Kim, IBM Power Parallel Systems, Robert Harrison, Pacific Northwest National Laboratory, Rama K. Govindaraju, Kevin Gildea, Paul DiNicola, Carl Bender, Pacific Northwest National Laboratory

Total-Exchange on Wormhole k-ary n-cubes with Adaptive Routing
Fabrizio Petrini, International Computer Science Institute

Managing Concurrent Access for Shared Memory Active Messages
Steven S. Lumetta, David E. Culler, University of California at Berkeley

Memory Hierarchy and I/O

Design and Implementation of a Parallel I/O Runtime System for Irregular Applications
Jaechun No, Syracuse University, Sung-soon Park, Anyang University, Jesus Carretero, Universidad Politecnica de Madrid, Alok Choudhary, Northwestern University, Pang Chen, Sandia National Laboratory

Using PI/OT to Support Complex Parallel I/O
Ian Parsons, Jonathan Schaeffer, Duane Szafron, Ron Unrau, University of Alberta

Code Transformations for Low Power Caching in Embedded Multimedia Processors
C. Kulkarni, IMEC, F. Catthoor, H. De Man, IMEC and Katholieke Universiteit Leuven

Memory Hierarchy Management for Iterative Graph Structures
Ibraheem Al-Furaih, Syracuse University, Sanjay Ranka, University of Florida

High-Performance External Computations Using User-Controllable I/O
Jang Sun Lee, A.I. Section ETRI, Sunghoon Ko, Syracuse University, Sanjay Ranka, University of Florida, Byung Eui Min, A.I. Section ETRI

Pin-down Cache: A Virtual Memory Management Technique for Zero-copy Communication
Hiroshi Tezuka, Francis O'Carroll, Atsushi Hori, Yutaka Ishikawa, Real World Computing Partnership

Algorithms I

Synthesis of a Systolic Array Genetic Algorithm
G.M. Megson, I.M. Bland, University of Reading

Vector Prefix and Reduction Computation on Coarse-Grained, Distributed-Memory Parallel Machines
Seungjo Bae, Dongmin Kim, Syracuse University, Sanjay Ranka, University of Florida

Solving the Maximum Clique Problem using PUBB
Yuji Shinano, Science University of Tokyo, Tetsuya Fujie, Tokyo Institute of Technology, Yoshiko Ikebe, Ryuichi Hirabayashi, Science University of Tokyo

A Scalable VLSI Architecture for Binary Prefix Sums
R. Lin, SUNY Geneseo, K. Nakano, Nagoya Institute of Technology, S. Olariu, Old Dominion University, M.C. Pinotti, I.E.I. C.N.R., J.L. Schwing, Old Dominion University, A.Y. Zomaya, University of Western Australia

Emulating Direct Products by Index-Shuffle Graphs
Bojana Obrenic, Queens College and Graduate Center of CUNY

A Comparative Study of Five Parallel Genetic Algorithms Using The Traveling Salesman Problem
Lee Wang, Anthony A. Maciejewski, Howard Jay Siegel, Purdue University, Vwani P. Roychowdhury, UCLA


A New Self-Routing Multicast Network
Yuanyuan Yang, University of Vermont, Jianchao Wang, GTE Laboratories

Optimal Contention-Free Unicast-Based Multicasting in Switch-Based Networks of Workstations
Ran Libeskind-Hadas, Dominic Mazzoni, Ranjith Rajagopalan, Harvey Mudd College

Multicasting and Broadcasting in Large WDM Networks
Weifa Liang, University of Queensland, Hong Shen, Griffith University

Optimally Locating a Structured Facility of a Specified Length in a Weighted Tree Network
Shan-Chyun Ku, Biing-Feng Wang, National Tsing Hua University

Deterministic Routing of h-relations on the Multibutterfly
Andrea Pietracaprina, Universita di Padova

An Efficient Counting Network
Costas Busch, Brown University, Marios Mavronicolas, University of Cyprus

Operating Systems and Scheduling

Partitioned Schedules for Clustered VLIW Architectures
Marcio Merino Fernandes, University of Edinburgh, Josep Llosa, Universitat Politecnica de Catalunya, Nigel Topham, University of Edinburgh

Dynamic Processor Allocation with the Solaris Operating System
Kelvin K. Yue, Sun Microsystems Inc., David J. Lilja, University of Minnesota

Thread-based vs Event-based Implementation of a Group Communication Service
Shivakant Mishra, Rongguang Yang, University of Wyoming

Performance Sensitivity of Space-Sharing Processor Scheduling in Distributed-Memory Multicomputers
Sivarama P. Dandamudi, Hai Yu, Carleton University

Efficient Fine-Grain Thread Migration with Active Threads
Boris Weissman, Benedict Gomes, University of California at Berkeley and International Computer Science Institute, Jurgen W. Quittek, International Computer Science Institute, Michael Holtkamp, Technical University of Hamburg-Harburg

Clustering and Reassignment-Based Mapping Strategy for Message-Passing Architectures
M.A. Senar, A. Ripoll, A. Cortes, E. Luque, Universitat Autonoma de Barcelona

Algorithms II

Asymptotically Optimal Randomized Tree Embedding in Static Networks
Keqin Li, State University of New York (New Paltz)

Resource Placements in 2D Tori
Bader Almohammad, Bella Bose, Oregon State University

An O((log log n)^2) Time Convex Hull Algorithm on Reconfigurable Meshes
Tatsuya Hayashi, Koji Nakano, Nagoya Institute of Technology, Stephan Olariu, Old Dominion University

Toward a Universal Mapping Algorithm for Accessing Trees in Parallel Memory Systems
Vincenzo Auletta, Universita di Salerno, Sajal K. Das, University of North Texas, Amelia De Vivo, Universita di Salerno, M. Cristina Pinotti, I.E.I. Consiglio Nazionale delle Ricerche, Vittorio Scarano, Universita di Salerno

Sharing Random Bits with No Process Coordination
Marius Zimand, Georgia Southwestern State University

Lower Bounds on Communication Loads and Optimal Placements in Torus Networks
M. Cemil Azizoglu, Omer Egecioglu, University of California at Santa Barbara

Multiprocessor Performance Evaluation

Impact of Switch Design on the Application Performance of Cache Coherent Multiprocessors
Laxmi N. Bhuyan, H. Wang, R. Iyer, Texas A&M University, A. Kumar, Intel Corporation

Parallel Tree Building on a Range of Shared Address Space Multiprocessors: Algorithms and Application Performance
Hongzhang Shan, Jaswinder Pal Singh, Princeton University

Configuration Independent Analysis for Characterizing Shared-Memory Applications
Gheith A. Abandah, Edward S. Davidson, University of Michigan

Experimental Validation of Parallel Computation Models on the Intel Paragon
Ben H.H. Juurlink, University of Paderborn

Comparing the Optimal Performance of Different MIMD Multiprocessor Architectures
Lars Lundberg, Hakan Lennerstad, University of Karlskrona/Ronneby

The Design of COMPASS: An Execution Driven Simulator for Commercial Applications Running on Shared Memory Multiprocessors
Ashwini K. Nanda, IBM T.J. Watson Research Center, Yiming Hu, University of Rhode Island, Moriyoshi Ohara, IBM Tokyo Research Lab, Caroline D. Benveniste, Mark E. Giampapa, Maged Michael, IBM T.J. Watson Research Center


An Efficient RMS Admission Control and Its Application To Multiprocessor Scheduling
Sylvain Lauzac, Rami Melhem, Daniel Mosse, University of Pittsburgh

Guidelines for Data-Parallel Cycle-Stealing in Networks of Workstations
Arnold L. Rosenberg, University of Massachusetts

Low Memory Cost Dynamic Scheduling of Large Coarse Grain Task Graphs
Michel Cosnard, LORIA-INRIA Lorraine, Emmanuel Jeannot, Laurence Rougeot, LIP ENS de Lyon

Benchmarking the Task Graph Scheduling Algorithms
Yu-Kwong Kwok, Ishfaq Ahmad, Hong Kong University of Science and Technology

A Performance Evaluation of CP List Scheduling Heuristics for Communication Intensive Task Graphs
Benjamin S. Macey, Albert Y. Zomaya, University of Western Australia

Utilization and Predictability in Scheduling the IBM SP2 with Backfilling
Dror G. Feitelson, Ahuva Mu'alem Weil, Hebrew University of Jerusalem

Databases and Sorting

High Performance Data Mining Using Data Cubes on Parallel Computers
Sanjay Goil, Alok Choudhary, Northwestern University

An Efficient Parallel Algorithm for High Dimensional Similarity Join
Khaled Alsabti, Syracuse University, Sanjay Ranka, University of Florida, Vineet Singh, Hitachi America Ltd.

Sorting on Clusters of SMP's
David R. Helman, Joseph JaJa, University of Maryland

An AT^2 Optimal Mapping of Sorting onto the Mesh Connected Array without Comparators
Ju-wook Jang, Sogang University

ScalParC: A New Scalable and Efficient Parallel Classification Algorithm for Mining Large Datasets
Mahesh V. Joshi, George Karypis, Vipin Kumar, University of Minnesota

Improved Concurrency Control Techniques for Multi-dimensional Index Structures
K.V. Ravi Kanth, F. David Serena, Ambuj K. Singh, University of California at Santa Barbara

Performance Prediction and Evaluation

A Clustered Approach to Multithreaded Processors
Venkata Krishnan, Josep Torrellas, University of Illinois at Urbana-Champaign

C++ Expression Templates Performance Issues in Scientific Computing
Federico Bassetti, New Mexico State University and Scientific Computing Group CIC-19, Kei Davis, Dan Quinlan, Scientific Computing Group CIC-19

Aggressive Dynamic Execution of Multimedia Kernel Traces
Benjamin Bishop, Robert Owens, Mary Jane Irwin, Pennsylvania State University

Performance Prediction in Production Environments
Jennifer M. Schopf, Francine Berman, University of California at San Diego

Predicting the Running Time of Parallel Programs by Simulation
Radu Rugina, Klaus E. Schauser, University of California at Santa Barbara

Software Distributed Shared Memory

Compile-time Synchronization Optimizations for Software DSMs
Hwansoo Han, Chau-Wen Tseng, University of Maryland

An Efficient Logging Scheme for Lazy Release Consistent Distributed Shared Memory System
Taesoon Park, Sejong University, Heon Y. Yeom, Seoul National University

Update Protocols and Iterative Scientific Applications
Pete Keleher, University of Maryland

Characterizations for Java Memory Behavior
Alex Gontmakher, Assaf Schuster, Technion

Locality and Performance of Page- and Object-Based DSMs
Bryan Buck, Pete Keleher, University of Maryland

Optimistic Synchronization of Mixed-Mode Simulators
Peter Frey, Radharamanan Radhakrishnan, Harold W. Carter, Philip A. Wilsey, University of Cincinnati

Scientific Simulation

Airshed Pollution Modeling: A Case Study in Application Development in an HPF Environment
Jaspal Subhlok, Peter Steenkiste, James Stichnoth, Peter Lieu, Carnegie Mellon University

Design of a FEM Computation Engine for Real-Time Laparoscopic Surgery Simulation
Alex Rhomberg, Rolf Enzler, Markus Thaler, Gerhard Troester, Eidgenossische Technische Hochschule

SIMD and Mixed-Mode Implementations of a Visual Tracking Algorithm
Mark B. Kulaczewski, Howard Jay Siegel, Purdue University

The Implicit Pipeline Method
John B. Pormann, John A. Board Jr., Donald J. Rose, Duke University

Rendering Computer Animations on a Network of Workstations
Timothy A. Davis, Edward W. Davis, North Carolina State University

Fault Tolerance

Hyper-Butterfly Network: A Scalable Optimally Fault Tolerant Architecture
Wei Shi, Pradip K. Srimani, Colorado State University

Scheduling Algorithms Exploiting Spare Capacity and Tasks' Laxities for Fault Detection and Location in Real-Time Multiprocessor Systems
K. Mahesh, G. Manimaran, C. Siva Ram Murthy, Indian Institute of Technology, Arun K. Somani, University of Washington

The Robust-Algorithm Approach to Fault Tolerance on Processor Arrays: Fault Models, Fault Diameter, and Basic Algorithms
Behrooz Parhami, Chi-Hsiang Yeh, University of California at Santa Barbara

Fault-Tolerant Switched Local Area Networks
Paul LeMahieu, Vasken Bohossian, Jehoshua Bruck, California Institute of Technology

Performance and Debugging Tools

Trace-Driven Debugging of Message Passing Programs
Michael Frumkin, Robert Hood, Louis Lopez, NASA Ames Research Center

Predicate Control for Active Debugging of Distributed Programs
Ashis Tarafdar, Vijay K. Garg, University of Texas at Austin

VPPB - A Visualization and Performance Prediction Tool for Multithreaded Solaris Programs
Magnus Broberg, Lars Lundberg, Hakan Grahn, University of Karlskrona/Ronneby

Parallel Performance Visualization Using Moments of Utilization Data
T.J. Godin, Michael J. Quinn, C.M. Pancake, Oregon State University

Distributed Systems

Optimizing Parallel Applications for Wide-Area Clusters
Henri E. Bal, Aske Plaat, Mirjam G. Bakker, Peter Dozy, Rutger F. H. Hofman, Vrije Universiteit

Prioritized Token-Based Mutual Exclusion for Distributed Systems
Frank Mueller, Humboldt-Universitat zu Berlin

Adaptive Quality Equalizing: High-Performance Load Balancing for Parallel Branch-and-Bound Across Applications and Computing Systems
Nihar R. Mahapatra, State University of New York at Buffalo, Shantanu Dutt, University of Illinois at Chicago

Memory Space Representation for Heterogeneous Network Process Migration
Kasidit Chanchio, Xian-He Sun, Louisiana State University

Industrial Track - Reconfigurable Systems

WILDFIRE(tm) Heterogeneous Adaptive Parallel Processing System
Bradley K. Fross, Senior WILDFIRE Application Engineer, Dennis M. Hawver, Principal Design Engineer, James B. Peterson, Principal Design Engineer
Annapolis Micro Systems, Inc.

ACEcard(tm): A High Performance Architecture for Run-Time Reconfiguration
Don Davis, Manager (Strategic Engineering), Jonathan Harris
TSI TelSys, Inc.

A Hardware / Software Co-Design System using Configurable Computing Technology
John Schewel, Vice President of Sales & Marketing
Virtual Computer Corporation

Industrial Track - Environments, Tools, and Evaluation Methods

DEEP: A Development Environment for Parallel Programs
Brian Brode, Vice President, Chris Warber, Senior Analyst, James Bonang, Software Engineer
Pacific-Sierra Research Corporation

Rapid Development of Real-Time Systems Using RTExpress
Milissa Benincasa, Senior Software Engineer, Richard Besler, Senior Software Engineer, Diane Brassaw, Senior Software Engineer, Ralph L. Kohler Jr., Program Manager (Air Force Research Laboratory)
Integrated Sensors, Inc.

Evaluating ASIC, DSP, and RISC Architectures for Embedded Applications
Marc Campbell, Technical Lead (High Performance Computing)
Northrop Grumman

The Effect of the Router Arbitration Policy on ServerNet(tm) Topologies
Vladimer Shurbanov, Research Assistant (Boston University), Dimiter R. Avresky, Associate Professor (Boston University), Robert Horst, Technical Director
Tandem Computers, a Compaq Company