Session 1 - Algorithmic Paradigms and Primitives
Session 2 - Latency Tolerance and Performance Modeling
Session 3 - Communication, Run-Time Systems
Session 4 - Scalable Computing
Session 5 - Communication and Protocols for Clusters
Session 6 - Communication Libraries
Session 7 - Routing and Broadcasting I
Session 8 - Miscellaneous Architecture
Session 9 - Advanced Software for Applications Support
Session 10 - Routing and Broadcasting II
Session 11 - Scientific Engineering Systems
Session 12 - Performance
Session 13 - Mesh Architecture
Session 14 - Signal Processing
Session 15 - Program Optimization, Resource Allocation, Scheduling
Session 16 - Load Balancing and Distributed Computing
Session 17 - Data Mining and Databases
Session 18 - Compilers
Session 19 - Biological and Discrete Systems
Session 20 - Real-Time Simulation and Load Balancing
Session 21 - Miscellaneous Software
Session 22 - Industrial Track

Session 1 - Algorithmic Paradigms and Primitives
The Characterization of Data-Accumulating Algorithms
Stefan D. Bruda; Selim G. Akl

Prefix Computations on Symmetric Multiprocessors
David R. Helman; Joseph JáJá

Reducing I/O Complexity by Simulating Coarse Grained Parallel Algorithms
Frank Dehne; David Hutchinson; Anil Maheshwari; Wolfgang Dittrich

Lower Bounds on the Loading of Degree-2 Multiple Bus Networks for Binary-Tree Algorithms
Hettihe P. Dharmasena; Ramachandran Vaidyanathan

A Time-Optimal Solution for the Path Cover Problem on Cographs
Koji Nakano; Stephan Olariu; Albert Y. Zomaya

Parallel Matrix Multiplication on a Linear Array with a Reconfigurable Pipelined Bus System
Keqin Li; Victor Y. Pan

Session 2 - Latency Tolerance and Performance Modeling
Improving Collective I/O Performance Using Threads
Phillip M. Dickens; Rajeev Thakur

Linear Aggressive Prefetching: A Way to Increase the Performance of Cooperative Caches
T. Cortes; J. Labarta

Hiding Communication Latency in Reconfigurable Message-Passing Environments
Ahmad Afsahi; Nikitas J. Dimopoulos

The Impact of Memory Hierarchies on Cluster Computing
Xing Du; Xiaodong Zhang

A Factorial Performance Evaluation for Hierarchical Memory Systems
Xian-He Sun; Dongmei He; Kirk W. Cameron; Yong Luo

A Performance Model of Speculative Prefetching in Distributed Information Systems
N. J. Tuah; M. Kumar; S. Venkatesh

Session 3 - Communication, Run-Time Systems
Run-Time Selection of Block Size in Pipelined Parallel Programs
David K. Lowenthal; Michael James

Reducing Parallel Overheads Through Dynamic Serialization
Michael J. Voss; Rudolf Eigenmann

Using Channels for Multimedia Communication
David May; Henk L. Muller

The Paderborn University BSP (PUB) Library - Design, Implementation and Performance
Olaf Bonorden; Ben Juurlink; Ingo von Otto; Ingo Rieping

A Capabilities Based Communication Model for High-Performance Distributed Applications: The Open HPC++ Approach
Shridhar Diwan; Dennis Gannon

Session 4 - Scalable Computing
Average-Case Analysis of Isospeed Scalability of Parallel Computations on Multiprocessors
Keqin Li; Xian-He Sun

Fully-Scalable Fault-Tolerant Simulations for BSP and CGM
Sung-Ryul Kim; Kunsoo Park

Coarse Grained Parallel Maximum Matching in Convex Bipartite Graphs
Prosenjit Bose; A. Chan; Frank Dehne; M. Latzel

Experimental Evaluation of QSM, a Simple Shared-Memory Model
Brian Grayson; Michael Dahlin; Vijaya Ramachandran

Session 5 - Communication and Protocols for Clusters
A Consistent History Link Connectivity Protocol
Paul S. LeMahieu; Jehoshua Bruck

Performance Evaluation of ServerNet SAN under Self-Similar Traffic
D. R. Avresky; V. Shurbanov; R. Horst; P. Mehra

Low-Latency Message Passing on Workstation Clusters Using SCRAMNet
Vijay Moorthy; Matthew G. Jacunski; Manoj Pillai; Peter P. Ware; Dhabaleswar K. Panda; Thomas W. Page Jr.; P. Sadayappan; V. Nagarajan; Johns Daniel

Cashmere-VLM: Remote Memory Paging for Software Distributed Shared Memory
Sandhya Dwarkadas; Nikolaos Hardavellas; Leonidas Kontothanassis; Rishiyur Nikhil; Robert Stets

The Computational Co-op: Gathering Clusters Into a Metacomputer
Walfredo Cirne; Keith Marzullo

Reducing System Overheads in Home-based Software DSMs
Weiwu Hu; Weisong Shi; Zhimin Tang

Session 6 - Communication Libraries
Exploiting Global Structure for Performance on Clusters
Stephen R. Donaldson; Jonathan M. D. Hill; David B. Skillicorn

Implementing Efficient MPI on LAPI for IBM RS/6000 SP Systems: Experiences and Performance Evaluation
Mohammad Banikazemi; Rama K. Govindaraju; Robert Blackmore; Dhabaleswar K. Panda

PM-PVM: A Portable Multithreaded PVM
Claudio M. P. Santos; Julio S. Aude

tmPVM - Task Migratable PVM
C. P. Tan; W. F. Wong; C. K. Yuen

A Ubiquitous Message Passing Interface Implementation in Java: jmpi
Kivanc Dincer

Session 7 - Routing and Broadcasting I
On-Demand Multicast Routing Scheme and Its Algorithms
Te-Chou Su; Jia-Shung Wang

Fault-Tolerant Routing Algorithms for Hypercube Networks
Keiichi Kaneko; Hideo Ito

Dynamic Interval Routing on Asynchronous Rings
Danny Krizanc; Flaminia L. Luccio; Rajeev Raman

Optimally Scaling Permutation Routing on Reconfigurable Linear Arrays with Optical Buses
Jerry L. Trahan; Anu G. Bourgeois; Yi Pan; Ramachandran Vaidyanathan

Session 8 - Miscellaneous Architecture
A Comparison of Router Architectures for Virtual Cut-Through and Wormhole Switching in a NOW Environment
J. Duato; A. Robles; F. Silla; R. Beivide

Dynamically Scheduling the Trace Produced During Program Execution into VLIW Instructions
Alberto Ferreira de Souza; Peter Rounce

Segment Directory Enhancing the Limited Directory Cache Coherence Schemes
Jong Hyuk Choi; Kyu Ho Park

Shuffle Memory System
Kichul Kim

An Efficient VLSI Architecture Parallel Prefix Counting With Domino Logic
Rong Lin; Koji Nakano; Stephan Olariu; Albert Y. Zomaya

Session 9 - Advanced Software for Applications Support
The Performance of Coordinated and Independent Checkpointing
Luis M. Silva; João Gabriel Silva

Automatic Array Alignment in Parallel Matlab Scripts
Igor Z. Milosavljević; Marwan A. Jabri

Implementation of NAS Parallel Benchmarks in High Performance Fortran
Michael Frumkin; Haoqiang Jin; Jerry Yan

Parallel Program Archetypes
Berna L. Massingill; K. Mani Chandy

Distributed, Scalable, Dependable Real-Time Systems: Middleware Services and Applications
Lonnie R. Welch; Binoy Ravindran; Paul V. Werme; Michael W. Masters; Behrooz A. Shirazi; Prashant A. Shirolkar; Robert D. Harrison; Wayne Mills; Tuy Do; Judy Lafratta; Shafqat M. Anwar; Steve Sharp; Terry Sergeant; George Bilowus; Mark Swick; Jim Hoppel; Joe Caruso

OpenMP for Networks of SMPs
Y. Charlie Hu; Honghui Lu; Alan L. Cox; Willy Zwaenepoel

Session 10 - Routing and Broadcasting II
Oblivious Deadlock-Free Routing in a Faulty Hypercube
Jin Suk Kim; Eric Lehman; Tom Leighton

Sparse Hypercube - A Minimal k-Line Broadcast Graph
Satoshi Fujita; Arthur M. Farley

All-to-All Broadcast on Switch-Based Clusters of Workstations
Matt Jacunski; P. Sadayappan; D. K. Panda

VBMAR: Virtual Network Load Balanced Minimal Adaptive Routing
Xicheng Liu; Timothy J. Li; Wen Gao

Session 11 - Scientific Engineering Systems
Portable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications
Rupak Biswas; Sajal K. Das; Daniel Harvey; Leonid Oliker

A Parallel Algorithm for Singular Value Decomposition as Applied to Failure Tolerant Manipulators
Tracy D. Braun; Anthony A. Maciejewski; Howard Jay Siegel

A Parallel Adaptive Version of the Block-Based Gauss-Jordan Algorithm
N. Melab; E-G. Talbi; S. Petiton

Sparse Matrix Block-Cyclic Redistribution
Gerardo Bandera; Emilio L. Zapata

A New Approach to Parallel Dynamic Partitioning for Adaptive Unstructured Meshes
Gerd Heber; Rupak Biswas; Guang R. Gao

An Object-Oriented Environment for Sparse Parallel Computation on Adaptive Grids
Salvatore Filippone; Michele Colajanni; Dario Pascucci

Session 12 - Performance
A Network Status Predictor to Support Dynamic Scheduling in Network-Based Computing Systems
JunSeong Kim; David J. Lilja

Performance of an Infrastructure for Worldwide Parallel Computing
Thomas T. Kwan; Daniel A. Reed

BRISK: A Portable and Flexible Distributed Instrumentation System
Aleksandar M. Bakić; Matt W. Mutka; Diane T. Rover

An Efficient Logging Algorithm for Incremental Replay of Message-Passing Applications
Franco Zambonelli; Robert H. B. Netzer

Lazy Logging and Prefetch-Based Crash Recovery in Software Distributed Shared Memory Systems
Angkul Kongmunvattana; Nian-Feng Tzeng

Visualization and Performance Prediction of Multithreaded Solaris Programs by Tracing Kernel Threads
Magnus Broberg; Lars Lundberg; Haakan Grahn

Session 13 - Mesh Architecture
Better Deterministic Routing on Meshes
Jop F. Sibeyn

Efficient Parallel Algorithms for Selection and Multiselection on Mesh-Connected Computers
Hong Shen

Constant-Time Algorithm for Medial Axis Transform on the Reconfigurable Mesh
Amitava Datta

2.5n-Step Sorting on n*n Meshes in the Presence of o(n^(1/2)) Worst-Case Faults
Chi-Hsiang Yeh; Behrooz Parhami; Hua Lee; Emmanouel A. Varvarigos

The Recursive Grid Layout Scheme for VLSI Layout of Hierarchical Networks
Chi-Hsiang Yeh; Behrooz Parhami; Emmanouel A. Varvarigos

Session 14 - Signal Processing
Multi-Threaded Design and Implementation of Parallel Pipelined STAP on Parallel Computers with SMP Nodes
Wei-keng Liao; Alok Choudhary; Donald Weiner; Pramod Varshney

A Parallel Phoneme Recognition Algorithm Based on Continuous Hidden Markov Model
Sang-Hwa Chung; Min-Uk Park; Hyung-Soon Kim

Load Adaptive Algorithms and Implementations for the 2D Discrete Wavelet Transform on Fine-Grain Multithreaded Architectures
Ashfaq A. Khokhar; Gerd Heber; Parimala Thulasiraman; Guang R. Gao

Application of Parallel Processors to Real-Time Sensor Array Processing
David R. Martinez

Mapping Media Streams onto a Network of Servers
Reinhard Lüling

A Systolic Algorithm to Process Compressed Binary Images
Fikret Ercal; Mark Allen; Hao Feng

Session 15 - Program Optimization, Resource Allocation, Scheduling
Optimizations for Language-Directed Computational Steering
Jeffrey Vetter; Karsten Schwan

Optimization Rules for Programming with Collective Operations
Sergei Gorlatch; Christoph Wedler; Christian Lengauer

A Flexible Clustering and Scheduling Scheme for Efficient Parallel Computation
S. Chingchit; M. Kumar; L. N. Bhuyan

Mechanisms for Just-in-Time Allocation of Resources to Adaptive Parallel Programs
Arash Baratloo; Ayal Itzkovitz; Zvi Kedem; Yuanyuan Zhao

Exploiting Application Tunability for Efficient, Predictable Parallel Resource Management
Fangzhe Chang; Vijay Karamcheti; Zvi Kedem

Supporting Priorities and Improving Utilization of the IBM SP Scheduler Using Slack-Based Backfilling
David Talby; Dror G. Feitelson

Session 16 - Load Balancing and Distributed Computing
Guidelines for Data-Parallel Cycle-Stealing in Networks of Workstations, II: On Maximizing Guaranteed Output
Arnold L. Rosenberg

LLB: A Fast and Effective Scheduling Algorithm for Distributed-Memory Systems
Andrei Radulescu; Arjan J. C. van Gemund; Hai-Xiang Lin

Parallel Load Balancing for Problems with Good Bisectors
Stefan Bischof; Ralf Ebner; Thomas Erlebach

Asynchronous Group Mutual Exclusion in Ring Networks
Kuen-Pin Wu; Yuh-Jzer Joung

Randomized Initialization Protocols for Packet Radio Networks
Tatsuya Hayashi; Koji Nakano; Stephan Olariu

Session 17 - Data Mining and Databases
An Optimal Disk Allocation Strategy for Partial Match Queries on Non-Uniform Cartesian Product Files
Sajal K. Das; M. Cristina Pinotti

Parallel Out-of-Core Divide-and-Conquer Techniques with Application to Classification Trees
Mahesh K. Sreenivas; Khaled Alsabti; Sanjay Ranka

P-EDR: An Algorithm for Parallel Implementation of Parzen Density Estimation from Uncertain Observations
P. E. López-de-Teruel; J. M. García; M. Acacio; O. Cánovas

A Fast Multithreaded Out-of-Core Visualization Technique
Peter D. Sulatycke; Kanad Ghose

Design and Implementation of a Scalable Parallel System for Multidimensional Analysis and OLAP
Sanjay Goil; Alok Choudhary

Infrastructure for Building Parallel Database Systems for Multi-dimensional Data
Chialin Chang; Renato Ferreira; Alan Sussman; Joel Saltz

Session 18 - Compilers
A New Memory-Saving Technique to Map System of Affine Recurrence Equations (SARE) onto Distributed Memory Systems
Alessandro Marongiu; Paolo Palazzari

A Novel Compilation Framework for Supporting Semi-Regular Distributions in Hybrid Applications
Dhruva R. Chakrabarti; Prithviraj Banerjee

Compiler Analysis to Support Compiled Communication for HPF-like Programs
Xin Yuan; Rajiv Gupta; Rami Melhem

PARADIGM (version 2.0): A New HPF Compilation System
Pramod G. Joisha; Prithviraj Banerjee

Marshaling/Demarshaling as a Compilation/Interpretation Process
Christian Queinnec

Session 19 - Biological and Discrete Systems
Parallel Algorithms for 3D Reconstruction of Asymmetric Objects from Electron Micrographs
Robert E. Lynch; Dan C. Marinescu; Hong Lin; Timothy S. Baker

Large Scale Simulation of Parallel Molecular Dynamics
Pierre-Eric Bernard; Thierry Gautier; Denis Trystram

A Parallel Algorithm for Bound-Smoothing
Kumar Rajan; Narsingh Deo

Parallel Biological Sequence Comparison Using Prefix Computations
Srinivas Aluru; Natsuhiko Futamura; Kishan Mehrotra

Large Scale Simulation of Particulate Flows
Ahmed H. Sameh; Vivek Sarin

Session 20 - Real-Time Simulation and Load Balancing
EDD Algorithm Performance Guarantee for Periodic Hard-Real-Time Scheduling in Distributed Systems
Maurizio A. Bonuccelli; M. Claudia Clò

A Robust Adaptive Metric for Deadline Assignment in Heterogeneous Distributed Real-Time Systems
Jan Jonsson

A Communication Latency Hiding Parallelization of a Traffic Flow Simulation
Charles Michael Johnston; Anthony Theodore Chronopoulos

Relaxing Causal Constraints in PDES
Narayanan V. Thondugulam; Dhananjai Madhava Rao; Radharamanan Radhakrishnan; Philip A. Wilsey

Rate of Change Load Balancing in Distributed and Parallel Systems
Luis Miguel Campos; Isaac Scherson

An Efficient Dynamic Load Balancing using the Dimension Exchange Method for Balancing of Quantized Loads on Hypercube Multiprocessors
Hwakyung Rim; Ju-wook Jang; Sungchun Kim

Session 21 - Miscellaneous Software
Cascaded Execution: Speeding Up Unparallelized Execution on Shared-Memory Multiprocessors
Ruth E. Anderson; Thu D. Nguyen; John Zahorjan

COWL: Copy-On-Write for Logic Programs
Vítor Santos Costa

Dynamic Grain-Size Adaptation on Object Oriented Parallel Programming - The SCOOPP Approach
João Luís Sobral; Alberto José Proença

Implementation of a Virtual Time Synchronizer for Distributed Databases
Azzedine Boukerche; Sajal K. Das; Ajoy Datta; Timothy E. LeMaster

A Graph Based Framework to Detect Optimal Memory Layouts for Improving Data Locality
Mahmut Kandemir; Alok Choudhary; J. Ramanujam; Prithviraj Banerjee

Hyperplane Partitioning: An Approach to Global Data Partitioning for Distributed Memory Machines
S. R. Prakash; Y. N. Srikant

Session 22 - Industrial Track
FPGA-Based Architecture for High Speed Serial Processing (file unavailable)
Paul Kowalewski; Robert L. Donaldson

Delivering on Standards: Balancing Portability and Performance
John Robinson

IP Validation for FPGAs using Hardware Object Technology
Steve Casselman; John Schewel; Christophe Beaumont