Abstract
A procedure for designing and fabricating holograms that generate free-space optical interconnections is described. The design process uses the Gerchberg-Saxton algorithm and a random search (simulated annealing) method. Fabrication is performed by electron beam lithography and acetone etching of a polymethyl methacrylate film. The use of holograms fabricated by this method in an optoelectronic processing module is discussed, together with the interconnect patterns required to implement switching and data processing devices, including a crossbar switch, an adder, and a multiplier. Holograms that implement these interconnect patterns have been fabricated.

1.0 Introduction
Optical interconnects are the subject of much current research because they offer many advantages over conventional electronic interconnects. For example, optical interconnects can lower the energy consumed per bit for point-to-point interconnections, because they are capacitance free and because one bit may practically be represented by fewer than 1,000 photons [1,2]. The absence of capacitance means that the energy used for optical data transmission is, to first order, independent of distance [3,4]. A second advantage is that optical interconnects can increase the total number of interconnects per chip, allowing a greater interconnect density than is possible with electronics [5]. Finally, again because of the absence of capacitance, optical interconnects make it possible to implement switching and processing architectures with high fan-out and high fan-in [6]. The maximum practical fan-in and fan-out of high-speed CMOS logic gates is approximately five [7]; fan-ins and fan-outs of ~100 may be possible with optics [6].
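For scale, the optical energy in 1,000 photons at the ~850 nm VCSEL wavelength used in Section 2 follows from \( E = N h c / \lambda \). The short Python sketch below evaluates it. It counts only the optical energy and ignores transmitter and receiver overheads, so it is an illustration of the photon budget rather than a system energy-per-bit figure.

```python
PLANCK_J_S = 6.62607015e-34       # Planck constant (exact SI value)
LIGHT_SPEED_M_S = 299_792_458.0   # speed of light in vacuum (exact SI value)


def optical_energy_per_bit(photons: int, wavelength_m: float) -> float:
    """Optical energy (J) carried by `photons` photons of the given wavelength."""
    return photons * PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m


energy_j = optical_energy_per_bit(1000, 850e-9)   # roughly 2.3e-16 J, i.e. about 0.23 fJ
```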

In addition to simple point-to-point data communications, the performance of optically interconnected circuits may be enhanced by using smart interconnects, which route data to selected locations on the destination chip [6,8]. Most significantly, the combination of optical interconnects, optical receivers, and optical transmitters may be used as a distinct method of performing calculations [9]. In Section 2 of this paper, we review the high performance optoelectronic computing (HPOC) module, a practical implementation of optically interconnected logic gates. The optical interconnects essential to the operation of the HPOC module are generated by diffractive optical interconnect elements (DOIEs), which are computer generated transmission holograms. In Section 3, we describe the software used to design DOIEs that generate an arbitrary interconnection pattern, and the process used to fabricate them. In Section 4, which makes up the bulk of this paper, we describe algorithms that allow some basic data processing and switching functions to be performed in the HPOC module, together with the design of the DOIEs needed to implement these algorithms.

2.0 High Performance Optoelectronic Computing Module
The current HPOC module design [6,8,9] comprises an 8 x 8 array of GaAs/AlGaAs vertical-cavity surface-emitting lasers (VCSELs), which operate at ~850 nm; an 8 x 8 array of DOIEs; and an 8 x 8 detector/receiver/laser driver (smart pixel) array which can be fabricated from either GaAs or Si-CMOS. As shown in Figure 1, light from one VCSEL is incident on each DOIE, which routes the light to a set of detectors determined by the interconnect pattern generated by the DOIE.

The arrays of VCSELs, DOIEs and smart pixels which compose the HPOC module cause it to function as a Boolean matrix-tensor multiplier. The HPOC module performs the calculation \( \mathbf{F} = \mathbf{A} \otimes \mathbf{X} \), where \( \mathbf{X} = (x_{ij}) \); \( i = 1, 2,...,I \); \( j = 1, 2,...,J \) is the input matrix; \( \mathbf{F} = (f_{kl}) \); \( k = 1, 2,...,K \); \( l = 1, 2,...,L \) is the output matrix; and \( \mathbf{A} = (a_{ijkl}) \) is the control...

Depending on the pattern of interconnects generated by the elements of the DOIE array, the outputs of a single HPOC module represent minterms of the input bits. A second pass through the system allows the minterms to be summed to form partial or complete instructions. An example of this is shown in Figure 1, in which two HPOC modules are cascaded. Light from VCSELs driven by the smart pixels of the first HPOC module is used as the input to the second HPOC module, which performs the double summation of the minterms computed in the first module. The output \( \mathbf{Y} = (y_{mn}) \) of the second HPOC module represents complete or partial instructions, depending on the length of the instruction encoded into the diffractive optical elements of each HPOC module. The two-stage cascade performs a Boolean matrix-tensor multiplication, followed by a Boolean matrix-tensor addition:

\[
\begin{aligned}
f_{kl} &= \prod_{i=1}^{I} \prod_{j=1}^{J} a_{ijkl} x_{ij} = \overline{\sum_{i=1}^{I} \sum_{j=1}^{J} \overline{a_{ijkl} x_{ij}}}, \\
y_{mn} &= \sum_{k=1}^{K} \sum_{l=1}^{L} \left( \prod_{i=1}^{I} \prod_{j=1}^{J} a_{ijkl} x_{ij} \right) b_{klmn}
\end{aligned}
\]

where \( \mathbf{B} = (b_{klmn}) \) is the control tensor of the second HPOC module. The two-stage cascade generates the sum-of-products formulation required by Shannon’s generalized digital computation theory [10].
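The two-stage cascade can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a model of the HPOC hardware: each module is represented by a routing table listing which inputs reach which detector (the nonzero entries of A or B), and the product in the expression for \( f_{kl} \) is read as running only over the inputs routed to detector (k, l). The function names and the XOR example are hypothetical.

```python
from itertools import product


def first_stage(x, routes):
    """First module: detector d outputs the AND of the inputs routed to it.

    Computed as NOT(OR of complemented inputs), the De Morgan form of an AND.
    x: dict input name -> 0/1; routes: dict detector -> list of input names.
    """
    return {d: int(not any(1 - x[i] for i in srcs)) for d, srcs in routes.items()}


def second_stage(f, routes):
    """Second module: detector d outputs the OR of the minterms routed to it."""
    return {d: int(any(f[k] for k in srcs)) for d, srcs in routes.items()}


# Hypothetical example: y = a XOR b = a*~b + ~a*b.
A_ROUTES = {"m0": ["a", "~b"], "m1": ["~a", "b"]}   # nonzero a_ijkl entries
B_ROUTES = {"y": ["m0", "m1"]}                      # nonzero b_klmn entries

truth = {}
for a, b in product((0, 1), repeat=2):
    x = {"a": a, "~a": 1 - a, "b": b, "~b": 1 - b}  # each bit and its complement
    truth[(a, b)] = second_stage(first_stage(x, A_ROUTES), B_ROUTES)["y"]
```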

3. DOIE Design and Fabrication

DOIEs for use in the HPOC modules were designed with a proprietary software tool called SPIDER (software package for interconnect design, evaluation, and reconstruction). SPIDER is written in C and uses the X Window System for its graphical interface. It runs on two platforms: a DEC Alpha running Ultras and a Sun SPARC running Solaris. The package consists of a graphical user interface that allows the designer to lay out the interconnect pattern and to add degrees of freedom to the output pattern, so that hologram solutions satisfying the phase-only constraint can be found. SPIDER is used in conjunction with a software tool that converts a set of Boolean equations into an interconnection map.

Once the desired interconnect pattern is defined, one of two iterative design procedures may be used to determine the phase function of a hologram that will produce it: the Gerchberg-Saxton (GS) method [11-13] and the random search (RS) method (essentially simulated annealing at zero temperature). The GS method is much faster than RS and produces good solutions with high diffraction efficiency. Often these solutions are adequate, but performance can usually be improved by running RS on the solution obtained with GS. This is especially true for certain structured interconnect patterns for which GS cannot fully use the available degrees of freedom. In such cases, RS can be used to "break" the symmetry of the structured output, finding a solution that puts some light into all the required locations; rerunning GS with the RS solution as a starting point then yields significant improvements in performance. The ability to use both GS and RS is essential to obtaining adequate results in the more demanding design problems. Since both GS and RS are iterative, a criterion is needed to stop the design cycle. The software allows the user either to specify a fixed number of iterations or to stop when a figure of merit, based, for example, on the diffraction efficiency or the root-mean-square (RMS) error in the output, has been achieved. The package displays continually updated estimates of the diffraction efficiency and RMS error while the design process is running.
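A minimal sketch of a GS loop for a phase-only Fourier hologram follows, in Python with NumPy. It alternates between the hologram plane, where the phase-only, 64-level constraint is imposed, and the output plane, where the target spot amplitudes are imposed, and it reports the two figures of merit mentioned above. SPIDER's actual implementation is not described in this paper; the function names, random starting phase, iteration count, and the 32 x 32 grid with eight target spots are assumptions chosen for illustration.

```python
import numpy as np


def gerchberg_saxton(target_amp, iters=100, levels=64, seed=0):
    """Phase-only Fourier hologram whose far field approximates target_amp.

    Returns the hologram phase in radians, quantized to `levels` values in [0, 2*pi).
    """
    rng = np.random.default_rng(seed)
    target_amp = target_amp / np.sqrt(np.sum(target_amp ** 2))
    step = 2 * np.pi / levels
    field = np.exp(2j * np.pi * rng.random(target_amp.shape))  # random starting phase
    for _ in range(iters):
        far = np.fft.fft2(field, norm="ortho")                  # hologram -> output plane
        far = target_amp * np.exp(1j * np.angle(far))           # keep phase, impose target amplitude
        near = np.fft.ifft2(far, norm="ortho")                  # output -> hologram plane
        phase = np.mod(np.angle(near), 2 * np.pi)
        phase = (np.round(phase / step) % levels) * step        # quantize to fabricable levels
        field = np.exp(1j * phase)                              # phase-only: unit amplitude
    return phase


def efficiency_and_rms(phase, target_amp):
    """Diffraction efficiency into the target spots and RMS spot non-uniformity."""
    out = np.abs(np.fft.fft2(np.exp(1j * phase), norm="ortho")) ** 2
    spots = out[target_amp > 0]
    efficiency = spots.sum() / out.sum()
    rms = np.sqrt(np.mean((spots / spots.mean() - 1) ** 2))     # assumes equal-intensity targets
    return efficiency, rms


# Hypothetical example: eight equal spots on a 32 x 32 output grid.
target = np.zeros((32, 32))
for r, c in [(3, 5), (3, 9), (7, 5), (7, 9), (11, 5), (11, 9), (5, 13), (9, 13)]:
    target[r, c] = 1.0
phase = gerchberg_saxton(target)
eff, rms = efficiency_and_rms(phase, target)
```

In the workflow described above, the returned phase would then serve as the starting point for RS.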

An additional design criterion for HPOC module interconnection patterns is that all the interconnects be equally efficient, i.e., that they generate diffracted spots of equal intensity. This allows the 1/2-bit threshold to be set at the maximum level, which minimizes the bit error rate of the HPOC module. Equal intensity of diffracted spots was achieved with a two-step design method. First, each hologram was designed independently of the others to obtain a high-efficiency solution. Next, the efficiency of all holograms was reduced to match that of the least efficient hologram. This was done by including a diffraction efficiency term in the merit function for RS and perturbing the design towards the desired solution. Once significant changes had been made, GS was used to find a low-RMS-error solution at the target efficiency. This design procedure worked very well: it was possible to bring the calculated efficiency of all interconnects within ±5% of the mean.
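The RS step with an efficiency term can be sketched as zero-temperature annealing: change one pixel to a random phase level and keep the change only if the merit function does not get worse. The merit function below (RMS spot non-uniformity plus a weighted squared deviation from a target efficiency) is a hypothetical stand-in for SPIDER's, which is not specified; the weight, iteration count, random start, and spot positions are likewise assumptions.

```python
import numpy as np


def spot_stats(phase, mask):
    """Efficiency into the masked spots and their RMS non-uniformity."""
    out = np.abs(np.fft.fft2(np.exp(1j * phase), norm="ortho")) ** 2
    spots = out[mask]
    return spots.sum() / out.sum(), np.sqrt(np.mean((spots / spots.mean() - 1) ** 2))


def merit(phase, mask, target_eff, weight=10.0):
    """Hypothetical merit: non-uniformity plus a penalty for missing the target efficiency."""
    eff, rms = spot_stats(phase, mask)
    return rms + weight * (eff - target_eff) ** 2


def random_search(phase, mask, target_eff, iters=2000, levels=64, seed=1):
    """Zero-temperature annealing: accept a one-pixel change only if merit does not worsen."""
    rng = np.random.default_rng(seed)
    step = 2 * np.pi / levels
    phase = phase.copy()
    best = merit(phase, mask, target_eff)
    for _ in range(iters):
        r = rng.integers(phase.shape[0])
        c = rng.integers(phase.shape[1])
        old = phase[r, c]
        phase[r, c] = rng.integers(levels) * step       # trial phase level for one pixel
        trial = merit(phase, mask, target_eff)
        if trial <= best:
            best = trial                                 # downhill or equal move: keep it
        else:
            phase[r, c] = old                            # uphill move: restore the pixel
    return phase, best


# Hypothetical example: random 64-level start, aiming for 50 % efficiency into four spots.
rng = np.random.default_rng(0)
mask = np.zeros((32, 32), dtype=bool)
mask[[3, 3, 7, 7], [5, 9, 5, 9]] = True
start = rng.integers(64, size=(32, 32)) * (2 * np.pi / 64)
start_merit = merit(start, mask, target_eff=0.5)
result, final_merit = random_search(start, mask, target_eff=0.5)
```

Recomputing a full FFT per trial keeps the sketch short. Because changing one pixel adds a single plane wave to the far field, a practical implementation would update the output incrementally instead.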

In some cases, a hologram that has an exact phase-only solution is required. SPIDER incorporates software that allows the hologram to be specified as a sampled version of the analytic phase-only solution. Information is provided to the user indicating whether aliasing will occur in this sampled function. Fresnel lenses and blazed gratings can be designed with this approach.
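As an illustration of sampling an analytic phase-only solution, the sketch below samples the thin-lens phase \( \phi(x, y) = -\pi (x^2 + y^2)/(\lambda f) \) (mod 2\(\pi\)) at the pixel centres and quantizes it to 64 levels. The aliasing check uses the standard criterion that the phase may advance by at most \(\pi\) between neighbouring pixels along x or y, i.e., the local spatial frequency \( x/(\lambda f) \) must not exceed \( 1/(2p) \) for pixel pitch p; whether SPIDER uses exactly this test is an assumption. For a 512 \(\mu\)m aperture of 4 \(\mu\)m pixels at 850 nm, the criterion requires \( f \geq 2 p x_{max}/\lambda \approx 2.4 \) mm. The focal lengths in the example are hypothetical.

```python
import numpy as np


def sampled_fresnel_lens(n_pix, pitch, wavelength, focal_length, levels=64):
    """Sample the thin-lens phase at pixel centres and quantize it to `levels` values."""
    coords = (np.arange(n_pix) - (n_pix - 1) / 2) * pitch   # pixel centres, lens axis in the middle
    x, y = np.meshgrid(coords, coords)
    phase = np.mod(-np.pi * (x ** 2 + y ** 2) / (wavelength * focal_length), 2 * np.pi)
    step = 2 * np.pi / levels
    return (np.round(phase / step) % levels) * step


def aliasing_free(n_pix, pitch, wavelength, focal_length):
    """True if the lens phase advances by at most pi between neighbouring pixels along x or y.

    The local spatial frequency along x is x / (wavelength * focal_length); it must stay
    at or below the grid's Nyquist frequency 1 / (2 * pitch) out to the aperture edge.
    """
    x_max = (n_pix - 1) / 2 * pitch
    return x_max / (wavelength * focal_length) <= 1 / (2 * pitch)


# 128 pixels of 4 um (512 um aperture) at 850 nm, as in the text.
short_f_ok = aliasing_free(128, 4e-6, 850e-9, 1e-3)   # 1 mm focal length
long_f_ok = aliasing_free(128, 4e-6, 850e-9, 5e-3)    # 5 mm focal length
lens = sampled_fresnel_lens(128, 4e-6, 850e-9, 5e-3)
```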

DOIEs were fabricated at the Jet Propulsion Laboratory (JPL) by etching a thin film of polymethyl methacrylate (PMMA) that had been spin-coated onto a quartz substrate [14,15]. Etching alters the thickness of the PMMA layer so that it imposes a phase modulation in the range 0 to 2\(\pi\) on a plane optical wavefront passing through it. The etch depth was controlled by electron beam lithography: exposure to an energetic electron beam breaks chemical bonds in the PMMA, and PMMA damaged in this manner etches more rapidly than undamaged PMMA. The depth to which bond breaking occurs is determined by the energy of the electron beam. After exposure, the PMMA film was etched in acetone. The electron beam lithography/acetone etch process is capable of producing structures (pixels) 1 \(\mu\)m on edge with 64 depth levels. In the DOIEs fabricated for this program, the pixels are 4 \(\mu\)m on edge. The DOIEs are 512 \(\mu\)m on edge, have 64 phase levels, and are designed to operate with 850 nm illumination.
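The fabrication numbers above fix a few derived quantities. The sketch below computes the PMMA thickness change for a 2\(\pi\) phase delay, \( d_{2\pi} = \lambda/(n-1) \), the etch-depth step between adjacent phase levels, and the pixel counts. The PMMA refractive index of about 1.48 at 850 nm is an assumed value, not one given in the text, and the 4 x 4 tiling into identical subholograms is taken from Section 4.

```python
def phase_depth_budget(wavelength_um=0.85, n_pmma=1.48, levels=64,
                       doie_um=512.0, pixel_um=4.0, tiles=4):
    """Derived fabrication quantities for a PMMA DOIE (n_pmma is an assumed index)."""
    depth_2pi_um = wavelength_um / (n_pmma - 1.0)   # PMMA thickness giving a 2*pi delay relative to air
    pixels_per_side = round(doie_um / pixel_um)
    return {
        "depth_2pi_um": depth_2pi_um,
        "depth_step_nm": 1000.0 * depth_2pi_um / levels,   # etch-depth increment per phase level
        "pixels_per_side": pixels_per_side,                # 512 um / 4 um = 128
        "subhologram_pixels": pixels_per_side // tiles,    # 4 x 4 tiling -> 32 x 32 subholograms
    }


budget = phase_depth_budget()
```

With these assumptions \( d_{2\pi} \approx 1.77\ \mu \)m, and each of the 64 levels corresponds to about 28 nm of etch depth.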

4. DOIE Results

Using SPIDER, a wide variety of holograms was designed and fabricated, including DOIEs for 4-bit and 8-bit adders, 4 x 4 and 8 x 8 crossbar switches, and a 3-bit multiplier. Holograms corresponding to the other passive optical elements of the HPOC module, i.e., VCSEL collimating lenses and a Fourier lens, were also designed, as was a set of test patterns used to determine the effect of increasing fan-in in the HPOC module.

4.1 Test Pattern

The efficacy of the design and fabrication process described above was verified by examining one of the test patterns in detail. Figure 2a shows the desired interconnection pattern for the test DOIE. When illuminated with collimated 850 nm light, the DOIE should generate 32 spots of uniform size and intensity arranged on a square 512 \(\mu\)m grid in the pattern depicted in Figure 2a. The phase map produced by SPIDER for this interconnection pattern is shown in Figure 2b. In this figure (and in all other phase maps), a gray scale represents relative phase retardations in the range 0 (black) to 2\(\pi\) (white). The phase map actually consists of a 4 x 4 array of identical sub-maps. Composing the hologram of an array of identical sub-holograms relaxes the tolerance to which the collimating lens must be aligned with the hologram. To verify the phase map before fabricating a hologram, its reconstruction was simulated in SPIDER. The results, shown in Figure 2c, are an array of compact, uniform spots with the correct spacing, with very little light incident on grid points other than the desired 32. A DOIE corresponding to the phase map in Figure 2b was then fabricated with the electron beam lithography/acetone etch technique discussed above. The DOIE was 512 \(\mu\)m on edge, which matches the pitch of the VCSEL array to be used in the HPOC module.
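One way to see why a hologram built from a 4 x 4 array of identical sub-maps produces a grid of discrete spots: the discrete Fourier transform of a periodically tiled array is nonzero only at every fourth frequency sample in each direction, where it equals 16 times the transform of a single sub-map. The NumPy sketch below checks this, using a random 32 x 32 phase map as a hypothetical stand-in for a SPIDER sub-map; any sub-map gives the same result.

```python
import numpy as np

rng = np.random.default_rng(0)
sub_map = np.exp(2j * np.pi * rng.random((32, 32)))   # stand-in for one 32 x 32 sub-hologram
hologram = np.tile(sub_map, (4, 4))                    # 4 x 4 array of identical copies: 128 x 128

far_full = np.fft.fft2(hologram)
far_sub = np.fft.fft2(sub_map)

on_grid = np.zeros(far_full.shape, dtype=bool)
on_grid[::4, ::4] = True                               # every 4th frequency sample in each direction
total_power = np.sum(np.abs(far_full) ** 2)
off_grid_power = np.sum(np.abs(far_full[~on_grid]) ** 2)   # zero up to rounding error
```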

The DOIE interconnect pattern was reconstructed by illuminating it with a collimated beam of light from a buried oxide VCSEL which operated at \(\sim\)850 nm [16]. The DOIE was part of an array; to ensure that only one element of the array was illuminated, the collimated beam was passed through an aperture which was 480 \(\mu\)m in diameter. The aperture was positioned immediately in front of the DOIE.

Figure 2: Test pattern for fan-out = 32: (a) desired interconnection pattern; (b) phase map of hologram; (c) simulated reconstruction; (d) measured reconstruction.
Diffracted light from the DOIE was passed through a Fourier transform lens and imaged onto a CCD camera. A typical image is shown in Figure 2d; the desired interconnection pattern is obtained. The feature in the center of the pattern is the zero-order spot. (In the test pattern, and in all other DOIEs, the grid on which the interconnection pattern lies is offset by half the pitch from the position of the zero-order spot. This precaution prevents the zero-order spot from falling on a detector.) Ideally, there would be no zero-order spot; the spot is absent from the simulated reconstruction in Figure 2c. It may be present because the thickness of the hologram does not match the design thickness, or because the laser does not operate at the 850 nm design wavelength. Nevertheless, the intensity of the zero-order spot is less than 10% of that of the other spots in the test pattern, which demonstrates the efficiency of the DOIE design. The intensity of stray light at grid points that are intended to be unilluminated is less than 1% of the average intensity at the illuminated grid points.
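The two causes of the zero-order spot named above can be quantified with a simple model. If every phase value is scaled by a factor \(\alpha\) (\(\alpha = 1\) at the design condition; ignoring material dispersion, \( \alpha \approx (d/d_{design})(\lambda_{design}/\lambda) \)), the fraction of transmitted power left in the zero order is \( |\langle e^{i\alpha\phi}\rangle|^2 \), the squared mean of the transmitted field. The sketch assumes, for illustration only, that all 64 phase levels are used equally often; it is a model, not an analysis of the measured DOIE.

```python
import numpy as np


def zero_order_fraction(phase, alpha=1.0):
    """Fraction of transmitted power in the zero order when all phases are scaled by alpha.

    For a unit-amplitude phase mask, the zero-order (DC) amplitude is the mean of the
    transmitted field, so its share of the total power is |mean(exp(i*alpha*phase))|**2.
    """
    return float(np.abs(np.mean(np.exp(1j * alpha * phase))) ** 2)


def phase_scale(depth_ratio=1.0, design_wavelength=850e-9, wavelength=850e-9):
    """Approximate phase scale factor; ignores the dispersion of PMMA (assumption)."""
    return depth_ratio * design_wavelength / wavelength


# Assumption for illustration: all 64 phase levels occur equally often.
levels = np.arange(64) * (2 * np.pi / 64)
z_design = zero_order_fraction(levels, phase_scale())                # vanishes at the design condition
z_deep = zero_order_fraction(levels, phase_scale(depth_ratio=1.05))  # 5 % too deep
```

Under these assumptions a 5% depth (or equivalent wavelength) error leaves about 0.23% of the transmitted power in the zero order, compared with about 3% per spot if the diffracted light were shared equally among 32 spots.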

4.2 Combinatorial Multiplier

Combinatorial multipliers are attractive because they operate at high speed. If implemented in an HPOC module, a multiplication would be performed every HPOC module cycle time, i.e., every nanosecond if the HPOC module operated at 1 GHz. The pipeline delay is two cycles. Larger numbers can be multiplied at the same rate if sufficient HPOC module channels are available.

A set of DOIEs which will implement a 3-bit combinatorial multiplier in a two-stage HPOC module cascade was designed and fabricated. In a combinatorial multiplier, the calculation is performed in a “bitwise” manner: a set of minterms is generated for each of the bits of the product. All the minterms corresponding to each digit are then summed to generate the bits of the product. As described in Section 2, the minterms are calculated in the first HPOC module and the summation is performed in the second module. The calculation is performed most efficiently by reducing the Boolean equation for each bit to its minimal sum-of-products form. In the case of the 3-bit multiplier, the 111 minterms can be reduced to 35. The minimal Boolean equations for the output bits, \( O_0 \) to \( O_5 \), of the 3-bit multiplier are as follows [17]:

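\[
\begin{aligned}
O_0 &= a_0 b_0 \\
O_1 &= \bar{a}_0 a_1 b_0 + a_1 b_0 \bar{b}_1 + a_0 \bar{a}_1 b_1 + a_0 \bar{b}_0 b_1 \\
O_2 &= \bar{a}_0 a_1 \bar{b}_0 b_1 + \bar{a}_0 a_1 \bar{a}_2 b_1 + a_1 \bar{b}_0 b_1 \bar{b}_2 + \bar{a}_0 \bar{a}_1 a_2 b_0 + \bar{a}_0 a_2 b_0 \bar{b}_1 \\
&\quad + a_0 \bar{b}_0 \bar{b}_1 b_2 + a_0 \bar{a}_1 \bar{b}_0 b_2 + a_0 a_2 b_0 \bar{b}_2 + a_0 \bar{a}_2 b_0 b_2
\end{aligned}
\]

where \( a_2 a_1 a_0 \) and \( b_2 b_1 b_0 \) are the two 3-bit operands and an overbar denotes the complement of a bit.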

From Equation 3, a set of DOIEs was generated that allows the 3-bit multiplier to be implemented using HPOC modules. The sets of interconnect patterns that implement the 3-bit multiplier are shown in Figure 3. For this device, two cascaded HPOC modules are used. In Figure 3, the active pixels of the first and second VCSEL arrays are shown as filled boxes, followed by the interconnect maps corresponding to each of these pixels. There are 12 inputs because each input bit and its complement are required in the calculation. Finally, the active pixels of the second detector array, which represent the result of the multiplication, are shown. The multiplier exploits the ability of the HPOC module to support large fan-outs and fan-ins. The maximum fan-out required in the 3-bit multiplication is 18 for bits \( a_i \) and \( b_j \), i.e., \( a_2 \) and \( b_2 \) each appear 18 times in the full set of minimal equations. Light from the VCSELs that represent these bits must therefore be transmitted to 18 detectors in the first HPOC module. Similarly, the maximum fan-in required is 10: bit \( O_i \) is the result of the summation of 10 minterms, so light from the VCSELs that represent these minterms must be transmitted to one detector in the second HPOC module.
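The minimal equations and the fan-out/fan-in bookkeeping can be checked mechanically. The Python sketch below encodes the minimal sum-of-products terms for \( O_0 \) to \( O_2 \) of a 3-bit multiplier (with `~` marking a complemented input), verifies them against ordinary multiplication for all 64 operand pairs, and counts how many product terms each input literal feeds (its fan-out to first-module detectors) and how many terms each output sums (its fan-in at the second module). Because only three of the six output equations are encoded, the counts are partial and do not reproduce the maxima of 18 and 10 quoted above; the data layout and function names are assumptions.

```python
from collections import Counter
from itertools import product

# Minimal sum-of-products terms for the three low-order product bits.
SOP = {
    "O0": [["a0", "b0"]],
    "O1": [["~a0", "a1", "b0"], ["a1", "b0", "~b1"], ["a0", "~a1", "b1"], ["a0", "~b0", "b1"]],
    "O2": [["~a0", "a1", "~b0", "b1"], ["~a0", "a1", "~a2", "b1"], ["a1", "~b0", "b1", "~b2"],
           ["~a0", "~a1", "a2", "b0"], ["~a0", "a2", "b0", "~b1"], ["a0", "~b0", "~b1", "b2"],
           ["a0", "~a1", "~b0", "b2"], ["a0", "a2", "b0", "~b2"], ["a0", "~a2", "b0", "b2"]],
}


def literal(bits, lit):
    """Value of a literal such as 'a1' or '~a1' for the given input bits."""
    return 1 - bits[lit[1:]] if lit.startswith("~") else bits[lit]


def evaluate(terms, bits):
    """OR of ANDs: the value of a sum-of-products expression."""
    return int(any(all(literal(bits, lit) for lit in term) for term in terms))


def check_against_multiplication():
    for a, b in product(range(8), repeat=2):
        bits = {f"a{i}": (a >> i) & 1 for i in range(3)}
        bits.update({f"b{i}": (b >> i) & 1 for i in range(3)})
        for k in range(3):
            assert evaluate(SOP[f"O{k}"], bits) == ((a * b) >> k) & 1
    return True


def fan_out_and_fan_in(sop):
    fan_out = Counter(lit for terms in sop.values() for term in terms for lit in term)
    fan_in = {out: len(terms) for out, terms in sop.items()}
    return fan_out, fan_in


fan_out, fan_in = fan_out_and_fan_in(SOP)
```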

Figure 4 shows the calculated phase maps for the subholograms of the DOIEs that implement the interconnections shown in Figure 3. In practice, each DOIE would be composed of a 4 x 4 array of identical subholograms; only the subholograms are shown so that the details of the phase structure can be seen readily. The simulated reconstructions of the DOIEs, shown in Figure 5, exactly match the interconnection maps shown in Figure 3.

Although the 3-bit multiplier may be implemented with the 8 x 8 HPOC modules currently under construction, the number of HPOC module channels required to perform a multiplication increases rapidly as the lengths of the multiplier and multiplicand increase. Minimal Boolean equations were generated for all multipliers up to six bits. A full set of Boolean equations was generated for each bit of each multiplier and was minimized with SIS [18], a software utility that includes a Boolean algebra facility. The results are presented in Figure 6, in which the number of minterms is plotted as a function of the output digit. For example, calculating bit \( O_0 \) of the product of two six-bit numbers requires that 506 minterms be generated and summed. Calculating all the bits simultaneously would require more than 2,000 HPOC module channels. If HPOC modules were to be used to multiply long numbers, it would be necessary to partition the problem: an array of partial products could be calculated in HPOC modules and then summed with an HPOC module or an electronic accumulator.

Figure 3: Interconnect patterns and locations of active VCSELs for a 3-bit multiplier composed of two cascaded HPOC modules.

Figure 4: Calculated phase maps of the DOIEs to generate the interconnect patterns for a 3-bit multiplier composed of two cascaded HPOC modules.
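The partitioning idea can be sketched as follows: split each 6-bit operand into two 3-bit digits, form the four 3-bit partial products (each of which a two-stage HPOC cascade like the 3-bit multiplier above could compute), and accumulate them with the appropriate shifts. In this Python sketch, `multiply_3bit` is a hypothetical stand-in for the optical multiplier, and the shifted sum stands in for the accumulator.

```python
def multiply_3bit(x, y):
    """Hypothetical stand-in for one 3-bit combinatorial multiplier (a two-stage cascade)."""
    assert 0 <= x < 8 and 0 <= y < 8
    return x * y


def multiply_partitioned(a, b, digits=2):
    """Multiply two (3 * digits)-bit numbers from 3-bit partial products.

    The partial products are independent, so they could be formed in parallel;
    the shifted sum plays the role of the HPOC or electronic accumulator.
    """
    a_digits = [(a >> (3 * i)) & 7 for i in range(digits)]
    b_digits = [(b >> (3 * j)) & 7 for j in range(digits)]
    total = 0
    for i, x in enumerate(a_digits):
        for j, y in enumerate(b_digits):
            total += multiply_3bit(x, y) << (3 * (i + j))   # weight 8 ** (i + j)
    return total
```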

4.3 Adder

It is attractive to use HPOC modules to perform addition because the carry propagation, which is problematic in electronic adders [19], can be readily managed in the high fan-in environment of the HPOC module. This requires performing the addition in three steps. Suppose that two numbers $A$ and $B$, which are represented by the binary digit sequences $a_{n-1} a_{n-2} \ldots a_1 a_0$ and $b_{n-1} b_{n-2} \ldots b_1 b_0$, are to be added.

First, a partial sum $P_k = a_k \text{ XOR } b_k$ and a partial carry $G_k = a_k \text{ AND } b_k$ are calculated for each bit of $A$ and $B$. In the second step, a look-ahead carry is generated by evaluating the expression:

$$C_m = G_m + \sum_{i=1}^{m} G_{m-i} \prod_{j=1}^{i} P_{m+1-j}$$

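Finally, the true sum is generated by XORing each partial sum with the look-ahead carry from the next lower bit: the sum bit is $S_k = P_k \text{ XOR } C_{k-1}$, with $S_0 = P_0$, and $C_{n-1}$ is the carry out of the most significant bit. The first and third steps of the calculation require the evaluation of simple logic functions of two arguments and may, therefore, be performed efficiently with electronics. The second step, evaluation of the look-ahead carry, is more complicated, as it requires calculating products of increasingly large sets of bits. In the case of a four-bit adder, the expressions for the look-ahead carries are:

$$
\begin{aligned}
C_0 &= G_0 \\
C_1 &= G_1 + G_0 P_1 \\
C_2 &= G_2 + G_1 P_2 + G_0 P_2 P_1 \\
C_3 &= G_3 + G_2 P_3 + G_1 P_3 P_2 + G_0 P_3 P_2 P_1
\end{aligned}
$$
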
Note that these equations are in the sum-of-products form which is evaluated by a two-stage cascade of HPOC modules. It is not practical to use electronics to evaluate the look-ahead carries for numbers which are longer than a few bits because the maximum fan-out increases in proportion to the square of the number of input bits, while the maximum fan-in is equal to the number of bits. However, the HPOC module can support the large fan-outs and fan-ins required for the addition of larger numbers and can therefore be used to perform addition.
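The three-step procedure can be checked in software. The Python sketch below forms the partial sums and partial carries, evaluates each look-ahead carry as a sum of products, \( C_m = G_m + \sum_{i=1}^{m} G_{m-i} P_m P_{m-1} \cdots P_{m-i+1} \), and XORs each partial sum with the carry from the bit below; an exhaustive comparison with integer addition confirms the logic. The function name and the bit ordering (bit 0 least significant) are assumptions.

```python
def lookahead_add(a, b, n):
    """n-bit addition in three steps: partial sum/carry, look-ahead carry, true sum."""
    a_bits = [(a >> k) & 1 for k in range(n)]
    b_bits = [(b >> k) & 1 for k in range(n)]
    P = [x ^ y for x, y in zip(a_bits, b_bits)]   # step 1: partial sums
    G = [x & y for x, y in zip(a_bits, b_bits)]   # step 1: partial carries
    C = []
    for m in range(n):                            # step 2: look-ahead carry out of bit m
        carry = G[m]
        for i in range(1, m + 1):
            term = G[m - i]
            for j in range(1, i + 1):
                term &= P[m + 1 - j]              # P_m * P_(m-1) * ... * P_(m-i+1)
            carry |= term
        C.append(carry)
    # Step 3: S_0 = P_0, S_k = P_k XOR C_(k-1); C_(n-1) is the carry out.
    S = [P[0]] + [P[k] ^ C[k - 1] for k in range(1, n)]
    return sum(s << k for k, s in enumerate(S)) + (C[n - 1] << n)
```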

The interconnect pattern for an 8-bit adder is shown in Figure 7. The active pixels of the first and second VCSEL arrays are shown as filled boxes, followed by the interconnect maps corresponding to each of these pixels. The active pixels of the second detector array, which represent the result of the addition, are also shown. The maximum fan-out is 16 and the maximum fan-in is 8. Note that the 8-bit adder fits onto an 8 x 8 HPOC module array; in general, an n-bit adder fits onto an n x n HPOC module array. Holograms for an 8-bit adder have been designed and fabricated and are now being tested.
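The fan-out and fan-in figures quoted for the adder can be reproduced by counting literals in the look-ahead carry terms. In the expansion for an n-bit adder, \( P_k \) appears in \( k(n-k) \) product terms, so the maximum fan-out grows as about \( n^2/4 \), while the last carry \( C_{n-1} \) sums n terms. The sketch below (hypothetical helper names) counts these for any n; the carry terms number \( n(n+1)/2 \), which is consistent with an n-bit adder fitting onto an n x n array.

```python
from collections import Counter


def carry_terms(n):
    """Product terms of the look-ahead carries C_0..C_(n-1), each as a tuple of input names."""
    terms = {}
    for m in range(n):
        terms[f"C{m}"] = [(f"G{m}",)] + [
            (f"G{m - i}",) + tuple(f"P{m + 1 - j}" for j in range(1, i + 1))
            for i in range(1, m + 1)
        ]
    return terms


def adder_fan_stats(n):
    """Return (max fan-out, max fan-in, number of product terms) for the n-bit carry logic."""
    terms = carry_terms(n)
    fan_out = Counter(name for ts in terms.values() for t in ts for name in t)
    fan_in = max(len(ts) for ts in terms.values())
    n_terms = sum(len(ts) for ts in terms.values())
    return max(fan_out.values()), fan_in, n_terms


stats_8bit = adder_fan_stats(8)
```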

4.4 Crossbar Switch

The final application considered is the crossbar switch. The interconnection pattern for a 4 x 4 crossbar switch, which could be implemented in a two-stage cascade of HPOC modules, is shown in Figure 8. The crossbar switch takes the form of a set of four 2:4 multiplex decoders (2:4 MUXs), with the corresponding output bits of all the 2:4 MUXs ORed together. From Figure 8, it can be seen that each 2:4 MUX occupies one row of the first HPOC module. Each 2:4 MUX has one signal line and four address lines (two address bits in complementary notation). The signal lines S0 to S3 take data from the inputs to the crossbar switch. The first-stage HPOC module routes the signal from each data line to one of the detectors in the first HPOC module. For the crossbar switch to function without conflicts, each data input must be routed to a different column of the second-stage HPOC module. In the second stage, the signals from each column are ORed: all the signals from the VCSELs in one column are routed to one detector. To obtain conflict-free operation, the address bits must be set correctly by an external electronic circuit. By setting the address data appropriately, the switch can also be used for broadcast, i.e., to transmit the signal from one input to all the outputs.

Figure 7: Interconnect patterns and locations of active VCSELs for an 8-bit adder.

Figure 8: Interconnect patterns and locations of active VCSELs for a 4 x 4 crossbar switch.
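The crossbar logic can be modelled directly. In the Python sketch below, each input drives a 2:4 decoder whose four outputs are the products of the signal bit with the two complementary address lines that encode each column, and each switch output ORs one column of decoder outputs. The function names and the integer encoding of the addresses are assumptions; the sketch checks conflict-free routing for every permutation of the four inputs.

```python
from itertools import permutations, product


def decoder_2to4(signal, address):
    """One 2:4 stage: output c is the signal ANDed with the two address lines encoding c."""
    a1, a0 = (address >> 1) & 1, address & 1
    lines = {"A1": a1, "~A1": 1 - a1, "A0": a0, "~A0": 1 - a0}   # two bits in complementary form
    outputs = []
    for c in range(4):
        sel_high = lines["A1"] if (c >> 1) & 1 else lines["~A1"]
        sel_low = lines["A0"] if c & 1 else lines["~A0"]
        outputs.append(signal & sel_high & sel_low)
    return outputs


def crossbar_4x4(signals, addresses):
    """Stage 1: one decoder per input (one row each); stage 2: OR down each column."""
    rows = [decoder_2to4(s, a) for s, a in zip(signals, addresses)]
    return [int(any(row[c] for row in rows)) for c in range(4)]


def routes_every_permutation():
    for perm in permutations(range(4)):
        for signals in product((0, 1), repeat=4):
            out = crossbar_4x4(list(signals), list(perm))
            if any(out[perm[r]] != signals[r] for r in range(4)):
                return False
    return True
```

If two inputs are given the same address, the OR in the second stage merges them, which is the conflict the external address circuit must avoid.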

DOIEs for 4 x 4 and 8 x 8 crossbar switches have been designed and fabricated and are being tested. The crossbar switch is expected to operate at the speed of the HPOC modules, i.e., up to 1 Gb/s. An n x n crossbar switch can be accommodated by an n x n HPOC module; the 8 x 8 crossbar switch fits onto a two-stage 8 x 8 HPOC module cascade. When fully functional, the 8 x 8 crossbar should have a data throughput of up to 64 Gb/s.

5. Conclusion

In this paper, we have described a method for designing and fabricating diffractive optical interconnect elements (computer generated transmission holograms) for use in high performance optoelectronic computing modules, which may be used to build data processors and optoelectronic switches. The design and fabrication process yields DOIEs that accurately reproduce the desired interconnection pattern: the holograms tested so far produced the desired pattern with little crosstalk and a strongly suppressed zero-order spot. Using these holograms, computational algorithms can be implemented that require high fan-outs and fan-ins, which are impractical with conventional high-speed electronics. We have presented a design for a 3-bit combinatorial multiplier that can be implemented in a two-stage HPOC module cascade. Finally, we have presented designs for an adder and a crossbar switch that are highly suitable for implementation in HPOC modules.

6. Acknowledgments

OptiComp Corporation would like to thank Mr. Brian Hendrickson and Mr. Richard Fedors of Rome Laboratory (U.S. Air Force) for their support of this program. We would also like to thank Dr. Arthur Gmitro and Mr. Christopher Coleman of the University of Arizona for assistance with DOIE design, Dr. Paul Maker of Jet Propulsion Laboratory for DOIE fabrication, and Dr. Richard Carson of Sandia National Laboratories for lending the buried oxide VCSELs which were used in the measurements.

7. References