Using the Xillybus IP core for PCIe transmission was a relatively straightforward way to move data from the FPGA to the host PC. The Xillybus IP core provides the necessary DMA-based design and the software driver that handles data reception at the host, which made it convenient for observing and evaluating the data processed by the hardware design. On the other hand, the IP core is delivered as a bit-file, offering little insight into the internal design and limited flexibility for custom modifications.
One of the main drawbacks of using the Xillybus IP core is that the behavior of its read-enable signal is hard to predict. The signal is controlled by many factors that are opaque to the user, including the operation of the PCIe core on the FPGA, the host's response to interrupts, and the motherboard's packet switching. At some points during transmission, the read-enable signal was observed to remain idle for long intervals (across a ChipScope window of 8192 clock cycles at 100 MHz), causing data to accumulate in the output buffer and consequently in the queue.

Design Parameters:

The number of channels that can be handled by the spike-based data reduction platform depends on several parameters related to the hardware resources, the processing clock on the FPGA, the type of neurons, and the bandwidth of the transmission link to the host PC. Based on the design simulations and hardware implementation, the following formulas summarize the design parameters that define the scaling boundaries of the system.

Memory Usage:

There are three buffering stations in the design: the input buffer, the intermediate output buffer of the spike detection unit, and the common output FIFO where the spike waveforms from all spike detection units are queued for transmission to the host PC. The input buffer allocates 16 sample words of 16 bits each per channel. The intermediate output buffer allocates a 48-word block of 18 bits per word for each detected spike waveform. The number of 48-word blocks equals the number of channels, to account for the worst-case scenario of fully synchronized bursting across all channels. The queue depth of the output FIFO depends on the transmission rate to the host PC.
Memory usage per channel for the first two buffering stations is:
Input buffer = 16 × 16 = 256 bits
Intermediate buffer = 48 × 18 = 864 bits
The total memory required for buffering one channel at the first two stages is thus 256 + 864 = 1120 bits.
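The per-channel memory budget above can be sketched as a small calculation. The word widths and buffer depths are taken from the text; the helper function name is illustrative.

```python
# Per-channel buffering memory estimate for the first two buffering stations.
# Depths and word widths (16x16 and 48x18) are the values given in the text.

def buffer_bits(depth_words, word_width_bits):
    """Memory in bits for a buffer of `depth_words` words."""
    return depth_words * word_width_bits

input_buffer = buffer_bits(16, 16)         # 16 samples x 16 bits = 256 bits
intermediate_buffer = buffer_bits(48, 18)  # 48 words x 18 bits = 864 bits

per_channel_total = input_buffer + intermediate_buffer
print(input_buffer, intermediate_buffer, per_channel_total)  # 256 864 1120
```

Multiplying `per_channel_total` by the channel count gives the buffering memory to be reserved at the first two stages for the whole platform.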
To ensure that no spike waveforms are dropped, the system must be able to copy the spike waveforms to the common output FIFO before the intermediate buffers fill up again with new spike waveforms. In other words, the maximum bursting rate may not exceed the rate at which the spike waveforms are copied to the output FIFO.

\text{Maximum Bursting Rate} = \frac{\text{Internal clock}}{58\ \text{clock cycles} \times \#\text{spike waveform blocks in the intermediate buffer}}
To keep the memory usage per channel as described above, the internal clock must be chosen so that the resulting maximum bursting rate matches or exceeds the maximum bursting rate of the recorded neuronal culture.
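The bound above can be illustrated numerically. The 58-clock-cycle copy time per spike waveform block is from the text; the 100 MHz internal clock and 60-channel count below are example values, not design figures from this work.

```python
# Illustrative evaluation of the maximum-bursting-rate bound.
# Since the number of intermediate blocks equals the channel count,
# n_blocks is set to the (assumed) number of channels.

def max_bursting_rate(internal_clock_hz, n_blocks, cycles_per_block=58):
    """Upper bound on the per-channel bursting rate (spikes/s) that the
    copy process to the output FIFO can sustain."""
    return internal_clock_hz / (cycles_per_block * n_blocks)

rate = max_bursting_rate(100e6, 60)  # assumed: 100 MHz clock, 60 channels
print(rate)  # spikes per second per channel
```

Raising the internal clock, or reducing the channel count, relaxes this bound proportionally.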

Transmission Rate and Queue Depth:

An approximation formula was derived to relate the transmission rate to the queue depth, bursting rate and number of samples per spike.
\text{SpkAcc} = (\text{Average BFR} - \text{TR}) \cdot \tau_{burst}
\text{Queue Depth} = \text{SpkAcc} \times \text{samples per spike}
\text{Average BFR} = \text{Average bursting rate per channel} \times \#\text{of channels}
where SpkAcc is the spike accumulation in the queue, Average BFR is the average firing rate during bursting activity, TR is the transmission rate to the host PC, and τ_burst is the burst duration. Knowing the type of neurons to be monitored, the average bursting firing rate can be estimated. Given the transmission rate and the available hardware memory resources, the number of channels can be determined; conversely, given the number of channels, the transmission rate and the queue depth can be designed. A number of algorithms have been developed to accurately detect burst occurrences and durations, both in vivo and in vitro.
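The approximation can be sketched as follows. All numeric inputs (per-channel bursting rate, channel count, transmission rate in spikes per second, burst duration, samples per spike) are illustrative assumptions chosen only to exercise the formula.

```python
# Queue-depth approximation: spikes accumulate whenever the aggregate
# bursting rate exceeds the transmission rate, for the burst duration.

def queue_depth(rate_per_channel_sps, n_channels, tx_rate_sps,
                burst_duration_s, samples_per_spike):
    avg_bfr = rate_per_channel_sps * n_channels          # spikes/s, all channels
    spk_acc = (avg_bfr - tx_rate_sps) * burst_duration_s # spikes accumulated
    return spk_acc * samples_per_spike                   # FIFO depth in samples

# Assumed: 100 spikes/s per channel, 60 channels, 4000 spikes/s transmitted,
# 0.5 s bursts, 48 samples per spike waveform.
depth = queue_depth(100, 60, 4000, 0.5, 48)
print(depth)  # 48000.0 samples
```

A negative result means the link drains the queue faster than bursts fill it, so no extra queue depth is needed beyond normal operation.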