The research presented in this dissertation was motivated by the long-term goal of monitoring the electrical activity of thousands of neurons in an effort to decipher brain activity. Recording thousands of neural signals may provide some insight into what Santiago Ramón y Cajal, the father of modern neuroscience, called “the impenetrable jungle where many investigators have lost themselves.” Monitoring the dynamic signals of an enormous number of neurons would be a breakthrough that might bridge the gap between the firing of neurons and motion, perception, or even decision making. Increasing the number of recording channels is a common demand across research areas. Developing reliable BMIs with multiple degrees of freedom, which could help paralyzed patients and amputees regain independent mobility, requires monitoring the firing patterns of hundreds or even thousands of neurons. Decoding the exact patterns of brain dynamics that underlie thinking and behavior would provide essential insight into what happens when neural circuitry malfunctions in neurological and psychiatric disorders. In vitro neuronal network research also requires high-density MEA data acquisition to enable studying the correlation between the static and dynamic maps of the neurons.
Real-time neural signal processing will be an essential requirement for any system dealing with a massive number of recording channels, even if it is not a closed-loop system. Even systems that analyze data offline will require at least real-time data reduction to limit data storage needs.
Increasing the number of recording channels poses challenges at every step of the neural processing pathway, from data acquisition to data analysis. The work presented in this dissertation attempted to solve some of the problems involved in designing a real-time neuronal data reduction platform that can handle a few thousand recording channels. In the course of the research, further questions arose that point to areas for future work. The hardware architectures developed here can serve as a testing platform for new approaches to processing neuronal signals.

Integration of the Platform with a Data Acquisition System:

One of the major questions investigated in this dissertation was how to handle the massive input data expected to result from an increase in the number of recording channels. Multi-Gigabit Transceivers (MGTs) were proposed to move neural data into and out of the FPGA as fast as the device can process it. Simple solutions were proposed for aligning data words and for reassigning input data to their respective channel IDs. The MGT's comma detection and comma alignment circuits were used. The next research step would be to examine the system interface to the analog-to-digital converters (ADCs) that represent the final stage of a neural signal acquisition system. Starting in 2006, JEDEC introduced a series of standards that allow ADCs to connect to SerDes interfaces on FPGAs. The latest version, JESD204B, released in 2012, features a high maximum lane rate (up to 12.5 Gbps per lane), support for deterministic latency, and support for harmonic frame clocking. This series of standards has established a common language between fast, high-performance ADCs and FPGAs, making use of the high bandwidth that SerDes can provide. Analog Devices has recently released an ADC with high-speed serial interfaces, the AD9250, a dual 14-bit, 250 MSPS ADC that supports the JESD204B standard.
Theoretically, this ADC can handle 10,000 recording channels sampled at 25 kSPS each (250 MSPS / 25 kSPS). Examining the practical integration of this ADC into a neural data acquisition system, along with switching circuits that can multiplex different channels onto the same ADC at this rate, is a potential topic for future research.
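The channel count quoted above follows from dividing the ADC sample rate by the per-channel sample rate. The short Python sketch below reproduces that arithmetic and also reports the time slot available to each multiplexed channel. It is an idealized estimate only: the function names are illustrative, and multiplexer settling time, framing overhead, and the second converter of the dual ADC are all ignored.

```python
# Idealized time-division multiplexing estimate for one ADC converter.
# Figures come from the text; function names are illustrative assumptions.

ADC_RATE_SPS = 250_000_000   # AD9250 converter rate (250 MSPS)
CHANNEL_RATE_SPS = 25_000    # per-channel neural sample rate (25 kSPS)


def multiplexed_channels(adc_rate_sps: int, channel_rate_sps: int) -> int:
    """Number of channels one converter can serve if every ADC sample is used."""
    return adc_rate_sps // channel_rate_sps


def slot_time_ns(adc_rate_sps: int) -> float:
    """Time available per multiplexed sample, in nanoseconds."""
    return 1e9 / adc_rate_sps


if __name__ == "__main__":
    print(multiplexed_channels(ADC_RATE_SPS, CHANNEL_RATE_SPS))  # 10000
    print(slot_time_ns(ADC_RATE_SPS))                            # 4.0
```

The 4 ns slot per sample indicates how fast any switching circuit in front of the ADC would need to settle, which is why the practical multiplexing scheme is left as an open question.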

The Autonomous Design Architecture:

One of the motivations of this dissertation was to design an autonomous spike-based data reduction system that is fully controlled by FSMs. No processors were used for system control, in order to avoid interrupt latencies that may degrade the performance of the overall design. FSM controllers were designed to handle different parts of the design, namely:
(a) The input data allocation between multiple spike detection units.
(b) The spike detection unit control and the copying process of the spike waveforms from the input BRAM into the output buffers.
(c) Autonomous threshold selection for the spike detection unit.
(d) Managing the transmission of the AP waveforms from the unit buffers to the output FIFO shared by all the units.
(e) A test-bed for the transmission of real neuronal data.
The design architecture and FSM designs can be used to test new neural signal processing approaches. As a proof of concept, the spike detector used threshold comparison based on the nonlinear energy operator (NEO). This is a classical approach that has long been used in neuronal spike detection. The architecture can also accommodate other spike detection techniques, such as the discrete wavelet transform.
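For readers unfamiliar with NEO-based detection, the following Python sketch illustrates the general technique in software. It is not the hardware implementation described in this dissertation. The threshold rule (a scaled mean of the NEO output), the scale factor of 8, and the refractory window are assumptions chosen for illustration.

```python
# Software sketch of NEO-based spike detection (illustrative, not the FPGA design).
from statistics import mean


def neo(signal):
    """Nonlinear energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The two endpoints have no neighbors on one side and are left at 0.
    """
    psi = [0.0] * len(signal)
    for n in range(1, len(signal) - 1):
        psi[n] = signal[n] ** 2 - signal[n - 1] * signal[n + 1]
    return psi


def detect_spikes(signal, scale=8.0, refractory=30):
    """Return sample indices where the NEO output crosses the threshold.

    Assumptions: threshold = scale * mean(NEO output); detections closer
    than `refractory` samples to the previous one are suppressed.
    """
    if not signal:
        return []
    psi = neo(signal)
    threshold = scale * mean(psi)
    spikes = []
    last = -refractory
    for n, value in enumerate(psi):
        if value > threshold and n - last >= refractory:
            spikes.append(n)
            last = n
    return spikes


if __name__ == "__main__":
    trace = [0.0] * 100
    trace[50] = 10.0               # a single synthetic spike
    print(detect_spikes(trace))    # [50]
```

Because the operator needs only two multiplications and one subtraction per sample, it maps naturally onto DSP slices, which is one reason it suits a proof-of-concept hardware detector.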

The Hardware Implementation:

The hardware designs presented in this dissertation were implemented for evaluation and proof of concept on a Xilinx XUPV5-LX110T board. Virtex-7 FPGAs are expected to yield lower utilization percentages and higher speed, leaving more room to expand the design to handle more channels. BRAM was the most heavily utilized resource, at 91%. The complete design used a total of 136 x 36 Kb BRAMs, or approximately 5 Mb. The Virtex-7 FPGA families integrate up to 68 Mb of BRAM. A rough estimate suggests that the hardware design described in this dissertation can be replicated approximately 13 times (68 Mb / 5 Mb) to handle a total of more than 33 thousand channels. A definite channel count cannot be given before synthesizing the design and running placement and routing to ensure that the timing constraints are met. Virtex-7 FPGAs integrate up to 96 MGTs, with line rates of up to 28.05 Gbps. With 33 thousand channels, each recording neural signals at 30 kHz, the total sample rate is 990 MSPS; at a precision of 16 bits per sample, the input bandwidth requirement is 15.84 Gbps. Hence, input data transmission is not expected to be a limiting factor for the design. The design bottleneck will be the transmission through PCIe to a host PC. Further data reduction will be needed to decrease the output data, for example by implementing spike sorting in hardware as well. With more DSP slices available, the implementation of more complex spike detection and spike sorting algorithms will be feasible. Concrete values can only be determined when the design is implemented in hardware. This is another implementation project to be considered in future work.
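The scaling estimates in the preceding paragraph can be reproduced with a few lines of arithmetic. The Python sketch below does so using only the figures quoted in the text. The function names are illustrative, and the convention of 1 Mb = 1,000 Kb is an assumption made to match the rounded figures. Logic, routing, and timing limits are ignored, so the result is an upper-bound estimate rather than a synthesized outcome.

```python
# Back-of-the-envelope scaling estimate using the figures quoted in the text.
# Assumption: 1 Mb = 1,000 Kb, matching the rounded values in the paragraph.

BRAM_BLOCKS_USED = 136     # 36 Kb blocks used by the complete design
BRAM_BLOCK_KB = 36         # size of one BRAM block, in Kb
VIRTEX7_BRAM_MB = 68       # BRAM capacity quoted for Virtex-7, in Mb
CHANNELS = 33_000          # estimated channel count after replication
SAMPLE_RATE_HZ = 30_000    # per-channel sample rate
BITS_PER_SAMPLE = 16       # sample precision


def design_bram_mb(blocks: int, kb_per_block: int) -> float:
    """BRAM used by one copy of the design, in Mb."""
    return blocks * kb_per_block / 1000


def replication_factor(available_mb: float, design_mb: float) -> int:
    """Whole copies of the design that fit in the available BRAM."""
    return int(available_mb // design_mb)


def aggregate_rate_msps(channels: int, rate_hz: int) -> float:
    """Total sample rate over all channels, in MSPS."""
    return channels * rate_hz / 1e6


def input_bandwidth_gbps(channels: int, rate_hz: int, bits: int) -> float:
    """Raw input bandwidth over all channels, in Gbps."""
    return channels * rate_hz * bits / 1e9


if __name__ == "__main__":
    used = design_bram_mb(BRAM_BLOCKS_USED, BRAM_BLOCK_KB)
    print(used)                                                # 4.896
    print(replication_factor(VIRTEX7_BRAM_MB, used))           # 13
    print(aggregate_rate_msps(CHANNELS, SAMPLE_RATE_HZ))       # 990.0
    print(input_bandwidth_gbps(CHANNELS, SAMPLE_RATE_HZ, BITS_PER_SAMPLE))
```

The last line evaluates to roughly 15.84 Gbps, which is below the 28.05 Gbps line rate quoted for a single transceiver and supports the expectation that the input side is not the limiting factor.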
Recently, Xilinx released the Zynq®-7000 family, a series of products based on the Xilinx All Programmable System-on-Chip architecture that integrates a dual-core ARM® Cortex™-A9 based processing system and 28 nm Xilinx programmable logic in a single device. Implementing the spike-based data reduction platform on a Zynq device may allow more features to be added to the design.