# Hello Humanity

## Introducing Kernel

At Kernel, our primary goal is to enable improved and accelerated insights into the human brain. The 20th century was the century of physics. We split the atom, went to the moon, and peered at the edges and origins of the universe. The 21st century will be the century of the brain, the mind, and of general intelligence. What wonders lie ahead on the path towards the next great frontier of human exploration—ourselves?

During nearly four years in stealth mode, we identified two key factors restricting progress towards understanding the brain. First, acquiring the best brain signals today requires expensive, complex, room-sized equipment operated by full-time, trained technicians. The cost and sheer size of these machines, in addition to the restrictions placed on user movement and experimental environments, limit their scale and potential. This means that much of the extensive progress in neuroscience from brain imaging to date has been achieved by leveraging only a modest number of hours of data collected in limited settings that do not capture the full richness of the human experience. Second, measuring and interpreting neural signals is complex and has diverse requirements: done from the ground up, it requires input and expertise from a wide array of disciplines, including fundamental sensor physics, hardware design, firmware development, experimental design, signal processing, machine learning, and neuroscience. Progress has thus been scattered across companies, labs, and academic departments throughout the world. Putting it all together to work seamlessly is a monumental feat in and of itself.

Our team has worked to address both of these limitations to enable Neuroscience as a Service (NaaS)—neuroscience studies delivered at the touch of a button. (Sign up for NaaS here.) First, we have built hardware for brain signal acquisition that is low-cost, scalable, easy to use, and that supports natural environments and motion. Second, we have assembled a world-class "full-stack neuroscience" team to put it all together. We combine the highest levels of expertise in all of the required areas under one roof, from building and optimizing the hardware, designing and executing the experiments, developing the required signal processing and machine learning algorithms, to analyzing and interpreting the results. This enables the rapid iteration and communication necessary to take modern neuroscience to the next level. We take care of all the details while providing the customer with a simple and clean experience. In combination, our team and hardware create a value proposition for accelerating neuroscience unmatched in the world.

In this post, we offer a glimpse at Kernel's advancements on these fronts. In particular, we present two pilot experimental studies, highlighting the breadth and expertise of our team across hardware integration, software engineering, experimental execution, signal processing, data analysis, and beyond. Moreover, we offer an introduction to our hardware developments, along with a brief description of why and how we identified the promise of these technologies.

We are deliberately not yet showing any experimental demonstrations and scientific findings from our own next-generation hardware. These demonstrations are forthcoming in a separate announcement. Here, we wish to share an introductory account of our current status and engage in a public conversation with potential clients and collaborators on NaaS, both from industry and academia. Moreover, we look forward to continued engagement as we keep building and will share our progress along this journey.

We invite you to imagine how you and Kernel may work together towards a better understanding of the brain and of human intelligence itself.

## Contents

EXPERIMENTS AT KERNEL: We built two demonstrations of cutting-edge Brain-Computer Interface (BCI) applications, which we refer to as "Speller" and "Sound ID". Speller allows a participant to spell out words with their brain signal by visually attending to the corresponding letters on a screen, and Sound ID shows that we can infer what sound a person is listening to from brain signals alone. To the best of our knowledge, this is the first time these applications have been realized in real-time (Sound ID) and with such a small amount of training data (Speller and Sound ID).

NEUROSCIENCE HARDWARE AT KERNEL: After a thorough investigation of (potential) non-invasive brain recording modalities, we identified two exciting candidate technologies with the potential for scalability, ease of use, low-cost, and natural user environments. One uses optically pumped magnetometer (OPM) sensors to directly measure the magnetic fields generated by neural activity; the other detects the associated local metabolic response through changes in photon diffusion. Anticipating non-linear benefits due to the complementary nature of these two modalities, we have built them both.

## Experiments at Kernel: Exploration with Magnetic Brain Signals

We present here two demonstrations in line with previously-explored neural decoding paradigms, which we refer to as "Speller" (Thielen 2015) and "Sound ID" (Koskinen 2012). These demonstrations use in-house, custom-built magnetic shielding and our in-house acquisition, signal processing, and analysis pipelines. The aim of the Speller demonstration was to allow the participant to generate text using only their gaze and a visual keyboard, with their gaze decoded from brain signal alone. The aim of Sound ID was to see if we could decode speech and song snippets from a known list using only brain signals of the listener — quickly, in real-time, and with a small amount of training data.

In both explorations, we either reproduce or advance the state-of-the-art. We had two goals with these experiments. First, we wanted to evaluate the quality of OPM signals from different regions in the brain (e.g., temporal and occipital cortex) and characterize event-related fields, user experience, and signal-to-noise characteristics. Second, we wanted to build out and validate the in-house signal acquisition and processing pipelines necessary to achieve or advance state-of-the-art performance for various BCI applications.

### THE OP-MEG SETUP

The data acquisition system consisted of an array of optically pumped magnetometer sensors for measuring the magnetoencephalogram (OP-MEG; QuSpin Gen2 sensors), which were positioned in a helmet and mounted on a user's head. The helmet was created in Solidworks with 10-10 positioning for sensor mounts, and 3D printed from nylon. Sensors were operated in dual-axis mode, sensing magnetic fields in two directions: one normal to the scalp (radial field; z-axis), and one tangential to the scalp (tangential field; y-axis). Each sensor produced an analog voltage output that was digitized at a sampling rate of 2000 Hz using a National Instruments digital acquisition system. All experimentation took place inside a magnetically-shielded room to attenuate environmental noise. All control and acquisition equipment was kept and operated outside the shielded room. During experimentation, all sensors operated simultaneously, outputting both the radial and tangential magnetic fields.

## Experiment 1: Speller

### METHODS

#### SENSOR POSITIONS

20 sensors were placed in the helmet to cover occipital and parietal cortices. 4 extra sensors were positioned on 6 cm pedestals that were mounted on the helmet, two on each side (oriented orthogonally to each other so as to sample fields in every direction). These 4 sensors make reference measurements to detect magnetic fields further from the scalp that can be later used for removing noise from far sources (first-order gradiometer). The 20 sensors on the scalp yielded a total of 40 channels.

Subjects viewed a keyboard on the display (Fig. 1a) comprising 36 tiles (26 letters and 10 digits). The display was placed approximately 90 cm from the subjects, and the diagonal of each letter tile represented about 3.8 degrees of visual angle. All tiles of the keyboard flickered simultaneously, each with its own specific pattern. The patterns were chosen from a set of so-called "Gold codes" (see Thielen 2015 for details; their set "V"). A set of Gold codes contains pseudo-random bit sequences that have a minimized cross-correlation. The sequence flickered for each letter was 126 bits long, and all sequences were displayed simultaneously at the maximum screen refresh rate (60 Hz), giving a total flickering duration of 2.1 seconds (126 bits / 60 Hz) (Fig. 1b).

After setting up and calibrating the neural recordings, subjects were asked to spell 30 eight-letter words (240 trials), selected such that all letters appeared about the same number of times. Each trial started with the word to be spelled being displayed on the screen for 5 seconds; then, to select a letter, the subject had to look at the letter directly while the keyboard flickered for the duration of the Gold code. Between two letters, there was a short break (1.5 to 2 seconds, distributed uniformly) during which all tiles remained static (black background, white letter), giving the subject an opportunity to find the next letter that they wished to select. Though the keyboard featured ten digits, digits were not part of the targets; they acted as visual distractors. We did not have a behavioral assessment of whether subjects were indeed performing the task correctly (i.e., looking at the correct letter). Hence, the accuracies we achieve in this instructed spelling task can be considered a lower bound estimate of what we might achieve in a free spelling context.

#### SIGNAL PROCESSING

All data analysis was performed using an in-house, Python-based pipeline (mostly based on open-source libraries and frameworks for scientific computing, including SciPy, Numpy, and Scikit-learn), and optimized for real-time signal processing. The results we present for this experiment are from offline analyses that mimic online requirements.

Visual stimulation

The sequence for Gold code 0 (letter A) was played on a photodiode at the top left corner of the screen. The first flash of the photodiode following the event sent by the stimulus computer was used to align the brain signals to the stimulus sequence. Gold codes consisted of a sequence of only two event types: short (1-bit; 10 or 100) and long (2-bit; 110 or 1100) flashes. The sequence for each letter was decomposed into its short and long flashes.

Brain signals

Brain signals were decimated to 240 Hz (a multiple of 60 Hz, the frequency of the stimulation sequence) and high-pass filtered (IIR filter; stop 2 Hz, pass 5 Hz). The brain signal was then denoised via regression with the reference sensor signals (a form of first-order gradiometry), and all channels were z-scored using the mean and standard deviation calculated from the first 20 seconds of the signal.
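
The preprocessing chain described above can be sketched in a few lines of SciPy. This is a minimal illustration, not our production pipeline: the function name `preprocess`, the Butterworth filter family, the 1 dB / 40 dB ripple and attenuation targets, and fitting the reference regression over the whole recording are all assumptions made for the example.

```python
import math

import numpy as np
from scipy import signal


def preprocess(brain, ref, fs_in=2000, fs_out=240, baseline_s=20.0):
    """brain: (n_samples, n_channels); ref: (n_samples, n_ref_channels)."""
    # Rational resampling: 2000 Hz * 3 / 25 = 240 Hz (anti-alias filter included).
    g = math.gcd(fs_in, fs_out)
    up, down = fs_out // g, fs_in // g
    brain = signal.resample_poly(brain, up, down, axis=0)
    ref = signal.resample_poly(ref, up, down, axis=0)

    # IIR high-pass: stop-band edge 2 Hz, pass-band edge 5 Hz.
    # Filter family and ripple/attenuation values are assumptions.
    sos = signal.iirdesign(wp=5.0, ws=2.0, gpass=1.0, gstop=40.0,
                           ftype="butter", output="sos", fs=fs_out)
    brain = signal.sosfilt(sos, brain, axis=0)  # causal, as online use requires
    ref = signal.sosfilt(sos, ref, axis=0)

    # Regress the reference channels out of every scalp channel.
    beta, *_ = np.linalg.lstsq(ref, brain, rcond=None)
    clean = brain - ref @ beta

    # z-score using statistics from the first `baseline_s` seconds only.
    n_base = int(baseline_s * fs_out)
    mu = clean[:n_base].mean(axis=0)
    sd = clean[:n_base].std(axis=0)
    return (clean - mu) / sd
```

Using a causal filter (`sosfilt`) rather than a zero-phase one keeps the sketch compatible with the stated goal of offline analyses that mimic online requirements.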

#### DECODING ALGORITHM

Canonical correlation analysis (CCA) is a well established statistical method for finding correlational linear relationships between two or more multidimensional variables (Hotelling 1936), which has recently been popularized in neuroscientific research (Hardoon 2004). CCA can find linear projections of the multidimensional data such that they are maximally correlated. In other words, CCA maximizes the correlation between linear transformations of the data. As CCA is a linear technique, its success in mapping stimuli to brain data depends on there being an approximately linear stimulus-response relationship; treating the brain as a linear time-invariant system has a long tradition in systems neuroscience, and has proven a useful approximation for the study of early sensory cortices.

On the one hand, we have the visual input, which consists of a series of short and long flashes. On the other hand, we have brain signals from 40 channels. We wish to find a mapping from one to the other. We know that the brain response to a visual flash is fast and lasts less than 250 ms. Following the approach of Thielen 2015, for each Gold code, we build two matrices, $$M_s$$ and $$M_l$$, for the short and the long flash sequences, respectively, that together constitute the Gold code. The first column of $$M_s$$ has 1's where the Gold code features a single on-bit, and 0's elsewhere. All other columns of $$M_s$$ are delayed versions of the first. In our implementation, $$M_s$$ was of size (564 x 60), corresponding to (2.1 + 0.25) seconds at 240 Hz, and 60 delays (0.25 s at 240 Hz). $$M_l$$ was of the same size, and built in the same fashion from the long flashes in the Gold code. Concatenated horizontally, $$M_s$$ and $$M_l$$ constitute the stimulus signal that is input to the CCA algorithm. The brain signal input to the CCA algorithm is not processed further: it simply consists of the brain signals, aligned in time with the Gold code, hence with dimensions 564 x 40 (channels).
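
To make the construction of the short-flash and long-flash matrices concrete, here is a minimal NumPy sketch. The function names are hypothetical, and two details are assumptions not stated in the text: each flash is marked by a single 1 at its onset sample, and any run of more than one on-bit is treated as a long flash.

```python
import numpy as np


def flash_onsets(bits):
    """Return bit indices where short (1 on-bit) and long (2 on-bit) flashes start."""
    short, long_ = [], []
    i = 0
    while i < len(bits):
        if bits[i]:
            j = i
            while j < len(bits) and bits[j]:
                j += 1
            # Runs of exactly one on-bit are short; longer runs count as long.
            (short if j - i == 1 else long_).append(i)
            i = j
        else:
            i += 1
    return short, long_


def delayed_impulses(onset_bits, n_bits=126, bit_rate=60, fs=240, n_delays=60):
    """Build a (564 x 60) matrix: impulse train in column 0, delayed copies after."""
    spb = fs // bit_rate                    # samples per bit (4 at 240 Hz)
    n_samples = n_bits * spb + n_delays     # 504 + 60 = 564
    impulses = np.zeros(n_samples)
    impulses[np.asarray(onset_bits, dtype=int) * spb] = 1.0
    m = np.zeros((n_samples, n_delays))
    for d in range(n_delays):
        m[d:, d] = impulses[:n_samples - d]
    return m


def stimulus_matrix(bits):
    """Concatenate the short-flash and long-flash matrices horizontally."""
    short, long_ = flash_onsets(bits)
    return np.hstack([delayed_impulses(short), delayed_impulses(long_)])
```

For a 126-bit code this yields a 564 x 120 stimulus matrix, which pairs with the 564 x 40 brain data matrix as the two inputs to CCA.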

Remarkably, CCA weights can be learned from the very first trial (letter) presented to the subject. This is virtually a "no training" BCI. Our algorithm outputs predictions from the second trial onward. To ensure robustness, the CCA weights are updated after each letter, until 16 letters have been viewed, at which point the weights are fixed. In this application, we only use the weights corresponding to the first canonical correlate.

To make a prediction for a new trial, the CCA weights learned to project the stimulus onto the first canonical correlate are applied to all 36 Gold codes, generating 36 candidate time courses. The CCA weights learned to project brain data onto the first canonical correlate are applied to incoming brain signals, generating the brain time course. The prediction of the algorithm is the candidate Gold code which, when projected onto the first canonical correlate, has the highest (Pearson) correlation with the projected brain time course. To assign a confidence to this prediction, we compute the leave-one-out z-score for each of the obtained correlation coefficients, indicating how much each correlation deviates from the distribution of the other 35. We then take the difference between the leave-one-out z-score of the highest correlation and the leave-one-out z-score of the second-highest correlation as our confidence metric — a proxy for how far the leading correlation is from the next best. There are other metrics one could use for a confidence score in an online BCI context, for example, one based on a softmax probability (as used below in our Sound ID experiment).
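
The prediction and confidence steps can be sketched as follows, assuming the first-canonical-correlate weights have already been learned. The function name `predict_letter` and the argument layout are hypothetical; the weights `w_brain` and `w_stim` stand in for whatever a CCA implementation returns.

```python
import numpy as np


def predict_letter(brain, candidates, w_brain, w_stim):
    """Pick the candidate code whose projection best matches the projected brain data.

    brain:      (n_samples, n_channels) aligned brain signals for one trial
    candidates: (n_codes, n_samples, n_features) stimulus matrices, one per code
    w_brain:    (n_channels,) brain-side weights of the first canonical correlate
    w_stim:     (n_features,) stimulus-side weights of the first canonical correlate
    """
    y = brain @ w_brain                    # projected brain time course
    proj = candidates @ w_stim             # (n_codes, n_samples) candidate courses
    r = np.array([np.corrcoef(p, y)[0, 1] for p in proj])

    # Leave-one-out z-score: each correlation against the distribution of the rest.
    z = np.empty_like(r)
    for k in range(len(r)):
        others = np.delete(r, k)
        z[k] = (r[k] - others.mean()) / others.std()

    order = np.argsort(r)
    best, second = order[-1], order[-2]
    confidence = z[best] - z[second]
    return best, confidence, r
```

A large gap between the two leading z-scores indicates that the winning code stands well clear of the runner-up; a gap near zero flags an ambiguous trial.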

#### INFORMATION TRANSFER RATE

Information Transfer Rate, or ITR, is a classical performance metric for Brain Computer Interfaces. We compute it as follows (following Thielen 2015):

$$B = \log_2 N + P \log_2 P + (1 - P) \log_2\left(\frac{1 - P}{N - 1}\right) \quad \text{[bits]}$$
$$ITR = \frac{60}{T} \cdot B \quad \text{[bits/min]}$$

Where N is the number of classes, P the accuracy in a range of 0 to 1, and T the number of seconds required for spelling one symbol. In the calculation, T includes both trial time (2.1 seconds max in our case) and inter-trial time (1.75 seconds in our experiment).
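
As a worked example of the ITR definition, the following sketch evaluates it in plain Python. The function names are illustrative; the guards for P = 0 and P = 1 are an implementation detail added here to avoid taking the logarithm of zero.

```python
import math


def bits_per_selection(n_classes, accuracy):
    """B in bits: log2 N + P log2 P + (1 - P) log2((1 - P) / (N - 1))."""
    n, p = n_classes, accuracy
    b = math.log2(n)
    if p > 0:
        b += p * math.log2(p)
    if p < 1:
        b += (1 - p) * math.log2((1 - p) / (n - 1))
    return b


def itr_bits_per_min(n_classes, accuracy, seconds_per_symbol):
    """ITR in bits/min, with T covering trial time plus inter-trial time."""
    return 60.0 / seconds_per_symbol * bits_per_selection(n_classes, accuracy)


# 36 classes, 99.2% accuracy, T = 2.1 s trial + 1.75 s inter-trial time.
best_session_itr = itr_bits_per_min(36, 0.992, 2.1 + 1.75)
```

With these inputs the result is roughly 79 bits/min. Because T sits in the denominator, shortening the trial raises the ITR only as long as accuracy does not drop faster than the time saved.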

### RESULTS

#### ACCURATE, FAST DECODING OF THE LETTER IN FOCUS, WITH VIRTUALLY NO TRAINING

We present data from 5 subjects which we collected over two days. 4 of the subjects performed the experiment twice (Table 1).

All but 1 session consisted of n=240 trials (the session for subject 4 was cut short, and featured n=113 trials only). Because our algorithm can make predictions from the second trial onward, the accuracies reported here are the percentage of correctly identified letters (out of n - 1). The algorithm performed near perfectly (99.2%) for subject 2, session 2 (see confusion matrix in Fig. 2).

The ITR compares favorably with the ITR reported in Thielen 2015, which we closely modeled our experiment and decoding algorithm on. In fact, we obtain both a better average ITR (55.0 bits/min for our trial length, vs. 38.1 in the publication), and a better maximum ITR (78.8 bits/min for our trial length, vs. 50.0 in the publication), given fixed length trials (see Fig. 3a).

We were interested in whether trial length could be reduced, while still outputting accurate predictions. We thus looked at accuracy and ITR as a function of time (from 0.125 to 2.1 seconds), for all subjects (Fig. 3). Accuracy steadily increased with trial time (reaching significance for all subjects by 0.5 seconds). The ITR, which takes both the accuracy and the total time into account, levels off around 1.5 seconds.

#### WHERE DOES THE INFORMATION COME FROM?

We focus the remainder of the results on the best session that we recorded for Speller (highlighted in green in Table 1). How did our algorithm achieve such performance? In particular, is it the case that a single sensor carries most of the information, or is it necessary to look at several sensors simultaneously to arrive at accurate predictions?

First, we looked at the weights that CCA attributed to the different channels and delays (Fig. 4). The weights for the delayed flashes showed an expected sequence of peaks and troughs, reminiscent of a visual evoked field (although, these weights cannot be interpreted as such). The spatial weights on the brain channels show that some channels contribute more than others. However, to assess the true informational content of each sensor, we turn to an analysis in which we try to decode using small subsets of channels.

We performed the whole analysis using all subsets of 1, 2, 3, and 4 sensors; the results are shown in Fig. 5. While one sensor carried the most information (90% single sensor accuracy and presence in more than 20% of sets that achieve an accuracy above 90%), many sets that did not contain that sensor still performed very well. Hence, exact sensor placement was not necessary to achieve such performance. We find that a set of four sensors gets close to replicating the accuracy and confidence of the full set of sensors.

## Experiment 2: Sound ID

TL;DR: Can we, from their brain activity alone, determine what someone is listening to?

For speech, the answer is yes—our algorithm often figured it out within seconds. Songs are a bit harder, but the algorithm still does quite well. Check out two example runs below:

(Audio recording of Studs Terkel from StoryCorps)

In this case, we measured a person's brain activity while they listened to a story ("What Has Happened to the Human Voice?" by Studs Terkel, the late author and broadcaster), which they had chosen from a menu of options. Within a few seconds, our algorithm correctly identified what the person was listening to using only the measured brain signals. The animation above shows the "race" of probability estimates our algorithm made in real time as the story was played. The red line "winning" the race represents Studs Terkel; each of the other colors represents one of the other menu options, all of which were also speech recordings.

(P.S. We don't have the rights to play the song here, sorry. You'll have to hum it.)

In this figure, the person was listening to "Under Pressure", by Queen. It took the algorithm 19 seconds to figure it out given a menu of 10 possible song choices.

(For detailed methods and results, see below.)

### METHODS

#### SENSOR POSITIONS

16 sensors were placed in 10-10 positions to cover temporal cortices (bilateral, 8 on each side). 4 extra sensors were used as reference sensors (as in Experiment 1). The 16 sensors on the scalp yielded a total of 32 channels.

Subjects were seated comfortably in front of a screen, and were given non-magnetic earphones (air tubes) for sound as well as a game-pad to interact with content on the screen. After setting up and calibrating the neural recordings, subjects were asked to passively listen to a short auditory stimulus (~2 minutes), typically consisting of two repetitions of the following sequence: polyphonic tones (~30 s, one tone every second), followed by 2 seconds of silence, followed by a short speech or song snippet (~30 s each). None of the stimuli used during the training phase were part of the main experiment. Subjects were then presented with a display featuring the names of 10 speech or song snippets and were asked to select and listen to one of them (60 seconds long, with some initial and final periods of silence). The selection of an audio snippet triggered the playback of the audio via the subject's earphones. All audio stimuli volume was normalized so that 99.95% of the magnitude of samples were below a set value, to roughly equalize volume for all audio snippets. For the duration of the audio, neural signal was continuously sent into a computer for real-time pre-processing (i.e., cleaning the data) and machine learning (i.e., classification algorithm) steps. The real-time output of the classification algorithm (i.e., confidence score) inferred what the subject was listening to based solely on their brain activity. The confidence score was fed back to the user by changing the displayed order of the 10 audio snippets on the screen every second, creating a leaderboard of the inference (Fig. 6). Upon completion of the audio snippet, the subject was free to choose the same or another audio snippet. We used the same approach to identify sound envelopes derived from both speech as well as lyrical songs (which in addition to speech also contain musical sounds that can dominate the sound envelope, thus likely making the classification more challenging).

#### SIGNAL PROCESSING

Audio signal

To quickly estimate the envelope of each audio snippet (speech or song) — the low-level fingerprint whose representation in the brain signal we wish to learn — the absolute value of the audio waveform was taken and decimated to 100 Hz.
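
A minimal sketch of this quick envelope estimate, assuming SciPy and a 44.1 kHz audio sampling rate (the audio rate is not stated in the text, and the function name `quick_envelope` is illustrative):

```python
import math

import numpy as np
from scipy import signal


def quick_envelope(audio, fs_audio=44100, fs_env=100):
    """Rectify the waveform, then decimate it to `fs_env` Hz."""
    rectified = np.abs(np.asarray(audio, dtype=float))
    g = math.gcd(fs_audio, fs_env)
    up, down = fs_env // g, fs_audio // g
    # resample_poly applies a low-pass anti-aliasing filter before downsampling,
    # which is what smooths the rectified waveform into an envelope.
    return signal.resample_poly(rectified, up, down)
```

Rectification followed by low-pass decimation is a cheap alternative to a Hilbert-transform envelope, which fits the stated aim of estimating the envelope quickly.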

Brain signals

In real-time, the brain signal was buffered into 1 second, non-overlapping chunks. The chunks were also decimated to 100 Hz and high-pass filtered (IIR filter; stop 2 Hz, pass 5 Hz). The brain signal was denoised via regression with the reference sensor signals (a form of first-order gradiometry), and all channels were then z-scored via the mean and standard deviation calculated from the first 20 seconds of the incoming signal.

#### DECODING ALGORITHM

The stimulus (audio envelopes) and the processed brain signal chunks can be thought of as paired multidimensional data sets, which in this application we seek to map to each other using only linear transformations. For this purpose, we can again use canonical correlation analysis (CCA), as we did in the Speller experiment. Specifically here, CCA provides a means for dimensionality reduction and spatio-temporal filtering, generating a compact representation of the common variation between the stimulus and the processed brain signal. The procedure we used for decoding is pictured in Fig. 7.

We learned a mapping between brain activity and the perceived stimulus by looking at a 240 ms window of brain signal following the stimulus, accounting for the latency of neural responses to external stimuli. This is achieved by systematically delaying the audio envelope with respect to the brain signals by 0-240 ms (in 20ms steps), creating a matrix of dimensions [# time samples x 13 delays] as an input to the CCA algorithm. The brain data, of dimensions [# time samples x 32 channels], was also delayed in the same fashion, yielding a [# time samples x 416] matrix, the other input to the CCA algorithm (where 416 = 32 channels x 13 delays each). We note that these delays applied to the brain data theoretically served a different purpose: to allow the CCA algorithm to "design" a data-driven temporal filter, tuned to isolate the information of most relevance to the perceived stimulus in the brain signals. CCA also acts as a spatial filter, combining information across channels to achieve maximal correspondence between the stimulus and the brain signal. All of these goals are jointly achieved by the algorithm, which reveals the shared signal components between the datasets and the most informative combination of spatio-temporal filters to apply to the brain signals, while accounting for physiological delays in auditory processing.
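
The delay embedding described above can be sketched as follows. The helper name `add_delays` is hypothetical, and zero-filling the first samples of each delayed copy is an assumption about how edges are handled.

```python
import numpy as np


def add_delays(x, delays):
    """Stack delayed copies of x (n_samples, n_channels) side by side."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, c = x.shape
    out = np.zeros((n, c * len(delays)))
    for i, d in enumerate(delays):
        # Copy i holds the signal shifted later in time by d samples.
        out[d:, i * c:(i + 1) * c] = x[:n - d]
    return out


fs = 100                                                  # Hz, after decimation
delays = [ms * fs // 1000 for ms in range(0, 241, 20)]    # 0..240 ms, 13 delays

rng = np.random.default_rng(0)
envelope = rng.standard_normal(3000)       # placeholder audio envelope
brain = rng.standard_normal((3000, 32))    # placeholder 32-channel brain data

stim_in = add_delays(envelope, delays)     # shape (3000, 13)
brain_in = add_delays(brain, delays)       # shape (3000, 416)
```

The two resulting matrices share the time axis, so they can be passed directly as the paired inputs of a CCA fit.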

The CCA algorithm was initially trained on data from the training phase of each experiment, generating a set of projections for matching brain and stimulus data, ordered by decreasing correlation. We selected the weights of the first canonical correlate only, i.e. the weights resulting in maximal correlation. These weights were then used to project incoming 1 second chunks of brain data during the main experiment, generating a compressed representation of the data designed to maximally match the corresponding projection of the audio stimulus. As we knew that the audio stimulus was 1 of 10, we applied the weights returned by the algorithm to the corresponding chunk of each of the possible stimuli, thereby generating 10 candidate time courses. A Pearson correlation was then calculated between the transformed brain data and each of the transformed stimuli, for each chunk. Over the time of a trial, we continually updated the running average of these correlation values (as opposed to computing a growing window correlation, since the running average of the correlations focuses on higher frequency content). The average correlation values were passed into a softmax operator with a "temperature" variable $$T$$ (affecting the slope of the exponential function) that intuitively controls how differences in correlation values are amplified in the output (see Eq. 1; average correlation values are denoted as $$r_j$$, where $$j$$ indexes the stimulus). The output of the softmax operator can be viewed as a probability score, which was then used to rank the snippets on screen. We operationalize the outcome of the softmax function as the confidence of our classifier. We used T=2, which was found to provide a good balance between stability (avoid bringing false positives up the leaderboard with high confidence) and sensitivity (reach maximum confidence fast) in our experiments.
For display purposes, during the experiment, a snippet was considered a "hit" when it reached a probability above 0.5 (chosen because no two snippets can exceed this value at the same time).

$$p_j = \frac{e^{r_j/T}}{\sum_{k} e^{r_k/T}}$$

EQ. 1
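
The running average and the softmax of Eq. 1 can be sketched together as below. The function name is illustrative, and the temperature is left as a required argument because the scaling applied to the correlation values before the softmax is not specified in the text; any value used with this sketch is therefore illustrative only.

```python
import numpy as np


def running_confidence(chunk_r, temperature):
    """Softmax scores after each chunk.

    chunk_r: (n_chunks, n_stimuli) Pearson r between projected brain data and
             each projected candidate stimulus, one row per 1 s chunk.
    Returns an array of the same shape; row i uses chunks 0..i.
    """
    chunk_r = np.asarray(chunk_r, dtype=float)
    counts = np.arange(1, chunk_r.shape[0] + 1)[:, None]
    mean_r = np.cumsum(chunk_r, axis=0) / counts      # running average of r
    z = mean_r / temperature
    z -= z.max(axis=1, keepdims=True)                 # numerical stability only
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)
```

Sorting each row in descending order gives the leaderboard shown to the subject, and the first row in which an entry exceeds 0.5 marks the "hit".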

### RESULTS

In this section, we focus most analyses on a single session of the Sound ID experiment, which is representative of the results we observed across several subjects and sessions. This is for two reasons: 1) this experiment was designed as a single subject, single session paradigm that was used as a real-time demonstration; our goal was not to investigate cross-subject transfer learning in any depth; 2) we did not collect datasets in a fashion easily amenable to group analyses (e.g., we did not try to perfectly respect anatomical or fiducial landmarks across subjects). Overall, the demonstration was run online successfully for over 10 subjects.

#### ONLINE DECODING OF SPEECH SNIPPETS

Using the real-time framework described in the Methods section, we were able to accurately determine which of 10 speech snippets a subject was currently listening to. By the end of a speech snippet (60 s of actual speech, with ~5 s silences before and after), we were typically able to decode with 100% accuracy which snippet the subject had chosen (see Fig. 8). We repeatedly achieved such perfect, or close to perfect (8 or 9 out of 10), performance with several subjects across several sessions. Chance level identification would be 1/10 (10 possible envelopes). In a session in which a subject goes through all snippets once (as we typically instructed them to), any number of successes higher than 4 is highly significant (the binomial p-value for 4 or more successes out of 10 is already 0.013).
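
The quoted p-value can be checked with a short binomial tail computation. The sketch assumes independent trials with a chance level of 1/10 each; the function name is illustrative.

```python
from math import comb


def binomial_tail(k, n, p):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p_four_or_more = binomial_tail(4, 10, 0.1)   # about 0.013
p_five_or_more = binomial_tail(5, 10, 0.1)   # smaller still
```

Each additional success shrinks the tail probability by close to an order of magnitude, which is why 8 to 10 correct identifications out of 10 are far beyond chance.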

In our online experiment, a snippet was considered correctly identified when it reached a probability value of 0.5. For the session depicted in Fig. 8, the latency at which this happened was 18.3 +/- 4.5 s. However, this latency depends on the temperature of the softmax function (Eq. 1) and is thus not an accurate estimate of the underlying evidence. To estimate the latency at which the trace of the correct snippet was statistically distinguishable from all others, we looked at the averaged $$r$$-values up to the point of each time chunk and computed a z-score for the target snippet, based on the mean and standard deviation of all others. We considered that a snippet had been correctly identified when its z-score reached a value of 3, i.e. deviated significantly from the distribution of the other snippets (threshold is arbitrary). The latency estimated in this fashion for the same session was 11.5 +/- 7.9 s. A final note: if instead of using the running average of the correlation values we use a growing window correlation, the same method estimates a latency of 8.1 +/- 6.6 s for the same session. This suggests that the use of a growing window correlation, which is not blind to frequencies below 2 Hz, would be a better choice in this particular case.
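
The z-score based latency estimate can be sketched as follows. The function name is hypothetical, and reporting the latency as the end time of the first chunk that crosses the threshold is an assumption.

```python
import numpy as np


def detection_latency(mean_r, target, threshold=3.0, chunk_s=1.0):
    """Seconds until the target's z-score first reaches `threshold`, else None.

    mean_r: (n_chunks, n_stimuli) correlation values averaged up to each chunk.
    target: column index of the snippet actually played.
    """
    mean_r = np.asarray(mean_r, dtype=float)
    others = np.delete(mean_r, target, axis=1)
    # z-score of the target against the mean and std of all other snippets.
    z = (mean_r[:, target] - others.mean(axis=1)) / others.std(axis=1)
    hits = np.flatnonzero(z >= threshold)
    return None if hits.size == 0 else (hits[0] + 1) * chunk_s
```

Unlike the 0.5 probability criterion, this estimate does not depend on the softmax temperature, only on how far the target stands from the other candidates.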

#### ONLINE DECODING OF SONG SNIPPETS

As initially hypothesized, we found that songs were more difficult to decode using solely the envelope information, with no further alterations, in our linear framework. In some subjects we were able to correctly identify up to 8 out of 10 of the songs; in others, the performance was closer to 4 out of 10 (which is still better than chance, see above). Compared to speech, even in cases where we identified the song correctly, the confidence of our prediction (Eq. 1) was typically much lower, in most cases reaching values of 0.5 rather than 1 under the same conditions. We present one of the best sessions we recorded (Fig. 9). Better feature creation, either guided by intuition (Drennan and Lalor 2019) or by machine learning, will be necessary to achieve identification as reliable as what we obtained with speech.

#### WHERE DOES THE INFORMATION COME FROM?

We focus the remainder of the results on the session of speech identification depicted in Fig. 8. How did our algorithm achieve such performance? In particular, is it the case that a single sensor carries most of the information, or is it necessary to look at several sensors simultaneously to arrive at accurate predictions?

Evoked activity to brief tones (ERFs) during the training phase

Our approach was motivated by approximating the early sensory cortex as a linear time-invariant system. Here we first verify that there are indeed significant responses evoked by short, impulse-like stimuli (auditory tones played during the training phase).

Epochs of brain data aligned on tone onset during the training session were averaged offline to visualize event-related fields (ERFs), assessing the quality of the data across the spatial montage of sensors on the head. As expected, the OP-MEG signals were observed to deviate from their baseline significantly across trials on average (Fig. 10), with deflections qualitatively visible in single trials.

Feature importance

Observing a significant event-related field to tones is a good clue to which sensors may carry reliable information about heard sounds. Other clues come from looking at the weights that the CCA algorithm attributed to the different channels and delays, and at the performance of the algorithm with a much-reduced set of channels.

CCA weights

We first looked at the weights assigned to the different head channels and delays by the CCA algorithm. It can be misleading to interpret the weights of a linear classifier as a feature importance or informational content metric (Haufe 2014), yet they can give us another clue to where the most informational channels may live. For this particular subject and session, high (absolute value) weights were assigned to sensors in the most lateral row of the helmet, closest to the ear (T7, T8, FT7 — see Fig. 11). We also assessed the frequency content with the Fourier transform of the weight data matrix. We found narrow frequency bands throughout the 0 - 8 Hz range with a band around 4 Hz contributing to the highest magnitude.

Decoding results with subsets of sensors

The result reported above (Fig. 8) used all 16 sensors (32 channels). We wondered whether a similar accuracy could be achieved with far fewer sensors — how many sensors, and which combinations would be needed to reach a performance on par with that produced when using all 16? We ran our exact procedure on all sets of 1 (16 sets), 2 (120 sets), 3 (560 sets), and 4 (1820 sets) sensors, as plotted in Fig. 12. We find that, although we achieve 10/10 with some pairs of sensors, the average confidence of the choice (as indicated by our softmax probability score) is lower than with the full set. Only with at least 4 sensors can we arrive at the accuracy and confidence level of the full set of sensors. The best performing set of 4 sensors, when taking into account accuracy and average softmax probability score at the end of each segment, is {FT8, TP8, T7, T8} for this session (accuracy 100%; mean softmax probability 0.927). In all sets of 4 sensors achieving an average softmax probability above 0.8, the most frequent sensors to appear are {T7, T8, FT8, FT7}. Among these sensors, some (but not all) had high amplitude weights in the CCA result derived from all sensors (Fig. 11).
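The set counts quoted above follow from the binomial coefficient, and the exhaustive search can be sketched with the standard library. The sensor names below are placeholders, and the decoding step that would be run on each subset is omitted.

```python
from itertools import combinations
from math import comb

# Placeholder sensor names; the real montage uses 10-20 style labels.
SENSORS = [f"S{i:02d}" for i in range(16)]

def sensor_subsets(sensors, max_size=4):
    """Yield every subset of 1 to max_size sensors."""
    for k in range(1, max_size + 1):
        for subset in combinations(sensors, k):
            yield subset

# Number of subsets of each size drawn from 16 sensors.
SUBSET_COUNTS = {k: comb(len(SENSORS), k) for k in range(1, 5)}
```

Here `SUBSET_COUNTS` evaluates to 16, 120, 560, and 1820 sets for sizes 1 through 4, or 2516 decoding runs in total.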


## HOW WELL DOES THE BRAIN ACTIVITY TRACK AUDITORY ENVELOPES?

How well do we map brain activity to auditory stimulus envelopes? The correlation achieved for the first canonical correlate (CC) in the training data is approximately 0.3, i.e., the best mapping explains only about 10% of the variance in the auditory stimulus envelope (0.3² = 0.09). The correlation of the first CC in the training data is the ceiling of what we can expect to see in the test snippets. As expected, the correlation that we achieve between the transformed brain signal and the transformed audio envelopes of the played test snippets is typically lower than this maximum (between 0.2 and 0.3), as exemplified in Fig. 13.
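The link between a correlation of 0.3 and roughly 10% of variance explained is the usual identity that, for a least-squares linear fit, the fraction of variance explained equals the squared Pearson correlation. A small sketch on generic signals (the function names are illustrative only):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D signals."""
    return np.corrcoef(x, y)[0, 1]

def fraction_of_variance_explained(x, y):
    """Fraction of the variance of y captured by a least-squares line in x."""
    slope, intercept = np.polyfit(x, y, 1)
    residual = y - (slope * x + intercept)
    return 1.0 - residual.var() / y.var()
```

For any pair of signals, `fraction_of_variance_explained(x, y)` matches `pearson_r(x, y) ** 2` up to floating-point error, so r = 0.3 corresponds to 0.09.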

## Neuroscience Hardware at Kernel

The goal of our technology is to enable and accelerate population-level neuroscience, generating insights that are statistically powerful, universally relevant, and of the highest possible quality. The ideal technology candidate needs to be inexpensive and easy to use, allow for natural user environments (stimuli, interactions, motion), and be scalable — all without sacrificing the quality of the neural signal. To our knowledge, no existing non-invasive brain-recording technology achieves these ideals.

In 2016 we started building a world-class team of engineers, neuroscientists, and physicists. We systematically evaluated every state-of-the-art technology candidate, scrutinized existing commercial tools, built our own systems using the best available components, and forged academic collaborations as we mapped each possible path. No stone was left unturned. Most non-invasive methods for recording brain signals either measure directly the electromagnetic fields generated by collective neural activity or detect the resulting hemodynamics, which are local changes in blood flow and oxygenation due to cellular metabolism. After years of rigorous analysis, we identified two promising technologies that we believed we could push even further.

We decided to build them both.

Kernel Flux, based on Optically Pumped Magnetometers for Magnetoencephalography (OP-MEG), uses a collection of alkali vapor sensors to directly detect the magnetic fields generated by collective neural activity in the brain. Kernel Flow detects cortical hemodynamics using a compact and modularized realization of Time-Domain Near-Infrared Spectroscopy (TD-NIRS). We expect particular synergy from the combination of these orthogonal and complementary modalities, both longitudinally and directly via sensor and data fusion.

## Overcoming the Limitations of Existing Neural Recording Technology

Why OP-MEG?

Arguably, the highest quality electromagnetic signals (specifically, the highest signal-to-noise ratio and finest spatial resolution; Gross 2019) are currently obtained with magnetoencephalography based on superconducting quantum interference devices (SQUID-MEG). This is because, in comparison to the electrical signal, the magnetic signature of neural activity exits the scalp directly, without distortion. However, user motion and operation in natural environments, two key goals for bringing neuroscience out of the lab and into the world, are currently out of the question in SQUID-MEG systems because of their inherent mobility restrictions. MEG based on optically pumped magnetometry (OP-MEG) operates with miniaturized, wearable insulation (in contrast to the massive cryogenic dewars of SQUID-MEG) and allows placement of the sensors close to the scalp. This allows more natural head motion during data recording and localized signal quality comparable to, or surpassing, SQUID-MEG (Boto 2018, Barry 2019, Iivanainen 2017). However, to date, OP-MEG does not offer dense full-head coverage and (like most SQUID-MEG systems) requires the person to be sealed in a multilayer, passively shielded vault in order to suppress ambient magnetic fields. Overall, high installation and operational costs have prevented population-scale MEG studies.

Electroencephalography (EEG), which measures electrical activity on the scalp, is a leading alternative. However, brain signals picked up by EEG electrodes are distorted by the electrically conducting interior of the head, which degrades the signal-to-noise ratio and spatial resolution compared to MEG. As a further limitation, the highest quality EEG signals are obtained with dense, wet electrode arrays, which require a laborious setup and create an uncomfortable user experience.

Why TD-NIRS?

For hemodynamic signals, fMRI provides the highest quality non-invasive signal (specifically, the best coverage and spatial resolution). However, fMRI requires bulky, million-dollar, room-sized equipment. The current best alternatives use optical methods, such as high-density diffuse optical tomography (HD-DOT). While the spatial resolution achieved by HD-DOT (~13 mm) does not yet reach that of fMRI (~6 mm), it offers many advantages over fMRI that bring it closer to the ideal non-invasive imaging device: lower cost, increased movement tolerance, the ability to operate in naturalistic environments, and the ability to measure from multiple interacting people simultaneously (Eggebrecht 2014). However, high-quality HD-DOT systems are currently either bulky, unscalable, and research-grade with significant set-up time, or lack full head coverage. The highest quality optical signal currently available is achieved with time-domain near-infrared spectroscopy (TD-NIRS). However, these systems are currently expensive and experimental, offer low channel counts, and are not available at scale (Lange and Tachtsidis 2019).

What's Next?

Kernel's proprietary technology is designed in-house, and built from the ground up, to solve the limitations of modern non-invasive neural recording technology. Kernel Flux is based on sensors capable of detecting the extremely small changes in magnetic fields resulting from a brain's intrinsic electrical activity. Our Flux devices record these magnetic fields across the whole head in natural environments, with hundreds of sensors offering high signal quality and spatial resolution, on par with the state of the art in MEG. Kernel Flow takes advantage of the relative transparency of the skull and brain tissue to near-infrared light by beaming photons through the skull and measuring their scattering and absorption, allowing inference about brain hemodynamics. It offers the resolution and sensitivity of state-of-the-art hemodynamic systems across the top layers of cortical tissue but can be manufactured for only a few thousand dollars and is largely scalable. Both technologies, Flux and Flow, are based on a lightweight headgear that allows for natural head motion, a wide variety of stimuli and peripherals, various natural environments, and user interaction.

We believe that, in combination and at scale, these technologies will revolutionize the capture of high-quality neural signals for the study of the brain. They will provide the largest and richest neural datasets ever recorded.

## Introducing "Kernel Flux", an OP-MEG System

Kernel Flux uses a collection of alkali vapor sensors (optically pumped magnetometers, OPMs) to directly detect the magnetic fields generated by collective neural activity in the brain while allowing for comfortable head motion. Each Kernel Flux OP-MEG system was designed from the ground up to work as an integrated system optimized around the user's experience. Flux provides a natural user experience during recording sessions, with relevance to natural home or office contexts, and for extended periods of time. We aim to support full-head OP-MEG in as many conditions as possible, from watching movies while reclining, to having natural conversations with multiple users, to interactive experiences with rich digital media.

A 48-module, full-head-coverage Flux system (as illustrated in Fig. 14) provides 720 channels of magnetometer data and operates without a multilayer, sealed-door magnetically-shielded room. The large number of channels provided by Flux allows for high-performance noise rejection in environments with head motion and extensive audio-visual peripherals. Kernel Flux is a heterogeneous system of systems, comprising several technologies: microfabricated alkali vapor cells, semiconductor lasers, compact optics, high dynamic range real-time control systems, and low-noise electronics. A real-time FPGA-based controller uses proprietary active magnetic shielding to counter the effects of natural user motion, allowing for comfort without requiring physical restraint of the user. Flux employs hundreds of sensors, configured in an array surrounding the head to sample and recover virtually all spatial frequencies of neural magnetic fields accessible at a 7-9 mm scalp standoff (Iivanainen 2019).

Because MEG sensors respond to magnetic fields regardless of their origin, it is critical to distinguish those arising inside the brain from those caused by motion through background field gradients or other ambient sources. These include muscular and other biomagnetic activity, power line noise, and peripheral electronic artifacts. Flux leverages dense sensor coverage across the head and close to the scalp to provide the signal processing software with sufficient information to allow ambient noise discrimination. In order to extract neural signals from our sensor readings, we take advantage of the differing spatiotemporal properties of the magnetic fields generated by distant ambient noise sources compared to those from nearby neural sources. Enabled by our full-stack approach to sensor development, we have built a robust, real-time control system and noise rejection pipeline that operates holistically across the entire Flux system, incorporating hardware, firmware, and back-end software.
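To make the spatial argument concrete, here is a minimal sketch of one textbook approach, often called homogeneous field correction. It is an illustrative stand-in, not Kernel's noise rejection pipeline. The assumption is that a distant source produces a field that is approximately uniform across the array, so each channel sees only the projection of one common 3-vector onto its sensitive axis; the function name and array layout are hypothetical.

```python
import numpy as np

def remove_uniform_field(data, orientations):
    """Project out the spatially uniform field component from sensor data.

    data         : array (n_channels, n_samples) of magnetometer readings
    orientations : array (n_channels, 3), unit vector of each channel's
                   sensitive axis
    Returns the data with the best-fitting uniform field removed.
    """
    axes = np.asarray(orientations, dtype=float)
    data = np.asarray(data, dtype=float)
    # Least-squares estimate of the common 3-vector field at every sample.
    field, *_ = np.linalg.lstsq(axes, data, rcond=None)
    # Subtract what that uniform field would produce on each channel.
    return data - axes @ field
```

A nearby neural source produces a pattern that varies strongly from sensor to sensor, so only a small part of it lies in the three-dimensional uniform-field subspace and most of it survives the projection. The more channels and orientations the array provides, the smaller that lost fraction becomes.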

## Technical Specifications: "Flux"

SPECIFICATION DETAIL

System architecture & measurements

• Magnetoencephalography (MEG): magnetic fields generated directly by electrical currents in the brain are detected by head-mounted optically pumped magnetometers (OPMs) near the scalp.
• Peripheral user responses (interface devices such as keypads)
• Immersive audio/visual user stimulus
• Auxiliary stimulus-tracking data (trigger events, video, etc.)
• 42 to 48 sensor modules per headset, depending on head size
• Controllers are desktop-mounted and tethered via cable assemblies (one per module).

Channel count

• Each module provides 15 channels from 9 OPMs. A channel is defined here as a time-series representation of one vector component of the magnetic field.
• 48 modules in a fully-loaded system provide 720 channels from 432 OPMs.
• Sensor layout within each module: 3x3 with 5 mm spacing.
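The system totals follow directly from the per-module figures. A short sketch of the arithmetic (the constant and function names are illustrative):

```python
CHANNELS_PER_MODULE = 15  # per-module channel count from the specification
OPMS_PER_MODULE = 9       # 3x3 sensor layout within each module

def flux_totals(n_modules):
    """Total channel and OPM counts for a headset with n_modules modules."""
    return {
        "channels": n_modules * CHANNELS_PER_MODULE,
        "opms": n_modules * OPMS_PER_MODULE,
    }
```

A fully loaded 48-module headset gives 720 channels from 432 OPMs, and the smallest 42-module configuration gives 630 channels from 378 OPMs.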

Sensor module performance

• 15 fT / √Hz sensitivity
• 1 to 200 Hz bandwidth

Dimensional aspects

• 7.3 to 9 mm scalp standoff to sensitive vapor
• 15 cm² module footprint (1.7 cm² per OPM)
• 2 W power dissipation per module
• < 1.6 kg fully-populated headset weight
• Integral strain relief eliminates cable torque for neck comfort.

## Introducing "Kernel Flow", a TD-NIRS System

Kernel Flow is a time-domain near-infrared spectroscopy (TD-NIRS) system for detecting the hemodynamics of the cerebral cortex. Traditional "continuous wave" (CW) NIRS devices apply light to the head continuously, which then scatters throughout and is detected at various locations upon exiting the head. Changes in the detected light intensity allow inference of optical-property changes inside the head, including blood flow and oxygenation (hemodynamics) in the cortex. Crucially, time-domain systems capture a much richer signal by applying the light in short pulses and capturing precisely the arrival time distribution of scattered photons for each pulse. On average, photons that arrive at the detector later traveled deeper through the tissue (see Fig. 16 below). Therefore, this "time-of-flight" (ToF) measurement in addition to the overall intensity reveals additional depth-dependent information about the optical properties of the tissue.
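One common way to summarize a time-of-flight measurement is through the moments of the photon arrival-time histogram, often called the distribution of times of flight (DTOF). The sketch below is a generic illustration and not Kernel's processing; the function name and inputs are assumptions.

```python
import numpy as np

def dtof_moments(counts, bin_centers):
    """Summary moments of a photon arrival-time histogram.

    counts      : photon counts per time bin
    bin_centers : arrival time at the center of each bin
    Returns (total_counts, mean_time_of_flight, variance).
    """
    counts = np.asarray(counts, dtype=float)
    t = np.asarray(bin_centers, dtype=float)
    total = counts.sum()                  # overall intensity (what CW sees)
    mean_tof = (counts * t).sum() / total  # average arrival time
    variance = (counts * (t - mean_tof) ** 2).sum() / total
    return total, mean_tof, variance
```

The total count corresponds to the intensity that a continuous-wave system would measure. The mean and variance carry the extra timing information, and because later photons have on average traveled deeper, they respond more strongly to changes in deeper tissue.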

Kernel has architected a new class of TD-NIRS system that maintains the performance of research-grade systems while reducing the footprint to a small module. A direct comparison of a typical research TD-NIRS system and our TD-NIRS hardware is shown in Fig. 15. This miniaturization of the TD-NIRS system was accomplished through a deep understanding of the lasers and detectors in addition to a thoughtful optimization of the remaining essential features of the system. We have been developing these components through a series of iterations where we systematically increased the level of integration of our devices towards a fully wearable product at scale.

Our first prototype system guides the light to the head via fibers tethered to the headgear (the system is "fiber-coupled"), and contains 8 optical modules and 8 EEG electrodes. We built this prototype from the ground up as a research-equivalent system. It has been our testing platform for neuroscience development as we iterate on Kernel's custom-designed components. In this fiber-coupled system, all electronics sit in a benchtop box kept far from the user. The key component of each optical module is an application-specific integrated circuit (ASIC) that enables highly parallel TD-NIRS. In the coming months, we will test the next iteration of our Flow hardware, which shrinks the benchtop box into a compact, low-power module suitable for building into a wearable system. This advancement is based on a Kernel-designed detector ASIC and improvements in high-power, temporally narrow laser pulse circuits. These improvements in integration will allow the entire box to fit into a single module (Figs. 15b,d).

All of the components in the Flow optical modules are designed to leverage the well-established supply chains of the consumer electronics industry. As such, we will achieve a level of performance, cost, and scale for optical brain imaging that is unparalleled. Our detector ASICs are designed by Kernel and fabricated by a world-leading foundry with a proven track record for high-volume manufacturing of optical sensors. Our laser driver circuitry and electronic systems use standard printed circuit board assembly (PCBA) processes and can be produced by dozens of vendors throughout the world. Our optical systems have been simplified such that simple molded plastic optics will achieve our performance requirements. It is through this combination of low-cost and high-volume manufacturing processes that we will be able to achieve a scale of Flow systems that exceeds all existing research-grade fMRI, EEG, and NIRS systems in a short time scale.

From a data quality perspective, the time-of-flight information in our recordings enables several key improvements in the quality of our signal compared to CW approaches such as HD-DOT. As can be demonstrated with Monte Carlo simulations (see Fig. 16 for an illustration) of photon propagation, absorption, and scattering in the head (Fang 2009), photons that arrive at the detector earlier tend to pass through more superficial tissue (capturing physiological signals such as heart rate or respiration via blood flow in the skin, but not neural signal), while "later" photons tend to penetrate deeper into the brain. Using the information from the photon arrival times allows us to build algorithms and pipelines that correct for or minimize a variety of artifacts: physiological (e.g., heart rate, respiration), experimental (e.g., motion), and hardware-related (e.g., laser power drift, timing drift, temperature-related artifacts). Additionally, each source-detector pair contains information from multiple depth levels simultaneously. Accordingly, we expect significantly improved performance, sensitivity (Selb 2005), and depth information with Kernel Flow as compared to HD-DOT on account of this additional ToF information per channel, on top of the overall increased number of channels (>1300 for Kernel Flow as compared to 1200 for state-of-the-art HD-DOT (Eggebrecht 2014)). Finally, the ToF distributions can be parametrized and calibrated, allowing the extraction of absolute hemodynamic measurements (such as absolute oxygenated hemoglobin concentration) of the underlying tissue, thus also enabling longitudinal comparison across days (Lange and Tachtsidis 2019). This compares favorably to fMRI, which can only capture relative signal changes during an experiment.

## Technical Specifications: "Flow"

SPECIFICATION DETAIL

Measurement items

• Tissue oxygenation index
• Normalized tissue hemoglobin index
• Oxygenated hemoglobin
• Deoxygenated hemoglobin
• Total hemoglobin
• Bulk tissue optical properties (absorption and scattering coefficients)
• We collect multichannel EEG as a complementary signal

Sampling rate

• Up to 200 Hz (optical), 1 kHz (EEG)

Light source

• Dual laser diodes

Measurement method

• Spatially & temporally resolved spectroscopy
• Modified Beer-Lambert Law (MBLL)
• Temporal optical property fitting
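For readers unfamiliar with the Modified Beer-Lambert Law, the sketch below shows the standard two-wavelength form: the change in optical density at each wavelength is modeled as a weighted sum of the oxy- and deoxyhemoglobin concentration changes, scaled by an effective path length. The function name and argument layout are assumptions, the extinction coefficients must be supplied by the caller from a published table, and this is not a description of Kernel's implementation.

```python
import numpy as np

def mbll(delta_od, extinction, separation_cm, dpf):
    """Solve the Modified Beer-Lambert Law for two chromophores.

    delta_od      : array (2,) or (2, n_samples), change in optical density
                    at two wavelengths
    extinction    : array (2, 2), rows = wavelengths, columns = [HbO, HbR]
                    extinction coefficients (caller-supplied)
    separation_cm : source-detector separation in cm
    dpf           : array (2,), differential pathlength factor per wavelength
    Returns the concentration changes [dHbO, dHbR], same trailing shape as
    delta_od.
    """
    ext = np.asarray(extinction, dtype=float)
    # Effective optical path length at each wavelength.
    path = separation_cm * np.asarray(dpf, dtype=float)
    # delta_od = (ext * path) @ [dHbO, dHbR]; invert the 2x2 system.
    system = ext * path[:, None]
    return np.linalg.solve(system, np.asarray(delta_od, dtype=float))
```

Two wavelengths are needed because there are two unknowns; the dual laser diodes listed under "Light source" are consistent with this, although the actual wavelengths are not stated here. Total hemoglobin is the sum of the two returned values.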

Data memory

• Data is streamed to a client device (e.g., PC, tablet, phone) and not stored locally on the device. Data storage is limited by the client device.

Output signal

• USB connection for raw data

Measurement probes

• Tiled directly on the head. No cabling or fibers required.

Power supply

• USB-C (USB-PD) connection

Maximum number of optical modules

• 56 modular nodes

Total TD-NIRS channels (source detector pairs)

• > 1344 time-resolved channels

Total EEG electrodes

• 8

Weight

• < 1.5 kg fully populated headset

## References

Barry, D. N., et al. (2019). Imaging the human hippocampus with optically-pumped magnetoencephalography. NeuroImage, 203, 116192.

Boto, E., et al. (2018). Moving magnetoencephalography towards real-world applications with a wearable system. Nature, 555(7698), 657–661.

Drennan, D. P., & Lalor, E. C. (2019). Cortical Tracking of Complex Sound Envelopes: Modeling the Changes in Response with Intensity. eNeuro, 6(3).

Eggebrecht, A. T., et al. (2014). Mapping distributed brain function and networks with diffuse optical tomography. Nature Photonics, 8(6), 448.

Fang, Q., et al. (2009). Monte Carlo simulation of photon migration in 3D turbid media accelerated by graphics processing units. Optics Express, 17(22), 20178–20190.

Gross, J. (2019). Magnetoencephalography in Cognitive Neuroscience: A Primer. Neuron, 104(2), 189–204.

Hardoon, D. R., Szedmak, S., & Shawe-Taylor, J. (2004). Canonical correlation analysis: an overview with application to learning methods. Neural Computation, 16(12), 2639–2664.

Haufe, S., Meinecke, F., Görgen, K., Dähne, S., Haynes, J.-D., Blankertz, B., & Bießmann, F. (2014). On the interpretation of weight vectors of linear models in multivariate neuroimaging. NeuroImage, 87, 96–110.

Hotelling, H. (1936). Relations between two sets of variates. Biometrika, 28(3/4), 321–377.

Iivanainen, J., et al. (2017). Measuring MEG closer to the brain: Performance of on-scalp sensor arrays. NeuroImage, 147, 542–553.

Iivanainen, J., et al. (2019). Sampling theory for spatial field sensing: Application to electro- and magnetoencephalography.

Koskinen, M., Viinikanoja, J., Kurimo, M., Klami, A., Kaski, S., & Hari, R. (2012). Identifying fragments of natural speech from the listener's MEG signals. Human Brain Mapping, 34(6), 1477–1489.

Lange, F., & Tachtsidis, I. (2019). Clinical brain monitoring with time domain NIRS: A review and future perspectives. Applied Sciences, 9(8), 1612.

Nagel, S., Dreher, W., Rosenstiel, W., & Spüler, M. (2018). The effect of monitor raster latency on VEPs, ERPs and Brain-Computer Interface performance. Journal of Neuroscience Methods, 295, 45–50.

Selb, J. J., et al. (2005). Improved sensitivity to cerebral hemodynamics during brain activation with a time-gated optical system: analytical model and experimental validation. Journal of Biomedical Optics, 10(1), 011013.

Thielen, J., et al. (2015). Broad-Band Visually Evoked Potentials: Re(con)volution in Brain-Computer Interfacing. PloS One, 10(7), e0133797.