Brain-Computer Interaction and Silent Speech Recognition on Decentralized Messaging Applications

Arteiro, Luís; Lourenço, Fábio; Escudeiro, Paula; Ferreira, Carlos

doi:10.1007/978-3-030-50732-9_1

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1226))

Included in the following conference series:

International Conference on Human-Computer Interaction


Abstract

Peer-to-peer communication has become increasingly prevalent in people’s daily lives, with its widespread adoption catalysed by technological advances. Although strides have been made towards including disabled individuals and easing communication between peers, people with hand/arm impairments have scarce support in mainstream applications for communicating efficiently and privately with other individuals. Additionally, as centralized systems have come under scrutiny regarding privacy and security, the development of alternative, decentralized solutions has increased, a movement pioneered by Bitcoin that culminated in blockchain technology and its variants.

Within the inclusivity paradigm, this paper showcases an alternative approach to human-computer interaction that supports the aforementioned individuals through an electroencephalography headset and electromyography surface electrodes, used for application navigation and text input respectively. Users of the application take part in a decentralized system designed for secure communication and data exchange between peers, one that is resilient to both tampering attacks and central points of failure, with no long-term restrictions on scalability. Because the application combines a silent speech interface and a brain-computer interface, users can communicate with other peers, regardless of disability status, with no physical contact with the device. A purpose-built user interface supports this interaction, which takes place securely on a decentralized network based on a distributed hash table for better lookup, insertion and deletion performance. The project is still in the early stages of development; a functional prototype has been successfully developed in a closed testing environment.



1 Introduction

With the advent of messaging applications, there is an apparent lack of support for disabled individuals to communicate on these platforms and use such applications to interact efficiently with other people. There has been significant progress in addressing such issues, mainly through electroencephalography (EEG) sensors capable of recording brain activity that, powered by machine learning, can be translated into actions [6]. Similarly, electromyography (EMG) electrodes can measure activity from the muscles responsible for speech, enabling silent speech recognition (SSR) [11], which can be used for private and silent text input into the application.

Additionally, the privacy and security of data within centralized systems have been under the spotlight, especially in mobile applications [2], contributing to a rise in the development of alternative, decentralized solutions, a movement started by Bitcoin [8] that effectively established blockchain as a growing technology. Blockchain has shown tremendous potential and applicability across a wide range of scenarios, going beyond currency exchange, boosted by the introduction of smart contracts by Ethereum [13]. These technologies allow trustless, secure, peer-to-peer value transfer that can easily be applied to the messaging paradigm, thus allowing immutable, secure messaging between nodes. The surge of this technology paved the way for variants that maintain blockchain’s main advantages but change their scope. These have set a precedent for decentralization across many fields [10].

2 Related Work

Regarding brain-computer interfaces (BCI) using non-invasive EEG, research projects have emerged that allow users to control virtual keyboards and mice, empowering individuals with motor disabilities in their interaction with computers. For application navigation, there are experiments on cursor movement controlled through a BCI. Tests with different approaches to movement in one or two dimensions showed an accuracy of approximately 70% [5]. A similar project uses a hybrid near-infrared spectroscopy-electroencephalography technique to extract four different types of brain signals and decode them into direction symbols, namely “forward”, “backward”, “left”, and “right”. The classification accuracy was 94.7% for the “left” and “right” commands, 80.2% for “forward” and 83.6% for “backward” [6]. As described later in this document, our system uses a similar set of commands for navigation in the platform.

Since this project concerns a messaging platform, text input is a key feature that must support the target audience - disabled subjects. Experiments have used EEG to input text through virtual keyboards, allowing a user to make binary decisions that iteratively split a keyboard in half until one letter remains, with a spelling rate of about 0.5 char/min [1]. Another approach replaces binary choices with the selection of one among six hexagons, enabling the input of about 7 chars/min [12]. Although input speed is not the central focus of these projects, text input must be user-friendly enough for instant messaging to be conceivable. Yet another studied approach to text input is a silent speech interface (SSI), which also enables input by people with speech impairments (e.g. after a laryngectomy). Research in this field still shows a high word error rate (WER): some have achieved around 50% WER in their SSR attempt, using a vocabulary of 108 words from the EMG-UKA corpus [11]. Another group proposed a different solution using the same corpus, attaining around 30% char accuracy [9]. Studies of this approach do not yet yield satisfactory results for a wide-range vocabulary, although it appears more promising than EEG-based interfaces.
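The binary-decision speller mentioned above can be illustrated with a short sketch. It is not the implementation from [1]; the 32-symbol keyboard layout and the names `binary_select`, `first` and `second` are assumptions made for this example. It only shows why each character costs several selections: the candidate set is halved until one symbol remains.

```python
from typing import List, Sequence, Tuple


def binary_select(symbols: Sequence[str], target: str) -> Tuple[str, List[str]]:
    """Halve the candidate set until one symbol remains.

    Returns the selected symbol and the list of binary choices made.
    """
    if target not in symbols:
        raise ValueError(f"{target!r} is not on the keyboard")
    candidates = list(symbols)
    choices: List[str] = []
    while len(candidates) > 1:
        mid = (len(candidates) + 1) // 2
        first, second = candidates[:mid], candidates[mid:]
        if target in first:
            choices.append("first")
            candidates = first
        else:
            choices.append("second")
            candidates = second
    return candidates[0], choices


# Hypothetical 32-symbol keyboard: 26 letters plus six extra keys.
keyboard = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ") + ["_", ".", ",", "?", "!", "<"]
symbol, decisions = binary_select(keyboard, "H")
print(symbol, len(decisions))  # H 5
```

With 32 symbols every character needs five binary decisions, which helps explain why spelling rates for this kind of interface are low.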

Regarding messaging platforms and their dedicated backends, there have been attempts to develop decentralized messaging applications that address many issues present in centralized applications usually perceived as secure (e.g. Telegram) and that allow decentralized authoritative messaging [7]. Applications that follow the classic blockchain approach to messaging include Adamant, an example of fully anonymous, private and secure messaging between peers. Relying on its own token, which is used for each transaction, the application extracts no personal or sensitive data from the user [4]. A decentralized solution that follows a distributed hash table (DHT) approach is Jami, where the DHT is used to locate peer addresses and establish peer-to-peer connections without any associated personal or sensitive data, with undelivered messages being stored locally.

3 Our Solution

The proposed solution encompasses a new approach to interacting with an application that takes part in a user-centric decentralized ecosystem, offering high throughput while maintaining a baseline of privacy for data and message exchange. It envisions a way for people with arm/hand impairments to communicate with anyone. It is therefore divided into two distinct parts: human-machine interaction, through BCI and SSI, and decentralized communication between peers. This is further outlined in Fig. 1.

3.1 Human-Machine Interaction

Aiming towards a hands-free controlled application with text input through silent speech, this project proposes a synthetic telepathy-based solution to messaging. It allows individuals with arm motor disabilities to interact with the platform and also creates a multitude of use cases for people without these limitations (e.g. privacy and mitigation of background noises when using speech-to-text features in public; multitasking while navigating with mental commands).

Brain-Computer Interface. For the BCI component of this project, the Emotiv EPOC+ is used for EEG recordings and the Emotiv BCI software for training and classification of the data gathered from the user. This device and its software are a great advantage since they accelerate overall project development: they already provide predefined mental commands for a subject to train, which can be applied to user interface (UI) navigation. Since the UI must be accessible and intuitive for users both with and without impairments, a new design was needed to accomplish these goals. Thus, besides the regular click/tap interaction, the user can also navigate through different chat rooms using four commands: “pull”, “push”, “left”, and “right”. Simulating a three-dimensional space, the “pull” and “push” commands pull the view closer to and push it away from the user, respectively (see Fig. 2), while the “left” and “right” commands slide through a carousel-type view of the chat rooms, always fixing one chat room in the middle, until the user pulls the selected one (see Fig. 3). Furthermore, more commands may be added for other minor tasks, or the commands already implemented can map to multiple actions depending on the state of the application or the feature in use.
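The navigation scheme described above can be read as a small state machine. The following is a minimal sketch under stated assumptions, not the prototype's code: the class name `ChatNavigator`, the wrap-around behaviour of the carousel, and the choice to ignore commands that have no meaning in the current state are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

COMMANDS = {"pull", "push", "left", "right"}


@dataclass
class ChatNavigator:
    rooms: List[str]
    focused: int = 0                 # index of the room fixed in the middle
    open_room: Optional[str] = None  # room currently pulled into view, if any

    def __post_init__(self) -> None:
        if not self.rooms:
            raise ValueError("at least one chat room is required")

    def handle(self, command: str) -> None:
        """Apply one classified mental command to the UI state."""
        if command not in COMMANDS:
            raise ValueError(f"unknown command: {command!r}")
        if self.open_room is None:
            # Carousel view: slide left/right, pull to open the focused room.
            if command == "left":
                self.focused = (self.focused - 1) % len(self.rooms)
            elif command == "right":
                self.focused = (self.focused + 1) % len(self.rooms)
            elif command == "pull":
                self.open_room = self.rooms[self.focused]
        elif command == "push":
            # Inside a room: push moves back out to the carousel.
            self.open_room = None


nav = ChatNavigator(["Alice", "Bob", "Carol"])
for cmd in ["right", "right", "pull"]:
    nav.handle(cmd)
print(nav.open_room)  # Carol
```

Keeping the mapping from command to action dependent on the current state is what lets the same four commands cover more than four actions.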

Silent Speech Interface. For the SSI, the aforementioned EMG-UKA corpus is being used for implementation and testing of the system in its early stages. Approaches for this system are still being studied, and the first experiments are being conducted on a session-dependent model that attempts to classify phones (speech sounds) from EMG data. Further developments will allow users to input full words into the system. To reach a broader audience and make use of the existing corpus, the system is tailored to the English language.

3.2 Decentralized Backend

As mentioned above, individuals who use the platform, whether or not they have any impairment, take part in a decentralized ecosystem intended to provide secure peer-to-peer communication with data integrity. The system is agent-centric, meaning each user runs a copy of the backend and has their own identity and their own private and public shared data. Being on the same encrypted peer-to-peer network means users can communicate with each other directly to maintain its integrity. Because each peer holds the same application bundle with the respective logic, it is possible to verify other peers’ transactions and the data they create. Each piece of data carries proof of authorship (i.e. a signature). Since every piece of data is recorded and validated by the peers within the system, it is tamper-resistant, which provides data integrity.

Data resilience is also important in any messaging platform; that is, data must not get lost when users go offline. To address this issue, each piece of public data is witnessed, validated according to the system’s logic, which is present in every user’s node, and stored by a random selection of peers. As a result, the community and cooperating parties detect invalid data, gossip evidence of harmful agents and take action against malicious data or users. This amounts to peer data replication and data validation.

The system resembles Jami in that it is based on a distributed hash table (DHT). This table, contrary to blockchain-based applications, is not replicated in full on each node. To attain higher throughput, in transactions per second (TPS), and faster lookup, each peer stores a segment of the table. The DHT is where the data resides and, when users go offline, where it is stored to be retrieved later by the recipient. The implementation of this table is built on top of the Holochain framework, allowing for easier data replication and validation between peers. As more users join the network, more computational power is contributed to the environment and data replication becomes more redundant, allowing the system to scale as more users take part.
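The idea that each peer stores only a segment of the table, with entries replicated across a few peers, can be sketched as follows. This is a toy model, not Holochain's actual storage or gossip algorithm: the names `ToyDHT`, `address` and `responsible_peers`, the XOR-distance rule for choosing holders, and the replica count of three are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, FrozenSet, List, Optional


def address(data: bytes) -> int:
    """Map arbitrary bytes to a 256-bit integer address."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def responsible_peers(key: int, peers: List[str], replicas: int) -> List[str]:
    """Pick the peers whose hashed IDs are closest to the key (XOR distance)."""
    return sorted(peers, key=lambda peer: address(peer.encode()) ^ key)[:replicas]


class ToyDHT:
    def __init__(self, peers: List[str], replicas: int = 3) -> None:
        self.replicas = replicas
        # Each peer holds only its own segment of the table.
        self.segments: Dict[str, Dict[int, bytes]] = {peer: {} for peer in peers}

    def put(self, entry: bytes) -> int:
        key = address(entry)
        for peer in responsible_peers(key, list(self.segments), self.replicas):
            self.segments[peer][key] = entry
        return key

    def holders(self, key: int) -> List[str]:
        return [peer for peer, seg in self.segments.items() if key in seg]

    def get(self, key: int, offline: FrozenSet[str] = frozenset()) -> Optional[bytes]:
        """Return the entry from any holder that is currently online."""
        for peer in self.holders(key):
            if peer not in offline:
                return self.segments[peer][key]
        return None


dht = ToyDHT(["alice", "bob", "carol", "dave", "erin"], replicas=3)
key = dht.put(b"hello from alice")
print(dht.holders(key))
```

In this sketch an entry stays retrievable as long as at least one of its holders is online, which is the resilience property the text describes; no peer needs a full copy of the table.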

User Record. Each user has their own chain, signed with their private key. This chain can be thought of as a record of all the messages/data the user has produced and exchanged within the app, and it is stored on their own device. Each individual has their own digital identifier based on a public/private key pair. The combination of these two keys is imperative for the user’s online identity and for communication with other peers. The public key is shared with other participants. Each user proves authorship of their data through digital signatures produced with their private key. Any tampering is promptly detected by using the corresponding public key to verify the signature. If this validation fails, the data is considered invalid and the finding is gossiped and broadcast to the rest of the network. The distributed hash table is where all public data resides. The word “public” is used very loosely: every entry is hashed so as to make it untraceable, so it is not necessarily public in the sense of being available for everyone to see. The entries that reside within the DHT are essentially user chain entries that are merely marked as public.
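A per-user record of this kind can be sketched as a hash-linked list of entries. This is an illustration under assumptions, not the prototype's data model: the field names, the `GENESIS` marker and the hash-linking itself are hypothetical, and the signature check described above is deliberately left out because it requires an asymmetric-cryptography library. The sketch only shows how altering a stored entry becomes detectable.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # hypothetical marker for "no previous entry"


def entry_hash(entry: Dict) -> str:
    """Hash the fields that define an entry, including the link to its predecessor."""
    payload = json.dumps(
        {"prev": entry["prev"], "content": entry["content"], "public": entry["public"]},
        sort_keys=True,
    ).encode()
    return hashlib.sha256(payload).hexdigest()


def append(chain: List[Dict], content: str, public: bool = False) -> Dict:
    """Add an entry that points at the hash of the previous one."""
    prev = chain[-1]["hash"] if chain else GENESIS
    entry = {"prev": prev, "content": content, "public": public}
    entry["hash"] = entry_hash(entry)
    chain.append(entry)
    return entry


def is_intact(chain: List[Dict]) -> bool:
    """Recompute every hash and check every link."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(entry):
            return False
        prev = entry["hash"]
    return True


chain: List[Dict] = []
append(chain, "joined room: general")
append(chain, "hello everyone", public=True)
append(chain, "private note")
print(is_intact(chain))  # True

chain[1]["content"] = "tampered"
print(is_intact(chain))  # False
```

Entries flagged `public` are the ones that would be candidates for publication to the DHT; the flag is part of the hashed payload so it cannot be flipped unnoticed.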

Validation and Direct Peer-To-Peer Communication. The security, state and integrity of the whole system are maintained by logic that is hard-coded and bundled in every node. Data integrity is governed by these rules which, in turn, uphold the security of the system. All data, whether in the user’s private chain or in the public DHT, is validated according to these rules. Normal messaging occurs when two users are online and takes place over a direct peer-to-peer connection in an end-to-end encrypted channel, established by resolving each agent’s IP address. This method only works when both peers are online. To circumvent this limitation, private messages are encrypted using the recipient’s public key and published to the DHT, to be retrieved later by the recipient when they are back online.

Most of these networking protocols and distributed computing scenarios are handled by the Holochain framework, on which the architecture is based. Not only does it resolve many concerns regarding encryption, but it also allows for interoperability with other systems that take part in the Holochain network.

4 Preliminary Evaluation

As this is a work in progress that has not yet yielded tangible results, only an assessment strategy can be defined at this stage. The BCI evaluation process will consist of usability tests performed by hand/arm impaired individuals to gather feedback on how the platform improves the user’s communication and overall navigation compared with similar apps. Prior to testing, the subjects will train the BCI headset and tailor it to their mental activity so that the model recognizes the commands from a specific user. Afterwards, subjects will be asked to perform a planned set of actions that entails navigating the platform and interacting with it. Lastly, a survey will be conducted following the System Usability Scale [3], with 10 items answered by degree of agreement or disagreement with each statement on a typical five-level Likert scale. For the SSI, the datasets used for each model will be divided into smaller training and testing sets, making it possible to evaluate their accuracy, with the aim of obtaining the lowest WER possible.
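Both metrics named in this strategy have simple, standard definitions, sketched below. The function names and the sample inputs are made up for illustration; no study data is implied. The System Usability Scale is scored by rescaling odd-numbered items as (response - 1) and even-numbered items as (5 - response), then multiplying the sum by 2.5. WER is the word-level edit distance between the reference and the recognized text divided by the number of reference words.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Score one SUS questionnaire (ten answers on a 1-5 scale) from 0 to 100."""
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("SUS needs ten responses on a 1-5 scale")
    total = 0
    for item, response in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (response - 1) if item % 2 == 1 else (5 - response)
    return total * 2.5


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


print(sus_score([3] * 10))                                        # 50.0
print(word_error_rate("send the message now", "send a message"))  # 0.5
```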

The Holochain framework provides support for both unit and functional tests. These are useful to provide a measurement of code coverage and respective reliability. End-to-end tests will ultimately provide a way to measure performance as the solution scales: throughput is tracked as the number of users grows and its consistency is assessed. Early results have shown successful data creation on the public DHT with a small number of users (five) with a step duration of 1000 ms, where the period is halved at every stage. In this setup, the stress test runs indefinitely and increases its pace at every stage. Additionally, a small increase in throughput was noticed when more users joined the network, from two peers to five. These experiments have not shown any noticeable traces of data corruption or delay, although more tests at a larger scale are needed to reach more conclusive results.
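One reading of the stress-test schedule is that each user writes once per period, starting at 1000 ms, and that the period halves at every stage. The sketch below works through that arithmetic; the interpretation, the function names and the number of stages shown are assumptions, not details confirmed by the experiments.

```python
from typing import List


def stage_periods_ms(initial_ms: float = 1000.0, stages: int = 5) -> List[float]:
    """Period between writes at each stage, halving from one stage to the next."""
    return [initial_ms / 2 ** stage for stage in range(stages)]


def writes_per_second(period_ms: float, users: int) -> float:
    """Aggregate write rate if every user writes once per period."""
    return users * 1000.0 / period_ms


periods = stage_periods_ms(1000.0, 5)
print(periods)  # [1000.0, 500.0, 250.0, 125.0, 62.5]
print([writes_per_second(p, users=5) for p in periods])  # [5.0, 10.0, 20.0, 40.0, 80.0]
```

Under this reading the offered load doubles at every stage, so an open-ended run eventually reaches whatever rate the network can no longer sustain.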

5 Conclusion

This project aims to mitigate communication barriers between subjects with or without hand/arm impairments and to allow the inclusion of everyone in a single messaging system, using a new strategy for this type of platform: a synthetic telepathy approach to interacting with it. Additionally, if the project produces positive results, the range of applicability of both the BCI and SSI approaches can extend beyond the messaging paradigm.

The system outlined here presents a new proposal for decentralized messaging platforms, from both the human-machine interaction standpoint and the distributed ecosystem standpoint. Following a user-centric approach to the problem, using a DHT with ad hoc peer consensus, instead of replicating data in its entirety with global consensus, is better suited to messaging. These design choices have a direct effect on the performance and throughput of the system: more TPS can be achieved while the exchange of data remains secure, private and resilient against tampering attacks. This essentially means that scalability does not pose a problem in the long term, unlike in many solutions based on the classic blockchain concept.

6 Future Work

With the drafted architecture and design choices, there is still room for improvement in future iterations of the project, so that it can be offered to a broader audience while maintaining a performance baseline. On the human-machine interface side, for the BCI component, other commands can be mapped and used by the platform, giving the user more options for navigation and control over the application and a wider set of features, while keeping the interaction as intuitive as possible. For the SSI, aiming for a more natural and reliable text input system for any user, efforts will be made towards a session- and user-independent system, enabling its use by multiple users and across sessions without large fluctuations in accuracy.

One of the objectives for the decentralized backend is to go beyond message exchange and extend to file sharing. Approaches similar to the InterPlanetary File System (IPFS) could be developed. However, this topic is sensitive since files take up more storage space than common messages. Replication of data between peers would need to be addressed differently, perhaps through a parallel channel or a separate DHT altogether. Furthermore, voice calling could be another feature of the application, co-existing with messaging and carried over node-to-node channels.

References

1. Birbaumer, N., et al.: A spelling device for the paralysed. Nature 398, 297–298 (1999). https://doi.org/10.1038/18581
2. Bouhnik, D., Deshen, M., Gan, R.: Whatsapp goes to school: mobile instant messaging between teachers and students. J. Inf. Technol. Educ. Res. 13(1), 217–231 (2014)
3. Brooke, J., et al.: SUS-a quick and dirty usability scale. In: Usability Evaluation in Industry, vol. 189, no. 194, pp. 4–7 (1996)
4. Evgenov, P., et al.: White paper: Adamant messaging application. Technical report (2017). https://adamant.im/whitepaper/adamant-whitepaper-en.pdf
5. Fabiani, G.E., McFarland, D.J., Wolpaw, J.R., Pfurtscheller, G.: Conversion of EEG activity into cursor movement by a brain-computer interface (BCI). IEEE Trans. Neural Syst. Rehabil. Eng. 12(3), 331–338 (2004). https://doi.org/10.1109/TNSRE.2004.834627
6. Khan, M.J., Hong, M.J., Hong, K.S.: Decoding of four movement directions using hybrid NIRS-EEG brain-computer interface. Front. Hum. Neurosci. 8, 244 (2014). https://doi.org/10.3389/fnhum.2014.00244. https://www.frontiersin.org/article/10.3389/fnhum.2014.00244
7. Leavy, T.M., Ryan, G.: Decentralized authoritative messaging, 27 November 2018. US Patent 10,142,300
8. Nakamoto, S., et al.: Bitcoin: a peer-to-peer electronic cash system (2008). https://bitcoin.org/bitcoin.pdf
9. Rosello, P., Toman, P., Agarwala, N.: End-to-end neural networks for subvocal speech recognition. Stanford University (2017). http://www.pamelatoman.net/wp/wp-content/uploads/2018/06/subvocalspeechrecognitionpaper.pdf
10. Wan, Z., Guan, Z., Zhuo, F., Xian, H.: BKI: towards accountable and decentralized public-key infrastructure with blockchain. In: Lin, X., Ghorbani, A., Ren, K., Zhu, S., Zhang, A. (eds.) SecureComm 2017. LNICST, vol. 238, pp. 644–658. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-78813-5_33
11. Wand, M., Janke, M., Schultz, T.: The EMG-UKA corpus for electromyography speech processing. In: INTERSPEECH-2014, pp. 1593–1597 (2014)
12. Williamson, J., Murray-Smith, R., Blankertz, B., Krauledat, M., Müller, K.R.: Designing for uncertain, asymmetric control: interaction design for brain-computer interfaces. Int. J. Hum. Comput. Stud. 67, 827–841 (2009). https://doi.org/10.1016/j.ijhcs.2009.05.009
13. Wood, G., et al.: Ethereum: a secure decentralised generalised transaction ledger. Ethereum Project Yellow Paper 151(2014), 1–32 (2014)


Author information

Authors and Affiliations

Polytechnic of Porto - School of Engineering (ISEP), 4200-072, Porto, Portugal
Luís Arteiro, Fábio Lourenço, Paula Escudeiro & Carlos Ferreira


Corresponding author

Correspondence to Luís Arteiro.

Editor information

Editors and Affiliations

University of Crete and Foundation for Research and Technology – Hellas (FORTH), Heraklion, Crete, Greece
Constantine Stephanidis
Foundation for Research and Technology – Hellas (FORTH), Heraklion, Crete, Greece
Margherita Antona


About this paper

Cite this paper

Arteiro, L., Lourenço, F., Escudeiro, P., Ferreira, C. (2020). Brain-Computer Interaction and Silent Speech Recognition on Decentralized Messaging Applications. In: Stephanidis, C., Antona, M. (eds) HCI International 2020 - Posters. HCII 2020. Communications in Computer and Information Science, vol 1226. Springer, Cham. https://doi.org/10.1007/978-3-030-50732-9_1


DOI: https://doi.org/10.1007/978-3-030-50732-9_1
Published: 10 July 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-50731-2
Online ISBN: 978-3-030-50732-9
eBook Packages: Computer Science, Computer Science (R0)
