• How can Apache Wave Operational Transformations be encrypted? (Part 2)

    Article to be released soon.

  • Code walkthrough

    Article to be released soon.

  • GSoC 2016 Project Overview

    The code can be downloaded from this git branch (compare changes, my commits).

    Synopsis

    Apache Wave is a software framework for real-time online collaborative editing. Like Google Docs and Etherpad, it uses Operational Transformations to manage user collaboration.

    During this Google Summer of Code we have added end-to-end encryption to wave documents. This means that only the people who know a particular key can access the documents and edit and retrieve their contents, which protects the privacy of Wave users.

    We have based our work on this awesome paper that explains how some researchers encrypted Google Docs’ Operational Transformations. We took their ideas and adapted them to Apache Wave’s architecture.

    Produced work

    To summarize the work we have produced, we have recorded this video:

    To encrypt the messages we have used the AES-GCM algorithm from the WebCrypto API, which we call from our Java classes through JsInterop bindings.

    Messages are properly encrypted and decrypted when clients send and receive them, and the text of a document is also properly recovered from the server’s snapshot. Everything seems to run smoothly, except for some annoying bugs that appear sporadically and a serious user interface bug that prevents users who did not create the wave from decrypting its snapshot. My mentor and I think we can fix them quickly, right after the program has ended.

    How to use it

    Building our modified version of Wave does not require any additional configuration. Just download the code from our git branch and use the usual Gradle commands, as stated in Wave’s README file. To compile the code and run the server, use:

    $ ./gradlew run
    

    Then, open the URL http://localhost:9898/ in any browser. Once registered and logged in, use the “New Encrypted Wave” button to create a new encrypted wave.

    Encrypted Wave button

    In its URL you can see that the new wave’s identifier starts with “ew+” instead of the “w+” used by regular waves. A symmetric cryptographic key is also attached after the wave identifier, separated by an exclamation mark (!).

    Encrypted Wave URL

    The user must preserve that URL (or at least the key part) in order to open the wave again in the future.
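
    As a rough illustration of the layout described above, the sketch below splits such a reference into the wave identifier and the key. The sample string, the helper name split_encrypted_wave_ref, and the choice of splitting at the last exclamation mark are assumptions made for this example; the real URL may carry other parts.

```python
def split_encrypted_wave_ref(ref):
    """Split '<wave id>!<key>' into (wave_id, key).

    Hypothetical helper: it only mirrors the layout described in the post.
    """
    wave_id, sep, key = ref.rpartition("!")
    if not sep or not key:
        raise ValueError("no key found after '!'")
    # The last path segment of the id is what starts with 'ew+'.
    local_id = wave_id.rsplit("/", 1)[-1]
    if not local_id.startswith("ew+"):
        raise ValueError("not an encrypted wave identifier")
    return wave_id, key


# Made-up identifier and key, for illustration only.
wave_id, key = split_encrypted_wave_ref("local.net/ew+Ab12Cd!c2VjcmV0LWtleQ")
print(wave_id)
print(key)
```

    Anyone who keeps only the part after the exclamation mark still needs the wave identifier from somewhere else, which is why keeping the whole URL is the simpler option.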

    Future work

    AES-GCM provides both confidentiality and integrity for the messages written by legitimate users, but an attacker who controls the server can still do a lot of harm:

    • Only the text of a document is encrypted, but not other parts such as the content of its hyperlinks. We should extend the encryption beyond the inserted characters.
    • The authentication could also be extended to all the components, not only the text ones. In addition, the paper states that the history of a document should also be authenticated (see appendix A.2).
    • It is unlikely that we can hide the structure and format of the document from the server, but we may be able to hide some more information, such as users’ typing traits.

    On the other hand, it is not convenient for users to handle symmetric keys by themselves. Keys should be encrypted and stored on the server as user data. To do so, we should derive a key from the user’s password using PBKDF2 (available in the WebCrypto API) and use it to encrypt all the keys a user generates or registers for her waves.
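
    The derivation step could look roughly like the sketch below. It uses Python’s standard hashlib instead of the WebCrypto API, and the hash function, iteration count, salt size and the name derive_wrapping_key are assumptions picked for the example, not values taken from the project.

```python
import hashlib
import os


def derive_wrapping_key(password, salt, iterations=200_000):
    """Derive a 256-bit key from a password with PBKDF2-HMAC-SHA256.

    The iteration count is an illustrative assumption.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


# The salt is random per user and would be stored next to the wrapped keys.
salt = os.urandom(16)
wrapping_key = derive_wrapping_key("correct horse battery staple", salt)
print(len(wrapping_key))  # 32 bytes, the size of an AES-256 key
```

    The derived key never leaves the client. It would only be used to encrypt the per-wave keys before they are uploaded, a step that is not shown here.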

    Users could use public-key cryptography to invite each other to edit a wave document. This feature was part of the original work plan for this summer, but we did not have enough time to develop it.

  • How can Apache Wave Operational Transformations be encrypted? (Part 1)

    Apache Wave is a software framework for real-time online collaborative editing. Like Google Docs and Etherpad, it uses Operational Transformations to manage user collaboration. During this Google Summer of Code we are adding end-to-end encryption to wave documents. This means that only the people with access to the documents (those who share a symmetric key) can edit and retrieve the contents of a document, which protects the privacy of Wave users.

    We base our GSoC work on this awesome paper that explains how some researchers encrypted Google Docs’ Operational Transformations. They used a simplified model to explain how document operations (called revisions in the paper) can be encrypted.

    In this blog post, we translate the paper’s insights to the Apache Wave architecture, on which SwellRT is based. Let’s start with some basics of Apache Wave and the structure of a document. If you are already familiar with them, you can skip the next section.

    Documents and Operations

    As this fabulous blog post explains, Wave documents are well-formed XML documents that carry additional metadata called “annotations”. Annotations are (potentially-overlapping) key/value ranges which span across specific regions of the document and are usually used to style the text.

    Documents can be modified by (and also represented as) a document operation. A document operation is a series of components that move the cursor across the document. Those components can be of the type:

    • insertCharacters — Inserts the specified string at the current index.
    • deleteCharacters — Deletes the specified string from the current index.
    • openElement — Creates a new XML open-tag at the current index.
    • deleteOpenElement — Deletes the specified XML open-tag from the current index.
    • closeElement — Closes the first currently-open tag at the current index.
    • deleteCloseElement — Deletes the XML close-tag at the current index.
    • annotationBoundary — Defines the changes to any annotations (starting or ending) at the current index.
    • retain — Advances the index a specified number of items.

    So, the document operation retain(8); deleteCharacters('m'); insertCharacters('M'); retain(38); would remove the “m” at position 8 and add an “M” in the same position.
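
    To make the cursor model concrete, here is a minimal sketch that applies that operation to a plain string. It ignores XML elements and annotations, and the tuple representation and the name apply_operation are inventions for this example, not Wave’s API.

```python
def apply_operation(doc, op):
    """Apply a text-only document operation to a string.

    op is a list of (component, argument) tuples; only retain,
    insertCharacters and deleteCharacters are modelled here.
    """
    out, index = [], 0
    for kind, arg in op:
        if kind == "retain":
            out.append(doc[index:index + arg])  # copy arg items unchanged
            index += arg
        elif kind == "insertCharacters":
            out.append(arg)                     # new text, cursor stays put
        elif kind == "deleteCharacters":
            index += len(arg)                   # skip the deleted text
        else:
            raise ValueError(f"unsupported component: {kind}")
    if index != len(doc):
        raise ValueError("the operation must span the whole document")
    return "".join(out)


doc = "Editing made simple with operational transforms"  # 47 characters
op = [
    ("retain", 8),
    ("deleteCharacters", "m"),
    ("insertCharacters", "M"),
    ("retain", 38),
]
print(apply_operation(doc, op))
```

    Note that the components must add up to the full length of the document (8 + 1 + 38 = 47 here), which is why the operation ends with a retain.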

    Now we are ready to understand how to encrypt those operations.

    Encrypting Operations

    In the paper, clients share a symmetric key in order to encrypt and decrypt operations, and the server has no knowledge of it, so encryption and decryption of operations must be carried out on the client side.

    The paper’s simplified model only considers two components (called primitive operations or mutations): insertion and deletion. Insertions insert the text v at a particular position p of the document, and deletions delete k characters forward or backward, depending on whether k is positive or negative. Only the inserted text is encrypted and decrypted, so the document structure remains visible to the server.

    In Wave we have many more components, as we saw in the previous section. We are going to encrypt the text components only, leaving the XML structure of the document in plaintext and visible to the server. The components that deal directly with the text are insertCharacters and deleteCharacters, so we are going to encrypt the text in their parameters.

    When a text is encrypted, its length usually grows because of the randomization and integrity data that travel with it (commonly an initialization vector and a MAC). This represents a challenge in an Operational Transformation system, where the length of the inserted and deleted text must be taken into account in order to compose and transform operations.
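
    A quick back-of-the-envelope calculation shows how large the expansion can be. The sizes below (a 12-byte initialization vector, a 16-byte authentication tag, base64 text encoding) are common choices for AES-GCM and are assumptions here, since the exact parameters are not stated in this post.

```python
import math

IV_BYTES = 12   # assumption: 96-bit IV, the size usually recommended for GCM
TAG_BYTES = 16  # assumption: 128-bit authentication tag


def encrypted_length(plaintext_bytes):
    """Bytes needed for IV + ciphertext + tag; GCM itself adds no padding."""
    return IV_BYTES + plaintext_bytes + TAG_BYTES


def base64_length(raw_bytes):
    """Length of the padded base64 text for raw_bytes bytes."""
    return 4 * math.ceil(raw_bytes / 3)


for n in (1, 2, 100):
    total = encrypted_length(n)
    print(n, total, base64_length(total))
```

    Under these assumptions a single typed character travels as several dozen characters, so the ciphertext cannot simply take the place of the plaintext inside the operation.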

    In the paper, it is solved by adding two more components at the end of each operation, an insertion of the extra information, and a deletion of that information. This is a clever approach that solves the problem, maintaining the cursor constant between the encrypted and non-encrypted operations.

    We cannot use the same approach in Wave because its operations cannot move the cursor backwards as is done in the paper (deleting something that the same operation is adding). Fortunately, we can obtain a similar effect by storing the additional text in a component that does not move the cursor: an annotation.

    Using annotations to store the ciphertext, we can encrypt any operation with any number of insertCharacters and deleteCharacters components by concatenating their texts and storing the resulting ciphertext in the annotation. For example, for the following operation:

    retain(8); deleteCharacters('m'); insertCharacters('M'); retain(38);

    We would concatenate all the texts as “mM” and encrypt the result with the shared key, obtaining something similar to:

    qOg89Tc6KjH9HyeC;6pQAClv6a8MnFoAHWsT06WIV;

    Then we would add an annotation that spans from the beginning to the end of the document, which is stripped again in the decryption process:

    annotationBoundary(changes: [
      cipher = null -> 'qOg89Tc6KjH9HyeC;6pQAClv6a8MnFoAHWsT06WIV;']);
    retain(8);
    deleteCharacters('*');
    insertCharacters('*');
    retain(38);
    annotationBoundary(ends: ['cipher']);
    

    Note that the component parameters have been replaced by asterisks (*) in order to keep a character there without leaking any information about the underlying text.

    So, now the rest of the clients can receive that operation and decrypt it by decrypting the text of the annotation and splitting its characters among the asterisks of the insertCharacters and deleteCharacters components.
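
    The whole round trip can be sketched in a few lines. This is only an illustration of the masking and splitting logic: operations are modelled as lists of tuples, the function names are made up, and the cipher is passed in as a parameter. The stand-in used in the demo is plain base64, which is NOT encryption; the real code uses AES-GCM with the shared key.

```python
import base64

TEXT_COMPONENTS = ("insertCharacters", "deleteCharacters")


def encrypt_operation(op, encrypt):
    """Mask every text component and carry the ciphertext in an annotation."""
    texts = [arg for kind, arg in op if kind in TEXT_COMPONENTS]
    cipher = encrypt("".join(texts))
    masked = [
        (kind, "*" * len(arg)) if kind in TEXT_COMPONENTS else (kind, arg)
        for kind, arg in op
    ]
    start = ("annotationBoundary", {"changes": {"cipher": (None, cipher)}})
    end = ("annotationBoundary", {"ends": ["cipher"]})
    return [start] + masked + [end]


def decrypt_operation(op, decrypt):
    """Strip the annotation and deal the plaintext back onto the asterisks."""
    plain = decrypt(op[0][1]["changes"]["cipher"][1])
    restored, offset = [], 0
    for kind, arg in op[1:-1]:
        if kind in TEXT_COMPONENTS:
            size = len(arg)
            arg = plain[offset:offset + size]
            offset += size
        restored.append((kind, arg))
    return restored


# Stand-in "cipher" for the demo only: base64 is an encoding, not encryption.
def fake_encrypt(text):
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def fake_decrypt(token):
    return base64.b64decode(token).decode("utf-8")


op = [
    ("retain", 8),
    ("deleteCharacters", "m"),
    ("insertCharacters", "M"),
    ("retain", 38),
]
sent = encrypt_operation(op, fake_encrypt)
received = decrypt_operation(sent, fake_decrypt)
print(sent[2], sent[3])
print(received == op)
```

    Because every text component keeps its original length, the server can still compose and transform the masked operation exactly as it would the plaintext one.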

    This encryption and decryption process is done just before the Wavelet Operations are sent to the server and just after they are received from it, in a class called StaticChannelBinder.

    UPDATE 2017-07-25: Since Wave does not check whether the deleted characters in the document match the deleted characters received in a document operation component, we can optimize the algorithm by not encrypting and decrypting the deleteCharacters components. We only have to obfuscate the deleted text using a predefined character, such as the asterisk (*).

    The code that encrypts and decrypts operations is already available in our repo, and a video with the demo can be watched here:

    Next steps

    Now that we have designed the encryption and decryption of operations “on-the-fly”, we need clients to be able to decrypt a document that has already been written. To do so, the client needs all the ciphertext that makes up the current state of the document. The paper uses a smart approach to determine which operations in the history are still relevant to the current state of the document. The server could use it to send clients just the information they need to decrypt the document, which reduces the bandwidth and computational power that a full replay of the document history would require. These are the things we are going to focus on in the following weeks, and we will share them in another blog post when it is ready.

    Stage 2 of GSOC has already begun.

  • Example on Operational Transformations

    In this article, we use the last example of Code Commit’s blog post in order to clarify it with specific operations.

    Path build

    The initial revision of both client and server is 0, and the document state is empty.

    Client perspective

    • Alice types “Wave”.
    • Client sends InsertCharacters(“Wave”);, rev 0. (a). Document state = “Wave”.
    • Alice types “!!”.
    • Client holds retain(4); InsertCharacters(“!!”);, rev 1. (b). Document state = “Wave!!”.
    • Client receives InsertCharacters(“World”);, rev 0. (c)
    • Client applies InsertCharacters(“World”); retain(6);, rev 2. (c’). Document state = “WorldWave!!”.
    • Alice deletes “World”.
    • Client applies DeleteCharacters(“World”); retain(6) , rev 3. (e). Document state = “Wave!!”
    • Client holds DeleteCharacters(“World”); retain(4); InsertCharacters(“!!”);, rev 4. ((e⦁b’)’)
    • Client receives InsertCharacters(“Hello “); retain(5);, rev 1. (d).
    • Client applies InsertCharacters(“Hello “); retain(6);, rev 4. (d’). Document state = “Hello Wave!!”.
    • Client receives ACK InsertCharacters(“Wave”);, rev 2. (a’).
    • Client sends DeleteCharacters(“World”); retain(4); InsertCharacters(“!!”);, rev 4. (e⦁b’)

    Server perspective

    • Bob sends InsertCharacters(“World”);, rev 0. (c)
    • Server applies it as it is. History = [c]. Document state = “World”.
    • Bob sends InsertCharacters(“Hello “); retain(5);, rev 1. (d)
    • Server applies it as it is. History = [c, d]. Document state = “Hello World”.
    • Alice sends InsertCharacters(“Wave”);, rev 0. (a)
    • Server applies retain(11); InsertCharacters(“Wave”);, rev 2. (a’). History = [c, d, a’]. Document State = “Hello WorldWave”.
    • Server receives DeleteCharacters(“World”); retain(4); InsertCharacters(“!!”);, rev 4. (e⦁b’)
    • Server applies retain(6); DeleteCharacters(“World”); retain(4); InsertCharacters(“!!”);, rev 4. ((e⦁b’)’). History = [c, d, a’, (e⦁b’)’]. Document State = “Hello Wave!!”
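
    The position at which the server applies Alice’s first insert (a’) can be reproduced with a tiny position-based model: the insert is shifted past the two operations already in the history, which gives a retain of 11, the length of “Hello World”. This is a simplification for plain-text inserts only, not Wave’s real transform function, and it assumes that on a tie the server’s operation goes first, which matches the example because “World” ends up before “Wave”.

```python
def insert_at(doc, pos, text):
    """Insert text into doc at character position pos."""
    return doc[:pos] + text + doc[pos:]


def transform_insert(pos, concurrent_inserts):
    """Shift a client insert past the inserts the server has already applied.

    Assumption: on a tie (same position) the server's operation goes first.
    """
    for other_pos, other_text in concurrent_inserts:
        if other_pos <= pos:
            pos += len(other_text)
    return pos


history = [(0, "World"), (0, "Hello ")]  # operations c and d
doc = ""
for pos, text in history:
    doc = insert_at(doc, pos, text)

a_pos = transform_insert(0, history)  # Alice's insert was built at rev 0
doc = insert_at(doc, a_pos, "Wave")
print(a_pos, repr(doc))
```
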
  • Setting up our environment

    We begin by downloading the code of WIAB (Wave in a Box). Apache Wave documentation tells us to download it using git:

    $ git clone https://git-wip-us.apache.org/repos/asf/incubator-wave.git
    

    We download the latest version of Eclipse IDE for Java Developers from its download page. It already includes the Gradle plugin we need to import the project.

    To import the Gradle project, we use File > Import… > Gradle > Gradle Project. We select the folder where we downloaded incubator-wave and keep the default options. Once imported, we should see three folders in the Package Explorer: incubator-wave (root directory), pst (a dependency), and wave (the code).

    Path build

    Now, we may want to disable the option Project > Build automatically.

    By right-clicking on the incubator-wave folder and then choosing Build Path > Configure Build Path…, we should add the following folders in the different tabs:

    Path build

    Now we can check everything is working with Project > Build Project.

  • Why is it important?

    The “If you have nothing to hide, you have nothing to fear” reasoning only creates a climate of oppression. Wanting to keep certain parts of your life private does not mean you are doing anything wrong. We need tools that protect our personal freedoms and allow us to collaborate in better ways.