A Brighter Future For The Web? Exploring Solid

Gildas Garcia, December 01, 2021

#tech4good #architecture

In 1989, Tim Berners-Lee invented the Web as a way to share information and to support creativity and free invention through collaboration. It was meant to be open. However, our data is now siloed in private services. Besides, there is almost no connection between those services except, sometimes, between the big players. And when there is, it's rarely in the user's best interest. This leads to increasing mistrust of those services.

In reaction, Tim founded a company called Inrupt. He also came up with a potential solution, giving control back to the users over their data, while allowing different sources of data to be connected. Enter Solid, a new technology to organize data, applications, and identities on the web, all based on web standards. You can find more details about it at solidproject.org.

What Solid Is

Solid is a new protocol built on HTTP, leveraging standard data formats and vocabulary. It allows people to store any kind of data securely, and lets them decide which part of their data they want to share. Thanks to linked data, it is interoperable and discoverable through included semantic data. I used a lot of words that might not be familiar to everyone, so I'll now try to explain them in detail.

Defining Data

One of Solid's foundations is linked data, and more broadly semantic data. Basically, any piece of data can be described with triples of metadata, each made of a subject, a type, and a value. RDF, the standard behind linked data, calls them subject, predicate, and object.

For example, I might have data about myself available at a URI such as https://gildasgarcia.inrupt.net/profile/card. It may include several triples such as:

My full name
- subject: My card at https://gildasgarcia.inrupt.net/profile/card
- type: A full name (https://www.w3.org/2006/vcard/ns-2006.html#n)
- value: Gildas Garcia
My profile picture
- subject: My card at https://gildasgarcia.inrupt.net/profile/card
- type: A photo (https://www.w3.org/2006/vcard/ns-2006.html#photo)
- value: A URL to my profile picture

Any value in a triple can be a URI leading either to another triple, a document, a schema defining the data, or a plain value such as a string.

The schemas are parts of vocabularies, sometimes very broad, other times specific to a domain. Some of them are available at https://schema.org/docs/schemas.html, and there are many others such as http://xmlns.com/foaf/spec/ or https://www.w3.org/2006/vcard/ns-2006.html.

The data can be serialized into multiple formats such as XML. However, some of them are quite verbose, so more readable formats such as Turtle or JSON-LD were introduced. All those formats are readable by machines, which makes the data discoverable.
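To make the triple structure and its serialization more concrete, here is a minimal sketch. It holds the two triples from my profile example as plain JavaScript objects and writes them as N-Triples, a very simple line-based RDF syntax that is also valid Turtle. The object shape, the toNTriples helper, and the example.org photo URL are assumptions made for illustration; none of them come from Solid or from an Inrupt library.

```javascript
// Hypothetical in-memory representation of two triples about a profile card
const card = 'https://gildasgarcia.inrupt.net/profile/card';
const VCARD = 'http://www.w3.org/2006/vcard/ns#';

const triples = [
    { subject: card, predicate: `${VCARD}fn`, object: { literal: 'Gildas Garcia' } },
    // Placeholder photo URL, not the real one
    { subject: card, predicate: `${VCARD}hasPhoto`, object: { uri: 'https://example.org/photo.png' } },
];

// URIs go between angle brackets, plain strings between double quotes.
// JSON string escaping is close enough to N-Triples escaping for this sketch.
const serializeObject = object =>
    object.uri !== undefined
        ? `<${object.uri}>`
        : JSON.stringify(object.literal);

// One triple per line, each terminated by " ."
const toNTriples = list =>
    list
        .map(t => `<${t.subject}> <${t.predicate}> ${serializeObject(t.object)} .`)
        .join('\n');

console.log(toNTriples(triples));
```

Each printed line is a complete triple, which is exactly what the more compact Turtle syntax abbreviates.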

Defining Identities

For identities, Solid leverages and extends existing standards such as OpenID Connect and OAuth, and relies on newer ones such as WebID. Just like any data, a WebID is an HTTP URI that targets a document describing an identity (a person, an organization, a device, etc.). This document also contains triples defining the identity.

Storing Data

Another Solid principle is that people should own their data. They may choose where and how to store it. They could run their own server, or decide to trust someone else to host their data inside a decentralized data store, which Solid calls a POD. There are already a few POD providers such as Inrupt itself or the Solid community.

Accessing Data

I mentioned that users get to choose which parts of their data they share and with whom. Indeed, they may decide that some of their data is public, and other pieces are private. To access private data, agents (people or machines) must first authenticate. They may then request access to the private data and users may choose to give read or write access.

The mechanisms involved are once again standardized, this time by the Web Access Control (WAC) specification. In a nutshell, WAC allows users to give access to a document specified by its URI to an agent specified by its WebID (also a URI).
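As a rough mental model only, the idea of "this WebID gets these access modes on this URI" can be sketched in a few lines of JavaScript. This is not how WAC is implemented: the real specification stores its rules as RDF in ACL resources and also handles groups, public access, and inheritance. The rule shape, the isAllowed helper, and the alice.example and bob.example WebIDs are all hypothetical; only the mode names (Read, Write, Append, Control) come from WAC.

```javascript
// Toy authorization rules: who (WebID) can do what (modes) on which document (URI)
const products = 'https://gildasgarcia.inrupt.net/private/products.ttl';
const alice = 'https://alice.example/profile/card#me';
const bob = 'https://bob.example/profile/card#me';

const rules = [
    { agent: alice, accessTo: products, modes: ['Read', 'Write'] },
    { agent: bob, accessTo: products, modes: ['Read'] },
];

// True when at least one rule grants `mode` on `resource` to `webId`
const isAllowed = (aclRules, webId, resource, mode) =>
    aclRules.some(
        rule =>
            rule.agent === webId &&
            rule.accessTo === resource &&
            rule.modes.includes(mode)
    );

console.log(isAllowed(rules, bob, products, 'Read')); // true
console.log(isAllowed(rules, bob, products, 'Write')); // false
```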

Exploration: An E-Commerce Administration With Solid

Now that we have some basic understanding of what Solid is, let's build something! For my exploration, I decided to implement part of the e-commerce demo of react-admin. The goal is to store the e-commerce data on my personal POD. I limited myself to products and categories as it should be enough to try most of react-admin features, including relationships.

Screencast of the application

I started by creating a POD at https://inrupt.net/. By default, it provides a private and a public storage space. For the purpose of this demo, I used the private one.

Fortunately, Inrupt provides several JavaScript libraries to deal with Solid requests, including authentication:

@inrupt/solid-client-authn-browser deals with the authentication part. It also provides a preconfigured fetch to use after being authenticated.
@inrupt/solid-client deals with accessing data and managing permissions on data stored in Solid Pods.
@inrupt/solid-ui-react provides components and hooks to ease the usage of the two previous libraries in React. I didn't need this library for this project.

Authentication

The first thing I did was to create a custom Login page to handle the authentication process. It contains a form asking the user for the URL of their authentication provider and redirects them to it. Once the user authenticates using the provider UI, they are redirected to the application.

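import { getDefaultSession, login } from '@inrupt/solid-client-authn-browser';

// Inside the Login component: `oidcIssuer` holds the provider URL typed in the
// form, and `notify` is assumed to come from react-admin's useNotify hook
const handleSubmit = async event => {
    event.preventDefault();

    try {
        if (!getDefaultSession().info.isLoggedIn) {
            await login({
                oidcIssuer,
                // Full URL the provider redirects to once the user is authenticated
                redirectUrl: window.location.href,
            });
        }
    } catch (error) {
        console.error(error);
        notify(error.message);
    }
};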

Back in my application login page, I can check for the authentication status and notify react-admin:

import { useEffect } from 'react';
import { handleIncomingRedirect } from '@inrupt/solid-client-authn-browser';
import { useLogin } from 'react-admin';

// Inside the Login component
const login = useLogin();

useEffect(() => {
    // Completes the login when the current URL carries the provider's response
    handleIncomingRedirect({
        url: window.location.href,
    }).then(info => {
        if (info && info.isLoggedIn) {
            login();
        }
    });
}, [login]);

Finally, I created an authProvider, the object responsible for handling authentication and authorization in react-admin.

import { getDefaultSession, logout } from '@inrupt/solid-client-authn-browser';

export const authProvider = {
    async checkAuth() {
        const session = getDefaultSession();
        const isLoggedIn = session.info.isLoggedIn;

        if (isLoggedIn) {
            return Promise.resolve();
        }
        return Promise.reject();
    },

    login() {
        const session = getDefaultSession();
        const isLoggedIn = session.info.isLoggedIn;

        if (isLoggedIn) {
            return Promise.resolve();
        }
        return Promise.reject();
    },
    logout() {
        return logout();
    },
};

You might know that react-admin can display user information (name and avatar) in the AppBar. It's just a matter of implementing the getIdentity method in the authProvider. Besides, this is also an opportunity to see how to get data from the POD:

import {
    getSolidDataset,
    getStringNoLocale,
    getThing,
    getUrl,
} from '@inrupt/solid-client';
import {
    getDefaultSession,
    logout,
    fetch,
} from '@inrupt/solid-client-authn-browser';
import { VCARD } from '@inrupt/vocab-common-rdf';

export const authProvider = {
    //...
    async getIdentity() {
        // Get the solid session, needed to make authenticated requests to the POD
        const session = getDefaultSession();
        // Retrieve the data stored at the user webId using the `fetch` provided
        // by @inrupt/solid-client-authn-browser
        const dataset = await getSolidDataset(session.info.webId, { fetch });

        // Get the "thing" stored in this dataSet under the user webId key
        const profile = getThing(dataset, session.info.webId);

        // Note that you have to know the type of the data you want to extract.
        // Here it's a non localized string.
        // It means there aren't multiple values depending on locale.
        // That string is used as the fullName.
        // VCARD.fn is a URI to the fullName schema definition
        const fullName = getStringNoLocale(profile, VCARD.fn);
        // Same here but we know it is a URL that should be used as a photo
        const avatar = getUrl(profile, VCARD.hasPhoto);

        return {
            id: session.info.webId,
            fullName,
            avatar,
        };
    },
};

Getting The Data

It's time to introduce the react-admin resources. I started with the products. I added the list, create, edit and show views. I won't include the components here as they are what you expect from a simple react-admin application. You can explore the repository to see the details.

The dataProvider, however, required a bit of work. First, I need to explain how I stored my data in my POD. I chose to store the records of each resource in a file named after it, such as products.ttl. The ttl extension means that the data is written using the Turtle syntax I mentioned earlier. To have something to show in my list, I wrote some data manually in this file using the website provided by Inrupt for my POD. This is what a product looks like:

<#ae4f8fce-248a-481d-b575-50bb76b53565> a <http://schema.org/Product>;
    <http://schema.org/identifier> 0;
    <http://schema.org/productID> "Cat Nose";
    <http://schema.org/description> "Dolorem corrupti et non ipsam nobis officiis est. Voluptatem ab vel nihil. Est aut non autem repellat hic accusantium molestias.";
    <http://schema.org/category> <https://gildasgarcia.inrupt.net/private/categories.ttl#064c74dd-d758-4995-9be0-e857ed2fdaa5>;
    <http://schema.org/image> <https://marmelab.com/posters/animals-1.jpeg>;
    <http://schema.org/height> 32.04;
    <http://schema.org/width> 32.93.

The first line defines the subject of the triples I mentioned earlier. Its first item is the product identifier, the keyword a is Turtle shorthand for the rdf:type property, and the last item is the type itself - in this case, a URI to the Product definition on schema.org.

Thanks to Turtle syntax, every following line, up to the final period, has its subject automatically set to the product: a semicolon at the end of a line means "same subject, next property". Those lines just have to specify a type as a URI to a schema definition and a value. The indentation is only there for readability.

Besides, I can choose to store those files in either the public or private storage on my POD. I chose the private one here.

Now, as you may recall from the authProvider, I can get the dataset stored at a URI, and its content, with the following code:

import { getSolidDataset, getThingAll } from '@inrupt/solid-client';
import { fetch } from '@inrupt/solid-client-authn-browser';

const getResourceData = async (baseUrl, resource) => {
    const datasetUri = `${baseUrl.origin}/private/${resource}.ttl`;
    const resourceDataset = await getSolidDataset(datasetUri, { fetch });
    const things = getThingAll(resourceDataset);

    return things;
};

And you might think we're done. However, things here is actually an array of special objects that contain quads. A quad is a triple with an additional graph property that I won't cover here. This is the quad for the description property of one product:

{
    "termType": "Quad",
    "subject": {
        "termType": "NamedNode",
        "value": "https://gildasgarcia.inrupt.net/private/products.ttl#00183bca-d50a-4f89-a497-413e3139a476"
    },
    "predicate": {
        "termType": "NamedNode",
        "value": "http://schema.org/description"
    },
    "object": {
        "termType": "Literal",
        "value": "Praesentium iure ad. Omnis atque autem accusantium. Aspernatur et repellat illo laudantium.",
        "language": "",
        "datatype": {
            "termType": "NamedNode",
            "value": "http://www.w3.org/2001/XMLSchema#string"
        }
    },
    "graph": {
        "termType": "DefaultGraph",
        "value": ""
    }
}

In order to use quads in react-admin, I need to write some mapping functions that parse those quads and return plain old JavaScript objects, just like I did for the profile in the authProvider. It means I have to know the type of the data I want to read. For example:

import {
    asUrl,
    getDecimal,
    getInteger,
    getStringNoLocale,
    getUrl,
} from '@inrupt/solid-client';

// `schema` is assumed to be an object mapping schema.org property names to their URIs
const getProductFromThing = (thing, productDataSetUri) => ({
    id: asUrl(thing, productDataSetUri),
    identifier: getInteger(thing, schema.identifier),
    reference: getStringNoLocale(thing, schema.productID),
    description: getStringNoLocale(thing, schema.description),
    category_id: getUrl(thing, schema.category),
    image: getUrl(thing, schema.image),
    height: getDecimal(thing, schema.height),
    width: getDecimal(thing, schema.width),
});

A quick note here: As we do have schemas defining our data, it would be possible to write a smart parser relying on the schema to infer the type.
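To give an idea of what such a parser could look like, here is a minimal sketch. It takes a shortcut: instead of reading the schema definitions, it relies on the datatype that each literal already carries in its quad, which is a simpler stand-in for real schema-based inference. It works on plain objects shaped like the quad shown earlier, and the termToValue and quadsToRecord helpers are hypothetical names, not part of @inrupt/solid-client.

```javascript
const XSD = 'http://www.w3.org/2001/XMLSchema#';

// Convert one RDF term (shaped like the "object" of the quad above) to a JS value
const termToValue = term => {
    if (term.termType === 'NamedNode') {
        return term.value; // a URI, kept as a string
    }
    switch (term.datatype && term.datatype.value) {
        case `${XSD}integer`:
            return parseInt(term.value, 10);
        case `${XSD}decimal`:
        case `${XSD}double`:
            return parseFloat(term.value);
        case `${XSD}boolean`:
            return term.value === 'true' || term.value === '1';
        default:
            return term.value; // strings and unknown datatypes stay strings
    }
};

// Build a plain object from the quads of a single subject, using the last
// segment of each predicate URI as the key (a later quad with the same
// predicate overwrites the previous one)
const quadsToRecord = quads =>
    quads.reduce((record, quad) => {
        const key = quad.predicate.value.split(/[/#]/).pop();
        return { ...record, [key]: termToValue(quad.object) };
    }, {});
```

Called with the quads of one product, quadsToRecord would return an object with keys such as description or height, which still has to be renamed to match the field names used in the views.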

Here's what a basic dataProvider.getList function might look like:

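const dataProvider = {
    async getList(resource, params) {
        // `baseUrl` is assumed to be a URL object pointing at the POD,
        // as in getResourceData
        const datasetUri = `${baseUrl.origin}/private/${resource}.ttl`;
        const things = await getResourceData(baseUrl, resource);

        if (resource === 'products') {
            const products = things.map(thing =>
                getProductFromThing(thing, datasetUri)
            );

            return {
                data: products,
                total: products.length,
            };
        }
    },
};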

If you're familiar with react-admin, you might wonder where the pagination, sorting, and filtering mechanisms are. Here is the bad news: PODs don't provide those mechanisms. So yes, I'm actually returning all the products here, neither paginated, sorted, nor filtered. It also means that I'm potentially downloading a huge file containing thousands of products at each getList call.

Querying Data With SPARQL

It's time to introduce a new piece of technology: SPARQL. SPARQL is a declarative language for querying linked data in an RDF store. Although it bears some resemblance to SQL, it's not the same beast at all. Here is an example SPARQL query:

SELECT ?description
WHERE {
  ?product a <http://schema.org/Product> .
  ?product <http://schema.org/description> ?description .
}

I know it looks weird. Let's start with the WHERE clause. It's used to define which part of the resource I want to retrieve. Here, I want the description of a product and I will reference it as ?description in my SELECT clause.

In the SELECT clause, I can specify what will be returned by the query and I may reference variables I declared in the WHERE clause.

Here is a more complex example:

PREFIX s: <http://schema.org/>
SELECT *
WHERE {
  ?s s:identifier ?identifier .
  ?s s:productID ?reference .
  ?s s:description ?description .
  ?s s:height ?height .
  ?s s:width ?width .
}
ORDER BY ?reference
OFFSET 25
LIMIT 25

The first line defines a shortcut allowing me to avoid writing http://schema.org/THING every time I want something from this namespace. For example s:identifier instead of <http://schema.org/identifier>.

The SELECT clause specifies that I want all defined variables in my query results. I can provide an ORDER BY clause, which sorts in ascending order by default. I could have written ORDER BY ASC(?reference) to make it explicit, or ORDER BY DESC(?reference) to reverse the order.

Finally, I'm applying pagination parameters using OFFSET and LIMIT.
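To show how such a query relates to react-admin, here is a sketch of a helper that builds it from the pagination and sort parameters that react-admin passes to getList. The buildListQuery function and its fields argument are hypothetical, and the sketch does no validation of the field names, so it shouldn't be fed untrusted input as is.

```javascript
// Hypothetical helper: turn react-admin getList params into a SPARQL query.
// `fields` maps record field names to schema.org property names.
const buildListQuery = (fields, { pagination, sort }) => {
    const { page, perPage } = pagination;

    // One triple pattern per field, all sharing the ?s subject
    const patterns = Object.entries(fields)
        .map(([field, property]) => `  ?s s:${property} ?${field} .`)
        .join('\n');

    const order =
        sort.order === 'DESC' ? `DESC(?${sort.field})` : `ASC(?${sort.field})`;

    return [
        'PREFIX s: <http://schema.org/>',
        'SELECT *',
        'WHERE {',
        patterns,
        '}',
        `ORDER BY ${order}`,
        // react-admin pages start at 1
        `OFFSET ${(page - 1) * perPage}`,
        `LIMIT ${perPage}`,
    ].join('\n');
};

console.log(
    buildListQuery(
        { identifier: 'identifier', reference: 'productID' },
        { pagination: { page: 2, perPage: 25 }, sort: { field: 'reference', order: 'ASC' } }
    )
);
```

Asking for the second page of 25 records translates to OFFSET 25 and LIMIT 25.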

There's a lot more to it, like COUNT functions, etc.

However, at the time of writing, Inrupt Solid PODs do not support SPARQL queries. It means that I can only use SPARQL queries on a local dataset. In a future article, I'll explore how to create our own POD server with SPARQL support. In the meantime, I ended up leveraging FakeRest collections to implement a local database populated with the POD data. On the first query, I retrieve the data from the POD, initialize a new Collection, and then use FakeRest features to paginate, sort, and filter data.
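For readers who don't know FakeRest, the principle of handling the list parameters locally can be sketched by hand. The applyListParams function below is a hypothetical, simplified stand-in: it is not FakeRest's API, and it only supports strict equality filters. It assumes the records have already been downloaded from the POD and mapped to plain objects.

```javascript
// Apply react-admin style filter, sort, and pagination to an array of records
const applyListParams = (records, { pagination, sort, filter = {} }) => {
    const { page, perPage } = pagination;
    const { field, order } = sort;

    // Keep the records matching every filter entry (strict equality only)
    const filtered = records.filter(record =>
        Object.entries(filter).every(([key, value]) => record[key] === value)
    );

    // filter() returned a new array, so sorting it in place leaves `records` untouched
    filtered.sort((a, b) => {
        if (a[field] === b[field]) return 0;
        const result = a[field] > b[field] ? 1 : -1;
        return order === 'DESC' ? -result : result;
    });

    const start = (page - 1) * perPage;

    return {
        data: filtered.slice(start, start + perPage),
        total: filtered.length, // count after filtering, before pagination
    };
};
```

The returned object has the shape react-admin expects from getList. The whole dataset still has to be downloaded once, which is exactly the limitation described above.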

Writing Data

Writing data (create, update, delete) is very similar to reading it. Let's start with creation.

Creating Records

To create a new thing in a dataset, I have to call the createThing function provided by @inrupt/solid-client. It accepts an optional name option allowing me to control its final URI:

import { v4 as uuid } from 'uuid';
import { createThing } from '@inrupt/solid-client';

const createRecord = () => {
    const name = uuid();
    const thing = createThing({ name });
};

Now that I have a Thing, I can set its properties by calling functions similar to those I used before to read properties (getStringNoLocale becomes addStringNoLocale, etc.). There's a catch, though: the addXXX functions are pure. They don't modify the Thing I pass them but return a new Thing with the property set:

import { v4 as uuid } from 'uuid';
import {
    createThing,
    addStringNoLocale,
    addUrl,
    addDecimal,
} from '@inrupt/solid-client';

const createRecord = (resource, data) => {
    const name = uuid();

    let thing = createThing({ name });

    if (resource === 'products') {
        thing = addStringNoLocale(thing, schema.productID, data.reference);
        thing = addStringNoLocale(thing, schema.description, data.description);
        // The category is a link to another thing, so it is stored as a URL
        thing = addUrl(thing, schema.category, data.category_id);
        thing = addUrl(thing, schema.image, data.image);
        thing = addDecimal(thing, schema.height, data.height);
        thing = addDecimal(thing, schema.width, data.width);
    }

    return thing;
};

Calm down Functional Programming purists! I know it could be written in a more elegant way. It's just easier to understand for this article.
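For the curious, one possible way to avoid the repeated reassignment is to describe each change as a function and thread the Thing through them with reduce. The sketch below is only an illustration of that idea: applySteps and buildProduct are hypothetical helpers, and addValue is a stand-in for the pure addXXX functions so that the example runs without the library. In the real code, each step would wrap a call such as addStringNoLocale(thing, schema.productID, data.reference).

```javascript
// Each step is a function Thing => Thing; reduce threads the Thing through them
const applySteps = (thing, steps) =>
    steps.reduce((current, step) => step(current), thing);

// Stand-in for the pure addXXX functions of @inrupt/solid-client:
// it returns a new object and leaves its input untouched
const addValue = (thing, property, value) => ({ ...thing, [property]: value });

const buildProduct = data =>
    applySteps({}, [
        thing => addValue(thing, 'productID', data.reference),
        thing => addValue(thing, 'description', data.description),
    ]);

console.log(buildProduct({ reference: 'Cat Nose', description: 'A poster' }));
```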

Now that my Thing is ready, I still have to add it to the dataset:

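import {
    asUrl,
    getSolidDataset,
    saveSolidDatasetAt,
    setThing,
} from '@inrupt/solid-client';
import { fetch } from '@inrupt/solid-client-authn-browser';

const dataProvider = {
    // ...
    async create(resource, params) {
        const datasetUri = `${baseUrl.origin}/private/${resource}.ttl`;
        const resourceDataset = await getSolidDataset(datasetUri, { fetch });
        const newThing = createRecord(resource, params.data);

        // Here we add the new thing to the dataset
        const updatedDataset = setThing(resourceDataset, newThing);
        // And here we persist it on the POD
        await saveSolidDatasetAt(datasetUri, updatedDataset, { fetch });

        // react-admin expects the created record, including its id, under `data`
        return {
            data: { ...params.data, id: asUrl(newThing, datasetUri) },
        };
    },
};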

Updating Records

Updating a record is very similar to creating one, except that you first have to get the existing thing, and use the setXXX functions instead of the addXXX ones.

import { setStringNoLocale, setUrl, setDecimal } from '@inrupt/solid-client';

const updateRecord = (resource, thing, data) => {
    // setXXX functions are pure too, so each call starts from the previous result
    let newThing = thing;

    if (resource === 'products') {
        newThing = setStringNoLocale(newThing, schema.productID, data.reference);
        newThing = setStringNoLocale(newThing, schema.description, data.description);
        // The category is a link to another thing, so it is stored as a URL
        newThing = setUrl(newThing, schema.category, data.category_id);
        newThing = setUrl(newThing, schema.image, data.image);
        newThing = setDecimal(newThing, schema.height, data.height);
        newThing = setDecimal(newThing, schema.width, data.width);
    }

    return newThing;
};

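import {
    getSolidDataset,
    getThing,
    saveSolidDatasetAt,
    setThing,
} from '@inrupt/solid-client';
import { fetch } from '@inrupt/solid-client-authn-browser';

const dataProvider = {
    // ...
    async update(resource, params) {
        const datasetUri = `${baseUrl.origin}/private/${resource}.ttl`;
        const resourceDataset = await getSolidDataset(datasetUri, { fetch });
        // params.id is the URL of the thing, as returned by getProductFromThing
        const thing = getThing(resourceDataset, params.id);
        const newThing = updateRecord(resource, thing, params.data);
        // Here we replace the previous version of the thing in the dataset
        const updatedDataset = setThing(resourceDataset, newThing);
        await saveSolidDatasetAt(datasetUri, updatedDataset, { fetch });

        // react-admin expects the updated record under `data`
        return { data: params.data };
    },
};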

Conclusion

I started to write this article at the end of 2020. Every time I came back to this exploration, a lot of things had changed in the Solid specification or libraries. Some libraries were completely replaced, so I had to rewrite most of the code.

Besides, the PODs provided by Inrupt do not support the features needed to build an application that doesn't have to download a complete dataset to work. I haven't looked at other providers, though, and as I explained earlier, I may explore this further by hosting my own Solid server.

This points to a potential issue: although users may store their data with the provider of their choice, your application may have to access it differently depending on the features supported by that provider.

Finally, you may have noticed there are many new concepts, new languages, and new libraries. And they change fast, sometimes breaking things!

I'm still hyped by the potential of those technologies, the impact they may have on the internet economy, the promise of interoperability, and the liberty they may offer to users. However, it seems we'll still have to wait before we can use them in our applications.
