A Brighter Future For The Web? Exploring Solid
In 1989, Tim Berners-Lee invented the Web as a way to share information and to support creativity and free invention through collaboration. It was meant to be open. However, our data is now siloed between private services. Besides, there is almost no connection between those services except, sometimes, for the big players. And when there is, it's rarely in the user's best interest. This leads to growing mistrust of those services.
In reaction, Tim founded a company called Inrupt. He also came up with a potential solution, giving control back to the users over their data, while allowing different sources of data to be connected. Enter Solid, a new technology to organize data, applications, and identities on the web, all based on web standards. You can find more details about it at solidproject.org.
What Solid Is
Solid is a new protocol built on HTTP, leveraging standard data formats and vocabulary. It allows people to store any kind of data securely, and lets them decide which part of their data they want to share. Thanks to linked data, it is interoperable and discoverable through included semantic data. I used a lot of words that might not be familiar to everyone, so I'll now try to explain them in detail.
Defining Data
One of Solid's foundations is linked data, and more broadly semantic data. Basically, any piece of data can be described with triples of metadata that include a subject, a type, and a value (RDF calls them subject, predicate, and object).
For example, I might have data about myself available at a URI such as https://gildasgarcia.inrupt.net/profile/card. It may include several triples such as:
- My full name
- subject: My card at https://gildasgarcia.inrupt.net/profile/card
- type: A full name (https://www.w3.org/2006/vcard/ns-2006.html#n)
- value: Gildas Garcia
- My profile picture
- subject: My card at https://gildasgarcia.inrupt.net/profile/card
- type: A photo (https://www.w3.org/2006/vcard/ns-2006.html#photo)
- value: A URL to my profile picture
Any value in a triple can be a URI leading either to another triple, a document, a schema defining the data, or a plain value such as a string.
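To make the idea of triples more concrete, here is a minimal sketch that models them as plain JavaScript objects, using the subject, type, and value terms from above. It is only an illustration: real RDF libraries use richer term objects, the `valuesOf` helper is hypothetical, and the avatar URL is a placeholder.

```javascript
// Sketch only: triples as plain objects, not the data model of any Solid library.
const CARD = 'https://gildasgarcia.inrupt.net/profile/card';
const FULL_NAME = 'https://www.w3.org/2006/vcard/ns-2006.html#n';
const PHOTO = 'https://www.w3.org/2006/vcard/ns-2006.html#photo';

const triples = [
    { subject: CARD, type: FULL_NAME, value: 'Gildas Garcia' },
    // Placeholder URL, used here only for illustration
    { subject: CARD, type: PHOTO, value: 'https://example.org/avatar.png' },
];

// Hypothetical helper: return every value recorded for a subject and a type.
const valuesOf = (allTriples, subject, type) =>
    allTriples
        .filter(triple => triple.subject === subject && triple.type === type)
        .map(triple => triple.value);

const fullNames = valuesOf(triples, CARD, FULL_NAME);
```

Note that the lookup returns an array: nothing prevents a subject from having several values for the same type.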
The schemas are parts of vocabularies, sometimes very broad, other times specific to a domain. Some of them are available at https://schema.org/docs/schemas.html, and there are many others such as http://xmlns.com/foaf/spec/ or https://www.w3.org/2006/vcard/ns-2006.html.
The data can be serialized into multiple formats such as XML. However, many are too verbose, so more concise formats such as Turtle or JSON-LD were introduced. Those formats are readable by machines, which makes the data discoverable.
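As an illustration of what serialization means here, the following sketch groups flat triples into nodes shaped like JSON-LD's expanded form, with one node per subject. The `toExpandedJsonLd` function and the `objectIsUri` flag are assumptions made for this example, and a real application would rather rely on a dedicated JSON-LD library.

```javascript
// Sketch only: group flat triples into JSON-LD "expanded form" nodes.
const toExpandedJsonLd = triples => {
    const nodes = new Map();
    for (const { subject, predicate, object, objectIsUri } of triples) {
        if (!nodes.has(subject)) {
            nodes.set(subject, { '@id': subject });
        }
        const node = nodes.get(subject);
        // URIs become references, anything else becomes a plain value
        const term = objectIsUri ? { '@id': object } : { '@value': object };
        node[predicate] = [...(node[predicate] || []), term];
    }
    return [...nodes.values()];
};

const CARD = 'https://gildasgarcia.inrupt.net/profile/card';
const document = toExpandedJsonLd([
    {
        subject: CARD,
        predicate: 'https://www.w3.org/2006/vcard/ns-2006.html#n',
        object: 'Gildas Garcia',
        objectIsUri: false,
    },
]);
const serialized = JSON.stringify(document, null, 2);
```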
Defining Identities
For identities, Solid leverages, improves on, or introduces standards such as OpenID, OAuth, and WebID. Just like any data, a WebID is an HTTP URI that targets a document describing an identity (a person, an organization, a device, etc.). This document also contains triples defining the identity.
Storing Data
Another Solid principle is that people should own their data. They may choose where and how to store it. They could have their own servers or may decide to trust someone to host their data inside a decentralized data store, which Solid calls a POD. There are already a few POD providers such as Inrupt itself or the Solid community.
Accessing Data
I mentioned that users get to choose which parts of their data they share and with whom. Indeed, they may decide that some of their data is public, and other pieces are private. To access private data, agents (people or machines) must first authenticate. They may then request access to the private data and users may choose to give read or write access.
The mechanisms involved are once again standardized with the Web Access Control specification. In a nutshell, WAC allows users to give access to a document specified by its URI to an agent specified by its WebID (also a URI).
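The core idea can be sketched in a few lines of JavaScript. This is a deliberately simplified stand-in and not the actual WAC data model, where rules are themselves stored as linked data: the `authorizations` array, the `canAccess` function, and the two WebIDs are all hypothetical.

```javascript
// Sketch only: a simplified stand-in for access rules, not the real WAC format.
const PRODUCTS = 'https://gildasgarcia.inrupt.net/private/products.ttl';
const ALICE = 'https://alice.example/profile/card#me'; // hypothetical WebID
const BOB = 'https://bob.example/profile/card#me'; // hypothetical WebID

const authorizations = [
    { agent: ALICE, resource: PRODUCTS, modes: ['Read', 'Write'] },
    { agent: BOB, resource: PRODUCTS, modes: ['Read'] },
];

// An agent (WebID) may use a mode on a document (URI) only if a rule grants it.
const canAccess = (rules, webId, resourceUri, mode) =>
    rules.some(
        rule =>
            rule.agent === webId &&
            rule.resource === resourceUri &&
            rule.modes.includes(mode)
    );

const bobCanRead = canAccess(authorizations, BOB, PRODUCTS, 'Read'); // true
const bobCanWrite = canAccess(authorizations, BOB, PRODUCTS, 'Write'); // false
```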
Exploration: An E-Commerce Administration With Solid
Now that we have some basic understanding of what Solid is, let's build something! For my exploration, I decided to implement part of the e-commerce demo of react-admin. The goal is to store the e-commerce data on my personal POD. I limited myself to products and categories as it should be enough to try most of react-admin features, including relationships.
I started by creating a POD at https://inrupt.net/. By default, it provides a private and a public storage space. For the purpose of this demo, I used the private one.
Fortunately, Inrupt provides several JavaScript libraries to deal with Solid requests, including authentication:
- @inrupt/solid-client-authn-browser deals with the authentication part. It also provides a preconfigured fetch to use after being authenticated.
- @inrupt/solid-client deals with accessing data and managing permissions on data stored in Solid Pods.
- @inrupt/solid-ui-react provides components and hooks to ease the usage of the two previous libraries in React. I didn't need this library for this project.
Authentication
The first thing I did was to create a custom Login page to handle the authentication process. It contains a form asking the user for the URL of their authentication provider and redirects them to it. Once the user authenticates using the provider UI, they are redirected to the application.
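import { getDefaultSession, login } from '@inrupt/solid-client-authn-browser';

// `oidcIssuer` (the provider URL entered in the form) and `notify` come from the component
const handleSubmit = async event => {
    event.preventDefault();
    try {
        if (!getDefaultSession().info.isLoggedIn) {
            await login({
                oidcIssuer,
                // Full URL the provider redirects to once the user is authenticated
                redirectUrl: window.location.href,
            });
        }
    } catch (error) {
        console.error(error);
        notify(error);
    }
};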
Back in my application login page, I can check for the authentication status and notify react-admin:
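import { useEffect } from 'react';
import { handleIncomingRedirect } from '@inrupt/solid-client-authn-browser';
import { useLogin } from 'react-admin';

// Inside the Login page component
const login = useLogin();
useEffect(() => {
    // Completes the login when the current URL carries the provider's redirect parameters
    handleIncomingRedirect({
        url: window.location.href,
    }).then(info => {
        if (info && info.isLoggedIn) {
            login();
        }
    });
}, [login]);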
Finally, I created an authProvider, the object responsible for handling authentication and authorization in react-admin.
import { getDefaultSession, logout } from '@inrupt/solid-client-authn-browser';

export const authProvider = {
    async checkAuth() {
        const session = getDefaultSession();
        const isLoggedIn = session.info.isLoggedIn;
        if (isLoggedIn) {
            return Promise.resolve();
        }
        return Promise.reject();
    },
    login() {
        const session = getDefaultSession();
        const isLoggedIn = session.info.isLoggedIn;
        if (isLoggedIn) {
            return Promise.resolve();
        }
        return Promise.reject();
    },
    logout() {
        return logout();
    },
};
You might know that react-admin can display user information (name and avatar) in the AppBar. It's just a matter of implementing the getIdentity method in the authProvider. Besides, this is also an opportunity to see how to get data from the POD:
import {
    getSolidDataset,
    getStringNoLocale,
    getThing,
    getUrl,
} from '@inrupt/solid-client';
import {
    getDefaultSession,
    logout,
    fetch,
} from '@inrupt/solid-client-authn-browser';
import { VCARD } from '@inrupt/vocab-common-rdf';

export const authProvider = {
    //...
    async getIdentity() {
        // Get the solid session, needed to make authenticated requests to the POD
        const session = getDefaultSession();
        // Retrieve the data stored at the user webId using the `fetch` provided
        // by @inrupt/solid-client-authn-browser
        const dataset = await getSolidDataset(session.info.webId, { fetch });
        // Get the "thing" stored in this dataset under the user webId key
        const profile = getThing(dataset, session.info.webId);
        // Note that you have to know the type of the data you want to extract.
        // Here it's a non-localized string.
        // It means there aren't multiple values depending on the locale.
        // That string is used as the fullName.
        // VCARD.fn is a URI to the fullName schema definition
        const fullName = getStringNoLocale(profile, VCARD.fn);
        // Same here but we know it is a URL that should be used as a photo
        const avatar = getUrl(profile, VCARD.hasPhoto);
        return {
            id: session.info.webId,
            fullName,
            avatar,
        };
    },
};
Getting The Data
It's time to introduce the react-admin resources. I started with the products. I added the list, create, edit and show views. I won't include the components here as they are what you expect from a simple react-admin application. You can explore the repository to see the details.
The dataProvider, however, required a bit of work. First, I need to explain how I stored my data in my POD. I chose to store the records in a file named like their resource, such as products.ttl. The ttl extension means that data is written using the Turtle syntax I mentioned earlier. To have something to show in my list, I wrote some data manually in this file using the website provided by Inrupt for my POD. This is what a product looks like:
<#ae4f8fce-248a-481d-b575-50bb76b53565> a <http://schema.org/Product>;
<http://schema.org/identifier> 0;
<http://schema.org/productID> "Cat Nose";
<http://schema.org/description> "Dolorem corrupti et non ipsam nobis officiis est. Voluptatem ab vel nihil. Est aut non autem repellat hic accusantium molestias.";
<http://schema.org/category> <https://gildasgarcia.inrupt.net/private/categories.ttl#064c74dd-d758-4995-9be0-e857ed2fdaa5>;
<http://schema.org/image> <https://marmelab.com/posters/animals-1.jpeg>;
<http://schema.org/height> 32.04;
<http://schema.org/width> 32.93.
The first line defines the subject for the triples I mentioned earlier. Its first item is its identifier, and the last item is its type - in this case, a URI to the Product definition on schema.org.
Thanks to the Turtle syntax, every line that follows a semicolon keeps the same subject, here the product. Each of those lines only has to specify a type, as a URI to a schema definition, and a value.
Besides, I can choose to store those files in either the public or private storage on my POD. I chose the private one here.
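The resulting file and record locations can be sketched as follows. This assumes that the POD storage lives at the same origin as the profile document, which matches the URLs shown in this article but is not guaranteed for every provider. The `datasetUriFor` and `thingUriFor` helpers are hypothetical names introduced for the example.

```javascript
// Sketch only: derive dataset and record URIs from the profile URL.
// Assumption: the storage shares its origin with the profile document.
const datasetUriFor = (profileUrl, resource, storage = 'private') => {
    const baseUrl = new URL(profileUrl);
    return `${baseUrl.origin}/${storage}/${resource}.ttl`;
};

// A record is addressed by its name, used as a fragment of the dataset URI.
const thingUriFor = (datasetUri, name) => `${datasetUri}#${name}`;

const productsUri = datasetUriFor(
    'https://gildasgarcia.inrupt.net/profile/card',
    'products'
);
const productUri = thingUriFor(
    productsUri,
    'ae4f8fce-248a-481d-b575-50bb76b53565'
);
```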
Now, as you may recall from the authProvider, I can get a dataset at a URI and its content with the following code:
import { getSolidDataset, getThingAll } from '@inrupt/solid-client';
import { fetch } from '@inrupt/solid-client-authn-browser';

const getResourceData = async (baseUrl, resource) => {
    const datasetUri = `${baseUrl.origin}/private/${resource}.ttl`;
    const resourceDataset = await getSolidDataset(datasetUri, { fetch });
    const things = getThingAll(resourceDataset);
    return things;
};
And you might think we're done. However, things here is actually an array of special objects that contain quads. A quad is a triple with an additional graph property that I won't cover here. This is the quad for the description property of one product:
{
    "termType": "Quad",
    "subject": {
        "termType": "NamedNode",
        "value": "https://gildasgarcia.inrupt.net/private/products.ttl#00183bca-d50a-4f89-a497-413e3139a476"
    },
    "predicate": {
        "termType": "NamedNode",
        "value": "http://schema.org/description"
    },
    "object": {
        "termType": "Literal",
        "value": "Praesentium iure ad. Omnis atque autem accusantium. Aspernatur et repellat illo laudantium.",
        "language": "",
        "datatype": {
            "termType": "NamedNode",
            "value": "http://www.w3.org/2001/XMLSchema#string"
        }
    },
    "graph": {
        "termType": "DefaultGraph",
        "value": ""
    }
}
In order to use quads in react-admin, I need to write some mapping functions that parse those quads and return plain old JavaScript objects, just like I did for the profile in the authProvider. It means I have to know the type of the data I want to read. For example:
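import {
    asUrl,
    getDecimal,
    getInteger,
    getStringNoLocale,
    getUrl,
} from '@inrupt/solid-client';

// `schema` is assumed to be an object mapping names to schema.org URIs,
// e.g. schema.description is 'http://schema.org/description'
const getProductFromThing = (thing, productDataSetUri) => ({
    id: asUrl(thing, productDataSetUri),
    identifier: getInteger(thing, schema.identifier),
    reference: getStringNoLocale(thing, schema.productID),
    description: getStringNoLocale(thing, schema.description),
    category_id: getUrl(thing, schema.category),
    image: getUrl(thing, schema.image),
    height: getDecimal(thing, schema.height),
    width: getDecimal(thing, schema.width),
});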
A quick note here: As we do have schemas defining our data, it would be possible to write a smart parser relying on the schema to infer the type.
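A simpler variant of that idea is sketched below. Rather than reading the schema, it relies on the datatype that each literal already carries in a quad, as in the object shown earlier. The `literalToJs` and `quadsToRecord` functions are hypothetical, only a handful of XSD datatypes are handled, and field names are naively taken from the last segment of the predicate URI.

```javascript
// Sketch only: infer JavaScript values from the datatype carried by each literal.
const XSD = 'http://www.w3.org/2001/XMLSchema#';

const literalToJs = object => {
    if (object.termType === 'NamedNode') {
        return object.value; // URIs are kept as strings
    }
    switch (object.datatype && object.datatype.value) {
        case `${XSD}integer`:
            return parseInt(object.value, 10);
        case `${XSD}decimal`:
        case `${XSD}double`:
            return parseFloat(object.value);
        case `${XSD}boolean`:
            return object.value === 'true';
        default:
            return object.value; // strings and unknown datatypes
    }
};

// Build a plain record from the quads of a single subject.
// If a predicate appears several times, the last value wins.
const quadsToRecord = quads =>
    quads.reduce((record, quad) => {
        const field = quad.predicate.value.split(/[/#]/).pop();
        return { ...record, [field]: literalToJs(quad.object) };
    }, {});

const literal = (value, type) => ({
    termType: 'Literal',
    value,
    datatype: { termType: 'NamedNode', value: `${XSD}${type}` },
});
const record = quadsToRecord([
    {
        predicate: { value: 'http://schema.org/description' },
        object: literal('Praesentium iure ad.', 'string'),
    },
    {
        predicate: { value: 'http://schema.org/height' },
        object: literal('32.04', 'decimal'),
    },
]);
```

The trade-off is that such a parser only knows about values, not about their meaning: it cannot tell that a category URL is a reference to another record.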
Here's what a basic dataProvider.getList function might look like:
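const dataProvider = {
    async getList(resource, params) {
        // `baseUrl` is assumed to be in scope (a URL object pointing to the POD)
        const datasetUri = `${baseUrl.origin}/private/${resource}.ttl`;
        const things = await getResourceData(baseUrl, resource);
        if (resource === 'products') {
            const products = things.map(thing =>
                getProductFromThing(thing, datasetUri)
            );
            return {
                data: products,
                total: products.length,
            };
        }
    },
};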
If you're familiar with react-admin, you might wonder where the pagination, sorting, and filtering mechanisms are. Here is the bad news: PODs don't provide those mechanisms. So yes, I'm actually returning all the products here, not paginated, sorted, or filtered. It also means that I'm potentially downloading a huge file containing thousands of products at each getList call.
Querying Data With SPARQL
It's time to introduce a new piece of technology: SPARQL. SPARQL is a declarative language for querying linked data in an RDF store. Although it bears some resemblance to SQL, it's not the same beast at all. Here is an example SPARQL query:
SELECT ?description
WHERE {
    ?product a <http://schema.org/Product> .
    ?product <http://schema.org/description> ?description .
}
I know it looks weird. Let's start with the WHERE clause. It's used to define which part of the resource I want to retrieve. Here, I want the description of a product and I will reference it as ?description in my SELECT clause.
In the SELECT clause, I can specify what will be returned by the query and I may reference variables I declared in the WHERE clause.
Here is a more complex example:
PREFIX s: <http://schema.org/>
SELECT *
WHERE {
    ?s s:identifier ?identifier .
    ?s s:productID ?reference .
    ?s s:description ?description .
    ?s s:height ?height .
    ?s s:width ?width .
}
ORDER BY ?reference
OFFSET 25
LIMIT 25
The first line defines a shortcut allowing me to avoid writing http://schema.org/THING every time I want something from this namespace. For example, s:identifier instead of <http://schema.org/identifier>.
The SELECT clause specifies that I want all defined variables in my query results.
I can provide an ORDER BY clause, which is ascending by default, although I could have made it explicit by writing ORDER BY ASC(?reference).
Finally, I'm applying pagination parameters using OFFSET and LIMIT.
There's a lot more to it, like COUNT functions, etc.
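To show how such a query could relate to react-admin's page and sort parameters, here is a minimal sketch that builds the query string. It is hypothetical: the `buildProductQuery` function and the list of sortable fields are assumptions, and the query maps the productID property to the ?reference variable, as in the Turtle sample. The sort field is checked against a fixed list so that arbitrary text never ends up in the query.

```javascript
// Sketch only: turn page and sort parameters into a SPARQL query string.
const SORTABLE_FIELDS = ['identifier', 'reference', 'description'];

const buildProductQuery = ({ page, perPage, field, order }) => {
    if (!SORTABLE_FIELDS.includes(field)) {
        throw new Error(`Unsupported sort field: ${field}`);
    }
    if (
        !Number.isInteger(page) ||
        !Number.isInteger(perPage) ||
        page < 1 ||
        perPage < 1
    ) {
        throw new Error('page and perPage must be positive integers');
    }
    const direction = order === 'DESC' ? 'DESC' : 'ASC';
    // Pages start at 1, so page 2 with 25 items per page skips 25 records
    const offset = (page - 1) * perPage;
    return [
        'PREFIX s: <http://schema.org/>',
        'SELECT *',
        'WHERE {',
        '    ?s s:identifier ?identifier .',
        '    ?s s:productID ?reference .',
        '    ?s s:description ?description .',
        '}',
        `ORDER BY ${direction}(?${field})`,
        `OFFSET ${offset}`,
        `LIMIT ${perPage}`,
    ].join('\n');
};

const query = buildProductQuery({
    page: 2,
    perPage: 25,
    field: 'reference',
    order: 'ASC',
});
```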
However, at the time of writing, Inrupt Solid PODs do not support SPARQL queries. It means that I can only use SPARQL queries on a local dataset. In a future article, I'll explore how to create our own POD server with SPARQL support. In the meantime, I ended up leveraging FakeRest collections to implement a local database populated with the POD data. On the first query, I retrieve the data from the POD, initialize a new Collection, and then use FakeRest features to paginate, sort, and filter data.
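The kind of work delegated to the local collection can be sketched in plain JavaScript. This is not FakeRest's API, just a hypothetical `listLocally` function applying react-admin's getList parameters (pagination, sort, and an exact-match filter) to records already downloaded from the POD.

```javascript
// Sketch only: paginate, sort, and filter records in memory (not FakeRest's API).
const listLocally = (records, { pagination, sort, filter = {} }) => {
    const { page, perPage } = pagination;
    const { field, order } = sort;
    // Keep the records whose fields strictly equal every filter value
    const filtered = records.filter(record =>
        Object.entries(filter).every(([key, expected]) => record[key] === expected)
    );
    // Copy before sorting so the cached records are left untouched
    const sorted = [...filtered].sort((a, b) => {
        if (a[field] === b[field]) {
            return 0;
        }
        const result = a[field] > b[field] ? 1 : -1;
        return order === 'DESC' ? -result : result;
    });
    const start = (page - 1) * perPage;
    return {
        data: sorted.slice(start, start + perPage),
        total: sorted.length,
    };
};

const cachedProducts = [
    { id: 1, reference: 'b', category_id: 'x' },
    { id: 2, reference: 'a', category_id: 'x' },
    { id: 3, reference: 'c', category_id: 'y' },
];
const firstPage = listLocally(cachedProducts, {
    pagination: { page: 1, perPage: 1 },
    sort: { field: 'reference', order: 'ASC' },
    filter: { category_id: 'x' },
});
```

The total is computed after filtering but before slicing, which is what react-admin needs to render its pagination controls.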
Writing Data
Writing data (create, update, delete) is very similar to reading it. Let's start with creation.
Creating Records
To create a new thing in a dataset, I have to call the createThing function provided by @inrupt/solid-client. It accepts an optional name property allowing me to control its final URI:
import { v4 as uuid } from 'uuid';
import { createThing } from '@inrupt/solid-client';

const createRecord = () => {
    const name = uuid();
    const thing = createThing({ name });
};
Now that I have a Thing, I can set its properties by calling functions similar to those I used before to read properties (getStringNoLocale will be addStringNoLocale, etc.). There's a catch, though: the addXXX functions are pure. They don't modify the Thing I pass them but return a new Thing with the property set:
import { v4 as uuid } from 'uuid';
import {
    createThing,
    addStringNoLocale,
    addUrl,
    addDecimal,
} from '@inrupt/solid-client';

// `schema` is assumed to be an object mapping names to schema.org URIs
const createRecord = (resource, data) => {
    const name = uuid();
    let thing = createThing({ name });
    if (resource === 'products') {
        thing = addStringNoLocale(thing, schema.productID, data.reference);
        thing = addStringNoLocale(thing, schema.description, data.description);
        // The category is a reference to another record, so it is stored as a URL
        thing = addUrl(thing, schema.category, data.category_id);
        thing = addUrl(thing, schema.image, data.image);
        thing = addDecimal(thing, schema.height, data.height);
        thing = addDecimal(thing, schema.width, data.width);
    }
    return thing;
};
Calm down, functional programming purists! I know it could be written in a more elegant way. It's just easier to understand for this article.
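For the curious, one possible shape of that more elegant version is sketched below with a reduce. To keep it runnable on its own, it uses a hypothetical `addValue` stand-in that mimics the pure behavior of the library's addXXX functions on plain objects. It does not use the library itself.

```javascript
// Stand-in for the addXXX functions: returns a new object, leaves the input untouched.
const addValue = (thing, predicate, value) => ({
    ...thing,
    [predicate]: [...(thing[predicate] || []), value],
});

// Apply a list of [predicate, value] pairs, threading the result through each call.
const applyFields = (thing, fields) =>
    fields.reduce(
        (current, [predicate, value]) => addValue(current, predicate, value),
        thing
    );

const emptyThing = {};
const product = applyFields(emptyThing, [
    ['http://schema.org/productID', 'Cat Nose'],
    ['http://schema.org/height', 32.04],
    ['http://schema.org/width', 32.93],
]);
```

With the real library, each entry would also have to carry the function matching its type (string, URL, decimal), since there is one addXXX function per type.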
Now that my Thing is ready, I still have to add it to the dataset:
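import {
    asUrl,
    getSolidDataset,
    saveSolidDatasetAt,
    setThing,
} from '@inrupt/solid-client';
import { fetch } from '@inrupt/solid-client-authn-browser';

const dataProvider = {
    // ...
    async create(resource, params) {
        const datasetUri = `${baseUrl.origin}/private/${resource}.ttl`;
        const resourceDataset = await getSolidDataset(datasetUri, { fetch });
        const newThing = createRecord(resource, params.data);
        // Here we add the new thing to the dataset
        const updatedDataset = setThing(resourceDataset, newThing);
        // And here we persist it on the POD
        await saveSolidDatasetAt(datasetUri, updatedDataset, { fetch });
        // react-admin expects the created record, including its id
        return { data: { ...params.data, id: asUrl(newThing, datasetUri) } };
    },
};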
Updating Records
Updating a record is very similar to creating one, except you first have to get the thing and use setXXX functions instead of addXXX ones.
import { setStringNoLocale, setUrl, setDecimal } from '@inrupt/solid-client';

// `schema` is assumed to be an object mapping names to schema.org URIs
const updateRecord = (resource, thing, data) => {
    if (resource !== 'products') {
        return thing;
    }
    // Each setXXX call returns a new Thing, so the result is passed to the next call
    let newThing = setStringNoLocale(thing, schema.productID, data.reference);
    newThing = setStringNoLocale(newThing, schema.description, data.description);
    // The category is a reference to another record, so it is stored as a URL
    newThing = setUrl(newThing, schema.category, data.category_id);
    newThing = setUrl(newThing, schema.image, data.image);
    newThing = setDecimal(newThing, schema.height, data.height);
    newThing = setDecimal(newThing, schema.width, data.width);
    return newThing;
};
const dataProvider = {
    // ...
    async update(resource, params) {
        const datasetUri = `${baseUrl.origin}/private/${resource}.ttl`;
        const resourceDataset = await getSolidDataset(datasetUri, { fetch });
        // params.id is the URL of the thing, as set when reading the records
        const thing = getThing(resourceDataset, params.id);
        const newThing = updateRecord(resource, thing, params.data);
        // Here we replace the thing in the dataset with its updated version
        const updatedDataset = setThing(resourceDataset, newThing);
        await saveSolidDatasetAt(datasetUri, updatedDataset, { fetch });
        // react-admin expects the updated record, including its id
        return { data: { ...params.data, id: params.id } };
    },
};
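The reason updates use setXXX rather than addXXX can be illustrated with two stand-ins working on plain objects. They are hypothetical and only mimic the behavior of the two function families: adding appends a value next to the existing ones, whereas setting replaces them.

```javascript
// Stand-ins that mimic the add and set function families on plain objects.
const addValue = (thing, predicate, value) => ({
    ...thing,
    [predicate]: [...(thing[predicate] || []), value],
});
const setValue = (thing, predicate, value) => ({
    ...thing,
    [predicate]: [value],
});

const DESCRIPTION = 'http://schema.org/description';
const original = { [DESCRIPTION]: ['Old description'] };

// Adding keeps the old value: the record now has two descriptions
const added = addValue(original, DESCRIPTION, 'New description');
// Setting replaces it: the record has a single, updated description
const replaced = setValue(original, DESCRIPTION, 'New description');
```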
Conclusion
I started to write this article at the end of 2020. Every time I came back to this exploration, a lot of things had changed in the Solid specification or libraries. Some libraries were completely replaced, so I had to rewrite most of the code.
Besides, although I haven't looked for another provider, PODs provided by Inrupt do not support the features needed to build an application that won't have to download a complete dataset to work. However, as I explained earlier, I may explore this further by hosting my own Solid Server.
This points to a potential issue, though. Although users may store their data with the provider of their choice, your application may have to access it differently depending on the features supported by that provider.
Finally, you may have noticed there are many new concepts, new languages, and new libraries. And they change fast, sometimes breaking things!
I'm still hyped by the potential of those technologies, the impact they may have on the internet economy, the promise of interoperability, and the liberty they may offer to users. However, it seems we'll still have to wait before we can use them in our applications.