Foreign Data Wrapper: Postgresify all the things!
Did you know that we can query any data source directly with PostgreSQL? With the Foreign Data Wrapper (FDW), we can create a virtual table that represents an external data source, such as a REST API, and query it using SQL. This lets us integrate external data sources seamlessly into our PostgreSQL database.
Querying a CSV File

Let’s start with a common use case. Suppose we have a CSV file that we need to import into our database. Instead of writing a custom script to parse the CSV file and insert the data, we can use the FDW to create a foreign table that maps directly to the CSV file.
If the CSV file is already properly formatted, the COPY command is the most efficient way to import it. But if we need to transform the data before importing, or only want to import a subset of the data, we can create a foreign table that maps to just the relevant columns and rows.
How does this work? We can use the file_fdw extension that comes with PostgreSQL to create a foreign table that maps to the CSV file. Here is an example:
CREATE EXTENSION IF NOT EXISTS file_fdw;CREATE SERVER csv_server FOREIGN DATA WRAPPER file_fdw;CREATE FOREIGN TABLE csv_table ( id integer, name text, age integer) SERVER csv_server OPTIONS ( filename '/path/to/your/file.csv', format 'csv', header 'true');Once the foreign table is created, we can query it just like any other table in our database:
SELECT * FROM csv_table;We can even join it with other tables in our database:
SELECT * FROM csv_tableJOIN other_table ON csv_table.id = other_table.id;Joins can be very inefficient if the CSV file is large, so use them with caution.
We can also insert data from the CSV file into another table in our database:
INSERT INTO my_table (id, name, age)SELECT id, name, age FROM csv_table;The file_fdw extension is read-only. We cannot use it to insert, update, or delete data in the CSV file.
Querying Another PostgreSQL Database

We can also use the FDW to query another PostgreSQL database. This is useful if we need to migrate data from one database to another, or if we want to query data from multiple databases in a single query.
To do this, we use the postgres_fdw extension that comes built in with PostgreSQL.
Here is an example of how to create a foreign table that maps to another PostgreSQL database:
-- First enable the postgres_fdw extensionCREATE EXTENSION IF NOT EXISTS postgres_fdw;-- Then create a server that represents the foreign databaseCREATE SERVER foreign_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS ( host 'foreign_host', dbname 'foreign_db', port '5432');-- Then create a user mapping that maps the current user to a user in the-- foreign databaseCREATE USER MAPPING FOR current_user SERVER foreign_server OPTIONS ( user 'foreign_user', password 'foreign_password');-- Finally create a foreign table that maps to a table in the foreign-- databaseCREATE FOREIGN TABLE foreign_table ( id integer, name text, age integer) SERVER foreign_server OPTIONS ( schema_name 'public', table_name 'foreign_table');Once the foreign table is created, we can query it just like any other table in our database:
SELECT * FROM foreign_table;We can also join it with other tables in our database:
SELECT * FROM foreign_tableJOIN other_table ON foreign_table.id = other_table.id;We can even update data in the foreign table:
UPDATE foreign_tableSET name = 'New Name'WHERE id = 1;Querying A NoSQL Database

What if the other database is not PostgreSQL? Maybe it’s not even a relational database, but a NoSQL database like MongoDB. No problem! There are FDW extensions for many different types of databases.
We can also create our own custom FDW extension using Multicorn, a Python library that lets us build FDW extensions for any data source.
To see how this works in practice, let’s create a custom FDW extension that maps to a MongoDB collection.
First, define a Python class that inherits from multicorn’s ForeignDataWrapper and implements the necessary methods to fetch data from the data source:
from multicorn import ForeignDataWrapperfrom multicorn.utils import log_to_postgresfrom pymongo import MongoClient
class MongoForeignDataWrapper(ForeignDataWrapper): # Initialize the FDW with mongo connection information def __init__(self, options, columns): super(MongoForeignDataWrapper, self).__init__(options, columns) self.dbUri = options.get('dbUri') self.columns = columns
# Helper method to create MongoDB client def _get_client(self): """Create MongoDB client with optional authentication""" return MongoClient(self.dbUri)
# execute is called by PostgreSQL to SELECT data from the foreign # data source. It receives the qualifiers (WHERE clauses) and the # columns to fetch, and it should return an iterable of rows that # match the query. def execute(self, qualifiers, columns): """Execute query against MongoDB collection""" # Build MongoDB query filter from qualifiers query_filter = {}
for qual in qualifiers: field = qual.field_name operator = qual.operator value = qual.value
# Map SQL operators to MongoDB query operators if operator == '=': # Equality query_filter[field] = value elif operator == '<': query_filter[field] = {'$lt': value} elif operator == '>': query_filter[field] = {'$gt': value} elif operator == '<=': query_filter[field] = {'$lte': value} elif operator == '>=': query_filter[field] = {'$gte': value} elif operator == '<>': query_filter[field] = {'$ne': value} elif operator == '~~': # LIKE # Convert SQL LIKE pattern to regex pattern = value.replace('%', '.*').replace('_', '.') query_filter[field] = { '$regex': f'^{pattern}$', '$options': 'i' }
# Build projection (only fetch requested columns) projection = {col: 1 for col in columns}
try: client = self._get_client() db = client.database
# Execute query cursor = db.collection.find(query_filter, projection)
for doc in cursor: row = {} for column_name in columns: if column_name == '_id': # Convert MongoDB ObjectId to string for # PostgreSQL row[column_name] = str(doc.get('_id', '')) else: row[column_name] = doc.get(column_name) yield row
client.close()
except Exception as e: log_to_postgres( f"MongoDB query failed: {e}", level="ERROR" )
@property def rowid_column(self): """Return the column name used as row identifier""" return '_id'Once the class is defined, you can create a foreign table that uses the custom FDW extension:
CREATE EXTENSION IF NOT EXISTS multicorn;CREATE SERVER mongo_server FOREIGN DATA WRAPPER multicorn OPTIONS ( -- Specify the Python class that implements the FDW wrapper 'mongo_fdw.MongoForeignDataWrapper');CREATE USER MAPPING FOR current_user SERVER mongo_server OPTIONS ( username 'mongo_user', password 'mongo_password');CREATE FOREIGN TABLE mongo_table ( id text, name text, age integer) SERVER mongo_server OPTIONS ( host 'mongo_host', port '27017', database 'mongo_db', collection 'mongo_collection', username 'mongo_user', password 'mongo_password');Here I only defined the execute method to query data from the MongoDB collection, but we can also implement mutation methods like insert, update, and delete to allow modifying data in the MongoDB collection directly from PostgreSQL.
It’s usually not necessary to build a Foreign data wrapper for popular databases like MongoDB as they already exist. Here are a few:
See the full list of available FDW extensions for more details.
Querying a REST API

Not too long ago, at Marmelab, we had a mission to integrate a third-party API into an application. We needed to fetch data from the API and store it in our database for further processing.
At the time, we wrote a custom cron job that periodically fetched the data from the API and inserted it into our database. It worked, but it was a bit of a hack and required extra maintenance.
With FDW, we could create a foreign table that mapped directly to the API endpoint. We could then query it from our application without writing extra code to fetch the data.
To create a foreign table that maps to a REST API, we can use the same Python library as for the MongoDB example, multicorn. Here is an example of how to create a foreign table that maps to the magic the gathering public REST API.
First, create the extension using multicorn
from multicorn import ForeignDataWrapperfrom multicorn.utils import log_to_postgresimport requests
class MtgCardsForeignDataWrapper(ForeignDataWrapper): """FDW for MTG Cards endpoint"""
def __init__(self, options, columns): super(MtgCardsForeignDataWrapper, self) .__init__(options, columns) self.base_url = options.get( 'base_url', 'https://api.magicthegathering.io/v1' ) self.endpoint = options.get('endpoint', '/cards') self.columns = columns
def execute(self, qualifiers, columns): # Build query parameters from WHERE clauses params = {}
for qual in qualifiers: field = qual.field_name operator = qual.operator value = qual.value
# Only handle equality for API parameters if operator == '=': if field in [ 'name', 'type', 'colors', 'rarity', 'set', 'cmc', 'power', 'toughness' ]: params[field] = value # Map `set_code` to `set` API parameter # `set` is a reserved keyword in PostgreSQL, # so we use `set_code` as the column name elif field == 'set_code': params['set'] = value
try: # Make API request url = f"{self.base_url}{self.endpoint}" response = requests.get(url, params=params) response.raise_for_status() data = response.json()
# Yield rows from the cards array # And map API fields to the columns name for card in data.get('cards', []): row = {} for column_name in columns: if column_name == 'mana_cost': row[column_name] = card.get('manaCost') elif column_name == 'set_name': row[column_name] = card.get('setName') elif column_name == 'set_code': row[column_name] = card.get('set') elif column_name == 'image_url': row[column_name] = card.get('imageUrl') elif column_name == 'color_identity': row[column_name] = card.get('colorIdentity') else: row[column_name] = card.get(column_name) yield row
except requests.RequestException as e: log_to_postgres( f"MTG Cards API request failed: {e}", level="ERROR" )And then create the foreign table using the custom FDW extension:
-- Enable the multicorn extensionCREATE EXTENSION multicorn;
-- Define the server for Cards API endpointCREATE SERVER mtg_cards_api FOREIGN DATA WRAPPER multicornOPTIONS ( wrapper 'mtg_fdw.MtgCardsForeignDataWrapper', base_url 'https://api.magicthegathering.io/v1', endpoint '/cards');
-- Foreign table for cardsCREATE FOREIGN TABLE mtg_cards ( id text, name text, mana_cost text, cmc numeric, colors text[], color_identity text[], type text, rarity text, set_name text, set_code text, text text, flavor text, artist text, power text, toughness text, image_url text) SERVER mtg_cards_api;The Magic REST API does not need authentication, but if we need to query an API that does, we can implement the logic to include authentication headers or tokens in the API requests.
See the multicorn documentation for more details on how to create custom FDW extensions.
There are also foreign data wrappers for some well-known APIs, including:
See the full list of available FDW extensions for more details. And of course, as we just showed, we can also create our own custom FDW extension for any API that we need to query from PostgreSQL.
Conclusion
The Foreign Data Wrapper is a powerful PostgreSQL feature that lets us query any external data source directly from SQL. Whether using existing FDW extensions or creating custom ones with Multicorn, it seamlessly integrates external data into our applications.
Next time you need to connect an external data source, consider leveraging Foreign Data Wrappers to query it directly from PostgreSQL!
Authors
Full-stack web developer at marmelab, loves functional programming and JavaScript.