Aquameta Chapter 1: meta - Writable System Catalog for PostgreSQL

Follow @aquameta

In the introduction, we talked about Aquameta's first principles, datafication and visualization. We wield the broadsword of datafication as we charge the many-headed hydra that is unnecessary programming complexity. Once the stack is datafied, it's easy to build visual interfaces against. In this writeup, we describe our first, and in some ways most-challenging foe, datafication of the database itself.

Layer zero of Aquameta is called meta, a writable system catalog for PostgreSQL. It exposes PostgreSQL administration commands through data manipulation commands, enabling schema creation, table creation, column renaming, role dropping, and much more via INSERTs UPDATEs and DELETEs. In other words, it makes the DDL accessible via the DML.

Meta unifies "normal" data with schema, database configuration, everything that is PostgreSQL, into a coherent and synthesized information model. Everything is accessible as data. This adds the needed paths to make PostgreSQL truly homoiconic, which breaks down the wall between schema and data, and opens the doors for all manner of meta-programming.

PostgreSQL already has two system catalogs, INFORMATION_SCHEMA and pg_catalog. Meta is different because:

  • It's writable, and changes to the data take effect in the database
  • It has friendly names for relations and columns
  • It's normalized, the views are sensibly laid out in parallel with PostgreSQL's architecture
  • It has a type system of meta-identifiers for primary keys

Here's a simple example of how to use it. Instead of doing:

aquameta=# create schema test_schema;  
CREATE SCHEMA  

You can do:

aquameta=# insert into meta.schema (name) values ('test_schema');  
INSERT 0 1  

These two commands perform identical operations under the hood; meta just makes them accessible through a different interface.

Here is an ER diagram of meta's schema:

aquameta=# set search_path=meta  
aquameta=# \d  
 Schema |         Name         | Type | Owner
--------+----------------------+------+-------
 meta   | cast                 | view | eric
 meta   | column               | view | eric
 meta   | connection           | view | eric
 meta   | constraint_check     | view | eric
 meta   | constraint_unique    | view | eric
 meta   | extension            | view | eric
 meta   | foreign_column       | view | eric
 meta   | foreign_data_wrapper | view | eric
 meta   | foreign_key          | view | eric
 meta   | foreign_server       | view | eric
 meta   | foreign_table        | view | eric
 meta   | function             | view | eric
 meta   | operator             | view | eric
 meta   | relation             | view | eric
 meta   | role                 | view | eric
 meta   | schema               | view | eric
 meta   | sequence             | view | eric
 meta   | table                | view | eric
 meta   | trigger              | view | eric
 meta   | type                 | view | eric
 meta   | view                 | view | eric
(21 rows)

Each relation in meta is a VIEW that queries INFORMATION_SCHEMA, pg_catalog, or wherever else we had to dig to get the data. These views support INSERT, UPDATE and DELETE statements via TRIGGERs that translate the operation into a native PostgreSQL command.

We have a good start on PostgreSQL feature coverage. You can do most common operations through meta instead. We don't have 100% feature coverage yet, but that is the goal.

On the surface, these views expose a clean, consistent, writable interface for PostgreSQL administration via data manipulation.

Examples

Here are a few examples of how you might use meta:

/* drop all the schemas in the database.  Highly destructive! */
delete from meta.schema;  
/* create a table with no columns called `foo` in the `public` schema */
insert into meta.table (name, schema_name) values ('foo', 'public');  
/* rename all columns named `id` to `foo` */
update meta.column set name='foo' where name='id';  
/* list all columns in the beehive schema */
select r.name, c.name, c.type_name  
    from meta.column c
    join meta.relation r on c.relation_id = r.id 
    join meta.schema s on r.schema_id = s.id
    where s.name = 'beehive'
    order by r.name, c.position;
    relation_name     |            name             |      type_name
----------------------+-----------------------------+----------------------
 brands               | name                        | pg_catalog.text
 brands_brand         | id                          | pg_catalog.int4
 brands_brand         | name                        | pg_catalog.text
 brands_brand         | show_on_website             | pg_catalog.bool
 brands_brandcategory | id                          | pg_catalog.int4
 brands_brandcategory | name                        | pg_catalog.text
 brands_brandcategory | brand_id                    | pg_catalog.int4
 brands_brandgroup    | id                          | pg_catalog.int4
 brands_brandgroup    | brand_id                    | pg_catalog.int4
 brands_brandgroup    | name                        | pg_catalog.text
 ...
 (462 rows)
/* list of all the relations in the `beehive` schema */
select name, type from meta.relation where schema_name='beehive';  
                  name                  |    type    
----------------------------------------+------------
 vendor_paymentterm                     | BASE TABLE
 product_margin                         | BASE TABLE
 favorites                              | BASE TABLE
 countries_usstate                      | BASE TABLE
 product_cost                           | VIEW
 warehouse_pieces_piecebreakdown        | BASE TABLE
 ...
(144 rows)

Anatomy of a meta View

Let's take a look at one of these views in detail, meta.column:

aquameta=# \d meta.column  
                       View "meta.column"
    Column     |                Type                | Modifiers
---------------+------------------------------------+-----------
 id            | meta.column_id                     |
 relation_id   | meta.relation_id                   |
 schema_name   | information_schema.sql_identifier  |
 relation_name | information_schema.sql_identifier  |
 name          | information_schema.sql_identifier  |
 position      | information_schema.cardinal_number |
 type_name     | text                               |
 type_id       | meta.type_id                       |
 nullable      | boolean                            |
 default       | information_schema.character_data  |
 primary_key   | boolean                            |
Triggers:  
    meta_column_delete_trigger INSTEAD OF DELETE ON meta."column" FOR EACH ROW EXECUTE PROCEDURE meta.column_delete()
    meta_column_insert_trigger INSTEAD OF INSERT ON meta."column" FOR EACH ROW EXECUTE PROCEDURE meta.column_insert()
    meta_column_update_trigger INSTEAD OF UPDATE ON meta."column" FOR EACH ROW EXECUTE PROCEDURE meta.column_update()
  • The id field is a kind of "soft" primary key. It's of a special type, meta.column_id, which is one of the "meta-identifiers" in the system that uniquely identifies a column with a single value. More about meta-identifiers later.
  • The relation_id field is another meta-identifier, a kind of "soft foreign key" to the meta.relation view, which contains a row for the table or view that this column is a member of.
  • Then comes the human identifiers, schema_name, relation_name and name. These are what they sound like. When INSERTing into this view, you need to specify either the human identifiers, or the meta-identifiers above.
  • The position field tells where this column is in relation to the other columns. It is not currently updatable, as PostgreSQL does not support column reordering.
  • The type_name and type_id fields reference the data type of this column. type_id is another meta-relation, this one foreign-keying to the meta.type relation. You can UPDATE the type field either by updating type_name or type_id, and if PostgreSQL can cast from the original datatype to the new one, it will update the column's type. Otherwise the UPDATE will fail without changing anything.
  • The nullable field is a boolean that determines whether the column is nullable. It behaves as you would expect.
  • The default field contains the column's default value, represented as text. You can update this as well.
  • The primary_key boolean determines whether or not this key is a primary key. Aquameta assumes a single primary key on all tables.
  • Finally, the TRIGGERs listed handle INSERT, UPDATE and DELETE, passing off the operation to the functions meta.column_delete(), meta.column_insert() and meta.column_update().

All together, this view is a general purpose column administration interface. The rest of the meta views behave similarly.

The meta-identifier TYPE System

Besides just the views, meta also contains a system of meta-identifiers, a collection of PostgreSQL composite types that encapsulate the unique identifier for a PostgreSQL component as a single value. You can think of them as the primary keys of the meta views.

PostgreSQL EntityPostgreSQL identifier(s)meta-identifier
schemanameschema_id
castnamecast_id
relationschema_name, namerelation_id
functionschema_name, namefunction_id
typeschema_name, nametype_id
operatorschema_name, nameoperator_id
sequenceschema_name, namesequence_id
functionschema_name, name, parametersfunction_id
triggerschema_name, relation_name, nametrigger_id
foreign_keyschema_name, relation_name, nameforeign_key_id
columnschema_name, relation_name, namecolumn_id
constraintschema_name, relation_name, nameconstraint_id
rowschema_name, relation_name, pk_name, pk_valuerow_id
fieldschema_name, relation_name, pk_name, pk_value, column_namefield_id

When querying against the meta relations, instead of using the human names as identifiers, you can also use the meta-identifiers:

/* select the beehive.customers_customer.name column */
select * from meta.column where id=meta.column_id('beehive','customers_customer','name');  

Meta-identifiers can be cast to other, less-specific identifiers. For example, to get the schema_id that a column_id contains, you can do:

select meta.column_id('beehive','customers_customer','name')::meta.schema_id;  

You can also use the meta-identifiers to do meta-programming in your own tables. For example, if you want to make a table that references a PostgreSQL view, it would look like this:

create table column_widget (  
    id serial primary key,
    name text,
    column_id meta.column_id
);

Benefits

An all-data interface to PostgreSQL has a lot of benefits:

  • Consistency: Operations have a high degree of simplicity and consistency, namely that they all take the form of an INSERT, UPDATE or DELETE instead of PostgreSQL's extensive DDL grammar.
  • Programming Simplicity: From a developer's perspective, writing, say, a PostgreSQL administration GUI against meta instead of the DDL means that the app is "just another CRUD app".
  • Batch Operations: Since UPDATEs and DELETEs can affect multiple rows with a single query, you can easily do batch operations with a single statement, like deleting all roles that match a WHERE clause, or renaming columns en masse.
  • Meta-Programming: Developers can make "normal" tables that foreign-key to one of meta's views, for example a reporting app that has a foreign key to the VIEWs behind the reports.

But really, we don't think we've fully wrapped our head around everything you can do with meta. We're excited to see how people use it.

Conclusion

Meta is the foundational layer for the rest of Aquameta. It allows us to build a programming environment where under the hood, programming is always some form of data manipulation.

In the next chapter, we'll cover Aquameta Layer 1, bundle, a data version control system.

Comments

comments powered by Disqus