csvw.db

SQLite as alternative storage backend for a TableGroup’s data.

For the most part, translation of a TableGroup’s tableSchema to SQL works as expected:

  • each table is converted to a CREATE TABLE statement

  • each column specifies a column in the corresponding CREATE TABLE statement

  • foreignKey constraints are added according to the corresponding tableSchema property.

List-valued foreignKeys are supported as follows: For each pair of tables related through a list-valued foreign key, an association table is created. To make it possible to distinguish multiple list-valued foreign keys between the same two tables, the association table has a column context, which stores the name of the foreign key column from which a row in the assocation table was created.

Other list-valued columns work in two different ways: If the atomic datatype is string, the specified separator is used to create a concatenated string representation in the database field. Otherwise, the list of values is serialized as JSON.

SQL table and column names can be customized by passing a translator callable when instantiating a Database.

SQLite support has the following limitations:

  • regex constraints on strings (as specified via a csvw.Datatype’s format attribute) are not enforced by the database.

class csvw.db.ColSpec(name, csvw_type='string', separator=None, db_type=None, convert=None, read=None, required=False, csvw=None)[source]

A ColSpec captures sufficient information about a csvw.Column for the DB schema.

check(translate)[source]

We try to convert as many data constraints as possible into SQLite CHECK constraints.

Parameters:

translate (csvw.db.ColumnTranslator) – Callable to translate column names between CSVW metadata and DB schema.

Return type:

typing.Optional[str]

Returns:

A string suitable as argument of an SQL CHECK constraint.

class csvw.db.ColumnTranslator(*args, **kwargs)[source]
class csvw.db.Database(tg, fname=None, translate=None, drop_self_referential_fks=True)[source]

Represents a SQLite database associated with a csvw.TableGroup instance.

Parameters:
  • tg (csvw.metadata.TableGroup) – TableGroup instance defining the schema of the database.

  • fname (typing.Union[str, pathlib.Path, None]) – Path to which to write the database file.

  • translate (typing.Optional[csvw.db.SchemaTranslator]) – Schema object name translator.

  • drop_self_referential_fks (typing.Optional[bool]) – Flag signaling whether to drop or enforce self-referential foreign-key constraints.

Warning

We write rows of a table to the database sequentially. Since CSVW does not require ordering rows in tables such that self-referential foreign-key constraints are satisfied at each row, we don’t enforce self-referential foreign-keys by default in order to not trigger “false” integrity errors. If data in a CSVW Table is known to be ordered appropriately, False should be passed as drop_self_referential_fks keyword parameter to enforce self-referential foreign-keys.

association_table_context(table, column, fkey)[source]

Context for association tables is created calling this method.

Note: If a custom value for the context column is created by overwriting this method, select_many_to_many must be adapted accordingly, to make sure the custom context is retrieved when reading the data from the db.

Parameters:
  • table

  • column

  • fkey

Returns:

a pair (foreign key, context)

static name_translator(table, column=None)[source]

A callable with this signature can be passed into DB creation to control the names of the schema objects.

Parameters:
  • table (str) – CSVW name of the table before translation

  • column (typing.Optional[str]) – CSVW name of a column of table before translation

Return type:

str

Returns:

Translated table name if column is None else translated column name

read()[source]
Return type:

typing.Dict[str, typing.List[typing.OrderedDict]]

Returns:

A dict where keys are SQL table names corresponding to CSVW tables and values are lists of rows, represented as dicts where keys are the SQL column names.

separator(tname, cname)[source]
Return type:

typing.Optional[str]

Returns:

separator for the column specified by db schema names tname and cname.

Parameters:
  • tname (str) –

  • cname (str) –

write(*, force=False, _exists_ok=False, _skip_extra=False, **items)[source]

Creates a db file with the core schema.

Parameters:

force – If True an existing db file will be overwritten.

class csvw.db.SchemaTranslator(*args, **kwargs)[source]
class csvw.db.TableSpec(name, columns=_Nothing.NOTHING, foreign_keys=_Nothing.NOTHING, many_to_many=_Nothing.NOTHING, primary_key=None)[source]

A TableSpec captures sufficient information about a csvw.Table for the DB schema.

Note

We support “light-weight” many-to-many relationships by allowing list-valued foreign key columns in CSVW. In the database these columns are turned into an associative table, adding the name of the column as value a context column. Thus, multiple columns in a table my be specified as targets of many-to-many relations with the same table.

classmethod association_table(atable, apk, btable, bpk)[source]

List-valued foreignKeys are supported as follows: For each pair of tables related through a list-valued foreign key, an association table is created. To make it possible to distinguish multiple list-valued foreign keys between the same two tables, the association table has a column context, which stores the name of the foreign key column from which a row in the assocation table was created.

Return type:

csvw.db.TableSpec

classmethod from_table_metadata(table, drop_self_referential_fks=True)[source]

Create a TableSpec from the schema description of a csvw.metadata.Table.

Parameters:
  • table (csvw.metadata.Table) – csvw.metadata.Table instance.

  • drop_self_referential_fks (typing.Optional[bool]) – Flag signaling whether to drop self-referential foreign keys. This may be necessary, if the order of rows in a CSVW table does not guarantee referential integrity when inserted in order (e.g. an eralier row refering to a later one).

Return type:

csvw.db.TableSpec

Returns:

TableSpec instance.

sql(translate)[source]
Parameters:

translate (csvw.db.SchemaTranslator) –

Return type:

str

Returns:

The SQL statement to create the table.

csvw.db.insert(db, translate, table, keys, *rows, single=False)[source]

Insert a sequence of rows into a table.

Parameters:
  • db (sqlite3.Connection) – Database connection.

  • translate (csvw.db.SchemaTranslator) – Callable translating table and column names to proper schema object names.

  • table (str) – Untranslated table name.

  • keys (typing.Sequence[str]) – Untranslated column names.

  • rows (list) – Sequence of rows to insert.

  • single (typing.Optional[bool]) – Flag signaling whether to insert all rows at once using executemany or one at a time, allowing for more focused debugging output in case of errors.

csvw.db.schema(tg, drop_self_referential_fks=True)[source]

Convert the table and column descriptions of a TableGroup into specifications for the DB schema.

Parameters:
  • tg (csvw.metadata.TableGroup) – CSVW TableGroup.

  • drop_self_referential_fks (typing.Optional[bool]) – Flag signaling whether to drop self-referential foreign keys. This may be necessary, if the order of rows in a CSVW table does not guarantee referential integrity when inserted in order (e.g. an eralier row refering to a later one).

Return type:

typing.List[csvw.db.TableSpec]

Returns:

A pair (tables, reference_tables).