CSV on the Web with csvw

Overview

Exploring CSVW described data

>>> from csvw import TableGroup
>>> tg = TableGroup.from_url(
...     'https://raw.githubusercontent.com/cldf/csvw/master/tests/fixtures/csv.txt-metadata.json')
>>> len(tg.tables)
1
>>> len(tg.tables[0].tableSchema.columns)
2
>>> tg.tables[0].tableSchema.columns[0].datatype.base
'string'
>>> tg.tables[0].tableSchema.columns[0].name
'ID'
>>> list(tg.tables[0])[0]['ID']
'first'

Creating CSVW described data

>>> from csvw import Table, Column
>>> t = Table(url='data.csv')
>>> t.tableSchema.columns.append(Column.fromvalue(dict(name='ID', datatype='integer')))
>>> t.write([dict(ID=1), dict(ID=2)])
2
>>> t.to_file('data.csv-metadata.json')
PosixPath('data.csv-metadata.json')
>>> list(Table.from_file('data.csv-metadata.json').iterdicts())
[OrderedDict([('ID', 1)]), OrderedDict([('ID', 2)])]

Where’s the “on the Web” part?

While it’s nice that we can fetch data descriptions from URLs with TableGroup and Table, CSVW also includes a specification for locating metadata. This is meant to support use cases where you find the data first - and might be missing out on the description without such mechanisms. This part of the spec is implemented in csvw.CSVW.

>>> from csvw import CSVW
>>> data = CSVW('https://raw.githubusercontent.com/cldf/csvw/master/tests/fixtures/csv.txt')
>>> data.t
TableGroup(...)
class csvw.CSVW(url, md_url=None, validate=False)[source]

Python API to read CSVW described data and convert it to JSON.

Parameters:
  • url (str) –

  • md_url (typing.Optional[str]) –

  • validate (bool) –

property is_valid: bool

Performs CSVW validation.

Note

For this to also catch problems during metadata location, the CSVW instance must be initialized with validate=True.

static locate_metadata(url=None)[source]

Implements metadata discovery as specified in §5. Locating Metadata

Return type:

typing.Tuple[dict, bool]
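
The default metadata locations named in the CSVW specification can be sketched with the standard library alone. The helper name default_metadata_candidates and the example.org URL are hypothetical. The sketch covers only the two default location templates ({+url}-metadata.json and csv-metadata.json), performs no requests, ignores Link headers and site-wide configuration, and is not meant to mirror how csvw implements locate_metadata().

```python
from urllib.parse import urldefrag, urljoin


def default_metadata_candidates(csv_url):
    """Return the default metadata URLs for a CSV file, in lookup order.

    Hypothetical helper: it only expands the two default templates from the
    CSVW specification and performs no network requests.
    """
    # The templates are expanded with the CSV URL minus any fragment.
    url, _fragment = urldefrag(csv_url)
    return [
        url + '-metadata.json',             # {+url}-metadata.json
        urljoin(url, 'csv-metadata.json'),  # csv-metadata.json in the same directory
    ]


candidates = default_metadata_candidates('https://example.org/data/csv.txt#row=2')
for candidate in candidates:
    print(candidate)
```

The first candidate follows the same naming pattern as the csv.txt-metadata.json fixture used in the examples on this page.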

to_json(minimal=False)[source]

Implements algorithm described in https://w3c.github.io/csvw/csv2json/#standard-mode

Top-level descriptions

The csvw package provides a Python API to read and write CSVW described data. The main building blocks of this API are Python classes representing the top-level objects of a CSVW description.

class csvw.Table(common_props=_Nothing.NOTHING, at_props=_Nothing.NOTHING, parent=None, aboutUrl=None, datatype=None, default='', lang='und', null=_Nothing.NOTHING, ordered=None, propertyUrl=None, required=None, separator=None, textDirection=None, valueUrl=None, dialect=None, notes=_Nothing.NOTHING, tableDirection='auto', tableSchema=None, transformations=_Nothing.NOTHING, url=None, fname=None, suppressOutput=False)[source]

A table description is an object that describes a table within a CSV file.

Table objects provide access to schema manipulation either by manipulating the tableSchema property directly or via higher-level methods like add_foreign_key().

Table objects also mediate read/write access to the actual data through iterdicts() and write().

add_foreign_key(colref, ref_resource, ref_colref)[source]

Add a foreign key constraint to tableSchema.foreignKeys.

Parameters:
  • colref – Column reference for the foreign key.

  • ref_resource – Referenced table.

  • ref_colref – Column reference of the key in the referenced table.
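
In CSVW JSON metadata, a foreign key is an object with a columnReference and a reference that names the target resource and its columnReference. The sketch below builds such an object by hand to show what the three parameters correspond to. The foreign_key helper and the table and column names are hypothetical, and the sketch does not show how csvw represents the constraint internally.

```python
import json


def foreign_key(colref, ref_resource, ref_colref):
    """Build a CSVW foreign key definition as a plain dict (illustration only)."""
    return {
        'columnReference': colref,
        'reference': {
            'resource': ref_resource,
            'columnReference': ref_colref,
        },
    }


# Hypothetical schema fragment: rows of books.csv point at rows of authors.csv.
schema = {'columns': [{'name': 'ID'}, {'name': 'author_id'}], 'foreignKeys': []}
schema['foreignKeys'].append(foreign_key('author_id', 'authors.csv', 'ID'))
print(json.dumps(schema['foreignKeys'], indent=2))
```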

iterdicts(log=None, with_metadata=False, fname=None, _Row=<class 'collections.OrderedDict'>, strict=True)[source]

Iterate over the rows of the table

Create an iterator that maps the information in each row to a dict whose keys are the column names of the table and whose values are the values in the corresponding table cells, or for virtual columns (which have no values) the valueUrl for that column. This includes columns not specified in the table specification.

Note: If the resolved data filename does not exist - but there is a zip file of the form fname+’.zip’, we try to read the data from this file after unzipping.

Parameters:
  • log – Logger object (default None) that reports parsing errors. If none is given, parsing errors raise ValueError instead.

  • with_metadata (bool) – (default False) Also yield fname and lineno.

  • fname – file-like, pathlib.Path, or str (default None). The file to be read. Defaults to inheriting from a parent object, if one exists.

  • strict – Flag signaling whether data is read strictly, i.e. raising ValueError when invalid data is encountered, or not, i.e. only issuing a warning and returning invalid data as str as provided by the underlying DSV reader.

Return type:

typing.Generator[dict, None, None]

Returns:

A generator of dicts or triples (fname, lineno, dict) if with_metadata
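
The difference between strict and non-strict reading can be illustrated with a standard-library stand-in. This is a sketch of the documented behaviour only, not csvw's implementation: iter_typed_rows and its converters argument are hypothetical, and the sample data is made up.

```python
import csv
import io
import warnings


def iter_typed_rows(text, converters, strict=True):
    """Yield one dict per CSV row, converting cells with ``converters``.

    Illustration only: strict mode raises ValueError on invalid cells,
    non-strict mode warns and keeps the raw string.
    """
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        parsed = {}
        for name, raw in row.items():
            try:
                parsed[name] = converters.get(name, str)(raw)
            except ValueError:
                message = f'line {reader.line_num}: invalid {name} value {raw!r}'
                if strict:
                    raise ValueError(message) from None
                warnings.warn(message)
                parsed[name] = raw  # fall back to the unparsed string
        yield parsed


data = 'ID,name\n1,first\nx,second\n'

# Non-strict: the invalid ID is reported as a warning and returned as str.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    lenient_rows = list(iter_typed_rows(data, {'ID': int}, strict=False))

# Strict: the same input raises ValueError.
try:
    list(iter_typed_rows(data, {'ID': int}, strict=True))
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
```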

write(items, fname=<object object>, base=None, strict=False, _zipped=False)[source]

Write row items to a CSV file according to the table schema.

Parameters:
  • items (typing.Iterable[typing.Union[dict, list, tuple]]) – Iterator of dict storing the data per row.

  • fname (typing.Union[str, pathlib.Path, None]) – Name of the file to which to write the data.

  • base (typing.Union[str, pathlib.Path, None]) – Base directory relative to which to interpret table urls.

  • strict (typing.Optional[bool]) – Flag signaling to use strict mode when writing. This will raise ValueError if any row (dict) passed in items contains unspecified fieldnames.

  • _zipped (typing.Optional[bool]) – Flag signaling whether the resulting data file should be zipped.

Return type:

typing.Union[str, int]

Returns:

The CSV content if fname==None else the number of rows written.

class csvw.TableGroup(common_props=_Nothing.NOTHING, at_props=_Nothing.NOTHING, parent=None, aboutUrl=None, datatype=None, default='', lang='und', null=_Nothing.NOTHING, ordered=None, propertyUrl=None, required=None, separator=None, textDirection=None, valueUrl=None, dialect=None, notes=_Nothing.NOTHING, tableDirection='auto', tableSchema=None, transformations=_Nothing.NOTHING, url=None, fname=None, tables=_Nothing.NOTHING)[source]

A table group description is an object that describes a group of tables.

A TableGroup delegates most of its responsibilities to the Table objects listed in its tables property. For convenience, TableGroup provides methods to

  • read data from all tables: TableGroup.read()

  • write data for all tables: TableGroup.write()

It also provides a method to check the referential integrity of data in related tables via TableGroup.check_referential_integrity().

check_referential_integrity(data=None, log=None, strict=False)[source]

Strict validation does not allow for nullable foreign key columns.
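
One plausible reading of this rule, sketched with plain dicts: a null foreign key is tolerated in the default mode but reported in strict mode, while a value missing from the referenced table is reported in both. The function dangling_references and the sample rows are hypothetical; this illustrates the idea and is not csvw's implementation.

```python
def dangling_references(rows, colref, ref_rows, ref_colref, strict=False):
    """Return (row number, message) pairs for foreign key problems.

    Illustration only: rows are dicts, with None standing for a null cell.
    """
    known = {row[ref_colref] for row in ref_rows}
    problems = []
    for number, row in enumerate(rows, start=1):
        value = row.get(colref)
        if value is None:
            # Null keys are tolerated unless validation is strict.
            if strict:
                problems.append((number, 'null foreign key'))
        elif value not in known:
            problems.append((number, f'no {ref_colref}={value!r} in referenced table'))
    return problems


authors = [{'ID': 1}, {'ID': 2}]
books = [
    {'ID': 'a', 'author_id': 1},
    {'ID': 'b', 'author_id': None},
    {'ID': 'c', 'author_id': 9},
]

default_problems = dangling_references(books, 'author_id', authors, 'ID')
strict_problems = dangling_references(books, 'author_id', authors, 'ID', strict=True)
```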

copy(dest)[source]

Write a TableGroup’s data and metadata to files relative to dest, adapting the base attribute.

Parameters:

dest (typing.Union[pathlib.Path, str]) –

read()[source]

Read all data of a TableGroup

write(fname, strict=False, _zipped=False, **items)[source]

Write a TableGroup’s data and metadata to files.

Parameters:
  • fname (typing.Union[str, pathlib.Path]) – Filename for the metadata file.

  • items (typing.Iterable[typing.Union[dict, list, tuple]]) – Keyword arguments are used to pass iterables of rows per table, where the table URL is specified as keyword.

  • strict (typing.Optional[bool]) –

  • _zipped (typing.Optional[bool]) –
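
Because table URLs such as data.csv are usually not valid Python identifiers, rows keyed by table URL are most easily passed by unpacking a dict. The sketch below shows only the calling convention: write_group is a hypothetical stand-in with the same parameter layout as the signature above, not the real method, and the file names are made up.

```python
def write_group(fname, strict=False, **items):
    """Hypothetical stand-in: count the rows passed for each table URL."""
    return {url: len(list(rows)) for url, rows in items.items()}


rows_per_table = {
    'authors.csv': [{'ID': 1}, {'ID': 2}],
    'books.csv': [{'ID': 'a', 'author_id': 1}],
}

# 'authors.csv' cannot be written as a keyword in the call itself,
# so the dict is unpacked with ** instead.
counts = write_group('metadata.json', **rows_per_table)
print(counts)
```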

Reading and writing top-level descriptions

Both types of objects are recognized as top-level descriptions, i.e. may be encountered as JSON objects on the web or on disk. Thus, they share some factory and serialization methods inherited from a base class:

class csvw.metadata.TableLike(common_props=_Nothing.NOTHING, at_props=_Nothing.NOTHING, parent=None, aboutUrl=None, datatype=None, default='', lang='und', null=_Nothing.NOTHING, ordered=None, propertyUrl=None, required=None, separator=None, textDirection=None, valueUrl=None, dialect=None, notes=_Nothing.NOTHING, tableDirection='auto', tableSchema=None, transformations=_Nothing.NOTHING, url=None, fname=None)[source]

A CSVW description object as encountered “in the wild”, i.e. identified by URL on the web or as file on disk.

Since TableLike objects may be instantiated from “externally referenced” objects (via file paths or URLs), they have the necessary means to resolve link properties

>>> from csvw import Table, TableGroup, Link
>>> t = Table.from_file('tests/fixtures/csv.txt-table-metadata.json')
>>> Link('abc.txt').resolve(t.base)
PosixPath('tests/fixtures/abc.txt')
>>> tg = TableGroup.from_url(
...     'https://raw.githubusercontent.com/cldf/csvw/master/tests/fixtures/'
...     'csv.txt-metadata.json')
>>> str(tg.tables[0].url)
'csv.txt'
>>> tg.tables[0].url.resolve(tg.base)
'https://raw.githubusercontent.com/cldf/csvw/master/tests/fixtures/csv.txt'

and URI template properties (see expand()).

property base: str | Path

The “base” to resolve relative links against.

expand(tmpl, row, _row, _name=None, qname=False, uri=False)[source]

Expand a URITemplate using row, _row and _name as context and resolving the result against TableLike.url.

>>> from csvw import URITemplate, TableGroup
>>> tg = TableGroup.from_url(
...     'https://raw.githubusercontent.com/cldf/csvw/master/tests/fixtures/'
...     'csv.txt-metadata.json')
>>> tg.expand(URITemplate('/path?{a}{#b}'), dict(a='1', b='2'), None)
'https://raw.githubusercontent.com/path?1#2'
Parameters:
  • tmpl (csvw.metadata.URITemplate) –

  • row (dict) –

Return type:

str

classmethod from_file(fname, data=None)[source]

Instantiate a CSVW Table or TableGroup description from a metadata file.

Parameters:

fname (typing.Union[str, pathlib.Path]) –

Return type:

csvw.metadata.TableLike

classmethod from_url(url, data=None)[source]

Instantiate a CSVW Table or TableGroup description from a metadata file specified by URL.

Parameters:

url (str) –

Return type:

csvw.metadata.TableLike

to_file(fname, omit_defaults=True)[source]

Write a CSVW Table or TableGroup description as JSON object to a local file.

Parameters:
  • omit_defaults – The CSVW spec specifies defaults for most properties of most description objects. If omit_defaults==True, these properties will be pruned from the JSON object.

  • fname (typing.Union[str, pathlib.Path]) –

Return type:

pathlib.Path
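
The effect of omit_defaults can be pictured as dropping every property whose value equals its default. The sketch below does this for a flat dict, using a few defaults taken from the constructor signatures on this page. The function prune_defaults and the sample description are hypothetical, and the real serialization also handles nested description objects, which this sketch does not.

```python
import json

# A few defaults as they appear in the constructor signatures on this page.
DEFAULTS = {
    'default': '',
    'lang': 'und',
    'tableDirection': 'auto',
    'suppressOutput': False,
}


def prune_defaults(description):
    """Drop top-level properties whose value equals the default (illustration only)."""
    return {
        key: value
        for key, value in description.items()
        if not (key in DEFAULTS and DEFAULTS[key] == value)
    }


description = {
    'url': 'data.csv',
    'lang': 'und',
    'tableDirection': 'rtl',
    'suppressOutput': False,
}
pruned = prune_defaults(description)
print(json.dumps(pruned))
```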

Table schema descriptions

A table’s schema is described using a hierarchy of description objects:

class csvw.metadata.Schema(common_props=_Nothing.NOTHING, at_props=_Nothing.NOTHING, parent=None, aboutUrl=None, datatype=None, default='', lang='und', null=_Nothing.NOTHING, ordered=None, propertyUrl=None, required=None, separator=None, textDirection=None, valueUrl=None, columns=_Nothing.NOTHING, foreignKeys=_Nothing.NOTHING, primaryKey=None, rowTitles=_Nothing.NOTHING)[source]

A schema description is an object that encodes the information about a schema, which describes the structure of a table.

Variables:
  • columns – list of Column descriptions.

  • foreignKeys – list of ForeignKey descriptions.

class csvw.Column(common_props=_Nothing.NOTHING, at_props=_Nothing.NOTHING, parent=None, aboutUrl=None, datatype=None, default='', lang='und', null=_Nothing.NOTHING, ordered=None, propertyUrl=None, required=None, separator=None, textDirection=None, valueUrl=None, name=None, suppressOutput=False, titles=None, virtual=False, number=None)[source]

A column description is an object that describes a single column.

The description provides additional human-readable documentation for a column, as well as additional information that may be used to validate the cells within the column, create a user interface for data entry, or inform conversion into other formats.

class csvw.Datatype(common_props=_Nothing.NOTHING, at_props=_Nothing.NOTHING, base=None, format=None, length=None, minLength=None, maxLength=None, minimum=None, maximum=None, minInclusive=None, maxInclusive=None, minExclusive=None, maxExclusive=None)[source]

A datatype description

Cells within tables may be annotated with a datatype which indicates the type of the values obtained by parsing the string value of the cell.

classmethod fromvalue(v)[source]
Parameters:

v (typing.Union[str, dict, csvw.metadata.Datatype]) – Initialization data for cls; either a single string that is the main datatype of the values of the cell or a datatype description object, i.e. a dict or a cls instance.

Return type:

csvw.metadata.Datatype

Returns:

An instance of cls

For a list of datatypes supported by csvw, see csvw.datatypes