csvw.dsv

Support for reading delimiter-separated value files.

This module contains Unicode-aware replacements for csv.reader() and csv.writer(). It was stolen/extracted from the csvkit project to allow re-use when the whole csvkit package isn’t required.

The original implementations were largely copied from examples in the csv module documentation.

class csvw.dsv.NamedTupleReader(f, fieldnames=None, restkey=None, restval=None, **kw)[source]

A UnicodeReader yielding one namedtuple per row.

Note

This reader has some limitations: fieldnames must be normalized to be admissible Python names, and it performs poorly compared with UnicodeDictReader.
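The normalization requirement exists because namedtuple field names must be valid Python identifiers. The following is a minimal standard-library sketch of the idea, not csvw's implementation: `normalize_name` and `iter_namedtuples` are hypothetical helpers, and the normalization rule NamedTupleReader actually applies may differ.

```python
import csv
import io
import keyword
import re
from collections import namedtuple


def normalize_name(name):
    """Map a header cell to a valid namedtuple field name (hypothetical rule)."""
    # Replace every run of non-word characters with a single underscore.
    name = re.sub(r'\W+', '_', name.strip())
    # Field names may not be empty, start with a digit, or start with "_".
    if not name or name[0].isdigit() or name.startswith('_'):
        name = 'f' + name
    # Python keywords such as "class" are not admissible either.
    if keyword.iskeyword(name):
        name += '_'
    return name


def iter_namedtuples(lines, **kw):
    """Yield one namedtuple per data row; the first row supplies the field names."""
    reader = csv.reader(lines, **kw)
    header = [normalize_name(cell) for cell in next(reader)]
    Row = namedtuple('Row', header)
    for row in reader:
        yield Row(*row)


rows = list(iter_namedtuples(io.StringIO('Location name,Year\nBerlin,2020\n')))
print(rows[0].Location_name, rows[0].Year)
```

A header such as "Location name" is therefore only reachable under its normalized attribute name, which is one reason a dict-based reader can be more convenient.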

class csvw.dsv.UnicodeDictReader(f, fieldnames=None, restkey=None, restval=None, **kw)[source]

A UnicodeReader yielding one dict per row.

Parameters:

f – As for UnicodeReader
fieldnames –

>>> from csvw.dsv import UnicodeDictReader
>>> from csvw.dsv_dialects import Dialect
>>> with UnicodeDictReader(
...         'tests/fixtures/frictionless-data.csv',
...         dialect=Dialect(delimiter='|', header=False),
...         fieldnames=[str(i) for i in range(1, 11)]) as reader:
...     for row in reader:
...         print(row)
...         break
...
OrderedDict([('1', 'FK'), ('2', 'Year'), ('3', 'Location name'), ('4', 'Value'),
('5', 'binary'), ('6', 'anyURI'), ('7', 'email'), ('8', 'boolean'), ('9', 'array'),
('10', 'geojson')])
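The signature lists restkey and restval without describing them. The sketch below shows what these parameters do in the standard library's csv.DictReader; that UnicodeDictReader follows the same semantics is an assumption, not something this documentation states.

```python
import csv

# Two rows read against two fieldnames: one row is too long, one too short.
lines = ['x,y,z', 'x']
reader = csv.DictReader(
    lines,
    fieldnames=['a', 'b'],
    restkey='extra',  # surplus cells are collected in a list under this key
    restval='',       # missing cells are filled with this value
)
rows = [dict(row) for row in reader]
print(rows)
```

In csv.DictReader, the long row gains an 'extra' entry holding the surplus cells as a list, and the short row gets the restval for every missing field.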

class csvw.dsv.UnicodeReader(f, dialect=None, **kw)[source]

Read Unicode data from a csv file.

Parameters:

f (typing.Union[str, pathlib.Path, typing.IO, typing.Iterable[str]]) – The source from which to read the data; a local path specified as str or pathlib.Path, a file-like object or a list of lines.
dialect (typing.Union[csvw.dsv_dialects.Dialect, str, None]) – Either a dialect name as recognized by csv.reader or a Dialect instance for dialect customization beyond what can be done with csv.reader.
kw – Keyword arguments passed through to csv.reader.

>>> from csvw.dsv import UnicodeReader
>>> with UnicodeReader('tests/fixtures/frictionless-data.csv', delimiter='|') as reader:
...     for row in reader:
...         print(row)
...         break
...
['FK', 'Year', 'Location name', 'Value', 'binary', 'anyURI', 'email', 'boolean', 'array',
'geojson']

class csvw.dsv.UnicodeReaderWithLineNumber(f, dialect=None, **kw)[source]

A UnicodeReader yielding (lineno, row) pairs, where “lineno” is the 1-based number of the text line where the (possibly multi-line) row data starts in the DSV file.

Parameters:

f (typing.Union[str, pathlib.Path, typing.IO, typing.Iterable[str]]) –
dialect (typing.Union[csvw.dsv_dialects.Dialect, str, None]) –
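To illustrate what "the line where the row data starts" means for multi-line rows, here is a minimal sketch built on the standard library's csv.reader and its line_num attribute. It is not csvw's implementation; `rows_with_lineno` is a hypothetical helper.

```python
import csv


def rows_with_lineno(lines, **kw):
    """Yield (lineno, row) pairs, lineno being the 1-based line where the row starts."""
    reader = csv.reader(lines, **kw)
    start = 1
    for row in reader:
        yield start, row
        # reader.line_num counts the physical lines consumed so far,
        # so the next row starts on the line after that.
        start = reader.line_num + 1


# The second row contains a quoted cell that spans two physical lines.
lines = ['a,b\n', '"x\n', 'y",2\n', '3,4\n']
pairs = list(rows_with_lineno(lines))
print(pairs)
```

The row with the quoted line break starts on line 2 and ends on line 3, so the row after it is reported as starting on line 4.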

class csvw.dsv.UnicodeWriter(f=None, dialect=None, **kw)[source]

Write Unicode data to a csv file.

Parameters:

f (typing.Union[str, pathlib.Path, None]) – The target to which to write the data; a local path specified as str or pathlib.Path, or None, in which case the data, formatted as DSV, can be retrieved via read().
dialect (typing.Union[csvw.dsv_dialects.Dialect, str, None]) – Either a dialect name as recognized by csv.writer or a Dialect instance for dialect customization beyond what can be done with csv.writer.
kw – Keyword arguments passed through to csv.writer.

>>> from csvw import UnicodeWriter
>>> with UnicodeWriter('data.tsv', delimiter='\t') as writer:
...     writer.writerow(['ä', 'ö', 'ü'])

read()[source]

If the writer was initialized with None as target, calling this method returns the CSV data as bytes.

Return type:

typing.Optional[bytes]
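The pattern of writing without a target and fetching the result as bytes can be sketched with the standard library alone. `InMemoryWriter` is a hypothetical stand-in used for illustration, not csvw's UnicodeWriter, and the UTF-8 default is an assumption of this sketch.

```python
import csv
import io


class InMemoryWriter:
    """Collect DSV output in a text buffer and hand it back as encoded bytes."""

    def __init__(self, encoding='utf-8', **kw):
        self.encoding = encoding
        # newline='' keeps the csv module's own line terminators untouched.
        self._buffer = io.StringIO(newline='')
        self._writer = csv.writer(self._buffer, **kw)

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False  # do not swallow exceptions

    def writerow(self, row):
        self._writer.writerow(row)

    def read(self):
        return self._buffer.getvalue().encode(self.encoding)


with InMemoryWriter(delimiter='\t') as writer:
    writer.writerow(['ä', 'ö', 'ü'])
data = writer.read()
print(data)
```

Returning bytes rather than str makes the encoding explicit at the point where the data leaves the writer.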

csvw.dsv.filter_rows_as_dict(fname, filter_, **kw)[source]

Rewrite a dsv file, filtering the rows.

Parameters:

fname (typing.Union[str, pathlib.Path]) – Path to dsv file
filter_ (typing.Callable[[dict], bool]) – Callable which accepts a dict with a row’s data as its single argument and returns a Boolean indicating whether to keep the row (True) or to discard it (False).
kw – Keyword arguments to be passed to UnicodeReader and UnicodeWriter.

Return type:

int

Returns:

The number of rows that have been removed.
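The contract described above (rewrite the file, keep rows for which the callable returns True, report how many were dropped) can be sketched with the standard library. `filter_rows` is a hypothetical helper for illustration, not csvw's implementation; it assumes a UTF-8 file with a header row.

```python
import csv
import os
import tempfile


def filter_rows(fname, keep, **kw):
    """Rewrite fname in place, keeping rows for which keep(row_dict) is true."""
    removed = 0
    target_dir = os.path.dirname(os.path.abspath(fname))
    with open(fname, newline='', encoding='utf-8') as src, \
            tempfile.NamedTemporaryFile(
                'w', newline='', encoding='utf-8',
                delete=False, dir=target_dir) as tmp:
        reader = csv.DictReader(src, **kw)
        writer = csv.DictWriter(tmp, fieldnames=reader.fieldnames, **kw)
        writer.writeheader()
        for row in reader:
            if keep(row):
                writer.writerow(row)
            else:
                removed += 1
    # Swap the filtered copy in only after both files are closed.
    os.replace(tmp.name, fname)
    return removed
```

A call such as `filter_rows('data.csv', lambda row: bool(row['year']))` would drop every row whose hypothetical `year` column is empty and return the number of dropped rows.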

csvw.dsv.iterrows(lines_or_file, namedtuples=False, dicts=False, encoding='utf-8', **kw)[source]

Convenience factory function for csv reader.

Parameters:

lines_or_file (typing.Union[str, pathlib.Path, typing.IO, typing.Iterable[str]]) – Content to be read. Either a file handle, a file path or a list of strings.
namedtuples (typing.Optional[bool]) – Yield namedtuples.
dicts (typing.Optional[bool]) – Yield dicts.
encoding (typing.Optional[str]) – Encoding of the content.
kw – Keyword parameters are passed through to csv.reader.

Return type:

typing.Generator

Returns:

A generator over the rows.

csvw.dsv.rewrite(fname, visitor, **kw)[source]

Utility function to rewrite rows in dsv files.

Parameters:

fname (typing.Union[str, pathlib.Path]) – Path of the dsv file to operate on.
visitor (typing.Callable[[int, typing.List[str]], typing.Optional[typing.List[str]]]) – A callable that takes a line-number and a row as input and returns a (modified) row or None to filter out the row.
kw – Keyword parameters are passed through to csv.reader/csv.writer.
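A visitor is an ordinary function, so it can be written and tried out without touching a file. The sketch below defines a hypothetical visitor, `strip_cells`, and applies it to in-memory rows; it makes no assumption about whether rewrite() counts line numbers from 0 or 1, since the function ignores that argument.

```python
from typing import List, Optional


def strip_cells(lineno: int, row: List[str]) -> Optional[List[str]]:
    """Trim whitespace from every cell; drop rows that end up entirely empty."""
    cleaned = [cell.strip() for cell in row]
    if not any(cleaned):
        return None  # returning None filters the row out
    return cleaned


# Apply the visitor by hand to see which rows would survive.
rows = [[' id ', 'name'], ['', '  '], ['1', ' Anna ']]
visited = [strip_cells(i, row) for i, row in enumerate(rows)]
kept = [row for row in visited if row is not None]
print(kept)
```

Such a function would then be passed as the visitor argument, as in `rewrite(fname, strip_cells)`.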

DSV data can be surprisingly diverse. While Python’s csv module offers out-of-the-box support for the basic formatting parameters, CSVW recognizes a couple more, like skipColumns or skipRows.
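To make the two properties concrete, the following standard-library sketch assumes that skipRows drops a number of leading rows and skipColumns drops a number of leading columns from each row. `read_with_skips` is a hypothetical helper, not part of csvw, and it counts parsed rows rather than physical lines.

```python
import csv
from itertools import islice


def read_with_skips(lines, skip_rows=0, skip_columns=0, **kw):
    """Yield rows after dropping leading rows and leading columns."""
    reader = csv.reader(lines, **kw)
    # islice discards the first skip_rows rows of the parsed input.
    for row in islice(reader, skip_rows, None):
        # Slicing discards the first skip_columns cells of each row.
        yield row[skip_columns:]


lines = ['# exported 2020', 'n,a,b', '1,x,y']
rows = list(read_with_skips(lines, skip_rows=1, skip_columns=1))
print(rows)
```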