csvw.dsv
Support for reading delimiter-separated value files.
This module contains Unicode-aware replacements for csv.reader() and csv.writer(). It was stolen/extracted from the csvkit project to allow re-use when the whole csvkit package isn’t required.
The original implementations were largely copied from examples in the csv module documentation.
- class csvw.dsv.NamedTupleReader(f, fieldnames=None, restkey=None, restval=None, **kw)
A UnicodeReader yielding one namedtuple per row.
Note
This reader has some limitations: fieldnames must be normalized to be admissible Python names, and it performs worse than UnicodeDictReader.
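The normalization requirement can be illustrated with the standard library alone. The following is a minimal sketch, not the normalization csvw itself applies: `normalize_name` is a hypothetical helper, and the `f_` prefix is an arbitrary choice made for this example.

```python
import keyword
import re
from collections import namedtuple


def normalize_name(name):
    """Hypothetical helper: map a column header to a valid namedtuple field name."""
    # Replace runs of non-word characters with "_" and trim stray underscores.
    name = re.sub(r'\W+', '_', name.strip()).strip('_')
    # Field names may not be empty, start with a digit or underscore, or be keywords.
    if not name or name[0].isdigit() or keyword.iskeyword(name):
        name = 'f_' + name
    return name


headers = ['Location name', 'Year', '2nd value', 'class']
Row = namedtuple('Row', [normalize_name(h) for h in headers])
row = Row('Berlin', '2020', '3.5', 'A')
print(row.Location_name, row.f_2nd_value, row.f_class)
```

The sketch does not handle headers that collide after normalization. Code that must look up values by the original header text is usually better served by UnicodeDictReader.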
- class csvw.dsv.UnicodeDictReader(f, fieldnames=None, restkey=None, restval=None, **kw)
A UnicodeReader yielding one dict per row.
- Parameters:
f – As for UnicodeReader
fieldnames –
>>> with UnicodeDictReader(
...         'tests/fixtures/frictionless-data.csv',
...         dialect=Dialect(delimiter='|', header=False),
...         fieldnames=[str(i) for i in range(1, 11)]) as reader:
...     for row in reader:
...         print(row)
...         break
...
OrderedDict([('1', 'FK'), ('2', 'Year'), ('3', 'Location name'), ('4', 'Value'), ('5', 'binary'), ('6', 'anyURI'), ('7', 'email'), ('8', 'boolean'), ('9', 'array'), ('10', 'geojson')])
- class csvw.dsv.UnicodeReader(f, dialect=None, **kw)
Read Unicode data from a csv file.
- Parameters:
f (typing.Union[str, pathlib.Path, typing.IO, typing.Iterable[str]]) – The source from which to read the data; a local path specified as str or pathlib.Path, a file-like object or a list of lines.
dialect (typing.Union[csvw.dsv_dialects.Dialect, str, None]) – Either a dialect name as recognized by csv.reader or a Dialect instance for dialect customization beyond what can be done with csv.writer.
kw – Keyword arguments passed through to csv.reader.
>>> with UnicodeReader('tests/fixtures/frictionless-data.csv', delimiter='|') as reader:
...     for row in reader:
...         print(row)
...         break
...
['FK', 'Year', 'Location name', 'Value', 'binary', 'anyURI', 'email', 'boolean', 'array', 'geojson']
- class csvw.dsv.UnicodeReaderWithLineNumber(f, dialect=None, **kw)
A UnicodeReader yielding (lineno, row) pairs, where “lineno” is the 1-based number of the text line where the (possibly multi-line) row data starts in the DSV file.
- Parameters:
f (typing.Union[str, pathlib.Path, typing.IO, typing.Iterable[str]]) –
dialect (typing.Union[csvw.dsv_dialects.Dialect, str, None]) –
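To make the "line where the row starts" semantics concrete, here is a small sketch that uses only the standard library's csv.reader. It is not how csvw computes the number; it only shows why a row's starting line differs from the row index once a quoted cell spans several text lines. The sample data is invented for this example.

```python
import csv

# Four text lines, but only three rows: the second row spans lines 2 and 3.
lines = ['id,comment\n', '1,"first\n', 'second"\n', '2,plain\n']

reader = csv.reader(lines)
pairs = []
start = 1  # 1-based number of the text line where the next row starts
for row in reader:
    pairs.append((start, row))
    # reader.line_num counts the text lines consumed so far.
    start = reader.line_num + 1

for lineno, row in pairs:
    print(lineno, row)
```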
- class csvw.dsv.UnicodeWriter(f=None, dialect=None, **kw)
Write Unicode data to a csv file.
- Parameters:
f (typing.Union[str, pathlib.Path, None]) – The target to which to write the data; a local path specified as str or pathlib.Path, or None, in which case the data, formatted as DSV, can be retrieved via read()
dialect (typing.Union[csvw.dsv_dialects.Dialect, str, None]) – Either a dialect name as recognized by csv.writer or a Dialect instance for dialect customization beyond what can be done with csv.writer.
kw – Keyword arguments passed through to csv.writer.
>>> from csvw import UnicodeWriter
>>> with UnicodeWriter('data.tsv', delimiter='\t') as writer:
...     writer.writerow(['ä', 'ö', 'ü'])
- csvw.dsv.filter_rows_as_dict(fname, filter_, **kw)
Rewrite a dsv file, filtering the rows.
- Parameters:
fname (typing.Union[str, pathlib.Path]) – Path to dsv file
filter_ (typing.Callable[[dict], bool]) – callable which accepts a dict with a row’s data as single argument, returning a Boolean indicating whether to keep the row (True) or to discard it (False).
kw – Keyword arguments to be passed to UnicodeReader and UnicodeWriter.
- Return type:
int
- Returns:
The number of rows that have been removed.
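A filter callable of the shape described above might look like the following sketch. The predicate `has_value`, the column name `Value`, and the wrapper `prune_file` are hypothetical names chosen for illustration. The predicate is exercised here against in-memory data with the standard library's csv.DictReader, so the keep/discard logic can be checked without touching a file.

```python
import csv
import io


def has_value(row):
    """Keep a row only if its (hypothetical) 'Value' column is non-empty."""
    return bool((row.get('Value') or '').strip())


# Stand-in data to exercise the predicate; no file is rewritten here.
sample = io.StringIO('Year,Value\n2019,10\n2020,\n2021,7\n')
rows = list(csv.DictReader(sample))
kept = [r for r in rows if has_value(r)]
removed = len(rows) - len(kept)
print(removed)


def prune_file(path):
    """Apply the predicate to a real file; rewrites the file at `path`."""
    # Imported lazily so the sketch above runs without csvw installed.
    from csvw.dsv import filter_rows_as_dict
    return filter_rows_as_dict(path, has_value)
```

Because the function rewrites the file, it may be worth testing the predicate on a copy first.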
- csvw.dsv.iterrows(lines_or_file, namedtuples=False, dicts=False, encoding='utf-8', **kw)
Convenience factory function for csv reader.
- Parameters:
lines_or_file (typing.Union[str, pathlib.Path, typing.IO, typing.Iterable[str]]) – Content to be read. Either a file handle, a file path or a list of strings.
namedtuples (typing.Optional[bool]) – Yield namedtuples.
dicts (typing.Optional[bool]) – Yield dicts.
encoding (typing.Optional[str]) – Encoding of the content.
kw – Keyword parameters are passed through to csv.reader.
- Return type:
typing.Generator
- Returns:
A generator over the rows.
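As a usage sketch based only on the signature documented above, the helper below collects the first few rows as plain dicts. `head_as_dicts` is a hypothetical convenience name, not part of csvw, and the import is done lazily so that the snippet can be loaded even where csvw is not installed.

```python
from itertools import islice


def head_as_dicts(lines_or_file, n=2, **kw):
    """Hypothetical helper: return the first n rows as plain dicts."""
    # Imported lazily so this sketch can be loaded without csvw installed.
    from csvw.dsv import iterrows
    rows = iterrows(lines_or_file, dicts=True, **kw)
    # iterrows returns a generator, so islice stops reading after n rows.
    return [dict(row) for row in islice(rows, n)]
```

It could be called with a list of lines, for example `head_as_dicts(['a,b', '1,2', '3,4'])`, or with a path plus reader keywords such as `delimiter='|'`.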
- csvw.dsv.rewrite(fname, visitor, **kw)
Utility function to rewrite rows in dsv files.
- Parameters:
fname (typing.Union[str, pathlib.Path]) – Path of the dsv file to operate on.
visitor (typing.Callable[[int, typing.List[str]], typing.Optional[typing.List[str]]]) – A callable that takes a line-number and a row as input and returns a (modified) row or None to filter out the row.
kw – Keyword parameters are passed through to csv.reader/csv.writer.
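A visitor matching the documented signature could be sketched as follows. `strip_cells` and `clean_file` are hypothetical names. The loop over in-memory rows is only a stand-in for what rewrite does with a file, and the numbering used in that stand-in is arbitrary, since this sketch makes no assumption about how rewrite numbers the rows it passes in.

```python
def strip_cells(lineno, row):
    """Trim whitespace in every cell; drop rows that end up entirely empty."""
    cleaned = [cell.strip() for cell in row]
    # Returning None tells the caller to filter the row out.
    return cleaned if any(cleaned) else None


# Stand-in for the rewrite loop, using in-memory rows instead of a file.
rows = [['id', ' name '], ['', '  '], ['1', ' Ada']]
result = []
for lineno, row in enumerate(rows, start=1):
    new_row = strip_cells(lineno, row)
    if new_row is not None:
        result.append(new_row)
print(result)


def clean_file(path, **kw):
    """Apply the visitor to a real file; rewrites the file at `path`."""
    # Imported lazily so the sketch above runs without csvw installed.
    from csvw.dsv import rewrite
    rewrite(path, strip_cells, **kw)
```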
DSV data can be surprisingly diverse. While Python’s csv module offers out-of-the-box support for the basic formatting parameters, CSVW recognizes a couple more, like skipColumns or skipRows.
- class csvw.dsv_dialects.Dialect(encoding='utf-8', lineTerminators=_Nothing.NOTHING, quoteChar='"', doubleQuote=True, skipRows=0, commentPrefix='#', header=True, headerRowCount=1, delimiter=',', skipColumns=0, skipBlankRows=False, skipInitialSpace=False, trim='false')
A CSV dialect specification.