py4ai.data.layer.fs.tables module

Module with abstractions for accessing data persisted in files and represented as TabularData.

class py4ai.data.layer.fs.tables.CsvSerializer(path: Path, encoding: str = 'utf-8', sep: str = ';')

Bases: FileSerializer[str, TabularData]

DataSerializer to be used for serializing/deserializing CSV files.

Return instance of DataSerializer.

Parameters
  • path – local folder to be used to construct filenames

  • encoding – text encoding to be used when reading and writing files

  • sep – separator used in the csv

get_key(entity: TabularData) → str

Extract key for given entity.

Parameters

entity – provided TabularData

Returns

entity key

mode: FileSerializerMode = ''

to_entity(document: IndexedIO[str]) → TabularData

Deserialize raw content into domain object entity.

Parameters

document – raw content

Returns

domain object entity

to_object(entity: TabularData) → IndexedIO[str]

Serialize domain object entity into raw content.

Parameters

entity – domain object entity

Returns

raw content
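
To illustrate the round trip that to_object and to_entity describe, here is a minimal stdlib-only sketch. It does not use py4ai or pandas: the helpers rows_to_csv and csv_to_rows are hypothetical stand-ins, and rows are plain dictionaries rather than a TabularData wrapping a DataFrame. Only the ';' separator is taken from the documented default of sep.

```python
import csv
import io
from typing import Dict, List

SEP = ";"  # mirrors the documented default of the `sep` parameter


def rows_to_csv(rows: List[Dict[str, str]], sep: str = SEP) -> str:
    """Serialize rows to CSV text (hypothetical stand-in for to_object)."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0]), delimiter=sep, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_rows(text: str, sep: str = SEP) -> List[Dict[str, str]]:
    """Parse CSV text back into rows (hypothetical stand-in for to_entity)."""
    return list(csv.DictReader(io.StringIO(text), delimiter=sep))


rows = [{"id": "1", "label": "a"}, {"id": "2", "label": "b"}]
raw = rows_to_csv(rows)      # domain object -> raw content
restored = csv_to_rows(raw)  # raw content -> domain object
```

Note that a plain CSV round trip returns every value as a string; how the actual serializer handles column types is not stated in this reference.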

to_object_key(key: str) → str

Transform entity key into raw key, to be used for indexing in the persistence layer.

Parameters

key – entity key

Returns

raw key
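
One plausible reading of the key transformation is that the entity key becomes a file name under the serializer's path. The sketch below shows that idea only; the helper to_raw_key, the ".csv" extension, and the "data" folder are assumptions made for illustration, not behaviour confirmed by this reference.

```python
from pathlib import Path


def to_raw_key(base: Path, key: str, extension: str = ".csv") -> str:
    """Build a file path from an entity key (hypothetical mapping)."""
    # Assumption: the raw key is the entity key plus an extension,
    # resolved against the folder passed to the serializer.
    return str(base / f"{key}{extension}")


raw_key = to_raw_key(Path("data"), "sales")
```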

class py4ai.data.layer.fs.tables.LocalDatabase(path: Path, serializer: Type[FileSerializer[str, TabularData]] = CsvSerializer)

Bases: FileSystemRepository[str, TabularData]

Repository for a persistence layer that stores tabular data as files.

Return an instance of the class.

Parameters
  • path – path where to store the files.

  • serializer – serializer class used to convert between raw and domain objects.

criteria: FileSystemCriteriaFactory[KE, E]

class py4ai.data.layer.fs.tables.PickleSerializer(path: Path, encoding: str = 'utf-8', sep: str = ';')

Bases: CsvSerializer

DataSerializer to be used for serializing/deserializing pickle files.

Return instance of DataSerializer.

Parameters
  • path – local folder to be used to construct filenames

  • encoding – text encoding to be used when reading and writing files

  • sep – separator used in the csv

mode: FileSerializerMode = 'b'

to_entity(document: IndexedIO[str]) → TabularData

Deserialize raw content into domain object entity.

Parameters

document – raw content

Returns

domain object entity

to_object(entity: TabularData) → IndexedIO[str]

Serialize domain object entity into raw content.

Parameters

entity – domain object entity

Returns

raw content
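
The binary mode ('b') matters because pickled content is bytes rather than text. The following stdlib-only sketch shows such a round trip through an in-memory binary buffer; the dictionary used as payload is a hypothetical stand-in for a TabularData, and nothing here calls py4ai itself.

```python
import io
import pickle

# Hypothetical stand-in for a TabularData entity.
table = {"name": "sales", "rows": [{"id": 1, "label": "a"}]}

# Serialize into a binary buffer, as a file opened in binary mode would receive it.
buffer = io.BytesIO()
pickle.dump(table, buffer)
payload = buffer.getvalue()

# Deserialize the bytes back into an equivalent object.
restored = pickle.loads(payload)
```

As with any use of pickle, only content from trusted sources should be deserialized.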

to_object_key(key: str) → str

Transform entity key into raw key, to be used for indexing in the persistence layer.

Parameters

key – entity key

Returns

raw key

class py4ai.data.layer.fs.tables.TabularData(*, name: str = '', data: DataFrame)

Bases: BaseModel

Domain object to represent tabular data.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

class Config

Bases: object

Specs for the pydantic model.

arbitrary_types_allowed = True

data: DataFrame

name: str

update(other: TabularData) → TabularData

Return a TabularData object obtained by concatenating the data of this object with the data of another TabularData.

Parameters

other – second TabularData

Returns

merged TabularData
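
The concatenation performed by update can be pictured with a small stdlib-only sketch. The Table class below is a hypothetical stand-in that holds rows in a list instead of a DataFrame. It assumes the merged object keeps the name of the first operand and that neither input is modified; this reference does not say how the real method treats the name, the index, or duplicate rows.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Table:
    """Hypothetical stand-in for TabularData, with rows instead of a DataFrame."""

    name: str = ""
    rows: List[Dict[str, object]] = field(default_factory=list)

    def update(self, other: "Table") -> "Table":
        # Assumption: keep this object's name and append the other's rows.
        # List addition builds a new list, so both inputs stay unchanged.
        return Table(name=self.name, rows=self.rows + other.rows)


first = Table(name="sales", rows=[{"id": 1}])
second = Table(name="sales", rows=[{"id": 2}])
merged = first.update(second)
```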