py4ai.data.layer.pandas.repository module

Module with abstractions for accessing data persisted in pickles, mimicking a fictitious database.

class py4ai.data.layer.pandas.repository.CsvRepository(filename: Union[str, os.PathLike[str]], serializer: DataSerializer[KE, KD, E, Series], sep: str = ';')

Bases: PandasRepository[KE, KD, E]

Repository for tabular data stored as CSV files.

Create an in-memory archiver based on structured data stored in the filesystem as a CSV.

Parameters
  • filename – str or path object. Any valid path to a CSV file.

  • serializer – An instance of serializer to convert between raw and domain objects.

  • sep – str, default ‘;’. Delimiter to use.
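For orientation, the sketch below shows what a ‘;’-delimited file looks like once loaded into a DataFrame. It uses plain pandas.read_csv rather than CsvRepository itself, because building the repository requires a DataSerializer whose interface is not described here. The column names and file content are hypothetical, and how CsvRepository actually reads the file is an assumption.

```python
import io

import pandas as pd

# Hypothetical file content; the column names are invented for illustration.
raw = "id;name;score\n1;alice;0.5\n2;bob;0.9\n"

# sep=";" mirrors the documented default delimiter of CsvRepository.
# Assumption: the repository loads the file in a comparable way.
frame = pd.read_csv(io.StringIO(raw), sep=";")

print(frame)
```

A file written with a comma delimiter would need sep=',' passed explicitly, since the documented default is ‘;’.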

class py4ai.data.layer.pandas.repository.PandasRepository(serializer: DataSerializer[KE, KD, E, Series])

Bases: Repository[KE, KD, E, Series, Callable[[DataFrame], Series]], ABC

Archiver backed by a persistence layer of tabular files, represented in memory by a pandas DataFrame.

Create an in-memory archiver based on structured data stored as a pandas DataFrame.

Parameters

serializer – An instance of serializer that helps to retrieve/archive a pd.DataFrame row

commit() PandasRepository[KE, KD, E]

Persist the data held in memory to the file.

Returns

self

async create(obj: E) E

Insert an object of type Document/pd.DataFrame/pd.Series in a pd.DataFrame.

Parameters

obj – An instance of py4ai.data.model.text.Document, pd.DataFrame or pd.Series

Returns

the inserted object, of type E, as declared in the signature

property data: DataFrame

Return tabular data stored in memory.

Returns

pd.DataFrame

async delete(key: KE) bool

Delete the entry in the persistence layer associated with the provided entity key.

Parameters

key – key identifying the entity.

Returns

boolean value indicating whether the deletion has completed successfully.

async delete_by_criteria(criteria: SearchCriteria[Q]) bool

Delete all entries matching a given query.

Parameters

criteria – query to be used for deleting entries.

Returns

boolean value indicating whether the deletion has completed successfully.

async list(options: QueryOptions = <QueryOptions object>) Paged[E]

Return a full list of entities stored in the persistence layer.

Parameters

options – query options to be used when retrieving data

Returns

Paged object for retrieved list of entities

async retrieve(key: KE) Optional[E]

Retrieve row from a dataframe by id.

Parameters

key – row id

Returns

retrieved row parsed according to self.serializer; the return type is Optional[E]
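The data-access methods documented here are declared async, so they have to be awaited from a running event loop. The sketch below illustrates that calling pattern with a hypothetical stand-in class, InMemoryRepo, that stores plain dicts; it is not PandasRepository. Its behaviour (returning None for a missing key, returning True only when a row was removed) is an assumption chosen to match the documented return types.

```python
import asyncio
from typing import Dict, Optional


class InMemoryRepo:
    """Hypothetical stand-in mirroring the coroutine shape of create/retrieve/delete."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    async def create(self, obj: dict) -> dict:
        # Store the object under its key and hand it back, like create(obj: E) -> E.
        self._rows[obj["id"]] = obj
        return obj

    async def retrieve(self, key: str) -> Optional[dict]:
        # Assumption: a missing key yields None, matching Optional[E].
        return self._rows.get(key)

    async def delete(self, key: str) -> bool:
        # Assumption: True only if something was actually removed.
        return self._rows.pop(key, None) is not None


async def main() -> list:
    repo = InMemoryRepo()
    await repo.create({"id": "a", "text": "hello"})
    found = await repo.retrieve("a")
    deleted = await repo.delete("a")
    missing = await repo.retrieve("a")
    return [found, deleted, missing]


result = asyncio.run(main())
print(result)
```

With a real repository the same pattern applies: each call is awaited inside a coroutine, and the coroutine is driven by asyncio.run or an existing event loop.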

async retrieve_by_criteria(criteria: SearchCriteria[Callable[[DataFrame], Series]], options: QueryOptions = <QueryOptions object>) Paged[E]

Retrieve rows satisfying condition, sorted according to given ordering.

Parameters
  • criteria – condition to satisfy. If None, all rows are returned.

  • options – ordering to respect. If None, no ordering is given.

Returns

Paged object containing the (ordered) rows satisfying the given condition
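The query type in the signature, Callable[[DataFrame], Series], suggests that a criterion is expressed as a function from the whole DataFrame to a per-row Series. The sketch below assumes that Series is a boolean mask and applies it with plain pandas; the frame, the column names and the function high_score are hypothetical. How SearchCriteria wraps such a callable, and how QueryOptions encodes ordering and paging, is not shown here.

```python
import pandas as pd

# Hypothetical in-memory table standing in for the repository's data property.
frame = pd.DataFrame(
    {"name": ["alice", "bob", "carol"], "score": [0.5, 0.9, 0.7]}
)


def high_score(df: pd.DataFrame) -> pd.Series:
    """Return a boolean mask selecting rows whose score exceeds 0.6."""
    return df["score"] > 0.6


# Apply the mask, then order the surviving rows by descending score.
# Assumption: the repository filters and sorts in a comparable way.
selected = frame[high_score(frame)].sort_values("score", ascending=False)

print(selected["name"].tolist())
```

Keeping the criterion as a function of the DataFrame, rather than a precomputed mask, lets it be evaluated against whatever data is in memory at query time.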

async save(entities: Sequence[E]) Sequence[E]

Insert many objects of type Document/pd.DataFrame/pd.Series in a pd.DataFrame.

Parameters

entities – List of objects to be inserted.

Returns

the entities inserted in the persistence layer.

property serializer: DataSerializer[KE, KD, E, Series]

Return the serializer of the data repository.

Returns

data serializer