py4ai.data.layer.pandas.repository module
Module with abstractions for accessing data persisted in pickles, mimicking a fictitious database.
- class py4ai.data.layer.pandas.repository.CsvRepository(filename: Union[str, os.PathLike[str]], serializer: DataSerializer[KE, KD, E, Series], sep: str = ';')
Bases: PandasRepository[KE, KD, E]
Repository to be used with tabular files stored as csv files.
Create an in-memory archiver based on structured data stored in the filesystem as a CSV.
- Parameters
filename – str, path object or file-like object. Any valid string path to a csv file.
serializer – An instance of serializer to convert between raw and domain objects.
sep – str, default ‘;’. Delimiter to use.
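The effect of the `sep` parameter can be illustrated with plain pandas. This is a minimal sketch, not the repository's own loading code: it assumes the file is read with `pandas.read_csv`, and the column names and the in-memory `raw` text are hypothetical stand-ins for a real file.

```python
import io

import pandas as pd

# Hypothetical file content; a real repository would read from `filename`.
raw = "id;title;score\na;first;1.5\nb;second;2.0\n"

# sep=";" mirrors the documented default delimiter.
frame = pd.read_csv(io.StringIO(raw), sep=";", index_col="id")

# Writing back with the same delimiter round-trips the table.
buffer = io.StringIO()
frame.to_csv(buffer, sep=";")
restored = pd.read_csv(io.StringIO(buffer.getvalue()), sep=";", index_col="id")
```

Reading and writing with the same delimiter keeps the in-memory table and the file consistent.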
- class py4ai.data.layer.pandas.repository.PandasRepository(serializer: DataSerializer[KE, KD, E, Series])
Bases: Repository[KE, KD, E, Series, Callable[[DataFrame], Series]], ABC
Archiver for persistence layers backed by tabular files, represented in memory by a pandas DataFrame.
Create an in-memory archiver based on structured data stored as a pandas DataFrame.
- Parameters
serializer – An instance of serializer that helps to retrieve/archive a pd.DataFrame row.
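The role of a serializer, converting between a raw row and a domain object, can be sketched as follows. The actual `DataSerializer` interface is not shown in this page, so the `Item` class and the `to_row` / `from_row` functions are hypothetical names that only illustrate the idea of a two-way mapping between an entity and a `pd.Series`.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass(frozen=True)
class Item:
    """Hypothetical domain entity (plays the role of the E type parameter)."""

    key: str
    title: str
    score: float


def to_row(item: Item) -> pd.Series:
    """Domain object -> raw row; the Series name carries the key."""
    return pd.Series({"title": item.title, "score": item.score}, name=item.key)


def from_row(row: pd.Series) -> Item:
    """Raw row -> domain object."""
    return Item(key=str(row.name), title=str(row["title"]), score=float(row["score"]))


item = Item(key="a", title="first", score=1.5)
round_tripped = from_row(to_row(item))
```

Storing the key as the Series name lets the row be placed in a DataFrame indexed by key.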
- commit() PandasRepository[KE, KD, E]
Persist data stored in memory in the file.
- Returns
self
- async create(obj: E) E
Insert an object of type Document/pd.DataFrame/pd.Series in a pd.DataFrame.
- Parameters
obj – An instance of cgnal.data.model.text.Document, pd.DataFrame or pd.Series
- Returns
self, i.e. an instance of PandasArchiver with updated self.data object
- property data: DataFrame
Return tabular data stored in memory.
- Returns
pd.DataFrame
- async delete(key: KE) bool
Delete the entry in the persistence layer associated with the provided entity key.
- Parameters
key – key identifying the entity.
- Returns
boolean value indicating whether the deletion has completed successfully.
- async delete_by_criteria(criteria: SearchCriteria[Q]) bool
Delete all entries matching a given query.
- Parameters
criteria – query to be used for deleting entries.
- Returns
boolean value indicating whether the deletion has completed successfully.
- async list(options: QueryOptions = <QueryOptions object>) Paged[E]
Return a full list of entities stored in the persistence layer.
- Parameters
options – query options to be used when retrieving data
- Returns
Paged object for retrieved list of entities
- async retrieve(key: KE) Optional[E]
Retrieve row from a dataframe by id.
- Parameters
key – row id
- Returns
retrieved row parsed according to self.dao
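Looking up a row by id can be sketched with plain pandas. This is an illustration under assumptions, not the library's code: it assumes the DataFrame is indexed by key, and that an absent key maps to `None` (suggested by the `Optional[E]` return type). The helper name `retrieve_row` is hypothetical.

```python
from typing import Optional

import pandas as pd

frame = pd.DataFrame(
    {"title": ["first", "second"], "score": [1.5, 2.0]},
    index=pd.Index(["a", "b"], name="id"),
)


def retrieve_row(data: pd.DataFrame, key: str) -> Optional[pd.Series]:
    """Return the row labelled `key`, or None when the key is absent."""
    if key not in data.index:
        return None
    return data.loc[key]


found = retrieve_row(frame, "b")
missing = retrieve_row(frame, "z")
```

In the repository, the retrieved row would then be passed through the serializer to obtain a domain object.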
- async retrieve_by_criteria(criteria: SearchCriteria[Callable[[DataFrame], Series]], options: QueryOptions = <QueryOptions object>) Paged[E]
Retrieve rows satisfying condition, sorted according to given ordering.
- Parameters
criteria – condition to satisfy. If None, returns all rows.
options – ordering to respect. If None, no ordering is given.
- Returns
iterator of (ordered) rows satisfying given condition
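The signature types the query as `Callable[[DataFrame], Series]`, that is, a function that maps the table to a row mask. The sketch below shows that idea with plain pandas only. How such a callable is wrapped in a `SearchCriteria`, and how `QueryOptions` expresses ordering, is not shown in this page, so the helpers `min_score` and `both` are hypothetical.

```python
from typing import Callable

import pandas as pd

# A query maps the whole table to a boolean mask over its rows.
Query = Callable[[pd.DataFrame], pd.Series]

frame = pd.DataFrame(
    {"title": ["first", "second", "third"], "score": [1.5, 2.0, 3.0]},
    index=pd.Index(["a", "b", "c"], name="id"),
)


def min_score(threshold: float) -> Query:
    """Build a query selecting rows whose score is at least `threshold`."""
    return lambda data: data["score"] >= threshold


def both(left: Query, right: Query) -> Query:
    """Combine two queries with a logical AND."""
    return lambda data: left(data) & right(data)


# Filter with the mask, then apply an ordering.
high = min_score(2.0)
selected = frame[high(frame)].sort_values("score", ascending=False)

# Composed condition: 2.0 <= score < 3.0.
narrow = both(high, lambda data: data["score"] < 3.0)
narrowed = frame[narrow(frame)]
```

Because the queries are ordinary functions, they compose with the usual boolean operators on pandas masks.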
- async save(entities: Sequence[E]) Sequence[E]
Insert many objects of type Document/pd.DataFrame/pd.Series in a pd.DataFrame.
- Parameters
entities – List of objects to be inserted.
- Returns
the entities inserted in the persistence layer.
- property serializer: DataSerializer[KE, KD, E, Series]
Return the serializer of the data repository.
- Returns
data serializer