py4ai.data.model.ml module
Module for specifying data-models to be used in modelling.
- class py4ai.data.model.ml.CachedDataset(*args, **kwargs)
- Bases: DatasetUtilsMixin[FeatType, LabType], CachedIterable[Sample[FeatType, LabType]], DillSerialization
- Class that represents a dataset cached in memory, derived from a cached iterable of samples.
- Return an instance of a class to be used for implementing cached iterables.
- Parameters
- items – sequence or iterable of elements 
 - cached_type
- alias of CachedDataset
 - lazy_type
- alias of LazyDataset
 - to_df() → DataFrame
- Reformat the features and labels as a DataFrame.
- Returns
- DataFrame with features and labels 
 
 - union(other: TDatasetUtilsMixin) → CachedDataset[FeatType, LabType]
- Return the union of this CachedDataset and another one.
- Parameters
- other – CachedDataset 
- Returns
- union of the current and the other CachedDataset 
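The difference between a cached and a lazy dataset is mostly about where the samples live. The stdlib-only sketch below models a cached dataset as a plain list of samples and assumes, without confirmation from this reference, that `union` concatenates the samples of both datasets in order. `SimpleSample` and `SimpleCachedDataset` are hypothetical stand-ins, not py4ai classes.

```python
from dataclasses import dataclass
from typing import Any, Generic, Iterable, Iterator, List, Optional, TypeVar

FeatType = TypeVar("FeatType")
LabType = TypeVar("LabType")


@dataclass
class SimpleSample(Generic[FeatType, LabType]):
    """Hypothetical stand-in for py4ai's Sample."""

    features: FeatType
    label: Optional[LabType] = None
    name: Optional[Any] = None


class SimpleCachedDataset(Generic[FeatType, LabType]):
    """Hypothetical in-memory dataset: all samples live in a Python list."""

    def __init__(self, items: Iterable[SimpleSample[FeatType, LabType]]) -> None:
        # Materialize once so the dataset can be iterated any number of times
        self.items: List[SimpleSample[FeatType, LabType]] = list(items)

    def __iter__(self) -> Iterator[SimpleSample[FeatType, LabType]]:
        return iter(self.items)

    def __len__(self) -> int:
        return len(self.items)

    def union(
        self, other: "SimpleCachedDataset[FeatType, LabType]"
    ) -> "SimpleCachedDataset[FeatType, LabType]":
        # Assumed semantics: samples of self followed by samples of other;
        # neither input dataset is modified
        return SimpleCachedDataset(self.items + other.items)


train = SimpleCachedDataset([SimpleSample({"x": 1.0}, 0, "s1")])
extra = SimpleCachedDataset([SimpleSample({"x": 2.0}, 1, "s2")])
combined = train.union(extra)
print([s.name for s in combined])  # ['s1', 's2']
```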
 
 
- class py4ai.data.model.ml.DatasetUtilsMixin(*args, **kwargs)
- Bases: IterableUtilsMixin[Sample[FeatType, LabType], LazyDataset[FeatType, LabType], CachedDataset[FeatType, LabType]], Generic[FeatType, LabType], ABC
- Base class for representing datasets as iterables over samples.
- Create a new instance of this class.
- Parameters
- cls – parent object class 
- args – passed to the super class __new__ method 
- kwargs – passed to the super class __new__ method 
 
- Raises
- RuntimeError – if the cached and lazy versions were not defined before instantiating the class 
- Returns
- an instance of this class 
 - property asPandasDataset: PandasDataset[FeatType, LabType]
- Cast the object to a PandasDataset.
- Returns
- dataset 
 
 - cached_type: Type[CachedIterableType]
 - static checkNames(x: Optional[Union[int, str, Any]]) → Union[str, int]
- Check that feature names comply with the expected format and cast them to either string or int.
- Parameters
- x – feature name 
- Returns
- name as int or str 
- Raises
- AttributeError – if x is None 
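A minimal sketch of the contract documented for `checkNames`, assuming that integer names are kept as they are and every other non-None value is converted with `str`. `check_name` is a hypothetical helper, not the py4ai implementation.

```python
from typing import Any, Optional, Union


def check_name(x: Optional[Union[int, str, Any]]) -> Union[str, int]:
    """Hypothetical re-implementation of the documented checkNames contract."""
    if x is None:
        # Documented behaviour: a missing name is an error
        raise AttributeError("Feature name must not be None")
    if isinstance(x, int):
        # Integer names (e.g. positional columns) are kept as int
        return x
    # Assumption: any other object is cast to its string representation
    return str(x)


print(check_name(3), check_name("age"), check_name(2.5))  # 3 age 2.5
```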
 
 - getFeaturesAs(type: Literal['array']) → ndarray[Any, dtype[Any]]
- getFeaturesAs(type: Literal['pandas']) → DataFrame
- getFeaturesAs(type: Literal['dict']) → Dict[Union[str, int], FeatType]
- getFeaturesAs(type: Literal['list']) → List[FeatType]
- getFeaturesAs(type: Literal['lazy']) → Iterator[FeatType]
- Return an object of the specified type containing the feature space.
- Parameters
- type – type of return. Can be one of “pandas”, “dict”, “list”, “array” or “lazy” 
- Returns
- an object of the specified type containing the features 
- Raises
- ValueError – if the provided type is not one of the allowed ones 
 
 - getLabelsAs(type: Literal['array']) → ndarray[Any, dtype[Any]]
- getLabelsAs(type: Literal['pandas']) → DataFrame
- getLabelsAs(type: Literal['dict']) → Dict[Union[str, int], LabType]
- getLabelsAs(type: Literal['list']) → List[LabType]
- getLabelsAs(type: Literal['lazy']) → Iterator[LabType]
- Return an object of the specified type containing the labels.
- Parameters
- type – type of return. Can be one of “pandas”, “dict”, “list”, “array” or “lazy” 
- Returns
- an object of the specified type containing the labels 
- Raises
- ValueError – if the provided type is not one of the allowed ones 
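The `type` argument of `getFeaturesAs` and `getLabelsAs` selects the container that is returned. The sketch below illustrates that dispatch for the stdlib-only cases (“dict”, “list”, “lazy”), including the ValueError for unsupported values. `Row` and `get_as` are hypothetical, `kind` plays the role of the documented `type` argument, and keying the dictionary by sample name is an assumption based on the `Dict[Union[str, int], ...]` return types.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Union


@dataclass
class Row:
    """Hypothetical minimal sample: features, label and an identifying name."""

    features: Any
    label: Optional[Any] = None
    name: Optional[Union[str, int]] = None


def get_as(samples: List[Row], kind: str, field: str = "features"):
    """Return one field of every sample in the container named by `kind`.

    Only 'dict', 'list' and 'lazy' are sketched here; 'array' and 'pandas'
    would additionally need NumPy or pandas.
    """
    if field not in ("features", "label"):
        raise ValueError(f"Unknown field {field!r}")
    if kind == "lazy":
        # Generator expression: values are produced only when consumed
        return (getattr(s, field) for s in samples)
    if kind == "list":
        return [getattr(s, field) for s in samples]
    if kind == "dict":
        # Keyed by sample name, like the Dict[Union[str, int], ...] overloads
        return {s.name: getattr(s, field) for s in samples}
    raise ValueError(f"Type {kind!r} not allowed")


rows = [Row([1.0, 2.0], 0, "a"), Row([3.0, 4.0], 1, "b")]
print(get_as(rows, "dict"))  # {'a': [1.0, 2.0], 'b': [3.0, 4.0]}
print(get_as(rows, "list", field="label"))  # [0, 1]
```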
 
 - lazy_type: Type[LazyIterableType]
 - property type: Type[Sample[FeatType, LabType]]
- Return the type of the objects in the iterable.
- Returns
- type of the objects in the iterable 
 
 - abstract union(other: TDatasetUtilsMixin) → DatasetUtilsMixin[FeatType, LabType]
- Return a union of datasets.
- Parameters
- other – other dataset to join 
- Returns
- union dataset 
 
 
- class py4ai.data.model.ml.LazyDataset(*args, **kwargs)
- Bases: LazyIterable[Sample[FeatType, LabType]], DatasetUtilsMixin[FeatType, LabType]
- Class that represents a dataset derived from a lazy iterable of samples.
- Return an instance of the class to be used for implementing lazy iterables.
- Parameters
- items – IterGenerator containing the generator of items 
 - cached_type
- alias of CachedDataset
 - features() → Iterator[FeatType]
- Return an iterator over sample features.
- Returns
- iterable of features 
 
 - getFeaturesAs(type: Literal['array']) → ndarray[Any, dtype[Any]]
- getFeaturesAs(type: Literal['pandas']) → DataFrame
- getFeaturesAs(type: Literal['dict']) → Dict[Union[str, int], FeatType]
- getFeaturesAs(type: Literal['list']) → List[FeatType]
- getFeaturesAs(type: Literal['lazy']) → Iterator[FeatType]
- Return an object of the specified type containing the feature space.
- Parameters
- type – type of return. Can be one of “pandas”, “dict”, “list”, “array” or “lazy” 
- Returns
- an object of the specified type containing the features 
 
 - getLabelsAs(type: Literal['array']) → ndarray[Any, dtype[Any]]
- getLabelsAs(type: Literal['pandas']) → DataFrame
- getLabelsAs(type: Literal['dict']) → Dict[Union[str, int], LabType]
- getLabelsAs(type: Literal['list']) → List[LabType]
- getLabelsAs(type: Literal['lazy']) → Iterator[LabType]
- Return an object of the specified type containing the labels.
- Parameters
- type – type of return. Can be one of “pandas”, “dict”, “list”, “array” or “lazy” 
- Returns
- an object of the specified type containing the labels 
 
 - labels() → Iterator[LabType]
- Return an iterator over sample labels.
- Returns
- iterable of labels 
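A lazy dataset can be iterated more than once only if it keeps a way to recreate its generator. The sketch assumes that this is the role of the `IterGenerator` mentioned above and models it as a zero-argument callable. `SketchLazyDataset` is hypothetical; its `features()` and `labels()` simply project each (features, label) pair on demand.

```python
from typing import Any, Callable, Iterable, Iterator, Optional, Tuple

# Hypothetical sample representation: a (features, label) pair
SampleT = Tuple[Any, Optional[Any]]


class SketchLazyDataset:
    """Hypothetical lazy dataset backed by a zero-argument generator factory."""

    def __init__(self, generator_function: Callable[[], Iterable[SampleT]]) -> None:
        # Keep the factory, not a generator object, so iteration can restart
        self._generator_function = generator_function

    def __iter__(self) -> Iterator[SampleT]:
        # A fresh generator on every pass over the data
        return iter(self._generator_function())

    def features(self) -> Iterator[Any]:
        return (features for features, _ in self)

    def labels(self) -> Iterator[Any]:
        return (label for _, label in self)


def generate() -> Iterator[SampleT]:
    for i in range(3):
        yield [i, i * i], i % 2


dataset = SketchLazyDataset(generate)
print(list(dataset.features()))  # [[0, 0], [1, 1], [2, 4]]
print(list(dataset.labels()))  # [0, 1, 0]
```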
 
 - lazy_type
- alias of LazyDataset
 - union(other: TDatasetUtilsMixin) → LazyDataset[FeatType, LabType]
- Return the union of this LazyDataset and another one.
- Parameters
- other – LazyDataset 
- Returns
- union of LazyDatasets 
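For lazy datasets a union can be expressed without materializing any sample, by chaining the two sources. This is a sketch under the assumption that union concatenates samples in order; `lazy_union` is a hypothetical helper built on `itertools.chain`, and it returns a new generator factory so that the result stays re-iterable.

```python
from itertools import chain
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def lazy_union(
    first: Callable[[], Iterable[T]], second: Callable[[], Iterable[T]]
) -> Callable[[], Iterator[T]]:
    """Combine two generator factories into one without materializing items."""

    def combined() -> Iterator[T]:
        # chain draws from first() until exhausted, then from second()
        return chain(first(), second())

    return combined


union = lazy_union(lambda: iter([1, 2]), lambda: (x * 10 for x in [3, 4]))
print(list(union()))  # [1, 2, 30, 40]
print(list(union()))  # [1, 2, 30, 40], since each call restarts both sources
```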
 
 - withLookback(lookback: int) → LazyDataset[FeatType, LabType]
- Create a LazyDataset with features that are arrays of lookback lists of samples’ features.
- Parameters
- lookback – number of samples’ features to look at 
- Returns
- LazyDataset with the transformed samples
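One plausible reading of `withLookback` is a sliding window over consecutive samples. The sketch below assumes that each output sample carries the features of the last `lookback` samples and the label of the most recent one, and that the first `lookback - 1` samples, which lack a full history, are dropped. `with_lookback` is hypothetical and uses `collections.deque` with `maxlen` to keep the window bounded while staying lazy.

```python
from collections import deque
from typing import Any, Deque, Iterable, Iterator, List, Optional, Tuple

Pair = Tuple[Any, Optional[Any]]


def with_lookback(
    samples: Iterable[Pair], lookback: int
) -> Iterator[Tuple[List[Any], Optional[Any]]]:
    """Lazily turn (features, label) pairs into windows of `lookback` features."""
    if lookback < 1:
        # Validated eagerly, before any generator is created
        raise ValueError("lookback must be a positive integer")
    return _windows(samples, lookback)


def _windows(
    samples: Iterable[Pair], lookback: int
) -> Iterator[Tuple[List[Any], Optional[Any]]]:
    window: Deque[Any] = deque(maxlen=lookback)
    for features, label in samples:
        window.append(features)  # the oldest entry is evicted automatically
        if len(window) == lookback:
            # Assumption: the window inherits the label of its last sample
            yield list(window), label


data = [([1], 0), ([2], 1), ([3], 0), ([4], 1)]
print(list(with_lookback(data, 3)))
# [([[1], [2], [3]], 0), ([[2], [3], [4]], 1)]
```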
 
 
- class py4ai.data.model.ml.MultiFeatureSample(features: List[ndarray[Any, dtype[Any]]], label: Optional[LabType] = None, name: Optional[str] = None)
- Bases: Sample[List[ndarray], LabType]
- Class representing an observation defined by a nested list of arrays.
- Object representing a single sample of a training or test set.
- Parameters
- features – features of the sample 
- label – labels of the sample (optional) 
- name – id of the sample (optional) 
 
 
- class py4ai.data.model.ml.PandasDataset(*args, **kwargs)
- Bases: Generic[FeatType, LabType], DatasetUtilsMixin[FeatType, LabType], DillSerialization
- Dataset represented via pandas DataFrames for features and labels.
- Return a data structure built on top of pandas DataFrames.
- The PandasDataset packs features and labels together and exposes them as a pandas DataFrame, a NumPy array or a dictionary. For unsupervised learning tasks the labels are left as None.
- Parameters
- features – a dataframe or a series of features 
- labels – a dataframe or a series of labels. None in case no labels are present. 
 
- Raises
- TypeError – if the labels or features are neither DataFrames nor Series
 - property cached: bool
- Return whether the dataset is cached in memory.
- Returns
- boolean 
 
 - cached_type
- alias of PandasDataset
 - classmethod createObject(features: Union[DataFrame, Series], labels: Optional[Union[DataFrame, Series]]) → TPandasDataset
- Create a PandasDataset object.
- Parameters
- features – features as pandas dataframe/series 
- labels – labels as pandas dataframe/series 
 
- Returns
- a PandasDataset object
 
 - dropna(**kwargs: Any) → TPandasDataset
- Drop NAs from features and labels.
- Parameters
- kwargs – keyword arguments passed to dropna 
- Returns
- PandasDataset with features and labels without NAs
 
 - classmethod empty() → TPandasDataset
- Return an empty object.
- Returns
- Empty instance of class 
 
 - property features: DataFrame
- Get features as a pandas DataFrame.
- Returns
- pd.DataFrame 
 
 - classmethod from_sequence(datasets: Sequence[TPandasDataset]) → TPandasDataset
- Create a PandasDataset from a list of PandasDatasets using pd.concat.
- Parameters
- datasets – list of PandasDatasets 
- Returns
- PandasDataset
 
 - getFeaturesAs(type: Literal['array']) → ndarray[Any, dtype[Any]]
- getFeaturesAs(type: Literal['pandas']) → DataFrame
- getFeaturesAs(type: Literal['dict']) → Dict[Union[str, int], FeatType]
- getFeaturesAs(type: Literal['list']) → List[FeatType]
- getFeaturesAs(type: Literal['lazy']) → Iterator[FeatType]
- Get features as a NumPy array, a pandas DataFrame or a dictionary.
- Parameters
- type – str, default is ‘array’, can be ‘array’, ‘pandas’ or ‘dict’ 
- Returns
- features according to the given type 
- Raises
- ValueError – provided type not allowed 
 
 - getLabelsAs(type: Literal['array']) → ndarray[Any, dtype[Any]]
- getLabelsAs(type: Literal['pandas']) → DataFrame
- getLabelsAs(type: Literal['dict']) → Dict[Union[str, int], LabType]
- getLabelsAs(type: Literal['list']) → List[LabType]
- getLabelsAs(type: Literal['lazy']) → Iterator[LabType]
- Get labels as a NumPy array, a pandas DataFrame or a dictionary.
- Parameters
- type – str, default is ‘array’, can be ‘array’, ‘pandas’ or ‘dict’ 
- Returns
- labels according to the given type 
- Raises
- ValueError – provided type not allowed 
 
 - property index: Index
- Get the dataset index.
- Returns
- pd.Index 
 
 - intersection() → TPandasDataset
- Intersect feature and label indices.
- Returns
- PandasDataset with features and labels restricted to their common indices
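Intersecting indices keeps only the records that have both features and labels. With pandas this could be done with `Index.intersection` followed by `.loc`; the stdlib sketch below uses dictionaries keyed by index as a stand-in for index-aligned DataFrames. `intersect_indices` is hypothetical, and preserving the order of the feature index is an assumption.

```python
from typing import Any, Dict, Hashable, Tuple


def intersect_indices(
    features: Dict[Hashable, Any], labels: Dict[Hashable, Any]
) -> Tuple[Dict[Hashable, Any], Dict[Hashable, Any]]:
    """Keep only the index entries present in both features and labels."""
    # Walk the feature index so the result follows the feature order
    common = [idx for idx in features if idx in labels]
    return (
        {idx: features[idx] for idx in common},
        {idx: labels[idx] for idx in common},
    )


feats, labs = intersect_indices(
    {"a": [1], "b": [2], "c": [3]},  # features indexed by a, b, c
    {"c": 1, "a": 0, "d": 1},  # labels indexed by c, a, d
)
print(feats, labs)  # {'a': [1], 'c': [3]} {'a': 0, 'c': 1}
```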
 
 - property items: Iterator[Sample[FeatType, LabType]]
- Get the dataset as an iterator of Samples.
- Yield
- iterator of py4ai.data.model.ml.Sample objects
 
 - property labels: DataFrame
- Get labels as a pandas DataFrame.
- Returns
- pd.DataFrame 
 
 - lazy_type
- alias of LazyDataset
 - loc(idx: List[Any]) → TPandasDataset
- Select the given indices from features and labels.
- Parameters
- idx – input indices 
- Returns
- PandasDataset with features and labels filtered on input indices 
 
 - takeAsPandas(n: int) → TPandasDataset
- Return the first n records as a PandasDataset.
- Parameters
- n – number of records to return 
- Returns
- PandasDataset of length n
 
 - union(other: TPandasDataset) → TPandasDataset
- Return the union of this PandasDataset and another one.
- Parameters
- other – Dataset to be merged 
- Returns
- Dataset resulting from the merge 
 
 
- class py4ai.data.model.ml.PandasTimeIndexedDataset(*args, **kwargs)
- Bases: PandasDataset[FeatType, LabType], Generic[FeatType, LabType]
- Class to be used for datasets that have time-indexed samples.
- Return a data structure built on top of pandas DataFrames that packs time-indexed features and labels together.
- Features and labels can be obtained as a pandas DataFrame, a NumPy array or a dictionary. For unsupervised learning tasks the labels are left as None.
- Parameters
- features – pandas dataframe/series where index elements are dates in string format 
- labels – pandas dataframe/series where index elements are dates in string format 
 
 
- class py4ai.data.model.ml.Sample(features: FeatType, label: Optional[LabType] = None, name: Optional[Union[int, str, Any]] = None)
- Bases: DillSerialization, Generic[FeatType, LabType]
- Base class for representing a sample/observation.
- Return an object representing a single sample of a training or test set.
- Parameters
- features – features of the sample 
- label – labels of the sample (optional) 
- name – id of the sample (optional) 
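The constructor signature above maps naturally onto a generic dataclass. `SketchSample` is a hypothetical stand-in that mirrors only the documented parameters (features, optional label, optional name); it omits the DillSerialization base and anything else py4ai may add.

```python
from dataclasses import dataclass
from typing import Any, Dict, Generic, List, Optional, TypeVar, Union

FeatType = TypeVar("FeatType")
LabType = TypeVar("LabType")


@dataclass
class SketchSample(Generic[FeatType, LabType]):
    """Hypothetical stand-in mirroring the documented Sample constructor."""

    features: FeatType
    label: Optional[LabType] = None
    name: Optional[Union[int, str, Any]] = None


# Supervised sample: dict features, integer label, string id
labelled: SketchSample[Dict[str, float], int] = SketchSample(
    {"x": 1.0, "y": 2.0}, label=1, name="row-0"
)
# Unsupervised sample: label and name left as None
unlabelled: SketchSample[List[float], int] = SketchSample([0.5, 0.25])
print(labelled.name, labelled.label, unlabelled.label)  # row-0 1 None
```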
 
 
- py4ai.data.model.ml.features_and_labels_to_dataset(X: Union[DataFrame, Series], y: Optional[Union[DataFrame, Series]] = None) → CachedDataset[Dict[Any, Any], int]
- Pack features and labels into a CachedDataset.
- Parameters
- X – features which can be a pandas dataframe or a pandas series object 
- y – labels which can be a pandas dataframe or a pandas series object 
 
- Returns
- an instance of py4ai.data.model.ml.CachedDataset
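A sketch of the packing step, assuming each row of `X` becomes the features of one sample, named after its index, with the label taken from the matching entry of `y` (or None when `y` is omitted). Plain mappings stand in for the DataFrame and Series; with real pandas objects the rows could be obtained via `DataFrame.iterrows()`. `PackedSample` and `pack` are hypothetical names, not part of py4ai.

```python
from dataclasses import dataclass
from typing import Any, Dict, Hashable, List, Mapping, Optional


@dataclass
class PackedSample:
    """Hypothetical stand-in for Sample[Dict[Any, Any], int]."""

    features: Dict[Any, Any]
    label: Optional[int] = None
    name: Optional[Hashable] = None


def pack(
    X: Mapping[Hashable, Dict[Any, Any]],
    y: Optional[Mapping[Hashable, int]] = None,
) -> List[PackedSample]:
    """Pair each feature row with the label sharing its index."""
    return [
        PackedSample(
            features=dict(row),  # copy so samples do not alias the input rows
            label=None if y is None else y[idx],  # assumes y covers every index
            name=idx,
        )
        for idx, row in X.items()
    ]


X = {"r1": {"age": 30, "income": 1.5}, "r2": {"age": 41, "income": 2.0}}
y = {"r1": 0, "r2": 1}
dataset = pack(X, y)
print([(s.name, s.label) for s in dataset])  # [('r1', 0), ('r2', 1)]
```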