malt.data.dataset.Dataset
- class malt.data.dataset.Dataset(molecules: Optional[List] = None)[source]
Bases: torch.utils.data.dataset.Dataset
A collection of Molecules with functionality for training and optimization.
- Parameters
molecules (List[malt.Molecule]) – A list of Molecules.
- featurize(molecules)
Featurize all molecules in the dataset.
Methods
__init__([molecules])
append(molecule) – Append a molecule to the dataset.
apply(function) – Apply a function to all molecules in the dataset.
batch(*args, **kwargs)
clone() – Return a copy of self.
Erase the metadata.
featurize(molecules) – Featurize all molecules in the dataset.
shuffle([seed]) – Shuffle the dataset and return it.
split(partition) – Split the dataset according to some partition.
view([collate_fn, by]) – Provide a data loader from portfolio.
Attributes
lookup – Returns the mapping between the SMILES and the molecule.
smiles – Return the list of SMILES strings in the dataset.
- append(molecule)[source]
Append a molecule to the dataset.
Alias of append for molecules.
Note
This appends in place.
- Parameters
molecule (Molecule) – The molecule to be appended.
- apply(function)[source]
Apply a function to all molecules in the dataset.
- Parameters
function (Callable) – The function to be applied to all molecules in this dataset in place.
Examples
>>> from malt import Molecule
>>> from malt.data.dataset import Dataset
>>> molecule = Molecule("CC")
>>> dataset = Dataset([molecule])
>>> fn = lambda molecule: Molecule(
...     smiles=molecule.smiles, metadata={"name": "john"},
... )
>>> dataset = dataset.apply(fn)
>>> dataset[0]["name"]
'john'
- property lookup
Returns the mapping between the SMILES and the molecule.
- property smiles
Return the list of SMILES strings in the dataset.
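The two properties can be read as the keys and the full contents of one mapping. The following is a minimal sketch, assuming the lookup behaves like a plain dictionary keyed by SMILES string. `Mol` and `build_lookup` are hypothetical stand-ins written for illustration; they are not `malt.Molecule` or part of the malt API.

```python
from dataclasses import dataclass, field


@dataclass
class Mol:
    """Hypothetical stand-in for malt.Molecule: a SMILES string plus metadata."""
    smiles: str
    metadata: dict = field(default_factory=dict)


def build_lookup(molecules):
    # Assumption: one entry per SMILES string; a later duplicate
    # overwrites an earlier one.
    return {mol.smiles: mol for mol in molecules}


molecules = [Mol("CC"), Mol("C", {"name": "methane"})]
smiles = [mol.smiles for mol in molecules]  # analogous to Dataset.smiles
lookup = build_lookup(molecules)            # analogous to Dataset.lookup
```

With a mapping of this shape, `lookup["C"]` retrieves the molecule object for a given SMILES string without scanning the list.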
- split(partition)[source]
Split the dataset according to some partition.
- Parameters
partition (Sequence[Union[int, float]]) – Splitting partition.
- Returns
List of datasets split according to the partition.
- Return type
List[Dataset]
Examples
>>> dataset = Dataset([Molecule("CC"), Molecule("C")])
>>> dataset0, dataset1 = dataset.split([1, 1])
>>> dataset0[0].smiles
'CC'
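One plausible reading of a partition such as `[1, 1]` is as a list of relative weights that fix the size of each consecutive chunk. The sketch below illustrates that reading on a plain list. It is an assumption about the semantics, not malt's implementation, and `split_by_partition` is a hypothetical helper.

```python
def split_by_partition(items, partition):
    """Split `items` into consecutive chunks sized in proportion to `partition`.

    Hypothetical helper: treats the partition entries as relative weights.
    """
    total = sum(partition)
    if total <= 0:
        raise ValueError("partition must have a positive sum")
    n = len(items)
    chunks, start, cumulative = [], 0, 0.0
    for weight in partition:
        cumulative += weight
        # The chunk boundary is the cumulative share of the data,
        # rounded to the nearest index.
        end = round(n * cumulative / total)
        chunks.append(items[start:end])
        start = end
    return chunks


first, second = split_by_partition(["CC", "C"], [1, 1])
train, valid, test = split_by_partition(list(range(10)), [0.8, 0.1, 0.1])
```

Computing boundaries from the cumulative weight, rather than rounding each chunk size separately, keeps the chunks contiguous and ensures every item lands in exactly one chunk.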
- view(collate_fn: Optional[Callable] = None, by: Union[Iterable, str] = ['g', 'y'], *args, **kwargs)[source]
Provide a data loader from portfolio.
- Parameters
collate_fn (Optional[Callable]) – The function to gather data molecules.
assay (Union[None, str]) – Batch data from molecules using key provided to filter metadata.
by (Union[Iterable, str]) – Defaults to ['g', 'y'].
- Returns
Resulting data loader.
- Return type
torch.utils.data.DataLoader
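A `torch.utils.data.DataLoader` calls its `collate_fn` with a list of samples and yields whatever that function returns. The sketch below shows a function of that shape in plain Python, under the assumption that `by` names attributes to gather from each molecule. `collate_by` and the sample objects are hypothetical, and reading `'g'` as a graph and `'y'` as a label is a guess that the source does not confirm.

```python
from types import SimpleNamespace


def collate_by(samples, by=("g", "y")):
    """Gather the attributes named in `by` from each sample into parallel lists.

    Hypothetical helper; not malt's collate function.
    """
    keys = [by] if isinstance(by, str) else list(by)
    return tuple([getattr(sample, key) for sample in samples] for key in keys)


# Stand-in samples: `g` is a placeholder for a graph, `y` for a label.
samples = [
    SimpleNamespace(g="graph-CC", y=0.5),
    SimpleNamespace(g="graph-C", y=1.5),
]
graphs, labels = collate_by(samples)
(only_labels,) = collate_by(samples, by="y")
```

A function with this call shape (a list of samples in, one batch out) is what `DataLoader` expects for `collate_fn`.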