malt.data.dataset.Dataset
- class malt.data.dataset.Dataset(molecules: Optional[List] = None)
 Bases: torch.utils.data.dataset.Dataset
 A collection of Molecules with functionality that makes it compatible with training and optimization.
- Parameters
 molecules (List[malt.Molecule]) – A list of Molecules.
- featurize(molecules)
 Featurize all molecules in the dataset.
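Because `Dataset` subclasses `torch.utils.data.Dataset`, it is a map-style dataset: with its default samplers, PyTorch's `DataLoader` reads samples through `len()` and integer indexing. The sketch below shows that protocol in plain Python, with no dependency on malt or torch. `ToyMolecule` and `ToyDataset` are hypothetical stand-ins, not malt's classes. The `molecules=None` default mirrors the documented constructor signature.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ToyMolecule:
    """Hypothetical stand-in for malt.Molecule: a SMILES string plus metadata."""

    smiles: str
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __getitem__(self, key: str) -> Any:
        # Read a metadata entry, so that molecule["name"] works.
        return self.metadata[key]


class ToyDataset:
    """Map-style collection: DataLoader's default samplers use __len__ and __getitem__."""

    def __init__(self, molecules: Optional[List[ToyMolecule]] = None):
        # None becomes a fresh empty list, so instances never share one list.
        self.molecules = [] if molecules is None else molecules

    def __len__(self) -> int:
        return len(self.molecules)

    def __getitem__(self, idx: int) -> ToyMolecule:
        return self.molecules[idx]


dataset = ToyDataset([ToyMolecule("CC"), ToyMolecule("C")])
print(len(dataset), dataset[1].smiles)  # 2 C
```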
Methods
__init__([molecules])
append(molecule) – Append a molecule to the dataset.
apply(function) – Apply a function to all molecules in the dataset.
batch(*args, **kwargs)
clone() – Return a copy of self.
Erase the metadata.
Featurize all molecules in dataset.
shuffle([seed]) – Shuffle the dataset and return it.
split(partition) – Split the dataset according to some partition.
view([collate_fn, by]) – Provide a data loader from portfolio.
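The `shuffle([seed])` entry says the dataset is shuffled and returned. Below is a minimal sketch of seeded shuffling in plain Python, assuming the shuffle happens in place. `shuffle_in_place` is a hypothetical helper that operates on a list of SMILES strings. This page does not say whether malt draws its randomness from `random`, NumPy, or torch.

```python
import random
from typing import List, Optional


def shuffle_in_place(items: List[str], seed: Optional[int] = None) -> List[str]:
    """Shuffle items in place and return the same list object."""
    # A private Random instance: the seed fixes the order without touching global state.
    rng = random.Random(seed)
    rng.shuffle(items)
    return items


first = ["C", "CC", "CCC", "CCCC", "CCO"]
second = list(first)
# The same seed gives the same permutation on identical inputs.
assert shuffle_in_place(first, seed=0) == shuffle_in_place(second, seed=0)
print(first)
```

Passing `seed=None` seeds the generator from system entropy, so the order then differs between runs.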
Attributes
lookup – Returns the mapping between SMILES strings and molecules.
smiles – Return the list of SMILES strings in the dataset.
- append(molecule)
 Append a molecule to the dataset.
Alias of append for molecules.
Note
This appends in place.
- Parameters
 molecule (malt.Molecule) – The molecule to be appended.
- apply(function)
 Apply a function to all molecules in the dataset.
- Parameters
 function (Callable) – The function to be applied to all molecules in this dataset in place.
Examples
>>> from malt import Molecule
>>> molecule = Molecule("CC")
>>> dataset = Dataset([molecule])
>>> fn = lambda molecule: Molecule(
...     smiles=molecule.smiles, metadata={"name": "john"},
... )
>>> dataset = dataset.apply(fn)
>>> dataset[0]["name"]
'john'
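The doctest reassigns `dataset = dataset.apply(fn)`, yet the parameter description says the function is applied in place. The sketch below is consistent with both. It replaces each molecule with `function(molecule)` on the dataset itself and then returns `self`. That `apply` returns the dataset is an assumption inferred from the example. `ToyMolecule` and `ToyDataset` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToyMolecule:
    smiles: str
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __getitem__(self, key: str) -> Any:
        # Metadata lookup, as in dataset[0]["name"].
        return self.metadata[key]


class ToyDataset:
    def __init__(self, molecules: List[ToyMolecule]):
        self.molecules = molecules

    def __getitem__(self, idx: int) -> ToyMolecule:
        return self.molecules[idx]

    def apply(self, function: Callable[[ToyMolecule], ToyMolecule]) -> "ToyDataset":
        # Replace every molecule with function(molecule) on this dataset,
        # then return self so that `dataset = dataset.apply(fn)` also works.
        self.molecules = [function(molecule) for molecule in self.molecules]
        return self


dataset = ToyDataset([ToyMolecule("CC")])
fn = lambda m: ToyMolecule(smiles=m.smiles, metadata={"name": "john"})
result = dataset.apply(fn)
print(result[0]["name"])  # john
```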
- property lookup
 Returns the mapping between the SMILES and the molecule.
- property smiles
 Return the list of SMILES strings in the dataset.
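Both properties can be derived from the molecule list. The sketch below recomputes them on every access. This page does not say whether malt caches them or how it handles duplicate SMILES strings, so the "last one wins" behavior of this hypothetical `ToyDataset` is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyMolecule:
    smiles: str


class ToyDataset:
    def __init__(self, molecules: List[ToyMolecule]):
        self.molecules = molecules

    @property
    def smiles(self) -> List[str]:
        # One SMILES string per molecule, in dataset order.
        return [molecule.smiles for molecule in self.molecules]

    @property
    def lookup(self) -> Dict[str, ToyMolecule]:
        # Map each SMILES string to its molecule; if a SMILES string appears
        # twice, the later molecule overwrites the earlier one.
        return {molecule.smiles: molecule for molecule in self.molecules}


dataset = ToyDataset([ToyMolecule("CC"), ToyMolecule("C")])
print(dataset.smiles)       # ['CC', 'C']
print(dataset.lookup["C"])  # ToyMolecule(smiles='C')
```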
- split(partition)
 Split the dataset according to some partition.
- Parameters
 partition (Sequence[Union[int, float]]) – Splitting partition, with one entry per resulting dataset.
- Returns
 List of datasets split according to the partition.
- Return type
 List[Dataset]
Examples
>>> from malt import Molecule
>>> dataset = Dataset([Molecule("CC"), Molecule("C")])
>>> dataset0, dataset1 = dataset.split([1, 1])
>>> dataset0[0].smiles
'CC'
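One plausible reading of `partition`, consistent with the example above, treats its entries as relative sizes. Under that reading, `[1, 1]` halves the dataset, `[8, 1, 1]` gives an 80/10/10 split, and molecule order is preserved. The sketch below implements this reading on a plain list. The rounding rule and the helper name `split_by_partition` are assumptions, not malt's code.

```python
from typing import List, Sequence, TypeVar, Union

T = TypeVar("T")


def split_by_partition(
    items: Sequence[T], partition: Sequence[Union[int, float]]
) -> List[List[T]]:
    """Cut items into consecutive chunks sized in proportion to partition."""
    total = float(sum(partition))
    n = len(items)
    # Cumulative cut points; for [1, 1] over 2 items they are [0, 1, 2].
    bounds = [0]
    running = 0.0
    for weight in partition:
        running += weight
        bounds.append(int(round(n * running / total)))
    bounds[-1] = n  # guard against rounding dropping the last items
    return [list(items[start:end]) for start, end in zip(bounds[:-1], bounds[1:])]


print(split_by_partition(["CC", "C"], [1, 1]))  # [['CC'], ['C']]
print([len(c) for c in split_by_partition(list(range(10)), [8, 1, 1])])  # [8, 1, 1]
```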
- view(collate_fn: Optional[Callable] = None, by: Union[Iterable, str] = ['g', 'y'], *args, **kwargs)
 Provide a data loader from portfolio.
- Parameters
 collate_fn (Optional[Callable]) – The function to gather data molecules.
assay (Union[None, str]) – Batch data from molecules using key provided to filter metadata.
by (Union[Iterable, str])
- Returns
 Resulting data loader.
- Return type
 torch.utils.data.DataLoader
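`view` returns a `torch.utils.data.DataLoader`, and `by` defaults to `['g', 'y']`. The sketch below shows what a collate function keyed on `by` might do: for each key, it gathers the values from every record in a batch. Treating each record as a dict, and reading `'g'` as a graph and `'y'` as a target, are assumptions, because this page does not document `by`. `collate_by` is a hypothetical name.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple, Union


def collate_by(
    records: Sequence[Dict[str, Any]],
    by: Union[Iterable[str], str] = ("g", "y"),
) -> Union[List[Any], Tuple[List[Any], ...]]:
    """Gather the values stored under each key in `by` across a batch of records."""
    if isinstance(by, str):
        # A single key yields a single list.
        return [record[by] for record in records]
    # Several keys yield one list per key, in the order given.
    return tuple([record[key] for record in records] for key in by)


batch = [{"g": "graph-CC", "y": 1.0}, {"g": "graph-C", "y": 0.5}]
graphs, labels = collate_by(batch)
print(graphs, labels)  # ['graph-CC', 'graph-C'] [1.0, 0.5]
```

A function like this can be passed as `collate_fn=` to `torch.utils.data.DataLoader`, which calls it on each list of sampled items. The tuple default avoids the shared mutable default that a list literal would create.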