malt.data.dataset.Dataset

class malt.data.dataset.Dataset(molecules: Optional[List] = None)[source]

Bases: torch.utils.data.dataset.Dataset

A collection of Molecules with utilities for training and optimization.

Parameters: molecules (List[malt.Molecule]) – A list of Molecules.

featurize(molecules): Featurize all molecules in the dataset.

view()[source]: Generate a torch.utils.data.DataLoader from this Dataset.

__init__(molecules: Optional[List] = None) → None[source]

Methods

`__init__`([molecules])
`append`(molecule)	Append a molecule to the dataset.
`apply`(function)	Apply a function to all molecules in the dataset.
`batch`(*args, **kwargs)
`clone`()	Return a copy of self.
`erase_annotation`()	Erase the metadata.
`featurize_all`()	Featurize all molecules in dataset.
`shuffle`([seed])	Shuffle the dataset and return it.
`split`(partition)	Split the dataset according to some partition.
`view`([collate_fn, by])	Provide a data loader from this dataset.

Attributes

`lookup`	Returns the mapping between the SMILES and the molecule.
`smiles`	Return the list of SMILES strings in the dataset.

append(molecule)[source]

Append a molecule to the dataset.

Alias of append for molecules.

Note

This method appends in place.

Parameters: molecule (Molecule) – The molecule to be appended.

apply(function)[source]

Apply a function to all molecules in the dataset.

Parameters: function (Callable) – The function to be applied to all molecules in this dataset in place.

Examples

>>> from malt.molecule import Molecule
>>> molecule = Molecule("CC")
>>> dataset = Dataset([molecule])
>>> fn = lambda molecule: Molecule(
...     smiles=molecule.smiles, metadata={"name": "john"},
... )
>>> dataset = dataset.apply(fn)
>>> dataset[0]["name"]
'john'
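
The pattern in the example above can be sketched without malt installed. The classes below (`SketchMolecule`, `SketchDataset`) are hypothetical stand-ins, not malt's implementation; the sketch assumes that `apply` replaces each molecule with the function's return value and returns the dataset, which is what the doctest suggests.

```python
from typing import Callable, List, Optional


class SketchMolecule:
    """Hypothetical stand-in for malt.Molecule: a SMILES string plus metadata."""

    def __init__(self, smiles: str, metadata: Optional[dict] = None):
        self.smiles = smiles
        self.metadata = dict(metadata or {})

    def __getitem__(self, key: str):
        # Index into metadata, mirroring dataset[0]["name"] in the doctest.
        return self.metadata[key]


class SketchDataset:
    """Hypothetical stand-in for Dataset; not malt's implementation."""

    def __init__(self, molecules: Optional[List[SketchMolecule]] = None):
        self.molecules = list(molecules or [])

    def __getitem__(self, idx: int) -> SketchMolecule:
        return self.molecules[idx]

    def apply(self, function: Callable) -> "SketchDataset":
        # Assumption: each molecule is replaced by function(molecule), in place.
        self.molecules = [function(m) for m in self.molecules]
        return self


dataset = SketchDataset([SketchMolecule("CC")])
dataset = dataset.apply(
    lambda m: SketchMolecule(smiles=m.smiles, metadata={"name": "john"})
)
name = dataset[0]["name"]
```

Because the function returns a new molecule, any metadata not copied over explicitly is lost in this sketch.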

clone()[source]: Return a copy of self.

erase_annotation()[source]: Erase the metadata.

featurize_all()[source]: Featurize all molecules in dataset.

property lookup: Returns the mapping between the SMILES and the molecule.

shuffle(seed=None)[source]: Shuffle the dataset and return it.
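
As a rough illustration of seeded shuffling, the sketch below shuffles a plain list of SMILES strings in place and returns it. The helper `shuffle_in_place` is hypothetical, and the assumption that a fixed seed yields a repeatable order is not confirmed by this page for malt's own `shuffle`.

```python
import random
from typing import List, Optional


def shuffle_in_place(items: List[str], seed: Optional[int] = None) -> List[str]:
    """Shuffle items in place and return the same list (hypothetical helper)."""
    rng = random.Random(seed)  # local generator; global random state is untouched
    rng.shuffle(items)
    return items


original = ["C", "CC", "CCC", "CCO", "CO"]
first = list(original)
second = list(original)
returned = shuffle_in_place(first, seed=2666)
shuffle_in_place(second, seed=2666)
```

With the same seed, `first` and `second` end up in the same order, which is useful when a split needs to be reproduced later.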

property smiles: Return the list of SMILE strings in the datset.

split(partition)[source]

Split the dataset according to some partition.

Parameters: partition (Sequence[Union[int, float]]) – Splitting partition.
Returns: List of datasets split according to the partition.
Return type: List[Dataset]

Examples

>>> dataset = Dataset([Molecule("CC"), Molecule("C")])
>>> dataset0, dataset1 = dataset.split([1, 1])
>>> dataset0[0].smiles
'CC'
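
One plausible reading of the example above is that the partition entries are relative weights. The sketch below works on a plain list under that assumption; `split_by_partition` is a hypothetical helper, and malt's `split` may round or order its chunks differently.

```python
from typing import List, Sequence, Union


def split_by_partition(
    items: Sequence, partition: Sequence[Union[int, float]]
) -> List[list]:
    """Split items into consecutive chunks sized in proportion to partition.

    Hypothetical helper; not taken from malt.
    """
    total = sum(partition)
    if total <= 0:
        raise ValueError("partition must sum to a positive number")
    n = len(items)
    # Cumulative boundaries, computed from the running weight to limit rounding drift.
    boundaries = [0]
    running = 0.0
    for weight in partition:
        running += weight
        boundaries.append(int(round(n * running / total)))
    return [list(items[a:b]) for a, b in zip(boundaries[:-1], boundaries[1:])]


smiles = ["CC", "C", "CCC", "CCO", "CO"]
train, valid, test = split_by_partition(smiles, [3, 1, 1])
halves = split_by_partition(["CC", "C"], [1, 1])
```

Here `[3, 1, 1]` and `[0.6, 0.2, 0.2]` would give the same chunks, since only the ratios matter in this sketch.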

view(collate_fn: Optional[Callable] = None, by: Union[Iterable, str] = ['g', 'y'], *args, **kwargs)[source]

Provide a data loader from this dataset.

Parameters

collate_fn (Optional[Callable]) – The function used to collate molecules into a batch.
assay (Union[None, str]) – Batch data from molecules, using the provided key to filter metadata.
by (Union[Iterable, str]) – Defaults to ['g', 'y'].

Returns

Resulting data loader.

Return type

torch.utils.data.DataLoader
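
To illustrate what a custom `collate_fn` might look like, the sketch below gathers named attributes from a batch of molecules. It is an assumption, not malt's behavior, that `by` names attributes on each molecule (for example `g` for a graph and `y` for a measured value); `make_collate_fn` and the stand-in molecules are hypothetical.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple, Union


def make_collate_fn(by: Union[Iterable[str], str] = ("g", "y")):
    """Build a collate function that gathers the attributes named in `by`.

    Hypothetical helper: the meaning of `by` is assumed, not taken from malt.
    """
    single = isinstance(by, str)
    keys = [by] if single else list(by)

    def collate_fn(molecules: List) -> Union[list, Tuple[list, ...]]:
        # One list per key, each holding that attribute for every molecule.
        columns = tuple([getattr(m, key) for m in molecules] for key in keys)
        return columns[0] if single else columns

    return collate_fn


# Stand-in molecules: `g` would be a graph and `y` a measured value.
batch = [
    SimpleNamespace(g="graph-CC", y=0.5),
    SimpleNamespace(g="graph-C", y=1.5),
]
graphs, targets = make_collate_fn()(batch)
only_targets = make_collate_fn("y")(batch)
```

A function of this shape could be passed as the `collate_fn` argument of `torch.utils.data.DataLoader`, which calls it with the list of samples that make up each batch.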