
Feature/remove duplicates - Identify and handle duplicate atoms

This is a work in progress and needs some discussion.

When reading experimental crystal structures (for example, .cif files from the Cambridge Structural Database), atoms (often H atoms) are frequently defined multiple times because of uncertainties in the experimental X-ray data. The ase.io.cif module reads all of these atoms, which leads to unphysical structures with overlapping atoms.

The proposed method (get_duplicate_atoms) efficiently identifies duplicate atoms within an Atoms object using a cutoff radius. It can also directly delete one set of duplicates. This is useful not only for the .cif issue but also in any other case where one needs to check whether atoms in a system are too close together.
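To make the idea concrete for discussion, here is a minimal sketch of cutoff-based duplicate detection on a plain array of Cartesian positions. It is not the code in this merge request: the function names `find_duplicate_pairs` and `remove_duplicates`, the default cutoff of 0.1, and the choice to drop the higher-indexed atom of each pair are all assumptions made for illustration, and periodic boundary conditions are ignored.

```python
import numpy as np
from scipy.spatial.distance import pdist


def find_duplicate_pairs(positions, cutoff=0.1):
    """Return an (n_pairs, 2) array of index pairs (i, j), i < j,
    whose Euclidean distance is below `cutoff` (hypothetical helper)."""
    positions = np.asarray(positions, dtype=float)
    n = len(positions)
    if n < 2:
        return np.empty((0, 2), dtype=int)
    # pdist returns the condensed distance vector, ordered like the
    # upper-triangle indices produced by np.triu_indices(n, k=1).
    dists = pdist(positions)
    i, j = np.triu_indices(n, k=1)
    mask = dists < cutoff
    return np.column_stack((i[mask], j[mask]))


def remove_duplicates(positions, cutoff=0.1):
    """Drop the higher-indexed atom of every close pair (assumed policy)."""
    positions = np.asarray(positions, dtype=float)
    pairs = find_duplicate_pairs(positions, cutoff)
    keep = np.ones(len(positions), dtype=bool)
    keep[pairs[:, 1]] = False
    return positions[keep]


# Two nearly overlapping atoms plus one well-separated atom.
pos = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.05], [1.0, 1.0, 1.0]]
print(find_duplicate_pairs(pos))  # index pairs closer than the cutoff
print(remove_duplicates(pos))     # positions with duplicates dropped
```

Note that pdist computes all n(n-1)/2 distances, so memory grows quadratically with the number of atoms. With this removal policy, a chain of atoms that are each within the cutoff of their neighbour collapses to the lowest-indexed atom.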

[Issue 1] - I added this method to ase.atoms because it felt like a natural addition to the existing Atoms methods. It could, of course, instead be added as a standalone routine that takes Atoms objects as input.

[Issue 2] - scipy.spatial.distance.pdist: I know there is a neighbour list calculator, but I don't understand why it is implemented in pure Python; it seems quite slow compared to alternatives. Unfortunately, I couldn't find any core guidelines on which imports are acceptable in which parts of ASE.
