My master’s thesis explored whether machine learning can compress Gaussian basis sets for ab initio density-functional theory (DFT) calculations without sacrificing accuracy.
- Introduced a projection-based loss that aligns a compact, optimisable basis with a larger reference set, enabling joint optimisation of exponents and contraction coefficients with gradient-based methods via automatic differentiation.
- Evaluated the learned bases across diverse small molecules, noting consistent energy-error reductions for minimal and split-valence sets (especially STO-nG), while gains over modern polarised/augmented references remained modest.
- Analysed trends with molecule size and basis-family choice, compared optimisers and learning schedules, and catalogued failure modes such as overfitting, divergence, and missing polarisation/diffuse character.
- Discussed how the approach points toward data-driven, atom-centric basis design and integration into mixed-basis workflows for larger systems.
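The projection-based loss described above can be sketched as follows. This is a minimal illustration under assumptions of my own, not the thesis's actual implementation: it uses s-type radial Gaussians on a 1D grid, a hypothetical even-tempered reference set, and plain gradient descent on log-exponents, with contraction coefficients obtained implicitly by least-squares projection onto the compact basis.

```python
import jax
import jax.numpy as jnp

# Hypothetical radial grid and trapezoid-style quadrature weights
r = jnp.linspace(1e-3, 10.0, 400)
w = jnp.gradient(r)

def gaussians(exponents):
    """s-type radial Gaussians, L2-normalised on the grid (illustrative)."""
    g = jnp.exp(-exponents[:, None] * r[None, :] ** 2)
    norms = jnp.sqrt(jnp.sum(g**2 * w, axis=1, keepdims=True))
    return g / norms

# Assumed larger reference set (even-tempered exponents, for illustration)
ref = gaussians(jnp.array([0.1, 0.3, 1.0, 3.0, 10.0, 30.0]))

def projection_loss(log_exps):
    """Residual norm of the reference functions outside the span of the
    compact basis: project each reference function onto the compact set
    by least squares, then measure what is left over."""
    b = gaussians(jnp.exp(log_exps))   # compact basis, shape (n_c, n_r)
    S = (b * w) @ b.T                  # overlap matrix of the compact set
    P = (ref * w) @ b.T                # overlaps <ref_i | b_j>
    coef = jnp.linalg.solve(S, P.T)    # projection coefficients
    resid = ref - coef.T @ b           # part of ref not representable
    return jnp.sum(resid**2 * w)

# Optimise the log-exponents of a 3-function compact set by gradient descent
init = jnp.log(jnp.array([0.2, 2.0, 20.0]))
loss0 = projection_loss(init)
params = init
loss_grad = jax.value_and_grad(projection_loss)
for _ in range(200):
    loss, g = loss_grad(params)
    params = params - 0.1 * g
```

Working in log-exponent space keeps the exponents positive without constraints, which is one common way to make this optimisation well-behaved; the thesis's actual parameterisation and optimiser may differ.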
The study highlights both the promise and the current limitations of ML-driven basis reduction, motivating richer datasets and hybrid strategies.