Everyone's voice is unique; this is not merely a conceit of artistic practitioners. Science is catching up to the amazing precision of human hearing in identifying and decoding speech, even as telephony, Internet streaming, and budget audio gear degrade our sound and hearing.
This article recounts an anecdote involving MP3, which is optimized in the spectral domain. AFAIK, there are no lossy algorithms that maintain phase information, which factors heavily in speech intelligibility and REAL spatial audio. (Always use FLAC or another lossless codec!)
99.9% of "stereo", "surround", "spatial", and "immersive" content is produced through panning, which yields an artificial, learned delusion of spatial information. A few games and VR/AR experiments use Ambisonics, a true mathematical representation of a 3D sound field, as an input, but there is still no valid method of coupling that information to ears!
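
To make the panning vs. Ambisonics distinction concrete, here is a minimal Python sketch. It is my own illustration, not anything from the article: the function names are made up, the pan law shown is the common constant-power (sine/cosine) law, and the encoder assumes the traditional first-order B-format convention (W scaled by 1/sqrt(2), azimuth measured counterclockwise from straight ahead). Other conventions exist and scale the channels differently.

```python
import math

def constant_power_pan(sample, pan):
    """Pan a mono sample between two speakers.

    pan runs from -1.0 (hard left) to +1.0 (hard right).
    Hypothetical helper using the sine/cosine constant-power law.
    """
    theta = (pan + 1.0) * math.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    left = sample * math.cos(theta)
    right = sample * math.sin(theta)
    return left, right

def encode_first_order(sample, azimuth, elevation):
    """Encode a mono sample as first-order B-format (W, X, Y, Z).

    Angles are in radians. Assumes the traditional convention:
    W carries the omnidirectional part scaled by 1/sqrt(2),
    X points front, Y points left, Z points up.
    """
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(azimuth) * math.cos(elevation)
    y = sample * math.sin(azimuth) * math.cos(elevation)
    z = sample * math.sin(elevation)
    return w, x, y, z

# A source 90 degrees to the left, at ear height:
left, right = constant_power_pan(1.0, -1.0)
w, x, y, z = encode_first_order(1.0, math.pi / 2.0, 0.0)
```

The panned result is nothing but a level difference between two channels, while the B-format result describes a direction in the sound field. It still has to be decoded to speakers or headphones before anyone hears it, which is exactly the coupling problem.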