Spectral denoising: electronic denoising

The electronic_denoising function removes obvious electronic noise ions from MS/MS spectra. Such noise usually appears as multiple ions with identical intensities ("grass noise").

Empirical testing on the NIST23 database shows that, within a given spectrum, it is extremely unlikely (< 0.05%) for more than 4 ions to share an identical intensity. The electronic_denoising function therefore removes ions whose intensity is shared by more than 4 ions in the spectrum.
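The rule above can be sketched in a few lines of NumPy. This only illustrates the stated threshold and is not the library's implementation; the helper name drop_repeated_intensities and its max_repeats parameter are hypothetical.

```python
import numpy as np

def drop_repeated_intensities(msms, max_repeats=4):
    """Illustrative only: drop every peak whose exact intensity value
    occurs more than `max_repeats` times in the spectrum."""
    msms = np.asarray(msms, dtype=float)
    # Count how often each distinct intensity value occurs.
    _, inverse, counts = np.unique(
        msms[:, 1], return_inverse=True, return_counts=True
    )
    # Keep a peak only if its intensity is shared by at most max_repeats peaks.
    keep = counts[inverse] <= max_repeats
    return msms[keep]

# Three signal peaks plus five "grass" ions that share one intensity.
signal = [[69.071, 7.9], [86.066, 1.0], [86.097, 100.0]]
grass = [[mz, 2.0] for mz in (20.1, 33.4, 41.7, 55.2, 60.9)]
spectrum = np.array(signal + grass)

cleaned = drop_repeated_intensities(spectrum)
print(cleaned)  # only the three signal peaks remain
```

Comparing intensities for exact equality is deliberate in this sketch, since the noise is described as having identical intensities rather than merely similar ones.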

Here’s an example:

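import numpy as np
import spectral_denoising as sd
from spectral_denoising.noise import generate_noise, add_noise

# Clean spectrum: each row is [m/z, intensity]
peak = np.array([[69.071, 7.917962], [86.066, 1.021589], [86.0969, 100.0]], dtype=np.float32)
pmz = 91  # upper bound of the mass range for the noise ions

# Generate 50 synthetic electronic noise ions and add them to the clean spectrum
noise = generate_noise(pmz, lamda=10, n=50)
peak_with_noise = add_noise(peak, noise)

# Denoise the contaminated spectrum, then compare both versions to the clean one
peak_denoised = sd.electronic_denoising(peak_with_noise)
print(f'Entropy similarity of spectra with noise: {sd.entropy_similairty(peak_with_noise, peak, pmz):.2f}.')
print(f'Entropy similarity of spectra after denoising: {sd.entropy_similairty(peak_denoised, peak, pmz):.2f}.')

The output will look like this (the noise is random, so exact values vary between runs):

Entropy similarity of spectra with noise: 0.37.
Entropy similarity of spectra after denoising: 1.00.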

References

spectral_denoising.electronic_denoising(msms)[source]

Perform electronic denoising on a given mass spectrometry (MS/MS) spectrum. This function processes the input MS/MS spectrum by sorting the peaks based on their intensity, and then iteratively selects and confirms peaks based on a specified intensity threshold. The confirmed peaks are then packed and sorted before being returned.

Parameters:

msms (np.ndarray): The MS/MS spectrum as an array of peaks. For each peak, the first item is always m/z and the second item is intensity.

Returns:

np.ndarray: The cleaned spectrum with electronic noise removed. If no ions are present, np.nan is returned instead.
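Because the return value may be np.nan rather than an array, callers may want to check the result before indexing into it. A minimal sketch of such a guard follows; has_peaks is a hypothetical helper and not part of the library.

```python
import numpy as np

def has_peaks(spectrum):
    """Return True for a non-empty peak array, False for np.nan
    (or any other float scalar) and for an empty array."""
    if isinstance(spectrum, float):  # np.nan is a plain Python float
        return False
    return np.asarray(spectrum).size > 0

print(has_peaks(np.nan))                     # False
print(has_peaks(np.array([[69.071, 7.9]])))  # True
```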

spectral_denoising.noise.generate_noise(pmz, lamda, n=100)[source]

Generate synthetic electronic noise for spectral data.

Parameters:

pmz (float): The upper bound for the mass range.

lamda (float): The lambda parameter for the Poisson distribution, which serves as both the mean and the variance of the distribution.

n (int, optional): The number of random noise ions to generate. Defaults to 100.

Returns:

np.array: A synthetic spectrum with electronic noise.
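To make the parameters concrete, here is a minimal sketch of what such a generator might look like. It is not the library's code: the name sketch_noise, the uniform m/z sampling between a hypothetical lower bound mz_min and pmz, and the seed argument are all assumptions. Only the Poisson-distributed intensities and the roles of pmz, lamda, and n come from the description above.

```python
import numpy as np

def sketch_noise(pmz, lamda, n=100, mz_min=50.0, seed=None):
    """Illustrative noise generator (assumptions: uniform m/z in
    [mz_min, pmz], raw Poisson counts as intensities)."""
    rng = np.random.default_rng(seed)
    mz = rng.uniform(mz_min, pmz, size=n)       # assumed m/z distribution
    intensity = rng.poisson(lam=lamda, size=n)  # integer counts, so ties are common
    noise = np.column_stack([mz, intensity]).astype(float)
    # Sort by m/z; no rescaling to a relative scale is attempted here.
    return noise[np.argsort(noise[:, 0])]

noise = sketch_noise(pmz=91, lamda=10, n=50, seed=0)
print(noise.shape)  # (50, 2)
```

Since Poisson samples are integers clustered around lamda, many of the generated ions share an intensity, which is the pattern electronic_denoising is described as removing.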

spectral_denoising.noise.generate_chemical_noise(pmz, lamda, polarity, formula_db, n=100)[source]

Generate chemical noise for a given mass-to-charge ratio (m/z) and other parameters. The m/z values of the chemical noise are drawn from a database of all true possible mass values. Details about this database can be found in the paper "LibGen: Generating High Quality Spectral Libraries of Natural Products for EAD-, UVPD-, and HCD-High Resolution Mass Spectrometers".

Args:

pmz (float): The target mass-to-charge ratio (m/z) value.

lamda (float): The lambda parameter for the Poisson distribution used to generate intensities, which serves as both the mean and the variance of the distribution.

polarity (str): The polarity of the adduct, either ‘+’ or ‘-’.

formula_db (pandas.DataFrame): A DataFrame containing a column ‘mass’ with possible mass values.

n (int, optional): The number of noise peaks to generate. Default is 100.

Returns:

np.array: A synthetic spectrum with chemical noise.

Raises:

ValueError: If the polarity is not ‘+’ or ‘-’.

spectral_denoising.noise.add_noise(msms, noise)[source]

Add noise to a mass spectrum and process the resulting spectrum. This function takes a mass spectrum and a noise spectrum, standardizes the mass spectrum, adds the noise to it, normalizes the resulting spectrum, and sorts it.

Args:

msms (np.ndarray): The mass spectrum to which noise will be added.

noise (np.ndarray): The noise spectrum to be added to the mass spectrum.

Returns:

np.ndarray: The processed mass spectrum after adding noise, normalization, and sorting.

Notes:
  • The noise spectrum is generated with intensity as a relative measure (from 0 to 1).

  • Thus, the mass spectrum is standardized using the standardize_spectrum function.
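The steps described for add_noise can be sketched as follows. This illustration rests on assumptions and is not the library's code: sketch_add_noise is a hypothetical name, "standardize" is taken to mean scaling intensities so that the base peak equals 1, and "normalize" is taken to mean scaling the combined intensities so that they sum to 1.

```python
import numpy as np

def sketch_add_noise(msms, noise):
    """Illustrative only: standardize, merge, normalize, sort."""
    msms = np.asarray(msms, dtype=float).copy()
    noise = np.asarray(noise, dtype=float)
    # Assumed "standardize": base peak scaled to 1, so the spectrum
    # shares the 0-1 relative intensity scale of the noise.
    msms[:, 1] /= msms[:, 1].max()
    # Add the noise ions as extra peaks.
    combined = np.vstack([msms, noise])
    # Assumed "normalize": intensities scaled to sum to 1.
    combined[:, 1] /= combined[:, 1].sum()
    # Sort by m/z.
    return combined[np.argsort(combined[:, 0])]

peak = np.array([[69.071, 7.917962], [86.066, 1.021589], [86.0969, 100.0]])
noise = np.array([[55.0, 0.1], [75.0, 0.1]])  # relative intensities in 0-1
mixed = sketch_add_noise(peak, noise)
print(mixed[:, 0])  # m/z values in ascending order
```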