HELP

The LIGYSIS web server is a way to access, visualise, and download the results of our pipeline for the analysis of ligand binding sites, LIGYSIS.

Exploring the LIGYSIS dataset

The LIGYSIS dataset comprises 64,782 ligand binding sites defined from 435,038 biologically relevant ligands binding to 25,003 proteins across 104,456 structures deposited in the PDBe. To explore the dataset, users can search for a protein of interest by its UniProt Accession Identifier or protein name.
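As an illustration of the accession-based search, the sketch below checks whether a string has the shape of a UniProt accession before it is used as a query. The pattern is the one published in the UniProt help pages; the function name `is_uniprot_accession` and the trimming and upper-casing of the input are assumptions made for this example, and whether the LIGYSIS search applies a similar check is not stated here.

```python
import re

# Accession pattern as documented by UniProt (6- or 10-character accessions).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(query: str) -> bool:
    """Return True if the query looks like a UniProt accession.

    Assumption: surrounding whitespace and letter case are ignored.
    """
    candidate = query.strip().upper()
    return UNIPROT_ACCESSION.fullmatch(candidate) is not None
```

A query that fails this check, e.g. a protein name, would be treated as a name search rather than as an accession.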

Submitting a job to the LIGYSIS web server

Additionally, users can submit their own set of structures for analysis to the LIGYSIS web server. The required input is the following:

UniProt Accession: the UniProt Accession Identifier of the protein the structures map to. This is necessary if the protein is present in UniProt. If the structures map to a protein not present in UniProt, this field can be left empty.
Structure files: the set of structure files to be analysed. LIGYSIS accepts files in either mmCIF (.cif) or PDB (.ent, .pdb) format. However, all files within the set must be in the same format. These structures must represent the same protein unit, e.g., a monomer or a dimer, bound to the ligands of interest. This is required for the correct superposition of the structures. There is currently no limit on the number of structures that can be submitted; however, the combined size of all input files cannot exceed 50MB. If you need to submit a larger set of structures, please contact us.

LIGYSIS only supports single protein-ligand complexes, i.e., homomultimers are accepted, but not heteromultimers. Files can be selected from the file browser or dragged and dropped into the designated area.
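The file requirements above can be checked locally before uploading. The following is a minimal sketch, not the server's own validation: the function name `check_submission`, the case-insensitive extension matching, and the reading of "50MB" as 50 × 1024 × 1024 bytes are all assumptions.

```python
from pathlib import PurePath

# Assumption: "50MB" is interpreted as 50 MiB; the server may count differently.
MAX_TOTAL_BYTES = 50 * 1024 * 1024

# Extensions named in the help text, grouped by format.
FORMATS = {".cif": "mmCIF", ".pdb": "PDB", ".ent": "PDB"}


def check_submission(files):
    """Check (filename, size_in_bytes) pairs against the stated input rules.

    Returns a list of human-readable problems; an empty list means the
    set passes these checks.
    """
    problems = []
    formats_seen = set()
    total_bytes = 0
    for name, size in files:
        suffix = PurePath(name).suffix.lower()
        file_format = FORMATS.get(suffix)
        if file_format is None:
            problems.append(f"{name}: unsupported extension {suffix!r}")
        else:
            formats_seen.add(file_format)
        total_bytes += size
    if len(formats_seen) > 1:
        problems.append("mixed formats: all files must be mmCIF or all PDB")
    if total_bytes > MAX_TOTAL_BYTES:
        problems.append(f"combined size of {total_bytes} bytes exceeds the limit")
    return problems
```

For files on disk, the pairs can be built with `[(p.name, p.stat().st_size) for p in pathlib.Path("my_structures").iterdir()]`, where `my_structures` is a hypothetical directory name. The sketch does not check that all structures represent the same protein unit, which has to be verified by the user.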

Results Page

This is the LIGYSIS results page. At the very top, the UniProt accession, entry and protein names can be found, as well as the number of chains, ligands and sites for this LIGYSIS entry. Below are the three main panels: Binding Sites (left), Structure (centre), and Binding Site Residues (right).

The Structure Panel (centre)

This is the main panel. It includes a 3Dmol.js structure viewer and the structure control buttons. There are 13 buttons.

SLICE: clicking on this button displays/hides the controls (near and far) to take slices of the current view between the two planes.
SURF: clicking on this button switches the visibility of the protein surfaces.
LABEL: clicking on this button switches the visibility of residue labels of a clicked binding site, displayed protein-ligand contacts or hovered residue.
LIGAND: clicking on this button switches the visibility of ligands (small molecules).
HOH: clicking on this button switches the visibility of water molecules.
CONTACT: clicking on this button switches the visibility of protein-ligand interactions. Ligands and interacting sidechains will be displayed as sticks, and dashed cylinders will connect them. The cylinders have hover-over information.
Download Current Assembly Contacts: clicking on this button will download a .csv file with the contacts of the current assembly as calculated by pdbe-arpeggio. This option is unavailable when viewing the superposition.
Save Image: click here to save the current state of the viewer to a .png image.
Spin: click here to start/stop a spin animation on the Y axis.
Download ALL Assemblies Contacts: clicking on this button will download a .zip file containing a .csv file with the contacts of each independent assembly as calculated by pdbe-arpeggio.
Download Current Assembly: clicking on this button will download a .zip file containing the necessary files to visualise the currently displayed assembly in ChimeraX (.cxc) or PyMol (.pml).
Download Superposition: clicking on this button will download a .zip file containing the necessary files to visualise the ligand superposition in either ChimeraX or PyMol.
Download ALL Assemblies: clicking on this button will download a .zip file containing a subdirectory for each assembly on this entry with the necessary files to visualise each assembly in ChimeraX or PyMol.

The Binding Site Panel (left)

This panel includes an interactive scatter plot generated with Chart.js, in which each data point represents a defined ligand binding site on the protein. Three utilities can be found below:

X-axis: this dropdown menu selects the variable to be plotted on the X-axis.
Y-axis: this dropdown menu selects the variable to be plotted on the Y-axis.
Save Image: clicking here saves the current graph to a .png image.

In addition, this panel includes a dynamic table showing the data underlying the graph. Each table row corresponds to a data point in the plot, i.e., to a ligand binding site on the protein. The columns are:

ID: the unique identifier of the binding site. This ID is arbitrarily assigned by the pipeline, i.e., lower/higher numbers do not have any meaning.
RSA: the relative solvent accessibility of the binding site. This is a measure of how exposed the binding site is to the solvent. It is calculated as the average of the relative solvent accessibility of the residues forming the binding site.
DS: the divergence score of the binding site. This is a measure of how variable in sequence the binding site is across homologues in a multiple sequence alignment (MSA). It is calculated as the average of the divergence score of the residues forming the binding site. This is a normalised version of the Shenkin divergence score.
MES: the missense enrichment score of the binding site. This is a measure of how constrained the binding site is across human homologues. It is calculated as the average of the missense enrichment score of the residues forming the binding site. This is an odds ratio (OR) calculated using the gnomAD database. For more details see MacGowan et al., 2017, Utgés et al., 2021, and MacGowan et al., 2024.
Size: the size of the binding site in amino acids (aa), i.e., how many amino acids form the binding site.
Cluster: the RSA cluster label predicted for the binding site. This label is obtained using an artificial neural network (ANN), specifically a multilayer perceptron (MLP). The label is predicted from the solvent accessibility profile of the site and is indicative of the likely functional state of the binding site (see Utgés et al., 2024). Cluster 1 is the most likely to be functional, while Cluster 4 is the least likely.
FS: the functional score of the binding site. The score ranges from 0.04 to 0.52. The higher the score, the more likely the binding site is to be functional. It is calculated based on the following formula:

`FS_(i) = sum_(j=1)^4 p_(i_j) f_(j)`

The functional score of binding site `i`, `FS_(i)`, is defined by `p_(i_j)`, which is the probability of binding site `i` belonging to cluster `j`. `p_(i_j)` is given by the outcome vector resulting from the MLP: `P_i`

`P_i = [p_(i_(C_(1))), p_(i_(C_(2))), p_(i_(C_(3))), p_(i_(C_(4)))]`

Where `p_(i_(C_(1)))` is the probability of binding site `i` belonging to cluster `C_1`, and so on.

Finally, `f_(j)` is the functional score of cluster `j`. The functional score of each cluster is calculated as the proportion of known functional sites, as annotated in UniProt, within that cluster.

The functional scores of the clusters are given by `F = [0.52, 0.18, 0.05, 0.04]` for `C_1`, `C_2`, `C_3`, and `C_4`, respectively. 13,000 human protein-ligand binding sites were used to calculate these proportions of functional sites within each cluster.

At the moment, this score is purely based on the RSA Cluster label. Future versions will also include evolutionary divergence, as well as missense variation information.
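The functional score formula can be sketched in a few lines. The cluster scores are the values of `F` given above; the function name `functional_score` and the check that the probabilities sum to 1 are assumptions made for this illustration, not part of the LIGYSIS code.

```python
# Functional scores of clusters C_1 to C_4, as given in the help text.
CLUSTER_SCORES = [0.52, 0.18, 0.05, 0.04]


def functional_score(probabilities):
    """Weighted sum of cluster scores, FS_i = sum_j p_(i_j) * f_j.

    `probabilities` is the MLP output vector P_i for one binding site,
    ordered as [C_1, C_2, C_3, C_4].
    """
    if len(probabilities) != len(CLUSTER_SCORES):
        raise ValueError("expected one probability per cluster")
    # Assumption: the MLP output is a probability vector summing to 1.
    if abs(sum(probabilities) - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")
    return sum(p * f for p, f in zip(probabilities, CLUSTER_SCORES))


# A site assigned mostly to Cluster 1 scores close to the top of the range:
# 0.7*0.52 + 0.2*0.18 + 0.05*0.05 + 0.05*0.04 is approximately 0.4045.
example = functional_score([0.7, 0.2, 0.05, 0.05])
```

Because the score is a weighted average of the four cluster scores, it reaches 0.52 only when the site is assigned to Cluster 1 with probability 1, and 0.04 only when it is assigned to Cluster 4 with probability 1, which explains the stated range.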

The Binding Site Residues Panel (right)

This panel also includes an interactive scatter plot generated with Chart.js, but now each data point represents a binding site residue. Below this, three utilities can be found:

X-axis: this dropdown menu allows the user to select the variable to be plotted on the X-axis.
Y-axis: this dropdown menu allows the user to select the variable to be plotted on the Y-axis.
Save Image: click here to save the current graph to a .png image.

In addition, this panel includes a dynamic table showing the data underlying the graph. Each table row corresponds to a data point in the plot, i.e., to a residue forming a binding site on the protein. The columns are:

UPResNum: the residue number identifier matching the UniProt sequence, or UniProt Residue Number.
MSACol: the multiple sequence alignment (MSA) column index to which a residue aligns.
DS: the divergence score of the alignment column where a residue is aligned. The score used is a normalised version of the Shenkin divergence score.
MES: the missense enrichment score of the alignment column to which a residue aligns. This is an odds ratio (OR) calculated using the gnomAD database. For more details see MacGowan et al., 2017, Utgés et al., 2021, and MacGowan et al., 2024.
p: the p-value of the MES (OR), obtained using Fisher's exact test.
AA: the amino acid one letter code of a given residue.
RSA: the relative solvent accessibility of a given residue. This is a normalised version of the accessible surface area (ASA) calculated by DSSP.
SS: the secondary structure element of a given residue as calculated by DSSP.

This table can be downloaded as a .csv file by clicking on the Download Table button. Information about the columns can be found by hovering over the column names, or in more detail by clicking on the Column Information button, which leads to this Help page. Additionally, the multiple sequence alignment (MSA) of the protein can be downloaded as a .sto file by clicking on the Download MSA button.
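The site-level RSA, DS and MES shown in the Binding Sites panel are described as averages over the residues forming the site. The sketch below reproduces that averaging from a residue table. It assumes that the downloaded .csv uses the column names listed above as headers and contains the residues of a single binding site, and that missing values are skipped; none of this is confirmed here. The example rows are made up.

```python
import csv
import io
import math


def site_average(rows, column):
    """Average a numeric column over residue rows, skipping missing values.

    Returns None if no usable value is found.
    """
    values = []
    for row in rows:
        raw = (row.get(column) or "").strip()
        if not raw:
            continue  # assumption: empty cells are treated as missing
        try:
            value = float(raw)
        except ValueError:
            continue  # non-numeric cell, treated as missing
        if not math.isnan(value):
            values.append(value)
    if not values:
        return None
    return sum(values) / len(values)


# Made-up residue table for one hypothetical binding site.
EXAMPLE_CSV = """UPResNum,MSACol,DS,MES,p,AA,RSA,SS
10,12,20.0,0.8,0.30,A,10.0,H
11,13,40.0,1.2,0.50,G,30.0,E
12,14,,1.0,0.90,K,20.0,T
"""

rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
site_rsa = site_average(rows, "RSA")  # mean of 10.0, 30.0 and 20.0
site_ds = site_average(rows, "DS")    # the empty DS cell is skipped
```

To use a real download, replace `io.StringIO(EXAMPLE_CSV)` with an open file handle for the downloaded table.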

Code availability

The code for the LIGYSIS pipeline can be found here.

The code for the LIGYSIS web server can be found here.

The code for the LIGYSIS customised pipeline for user jobs can be found here.

Citing LIGYSIS

If you use the LIGYSIS web server or the LIGYSIS dataset, please cite the following references:

Utgés et al. LIGYSIS-web: a resource for the analysis of protein-ligand binding sites. Nucleic Acids Res. gkaf411 (2025). DOI: 10.1093/nar/gkaf411

Utgés, JS & Barton, GJ. Comparative evaluation of methods for the prediction of protein–ligand binding sites. J. Cheminform. 16, 126 (2024). DOI: 10.1186/s13321-024-00923-z

Utgés et al. Classification of likely functional class for ligand binding sites identified from fragment screening. Commun. Biol. 7, 320 (2024). DOI: 10.1038/s42003-024-05970-8

This website is free and open to all users, including commercial users, and has no login requirement.

License