blog

Setting up the environment and example Python scripts for deep learning on LecChip (lectin microarray) data to identify glycan structures, cell types, etc.

Deep learning on a large amount of data is an effective way to use LecChip (lectin microarray) data to identify glycan structures, cell types, etc.
The following software is required for this work.
Python (Anaconda3 is used below)
TensorFlow
Keras
First of all, you need to install this software on your PC and set up the environment.

Our product “SA/DL Easy” eliminates this advance preparation and lets you configure deep learning networks with just a click of the mouse.
With SA/DL Easy, you can easily explore the world of deep learning without writing scripts like the ones below.
When running the following scripts on your PC, please make sure that the path where the Python scripts are saved, the path of the input data, and the path of the folder where the training and test results are written are all specified correctly.
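As a minimal sketch of the path check mentioned above, the snippet below lists any required files that cannot be found before training starts. The helper name `missing_paths` and the relative paths are hypothetical and only for illustration; replace them with the locations used on your PC.

```python
from pathlib import Path

def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [str(p) for p in map(Path, paths) if not p.exists()]

# Hypothetical locations; replace them with the folders used on your PC.
required = [
    Path("DL_scripts") / "cell.csv",       # training data
    Path("DL_scripts") / "cell_test.csv",  # test data
]

problems = missing_paths(required)
if problems:
    print("Not found:", ", ".join(problems))
```

Running a check like this first gives a clear message instead of an error raised from inside the CSV reader.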
———————————————————————————————
# An example deep learning Python script for identifying glycan structures, cell types, etc. using LecChip data.

from __future__ import print_function
import numpy as np
import csv
import pandas
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import RMSprop
from keras.utils import np_utils
from make_tensorboard import make_tensorboard

np.random.seed(1671) # for reproducibility

# network and training
NB_EPOCH = 100 # number of training epochs (passes over the training data)
BATCH_SIZE = 2  # number of samples processed per weight update
VERBOSE = 1
NB_CLASSES = 2 # number of output classes
OPTIMIZER = RMSprop() # optimizer
N_HIDDEN = 45  # number of nodes in each hidden layer, set to 45 here to match the number of lectins used in LecChip
VALIDATION_SPLIT = 0.2 # fraction of the training data held out as validation data
DROPOUT = 0.3
LECTINS = 45

# keep only the rows whose first lectin column (column index 2) is numeric
def drop(df):
    return df[pandas.to_numeric(df.iloc[:, 2], errors='coerce').notnull()]

# each lectin column is log-transformed and min-max scaled to the range 0 to 1
def normalize_column(d):
    dmax = np.max(d)
    dmin = np.min(d)
    return (np.log10(d + 1.0) - np.log10(dmin + 1.0)) / \
        (np.log10(dmax + 1.0) - np.log10(dmin + 1.0))

def normalize(data):
    return np.apply_along_axis(normalize_column, 0, data)

# The input data should be in CSV file format
df1 = drop(pandas.read_csv(r'c:\Users\Masao\Anaconda3\DL_scripts\cell.csv')).reset_index(drop=True)
X_train = normalize(df1.iloc[:, 2:].astype(np.float64))
family_column = df1.iloc[:, 1]
family_list = sorted(list(set(family_column)))
Y_train = np.array([family_list.index(f) for f in family_column])

df2 = drop(pandas.read_csv(r'c:\Users\Masao\Anaconda3\DL_scripts\cell_test.csv')).reset_index(drop=True)
X_test = normalize(df2.iloc[:, 2:].astype(np.float64))
familyt_column = df2.iloc[:, 1]
# encode the test labels with the training family list so that class indices match
Y_test = np.array([family_list.index(f) for f in familyt_column])

print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(Y_train, NB_CLASSES)
Y_test = np_utils.to_categorical(Y_test, NB_CLASSES)

print(X_train)
print(Y_train)
print(X_test)
print(Y_test)

# An example of neural network configuration
# 2 hidden layers
# Input is LecChip data (using 45 lectins)
# The final layer is activated with softmax

model = Sequential()
model.add(Dense(N_HIDDEN, input_shape=(LECTINS,)))
model.add(Activation('relu'))
model.add(Dropout(DROPOUT))
model.add(Dense(N_HIDDEN))
model.add(Activation('relu'))
model.add(Dropout(DROPOUT))
model.add(Dense(NB_CLASSES))
model.add(Activation('softmax'))
model.summary()

# to visualize the training and test results with TensorBoard
callbacks = [make_tensorboard(set_dir_name='Glycan_Profile')]

model.compile(loss='categorical_crossentropy',
              optimizer=OPTIMIZER,
              metrics=['accuracy'])

model.fit(X_train, Y_train,
          batch_size=BATCH_SIZE, epochs=NB_EPOCH,
          callbacks=callbacks,
          verbose=VERBOSE, validation_split=VALIDATION_SPLIT)

score = model.evaluate(X_test, Y_test, verbose=VERBOSE)
print("\nTest score:", score[0])
print('Test accuracy:', score[1])

————————————————————————————
# Python script for using TensorBoard (save it as make_tensorboard.py)

# -*- coding: utf-8 -*-
from __future__ import absolute_import
from __future__ import unicode_literals
from time import gmtime, strftime
from keras.callbacks import TensorBoard
import os

def make_tensorboard(set_dir_name=''):
    # create a log folder named <set_dir_name>_<UTC timestamp> and return the callback
    ymdt = strftime("%a_%d_%b_%Y_%H_%M_%S", gmtime())
    directory_name = ymdt
    log_dir = set_dir_name + '_' + directory_name
    os.mkdir(log_dir)
    tensorboard = TensorBoard(log_dir=log_dir, write_graph=True)
    return tensorboard

————————————————————————————
To visualize the training and test results,
run $ python make_tensorboard.py,
run $ tensorboard --logdir=./Glycan_Profile_Mon_10_Feb_2025_23_06_26 (the folder where the data is recorded),
and open http://localhost:6006/ in your browser.

(base) PS C:\Users\masao\Anaconda3\DL_Scripts> python make_tensorboard.py
Using TensorFlow backend.
(base) PS C:\Users\masao\Anaconda3\DL_Scripts> tensorboard --logdir=./Glycan_Profile_Mon_10_Feb_2025_23_06_26
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.0.2 at http://localhost:6006/ (Press CTRL+C to quit)
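
The folder name passed to --logdir comes from the timestamp built in make_tensorboard(). The following minimal sketch reproduces that naming with the same format string. The helper `log_dir_name` and the fixed example time are assumptions added for illustration, and the weekday and month abbreviations assume the default English (C) locale.

```python
import calendar
from time import gmtime, strftime

def log_dir_name(prefix, timestamp):
    """Build the log folder name the same way make_tensorboard() does."""
    return prefix + "_" + strftime("%a_%d_%b_%Y_%H_%M_%S", gmtime(timestamp))

# 10 Feb 2025, 23:06:26 UTC, expressed as seconds since the epoch
example_time = calendar.timegm((2025, 2, 10, 23, 6, 26, 0, 0, 0))

print(log_dir_name("Glycan_Profile", example_time))
# prints Glycan_Profile_Mon_10_Feb_2025_23_06_26
```

Because gmtime() is used, the timestamp is in UTC, so the folder name may not match the local clock on your PC.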

—————————————————————————————————-
LecChip data is stored in CSV format with the following layout.
From the left, the columns are the sample name, the family name (this serves as the training label), and the numerical values of the individual lectins.
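
The layout described above, and the way the script turns it into network inputs and labels, can be illustrated with a minimal sketch that uses only the Python standard library. The file content, the lectin column names (LectinA, LectinB, LectinC), the family names, and all values are hypothetical, and the helpers `load` and `normalize_columns` are illustrative stand-ins for the pandas/NumPy code in the script, not part of it.

```python
import csv
import io
import math

# Hypothetical LecChip-style CSV: sample name, family name, then lectin signals.
# A real file would have one column per lectin on the chip (45 in the script).
CSV_TEXT = """sample,family,LectinA,LectinB,LectinC
s1,CellTypeA,9,99,0
s2,CellTypeB,99,9,999
s3,CellTypeA,999,999,99
s4,CellTypeB,n/a,5,7
"""

def is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False

def load(text):
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    # Like drop(): keep rows whose first lectin column (index 2) is numeric.
    body = [r for r in body if is_number(r[2])]
    families = [r[1] for r in body]
    signals = [[float(v) for v in r[2:]] for r in body]
    return header[2:], families, signals

def normalize_columns(signals):
    # Log-scaled min-max per lectin column, as in normalize_column().
    scaled = []
    for col in zip(*signals):
        lo = math.log10(min(col) + 1.0)
        hi = math.log10(max(col) + 1.0)
        scaled.append([(math.log10(v + 1.0) - lo) / (hi - lo) for v in col])
    return [list(row) for row in zip(*scaled)]

lectins, families, signals = load(CSV_TEXT)
classes = sorted(set(families))                # family names in sorted order
labels = [classes.index(f) for f in families]  # integer class index per sample
X = normalize_columns(signals)

print(lectins)
print(labels)
print(X)
```

Row s4 is dropped because its first lectin value is not numeric. Note that a lectin column in which every sample has the same value would cause a division by zero in this sketch.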

—————————————————————————————————-
After training, you should save the model.
Once the model has been saved, it can be restored and given unknown data to make predictions.
Scripts for these steps will be uploaded separately for your reference.

Vision for the Evolution of AI

Sam Altman of OpenAI gave a talk at the University of Tokyo on the “Vision for the Evolution of AI”.

The following is a quote from what was said in the talk:
When a student asked, “How do you think society will change in the next 10, 30, or 100 years?”, Altman replied, “In 10 years, AI will accelerate advances in science and technology, in 30 years, all aspects of society will be integrated with AI and will evolve together, and as for what the future will look like 100 years from now, it’s hard to imagine at this point, but human life will be completely different than it is now.”

It is certain that advances in AI will have positive impacts on human life. However, this will remain true only as long as AI remains just a human tool. The situation is very similar to asking someone else to do a task that you find troublesome or time-consuming; the other person has simply been replaced by AI. By having someone else do the troublesome and time-consuming tasks for you, you can focus on more productive work. As AI’s inference capabilities improve further, solutions that humans have never thought of may be discovered, and science and technology will thereby advance at an accelerated pace.

The problem is 100 years from now. If AI’s abilities exceed those of humans, we will probably lose the will to compete intellectually, just as ordinary people do when they encounter a genius they cannot compete with. There is a strong possibility that this feeling of inferiority, the sense that you cannot win no matter how hard you think, will deprive people of motivation, and that humans will simply become animals that are used by AI.
In that era, will people, liberated from production activities and with their intellectual activities, once a source of income, made worthless by an AI they can neither fight nor beat, become nothing more than beings that exist for fun?

The only way to overcome this problem is for humans to evolve in the same way as AI and acquire abilities that are equal to or better than those of AI. It will be an era in which AI and the human brain merge. Whether such a creature could still be called human, I am not even sure.

Galectins are not just β-galactoside binding lectins, but have multifaceted functions

A group from Axe of Infectious and Immune Diseases, CHU de Quebec-Université Laval Research Centre, Faculty of Medicine, and Research Centre for Infectious Diseases, Laval University, Quebec, Canada etc. has reviewed the multifaceted roles of galectins in self-defense.
https://www.sciencedirect.com/science/article/pii/S1044532324000642?via%3Dihub#bib63

Unlike other lectins, galectins lack signal peptides, so they are synthesized as soluble, non-glycosylated proteins in the cytosol, where glycan ligands are absent. Further, certain galectins, such as galectin-1 and galectin-3, have been found to translocate to the nucleus under specific conditions. This presents a paradox: galectins’ primary location is in the cytosol, an environment devoid of the glycans they bind.

Interestingly, however, galectins are also capable of reaching the extracellular space via a non-classical, leaderless secretory pathway.

This unique distribution of galectins both inside and outside the cell underscores their versatility, suggesting a layered regulatory mechanism that allows galectin function in host defense to be modulated both by their synthesis and by the spatial control of their access to ligands, extending their evolutionary role beyond traditional glycan recognition.

This is certainly one of the good reviews on galectins.

A new lectin, ALA, might be effective in CCA treatment

A group from Department of Biochemistry, Faculty of Medicine, Khon Kaen University, Thailand has reported on a new lectin, ALA, extracted from the seeds of Artocarpus lakoocha.
https://www.nature.com/articles/s41598-024-84444-7

ALA exhibits agglutinin activity and has binding specificity to T- and Tn-associated glycoproteins and monosaccharides such as Gal and GalNAc.

It was confirmed that the glycans recognized by ALA were elevated in human cholangiocarcinoma (CCA) tissues.
ALA significantly reduced the viability of the CCA cell lines KKU-100 and KKU-213B in a dose-dependent manner (up to 30 µg/mL), with a decrease of approximately 30% observed at the highest concentration. In addition, ALA significantly reduced the migration and invasion ability of KKU-100 and KKU-213B cells in a dose-dependent manner at 1–2 µg/mL, concentrations that did not affect cell viability.

These results suggest that ALA has therapeutic potential for the treatment of CCA.

Glycan binding specificity of marine lectins

A group from School of Medicine and Life Sciences, Far Eastern Federal University, Vladivostok, Russia has summarized marine lectins and their application to human brain tumors.
https://pmc.ncbi.nlm.nih.gov/articles/PMC11679326/

HOL-18 from the marine sponge Halichondria okadai binds to complex N-glycans
OXYL from marine star Anneissia japonica binds to LacNAc type 2 but does not bind to LacNAc type 1
AVL from marine sponge Aphrocallistes vastus binds to sialylated-mucin glycans
ESA from the seaweed Eucheuma serra binds to high mannose N-glycans
UPL1 from the seaweed Ulva pertusa binds to GlcNAc and high-mannose glycans
BPL2 from the seaweed Bryopsis plumosa binds to trimannosyl core
KSL from red alga Kappaphycus striatus binds to high mannose N-glycans
DIFBL from the sea bass Dicentrarchus labrax binds to fucose
APL from starfish Asterina pectinifera binds to Tn antigen
CGL from the bivalve Crenomytilus grayanus binds to GalNAc/Gal and recognizes Gb3
MytiLec from the Mediterranean mussel also binds to Gb3
HCL from the marine sponge Haliclona cratera binds to GalNAc/Gal
DTL from the ascidian Didemnum ternatanum binds to GlcNAc

A new glycobiomarker for discriminating Psoriatic Arthritis (PsA) and Rheumatoid Arthritis (RA)

A group from Division of Laboratory Diagnostics, Department of Laboratory Diagnostics, Faculty of Pharmacy, Wroclaw Medical University, Poland has reported a new glycobiomarker, a change in the glycosylation pattern of serum clusterin, for discriminating between Psoriatic Arthritis (PsA) and Rheumatoid Arthritis (RA).
https://www.mdpi.com/1422-0067/25/23/13060

PsA and RA are connective tissue autoimmune diseases.
The present study aimed to check whether serum clusterin (CLU) concentration and its glycosylation pattern may be markers differentiating these diseases.

The findings were as follows.
Clusterin concentrations were significantly lower in the sera of the RA patients compared to the PsA group, and there were no other significant differences between the examined groups in CLU concentration.

The relative reactivities of CLU glycans with SNA (α2-6 Sia binding lectin) were significantly higher in the RA and PsA patients in comparison to the control group. There were no significant differences between the studied groups in the relative reactivities of CLU glycans with MAA (α2-3 Sia binding lectin).

These results indicate that PsA and RA can be distinguished by CLU concentration and sialic acid modification (by SNA).

β1-4 galactosylated glycan could inhibit SARS-CoV-2 infection

A group from Laboratory for Functional Glycomics, College of Life Sciences, Northwest University, Xi’an 710069, China has reported that β1-4 galactosylated glycan could inhibit SARS-CoV-2 infection.
https://www.sciencedirect.com/science/article/pii/S2090123224005666?via%3Dihub

It is known that the SARS-CoV-2 S protein has 22 potential N-glycosites and 17 O-glycosites, with 14 of the N-glycosites carrying complex-type N-glycans. ACE2 has a total of 7 N-glycosites, most of which are occupied by complex-type N-glycans.

In this paper, it was demonstrated that the β1-4 galactosylated N-glycans of ACE2 play a crucial role as glycan receptors for the binding of S1 of SARS-CoV-2, and that isolated glycoproteins harboring multivalent β1-4 galactosylated N-glycans could competitively inhibit the interaction between S1 and ACE2, thereby preventing the attachment and entry of SARS-CoV-2 pseudovirus into host cells. This article may come rather late, but I would like to introduce it here.

Targeting Tn-antigen suppresses metastasis in breast cancer

A group from Department of Gynecology and Obstetrics, Beijing Chao-Yang Hospital, Capital Medical University, Beijing, China, has reported that targeting Tn-antigen suppresses metastasis in breast cancer.
https://onlinelibrary.wiley.com/doi/10.1111/jcmm.70279

Tn antigen was prevalent in breast carcinomas, particularly within metastatic lesions. Tn antigen expression was positively correlated with lymph node metastasis and poorer patient survival. Tn antigen-expressing breast cancer cells exhibited enhanced invasiveness and metastasis, along with significant activation of EMT and FAK signaling pathways.

There was a significant downregulation of E-cadherin and ZO-1, both of which are canonical epithelial markers, along with a significant upregulation of mesenchymal markers, including ZEB-1, Vimentin, Snail, and Slug, in both cells expressing Tn antigen.

Targeting Tn-positive cancer cells with HPA demonstrated the suppression of invasive and metastatic capabilities. It is known that the lectin HPA specifically recognizes and binds the Tn antigen, whereas the lectin PNA only recognizes and binds T antigen. Compared to the PNA-treated control group, mice in HPA-treated group exhibited a significantly reduced number of pulmonary metastases. In addition, immunofluorescence analysis showed that HPA treatment reduced formation of cellular protrusions of Tn-positive cancer cells, whereas PNA showed no inhibitory effects. At the molecular level, the EMT and FAK signaling pathways were consistently inhibited in Tn-positive cancer cells treated with HPA.

A new glycan marker for depressive disorder

A group from Division of Neuropsychiatry, Department of Neuroscience, Yamaguchi University Graduate School of Medicine, Yamaguchi, Japan has reported a new glycan marker for predicting depressive disorder.
https://www.nature.com/articles/s41598-024-80507-x

It was found that WGA-binding von Willebrand factor (vWF) (WGA-vWF) contained in plasma extracellular vesicles (EVs) could be a marker for diagnosing patients with major depressive disorder (MDD) in a depressive state, regardless of gender and age.

WGA-vWF expression was significantly lower in the plasma EVs of patients with MDD in a depressive state than in those of healthy control participants (HCs). ROC analysis indicated that the AUC value for discriminating between patients with MDD and HCs was 0.92 (95% CI 0.82–1.00). Furthermore, WGA-vWF expression increased remarkably from the depressive state to remission. Using this marker, it was possible to distinguish between patients with MDD in depressive and remission states (AUC of 0.98, 95% CI 0.93–1.00).

Bisecting GlcNAc could be an early Alzheimer’s disease biomarker

A group from Division of Neurogeriatrics, Department of Neurobiology, Care Sciences and Society, Center for Alzheimer Research, Karolinska Institutet, Sweden has reported a new glycan marker, bisecting GlcNAc, which is able to predict cognitive decline in amyloid- and tau-negative patients.
https://academic.oup.com/braincomms/article/6/6/fcae371/7826117?login=false

This is a report about an early biomarker for Alzheimer’s disease.

In Alzheimer’s disease, it is known that increased accumulation and aggregation of amyloid β-peptide (Aβ) causes amyloid formation in the brain, followed by phosphorylation of tau, neurodegeneration, and cognitive decline. In this study, it was found that bisecting GlcNAc could be an early Alzheimer’s disease biomarker that is able to predict cognitive decline already at an amyloid-/tau-negative stage.

Copyright © 2020 Emukk. All rights reserved