{"id":15347,"date":"2025-02-11T09:58:23","date_gmt":"2025-02-11T00:58:23","guid":{"rendered":"https:\/\/www.emukk.com\/WP\/?p=15347"},"modified":"2025-02-12T08:54:55","modified_gmt":"2025-02-11T23:54:55","slug":"environment-construction-and-python-script-examples-for-performing-machine-learning-on-glycan-structures-cell-types-etc-using-deep-learning-using-lecchip-lectin-microarray-data","status":"publish","type":"post","link":"https:\/\/www.emukk.com\/WP\/en\/environment-construction-and-python-script-examples-for-performing-machine-learning-on-glycan-structures-cell-types-etc-using-deep-learning-using-lecchip-lectin-microarray-data\/","title":{"rendered":"Environment construction and Python script examples for performing machine learning on glycan structures, cell types, etc. using deep learning using LecChip (lectin microarray) data"},"content":{"rendered":"<p>Deep learning on a large amount of LecChip (lectin microarray) data is an effective way to identify glycan structures, cell types, etc.<br \/>\nThe following software is required for this work.<br \/>\nPython (Anaconda3 is used below)<br \/>\nTensorFlow<br \/>\nKeras<br \/>\nFirst, install this software on your PC and set up the environment.<\/p>\n<p>Our product &#8220;SA\/DL Easy&#8221; eliminates this advance preparation and lets you configure deep learning networks with just a click of the mouse.<br \/>\nWith SA\/DL Easy, you can easily explore the world of deep learning without writing scripts like the ones below.<br \/>\nWhen running the following scripts on your PC, make sure that the path where the Python script is saved, the path of the input data, and the path of the folder where the training and test results are stored are all specified correctly.<br 
\/>\n&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>\n# An example of a deep learning Python script for identifying glycan structures, cell types, etc. using LecChip data.<\/p>\n<p>from __future__ import print_function<br \/>\nimport numpy as np<br \/>\nimport pandas<br \/>\nfrom keras.models import Sequential<br \/>\nfrom keras.layers.core import Dense, Dropout, Activation<br \/>\nfrom keras.optimizers import RMSprop<br \/>\nfrom keras.utils import np_utils<br \/>\nfrom make_tensorboard import make_tensorboard<\/p>\n<p>np.random.seed(1671) # for reproducibility<\/p>\n<p># network and training<br \/>\nNB_EPOCH = 100 # number of training epochs<br \/>\nBATCH_SIZE = 2 # number of samples in each mini-batch<br \/>\nVERBOSE = 1<br \/>\nNB_CLASSES = 2 # number of output classes<br \/>\nOPTIMIZER = RMSprop() # optimizer<br \/>\nN_HIDDEN = 45 # number of nodes in each hidden layer, set to 45 here to match the number of lectins used in LecChip<br \/>\nVALIDATION_SPLIT = 0.2 # fraction of the training data held out as validation data<br \/>\nDROPOUT = 0.3<br \/>\nLECTINS = 45<\/p>\n<p># keep only the rows whose first lectin column (column index 2) is numeric<br \/>\ndef drop(df):<br \/>\nreturn df[pandas.to_numeric(df.iloc[:, 2], errors='coerce').notnull()]<\/p>\n<p># each column is log10-transformed and scaled so that its minimum is 0 and its maximum is 1.<br \/>\ndef normalize_column(d):<br \/>\ndmax = np.max(d)<br \/>\ndmin = np.min(d)<br \/>\nreturn (np.log10(d + 1.0) - np.log10(dmin + 1.0)) \/ \\<br \/>\n(np.log10(dmax + 1.0) - np.log10(dmin + 1.0))<\/p>\n<p>def normalize(data):<br \/>\nreturn np.apply_along_axis(normalize_column, 0, data)<\/p>\n<p># The input data should be in CSV file format<br \/>\ndf1 = 
drop(pandas.read_csv(r'c:\\Users\\Masao\\Anaconda3\\DL_scripts\\cell.csv')).reset_index(drop=True)<br \/>\nX_train = normalize(df1.iloc[:, 2:].astype(np.float64))<br \/>\nfamily_column = df1.iloc[:, 1]<br \/>\nfamily_list = sorted(list(set(family_column)))<br \/>\nY_train = np.array([family_list.index(f) for f in family_column])<\/p>\n<p>df2 = drop(pandas.read_csv(r'c:\\Users\\Masao\\Anaconda3\\DL_scripts\\cell_test.csv')).reset_index(drop=True)<br \/>\nX_test = normalize(df2.iloc[:, 2:].astype(np.float64))<br \/>\nfamilyt_column = df2.iloc[:, 1]<br \/>\n# encode the test labels with the class list built from the training data so that the class indices match<br \/>\nY_test = np.array([family_list.index(f) for f in familyt_column])<\/p>\n<p>print(X_train.shape[0], 'train samples')<br \/>\nprint(X_test.shape[0], 'test samples')<\/p>\n<p># convert class vectors to binary class matrices<br \/>\nY_train = np_utils.to_categorical(Y_train, NB_CLASSES)<br \/>\nY_test = np_utils.to_categorical(Y_test, NB_CLASSES)<\/p>\n<p>print(X_train)<br \/>\nprint(Y_train)<br \/>\nprint(X_test)<br \/>\nprint(Y_test)<\/p>\n<p># An example of neural network configuration<br \/>\n# 2 hidden layers<br \/>\n# Input is LecChip data (using 45 lectins)<br \/>\n# The final layer is activated with softmax<\/p>\n<p>model = Sequential()<br \/>\nmodel.add(Dense(N_HIDDEN, input_shape=(LECTINS,)))<br \/>\nmodel.add(Activation('relu'))<br \/>\nmodel.add(Dropout(DROPOUT))<br \/>\nmodel.add(Dense(N_HIDDEN))<br \/>\nmodel.add(Activation('relu'))<br \/>\nmodel.add(Dropout(DROPOUT))<br \/>\nmodel.add(Dense(NB_CLASSES))<br \/>\nmodel.add(Activation('softmax'))<br \/>\nmodel.summary()<\/p>\n<p># to visualize the learning and the test results with TensorBoard<br \/>\ncallbacks = [make_tensorboard(set_dir_name='Glycan_Profile')]<\/p>\n<p>model.compile(loss='categorical_crossentropy',<br \/>\noptimizer=OPTIMIZER,<br 
\/>\nmetrics=['accuracy'])<\/p>\n<p>model.fit(X_train, Y_train,<br \/>\nbatch_size=BATCH_SIZE, epochs=NB_EPOCH,<br \/>\ncallbacks=callbacks,<br \/>\nverbose=VERBOSE, validation_split=VALIDATION_SPLIT)<\/p>\n<p>score = model.evaluate(X_test, Y_test, verbose=VERBOSE)<br \/>\nprint('\\nTest score:', score[0])<br \/>\nprint('Test accuracy:', score[1])<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>\n# Python script for using TensorBoard<\/p>\n<p># -*- coding: utf-8 -*-<br \/>\nfrom __future__ import absolute_import<br \/>\nfrom __future__ import unicode_literals<br \/>\nfrom time import gmtime, strftime<br \/>\nfrom keras.callbacks import TensorBoard<br \/>\nimport os<\/p>\n<p># create a time-stamped log folder and return a TensorBoard callback that writes to it<br \/>\ndef make_tensorboard(set_dir_name=''):<br \/>\nymdt = strftime('%a_%d_%b_%Y_%H_%M_%S', gmtime())<br \/>\ndirectory_name = ymdt<br \/>\nlog_dir = set_dir_name + '_' + directory_name<br \/>\nos.mkdir(log_dir)<br \/>\ntensorboard = TensorBoard(log_dir=log_dir, write_graph=True)<br \/>\nreturn tensorboard<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>\nTo visualize the learning and test results,<br \/>\nrun $ python make_tensorboard.py,<br \/>\nrun $ tensorboard --logdir=.\/Glycan_Profile_Mon_10_Feb_2025_23_06_26 (the folder where the data is recorded),<br \/>\nand open http:\/\/localhost:6006\/ in your browser.<\/p>\n<p>(base) PS C:\\Users\\masao\\Anaconda3\\DL_Scripts&gt; python make_tensorboard.py<br \/>\nUsing TensorFlow backend.<br \/>\n(base) PS C:\\Users\\masao\\Anaconda3\\DL_Scripts&gt; tensorboard --logdir=.\/Glycan_Profile_Mon_10_Feb_2025_23_06_26<br \/>\nServing TensorBoard on localhost; to expose 
to the network, use a proxy or pass --bind_all<br \/>\nTensorBoard 2.0.2 at http:\/\/localhost:6006\/ (Press CTRL+C to quit)<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-15344\" src=\"https:\/\/www.emukk.com\/WP\/wp-content\/uploads\/2025\/02\/tensorboard.jpg\" alt=\"\" width=\"762\" height=\"883\" srcset=\"https:\/\/www.emukk.com\/WP\/wp-content\/uploads\/2025\/02\/tensorboard.jpg 762w, https:\/\/www.emukk.com\/WP\/wp-content\/uploads\/2025\/02\/tensorboard-259x300.jpg 259w, https:\/\/www.emukk.com\/WP\/wp-content\/uploads\/2025\/02\/tensorboard-600x695.jpg 600w\" sizes=\"auto, (max-width: 762px) 100vw, 762px\" \/><\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<br \/>\nLecChip data is in CSV format as shown below.<br \/>\nFrom the left, the columns are the sample name, the family name (used as the training label), and the numerical values of the individual lectins.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-15345\" src=\"https:\/\/www.emukk.com\/WP\/wp-content\/uploads\/2025\/02\/LecChip-csv.jpg\" alt=\"\" width=\"934\" height=\"499\" srcset=\"https:\/\/www.emukk.com\/WP\/wp-content\/uploads\/2025\/02\/LecChip-csv.jpg 934w, https:\/\/www.emukk.com\/WP\/wp-content\/uploads\/2025\/02\/LecChip-csv-300x160.jpg 300w, https:\/\/www.emukk.com\/WP\/wp-content\/uploads\/2025\/02\/LecChip-csv-768x410.jpg 768w, https:\/\/www.emukk.com\/WP\/wp-content\/uploads\/2025\/02\/LecChip-csv-600x321.jpg 600w\" sizes=\"auto, (max-width: 934px) 100vw, 934px\" \/><\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<br 
\/>\nAfter training, you should save the model.<br \/>\nA saved model can be restored later and given unknown data to make predictions.<br \/>\nThe scripts for doing so will be uploaded separately for your reference.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In order to use LecChip (lectin microarray) data to identify the structure of glycan structures, cell types, e<\/p><\/div>\n<div class=\"blog-btn\"><a href=\"https:\/\/www.emukk.com\/WP\/en\/environment-construction-and-python-script-examples-for-performing-machine-learning-on-glycan-structures-cell-types-etc-using-deep-learning-using-lecchip-lectin-microarray-data\/\" class=\"home-blog-btn\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[],"class_list":["post-15347","post","type-post","status-publish","format-standard","hentry","category-technology-en"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.emukk.com\/WP\/wp-json\/wp\/v2\/posts\/15347","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.emukk.com\/WP\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.emukk.com\/WP\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.emukk.com\/WP\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.emukk.com\/WP\/wp-json\/wp\/v2\/comments?post=15347"}],"version-history":[{"count":5,"href":"https:\/\/www.emukk.com\/WP\/wp-json\/wp\/v2\/posts\/15347\/revisions"}],"predecessor-version":[{"id":15362,"href":"https:\/\/www.emukk.com\/WP\/wp-json\/wp\/v2\/posts\/15347\/revisions\/15362"}],"wp:attachment":[{"href":"https:\/\/www.emukk.com\/WP\/wp-json\/wp\/v2\/media?parent=15347"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.emukk.com\/WP\/wp-json\/wp\/v2\/categories?post=15347"},{"taxo
nomy":"post_tag","embeddable":true,"href":"https:\/\/www.emukk.com\/WP\/wp-json\/wp\/v2\/tags?post=15347"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}