Usage of topic modeling method for high dimensional gene expression data analysis

dc.contributor.author: Senadheera, SPBM
dc.contributor.author: Weerasinghe, AR
dc.contributor.editor: Ganegoda, GU
dc.contributor.editor: Mahadewa, KT
dc.date.accessioned: 2022-11-10T03:10:32Z
dc.date.available: 2022-11-10T03:10:32Z
dc.date.issued: 2021-12
dc.description.abstract: Gene expression data analysis is a major area in the interpretation of biological systems. Because gene expression data have a large number of variables, high-dimensional clustering methods are required to analyze them. The objectives of this study were to assess the effectiveness of different clustering methods in gene expression data analysis based on biological relatedness, and to study the advantages and disadvantages of different clustering strategies in gene expression analysis. The data were obtained from the GSE19830 dataset and from a brain tumor dataset (TCGA project). To test hard clustering, hierarchical clustering and fuzzy clustering, the K-means algorithm, HClust and topic modeling were used, respectively. Prior knowledge about each dataset was required to define the number of clusters (K). First, the GSE19830 dataset (a mixture of brain, lung and liver tissue) was used to develop the clusters; all models clustered the observations consistently with the physical tags in the dataset. Second, the clustering methods were applied to a brain tumor dataset of 202 samples covering four physically categorized tumor types. According to both hierarchical clustering and topic modeling, when similar tissues were analyzed, the gene-expression-based tumor subtypes (clusters) did not align with the physical categorization. Finally, 81 cancer genes were filtered and used to generate a topic model. To assess the biological relevance of this final model, the Reactome and PCViz tools were used, and the Reactome results supported the topics produced by topic modeling. According to the results, topic modeling is a promising approach for gene-expression-based clustering of high-dimensional data, whereas K-means was found to be inappropriate for gene clustering.
dc.identifier.citation: S. P. B. M. Senadheera and A. R. Weerasinghe, "Usage of Topic Modeling Method for High Dimensional Gene Expression Data Analysis," 2021 6th International Conference on Information Technology Research (ICITR), 2021, pp. 1-6, doi: 10.1109/ICITR54349.2021.9657380.
dc.identifier.conference: 6th International Conference in Information Technology Research 2021
dc.identifier.department: Information Technology Research Unit, Faculty of Information Technology, University of Moratuwa.
dc.identifier.doi: 10.1109/ICITR54349.2021.9657380
dc.identifier.faculty: IT
dc.identifier.place: Moratuwa, Sri Lanka
dc.identifier.proceeding: Proceedings of the 6th International Conference in Information Technology Research 2021
dc.identifier.uri: http://dl.lib.uom.lk/handle/123/19455
dc.identifier.year: 2021
dc.language.iso: en
dc.publisher: Faculty of Information Technology, University of Moratuwa.
dc.relation.uri: https://ieeexplore.ieee.org/document/9657380
dc.subject: Topic modeling
dc.subject: Clustering
dc.subject: Gene expression
dc.title: Usage of topic modeling method for high dimensional gene expression data analysis
dc.type: Conference-Full-text