Data Science Research Centre

"Data-Information-Knowledge-Application"

F.A.I.R. Guidelines for Machine Learning

How to do reproducible science for A.I.

An ARRE Event: Aid to Research Related Events, Exhibition, Publication and Dissemination Activities Program

Funded by the Vice-President, Research and Graduate Studies, and the Gina Cody School of Engineering and Computer Science, Concordia University

Organizers: Gregory Butler and Tristan Glatard

Research Student Posters and Mini-Presentations

Sunday 24 March 2019

Time 13:00 to 17:00

SGW Campus (Downtown): EV 11.119

1515 Ste Catherine St, West

Mini-Presentations

14:30 BENIN: Biologically Enhanced Network INference for Transcriptional Regulatory Networks, Stephanie Kamgnia

Abstract BENIN is a general framework that jointly considers different types of prior knowledge together with expression datasets to improve network inference. The method frames network inference as a feature selection problem. To solve it, BENIN uses a popular penalized regression method, the elastic net, combined with bootstrap resampling. Using several datasets, ranging from simulated data from the DREAM 4 benchmark to experimental data from the yeast cell cycle dataset, we show that, when combined with genome-wide location data and knockout gene expression data, BENIN significantly outperforms the state of the art.
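As a rough illustration of the idea of treating network inference as feature selection, the sketch below fits an elastic net on bootstrap resamples and records how often each candidate regulator receives a non-zero coefficient for one target gene. It is not the BENIN implementation: the function names, the coordinate-descent solver, the penalty settings, and the toy data are all assumptions made for this example, and the way BENIN incorporates prior knowledge is not shown.

```python
import random


def soft_threshold(z, g):
    """Soft-thresholding operator used by the L1 part of the penalty."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=100):
    """Coordinate descent for the objective
    (1/2n)*||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2.
    No intercept is fitted, so data are assumed to be roughly centred."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, with w starting at zero
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, alpha * l1_ratio) / (
                col_sq[j] + alpha * (1.0 - l1_ratio)
            )
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                w[j] = new_w
    return w


def bootstrap_selection_frequency(X, y, n_boot=50, seed=0, alpha=0.1, l1_ratio=0.5):
    """Fraction of bootstrap resamples in which each feature gets a non-zero weight."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        w = elastic_net([X[i] for i in idx], [y[i] for i in idx], alpha, l1_ratio)
        for j in range(p):
            if abs(w[j]) > 1e-8:
                counts[j] += 1
    return [c / n_boot for c in counts]


if __name__ == "__main__":
    # Toy data (hypothetical): 5 candidate regulators, the target depends on 0 and 2.
    rng = random.Random(1)
    X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(40)]
    y = [2.0 * r[0] - 1.5 * r[2] + rng.gauss(0, 0.1) for r in X]
    print(bootstrap_selection_frequency(X, y, n_boot=20))
```

In this sketch, regulators with a selection frequency close to 1 would be the candidates retained as likely edges into the target gene; the cut-off is a design choice left to the user.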

15:30 Predicting transmembrane transport proteins, Munira Alballa

Abstract Membrane proteins, which include transporters, receptors, enzymes, and others, are among the least characterized proteins, owing to their hydrophobic surfaces and their lack of conformational stability. This research aims to build a proteome-wide system that can determine transporter substrate specificity. This involves distinguishing membrane proteins from non-membrane proteins, differentiating transporters from other functional types of membrane proteins, and detecting the substrate specificity of the transporters.

To distinguish membrane from non-membrane proteins, we evaluated the performance of various feature extraction techniques in combination with different learning algorithms. Experimental results show that incorporating evolutionary information consistently performs better than using traditional amino acid compositions. The best prediction performance was achieved by an ensemble classifier that fuses the results of OET-KNN (Optimized Evidence-Theoretic K-Nearest Neighbor) classifiers, where protein samples are represented by Pseudo Position-Specific Score Matrix (Pse-PSSM) vectors. We also found that incorporating transmembrane topology prediction tools can further boost the overall accuracy by 2.17%.
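To make the Pse-PSSM representation more concrete, the sketch below turns a position-specific score matrix of any length into a fixed-length vector, using one commonly cited formulation: the per-column means of the row-standardized matrix, followed by the mean squared difference between rows a fixed lag apart. The function name `pse_pssm`, the row standardization, and the single `lag` parameter are assumptions of this example and may differ from the exact variant used in the work presented.

```python
import math


def pse_pssm(pssm, lag=1):
    """Map an L x 20 PSSM (a list of rows) to a fixed-length vector:
    column means followed by mean squared differences at the given lag."""
    length = len(pssm)
    if length <= lag:
        raise ValueError("sequence must be longer than the lag")
    n_cols = len(pssm[0])

    # Standardize each row so its scores have zero mean and unit variance.
    norm = []
    for row in pssm:
        mean = sum(row) / n_cols
        std = math.sqrt(sum((v - mean) ** 2 for v in row) / n_cols)
        norm.append([(v - mean) / std if std > 0 else 0.0 for v in row])

    # First block: average score of each column over the whole sequence.
    col_means = [sum(r[j] for r in norm) / length for j in range(n_cols)]

    # Second block: sequence-order information from positions `lag` apart.
    lag_terms = [
        sum((norm[i][j] - norm[i + lag][j]) ** 2 for i in range(length - lag))
        / (length - lag)
        for j in range(n_cols)
    ]
    return col_means + lag_terms
```

With a standard 20-column PSSM this yields a 40-dimensional vector per lag value, so proteins of different lengths can be compared by a distance-based classifier such as a nearest-neighbour method.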

Posters

Suggested poster size is 36 inches by 48 inches.

P01 CNN approach for real time road traffic detection and prediction of traffic flow with Random forest and M5base regression trees, Mohsen Amoei and Anjali Awasthi

Abstract Effective strategies for reducing traffic congestion require a thorough understanding of its features and their relationships. In the age of information, these objectives can be realized efficiently through Big Data applications. Traffic congestion can be viewed as a product of the interaction between demand and capacity. Periodic high demand at specific bottlenecks during peak hours can result in recurrent congestion, while anomalies such as incidents, especially crashes, temporarily reduce roadway capacity and lead to non-recurrent congestion. To capture this dynamic process, Big Data generated from different sources can be leveraged to measure congestion in real time. Our goal is to develop a model that applies computer vision and data science approaches to detect traffic elements and predict future congestion.

P02 A FAIR Approach to Scientific Data Analysis with Boutiques, Tristan Glatard

Abstract Reproducibility has become an important concern in computational data science. In this talk, we present reproducibility challenges arising from data analysis pipelines; in particular, we show how small changes introduced in the computational infrastructure, analysis software, or data affect the reproducibility of results. We describe preliminary solutions to address these issues using system-level pipeline analysis and bootstrap aggregation of results. Finally, we present Boutiques, a containerization framework to make data analysis pipelines Findable, Accessible, Interoperable and Reusable. Boutiques describes analysis pipelines with globally persistent records to make them searchable and accessible, and it links them to container images to make them reusable across a variety of computational platforms.

P03 BENIN: Biologically Enhanced Network INference for Transcriptional Regulatory Networks, Stephanie Kamgnia and Gregory Butler

Abstract BENIN is a general framework that jointly considers different types of prior knowledge together with expression datasets to improve network inference. The method frames network inference as a feature selection problem. To solve it, BENIN uses a popular penalized regression method, the elastic net, combined with bootstrap resampling. Using several datasets, ranging from simulated data from the DREAM 4 benchmark to experimental data from the yeast cell cycle dataset, we show that, when combined with genome-wide location data and knockout gene expression data, BENIN significantly outperforms the state of the art.

P04 TranCEP: Predicting transmembrane transport proteins using information on amino acid composition, evolution, and specificity-determining positions, Munira Alballa, Faizah Aplop, Gregory Butler

Abstract Transporters mediate the movement of compounds across the membranes that separate the cell from its environment, and across inner membranes surrounding cellular compartments. It is estimated that one third of a proteome consists of transmembrane proteins, and many of these are transport proteins. Given the increase in the number of genomes being sequenced, there is a need for computational tools that predict the substrates that are transported by transmembrane transport proteins. TranCEP is a predictor of the type of substrate transported by a transmembrane transport protein. TranCEP combines the traditional use of the amino acid composition of the protein with evolutionary information captured in a multiple sequence alignment, and with restriction to important positions of the alignment that play a role in determining the specificity of the protein. Our experimental results show that TranCEP significantly outperforms the state of the art. The results quantify the contribution made by each of the kinds of information used.
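For readers unfamiliar with the amino acid composition mentioned above, the following minimal sketch computes it for a single protein sequence as the relative frequency of each of the 20 standard residues. The function name and the handling of non-standard letters are assumptions of this example; the sketch covers only the plain composition and does not reproduce how TranCEP combines it with alignment information or specificity-determining positions.

```python
from collections import Counter

# The 20 standard amino acids, in alphabetical order of their one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list of residue frequencies that sums to 1.
    Letters outside the standard alphabet (e.g. X, B, Z) are ignored."""
    counts = Counter(c for c in sequence.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]
```

Because the vector has the same length for every protein, it can be passed directly to a standard classifier as a baseline feature set.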