Module Overview


CS4220 

KNOWLEDGE DISCOVERY METHODS IN BIOINFORMATICS
   2014/2015, Semester 2
   School of Computing (Computer Science)
Modular Credits: 4

Learning Outcomes

Present-day biomedical researchers are confronted by vast amounts of data from genome sequencing, microscopy, high-throughput analytical techniques for DNA, RNA, and proteins, and a host of other new experimental technologies. Coupled with advances in computing power, this flow of information should enable scientists to model and understand biological systems in novel ways.

The goals of CS4220 (Knowledge Discovery Methods in Bioinformatics) are:

  1. Expose students to knowledge discovery techniques;
  2. Enhance students' flexible and logical problem-solving skills; and
  3. Develop students' in-depth understanding of bioinformatics and of the issues in analysing real-life high-throughput biological data.

To achieve these goals, the course carries out a series of in-depth studies and hands-on projects on topics such as gene expression profile analysis, proteomic profile analysis, epistatic interaction detection, and protein family recognition.

At the end of the course, students will be able to identify the knowledge discovery techniques relevant to different types of biological data in order to uncover new information, and will be confident in formulating and validating the hypotheses underlying observations from biological data.


Prerequisites

Students are assumed to have already taken CS2220 and to have an appreciation of computer algorithms, computer programming, and basic molecular biology.

Teaching Modes

Teaching will be primarily by lectures. Each lecture focuses on one topic and typically consists of a simple introduction to the topic, a review of current approaches to it, and at least one in-depth case study. To increase students' appreciation of each topic, a list of must-read and good-to-read articles will be provided before each class; students are especially expected to have read the must-read articles beforehand. For some topics, students may also be asked to present and discuss the articles they have read.

Syllabus

The course comprises the following units:
  1. Essence of Biostatistics
     • Basics of biostatistics
     • Statistical estimation
     • Hypothesis testing
     • Measurement data: z-test, t-test
     • Categorical data: χ²-test, Fisher's exact test
     • Non-parametric methods
  2. Essence of Data Mining
     • Clustering
     • Association rules
     • Classification
     • Class-imbalance learning
  3. Gene Expression Profile Analysis
     • Basic gene expression profile analysis
     • Common issues
     • Batch effect & normalization
     • Improving reproducibility, sensitivity, and precision
  4. Proteomic Profile Analysis
     • Basic proteomic profile analysis
     • Common issues
     • Improving consistency
     • Improving coverage
  5. Protein Interaction Network
     • Overview of biological networks
     • Use of biological networks in enhancing bioinformatics analysis
     • Consistency, comprehensiveness, and compatibility of biological pathway databases
     • Integration of pathway databases
     • Reliability of protein-protein interaction networks (PPIN)
     • Identifying noise and missing edges in PPIN
     • An advanced example of PPIN quality assessment
  6. Protein Complex Prediction
     • Overview of protein complex prediction
     • A case study: MCL-CAw
     • Impact of PPIN cleansing
     • Detecting overlapping complexes
     • Detecting low-density complexes
     • Detecting small complexes
  7. Protein Function Prediction
     • Basic protein function prediction
     • "Guilt by association" of other properties
     • Protein function prediction from PPIN
     • "Guilt by association" of multiple types of information
  8. Pathway Perturbations in a Disease Context

 

Assessment

There will be 3-4 assignments, 1 project, and a final exam. The assignments and project will collectively contribute up to 60% of the course grade; the final exam will contribute up to 40%.

Workload

2-1-0-4-3

Workload Components: A-B-C-D-E
A: no. of lecture hours per week
B: no. of tutorial hours per week
C: no. of lab hours per week
D: no. of hours for projects, assignments, fieldwork, etc. per week
E: no. of hours for preparatory work by a student per week
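As a worked reading of the workload code, the string 2-1-0-4-3 can be split into its five A-B-C-D-E components and summed: 2 + 1 + 0 + 4 + 3 = 10 hours per week. The Python sketch below illustrates this decoding; the function name `parse_workload`, the `Workload` type, and the assumption that every code has exactly five hyphen-separated numeric fields are our own illustration, not an official IVLE format specification.

```python
from typing import NamedTuple


class Workload(NamedTuple):
    """Weekly hours for each workload component (A-B-C-D-E)."""
    lecture: float      # A: lecture hours per week
    tutorial: float     # B: tutorial hours per week
    lab: float          # C: lab hours per week
    project: float      # D: projects, assignments, fieldwork, etc. per week
    preparation: float  # E: preparatory work by the student per week

    @property
    def total(self) -> float:
        """Total expected hours per week across all five components."""
        return sum(self)


def parse_workload(code: str) -> Workload:
    """Parse a hyphen-separated workload code such as '2-1-0-4-3'.

    Hypothetical helper: assumes exactly five non-negative numeric fields.
    """
    parts = code.strip().split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 components, got {len(parts)}: {code!r}")
    return Workload(*(float(p) for p in parts))


if __name__ == "__main__":
    w = parse_workload("2-1-0-4-3")
    print(w)
    print(f"Total: {w.total:g} hours per week")
```

Parsing fields as floats rather than integers keeps the sketch usable if a workload code ever contains fractional hours.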


Copyright © 2015, National University of Singapore. All rights reserved.