BIOINFORMATICS AND COMPUTATIONAL BIOLOGY (CERTIFICATE COURSE)

Coordinated by
Dr. Priyadarshini Mallick
Assistant Professor & HOD
Department of Microbiology

Developed by
Dr. Sankar Chandra Basu
Assistant Professor,
Asutosh College (affiliated to University of Calcutta)
Department of Microbiology
&
External Scientific Collaborator (Computational Biophysics) at 3BIO, ULB, Brussels, Belgium
Homepage: www.scinetmol.in
Email: nemo8130@gmail.com, sankarchandra.basu@asutoshcollege.in

Introduction (about the course)

With the fast-growing influx of experimental data across the branches of natural science, contemporary science and technology show an unmistakable inclination towards knowledge-based training and learning. Over time, it has become clear that several complex problems standing unresolved across distant disciplines (e.g., the Transportation problem, the Graph Coloring problem, the Protein Folding problem) cannot be solved in real time either analytically or by purely physics-based methods (i.e., by deductive logic based on ‘first principles’). The current era and its derived technologies thus show a clear trend towards training algorithms that are built on complex combinations of empirical observations and kept under continuous evaluation. Naturally, given the pace and nature of change in the needs and demands of the modern world, Artificial Intelligence (AI) and robotics are setting a new ‘state-of-the-art’ in many challenging real-world problems, and are thereby taking over manual effort at many levels of the natural and social sciences. Modern biology is no exception to this trend: research and development on many current problems rely heavily on knowledge-based advanced learning coupled with fast computation within the premise of a given biological problem. Understandably, it is hard to imagine a contemporary academic curriculum in any branch of modern biology without the essential component of ‘Bioinformatics and Computational Biology’. Built at the junction of biology and informatics, the subject has existed in its primordial form ever since the days of Turing machines and early computers. Over the decades, it has matured into its present shape, gradually spreading its wings across a plethora of widely varying and challenging contemporary biological problems.
In the process, it had to co-evolve with the ‘state-of-the-art’ of many experimental innovations and theoretical advances across a number of distant yet related disciplines. Information theory (attributed to Claude Shannon), evolutionary biology and phylogenetic analysis (from the Darwinian era to 16S rRNA), DNA sequencing techniques (from Frederick Sanger to Craig Venter and the Human Genome Project), and biomolecular structure determination techniques (from X-ray crystallography to cryo-EM) are just a few examples. Bioinformatics in modern biology thus offers extensive coverage of topics and problems, spanning from 1D sequences to 3D structures, from molecular evolution to ecological modeling, from protein folding to non-canonical base-pairing in RNAs, from protein design to the design of biotherapeutics, from next-generation sequencing to personalized medicine, and so on.

Built against this contemporary background, the current course presents a unique opportunity for inquisitive and enthusiastic BSc students (6th Semester) to learn and apply Bioinformatics and Computational Biology. It provides foundational knowledge and understanding of the fundamentals of the subject while also adequately covering many of its relevant advanced topics. The course is open to a wide spectrum of disciplines (not restricted to students of the biological sciences alone) and includes the necessary recapitulation so that all participants start from a common conceptual platform. The course structure is designed to present a ‘broad overview of briefings’ in contemporary bioinformatics and computational biology. While a collection of compulsory basic topics (sections) is integral to the core course structure, the course (six months, 96 hours) offers peripheral flexibility in the choice of its advanced topics (sections). Each participant (student) needs to select two optional sections (out of 4 choices) in addition to the 10 compulsory sections. Each section runs for 6 hrs. The course intends to offer enjoyable, interactive classroom participation and direct first-hand experience in the practical aspects of Bioinformatics, such as developing algorithms and coding them. We believe that the certificate attained upon successful completion of the course will be of value to participants in their academic pursuits, both in terms of learning and career development.

Syllabus / Curriculum

(Eligible students: those currently pursuing a B.Sc. (University of Calcutta) with honours in any of the following subjects: Biochemistry, Microbiology, Botany, Zoology / Statistics, Electronics, Mathematics, Computer Science)

Each participant (student) needs to select two optional sections (out of 4 choices) in addition to the 10 compulsory sections. Each section runs for 6 hrs.

Compulsory Sections:

  1. BiC1. Why Bioinformatics? Essential Recapitulations

    Introduction to Informatics and Data Science, Philosophy of Knowledge-based / semi-empirical approaches in biophysical and biological sciences in contrast to First Principle – derived / physics based approaches. Recapitulation of the first and second laws of thermodynamics. Recapitulation of basic calculus: differentiation and Integration. Recapitulation of basics of modern algebra and statistics: Set theory, Probability, Distributions, central tendencies and deviations, t-test & Chi square tests. ( 6 hrs )

  2. BiC2. Algorithms and Networks

    Introduction to Algorithms, Birth of Computer Science: Early Computers and Turing Machines, Birth of Information Technology: Claude Shannon and Shannon Entropy, Introduction to Graph Theory and Network Science, Topologies and the Graph Isomorphism Problem, Network prototypes: Small World, scale-free, modular and bipartite networks, Network properties: local cohesiveness, global reach, Cliquishness, Link Density, Preferential attachment of nodes, Hubs. ( 6 hrs )

  3. BiC3. Biomolecules and their interplay

    Macromolecules of life and their structural units in brief: Nucleotides and Nucleic Acids, Amino acids and Proteins, Carbohydrates and Polysaccharides, Anionic and Zwitterionic lipids, Introduction to gene and protein sequences, Genomes and Proteomes, Open Reading Frames, Introns – Exons – Cistrons, The central dogma of Molecular Biology, Mutations: synonymous and non-synonymous replacements, silent, neutral, in-dels, gene duplication, Conserved and Variable parts of a gene and their impact on the gene product, Coding and Non-coding parts of the genome, Impact of the non-coding part on gene expression and regulation, Introduction to Epigenetics and chromatin biology, post-translational modifications. ( 6 hrs )

  4. BiC4. Sequence analysis

    Brief description of sequencing techniques for nucleic acids and proteins, The Sanger (chain termination) method, the ninhydrin method, Polymerase Chain Reaction (PCR), Multiple Sequence Alignments (MSA), Global (Needleman-Wunsch) and Local (Smith-Waterman) Alignments, Scoring matrices: PAM and BLOSUM, BLAST, E-value, iterative BLAST (PSI-BLAST), Molecular Evolution and Phylogeny, Homologs, Orthologs and Paralogs, Human Genome Project, Next-generation sequencing. ( 6 hrs )

  5. BiC5. Learning and Intelligence in advanced computing

    Computational Complexity, Hard Problems, Combinatorial Optimizations, the Graph Coloring Problem, Heuristics and greedy algorithms, Monte Carlo and Genetic Algorithms, Machine Learning and Artificial Intelligence, Regression and Classification, Neural Networks (NN), Random Forest Classifiers (RFC), Support Vector Machines (SVM), Idea of interpretable machine learning for the future, Proposition of Personalized medicine. ( 6 hrs )

  6. BiC6. Biomolecular structural dynamics

    Introduction to Biomolecular Structure and Dynamics, eigenvalues and eigenvectors, Bond lengths, Bond Angles and Torsion Angles, Internal, Local and Global Frames of Reference, Molecular Superposition, Transformation Matrices, Translations and Rotations, Normal Modes, Brief recapitulation of experimental structure determination methods: X-ray Crystallography, cryo-Electron Microscopy, SAXS, NMR. ( 6 hrs )

  7. BiC7. Amino acids and Proteins

    Amino Acid Biochemistry and Protein Structure, Primary, Secondary, Tertiary, Quaternary associations, Backbone dihedrals and the Ramachandran Plot, Conformational Variation and Dunbrack’s Rotamer Library, Protein Domains, Multi-domain Proteins, Evolution of protein functions, Structural Classification of Proteins (SCOP), Protein taxonomy: Class, family, super-family, fold, Helix bundles and beta barrels, Structural highlights of classic protein examples: Hemoglobin, Cyclophilin, Immunoglobulins, Collagen and the triple helix. ( 6 hrs )

  8. BiC8. Probing Biomolecular Interactions

    Biomolecular Recognition and Molecular Docking, Docking Algorithms, Blind and Guided docking, Receptor and Ligand, Active and allosteric sites, Deep pockets and Grooves, Binding modes, Scoring of Docked poses: Quality Estimates and Scoring Functions, Z-score, Mention of CAPRI: the ‘Protein Olympics’ for the Critical Assessment of PRedicted Interactions, Shape and electrostatic complementarity, the Complementarity Plot, Solvent Accessibility and burial, statistical potentials, semi-empirical pseudo-energy functions. ( 6 hrs )

  9. BiC9. Computations and coding

    Computer Architectures, Shell and Kernel, Operating Systems: DOS, Unix, Linux, Microsoft Windows, macOS, basic shell commands and their combined use, shell scripting (.csh, .bash), Introduction to Programming, Procedural programming and C, Object-Oriented Programming and C++, String handling in Perl, number crunching in Fortran 90, Minimalistic syntax and Python, Interactive programming and Haskell, Plotting and curve drawing in MATLAB and Octave. ( 6 hrs )

  10. BiC10. Basic Programming exercises in Bioinformatics.

    Handling FASTA Sequences and coordinate (PDB) files. Identifying recurring motifs in a sequence(s), transcribing the complementary strand from a given sequence, translating a given ORF into a protein sequence, Computing main-chain bond lengths, bond angles and torsion angles in proteins, constructing the Ramachandran Plot. Translating and Rotating a molecular object, Third and Fourth Atom Fixations. ( 6 hrs )
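
To give a flavour of the BiC10 exercises, the following is a minimal sketch in Python (one of the languages introduced in BiC9) of three of the listed tasks: obtaining the complementary strand of a DNA sequence, translating an ORF into a protein sequence, and computing a torsion angle from four atomic positions. The function names, the compact construction of the codon table and the toy coordinates are illustrative assumptions and are not taken from the course material; actual classroom exercises may be structured differently.

```python
import math

# Watson-Crick complements for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def translate(orf):
    """Translate a DNA ORF codon by codon, stopping at the first stop codon.

    Assumes the sequence contains only A, C, G, T; any other symbol
    raises a KeyError in this simple sketch.
    """
    orf = orf.upper()
    protein = []
    for start in range(0, len(orf) - 2, 3):
        residue = CODON_TABLE[orf[start:start + 3]]
        if residue == "*":  # stop codon
            break
        protein.append(residue)
    return "".join(protein)


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_angle(p1, p2, p3, p4):
    """Torsion (dihedral) angle in degrees about the p2-p3 bond."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    print(reverse_complement("ATGC"))   # GCAT
    print(translate("ATGGCCTAA"))       # MA
    # Toy coordinates (hypothetical, not from a real PDB file).
    print(round(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 1))  # 90.0
```

Applying `torsion_angle` to the backbone atoms C(i-1), N(i), CA(i), C(i) and N(i), CA(i), C(i), N(i+1) of each residue would give the phi and psi angles needed for a Ramachandran Plot; reading those coordinates from a PDB file is left out of this sketch.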

Optional Sections (select two out of four choices):

  1. BiO1. Macromolecular Folding and Structure prediction.

    The Protein Folding Problem, Anfinsen’s thermodynamic hypothesis, Second Genetic Code, Protein Structure Prediction, physics-based, knowledge-based and hybrid approaches, Co-evolutionary approaches, Deep learning and AlphaFold, Hydrophobic collapse and hydrophobic cores, the Fold Recognition problem, conservation in structures (folds) vs. sequences, decoys, threading and cross-threading, side-chain prediction and SCWRL, twilight and midnight zones of protein sequence alignment. Mention of CASP: the ‘Protein Olympics’ for the Critical Assessment of Structure Prediction. Brief mention of RNA structure prediction. ( 6 hrs )

  2. BiO2. Macromolecular Packing and electrostatics

    Protein packing: jigsaw puzzle, nuts and bolts, oil drop models, packing density and Voronoi polyhedra, packing motif in contrast to secondary structural motifs, point and surface contact networks in proteins, protein electrostatics: continuum and explicit models, Recapitulation of Coulomb's law, Poisson-Boltzmann Method, distance dependent dielectric, DelPhi – Gaussian and multi-dielectric advents, consideration of local pKa, the Inverse protein folding problem: Introduction to protein design, Design of alternatively packed hydrophobic cores, design of novel functionalities in proteins, design targeted at the globular – disorder interface. ( 6 hrs )

  3. BiO3. Intrinsic Disorder in Proteins

    IDPs and IDPRs, antithesis of the ‘one sequence → one structure → one function’ paradigm in proteins, The globular – disorder interface, disorder – to – order transitioning ‘protean’ residues, Loops and structural flexibility, Transient Salt-bridge dynamics and electrostatics in sustaining protein disorder, structural degeneracy and self-organized criticality, Self-aggregation and amyloid formation, Biomedical relevance, Fold switch proteins, Membrane proteins, Stability, strengths and weaknesses of amino acid residues across globular and membrane proteins, Membrane embedding, Drugging of membrane proteins. ( 6 hrs )

  4. BiO4. Introduction to Molecular Dynamics (MD) simulations

    Recapitulation of Newton’s Laws of Motion, Relationship between Force and Potentials, Force-fields: Predominant force-fields at the macro- and micro-scopic dimensions, Bonding and Non-bonding potentials, Periodic Boundary Conditions, Solvation, Restraints and Constraints, Harmonic restraints, Energy minimization – zero Kelvin structure; Leap-Frog algorithm, Replica Exchange MD simulation; Umbrella Sampling; Time scales of MD simulations - from early days to current state-of-the-art. ( 6 hrs )

  5. Quiz and Problem Solving

    12 hrs

  6. Group-wise presentation (~5 groups) of a given Bioinformatics exercise (the topic will be chosen by each group from a collection of theme-topics)

    12 hrs

Benefits (point wise)

  1. A unique opportunity to learn and apply Bioinformatics. 
  2. Aimed to boost logical thinking and develop/improve analytical, numerical as well as coding skills.
  3. Foundational knowledge and concept building at an advanced interface of Computer Science and Modern Biology.
  4. Adequate coverage of relevant advances across disciplines.
  5. Open to a wide spectrum of students (not restricted to biology students alone).
  6. Includes necessary recapitulation to commence from a common conceptual platform for all participants.
  7. Presents a ‘broad overview of briefings’ in contemporary bioinformatics and computational biology.
  8. Offers peripheral flexibility in the choice of its advanced topics (optional papers).
  9. Enjoyable interactive classroom participation.
  10. Direct first-hand experience in developing algorithms and coding.
  11. Coding your own algorithms: a first taste of software development.
  12. The certificate - an important addition to the participant’s CV for career development (both in Academia and Industry).