Bioinformatics & Scientific Computing

 

  • We provide unique expertise and services in the areas of bioinformatics and scientific computing to academic and industrial researchers.
  • Join our training opportunities and improve your bioinformatics, statistics and computing skills!
  • Our bioinformatics services give you access to the latest approaches for the analysis of high-throughput data sets.
  • We create custom hardware and software tools that solve the specific needs of your research group.
  • Make effective use of your data and translate biological results into new insights!
  • Our facility is strongly committed to the sharing and dissemination of knowledge in the field of computational biology.

We offer data analysis services for next-generation sequencing data and develop innovative, intuitive software solutions for biological experiments that require image and video processing or hardware-related programming. Our data management and processing tools help researchers translate biological results into new insights. We also offer training and consultations in bioinformatics, statistics and programming.

Services

    Bioinformatics

    Novel, large-scale measurement techniques are used routinely in the biological and medical sciences. To fully harness the power of these techniques and turn the resulting large data sets into information, the data need to be managed and analyzed. Accordingly, cutting-edge bioscience has become data- and computing-intensive.

    Our mission is to develop advanced analysis tools and implement novel approaches for the analysis of high-throughput data sets, with a special focus on next-generation sequencing (NGS). In close collaboration with the NGS facility, we provide consultation on experimental design and downstream data analysis services for various NGS applications, including:

    Transcriptomics

    • Expression analysis for small RNA sequencing (e.g., miRNA, snoRNA, lncRNA)
    • Gene expression data analysis for mRNA sequencing:
      - gene expression level
      - isoform level
      - novel transcript level
      - fusion gene analysis

    Epigenomics

    • ChIP-seq data analysis including transcription factor and histone mark analysis
    • DNA methylation analysis

    Genetic variant analysis including SNPs, insertions and deletions

    Metagenomics

    • 16S rRNA
    • whole genome

    Deletion in the vestigial gene in Drosophila

    A fly line with a deletion in a regulatory region (Polycomb Response Element) of the Drosophila vestigial gene was compared to the wild type. The data analysis focused on the effects of this deletion on the expression of the vestigial gene and its neighboring genes, and on global gene expression.

    The study will be extended by introducing rescue constructs, which restore parts of the vestigial regulatory region, into the deletion fly line and analyzing their effect on gene expression.

    The effect of over-expressed lncRNAs on global gene expression

    The function of two long non-coding RNAs that are transcribed from the same locus in the Drosophila genome, but in opposite directions ("forward" or "reverse"), was investigated. The two non-coding RNAs are antisense to each other and have opposite effects on the regulation of the vestigial gene.

    The aim of this experiment was to determine whether the non-coding RNAs regulate only the vestigial gene or whether they have more targets genome-wide. Of particular interest were genes that are differentially expressed upon over-expression of either the forward or the reverse non-coding RNA, but not in the control over-expression.

    Biological Experiments Engineering

    We help you implement your next biological experiment, working with you from the first idea for a new or improved experimental protocol to a solid solution and its implementation. In projects of short to medium duration (3 to 6 weeks), we develop the hardware and write the software needed to control experiments. These projects can include:

    Analyzing Video and Image Data

    • Analyzing video recordings of animal experiments to detect behavioral patterns
    • Processing microscopy images to extract features or quantify behavior
    • Performing real-time video analysis to enable closed-loop experiments

    Controlling and Monitoring Experiments

    • Monitor and record events in order to enable event-driven protocols
    • Control and induce stimuli

    Enabling New Protocols

    • Develop sensors to perform new kinds of measurements
    • Extend existing experimental systems to increase accuracy or performance

    Please find below a variety of selected tools that have been developed by our facility:

    Magnetolaser

    Applying laser stimuli to specific mouse brain regions during an fMRI scan.

    In cooperation with the Preclinical Imaging Facility (pcIMAG) at the CSF, a system was developed that synchronizes laser stimuli with the fMRI imaging process. It makes it possible to define a protocol of different laser stimuli, varying in laser intensity, active duration, laser frequency and flickering frequency, and to correlate them with the scanning results. Because the fMRI and the Magnetolaser are synchronized, it is known which laser stimulus was present for each volume measurement. This gives us close to real-time visualization of brain activity and a method to activate or dampen specific neuronal populations.

    The laser is controlled by an Arduino embedded system that is synchronized to the fMRI through trigger pulses, and by a PC application (implemented in Python and Qt) that defines the laser stimulus protocol for the Arduino.
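    The idea of knowing which stimulus was present for each volume can be illustrated with a small sketch. It is not the Magnetolaser code: the class name, the field names, the units and the choice to express durations in fMRI volumes are all assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LaserStimulus:
    # All field names and units are hypothetical, chosen for illustration only.
    intensity_mw: float
    duration_volumes: int        # assumed: duration counted in fMRI volumes
    laser_frequency_hz: float
    flicker_frequency_hz: float

def stimulus_per_volume(protocol, n_volumes):
    """Map each volume index to the stimulus active during it (None = laser off).

    protocol is a list of (start_volume, LaserStimulus) pairs.
    """
    timeline = [None] * n_volumes
    for start, stim in protocol:
        if start < 0:
            raise ValueError("start_volume must not be negative")
        for v in range(start, min(start + stim.duration_volumes, n_volumes)):
            timeline[v] = stim
    return timeline

pulse_a = LaserStimulus(5.0, 2, 20.0, 0.0)
pulse_b = LaserStimulus(10.0, 3, 40.0, 2.0)
timeline = stimulus_per_volume([(2, pulse_a), (6, pulse_b)], n_volumes=8)
```

    In this example the second stimulus would last three volumes, but the scan ends after eight, so it is truncated to the last two volumes.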

    AggTrack

    Analyzing confocal microscopy data to track chromosome movement during meiosis in C. elegans.

    In cooperation with the Jantsch-Plunger group at the MFPL, we developed automatic image analysis software that takes image stacks from a DeltaVision deconvolution microscope and tracks the movement of marked chromosome ends. This is achieved in three stages. First, the input image is filtered and enhanced to remove noise and compensate for the drifting nucleus. Second, foreground components are segmented based on their illumination as well as biological constraints. Finally, an identity is assigned to each component, making it possible to track the movement as well as the joining and forking of aggregates. The software was developed using Python and the scikit-image library.

    We are currently preparing for a public release of the software.

    Hirngespinst

    Estimating in vivo behavior-dependent neuronal activity by measuring GFP fluorescence in specific neuronal populations.

    Through fibers implanted unilaterally or bilaterally into mice, specific brain regions are excited using a 473 nm laser while behavioral tests are performed. GCaMP6 is expressed in these regions and, when excited, emits GFP fluorescence during neuron activation. Separating this GFP signal (with a typical wavelength of about 509 nm) from the autofluorescence produced by brain tissue makes it possible to estimate the activity of a volume of neurons in close proximity to the end of the fiber.

    The experiment is controlled through a Matlab application that uses a National Instruments card to control the excitation laser and a single-photon counter to measure the GFP signal.

    Correlation Analysis

    Quantifying the learning performance of mice during a multi-session conditioning experiment.

    Learning performance is compared between different groups of mice that are conditioned to enter a port in response to a specific auditory stimulus. In order to quantify and compare the performance of these groups, a system was developed that extracts the behavioral data gathered during the experiment and correlates them with the presented stimuli. From these results for multiple presentations on the same day, a value is calculated that quantifies the performance of the mouse for a single session. By comparing this value over multiple sessions on consecutive days, it is possible to quantify the learning progress of every mouse.

    The system is implemented as a Matlab script that reads the data files gathered during the experiments and generates images that visualize the learning for each session, as well as a CSV file that contains the results for the whole experiment and can easily be processed further.
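    The per-session value is not defined in detail here. As a minimal sketch, assume it is the fraction of stimulus presentations that are followed by a port entry within a fixed response window. The function names, the window length and the example timestamps below are hypothetical, and the sketch is in Python rather than the Matlab used by the actual tool.

```python
from bisect import bisect_left

def session_score(stimulus_times, entry_times, window=5.0):
    """Fraction of stimuli followed by a port entry within `window` seconds.

    This scoring rule is an assumption made for illustration.
    """
    entries = sorted(entry_times)
    hits = 0
    for s in stimulus_times:
        i = bisect_left(entries, s)          # first entry at or after the stimulus
        if i < len(entries) and entries[i] <= s + window:
            hits += 1
    return hits / len(stimulus_times) if stimulus_times else 0.0

def learning_progress(sessions, window=5.0):
    """One score per session; sessions is a list of (stimulus_times, entry_times)."""
    return [session_score(stim, ent, window) for stim, ent in sessions]

day1 = ([10, 40, 70, 100], [55, 72])
day2 = ([10, 40, 70, 100], [12, 41, 73, 130])
progress = learning_progress([day1, day2])   # [0.25, 0.75]
```

    A rising score across consecutive sessions would then indicate that the mouse is learning the association.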

    Lagtest

    Measure the effective latency of displays, including all delays introduced by hardware and software.

    A multi-platform, easy-to-use system was developed that can be used to measure and compare the latency of monitors, television screens or projectors. An application on Windows or Linux generates an alternating black and white image, while an embedded system measures the time it takes for each change in the application to become visible on screen.

    The PC application is implemented in Qt so that it runs on multiple platforms; the embedded system is an Arduino combined with a simple light sensor.
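    The measurement principle can be sketched as follows, assuming that the times at which the application toggles the image and the times at which the light sensor sees a change are available on a common clock in milliseconds. This is an illustrative sketch with a hypothetical function name, not the Lagtest implementation.

```python
from bisect import bisect_right

def estimate_latencies(toggle_times_ms, sensor_times_ms):
    """Pair each image toggle with the first sensor change that follows it.

    A toggle is skipped when no change is seen before the next toggle,
    so a missed detection does not inflate the estimate.
    """
    toggles = sorted(toggle_times_ms)
    sensor = sorted(sensor_times_ms)
    latencies = []
    for k, t in enumerate(toggles):
        next_toggle = toggles[k + 1] if k + 1 < len(toggles) else float("inf")
        i = bisect_right(sensor, t)          # first sensor event strictly after t
        if i < len(sensor) and sensor[i] < next_toggle:
            latencies.append(sensor[i] - t)
    return latencies

toggles = [0.0, 500.0, 1000.0, 1500.0]
detections = [42.0, 545.0, 1540.0]           # the change at 1000 ms was missed
latencies = estimate_latencies(toggles, detections)   # [42.0, 45.0, 40.0]
```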

    Nosemeter

    Detecting partial entrances to a licking port during behavioral experiments.

    In order to improve the detection of a mouse entering a specific licking port, an infrared-beam-based sensor was developed and integrated into an existing behavioral apparatus. This makes it possible to accurately distinguish between a mouse probing the port with only its nose and a mouse entering with its full head during licking.

    The sensor is implemented as an infrared beam breaker that uses an encoded IR signal generated by an Arduino embedded system and measured by a National Instruments controller card. The sensor data is then integrated into a Matlab script that controls the behavioral experiment.

    Freezdetect

    Detecting locomotion and behavioral patterns through video analysis.

    Videos recorded during mouse behavioral experiments are analyzed for movement in relation to stimuli, within specific user-defined regions of interest. This way, it is possible to measure the time a mouse stays in a specific region when a stimulus is given, and its tendency to avoid or approach a target. The video is not analyzed as a whole; instead, each presentation is divided into three phases: before, during and after the stimulus is given.

    Matlab is used to perform the video analysis based on different fundamental image operations and the results are written to a csv file to enable further processing.
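    The fundamental image operations are not listed here. One simple possibility, shown purely as an assumption, is frame differencing inside a region of interest, averaged separately for the three phases of a presentation. The sketch uses plain Python lists as grayscale frames and hypothetical function names; it is not the Matlab code of the tool.

```python
def roi_motion(prev_frame, frame, roi):
    """Mean absolute pixel difference inside roi = (top, left, bottom, right)."""
    top, left, bottom, right = roi
    total = 0
    for r in range(top, bottom):
        for c in range(left, right):
            total += abs(frame[r][c] - prev_frame[r][c])
    return total / ((bottom - top) * (right - left))

def motion_by_phase(frames, fps, stim_on, stim_off, roi):
    """Average ROI motion before, during and after a stimulus (times in seconds)."""
    sums = {"before": [0.0, 0], "during": [0.0, 0], "after": [0.0, 0]}
    for i in range(1, len(frames)):
        t = i / fps
        phase = "before" if t < stim_on else "during" if t < stim_off else "after"
        sums[phase][0] += roi_motion(frames[i - 1], frames[i], roi)
        sums[phase][1] += 1
    return {p: (s / n if n else 0.0) for p, (s, n) in sums.items()}

still = [[0, 0], [0, 0]]
moved = [[10, 0], [0, 0]]
far = [[20, 0], [0, 0]]
frames = [still, moved, still, still, still, far, far]
motion = motion_by_phase(frames, fps=1, stim_on=3, stim_off=5, roi=(0, 0, 2, 2))
# motion == {"before": 2.5, "during": 0.0, "after": 2.5}
```

    In this toy example the motion drops to zero while the stimulus is on, which is the kind of pattern a phase-wise comparison is meant to reveal.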

    Strokelitude

    The position of flapping wings in a tethered fly experiment is measured by analyzing real-time video recordings.

    In order to detect the heading direction in a tethered fly experiment, live video data from a high-speed camera is analyzed. The system reliably detects the angle to which the fly extends its wings and sends this data to other parts of the experimental system that control the visual stimuli presented to the fly.

    Based on a previous Python implementation, a newly designed and feature-optimized C++ implementation was written and integrated into the experiment as a ROS node.

    Data management and processing

    Our data management and processing service provides software solutions that help you to make effective use of your research data and translate your results into new information and insights. The tools we can create for you include:

    Scientific data management tools that

    • Easily collect, classify and review your scientific results
    • Guarantee consistency through validation, security and provenance tracking
    • Make your data available to collaborators or the general public

    Automated analysis and reporting

    • Automate tasks and explore your data interactively
    • Gain insights on your results by analysis, transformation and visualization

    Laboratory information systems

    • Manage lab resources efficiently
    • Streamline daily administration tasks
    • Reduce non-scientific workload on researchers

    Please find below some examples of data management and processing tools that have been developed by our facility:

    Plant Embryomics Central

    PlantEmbryomics.org is a platform for collaborative research on plant embryo genomics. Initially developed for the Nodine group at the GMI, it allows scientists to browse common sequencing data tracks, share files and discuss their results.

    The web portal integrates multiple open source tools to provide the desired functionality. Sequencing tracks uploaded to the server can be browsed by remote users using IGV, and can be restricted to users with different levels of access. Other types of files can easily be shared using OwnCloud. For discussion, Q2A provides an easy-to-use question-and-answer forum.

    go to Plant Embryomics Central

    Brat Tumor Suppressor Screen

    A work-log and phenotype annotation tool for a genome-wide screen that analyzes brat xxxIR double knockdowns in Drosophila fly lines. It allows the scientists to easily track their progress in the screening process.

    Crossed lines are automatically added to the database by scanning the vial's barcode. After the specified development period, the system highlights the lines that are ready for the second cross. The result of the second cross is then evaluated by the survival rate and the number of female flies. Lines of interest are tagged "to be analyzed".

    Stock Manager

    The Stock Manager is a software application that allows a research laboratory to manage its fly stock collection. The database stores descriptive information about the stocks, keeps track of their physical location in vials and boxes, and allows lab members to organize the lines by assigning them to categories and relevant publications. Additionally, tools for information export and label printing are provided to facilitate the identification of vials and the processing of stock requests from other labs.

    For more information on the software, please take a look at this extract of the user manual.

    The Stock Manager is built with FileMaker Pro Advanced. Attached files are stored on a network share and referenced by a link from the database.

    MIMAS

    MIMAS is a web-based application created by the Electron Microscopy (EM) Facility to organize the images generated by facility staff and customers using the facility's instruments.

    MIMAS was initially designed to be accessible only to IMP/IMBA users via the local network. When the EM facility became part of the CSF, MIMAS had to be made available to external customers as well. Security concerns that could be ignored in the initial set-up now had to be taken into account. The BioComp facility was responsible for migrating MIMAS to the new configuration. We performed a security audit and carried out the changes required to make the application safe against possible attacks. We also added the possibility for external customers to access the system. Additionally, we improved the performance of the application and fixed various bugs that were identified in our tests.

    Preclinical Phenotyping Data Processing

    The Preclinical Phenotyping facility at the CSF provides behavioral tests and physiological assays for mouse phenotyping to researchers.

    Within the facility, many experimental set-ups and machines are used, producing large amounts of data that must be analyzed to identify relevant information. We have created several tools for the facility that process the various experimental results and produce a series of reports that can be provided to the customer or used as a basis for further in-depth analysis.

    The reporting tools are built with Excel and Visual Basic for Applications.

    Mass Spectrometry Facility Reports

    The Mass Spectrometry facility at the CSF uses a custom-made software application to keep track of the usage of their systems and the work performed by the technicians.

    We take advantage of the data produced by that software and provide the facility with a reporting tool that translates the information into a series of summary tables and charts. With two mouse clicks, the data is imported and the report is generated. The report provides a visual representation of resource usage that can be used to evaluate the facility's performance within a given time period. It includes, among others, the system status (working time, maintenance, repair or idle), the technician time per system and the facility usage per research group.

    The reporting tool is built with Excel and Visual Basic for Applications.

    Training

    We provide training courses on the following topics:

    If you are interested in any of our courses, there is some important information you may want to check first.

    Registration

    Courses open for registration are announced via e-mail. You may register online by clicking on the "Register" button for a course in the "Timetable".

    We repeat courses if there is sufficient interest. If you cannot register for one, don't worry: the next one will be coming!

    Contact

    The courses are organized (and the majority of them given) by András Aszódi. Send an e-mail to András.

    Please note that the course plan is subject to change without notice. There is no guarantee that the courses indicated in the timetable will actually be held on the planned date.

    In general, no courses are planned from mid-July to the end of August because people are on holiday. For the same reason there are no courses between mid-December and mid-January.

    Consulting

    State-of-the-art biological experiments are becoming more and more complex and require sophisticated data analysis techniques. The BioComp facility is dedicated to helping researchers with advice on computational issues they may encounter during their work.

    We offer the following consultancy services:

    1. Statistics consulting: we provide advice on the proper design of experiments, selecting appropriate statistical methods during data analysis, and help with the preparation of manuscripts and grant proposals.
    2. Knowledge Hub: you can ask questions on this online forum, and experts from all around the VBC will do their best to provide an answer.
    3. Walk-in support: should you have a quick question, then do not hesitate to visit us at the BioComp office for a friendly chat.

    Resources

    Knowledge hub

    The Knowledge Hub is a question and answer site where researchers can reach the BioComp experts and other computational biologists. You can ask questions related to bioinformatics, statistics, computational biology, scientific programming and similar subjects. You can also answer questions and rate other users' contributions!

    Multovl

    The MULTOVL program finds multiple overlaps among genomic regions. It identifies multiple intersections between regions from any number of tracks, unions of overlapping regions, and can also detect solitary regions that do not overlap with any other region in the input dataset.
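    To make the notions of multiple overlap and solitary region concrete, here is a minimal sweep-line sketch for regions on a single chromosome. It is an illustration written for this page, not the MULTOVL algorithm or its interface; the function names and the half-open (start, end, track) representation are assumptions.

```python
def multiple_overlaps(regions, min_mult=2):
    """Segments covered by at least `min_mult` regions.

    regions: iterable of (start, end, track) with half-open coordinates.
    Returns (start, end, tracks) for each stretch with a constant set of
    overlapping regions.
    """
    events = []
    for start, end, track in regions:
        events.append((start, 1, track))   # 1 = region opens
        events.append((end, 0, track))     # 0 = region closes; sorts before opens
    events.sort()
    active, result, prev = [], [], None
    for pos, kind, track in events:
        if prev is not None and pos > prev and len(active) >= min_mult:
            result.append((prev, pos, tuple(sorted(active))))
        if kind == 1:
            active.append(track)
        else:
            active.remove(track)
        prev = pos
    return result

def solitary_regions(regions):
    """Regions that overlap no other region (simple quadratic check)."""
    regions = list(regions)
    return [r for i, r in enumerate(regions)
            if not any(i != j and r[0] < o[1] and o[0] < r[1]
                       for j, o in enumerate(regions))]

regions = [(10, 50, "A"), (30, 70, "B"), (40, 60, "C"), (100, 120, "A")]
overlaps = multiple_overlaps(regions)
# [(30, 40, ('A', 'B')), (40, 50, ('A', 'B', 'C')), (50, 60, ('B', 'C'))]
alone = solitary_regions(regions)          # [(100, 120, 'A')]
```

    Because closing events sort before opening events at the same position, regions that merely touch are not reported as overlapping.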

    The MULTOVLPROB tool calculates the significance of multiple overlaps by reshuffling the input regions many times. This way the null distribution of the overlap lengths can be estimated. Probabilities (p-values) can then be assigned to the actually observed overlap lengths. Small p-values indicate that the overlap combinations were unlikely to occur by chance.
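    The reshuffling idea can be sketched for the simplest case of two tracks. The sketch keeps one track fixed, places the regions of the other track at random positions on a genome of given length, and compares the observed total overlap length with the reshuffled values. The function names, the placement rule and the "+1" correction in the p-value estimate are assumptions made for this example and may differ from what MULTOVLPROB does.

```python
import random

def total_overlap(a, b):
    """Total overlap length between two tracks of half-open (start, end) regions."""
    return sum(max(0, min(e1, e2) - max(s1, s2)) for s1, e1 in a for s2, e2 in b)

def shuffle_regions(regions, genome_length, rng):
    """Move each region to a uniformly random position, keeping its length.

    Simplification: reshuffled regions are allowed to overlap each other.
    """
    shuffled = []
    for start, end in regions:
        length = end - start
        new_start = rng.randrange(0, genome_length - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled

def overlap_p_value(fixed, other, genome_length, n_shuffles=1000, seed=0):
    """Observed overlap and an empirical p-value from `n_shuffles` reshufflings."""
    rng = random.Random(seed)              # seeded so the estimate is reproducible
    observed = total_overlap(fixed, other)
    at_least = sum(
        total_overlap(fixed, shuffle_regions(other, genome_length, rng)) >= observed
        for _ in range(n_shuffles)
    )
    return observed, (at_least + 1) / (n_shuffles + 1)

fixed = [(100, 200), (500, 600)]
other = [(150, 250), (550, 650)]
observed, p_value = overlap_p_value(fixed, other, genome_length=10_000)
```

    Here the observed overlap length is 100 positions; the p-value estimates how often a random placement of the second track reaches at least that much overlap.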

    MULTOVL was developed by András Aszódi and extensively tested by Markus Jaritz, Roman Stocsits and numerous other colleagues. The tools are described in a Bioinformatics Application Note (abstract) (PDF).

    Downloads:

     

    The main MULTOVL repository is kept at BitBucket from where you can download the source and pre-compiled binaries. The package is also available at SourceForge but that repository is not always up-to-date.

    User Information

    Practical information

    Coming soon

    Publications

    Linear ubiquitination by LUBEL has a role in Drosophila heat stress response.
    Asaoka T, Almagro J, Ehrhardt C, Tsai I, Schleiffer A, Deszcz L, Junttila S, Ringrose L, Mechtler K, Kavirayani A, Gyenesei A, Hoffmann K, Duchek P, Rittinger K, Ikeda F. EMBO Reports 2016 Nov;17(11):1624-1640. (abstract)

    Genetic code expansion for multiprotein complex engineering.
    Koehler C, Sauter P, Wawryszyn M, Estrada Girona G, Gupta K, Landry J, Fritz MHY, Radic K, Hoffmann JE, Gyenesei A, Galik B, Junttila S, Stolt-Bergner P, Pruneri G, Bräse S, Schultz C, Biskup M, Besir H, Benes V, Jechlinger M, Korbel J, Berger I, Chen Z, Zou J, Tan PS, Rappsilber J, Lemke E. Nature Methods 2016 Oct 17. doi: 10.1038/nmeth.4032. (abstract)

    Retene causes multifunctional transcriptomic changes in the heart of rainbow trout (Oncorhynchus mykiss) embryos.
    Vehniäinen ER, Bremer K, Scott JA, Junttila S, Laiho A, Gyenesei A, Hodson PV, Oikari AO.
    Environmental Toxicology and Pharmacology 2016 41:95-102. doi: 10.1016/j.etap.2015.11.015. (abstract)

    Differential Promoter Methylation of Macrophage Genes Is Associated With Impaired Vascular Growth in Ischemic Muscles of Hyperlipidemic and Type 2 Diabetic Mice: Genome-Wide Promoter Methylation Study.
    Babu M, Durga Devi T, Mäkinen P, Kaikkonen M, Lesch HP, Junttila S, Laiho A, Ghimire B, Gyenesei A, Ylä-Herttuala S.
    Circ Res. 2015 Jul 17;117(3):289-99. doi: 10.1161/CIRCRESAHA.115.306424, PMID: 26085133 (abstract)

    Identification of Reproduction-Related Gene Polymorphisms Using Whole Transcriptome Sequencing in the Large White Pig Population.
    Fischer D, Laiho A, Gyenesei A, Sironen A.
    G3 (Bethesda) 2015 Apr 27;5(7):1351-60. doi: 10.1534/g3.115.018382, PMID: 25917919 (abstract)

    Promoter-specific alterations of APC are a rare cause for mutation-negative familial adenomatous polyposis.
    Pavicic W, Nieminen TT, Gylling A, Pursiheimo JP, Laiho A, Gyenesei A, Järvinen HJ, Peltomäki P
    Genes Chromosomes Cancer - Epub 2014 Jun 20. (abstract)

    Gene Expression Differences between Noccaea caerulescens Ecotypes Help to Identify Candidate Genes for Metal Phytoremediation.
    Halimaa P, Lin YF, Ahonen VH, Blande D, Clemens S, Gyenesei A, Häikiö E, Kärenlampi SO, Laiho A, Aarts MG, Pursiheimo JP, Schat H, Schmidt H, Tuomainen MH, Tervahauta AI.
    Environ Sci Technol. 2014 48 (6), pp 3344–3353. (abstract)

    Novel techniques and an efficient algorithm for closed pattern mining.
    Kiraly A, Laiho A, Abonyi J, Gyenesei A.
    Expert Systems With Applications 2014, 41/11, pp. 5105-5114. (abstract)

    A recent L1 insertion within SPEF2 gene is associated with changes in PRLR expression in sow reproductive organs.
    Sironen A, Fischer D, Laiho A, Gyenesei A, Vilkki J.
    Animal Genetics 2014 Apr 9. (abstract)

    Bit-Table Based Biclustering and Frequent Closed Itemset Mining in High-Dimensional Binary Data.
    Király A., Gyenesei A., Abonyi J.
    The Scientific World Journal 2014, doi: 10.1155/2014/870406 (abstract)

    Effect of ciprofloxacin exposure on DNA repair mechanisms in Campylobacter jejuni.
    Hyytiäinen H, Juntunen P, Scott T, Kytömäki L, Venho R, Laiho A, Junttila S, Gyenesei A, Revez J, Hänninen ML.
    Microbiology 2013 Dec;159 (12):2513-23, doi: 10.1099/mic.0.069203-0. (abstract)

    Whole transcriptome characterization of the effects of dehydration and rehydration on Cladonia rangiferina, the grey reindeer lichen.
    Junttila S, Laiho A, Gyenesei A, Rudd S.
    BMC Genomics 2013, doi: 10.1186/1471-2164-14-870 (abstract)

    Properties of local interactions and their potential value in complementing genome-wide association studies.
    Wei W, Gyenesei A, Semple CA, Haley CS.
    PLoS One 2013 Aug 5;8(8): e71203, doi: 10.1371/journal.pone.0071203 (abstract)

    Transcriptome profiling of the murine testis during the first wave of spermatogenesis
    Laiho A., Kotaja N., Gyenesei A., Sironen A.
    PLOS ONE 2013, 8(4): e61558, doi: 10.1371/journal.pone.0061558 (abstract)

    Expression of small nucleolar RNAs in leukemic cells.
    Teittinen KJ., Laiho A., Uusimäki A., Pursiheimo JP., Gyenesei A., Lohi O.
    Cellular Oncology 2012, 36(1):55-63, doi: 10.1007/s13402-012-0113-5 (abstract)

    MULTOVL: Fast multiple overlaps of genomic regions.
    Aszódi A.
    Bioinformatics 2012 (28): 3318-3319 (abstract)

    SAP30L (Sin3A-associated protein 30-like) is involved in regulation of cardiac development and hematopoiesis in zebrafish embryos.
    Teittinen KJ., Grönroos T., Parikka M, Junttila S., Uusimäki A., Laiho A, Korkeamäki H., Kurppa K., Turpeinen H., Pesu M., Gyenesei A., Rämet M., Lohi O.
    Journal of Cellular Biochemistry 2012, 113(12):3843-52, doi: 10.1002/jcb.24298 (abstract)

    A high-resolution anatomical atlas of the transcriptome in the mouse embryo.
    Diez-Roux G., Banfi S., Sultan M., Geffers L., Anand S., Rozado D., Magen A., Canidio E., Pagani M., Peluso I., Lin-Marq N., Koch M., Bilio M., Cantiello I., Verde R., De Masi C., Bianchi SA., Cicchini J., Perroud E., Mehmeti S., Dagand E., Schrinner S., Nürnberger A., Schmidt K., Metz K., Zwingmann C., Brieske N., Springer C., Hernandez AM., Herzog S., Grabbe F., Sieverding C., Fischer B., Schrader K., Brockmeyer M., Dettmer S., Helbig C., Alunni V., Battaini MA., Mura C., Henrichsen CN., Garcia-Lopez R., Echevarria D., Puelles E., Garcia-Calero E., Kruse S., Uhr M., Kauck C., Feng G., Milyaev N., Ong CK., Kumar L., Lam M., Semple CA., Gyenesei A., Mundlos S., Radelof U., Lehrach H., Sarmientos P., Reymond A., Davidson DR., Dollé P., Antonarakis SE., Yaspo ML., Martinez S., Baldock RA., Eichele G., Ballabio A.
    PLoS Biology 2011, 9(1):e1000582, doi. (abstract)

    Characterization of a transcriptome from a non-model organism, Cladonia rangiferina, the grey reindeer lichen, using high-throughput next generation sequencing and EST sequence data.
    Junttila S., Rudd S.
    BMC Genomics. 2012 Oct 30;13:575. doi: 10.1186/1471-2164-13-575. (abstract)

    Biclustering of High-throughput Gene Expression Data with Bicluster Miner
    Abonyi J., Laiho A., Gyenesei A.,
    IEEE International Conference on Data Mining Workshops 2012, pp:131-138, doi: 10.1109/ICDMW.2012.42 (abstract)

    Genome-wide analysis of epistasis in body mass index using multiple human populations.
    Wei WH., Hemani G., Gyenesei A., Vitart V., Navarro P., Hayward C., Cabrera CP., Huffman JE., Knott SA., Hicks AA., Rudan I., Pramstaller PP., Wild SH., Wilson JF., Campbell H., Hastie ND., Wright AF., Haley CS.
    European Journal of Human Genetics 2012, 20(8):857-62, doi: 10.1038/ejhg.2012.17 (abstract)

    BiForce Toolbox: powerful high-throughput computational analysis of gene-gene interactions in genome-wide association studies.
    Gyenesei A., Moody J., Laiho A., Semple CA., Haley CS., Wei WH.
    Nucleic Acids Research 2012, 40:W628-32, doi: 10.1093/nar/gks550. (abstract)

    High-throughput analysis of epistasis in genome-wide association studies with BiForce.
    Gyenesei A., Moody J., Semple CA., Haley CS., Wei WH.
    Bioinformatics 2012, 28(15):1957-64, doi: 10.1093/bioinformatics/bts304 (abstract)

    GeneFuncster: A Web Tool for Gene Functional Enrichment Analysis and Visualisation.
    Asta Laiho, András Király, Attila Gyenesei
    Computational Methods in System Biology 2012, Lecture Notes in Bioinformatics: LNBI 7605, 382-386, Springer, doi: 10.1007/978-3-642-33636-2_26 (abstract)

    Intergenic Polycomb target sites are dynamically marked by non-coding transcription during lineage commitment.
    Hekimoglu-Balkan B, Aszodi A, Heinen R, Jaritz M, Ringrose L.
    RNA Biology 2012 (9): 314-325. (abstract)

    Cd-specific mutants of mercury-sensing regulatory protein MerR, generated by directed evolution.
    Hakkila KM., Nikander PA., Junttila SM., Lamminmäki UJ., Virta MP.
    Appl Environ Microbiol. 2011 Sep;77(17):6215-24 (abstract)

    Alternate pathways for Bcl6-mediated regulation of B cell to plasma cell differentiation.
    Alinikula J., Nera KP., Junttila S., Lassila O.
    Eur J Immunol. 2011 Aug;41(8):2404-13. (abstract)

    Team

    Attila Gyenesei

    Core Facility Head
    VBC2 / PG13

    András Aszódi

    Biocomputing Specialist
    VBC2 / PG13

    Bence Galik

    Bioinformatician
    VBC2 / PG13

    Sini Junttila

    Bioinformatician
    VBC2 / PG13

    Manuel Pasieka

    Software Engineer
    VBC2 / PG13

    Pedro Serrano Drozdowskyj

    Software Engineer
    VBC2 / PG13

    Contact

    How to reach us?

    You can reach us by calling +43 1 7962324 7080.

    You can also email us at biocomp@vbcf.ac.at, and we'll get back to you shortly.

    Or you can find us in Office PG.13 in the VBC2 building. The easiest approach is to come from the IMBA PG floor (yellow) until you reach a big steel door. This is the CSF entrance and it is always open. Then just follow the red arrows on the plan.

    Practical information

    General procedure

    The number of participants is limited because the courses are very interactive, and therefore the participants will be selected on a "first come, first served" basis. This is standard procedure for scientific workshops with a limited number of participants.

    Who may attend a course?

    Anyone who is interested in learning. We welcome participants from outside the Biocenter campus as well.

    Please always read the description of the course before applying. There can be prerequisites such as familiarity with a given technique or application. If you do not fulfil those criteria, then please do not apply, as you would not be able to profit from the training and would only take somebody else's place.

    Application procedure

    1. Upcoming courses are announced via e-mail including approximate time, topics and an estimated participation fee. Those who are interested should register at the BioComp training service.
    2. Accepted registration involves a commitment to pay the registration fee. Potential participants are requested to obtain permission from their PIs to attend the course. The PI must approve the course fee and provide the potential participant with an accurate billing address.
    3. Participants are accepted on a "first come, first served" basis. Once all places are assigned, registration will be closed. Please note: you may register only yourself! Registration mails sent on behalf of colleagues will not be considered.
    4. The accepted participants are notified. If an accepted participant cannot attend the course, then a new participant will be selected from the remaining potential participants.
    5. Those colleagues who could not register will be kept on a waiting list. If there is sufficient interest, then a second course will be held and the remaining potential participants will again "compete" for the places until everyone has been trained.

    Invoicing

    Billing address

    If you work at the IMP, IMBA or GMI: In general there is no need to provide a billing address unless your course participation is financed from a special source (e.g. PhD training budget).

    If you work at the MFPL, a campus company or any other institution outside the VBC: Please provide a billing address exactly in the format as requested by your institution's accounting department.

    If you are unemployed: The "Arbeitsmarktservice" (AMS) can pay for the course. Please ask them what kind of confirmation they need, and let us know before you register for the course.

    Cancellation and refund policy

    If you cannot attend a course after registration, we will try to replace you with another colleague on the waiting list, or you can suggest a replacement. If someone registers as a replacement, then your registration fee will be reimbursed in full. If neither you nor we can find another participant, you will have to pay 50% of the course fee if you cancel your registration 8 (eight) or more days before the course begins, and the full fee if you cancel 7 (seven) days or less before the course starts.

    If the number of participants falls below the minimum number specified for the course 8 or more days before the course starts, then the course will be cancelled and the already paid registration fees reimbursed, minus non-recoverable expenses incurred by the VBCF.

    If we are forced to cancel a course (e.g. due to illness of the instructor), then we will try our best to hold the course at a later date. Your registration will be valid for the replacement course. Should you decide to cancel your registration, the rules above will apply.

    Seminar rooms

    We have access to two seminar rooms:

    Please make sure you go to the correct one for your training!

    The Ten Rules of Reproducible Research

    These simple rules have been published in PLoS Computational Biology.

    1. For Every Result, Keep Track of How It Was Produced
    2. Avoid Manual Data Manipulation Steps
    3. Archive the Exact Versions of All External Programs Used
    4. Version Control All Custom Scripts
    5. Record All Intermediate Results, When Possible in Standardized Formats
    6. For Analyses That Include Randomness, Note Underlying Random Seeds
    7. Always Store Raw Data behind Plots
    8. Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
    9. Connect Textual Statements to Underlying Results
    10. Provide Public Access to Scripts, Runs, and Results
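
    As an illustration of Rule 6, here is a minimal Python sketch of how a random seed can be recorded together with the result it produced. The function name run_analysis and the seed value are hypothetical and chosen only for this example; they are not taken from the publication.

```python
import random


def run_analysis(seed):
    """Draw a small random sample from an explicitly seeded generator."""
    rng = random.Random(seed)  # private generator, independent of global state
    sample = [rng.gauss(0.0, 1.0) for _ in range(5)]
    # Keep the seed next to the result so that it can be written to a log file
    return {"seed": seed, "sample": sample}


first = run_analysis(seed=20130101)
second = run_analysis(seed=20130101)

# Re-running with the recorded seed reproduces exactly the same numbers
print(first["seed"], first["sample"] == second["sample"])
```

    Storing the seed in the returned record, rather than only setting it at the top of a script, keeps the result and the information needed to reproduce it in one place.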

    Know your numbers

    ...or, more importantly, "Know when your numbers are significant". This Comment by David L. Vaux, published in Nature Vol. 492, 180-181 (2012), summarizes why we need better statistics:

    The incidence of papers in cell and molecular biology that have basic statistical mistakes is alarming. I see figures with error bars that do not say what they describe, and error bars and P values for single, ‘representative’ experiments. So, as an increasingly weary reviewer of many a biology publication, I’m going to spell out again the basics that every experimental biologist should know.

    There is another very useful Comment by Daniel MacArthur (Nature Vol. 487, 427-428, 2012) on the dangers of false positives in genomic research that addresses the same problem.

    Fortunately, we can help you by offering a statistical consultancy service.

    Statistical consulting

    The statistical consulting service is intended to help researchers apply statistical methods properly when planning experiments or analyzing data. The following activities are foreseen:

    • Design of experiments: the consultant will advise on how to set up experiments: treatment vs. control, number of necessary biological and/or technical replicates.
    • Data analysis advice: which statistical methodology is appropriate for answering a biological question? Which tests shall I use? Are the prerequisites for a test (e.g. normality) satisfied? What are the best ways of visualizing the results?
    • Help with publications: the consultant will play the role of a reviewer and identify possible problems with a manuscript. Or, if reviewers ask for manuscript changes, the consultant can help you understand the issues and formulate appropriate responses.

    IMPORTANT: This consulting service provides advice only. The consultants cannot perform a complete data analysis for you. If you need detailed statistical support for a large project, then it is advisable to set up a collaboration.

    How to use the statistical consulting service

    This service works as follows:

    1. Researchers are kindly asked to contact András Aszódi with a detailed description of their problem. Electronic copies of relevant publications and manuscripts are welcome!
    2. If András can solve your problem, then he'll get back to you ASAP. Otherwise, the problem description will be sent out to one of the statistics consultants affiliated with the service.
    3. The consultant accepts the problem, and the SCC books his/her time for the researcher. The consultant prepares him/herself for the problem.
    4. Consultations will take place on campus; the SCC arranges the room and notifies both parties about the date and time. Consultations will take place regularly, most probably every second week (depending on consultant availability and demand).
    5. Consulting time will be billed by the hour. A limited number of hours will be available free of charge to every research group.

    Practical considerations

    1. Please submit your question early! The consultants need time to prepare, and since they are not our employees, they are not available instantaneously. So please plan in advance.
    2. The more information you provide in advance, the better.
    3. Formulate your question clearly. Our consultants have biometrics / bioinformatics backgrounds, but you have to consider that your project is completely new for them.

    Biostatistics

    The BioComp facility offers courses on statistical software and on practical data analysis.

    Introductory courses

    These are intended for colleagues who would like to refresh their statistical knowledge.

    Courses currently offered

    Previous courses

    These courses are not offered any more.

    • R for biologists
    • Introduction to applied statistics

    Advanced courses

    These courses have been tailor-made for colleagues with solid statistical knowledge who wish to learn more about special methodologies and techniques.

    Computational biologists are actively encouraged to suggest topics for additional advanced courses!

    Computing skills

    The purpose of these courses is to help participants use computers more efficiently.

    Courses

    Bioinformatics

    The BioComp facility offers basic bioinformatics trainings for molecular biologists. The aim is to help experimentalists to interpret the results of bioinformatics data analyses.

    Trainings

    Lectures

    Statistics with R

    The aim of this course is to teach you how to perform basic statistical analysis using R. First we review the foundations (sampling theory, discrete and continuous distributions), then we focus on classical hypothesis testing. This course will improve your generic statistics knowledge. We cannot go into the specific data analysis problems of your particular project.

    Out of scope: this course will not teach you bioinformatics. In particular, no high-throughput sequencing data will be used because i) they are impractically large, and ii) not everyone on campus is working with sequencing.

    Instructor: András Aszódi, VBCF BioComp.

    Number of participants: minimum 5, maximum 10.

    Topics and schedule

    Length: the course takes two half-days, 09:00 - 13:00 each day.

    Topics:

    Day 1:

    • Sampling theory: obtaining information about a population via sampling. Sample characteristics (location, dispersion, skewness), estimation of the mean, standard error of the mean.
    • Discrete and continuous probability distributions. Central limit theorem.

    Day 2:

    • Hypothesis testing. Basic principles, one- and two-sided testing, types of errors, power calculations.
    • "Cookbook of tests": location testing, normality, variance comparisons, counting statistics, contingency tables, regression tests.
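
    The standard error of the mean listed under Day 1 is the sample standard deviation divided by the square root of the sample size. The course itself uses R; the sketch below is written in Python purely for illustration, and the measurement values are made up.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Return the sample standard deviation divided by sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    # statistics.stdev uses the n - 1 denominator (sample standard deviation)
    return statistics.stdev(sample) / math.sqrt(n)


measurements = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values
mean = statistics.mean(measurements)
sem = standard_error_of_mean(measurements)
print(f"mean = {mean:.3f}, SEM = {sem:.3f}")
```

    The standard error shrinks with the square root of the sample size, which is why quadrupling the number of observations only halves it.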

    Prerequisites

    Previous knowledge

    This course requires basic familiarity with R. The following skills are necessary:

    • Using the R interpreter, either the command-line program or RStudio
    • How to invoke R functions, pass optional/named parameters
    • Some familiarity with simple plotting commands

    If you have participated in the "R as a programming language" course then you are well equipped. These two courses are offered in "tandem" but booking one of them separately is also possible.

    Hardware and software

    This is an interactive "hands-on" course. Please take a laptop with you. We will use the following software:

    The R system

    The RStudio visual environment for R

    Both of them will be provided in a virtual machine that participants will be able to log into. Alternatively, if you already have R and RStudio installed on your laptop, then you can work locally as well.

    Further reading

    There are lots of books on R and statistics, usually mixing the two topics. The instructor happens to like this book:

    • The R Book (2nd edition), by Michael J Crawley. John Wiley & Sons, 2012. ISBN 978-0470973929

    R as a programming language

    This course is aimed at colleagues who would like to learn how to use the statistical programming language "R". 

    Please note that this course focuses on the R language itself. We will learn about the data structures and functions in R, and how to write R scripts. This course will not teach you statistics or bioinformatics. To learn how to do basic statistics with R, it is recommended to take the "Statistics with R" course which is offered "in tandem" with this one.

    Instructor: András Aszódi, CSF BioComp.

    Number of participants: minimum 5, maximum 10.

    Topics and schedule

    Length: the course takes two half-days, 09:00 - 13:00 each day.

    Topics:

    Day 1:

    • Introduction to R: general principles.
    • Data structures in R: vectors, matrices, arrays, lists, data frames. Data import/export.

    Day 2:

    • Functions in R: how to write your own functions and scripts.
    • Visualization in R: various plot types.

    Prerequisites

    Previous knowledge

    Programming experience is not required but desirable. Note that this is a basic introductory course that aims to explain R from the ground up. If you are an experienced R user, then this course is not appropriate for you.

    Hardware and software

    This is an interactive "hands-on" course. Please take a laptop with you. We will use the following software:

    The R system

    The RStudio visual environment for R

    Both of them will be provided in a virtual machine running on the instructor's laptop. You will be able to log into an "ad-hoc" training wireless network to access the system. Alternatively, if you already have R and RStudio installed on your laptop, then you can work locally as well.

    Further reading

    There are lots of books on R and statistics, usually mixing the two topics. The instructor happens to like this book:

    • The R Book (2nd edition), by Michael J Crawley. John Wiley & Sons, 2012. ISBN 978-0470973929

    Course history

    Date Participants Status
    2014-04-24 7 completed
    2014-05-13 10 completed at CeMM
    2015-02-26/27 7 completed

    Advanced regression methods in biostatistics

    The aim of this course is to help computational biologists with complex data analysis problems. After discussing the theoretical foundations, the course will provide practical advice on how to use the presented methodologies with R.

    Participants are encouraged to bring use cases from their own work. However, please note that individual data analysis needs cannot be discussed in depth.

    Instructor: Bettina Grün.

    Number of participants: minimum 6, maximum 10.

    Topics and schedule

    Length: two half-days, 09:00 - 12:30 both days.

    Topics:

    • Linear models: different regressors with contrasts and interactions, ANOVA, MANOVA
    • Generalized linear models: Poisson, binomial (counting statistics)
    • Regularization methods: LASSO, Ridge and Elastic Net
    • Spline regression
    • Mixed-effects regression

    Prerequisites

    Previous knowledge

    This is an advanced course. Solid statistical foundations and a working knowledge of R are required. Please bring your own laptop with a working R installation.

    Course history

    Date Participants Status
    2013-06-24 9 completed
    2013-06-03 10 completed

    Machine learning methods

    The aim of this course is to help computational biologists with complex data analysis problems. After discussing the theoretical foundations, the course will provide practical advice on how to use selected machine learning methodologies with R.

    Participants are encouraged to bring use cases from their own work. However, please note that individual data analysis needs cannot be discussed in depth.

    Topics and schedule

    Length: two half-days, 09:00 - 12:30 both days.

    Topics:

    • Day 1: Classification. Support vector machines, trees and random forests, boosting. Performance evaluation.
    • Day 2: Clustering. Feature selection, hierarchical methods, partitioning methods, model-based approaches. Performance evaluation.

    Prerequisites

    Previous knowledge

    This is an advanced course. Solid statistical foundations and a working knowledge of R are required. Please bring your own laptop with a working R installation.

    Course history

    Date Participants Status
    2013-11-18 10 completed

    UNIX command line and scripting

    The first part of this course introduces the UNIX command line on Linux or a Mac. The second part provides a basic introduction to string searches and scripting with AWK and Bash. It is aimed at colleagues who are going to run analysis programs on their own or wish to use computing clusters.

    Instructor: András Aszódi (VBCF BioComp).

    Number of participants: minimum 3, maximum 6.

    Topics and schedule

    Length: the course takes two half-days; the second part is slightly longer than the first one.

    Part 1: Introduction to the command line

    • UNIX overview
    • Working with files and directories
    • Users, groups, permissions, environment variables
    • Command pipelines, job control

    Part 2: string searches and shell scripting

    • Introduction to regular expressions
    • Searching string patterns using grep and variants
    • Advanced text processing with awk
    • Bash shell scripting

    Prerequisites

    No previous programming experience is necessary.

    This is a hands-on course. Participants must bring a laptop with them and have certain free programs installed (see below). We will log in to a wireless training network provided by BioComp and edit scripts remotely on a virtual machine (VM). 

    Participants must be familiar with the basic usage of the recommended editors: Notepad++ (Windows) or TextWrangler (Mac).

    IF YOU HAVE A WINDOWS LAPTOP, INSTALL THESE PROGRAMS:

    1. The PuTTY remote login utility to access the VM
    2. The Notepad++ text editor to edit scripts remotely on the VM.

    IF YOU HAVE A MAC LAPTOP, INSTALL THIS PROGRAM:

    • The TextWrangler text editor to edit scripts remotely on the VM.

    Supporting material

    Seriously useful resources

    EMBNet Short Guide to UNIX Commands

    The Bash Hackers Wiki

    Not-so-serious resources

    The UNIX-HATERS Handbook

    GMI "Mendel" Supercomputing Training

    This course provides an introduction to the GMI's "Mendel" HPC system.

    Instructor: András Aszódi (VBCF BioComp).

    Number of participants: minimum 2, maximum 6. Please note that currently participation is restricted to GMI employees, their collaborators, or colleagues who rent CPU time on "Mendel".

    Topics and schedule

    Length: The course takes one half-day. Depending on seminar room availability, it will be held either in the morning from 09:30 to 12:30 or in the afternoon from 14:00 to 17:00.

    Topics:

    • Introduction to Mendel: architecture, storage layout, access.
    • Work environment setup using modules
    • Data management, job submission, queuing system, job scripting

    Prerequisites

    Participants must have an account on "Mendel". The GMI IT department is notified in advance so no action is needed from the participants.

    Because the "Mendel" supercomputer can be accessed only through the command line, familiarity with basic Unix commands such as ls, cd, cp, pwd is required.

    This is a hands-on course. Participants must bring a laptop with them. Under Windows, they must have the PuTTY remote login utility installed; Linux or Mac computers already have the necessary tools to connect to Mendel, so no extra software installation is necessary.

    IMP/IMBA cluster training

    This course provides an introduction to the IMP/IMBA HPC system.

    Instructor: András Aszódi (VBCF BioComp).

    Number of participants: minimum 2, maximum 6. Please note that currently participation is restricted to IMP/IMBA employees and their collaborators.

    Topics and schedule

    Length: The course takes one half-day. Depending on seminar room availability, it will be held either in the morning from 09:30 to 12:30 or in the afternoon from 14:00 to 17:00.

    Topics:

    • Introduction to the cluster architecture, storage layout, access.
    • Work environment setup using modules
    • Data management, job submission, queuing system, job scripting

    Prerequisites

    Participants must have a UNIX account at the IMP or IMBA.

    Because the cluster can be accessed only through the command line, familiarity with basic Unix commands such as ls, cd, cp, pwd is required.

    This is a hands-on course. Participants must bring a laptop with them. Under Windows, they must have the PuTTY remote login utility installed; Linux or Mac computers already have the necessary tools to connect to the cluster, so no extra software installation is necessary.

    Python programming primer

    The purpose of this training is to teach general programming concepts using Python as an instruction tool. We will use Python 3 only.

    Please note that this course is not intended to teach you data analysis skills or bioinformatics. Additional courses are planned for those topics.

    Instructor: András Aszódi (VBCF BioComp).

    Number of participants: minimum 5, maximum 10.

    Topics and schedule

    Length: three half-days, 09:00 - 13:00 each day.

    Topics:

    • Introduction to Python: basic principles.
    • Python data structures: strings, tuples, lists, dictionaries, sets.
    • Object-oriented programming: how to model coffee machines in Python :-). Inheritance (base and derived classes), polymorphism.
    • Write your own script to convert BED files to GFF. Command-line option processing, file I/O, error handling.
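
    To give an idea of the last topic, here is a minimal sketch of the core of a BED to GFF conversion. It is not the course solution: the function name bed_to_gff_line and the default source and feature values are assumptions made for this example, and command-line option processing and file I/O are left out. The key point is the coordinate shift: BED starts are 0-based with an exclusive end, whereas GFF coordinates are 1-based and inclusive.

```python
def bed_to_gff_line(bed_line, source="bed2gff", feature="region"):
    """Convert one tab-separated BED line into one GFF line.

    Only the start coordinate changes: BED is 0-based, half-open,
    GFF is 1-based, closed, so start + 1 and end stay as they are.
    """
    fields = bed_line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start and end")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # Optional BED columns; "." marks a missing value in GFF
    name = fields[3] if len(fields) > 3 else "."
    score = fields[4] if len(fields) > 4 else "."
    strand = fields[5] if len(fields) > 5 else "."
    attributes = "." if name == "." else "Name=" + name
    return "\t".join([chrom, source, feature, str(start + 1), str(end),
                      score, strand, ".", attributes])


print(bed_to_gff_line("chr1\t99\t200\tgeneA\t0\t+"))
```

    A complete script would wrap this function in a loop over the input file and report malformed lines instead of stopping at the first error.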

     

    Prerequisites

    Some basic programming knowledge (e.g. having attended the BioComp UNIX scripting and/or the R language courses) is advantageous, but not strictly necessary. Everything will be explained! :-) The ability to type with a low error rate is necessary though, because this is a hands-on training.

    Participants are kindly requested to bring their own laptop to the course.

    RNA-Seq data analysis

    The aim of the course is to familiarise experimental biologists with how bioinformaticians analyse RNA-Seq data. The statistical background will be explained in easy-to-understand terms with practical examples. This course is recommended to colleagues who are planning to do RNA-Seq experiments and want to understand better what happens to their raw data in the bioinformatician's hands.

    Please note that this course will not teach you how to become a bioinformatician. Such a training takes years.

    The course examples will be quite generic. Please keep in mind that due to the diverse interests of the participants we cannot cater for individual data analysis needs.

    Instructor: Elin Axelsson (GMI)

    Number of participants: minimum 6, maximum 10.

    Topics and schedule

    Length: the course takes two half-days, from 09:00 to 12:30 on each day.

    • Introduction. Counting reads.
    • Relative expression estimates (RPKM and friends).
    • Normalisation techniques. Variance stabilising transformations.
    • Relative expression distribution and parameter estimation.
    • Variance and over-dispersion.
    • Statistical testing, multiple testing correction.
    • Fold changes, significance and ranking.
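
    As a small illustration of the relative expression estimates mentioned above, RPKM (reads per kilobase of transcript per million mapped reads) divides the read count of a gene by its length in kilobases and by the library size in millions. The Python sketch below uses made-up numbers and a hypothetical function name; it is not part of the course material.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Made-up example: 500 reads on a 2 kb gene in a library of 10 million reads
value = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
print(value)
```

    Because both gene length and library size appear in the denominator, a long gene and a short gene with the same read count receive different RPKM values.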

    Prerequisites

    This is not an "interactive" course. Just bring an open, curious mind and a desire to learn.

     

    Mass spectrometry data analysis

    This course focuses on protein/peptide MS data analysis and peptide identification. MS2 spectra will be analyzed by hand to show and understand the concept of peptide fragmentation. We will discuss existing analysis software (database search engines, analysis frameworks) and apply them to example datasets.

    The course is aimed at colleagues who will soon start or have already started to work in mass spectrometry data analysis.

    Instructor: Viktoria Dorfer (FH Hagenberg)

    Number of participants: minimum 10, maximum 20.

    Topics and schedule

    Length: the course takes two full days, from 09:00 to 17:00 on each day with coffee and lunch breaks.

    • Introduction to mass spectrometry. Overview of existing spectrum identification strategies.
    • Tour and presentation of different mass spectrometer instruments at the IMP/IMBA Protein Chemistry facility.
    • Peptide fragmentation (hands-on analysis).
    • De novo tandem mass spectra identification (hands-on analysis).
    • Challenges, advantages and drawbacks of de novo identification.
    • Mass spectra identification by database search. Interpretation of the results (hands-on analysis).
    • Validation (FDR calculation, Percolator). Peptide and protein grouping (hands-on analysis).
    • Challenges, advantages and drawbacks of database identification.
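
    The FDR calculation mentioned under validation can be sketched with a simple target-decoy estimate: the number of decoy hits at or above a score threshold divided by the number of target hits at or above the same threshold. This is only one common simple estimator, shown here in Python with made-up scores; the function name is hypothetical and the course may use a different formulation.

```python
def target_decoy_fdr(psms, score_threshold):
    """Estimate the FDR as decoy hits / target hits above a threshold.

    psms is a list of (score, is_decoy) tuples, one per peptide-spectrum match.
    """
    accepted = [is_decoy for score, is_decoy in psms if score >= score_threshold]
    decoys = sum(1 for is_decoy in accepted if is_decoy)
    targets = len(accepted) - decoys
    if targets == 0:
        raise ValueError("no target hits at or above the threshold")
    return decoys / targets


# Made-up scores: True marks a hit against the decoy database
example_psms = [(50, False), (45, False), (40, True), (35, False),
                (30, False), (20, True), (10, False)]
fdr = target_decoy_fdr(example_psms, score_threshold=30)
print(fdr)
```

    Raising the score threshold usually lowers the estimated FDR at the cost of accepting fewer identifications.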

    Prerequisites

    This is a "hands-on" course. Please bring a (preferably Windows) laptop with you. The necessary software and data files will be distributed before the course; some installation is necessary.

     

    Introduction to Biocybernetics

    This lecture is offered by András Aszódi as a "pro bono publico" service to master students at the University of Vienna, usually as part of a "Ringvorlesung". We discuss the basic features of biological regulation, the application of engineering principles to biological problems and molecular mechanisms of information processing in genomic and metabolic regulation.

    Instructor: András Aszódi, VBCF BioComp.

    Number of participants: variable. No registration needed.

    Topics and schedule

    Length: the lecture takes 2 x 50 minutes.

    Topics:

    • Principles of systems analysis. Foundations of cybernetics.
    • Positive and negative feedback loops in natural and artificial systems.
    • Molecular information transfer.
    • Principles of genomic regulation.
    • Principles of metabolic regulation.
    • "Computing" with enzymatic networks: logic gates, emulation of associative learning.
    • Homeostasis and robustness.

    Prerequisites

    Previous knowledge

    High-school mathematics and biology.

    CSF seminar room

    The CSF seminar room is on the PG floor of the VBC building where most of the CSF offices are.

    The CSF seminar room can take max. 10 people so we use it for smaller training courses.

    To find the room, come over from IMBA's PG floor (light yellow) through the open steel door separating IMBA from VBC2. Then proceed as indicated on the floor plan below.

    "Red room"
