Novel Methodologies for Gene Network Interaction Analysis and Network Modelling (with Applications to Cancer Research & Cardiovascular Disease)
by Bala Rajaratnam – Stanford University – seminar held on 7 September 2010 at ANU
(This talk about statistical methods in genomics was given to a group of statistics professors, so there were parts of the talk I understood very little of. I apologise to Professor Rajaratnam for any mistakes I make in this summary.)
Biostatisticians work with high-dimensional, high-throughput data from DNA sequencing. Because far more genes are measured than there are samples, they cannot use standard statistical methodology to infer complex multivariate dependencies. The same data analysis issue arises in finance and environmental science (e.g. data from large numbers of weather stations). The standard approach is to compare an experimental group with a control group. But with more than 20,000 genes, how do you identify the most important genes to test?
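The reason standard methods break down can be made concrete with a well-known result (this derivation is an addition for illustration, not something shown in the talk). With n samples of p genes, where x_k is the vector of expression values for sample k, the sample covariance matrix S has rank at most n − 1. When p is larger than n, S is singular, so it cannot simply be inverted to read off which genes depend on each other. Penalised methods such as GLASSO, one of the methods the talk mentions, instead estimate a sparse inverse covariance (precision) matrix Θ. One common way of writing the objective is:

```latex
S = \frac{1}{n-1}\sum_{k=1}^{n} (x_k - \bar{x})(x_k - \bar{x})^{\top}, \qquad \operatorname{rank}(S) \le n - 1

\hat{\Theta} = \arg\max_{\Theta \succ 0} \; \Big[ \log\det\Theta - \operatorname{tr}(S\Theta) - \lambda \sum_{i,j} |\theta_{ij}| \Big]
```

Here λ ≥ 0 controls how sparse the estimate is (some variants penalise only the off-diagonal entries). Under a Gaussian model, θ_ij = 0 means genes i and j are conditionally independent given all the other genes, so the nonzero off-diagonal entries of the estimate are the edges of the gene network.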
The genes of humans and chimpanzees are very similar. However, the connectivity between pairs of genes in humans is very different from that in chimpanzees, and it is these differences in connectivity that better explain the dramatic differences in biological function. Some genes are network hubs: they interact with (“talk to”) many other genes. When a hub gene is “knocked down” in an experiment, the impact on the organism is greater than when a randomly chosen gene is knocked down. Therefore experiments can be simplified by focusing on the hub genes rather than analysing every gene in detail.
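As a rough illustration, a hub can be read as a vertex of high degree, that is, a gene with many edges in the estimated network. This minimal Python sketch assumes the network is already available as an undirected edge list with no duplicate edges; the gene names, the edge list and the helpers `vertex_degrees` and `hub_genes` are all hypothetical, not taken from the talk.

```python
from collections import Counter

def vertex_degrees(edges):
    """Count how many edges touch each gene in an undirected network."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

def hub_genes(edges, k):
    """Return the k genes with the highest degree (ties broken alphabetically)."""
    degree = vertex_degrees(edges)
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked[:k]]

# Toy network with made-up gene names: G1 is connected to four other genes.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G1", "G5"), ("G2", "G3")]
print(hub_genes(edges, 2))  # ['G1', 'G2']
```

Raw degree is the simplest ranking; it matches the idea of a gene that talks to many others, though other centrality measures could be substituted.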
Tools for discovering complex multivariate dependencies use network models. (The methods mentioned were GLASSO, SCAD, SPACE and TLASSO.) A further four methods were developed by combining properties of the original methods. Each of the eight methods was used to build a network diagram for the genes, from which the hub genes could be identified. Each method produced a different network diagram and therefore a different set of hub genes. Taking the top ten genes from each method gave a combined set of only 24 hub genes (out of a possible 80), so the methods overlapped considerably. This identified new genes related to breast cancer, a finding validated in collaboration with biologists. The use of multiple methods demonstrated that no single statistical method is a “silver bullet”; there was valuable information in all of them. The results were displayed using a graphical representation called Vertex Degree Visualisation.
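The combining step can be sketched as a union of the per-method top-k lists together with a count of how many methods chose each gene. This is a minimal Python sketch under stated assumptions: the method names, gene names and top-3 lists are placeholders (the talk used top-ten lists from eight methods), and `combine_hub_lists` is a hypothetical helper rather than the speaker's code.

```python
from collections import Counter

def combine_hub_lists(top_lists):
    """Count, for each gene, how many methods placed it in their top-k list."""
    votes = Counter()
    for genes in top_lists.values():
        votes.update(set(genes))  # set() stops one method voting twice for a gene
    return votes

# Placeholder top-3 lists from three hypothetical methods.
top_lists = {
    "method_A": ["G1", "G2", "G3"],
    "method_B": ["G1", "G2", "G4"],
    "method_C": ["G1", "G3", "G5"],
}

votes = combine_hub_lists(top_lists)
max_possible = sum(len(genes) for genes in top_lists.values())
# Genes that appear in every method's list.
consensus = sorted(gene for gene, n in votes.items() if n == len(top_lists))

print(len(votes), "distinct hub genes out of a possible", max_possible)
print("chosen by every method:", consensus)
```

Keeping the vote counts, rather than only the union, shows which hubs every method agrees on and which come from a single method.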
These methods are also used in finance and environmental science. An example that connects both fields is the price of oil, which acts as a “hub gene” that drives stock prices.
This type of data is very noisy. Even though the different methods found common hub genes in the breast cancer data, the same agreement may not occur with data sets from other fields.
Tools to Address Complexity
This is a highly technical example of analysing a complex problem, and it highlights the benefits of combining more than one approach. The approaches used here include:
- collaboration with experts from other disciplines
- statistics
- network models
- understanding physical reality – biologists confirmed that the genes really were related to breast cancer