
What is Deep Genomics?

A University of Toronto computer scientist known for combining artificial intelligence with big data genomics is launching a company that could create a roadmap for DNA-based therapy.

The company, called Deep Genomics, is set to launch on Wednesday.

While it has become common for researchers to identify genetic mutations that appear to correlate with various diseases – thousands of mutations have been linked to cancer, for example – the technology behind Deep Genomics involves the use of computer algorithms that can tease out cause and effect relationships.

The method was developed by Brendan Frey, a professor in the university’s department of electrical and computer engineering.

“My approach was let’s train a neural network to figure out why a mutation leads to a disease,” said Prof. Frey, who is the company’s president and CEO. “That’s what makes our technology unique.”

The method draws on a rapidly growing discipline in computer science known as deep learning, which has lately been making inroads in a range of tough computational problems – including visual identification and speech recognition – where context plays an important role in arriving at the right answer. The field has a long history as a branch of artificial intelligence in which a computer program can adjust itself to become better and better at a complex task. In deep learning, the software is one step further removed from human guidance and can learn to make associations beyond what a human expert might discern.

In recent years, Prof. Frey has taken the same principles to create a set of computer algorithms that can look at the pattern of mutations in an individual’s DNA and make inferences about how those mutations collectively affect the operation of different types of cells in the body.

It’s the predictive power of the algorithms that is key to how Deep Genomics will serve its clients, including other companies that offer diagnostic services to hospitals and health care providers. By linking the DNA sequence to cellular function, the objective is to help determine not only what may be the source of a health problem but also what treatment may be more likely to succeed.

Prof. Frey said that when he and his colleagues first published work on the approach in 2010, he expected it would quickly be taken up by entrepreneurs in the biomedical sector and was surprised when that didn’t happen. He concluded that there were too few research groups with the combined expertise in genomics and artificial intelligence to turn the approach into something that would be of use to clinicians and patients.

He decided to launch the company last fall, he said. Angel investors have provided the initial capital for the eight-person Toronto-based company, which Prof. Frey hopes will double in size by the end of the year and quickly find a niche in genetic diagnostics.

Yann LeCun, the New York-based director of artificial intelligence at Facebook and an adviser to the new company, said that while the importance of deep learning is now well established in many areas of data analysis, its impact on medicine is just beginning to be felt.

“The potential applications are really huge,” he said.

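Deep Genomics is a startup that came out of the University of Toronto. Anyone with a passing knowledge of artificial intelligence can read the meaning of the name: Deep Genomics = Deep Learning + Genomics. The company aims to use machine learning algorithms to predict how mutations in the genome change cells, and from there to work out what changes they cause in the human body. Its first product is SPIDEX, which predicts the effect of genomic mutations on RNA splicing. For the methodological details behind SPIDEX, see the January 2015 paper in Science. With a team of machine learning, genomics and precision medicine experts, the company quickly drew coverage in Nature Biotechnology, Scientific American, WIRED, CBC News and other academic, popular science, technology and mainstream media outlets.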

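Deep learning is a family of machine learning algorithms loosely modeled on how the human brain works. In recent years it has been widely used to learn from massive amounts of unstructured data, in tasks such as speech recognition and image recognition. IT giants such as Google, Facebook and Microsoft have all invested heavily in the field. In China, Baidu recruited Kai Yu, a leading Chinese researcher in deep learning, to set up the Baidu Institute of Deep Learning, and later brought in deep learning luminary Andrew Ng to head the institute. Kai Yu left Baidu this year and is reportedly starting a company focused on artificial intelligence chips. It seems fair to conclude that he sees broader applications and commercial prospects for machine learning technologies, with deep learning as their leading example.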



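Deep Genomics' technical work has been published in Science in January of this year and in Bioinformatics in June of last year, and its latest work is about to appear in Nature Biotechnology. In the Bioinformatics paper, for example, the main data set is RNA-Seq data on 11,019 mouse alternative exons, which is used to build a deep neural network that predicts splicing patterns across tissues. Working with mouse and restricting the analysis to exons clearly cuts the cost of building a large training data set by a wide margin. A little over ten thousand samples is still small for fields such as graphics and machine perception, but in the life sciences, and in high-throughput omics in particular, it counts as a sizable sample. Nor is that scale out of reach: the psoriasis project from Anhui Medical University, published in Nature Genetics, performed targeted region capture sequencing on about 20,000 people. Building a deep learning model on data of this size poses no great difficulty in either model design or computational efficiency.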
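To make the idea of predicting splicing patterns across tissues with a neural network more concrete, here is a minimal sketch in Python with NumPy. It is not the method from the Bioinformatics paper or from SPIDEX. The data are entirely synthetic, and the feature count, tissue count, network size and all function names (make_synthetic_exons, forward, train) are assumptions made up for illustration. It only shows the general shape of the approach: a small network maps per-exon features to an inclusion level between 0 and 1 for each tissue, and is trained by gradient descent.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def make_synthetic_exons(n_exons=500, n_features=20, n_tissues=3, seed=0):
    """Fabricated stand-in data, NOT real RNA-Seq measurements."""
    rng = np.random.default_rng(seed)
    features = rng.normal(size=(n_exons, n_features))
    # Hypothetical hidden rule linking features to per-tissue inclusion levels.
    hidden_rule = rng.normal(size=(n_features, n_tissues))
    inclusion = sigmoid(features @ hidden_rule)  # values in (0, 1)
    return features, inclusion


def forward(params, features):
    """One tanh hidden layer, then one sigmoid output per tissue."""
    w1, b1, w2, b2 = params
    hidden = np.tanh(features @ w1 + b1)
    return hidden, sigmoid(hidden @ w2 + b2)


def cross_entropy(pred, target, eps=1e-9):
    """Mean cross-entropy between predicted and target inclusion levels."""
    return float(-np.mean(target * np.log(pred + eps)
                          + (1.0 - target) * np.log(1.0 - pred + eps)))


def train(features, inclusion, n_hidden=16, lr=0.5, epochs=300, seed=1):
    """Full-batch gradient descent; returns the parameters and loss history."""
    rng = np.random.default_rng(seed)
    n_exons, n_features = features.shape
    n_tissues = inclusion.shape[1]
    w1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(scale=0.1, size=(n_hidden, n_tissues))
    b2 = np.zeros(n_tissues)
    losses = []
    for _ in range(epochs):
        hidden, pred = forward((w1, b1, w2, b2), features)
        losses.append(cross_entropy(pred, inclusion))
        # Gradient of the mean cross-entropy w.r.t. the output pre-activations.
        d_out = (pred - inclusion) / (n_exons * n_tissues)
        grad_w2 = hidden.T @ d_out
        grad_b2 = d_out.sum(axis=0)
        # Backpropagate through the tanh hidden layer.
        d_hidden = (d_out @ w2.T) * (1.0 - hidden ** 2)
        grad_w1 = features.T @ d_hidden
        grad_b1 = d_hidden.sum(axis=0)
        w1 -= lr * grad_w1
        b1 -= lr * grad_b1
        w2 -= lr * grad_w2
        b2 -= lr * grad_b2
    return (w1, b1, w2, b2), losses


features, inclusion = make_synthetic_exons()
params, losses = train(features, inclusion)
_, predicted = forward(params, features)
print(f"training loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

A real model would use measured sequence and expression features, held-out exons for evaluation, and a deeper architecture. The sketch fits on a laptop in well under a second, which is in line with the point above that data sets of around ten thousand examples are not computationally demanding.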




