/images/avatar.jpg

TF-IDF

TF-IDF简要介绍 TF-IDF(term frequency–inverse document frequency),一种常用于挖掘文章中关键词的加权技术。某个词在文章(一篇文章

PubTator

API调用 1 2 3 4 5 6 7 8 9 10 11 12 13 #! /bin/bash id_list='/home/yh/dzx/work/BioNLP/EntityAnnotation/pmid.txt' #读取id时跳过pmid.txt第一行列名 while read line do if [ $line != 'pmid' ] then curl https://www.ncbi.nlm.nih.gov/research/pubtator-api/publications/export/pubtator?pmids=$line >> /home/yh/dzx/work/BioNLP/EntityAnnotation/abstract_pubtator.txt echo >> /home/yh/dzx/work/BioNLP/EntityAnnotation/abstract_pubtator.txt fi sleep 2 done < $id_list 1 cat abstract_pubtator.txt|grep -v

perl中正则表达式

运用实例 1 2 3 4 5 6 7 8 9 10 11 12 =pod 下列是一条蛋白质序列,请统计该序列长度、丙氨酸(A)的个数及所占的比例; MNAPERQPQPDGGDAPGHEPGGSPQDELDFSILFDYEYLNPNEEEPNAHKVASPPSGPAYPDDVLDYGLKPYSPLASLSGEPPGRFGEPDRVGPQKFLSAAKPAGASGLSPRIEITPSHELIQAVGPLRMRDAGLLVEQPPLAGVAASPRFTLPVPGFEGYREPLCLSPASSGSSASFISDTFSPYTSPCVSPNNGGPDDLCPQFQNIPAHYSPRTSPIMSPRTSLAEDSCLGRHSPVPRPASRSSSPGAKRRHSCAEALVALPPGASPQRSRSPSPQPSSHVAPQDHGSPAGYPPVAGSAVIMDALNSLATDSPCGIPPKMWKTSP =cut $protein_sequence = "MNAPERQPQPDGGDAPGHEPGGSPQDELDFSILFDYEYLNPNEEEPNAHKVASPPSGPAYPDDVLDYGLKPYSPLASLSGEPPGRFGEPDRVGPQKFLSAAKPAGASGLSPRIEITPSHELIQAVGPLRMRDAGLLVEQPPLAGVAASPRFTLPVPGFEGYREPLCLSPASSGSSASFISDTFSPYTSPCVSPNNGGPDDLCPQFQNIPAHYSPRTSPIMSPRTSLAEDSCLGRHSPVPRPASRSSSPGAKRRHSCAEALVALPPGASPQRSRSPSPQPSSHVAPQDHGSPAGYPPVAGSAVIMDALNSLATDSPCGIPPKMWKTSP"; $len = length($protein_sequence); $A_count = $protein_sequence =~ s/A/A/g; $A_per = ($A_count/$len)*100; print