五月天青色头像情侣网名,国产亚洲av片在线观看18女人,黑人巨茎大战俄罗斯美女,扒下她的小内裤打屁股

歡迎光臨散文網(wǎng) 會員登陸 & 注冊

【ROSALIND】【練Python,學生信】25 序列拼接

2020-01-11 20:40 作者:未琢  | 我要投稿

如果第一次閱讀本系列文檔請先移步閱讀【ROSALIND】【練Python,學生信】00 寫在前面 ?謝謝配合~

題目:

將基因組片段組裝為最短“超串”

Given: At most 50 DNA strings of approximately equal length, not exceeding 1 kbp, in FASTA format (which represent reads deriving from the same strand of a single linear chromosome). The dataset is guaranteed to satisfy the following condition: there exists a unique way to reconstruct the entire chromosome from these reads by gluing together pairs of reads that overlap by more than half their length..

所給:不超過50個DNA序列,長度相近且不會超過1kb,,以fasta格式給出,代表來自一個線性染色體上同一條鏈的片段。同時數(shù)據(jù)集滿足如下要求:肯定有唯一解法,使這些片段兩兩重合長度大于其一般長度,從而組裝成一個完整序列。

Return: A shortest superstring containing all the given strings (thus corresponding to a reconstructed chromosome).

需得:由所給的所有序列組成的最短完整序列。

?

特別提醒

所給數(shù)據(jù)集是極其理想化的,在實際中reads不可能只來自同一條鏈。而且在實際中直接拼接出完整基因組的可能性極小,通常只能得到臨近的部分序列,稱為contigs(重疊群)。

?

測試數(shù)據(jù)

>Rosalind_56

ATTAGACCTG

>Rosalind_57

CCTGCCGGAA

>Rosalind_58

AGACCTGCCG

>Rosalind_59

GCCGGAATAC

測試輸出

ATTAGACCTGCCGGAATAC

?

背景

測序使我們進行生物學研究的重要方法,現(xiàn)代高通量測序中,我們只能得到長度有限的部分序列片段,稱為reads,如何把大量reads拼接成完整的基因組成為亟待解決的重要問題。序列拼接的示意圖如下:

?

思路

我的想法是將所有序列兩兩比較,每次都將重合度最大的兩個序列合并,重復這個過程直至所有序列合并為一個序列。在這個思路的指導下,我用了大量的循環(huán),最多處達到三層循環(huán),但只要不被循環(huán)繞暈,整體思路還是較為簡單直接的。

?

?

代碼

def readfasta(lines):

"""讀入fasta格式文件的函數(shù)"""

??? seq = []
??? index = []
??? seqplast =
""
???
numlines = 0
???
for i in lines:
???????
if '>' in i:
??????????? index.append(i.replace(
"\n", "").replace(">", ""))
??????????? seq.append(seqplast.replace(
"\n", ""))
??????????? seqplast =
""
???????????
numlines += 1
???????
else:
??????????? seqplast = seqplast + i.replace(
"\n", "")
??????????? numlines +=
1
???????
if numlines == len(lines):
??????????? seq.append(seqplast.replace(
"\n", ""))
??? seq = seq[
1:]
???
return index, seq

?


def findoverlap(s1,s2):
???
"""返回兩個序列重疊堿基數(shù)量的函數(shù)"""
???
n1 = len(s1)
??? n2 =
len(s2)
???
if n1 >= n2:
??????? i = n2
??????? flag =
0
???????
while i > 0:
???????????
if s1[-i:] == s2[:i]: # 把前一個序列的后部與第二個序列的前部依次比較
???????????????
flag = i # 用flag變量記錄重疊的堿基數(shù)
???????????????
break
???????????
i -= 1
???
else:
??????? i = n1 -
1
???????
flag = 0
???????
while i > 1:
???????????
if s1[-i:] == s2[:i]:
??????????????? flag = i
???????????????
break
?????????? ?
i -= 1

???
return flag


def isoverlap(s1, s2):
???
"""判斷兩個序列前后順序的函數(shù)"""

???
flag1 = findoverlap(s1, s2)
??? flag2 = findoverlap(s2, s1)

???
if flag1 >= flag2: # 比較兩個序列的先后位置對重疊堿基數(shù)的影響
???????
flag = flag1
??????? orders =
1 # 用orders變量記錄先后順序
???
elif flag2 > flag1:
??????? flag = flag2
??????? orders =
2

???
return flag, orders


def addseq(n, seq1, seq2):
???
"""將兩個序列合并成一個的函數(shù)"""
???
n1 = len(seq1)
??? aseq = seq1[:n1 - n] + seq2
???
return aseq


f =
open('input.txt', 'r')
lines = f.readlines()
f.close()
[index, seq] = readfasta(lines)

seq1 =
''
seq2 = ''
while len(seq) > 1: # 用一個大循環(huán)控制,直至所有片段合成一個
???
for i in range(len(seq)): # 下面的兩個for循環(huán)實現(xiàn)seq中所有序列兩兩比對
???????
miniseq = [] # miniseq用來防止序列自身比對
???????
k =0
???????
while k < len(seq):
???????????
if k != i:
??????????????? miniseq.append(seq[k])
??????????? k +=
1
???????
maxoverlap = 0 # 用maxoverlap記錄當前輪比較最多的重合堿基數(shù)
???????
for j in range(len(miniseq)):
??????????? [overlap, orders] = isoverlap(seq[i], miniseq[j])
???????????
if overlap > maxoverlap: # 本次比較與之前的最大重合數(shù)比,若更大,則代替后者稱為新的最大重合堿基數(shù)
???????????????
maxoverlap = overlap
???????????????
if orders == 1: # seq1和seq2記錄帶來最大重合堿基數(shù)的兩個序列,orders指示順序
???????????????????
seq1 = seq[i]
??????????????????? seq2 = miniseq[j]
???????????????
else:
??????????????????? seq1 = miniseq[j]
??????????????????? seq2 = seq[i]
??? totalseq = addseq(maxoverlap, seq1, seq2)
# 將本輪具有最大重合度的兩個序列合并
???
seq.remove(seq1) # 刪去已被合并的兩個序列
???
seq.remove(seq2)
??? seq.append(totalseq)
# 將合并的新串加入seq序列,進入下一輪比較

f = open('output.txt', 'w')
f.write(totalseq)
f.close()

?


【ROSALIND】【練Python,學生信】25 序列拼接的評論 (共 條)

分享到微博請遵守國家法律
手游| 宣化县| 双桥区| 惠东县| 平果县| 米易县| 嘉善县| 客服| 昌都县| 偃师市| 丹江口市| 望谟县| 松原市| 南丹县| 涿鹿县| 丰顺县| 汶川县| 运城市| 浏阳市| 应城市| 江永县| 无棣县| 肥城市| 衡东县| 伊川县| 泽库县| 德令哈市| 牡丹江市| 民县| 明水县| 临清市| 巢湖市| 醴陵市| 新干县| 封丘县| 东明县| 定襄县| 虎林市| 建湖县| 方城县| 崇文区|