ID 001R_FRG3G Reviewed; 256 AA.
AC Q6GZX4;
DT 28-JUN-2011, integrated into UniProtKB/Swiss-Prot.
DT 19-JUL-2004, sequence version 1.
DT 01-APR-2015, entry version 30.
DE RecName: Full=Putative transcription factor 001R;
GN ORFNames=FV3-001R;
OS Frog virus 3 (isolate Goorha) (FV-3).
OC Viruses; dsDNA viruses, no RNA stage; Iridoviridae; Ranavirus.
OX NCBI_TaxID=654924;
OH NCBI_TaxID=8295; Ambystoma (mole salamanders).
OH NCBI_TaxID=30343; Hyla versicolor (chameleon treefrog).
OH NCBI_TaxID=8404; Lithobates pipiens (Northern leopard frog) (Rana pipiens).
OH NCBI_TaxID=8316; Notophthalmus viridescens (Eastern newt) (Triturus viridescens).
OH NCBI_TaxID=45438; Rana sylvatica (Wood frog).
RN [1]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RX PubMed=15165820; DOI=10.1016/j.virol.2004.02.019;
RA Tan W.G., Barkman T.J., Gregory Chinchar V., Essani K.;
RT "Comparative genomic analyses of frog virus 3, type species of the
RT genus Ranavirus (family Iridoviridae).";
RL Virology 323:70-84(2004).
CC -!- FUNCTION: Transcription activation. {ECO:0000305}.
CC -----------------------------------------------------------------------
CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
CC Distributed under the Creative Commons Attribution-NoDerivs License
CC -----------------------------------------------------------------------
DR EMBL; AY548484; AAT09660.1; -; Genomic_DNA.
DR RefSeq; YP_031579.1; NC_005946.1.
DR ProteinModelPortal; Q6GZX4; -.
DR GeneID; 2947773; -.
DR KEGG; vg:2947773; -.
DR Proteomes; UP000008770; Genome.
DR GO; GO:0006355; P:regulation of transcription, DNA-templated; IEA:UniProtKB-KW.
DR GO; GO:0046782; P:regulation of viral transcription; IEA:InterPro.
DR GO; GO:0006351; P:transcription, DNA-templated; IEA:UniProtKB-KW.
DR InterPro; IPR007031; Poxvirus_VLTF3.
DR Pfam; PF04947; Pox_VLTF3; 1.
PE 4: Predicted;
KW Activator; Complete proteome; Reference proteome; Transcription;
KW Transcription regulation.
FT CHAIN 1 256 Putative transcription factor 001R.
FT /FTId=PRO_0000410512.
FT COMPBIAS 14 17 Poly-Arg.
SQ SEQUENCE 256 AA; 29735 MW; B4840739BF7D4121 CRC64;
MAFSAEDVLK EYDRRRRMEA LLLSLYYPND RKLLDYKEWS PPRVQVECPK APVEWNNPPS
EKGLIVGHFS GIKYKGEKAQ ASEVDVNKMC CWVSKFKDAM RRYQGIQTCK IPGKVLSDLD
AKIKAYNLTV EGVEGFVRYS RVTKQHVAAF LKELRHSKQY ENVNLIHYIL TDKRVDIQHL
EKDLVKDFKA LVESAHRMRQ GHMINVKYIL YQLLKKHGHG PDGPDILTVK TGSKGVLYDD
SFRKIYTDLG WKFTPL
//
ID 002L_FRG3G Reviewed; 320 AA.
AC Q6GZX3;
DT 28-JUN-2011, integrated into UniProtKB/Swiss-Prot.
DT 19-JUL-2004, sequence version 1.
DT 01-APR-2015, entry version 30.
DE RecName: Full=Uncharacterized protein 002L;
GN ORFNames=FV3-002L;
OS Frog virus 3 (isolate Goorha) (FV-3).
OC Viruses; dsDNA viruses, no RNA stage; Iridoviridae; Ranavirus.
OX NCBI_TaxID=654924;
OH NCBI_TaxID=8295; Ambystoma (mole salamanders).
OH NCBI_TaxID=30343; Hyla versicolor (chameleon treefrog).
OH NCBI_TaxID=8404; Lithobates pipiens (Northern leopard frog) (Rana pipiens).
OH NCBI_TaxID=8316; Notophthalmus viridescens (Eastern newt) (Triturus viridescens).
OH NCBI_TaxID=45438; Rana sylvatica (Wood frog).
RN [1]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RX PubMed=15165820; DOI=10.1016/j.virol.2004.02.019;
RA Tan W.G., Barkman T.J., Gregory Chinchar V., Essani K.;
RT "Comparative genomic analyses of frog virus 3, type species of the
RT genus Ranavirus (family Iridoviridae).";
RL Virology 323:70-84(2004).
CC -!- SUBCELLULAR LOCATION: Host membrane {ECO:0000305}; Single-pass
CC membrane protein {ECO:0000305}.
CC -----------------------------------------------------------------------
CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
CC Distributed under the Creative Commons Attribution-NoDerivs License
CC -----------------------------------------------------------------------
DR EMBL; AY548484; AAT09661.1; -; Genomic_DNA.
DR RefSeq; YP_031580.1; NC_005946.1.
DR ProteinModelPortal; Q6GZX3; -.
DR GeneID; 2947774; -.
DR KEGG; vg:2947774; -.
DR Proteomes; UP000008770; Genome.
DR GO; GO:0033644; C:host cell membrane; IEA:UniProtKB-SubCell.
DR GO; GO:0016021; C:integral component of membrane; IEA:UniProtKB-KW.
DR InterPro; IPR004251; Vaccinia_virus_A16.
DR Pfam; PF03003; DUF230; 1.
PE 4: Predicted;
KW Complete proteome; Host membrane; Membrane; Reference proteome;
KW Transmembrane; Transmembrane helix.
FT CHAIN 1 320 Uncharacterized protein 002L.
FT /FTId=PRO_0000410509.
FT TRANSMEM 301 318 Helical. {ECO:0000255}.
FT COMPBIAS 263 295 Pro-rich.
SQ SEQUENCE 320 AA; 34642 MW; 9E110808B6E328E0 CRC64;
MSIIGATRLQ NDKSDTYSAG PCYAGGCSAF TPRGTCGKDW DLGEQTCASG FCTSQPLCAR
IKKTQVCGLR YSSKGKDPLV SAEWDSRGAP YVRCTYDADL IDTQAQVDQF VSMFGESPSL
AERYCMRGVK NTAGELVSRV SSDADPAGGW CRKWYSAHRG PDQDAALGSF CIKNPGAADC
KCINRASDPV YQKVKTLHAY PDQCWYVPCA ADVGELKMGT QRDTPTNCPT QVCQIVFNML
DDGSVTMDDV KNTINCDFSK YVPPPPPPKP TPPTPPTPPT PPTPPTPPTP PTPRPVHNRK
VMFFVAGAVL VAILISTVRW
//
ID 002R_IIV3 Reviewed; 458 AA.
AC Q197F8;
DT 16-JUN-2009, integrated into UniProtKB/Swiss-Prot.
DT 11-JUL-2006, sequence version 1.
DT 01-APR-2015, entry version 17.
DE RecName: Full=Uncharacterized protein 002R;
GN ORFNames=IIV3-002R;
OS Invertebrate iridescent virus 3 (IIV-3) (Mosquito iridescent virus).
OC Viruses; dsDNA viruses, no RNA stage; Iridoviridae; Chloriridovirus.
OX NCBI_TaxID=345201;
OH NCBI_TaxID=7163; Aedes vexans (Inland floodwater mosquito) (Culex vexans).
OH NCBI_TaxID=42431; Culex territans.
OH NCBI_TaxID=332058; Culiseta annulata.
OH NCBI_TaxID=310513; Ochlerotatus sollicitans (eastern saltmarsh mosquito).
OH NCBI_TaxID=329105; Ochlerotatus taeniorhynchus (Black salt marsh mosquito) (Aedes taeniorhynchus).
OH NCBI_TaxID=7183; Psorophora ferox.
RN [1]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RX PubMed=16912294; DOI=10.1128/JVI.00464-06;
RA Delhon G., Tulman E.R., Afonso C.L., Lu Z., Becnel J.J., Moser B.A.,
RA Kutish G.F., Rock D.L.;
RT "Genome of invertebrate iridescent virus type 3 (mosquito iridescent
RT virus).";
RL J. Virol. 80:8439-8449(2006).
CC -----------------------------------------------------------------------
CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
CC Distributed under the Creative Commons Attribution-NoDerivs License
CC -----------------------------------------------------------------------
DR EMBL; DQ643392; ABF82032.1; -; Genomic_DNA.
DR RefSeq; YP_654574.1; NC_008187.1.
DR ProteinModelPortal; Q197F8; -.
DR GeneID; 4156251; -.
DR KEGG; vg:4156251; -.
DR Proteomes; UP000001358; Genome.
PE 4: Predicted;
KW Complete proteome; Reference proteome.
FT CHAIN 1 458 Uncharacterized protein 002R.
FT /FTId=PRO_0000377938.
SQ SEQUENCE 458 AA; 53921 MW; E46E5C85D7ACA139 CRC64;
MASNTVSAQG GSNRPVRDFS NIQDVAQFLL FDPIWNEQPG SIVPWKMNRE QALAERYPEL
QTSEPSEDYS GPVESLELLP LEIKLDIMQY LSWEQISWCK HPWLWTRWYK DNVVRVSAIT
FEDFQREYAF PEKIQEIHFT DTRAEEIKAI LETTPNVTRL VIRRIDDMNY NTHGDLGLDD
LEFLTHLMVE DACGFTDFWA PSLTHLTIKN LDMHPRWFGP VMDGIKSMQS TLKYLYIFET
YGVNKPFVQW CTDNIETFYC TNSYRYENVP RPIYVWVLFQ EDEWHGYRVE DNKFHRRYMY
STILHKRDTD WVENNPLKTP AQVEMYKFLL RISQLNRDGT GYESDSDPEN EHFDDESFSS
GEEDSSDEDD PTWAPDSDDS DWETETEEEP SVAARILEKG KLTITNLMKS LGFKPKPKKI
QSIDRYFCSL DSNYNSEDED FEYDSDSEDD DSDSEDDC
//
ID 003L_IIV3 Reviewed; 156 AA.
AC Q197F7;
DT 16-JUN-2009, integrated into UniProtKB/Swiss-Prot.
DT 11-JUL-2006, sequence version 1.
DT 01-APR-2015, entry version 17.
DE RecName: Full=Uncharacterized protein 003L;
GN ORFNames=IIV3-003L;
OS Invertebrate iridescent virus 3 (IIV-3) (Mosquito iridescent virus).
OC Viruses; dsDNA viruses, no RNA stage; Iridoviridae; Chloriridovirus.
OX NCBI_TaxID=345201;
OH NCBI_TaxID=7163; Aedes vexans (Inland floodwater mosquito) (Culex vexans).
OH NCBI_TaxID=42431; Culex territans.
OH NCBI_TaxID=332058; Culiseta annulata.
OH NCBI_TaxID=310513; Ochlerotatus sollicitans (eastern saltmarsh mosquito).
OH NCBI_TaxID=329105; Ochlerotatus taeniorhynchus (Black salt marsh mosquito) (Aedes taeniorhynchus).
OH NCBI_TaxID=7183; Psorophora ferox.
RN [1]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RX PubMed=16912294; DOI=10.1128/JVI.00464-06;
RA Delhon G., Tulman E.R., Afonso C.L., Lu Z., Becnel J.J., Moser B.A.,
RA Kutish G.F., Rock D.L.;
RT "Genome of invertebrate iridescent virus type 3 (mosquito iridescent
RT virus).";
RL J. Virol. 80:8439-8449(2006).
CC -----------------------------------------------------------------------
CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
CC Distributed under the Creative Commons Attribution-NoDerivs License
CC -----------------------------------------------------------------------
DR EMBL; DQ643392; ABF82033.1; -; Genomic_DNA.
DR RefSeq; YP_654575.1; NC_008187.1.
DR ProteinModelPortal; Q197F7; -.
DR GeneID; 4156252; -.
DR KEGG; vg:4156252; -.
DR Proteomes; UP000001358; Genome.
PE 4: Predicted;
KW Complete proteome; Reference proteome.
FT CHAIN 1 156 Uncharacterized protein 003L.
FT /FTId=PRO_0000377939.
SQ SEQUENCE 156 AA; 17043 MW; D48A43940FF8C815 CRC64;
MYQAINPCPQ SWYGSPQLER EIVCKMSGAP HYPNYYPVHP NALGGAWFDT SLNARSLTTT
PSLTTCTPPS LAACTPPTSL GMVDSPPHIN PPRRIGTLCF DFGSAKSPQR CECVASDRPS
TTSNTAPDTY RLLITNSKTR KNNYGTCRLE PLTYGI
//