1. 前言
提起多倍化(WGD)分析软件,大家第一时间想到的都是WGDI,但是其实还有另外一款软件也能做多倍化,相比于WGDI,WGD这个软件对于Ks的峰值的拟合更加严格,并且可以得出多倍化的时间,因此今天就只用WGD V2来计算多倍化的时间。补充一下,WGD软件虽然可以进行多倍化分析,但是很麻烦,要一次运行每个模块。Ksrates这个软件集成了WGD软件做所有多倍化分析的流程,直接就能完成分析。所以本教程的多倍化分析使用Ksrates,多倍化时间评估还是使用WGD。
2. 软件下载
mamba create -n wgd python==3.6.5
mamba activate wgd
pip install numpy==1.19.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install wgd==2.0.38 -i https://pypi.tuna.tsinghua.edu.cn/simple
mamba install bioconda::mcl
mamba install bioconda::mafft
mamba install bioconda::fasttree
3. 运行
3.1 Ksrates
Ksrates和一个基于nextflow的工作流,请自行下载nextflow。运行前要准备2个配置文件,config_elaeis.txt和config_expert.txt。
head config_elaeis.txt
[SPECIES]
focal_species = Prped
# 指定一个焦点物种,焦点物种会做种内的Ks分布和其他物种于焦点物种的Ks分布。
newick_tree = (Zijuj, (Rochi, (Madom, (Pravi, (Prten, ((Prper, Prdul), (Prmon, (Prped, Prtri))))))));
# 指定一个系统发育树,要包含所有多倍化分析的物种
latin_names = Madom : Malus_domestica,
Pravi : Prunus_avium,
Prdul : Prunus_dulcis,
Prped : Prunus_pedunculata,
Prper : Prunus_persica,
Rochi : Rosa_chinensis,
Zijuj : Ziziphus_jujuba,
Prten : Prunus_tenella,
Prtri : Prunus_trilpba,
Prmon : Prunus_mongolica
# 物种名字的简写和拉丁名全称
fasta_filenames = Madom : sequence_Prunus/Malus_domestica_Chr_cds.fa,
Pravi : sequence_Prunus/Prunus_avium_Chr_cds.fa,
Prdul : sequence_Prunus/Prunus_dulcis_Chr_cds.fa,
Prped : sequence_Prunus/Prunus_pedunculata_Chr.cds.fasta,
Prper : sequence_Prunus/Prunus_persica_Chr_cds.fa,
Rochi : sequence_Prunus/Rosa_chinensis_Chr_cds.fa,
Zijuj : sequence_Prunus/Ziziphus_jujuba_Chr_cds.fa,
Prten : sequence_Prunus/Prunus_tenella_chr.cds.fasta,
Prtri : sequence_Prunus/Prunus_trilpba_trimmed.cds,
Prmon : sequence_Prunus/Prunus_mongolica_trimmed.cds
# 指定所有物种的cds文件路径
gff_filename = /home/majunpeng/sda2/Biantao_Sequence_data/Anno/Final_result/Prunus_pedunculata_Chr.gff
# 指定焦点物种的注释信息gff文件路径,是一个可选项,可以不提供。
[ANALYSIS SETTING]
paranome = yes
collinearity = no
reciprocal_retention = no
gff_feature = mrna
gff_attribute = id
max_number_outgroups = 4
consensus_mode_for_multiple_outgroups = mean among outgroups
[PARAMETERS]
x_axis_max_limit_paralogs_plot = 5
bin_width_paralogs = 0.1
y_axis_max_limit_paralogs_plot = None
num_bootstrap_iterations = 200
divergence_colors = Red, MediumBlue, Goldenrod, Crimson, ForestGreen, Gray, SaddleBrown, Black
x_axis_max_limit_orthologs_plots = 5
bin_width_orthologs = 0.1
max_ks_paralogs = 5
max_ks_orthologs = 10
head config_expert.txt
[EXPERT PARAMETERS]
logging_level = info
preserve_ks_tmp_files = no
kde_bandwidth_modifier = 0.4
plot_adjustment_arrows = yes
max_mixture_model_iterations = 600
num_mixture_model_initializations = 10
extra_paralogs_analyses_methods = yes
max_mixture_model_components = 5
max_mixture_model_ks = 5
max_gene_family_size = 200
top_reciprocally_retained_gfs = 2000
use_original_orthomcl_version = no
min_ks_anchor_pairs = 0
运行Ksrates进行多倍化分析,并且绘图Ks分布图。
nextflow run VIB-PSB/ksrates -profile docker --config config_elaeis.txt --expert config_expert.txt
3.2 估算多倍化时间
分析原理其实很简单,就是把WGD产生的基因对分成两组,然后通过Mcmctree进行分化时间的估算,进而得到WGD的时间。所以我们要准备一个系统发育树和化石时间,mcmctree的时候要用到。
# 筛选同源基因
wgd dmd Prunus_pedunculata_Chr.cds.fasta -o Prunus_pedunculata_dmd
# 计算KS
wgd ksd Prunus_pedunculata_dmd/Prunus_pedunculata_Chr.cds.fasta.tsv ../../../Anno/Final_result/Prunus_pedunculata_Chr.cds.fasta -o Prunus_pedunculata_ksd -n 50
# 共线性分析
syn -f mRNA -a ID Prunus_pedunculata_dmd/Prunus_pedunculata_Chr.cds.fasta.tsv Prunus_pedunculata_Chr.gff -ks Prunus_pedunculata_ksd/Prunus_pedunculata_Chr.cds.fasta.tsv.ks.tsv
# 拟合峰值wgd peak --heuristic Prunus_pedunculata_ksd/Prunus_pedunculata_Chr.cds.fasta.tsv.ks.tsv -ap wgd_syn/iadhore-out/anchorpoints.txt -sm wgd_syn/iadhore-out/segments.txt -le wgd_syn/iadhore-out/list_elements.txt -mp wgd_syn/iadhore-out/multiplicon_pairs.txt -o wgd_peak
# 筛选同源基因,这一步要加上所有系统发育属上物种的cds序列,同时筛选直系同源基因和旁系同源基
因
wgd dmd -f Prunus_pedunculata -ap wgd_peak/AnchorKs_FindPeak/Prunus_pedunculata.tsv.ks.tsv_95%CI_AP_for_dating_weighted_format.tsv -o wgd_dmd_ortho Amborella_trichopoda Arabidopsis_thaliana Aristolochia_fimbriata Chloranthus_sessilifolius Dryas_drummondii Helianthus_annuus Malus_domestica Oryza_sativa Papaver_somniferum Prunus_avium Prunus_dulcis Prunus_pedunculata Prunus_persica Rosa_chinensis Spiraea_crenata Vitis_vinifera Ziziphus_jujuba
# 估算WGD分化时间
wgd focus --protcocdating --aamodel lg wgd_dmd_ortho/merge_focus_ap.tsv -sp Reroot.tre -o wgd_dating -d mcmctree -ds 'burnin = 2000' -ds 'sampfreq = 1000' -ds 'nsample = 20000'Spiraea_crenata Amborella_trichopoda Arabidopsis_thaliana Aristolochia_fimbriata Chloranthus_sessilifolius Dryas_drummondii Helianthus_annuus Malus_domestica Oryza_sativa Papaver_somniferum Prunus_avium Prunus_dulcis Prunus_persica Rosa_chinensis Spiraea_crenata Vitis_vinifera Ziziphus_jujuba Prunus_pedunculata