小马的生信笔记

多倍化分析软件的的另一个选择“WGD V2”

1. 前言

提起多倍化(WGD)分析软件,大家第一时间想到的都是WGDI,但是其实还有另外一款软件也能做多倍化,相比于WGDI,WGD这个软件对于Ks的峰值的拟合更加严格,并且可以得出多倍化的时间,因此今天就只用WGD V2来计算多倍化的时间。补充一下,WGD软件虽然可以进行多倍化分析,但是很麻烦,要一次运行每个模块。Ksrates这个软件集成了WGD软件做所有多倍化分析的流程,直接就能完成分析。所以本教程的多倍化分析使用Ksrates,多倍化时间评估还是使用WGD。

2. 软件下载

				
					mamba create -n wgd python==3.6.5
mamba activate wgd
pip install numpy==1.19.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install wgd==2.0.38 -i https://pypi.tuna.tsinghua.edu.cn/simple
mamba install bioconda::mcl
mamba install bioconda::mafft
mamba install bioconda::fasttree
				
			

3. 运行

3.1 Ksrates

Ksrates和一个基于nextflow的工作流,请自行下载nextflow。运行前要准备2个配置文件,config_elaeis.txt和config_expert.txt。

				
					head config_elaeis.txt

[SPECIES]
focal_species = Prped
# 指定一个焦点物种,焦点物种会做种内的Ks分布和其他物种于焦点物种的Ks分布。

newick_tree = (Zijuj, (Rochi, (Madom, (Pravi, (Prten, ((Prper, Prdul), (Prmon, (Prped, Prtri))))))));

# 指定一个系统发育树,要包含所有多倍化分析的物种

latin_names =       Madom     : Malus_domestica,
                    Pravi     : Prunus_avium,
                    Prdul     : Prunus_dulcis,
                    Prped     : Prunus_pedunculata,
                    Prper     : Prunus_persica,
                    Rochi     : Rosa_chinensis,
                    Zijuj     : Ziziphus_jujuba,
                    Prten     : Prunus_tenella,
                    Prtri     : Prunus_trilpba,
                    Prmon     : Prunus_mongolica

# 物种名字的简写和拉丁名全称

fasta_filenames =   Madom     : sequence_Prunus/Malus_domestica_Chr_cds.fa,
                    Pravi     : sequence_Prunus/Prunus_avium_Chr_cds.fa,
                    Prdul     : sequence_Prunus/Prunus_dulcis_Chr_cds.fa,
                    Prped     : sequence_Prunus/Prunus_pedunculata_Chr.cds.fasta,
                    Prper     : sequence_Prunus/Prunus_persica_Chr_cds.fa,
                    Rochi     : sequence_Prunus/Rosa_chinensis_Chr_cds.fa,
                    Zijuj     : sequence_Prunus/Ziziphus_jujuba_Chr_cds.fa,
                    Prten     : sequence_Prunus/Prunus_tenella_chr.cds.fasta,
                    Prtri     : sequence_Prunus/Prunus_trilpba_trimmed.cds,
                    Prmon     : sequence_Prunus/Prunus_mongolica_trimmed.cds

# 指定所有物种的cds文件路径

gff_filename = /home/majunpeng/sda2/Biantao_Sequence_data/Anno/Final_result/Prunus_pedunculata_Chr.gff
# 指定焦点物种的注释信息gff文件路径,是一个可选项,可以不提供。

[ANALYSIS SETTING]
paranome = yes
collinearity = no
reciprocal_retention = no
gff_feature = mrna
gff_attribute = id
max_number_outgroups = 4
consensus_mode_for_multiple_outgroups = mean among outgroups

[PARAMETERS]
x_axis_max_limit_paralogs_plot = 5
bin_width_paralogs = 0.1
y_axis_max_limit_paralogs_plot = None
num_bootstrap_iterations = 200
divergence_colors =  Red, MediumBlue, Goldenrod, Crimson, ForestGreen, Gray, SaddleBrown, Black
x_axis_max_limit_orthologs_plots = 5
bin_width_orthologs = 0.1
max_ks_paralogs = 5
max_ks_orthologs = 10

				
			
				
					head config_expert.txt

[EXPERT PARAMETERS]

logging_level = info
preserve_ks_tmp_files = no
kde_bandwidth_modifier = 0.4
plot_adjustment_arrows = yes
max_mixture_model_iterations = 600
num_mixture_model_initializations = 10
extra_paralogs_analyses_methods = yes
max_mixture_model_components = 5
max_mixture_model_ks = 5
max_gene_family_size = 200
top_reciprocally_retained_gfs = 2000
use_original_orthomcl_version = no
min_ks_anchor_pairs = 0
				
			

运行Ksrates进行多倍化分析,并且绘图Ks分布图。

				
					nextflow run VIB-PSB/ksrates -profile docker --config config_elaeis.txt --expert config_expert.txt
				
			

3.2 估算多倍化时间

分析原理其实很简单,就是把WGD产生的基因对分成两组,然后通过Mcmctree进行分化时间的估算,进而得到WGD的时间。所以我们要准备一个系统发育树和化石时间,mcmctree的时候要用到。

				
					# 筛选同源基因
wgd dmd Prunus_pedunculata_Chr.cds.fasta -o Prunus_pedunculata_dmd
# 计算KS
wgd ksd Prunus_pedunculata_dmd/Prunus_pedunculata_Chr.cds.fasta.tsv ../../../Anno/Final_result/Prunus_pedunculata_Chr.cds.fasta -o Prunus_pedunculata_ksd -n 50
# 共线性分析
syn -f mRNA -a ID Prunus_pedunculata_dmd/Prunus_pedunculata_Chr.cds.fasta.tsv Prunus_pedunculata_Chr.gff -ks Prunus_pedunculata_ksd/Prunus_pedunculata_Chr.cds.fasta.tsv.ks.tsv
# 拟合峰值wgd peak --heuristic Prunus_pedunculata_ksd/Prunus_pedunculata_Chr.cds.fasta.tsv.ks.tsv -ap wgd_syn/iadhore-out/anchorpoints.txt -sm wgd_syn/iadhore-out/segments.txt -le wgd_syn/iadhore-out/list_elements.txt -mp wgd_syn/iadhore-out/multiplicon_pairs.txt -o wgd_peak
# 筛选同源基因,这一步要加上所有系统发育属上物种的cds序列,同时筛选直系同源基因和旁系同源基
因
wgd dmd -f Prunus_pedunculata -ap wgd_peak/AnchorKs_FindPeak/Prunus_pedunculata.tsv.ks.tsv_95%CI_AP_for_dating_weighted_format.tsv -o wgd_dmd_ortho Amborella_trichopoda Arabidopsis_thaliana Aristolochia_fimbriata Chloranthus_sessilifolius Dryas_drummondii Helianthus_annuus Malus_domestica Oryza_sativa Papaver_somniferum Prunus_avium Prunus_dulcis Prunus_pedunculata Prunus_persica Rosa_chinensis Spiraea_crenata Vitis_vinifera Ziziphus_jujuba
# 估算WGD分化时间
wgd focus --protcocdating --aamodel lg wgd_dmd_ortho/merge_focus_ap.tsv -sp Reroot.tre -o wgd_dating -d mcmctree -ds 'burnin = 2000' -ds 'sampfreq = 1000' -ds 'nsample = 20000'Spiraea_crenata Amborella_trichopoda Arabidopsis_thaliana Aristolochia_fimbriata Chloranthus_sessilifolius Dryas_drummondii Helianthus_annuus Malus_domestica Oryza_sativa Papaver_somniferum Prunus_avium Prunus_dulcis Prunus_persica Rosa_chinensis Spiraea_crenata Vitis_vinifera Ziziphus_jujuba Prunus_pedunculata
				
			

发表评论