
Perl. Published: 2022-04-07. Source: 大佬教程 code.js-code.com

Perl: Removing duplicates from a large amount of data

I am using Perl to generate a list of unique exons (the units that make up genes).

I have generated a file in this format (with hundreds of thousands of lines):

chr1 1000 2000 gene1

chr1 3000 4000 gene2

chr1 5000 6000 gene3

chr1 1000 2000 gene4

Column 1 is the chromosome, column 2 is the exon's start coordinate, column 3 is the exon's end coordinate, and column 4 is the gene name.

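Because genes are often built from different arrangements of exons, the same exon can appear in several genes (see the first and fourth lines). I want to remove these "duplicates", that is, delete either the gene1 line or the gene4 line (it does not matter which one is removed).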

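I have been banging my head against the wall for hours trying to do what (I think) is a simple task. Can anyone point me in the right direction? I know people often use hashes to remove duplicate elements, but these lines are not exact duplicates (because the gene names differ). It is important that I do not lose the gene name; otherwise this would be simpler.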

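Here is the completely non-functional loop I tried. The @exons array stores each line as a scalar, hence the subroutine. Don't laugh. I know it doesn't work, but at least you can see (I hope) what I am trying to do: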

for (my $i = 0; $i < scalar @exons; $i++) {
    my @temp_line = line_splitter($exons[$i]);                      # runs subroutine turning scalar into array
    for (my $j = 0; $j < scalar @exons_dup; $j++) {
        my @inner_temp_line = line_splitter($exons_dup[$j]);        # runs subroutine turning scalar into array
        unless (($temp_line[1] == $inner_temp_line[1]) &&           # this check ensures that the block
                ($temp_line[3] eq $inner_temp_line[3])) {           # below skips the identical lines
            if (($temp_line[1] == $inner_temp_line[1]) &&           # if the coordinates are the same
                ($temp_line[2] == $inner_temp_line[2])) {           # between the comparisons
                splice(@exons, $i, 1);                              # delete the first one
            }
        }
    }
}

Solution

my @exons = (
    'chr1 1000 2000 gene1',
    'chr1 3000 4000 gene2',
    'chr1 5000 6000 gene3',
    'chr1 1000 2000 gene4',
);

# Key each exon by its coordinates; a later line with the same
# coordinates overwrites the gene name stored earlier.
my %unique_exons = map {
    my ($chro, $scoor, $ecoor, $gene) = split /\s+/, $_;
    "$chro $scoor $ecoor" => $gene
} @exons;

print "$_ $unique_exons{$_} \n" for keys %unique_exons;

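This gives you one entry per exon and keeps the last gene name seen for each set of coordinates. Perl does not guarantee hash key order, so the lines may print in a different order from run to run. It produces, for example: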

chr1 1000 2000 gene4 
chr1 5000 6000 gene3 
chr1 3000 4000 gene2
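
Since the question says it does not matter which gene name survives, a common variant is to keep the first line seen for each exon and preserve the input order, using a "seen" hash. The following is only a minimal sketch under that assumption; the helper name unique_exons is hypothetical and not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): returns the input
# lines with duplicate exons removed, keeping the first line seen for each
# chromosome/start/end triple and preserving input order.
sub unique_exons {
    my @lines = @_;
    my %seen;      # key: "chr start end", value: times seen so far
    my @unique;
    for my $line (@lines) {
        my ($chr, $start, $end, $gene) = split ' ', $line;
        next unless defined $gene;    # skip blank or malformed lines
        push @unique, "$chr $start $end $gene"
            unless $seen{"$chr $start $end"}++;
    }
    return @unique;
}

my @exons = (
    'chr1 1000 2000 gene1',
    'chr1 3000 4000 gene2',
    'chr1 5000 6000 gene3',
    'chr1 1000 2000 gene4',
);

print "$_\n" for unique_exons(@exons);
```

With the sample data this prints the gene1, gene2 and gene3 lines in their original order. For a file with hundreds of thousands of lines, the same check could be applied inside a while loop reading from a filehandle, so that the input lines do not all have to be held in an array first.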

