Connecting 3 "functions" instead of keeping each one separate
I am trying to chain 3 related parts together, instead of saving the output of each part in a new folder and then working from that folder:
# Remove numbers from the data and get the most frequent string, e.g. if "DGM1" appears most often and is paired with "TGM", "YTS6", "ITT7" but not with "GTX1" --> "DGM1" and "GTX1" will be saved.
import glob
import os
from statistics import multimode

def write_results(file_name, most):
    """Append the most frequent strings to the output file."""
    with open(file_name, 'a') as file:
        file.write(','.join(most) + '\n')

os.chdir(r'xxxxxxxx')
out_dir = 'xxxxxxxxxxxxxxxxxx'
if not os.path.exists(out_dir):
    os.mkdir(out_dir)

for file_name in glob.glob(os.path.join('*.txt*')):
    with open(file_name) as f:
        lines = f.read().splitlines()  # string to list
    lines = [element for item in lines for element in item.split(',')]  # split each line into fields
    lines = [x for x in lines if "." not in x]  # remove numbers
    n = 2  # 2 items per inner list
    lines = [lines[i:i + n] for i in range(0, len(lines), n)]  # groups of 2
    while len(lines) > 0:
        most = multimode(item for sub_list in lines for item in sub_list)
        file_save = os.path.join(out_dir, file_name[4:] + '.txt')
        write_results(file_save, most)
        connected = [pair for pair in lines for a in most if a in pair]
        for i, k in connected:
            lines = [pair for pair in lines if (i not in pair) or (k not in pair)]
# Most files share the same first 6 letters in their names: "GTXGTX1 -0.01.txt", "GTXGTX1 - 0.001.txt", etc. --> merge them into one.
import os
from pathlib import Path

path = Path(xxxx, 'xxxxx')
os.chdir(path)
all_files = os.listdir(path)
txt_files = [i for i in all_files if i[-4:] == '.txt']
prefixes = [i[:6] for i in txt_files]
prefixes = list(set(prefixes))
for group in prefixes:
    group_txt_files = [i for i in txt_files if i[:6] == group]
    new_path = Path(path, 'xxxx', group + '.txt')
    if len(group) > 0:
        with open(new_path, 'w') as outfile:
            for file_name in group_txt_files:
                with open(file_name) as infile:
                    outfile.write(infile.read())
My attempt at this part (but the loops go wrong):
for file_name in glob.glob(os.path.join('*.txt*')):
    with open(file_name) as f:
        lines = f.read().splitlines()  # string to list
    lines = [element for item in lines for element in item.split(',')]  # split each line into fields
    lines = [x for x in lines if "." not in x]  # remove numbers
    n = 2  # 2 items per inner list
    lines = [lines[i:i + n] for i in range(0, n)]  # groups of 2
    while len(lines) > 0:
        most = multimode(item for sub_list in lines for item in sub_list)
        for in_file in path:
            txt_files = [i for i in path if i[-4:] == '.txt']
            prefixes = [i[:6] for i in txt_files]
            prefixes = list(set(prefixes))
            for group in prefixes:
                group_txt_files = [i for i in txt_files if i[:6] == group]
                file_save = os.path.join(in_dir, group + '.txt')
                write_results_t(file_save)
        connected = [bin for bin in lines for a in most if a in bin]
        for i, k in connected:
            lines = [bin for bin in lines if (i not in bin) or (k not in bin)]
# Get the frequencies, e.g. "GTX1" appears 5 times and "WAS" 11 times --> output [('WAS', 11), ('GTX1', 5)]
import glob
import os
from collections import Counter

calc = r'xxxxxxx'
new_dir = r'xxxxxxx'
if not os.path.exists(new_dir):
    os.mkdir(new_dir)
os.chdir(calc)
for files in glob.glob(os.path.join('*.txt*')):
    # print(files)  # prints each file name while iterating over xxxx
    with open(files) as f:
        content = (item for line in f for item in line.replace('\n', '').split(','))
        counts = Counter(content).most_common()  # consume the generator while the file is open
    with open(os.path.join(new_dir, files), "w") as output:
        output.write(str(counts))
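The initial data is spread over 600+ txt files. Each line holds a pair of 2 strings plus a third field, a number, which is removed by the first part --> a possible first line: GTX1,GBA,0.000341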
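As a small illustration of the line format described above, the sketch below splits one line into its two names and drops the trailing value. It assumes every line has exactly three comma-separated fields, and `parse_pair` is a hypothetical helper name, not something from the original code. Taking the value by position avoids relying on the `"." not in x` test, which would keep a number written without a decimal point such as `1e-10`.

```python
def parse_pair(line):
    """Return the two names from a line shaped like 'A,B,value'.

    Assumes exactly three comma-separated fields; a malformed line
    raises ValueError when unpacking.
    """
    first, second, _value = line.strip().split(",")
    return first, second


print(parse_pair("GTX1,GBA,0.000341"))  # ('GTX1', 'GBA')
```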
The code runs on a folder containing files such as "AV8GF00_0.01", "AV8FG00_0.0001", "AV8FG00_0.00001", "AVB0090_0.001", etc. Each file contains pairs of 2 strings and a value: "JAK2,LONP1,3.941044066754e-10", "JAK2,TCF4,8.7493248674563e-39", "LMF1,STAT6,3.685937473992248e-18", etc.
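File names like these can be grouped in memory before anything is written to disk. The sketch below is one possible way to do it, assuming the prefix is everything before the first underscore (the question's own code uses the first 6 characters instead); `group_by_prefix` is a hypothetical name.

```python
from collections import defaultdict


def group_by_prefix(file_names):
    """Map each prefix (text before the first '_') to its file names."""
    groups = defaultdict(list)
    for name in file_names:
        prefix = name.split("_", 1)[0]  # assumption: '_' separates name and number
        groups[prefix].append(name)
    return dict(groups)


names = ["AV8FG00_0.0001", "AV8FG00_0.00001", "AVB0090_0.001"]
print(group_by_prefix(names))
# {'AV8FG00': ['AV8FG00_0.0001', 'AV8FG00_0.00001'], 'AVB0090': ['AVB0090_0.001']}
```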
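After running part 1 on this example, I should get "JAK2" and "LMF1". Part 2 then concatenates the files that share the same name (apart from the number), so the "AV8FG00" files are merged together without the number, along with the strings inside them:
"MCM8 FOXP3 KRT16 GTX1 LIPT1 WAS SOX11, PDGFRB, ABCB4, SOX2 B9D1"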
Part 3 then counts the frequency of each string, which in this example is 1 for each: [('MCM8',1),('FOXP3',1)]...
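The counting step can be tried on a plain list before wiring it to files. This is a minimal sketch using `collections.Counter`; the function name `frequencies` and the sample lists are made up for illustration.

```python
from collections import Counter


def frequencies(strings):
    """Return (string, count) tuples, most frequent first."""
    return Counter(strings).most_common()


print(frequencies(["GTX1", "WAS", "WAS", "GTX1", "WAS"]))
# [('WAS', 3), ('GTX1', 2)]
print(frequencies(["MCM8", "FOXP3"]))
# [('MCM8', 1), ('FOXP3', 1)]
```

Strings with equal counts come back in the order they were first seen, which is why the second call keeps the input order.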
import contextlib
import os
from pathlib import Path
from statistics import multimode
from collections import Counter

# Go to dir and then return back
@contextlib.contextmanager
def cd(path):
    cwd = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(cwd)

def remove_numbers(strings):
    return [s for s in strings if '.' not in s]

def get_file_prefix(file_name):
    return file_name[:6]

def get_file_most_frequent_strings(file_name):
    result = []
    with open(file_name) as fin:
        for line in fin.read().splitlines():
            strings = line.split(',')
            strings = remove_numbers(strings)
            strings = multimode(strings)
            result += strings
    return result

def calculate_frequency(strings):
    return Counter(strings).most_common()

def get_string_frequency_by_file_prefix():
    prefix_strings = {}
    for file_name in os.listdir('.'):
        prefix = get_file_prefix(file_name)
        prefix_strings.setdefault(prefix, [])
        prefix_strings[prefix] += get_file_most_frequent_strings(file_name)
    prefix_string_frequency = {}
    for prefix, strings in prefix_strings.items():
        prefix_string_frequency[prefix] = calculate_frequency(strings)
    return prefix_string_frequency

source_dir = 'directory_to_parse_files_from'
result_dir = 'directory_to_write_results'
with cd(source_dir):
    prefix_string_frequency = get_string_frequency_by_file_prefix()
Path(result_dir).mkdir(exist_ok=True)  # Make dir if it doesn't exist
with cd(result_dir):
    for prefix, strings in prefix_string_frequency.items():
        with open(prefix + '.txt', 'w') as fout:
            fout.write(str(strings))
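The selection loop from part 1 of the question can also be written as a function that works on in-memory pairs, so its result can be passed to the next step without an intermediate folder. This is only a sketch under assumptions: pairs are given as tuples of two names, `select_most_connected` is a hypothetical name, and `statistics.multimode` requires Python 3.8 or later.

```python
from statistics import multimode


def select_most_connected(pairs):
    """Repeatedly take the most frequent name(s) and drop every pair containing them.

    Returns one list of names per round, in the order they were selected.
    """
    remaining = list(pairs)
    rounds = []
    while remaining:
        # Most frequent name(s) among all pairs that are still left.
        most = multimode(item for pair in remaining for item in pair)
        rounds.append(most)
        # Drop every pair that contains at least one of the selected names.
        remaining = [pair for pair in remaining
                     if not any(name in pair for name in most)]
    return rounds


pairs = [("JAK2", "LONP1"), ("JAK2", "TCF4"), ("LMF1", "STAT6")]
print(select_most_connected(pairs))
# [['JAK2'], ['LMF1', 'STAT6']]
```

In the last round both names of the remaining pair occur once, so `multimode` returns both of them; picking only one per round would need an extra tie-breaking rule.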