Loop over multiple Excel files to create different data frames, perform a group-by, and save as a single df in R
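I'm new to R and I have a question; I hope you can help.
I have multiple Excel files in one folder. They belong to different subsidiaries but share the same structure.
I want to loop over them, load them into R as data frames, perform a group-by, and save everything in a single data frame to export as a single file. Is this possible?
From looking at several answers here, I got this far: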
# Load the data as different dataframes
library(tidyverse)
library(readxl)
f <- list.files(pattern="xlsx")
myfiles = lapply(f,read_excel)
for (i in 1:length(f)) assign(f[i],read_excel(f[i],sheet = "Deutsch",skip=7),data.frame(f[i]))
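I saved them as individual data frames, but I didn't know how to access them all together, so I created a list manually:
list_df = list(filialAA.xlsx,filialAB.xlsx,filianAC.xlsx,filianAD.xlsx,filianAE.xlsx...etc)
Then I created a grouping to perform some calculations: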
for (i in 1:length(list_df))
{
  list_df[i] %>%
    group_by(ABC) %>%
    summarise(`Revenue in EUR` = sum(`Revenue in EUR`),`Weight in KG` = sum(`Weight in KG`),`number of Materials` = length(`Materials`),`Avg of deliveries` = mean(`Deliveries`))
}
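If I do this for each data frame individually, it works. But in this loop it does not. Can you help me loop over all the data frames, perform this group-by, and collect everything into one file? Is it possible?
Thank you very much for your attention!
Edit: including a dummy data sample: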
> dput(df1)
structure(list(Materials = c("11575358","75378378","21333333","02469984","05465478","05645648"),Deliveries = c(8,1,12,5,1,1),ABC = c("C","A","C","B","C","C"),`Revenue in EUR` = c(6179,1804802.46,3768.04,9e+05,1597.5,1544.55),`Weight in KG` = c(16.6,4.695625,19,9.14625,2.74041666666667,1.44208333333333)),row.names = c(NA,-6L),class = c("tbl_df","tbl","data.frame"))
> dput(df2)
structure(list(Materials = c("48654798","05465489","04598496","08789453","01589494","06459849","54694985","65498848"),Deliveries = c(24,6,32,3,11,30,45,2),`Revenue in EUR` = c(5509,506978,3978.04,7e+05,1200258,2406975,4059),`Weight in KG` = c(29.6,24,50,60,10)),-8L),"data.frame"))
The original Excel files are in xlsx format, with 5,000 to 15,000 rows, about 20 features (columns), and 7 tabs. There are 22 Excel files to loop over.
Well, since I don't have your files, there may be some errors, but try the following:
# first of all, write your files out as xlsx. I use the xlsx package because I prefer it,
# but you should already have the files
xlsx::write.xlsx2(df1,"df1.xlsx")
xlsx::write.xlsx2(df1,"df2.xlsx") # df1 is written to both files, so the two printouts below are identical
library(tidyverse)
library(readxl)
# here you get all the xlsx files (the pattern is a regular expression)
f <- list.files(pattern="\\.xlsx$")
f
[1] "df1.xlsx" "df2.xlsx"
# an empty list
listed <- list()
# loop that populates the empty list with your files
for (i in f) {
  listed[[i]] <- read_excel(i,sheet = "Sheet1" #,skip = 7
  )
  print(paste0("read the ",i," file")) # here it says what it's doing
}
listed
$df1.xlsx
# A tibble: 6 x 6
  ...1  Materials Deliveries ABC   `Revenue in EUR` `Weight in KG`
  <chr> <chr>          <dbl> <chr>            <dbl>          <dbl>
1 1     11575358           8 C                6179           16.6
2 2     75378378           1 A             1804802.           4.70
3 3     21333333          12 C                3768.          19
4 4     02469984           5 B              900000            9.15
5 5     05465478           1 C                1598.           2.74
6 6     05645648           1 C                1545.           1.44
$df2.xlsx
# A tibble: 6 x 6
  ...1  Materials Deliveries ABC   `Revenue in EUR` `Weight in KG`
  <chr> <chr>          <dbl> <chr>            <dbl>          <dbl>
1 1     11575358           8 C                6179           16.6
2 2     75378378           1 A             1804802.           4.70
3 3     21333333          12 C                3768.          19
4 4     02469984           5 B              900000            9.15
5 5     05465478           1 C                1598.           2.74
6 6     05645648           1 C                1545.           1.44
# now lapply the summary to each element of the list, creating a new list
list_result <- lapply(listed,function(x) x %>%
  group_by(ABC) %>%
  summarise(
    `Revenue in EUR` = sum(`Revenue in EUR`),`Weight in KG` = sum(`Weight in KG`),`number of Materials` = length(`Materials`),`Avg of deliveries` = mean(`Deliveries`)))
# put the result in a data.frame
do.call(rbind,list_result)
# A tibble: 6 x 5
  ABC   `Revenue in EUR` `Weight in KG` `number of Materials` `Avg of deliveries`
* <chr>            <dbl>          <dbl>                 <int>               <dbl>
1 A             1804802.           4.70                     1                 1
2 B              900000            9.15                     1                 5
3 C               13089.          39.8                      4                 5.5
4 A             1804802.           4.70                     1                 1
5 B              900000            9.15                     1                 5
6 C               13089.          39.8                      4                 5.5
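The question also asks for one exported file, and it may help to keep track of which file each summary row came from. The following is only a sketch under stated assumptions: the two tibbles are made-up stand-ins for sheets read with read_excel(), and the names `sheets`, `summarise_abc` and `all_branches.csv` are hypothetical. It relies on dplyr::bind_rows() with its `.id` argument, which turns the list names into a column.

```r
library(dplyr)

# Hypothetical stand-ins for two sheets read with read_excel();
# the values are invented for illustration only
sheets <- list(
  "filialAA.xlsx" = tibble(
    Materials = c("001", "002", "003"),
    Deliveries = c(8, 1, 12),
    ABC = c("C", "A", "C"),
    `Revenue in EUR` = c(100, 200, 300),
    `Weight in KG` = c(1, 2, 3)
  ),
  "filialAB.xlsx" = tibble(
    Materials = c("004", "005", "006"),
    Deliveries = c(2, 4, 6),
    ABC = c("B", "B", "A"),
    `Revenue in EUR` = c(10, 20, 30),
    `Weight in KG` = c(4, 5, 6)
  )
)

# The summary from the question, wrapped in a helper (hypothetical name)
summarise_abc <- function(x) {
  x %>%
    group_by(ABC) %>%
    summarise(
      `Revenue in EUR` = sum(`Revenue in EUR`),
      `Weight in KG` = sum(`Weight in KG`),
      `number of Materials` = n(),
      `Avg of deliveries` = mean(Deliveries),
      .groups = "drop"
    )
}

# One data frame for all files; the list names end up in the `file` column
result <- bind_rows(lapply(sheets, summarise_abc), .id = "file")

# Export to a single file (written to a temporary directory in this sketch)
out_file <- file.path(tempdir(), "all_branches.csv")
write.csv(result, out_file, row.names = FALSE)
```

With the real data, `sheets` would be replaced by the list of data frames read from the Excel files, provided that list is named by file.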

You can also use purrr's map functions here:
map_dfr(list_df, ~ .x %>% group_by(ABC) %>% summarise(`Revenue in EUR` = sum(`Revenue in EUR`),`Avg of deliveries` = mean(`Deliveries`)))
It will rbind the results at the same time.
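For comparison, the for loop from the question can probably be made to work as well. The sketch below assumes the loop failed for two reasons: `list_df[i]` returns a list of length one rather than the data frame itself, and the result of each iteration was never stored. The small tibbles and the names `results` and `combined` are invented for illustration.

```r
library(dplyr)

# Made-up stand-ins for the data frames collected in list_df
list_df <- list(
  a = tibble(ABC = c("A", "B", "A"), Deliveries = c(1, 2, 3)),
  b = tibble(ABC = c("B", "B"), Deliveries = c(4, 6))
)

# Single brackets keep the list wrapper, double brackets extract the tibble
class(list_df[1])
class(list_df[[1]])

# Store each summary instead of discarding it at the end of the iteration
results <- vector("list", length(list_df))
for (i in seq_along(list_df)) {
  results[[i]] <- list_df[[i]] %>%
    group_by(ABC) %>%
    summarise(`Avg of deliveries` = mean(Deliveries), .groups = "drop")
}

# Stack the per-file summaries into one data frame
combined <- bind_rows(results)
```

map_dfr() does the storing and the row-binding in a single call, which is why it is the shorter option.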
Even after storing the files in myfiles, you can use the following syntax:
library(janitor)
map_dfr(myfiles,~(.[-c(1:5),] %>% row_to_names(1) %>%
  group_by(ABC) %>%
  summarise(`Revenue in EUR` = sum(as.numeric(`Revenue in EUR`)),`Weight in KG` = sum(as.numeric(`Weight in KG`)),`number of Materials` = length(`Materials`),`Avg of deliveries` = mean(as.numeric(`Deliveries`))) %>%
  ungroup()))
Result for the given files:
# A tibble: 6 x 5
  ABC   `Revenue in EUR` `Weight in KG` `number of Materials` `Avg of deliveries`
  <chr>            <dbl>          <dbl>                 <int>               <dbl>
1 A             1804802.           4.70                     1                 1
2 B              900000            9.15                     1                 5
3 C               13089.          39.8                      4                 5.5
4 A             3607233          110                        2                37.5
5 B             1206978           28.1                      2                 4.5
6 C               15144.          66.3                      4                17.2
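The as.numeric() calls in that snippet matter because a sheet read without `skip` has its real header sitting in a data row, so every column comes in as text. The sketch below illustrates this with an invented two-column data frame; `raw` and `clean` are hypothetical names, and only two junk rows are dropped here instead of five.

```r
library(dplyr)
library(janitor)

# Invented example of a sheet read without `skip`: junk rows first,
# then the header row, with everything stored as character
raw <- data.frame(
  V1 = c("Report", NA, "ABC", "A", "B", "A"),
  V2 = c("Branch AA", NA, "Revenue in EUR", "10", "20", "30"),
  stringsAsFactors = FALSE
)

# Drop the junk rows, then promote the first remaining row to column names
clean <- raw[-c(1:2), ] %>% row_to_names(1)

# The columns are still character, so sum() needs as.numeric()
result <- clean %>%
  group_by(ABC) %>%
  summarise(`Revenue in EUR` = sum(as.numeric(`Revenue in EUR`)), .groups = "drop")
```

Reading with `skip` set so that the header row is the first row read, as in the question's own read_excel() call, would let readxl guess numeric column types and make the conversion unnecessary.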

I like writing functions, so I would do it like this (although it is longer, it gives you a more stable setup to modify and debug when needed).
library(tidyverse)
library(readxl)
# Main Function
main_function <- function(import,name){
  main_function.create_path() -> path
  main_function.create_output() -> output
  for(file in list.files(path)){
    # skip everything that is not an xlsx file (the original check looked
    # for 'csv', but the files are read with read_excel)
    if(!str_detect(file,'\\.xlsx$')){
      next
    }
    read_excel(file.path(path,file),sheet = "Deutsch",skip = 7) -> data
    main_function.calculate_values(data) -> data.values
    main_function.append_values(file,data,data.values,output) -> output
  }
  main_function.export(path,output,name)
  if(import){
    # also make the result available as `values` in the global environment
    assign('values',output,envir = .GlobalEnv)
  }
}
# Functions
main_function.export <- function(path,output,name){
  write.csv(output,file = file.path(path,paste0(name,'.csv')),row.names = FALSE)
}
main_function.append_values <- function(file,data,data.values,output){
  # This puts the name of the file, without the .xlsx at the end, in the
  # first column and the calculated data in the other columns, then
  # appends those rows (one per ABC group) to the output
  data.values$file <- str_extract(file,".+(?=\\.xlsx$)")
  bind_rows(output,data.values) -> output
  return(output)
}
main_function.calculate_values <- function(data){
  data %>% group_by(ABC) %>%
    summarize(`Revenue in EUR` = sum(`Revenue in EUR`,na.rm=TRUE)
              # .... add the remaining summaries here
              ) -> data
  return(data)
}
main_function.create_path <- function(){
  '<path to files>' -> path
  return(path)
}
main_function.create_output <- function(){
  # empty output with the expected columns; check.names = FALSE keeps the spaces in the names
  data.frame('file' = character(),'ABC' = character(),'Revenue in EUR' = numeric(),'Weight in KG' = numeric(),'number of Materials' = integer(),'Avg of deliveries' = numeric(),check.names = FALSE) -> output
  return(output)
}
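This creates main_function. When called, it loops over all files listed in the given path, reads each one, processes it and appends the result to output, which is then saved in the same path under the name you gave it.
If you set import to TRUE, it also saves the output as values in the global environment.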