Counting, sorting numbers and filtering in a tibble
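Given the data below, I am trying to find the records for the most-used bike. First I need to find the bike ID with the largest number of records. Then, using that bike ID, I want to filter the data down to that bike's records and show only those.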
library(lubridate)
library(tidyverse)
nycbikes18 <- read_csv("data/2018-citibike-tripdata.csv", locale = locale(tz = "America/New_York"))
nycbikes18
#> # A tibble: 333,687 x 15
#> tripduration starttime stoptime
#> <dbl> <dttm> <dttm>
#> 1 932 2018-01-01 02:06:17 2018-01-01 02:21:50
#> 2 550 2018-01-01 12:06:18 2018-01-01 12:15:28
#> 3 510 2018-01-01 12:06:56 2018-01-01 12:15:27
#> 4 354 2018-01-01 14:53:10 2018-01-01 14:59:05
#> 5 250 2018-01-01 17:34:30 2018-01-01 17:38:40
#> 6 613 2018-01-01 22:05:05 2018-01-01 22:15:19
#> 7 290 2018-01-02 12:13:51 2018-01-02 12:18:42
#> 8 381 2018-01-02 12:50:03 2018-01-02 12:56:24
#> 9 318 2018-01-02 13:55:58 2018-01-02 14:01:16
#> 10 1852 2018-01-02 16:55:29 2018-01-02 17:26:22
#> # … with 333,677 more rows, and 12 more variables:
#> #   start_station_id <dbl>, start_station_name <chr>,
#> #   start_station_latitude <dbl>, start_station_longitude <dbl>,
#> #   end_station_id <dbl>, end_station_name <chr>,
#> #   end_station_latitude <dbl>, end_station_longitude <dbl>,
#> #   bikeid <dbl>, usertype <chr>, birth_year <dbl>, gender <dbl>
My code and output:
top_bike_trips <- nycbikes18 %>% group_by(bikeid) %>% filter(tripduration == max(tripduration))
top_bike_trips
# A tibble: 900 x 15
# Groups:   bikeid [900]
tripduration starttime stoptime
<dbl> <dttm> <dttm>
1 2111 2018-01-11 15:33:24 2018-01-11 16:08:36
2 21262 2018-01-12 13:00:26 2018-01-12 18:54:48
3 1804 2018-01-12 17:10:56 2018-01-12 17:41:01
4 2717 2018-01-30 18:03:31 2018-01-30 18:48:49
5 1892 2018-01-19 18:40:06 2018-01-19 19:11:39
6 563 2018-01-31 09:20:28 2018-01-31 09:29:51
7 50545 2018-01-02 17:58:07 2018-01-03 08:00:32
8 475 2018-01-03 18:03:39 2018-01-03 18:11:34
9 30997 2018-01-19 08:43:44 2018-01-19 17:20:22
10 80854 2018-01-19 18:50:43 2018-01-20 17:18:18
# ... with 890 more rows, and 12 more variables:
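The 900 rows (and the `Groups: bikeid [900]` line) suggest this code returns one row per bike: `group_by()` followed by `filter(tripduration == max(tripduration))` keeps each bike's longest trip instead of counting trips. A minimal sketch on a small hypothetical tibble (`trips` and its values are made up for illustration, not the real Citi Bike data) shows the same effect:

```r
library(dplyr)

# Hypothetical toy data standing in for nycbikes18:
# bike 101 has 3 trips, bike 202 has 2 trips, bike 303 has 1 trip.
trips <- tibble(
  bikeid       = c(101, 202, 101, 303, 202, 101),
  tripduration = c(300, 900, 450, 1200, 200, 600)
)

# Within each bike, keep the row(s) whose tripduration equals that bike's
# maximum, i.e. (ties aside) exactly one row per distinct bikeid.
longest_per_bike <- trips %>%
  group_by(bikeid) %>%
  filter(tripduration == max(tripduration)) %>%
  ungroup()

longest_per_bike
```

The row count equals the number of distinct bikes, not the trip count of any single bike.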
Expected output:
top_bike_trips
#> # A tibble: 825 x 15
#> tripduration starttime stoptime
#> <dbl> <dttm> <dttm>
#> 1 520 2018-01-03 13:06:21 2018-01-03 13:15:01
#> 2 232 2018-01-03 17:01:21 2018-01-03 17:05:14
#> 3 315 2018-01-14 15:08:14 2018-01-14 15:13:30
#> 4 266 2018-01-23 14:57:30 2018-01-23 15:01:57
#> 5 162 2018-01-24 17:01:10 2018-01-24 17:03:53
#> 6 150 2018-01-25 18:26:58 2018-01-25 18:29:29
#> 7 272 2018-01-03 08:49:11 2018-01-03 08:53:43
#> 8 315 2018-01-20 14:06:28 2018-01-20 14:11:44
#> 9 322 2018-01-02 15:43:42 2018-01-02 15:49:04
#> 10 251 2018-01-10 17:48:03 2018-01-10 17:52:14
#> # … with 815 more rows, and 12 more variables:
#> #   start_station_id <dbl>, gender <dbl>
I'm not sure how to get the expected output of 825 rows. Maybe with count()?
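A count()-based route does work. As a minimal sketch (one option, not taken from the original post), dplyr's `add_count()` attaches each bike's trip total to every row, and filtering on the largest total keeps only the most-used bike's trips. The `trips` tibble is hypothetical toy data; on the real data you would start from `nycbikes18` instead, and the 825 figure cannot be checked here.

```r
library(dplyr)

# Hypothetical toy data: bike 101 has the most trips (3).
trips <- tibble(
  bikeid       = c(101, 202, 101, 303, 202, 101),
  tripduration = c(300, 900, 450, 1200, 200, 600)
)

# add_count() appends a column n = number of rows sharing each row's bikeid.
# Keeping rows where n equals its maximum leaves the most-used bike's trips
# (or the trips of every tied bike, if several share the maximum).
top_bike_trips <- trips %>%
  add_count(bikeid) %>%
  filter(n == max(n)) %>%
  select(-n)

top_bike_trips
```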
library(dplyr)
mtcars %>%
filter(cyl == names(which.max(table(cyl))))
# mpg cyl disp hp drat wt qsec vs am gear carb
# Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
# Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
# Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
# Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
# Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
# Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
# Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
# Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
# AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
# Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
# Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
# Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
# Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Based on that, this should be enough:
nycbikes18 %>%
filter(bikeid == names(which.max(table(bikeid))))
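One detail worth knowing about `which.max(table(...))`: `table()` names are the sorted bike IDs, and `which.max()` returns only the first maximum, so if two bikes tie for the most trips, only the one with the smaller ID is kept. A minimal sketch on hypothetical toy data (`tied` is made up for illustration):

```r
library(dplyr)

# Hypothetical toy data: bikes 5 and 7 are tied with two trips each.
tied <- tibble(
  bikeid       = c(5, 7, 5, 7),
  tripduration = c(100, 200, 300, 400)
)

# table(bikeid) gives counts named "5" and "7"; which.max() picks the FIRST
# maximum, so the name is "5". The numeric bikeid is coerced to character
# for the == comparison, so 5 == "5" is TRUE.
first_top <- tied %>%
  filter(bikeid == names(which.max(table(bikeid))))

first_top
```

If ties should all be kept, compare against the maximum count (for example with `add_count()`) instead of a single name.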
Alternatively, since we have to guess about the data: if you instead mean the bike ID with the highest tripduration, then perhaps
nycbikes18 %>%
filter(bikeid == bikeid[which.max(tripduration)])
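This reading selects a different bike: the one that made the single longest trip, regardless of how many trips it has. A minimal sketch on a hypothetical `trips` tibble (toy values, not the real data):

```r
library(dplyr)

# Hypothetical toy data: bike 101 has the most trips, but bike 303
# made the single longest trip (1200).
trips <- tibble(
  bikeid       = c(101, 202, 101, 303, 202, 101),
  tripduration = c(300, 900, 450, 1200, 200, 600)
)

# which.max(tripduration) is the row index of the longest trip;
# bikeid[...] looks up which bike made it, and filter() keeps all of
# that bike's trips.
longest_trip_bike <- trips %>%
  filter(bikeid == bikeid[which.max(tripduration)])

longest_trip_bike
```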
Base R solution:
# Calculate the modal (most frequent) bike id: modal_bike_id => character scalar
modal_bike_id <- tail(names(sort(table(nycbikes18$bikeid))), 1)
# Subset the data set to only the modal bike id's records: data.frame => stdout (console)
nycbikes18[
  with(
    nycbikes18, bikeid == modal_bike_id
  ), ]
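The `tail(names(sort(...)), 1)` step returns a single ID, so tied bikes are dropped. If several bikes could share the top count, a base R variant (a swapped-in technique, not part of the answer above) is to compute a per-row trip count with `ave()` and keep rows at the maximum. Sketch on a hypothetical toy `data.frame` named `trips`:

```r
# Hypothetical toy data.frame standing in for nycbikes18.
trips <- data.frame(
  bikeid       = c(101, 202, 101, 303, 202, 101),
  tripduration = c(300, 900, 450, 1200, 200, 600)
)

# ave(x, g, FUN = length) returns, for every row, the number of rows in
# that row's bikeid group (here: 3, 2, 3, 1, 2, 3).
trips$n <- ave(trips$tripduration, trips$bikeid, FUN = length)

# Keep every row whose bike has the maximum trip count (all tied bikes).
modal_trips <- trips[trips$n == max(trips$n), c("bikeid", "tripduration")]

modal_trips
```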
Another dplyr approach, using count():
library(dplyr)
mtcars %>%
count(cyl) %>%
slice_max(n, n = 1) %>%
select(cyl) %>%
left_join(mtcars,by = 'cyl')
# cyl mpg disp hp drat wt qsec vs am gear carb
#1 8 18.7 360.0 175 3.15 3.440 17.02 0 0 3 2
#2 8 14.3 360.0 245 3.21 3.570 15.84 0 0 3 4
#3 8 16.4 275.8 180 3.07 4.070 17.40 0 0 3 3
#4 8 17.3 275.8 180 3.07 3.730 17.60 0 0 3 3
#5 8 15.2 275.8 180 3.07 3.780 18.00 0 0 3 3
#6 8 10.4 472.0 205 2.93 5.250 17.98 0 0 3 4
#7 8 10.4 460.0 215 3.00 5.424 17.82 0 0 3 4
#8 8 14.7 440.0 230 3.23 5.345 17.42 0 0 3 4
#9 8 15.5 318.0 150 2.76 3.520 16.87 0 0 3 2
#10 8 15.2 304.0 150 3.15 3.435 17.30 0 0 3 2
#11 8 13.3 350.0 245 3.73 3.840 15.41 0 0 3 4
#12 8 19.2 400.0 175 3.08 3.845 17.05 0 0 3 2
#13 8 15.8 351.0 264 4.22 3.170 14.50 0 1 5 4
#14 8 15.0 301.0 335 3.54 3.570 14.60 0 1 5 8
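The same count() + slice_max() idea carries over to the bike data. This sketch swaps `select()` + `left_join()` for `semi_join()`, a filtering join that keeps the original columns in their original order; `trips` is hypothetical toy data, and for the real question you would pipe `nycbikes18` instead.

```r
library(dplyr)

# Hypothetical toy data: bike 101 has the most trips (3).
trips <- tibble(
  bikeid       = c(101, 202, 101, 303, 202, 101),
  tripduration = c(300, 900, 450, 1200, 200, 600)
)

# Step 1: trips per bike, keeping the top row (ties kept by default).
top_ids <- trips %>%
  count(bikeid) %>%
  slice_max(n, n = 1)

# Step 2: keep only the rows of trips whose bikeid appears in top_ids;
# semi_join() does not add top_ids' n column.
top_bike_trips <- trips %>%
  semi_join(top_ids, by = "bikeid")

top_bike_trips
```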