R - Subset dataframe by the largest number of daily records
I am working with a big dataset; an illustration is shown below. For the bulk of the individual files I have to process, there should be more than one day's worth of data.
date <- c("05/12/2012 05:00:00", "05/12/2012 06:00:00", "05/12/2012 07:00:00",
          "05/12/2012 08:00:00", "06/12/2012 07:00:00", "06/12/2012 08:00:00",
          "07/12/2012 05:00:00", "07/12/2012 06:00:00", "07/12/2012 07:00:00",
          "07/12/2012 08:00:00")
date <- strptime(date, "%d/%m/%Y %H:%M")
c <- c("0","1","5","4","6","8","0","3","10","6")
c <- as.numeric(c)
df1 <- data.frame(date, c, stringsAsFactors = FALSE)
I wish to be left with data from a single day only. That day should be the one with the largest number of data points. If for some reason two days are tied (both with the maximum number of data points), I wish to select the day with the highest individual value recorded.
In the illustration dataframe given above, I would be left with 7th Dec. It has 4 data points (as has 5th Dec), but it has the highest value recorded out of these two days (i.e. 10).
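To make the selection rule concrete, here is a minimal sketch that tabulates the number of records and the maximum value per day for the example data. The `per_day` name and the use of `tz = "UTC"` are assumptions added here (the original code relies on the local time zone); fixing the zone simply keeps the calendar day from shifting.

```r
# Rebuild the example data with an explicit time zone (an assumption made
# for this sketch, so that each timestamp maps to a stable calendar day).
df1 <- data.frame(
  date = as.POSIXct(
    c("05/12/2012 05:00:00", "05/12/2012 06:00:00", "05/12/2012 07:00:00",
      "05/12/2012 08:00:00", "06/12/2012 07:00:00", "06/12/2012 08:00:00",
      "07/12/2012 05:00:00", "07/12/2012 06:00:00", "07/12/2012 07:00:00",
      "07/12/2012 08:00:00"),
    format = "%d/%m/%Y %H:%M:%S", tz = "UTC"),
  c = c(0, 1, 5, 4, 6, 8, 0, 3, 10, 6)
)

# Calendar day of each record
day <- as.Date(df1$date, tz = "UTC")

# One row per day: number of records and highest value recorded.
# tapply() returns its groups in sorted date order, matching sort(unique(day)).
per_day <- data.frame(
  day   = sort(unique(day)),
  n     = as.vector(tapply(df1$c, day, length)),
  max_c = as.vector(tapply(df1$c, day, max))
)
per_day
```

The 5th and 7th both have 4 records, and the 7th wins the tie because its maximum (10) is larger than the 5th's (5).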
Here's a solution with tapply.
# Count rows per day and find the maximum c value
res <- with(df1, tapply(c, as.Date(date), function(x) c(length(x), max(x))))

# Order by these two values in decreasing order and find the associated day
# (at the top position):
maxdate <- names(res)[order(sapply(res, "[", 1),
                            sapply(res, "[", 2), decreasing = TRUE)[1]]

# Subset the data frame:
subset(df1, as.character(as.Date(date)) %in% maxdate)

                  date  c
7  2012-12-07 05:00:00  0
8  2012-12-07 06:00:00  3
9  2012-12-07 07:00:00 10
10 2012-12-07 08:00:00  6
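If the same selection has to be applied to many files, it may help to wrap the logic in a function. The following is only a sketch of one possible variant, not the method from the answer above: it uses `ave()` instead of `tapply()`, and the function name `keep_busiest_day`, its arguments, and the `tz = "UTC"` default are all hypothetical choices made for illustration.

```r
# Hypothetical helper: keep only the rows of the day with the most records,
# breaking ties by the highest recorded value. If two days tie on both
# criteria, the day that appears first in the data is kept (an assumption).
keep_busiest_day <- function(df, time_col = "date", value_col = "c", tz = "UTC") {
  day <- as.Date(df[[time_col]], tz = tz)
  n   <- ave(df[[value_col]], day, FUN = length)  # record count of each row's day
  top <- ave(df[[value_col]], day, FUN = max)     # maximum value of each row's day
  best <- order(-n, -top)[1]                      # row belonging to the winning day
  df[day == day[best], , drop = FALSE]
}

# Example data, rebuilt with an explicit time zone (assumed for this sketch)
df1 <- data.frame(
  date = as.POSIXct(
    c("05/12/2012 05:00:00", "05/12/2012 06:00:00", "05/12/2012 07:00:00",
      "05/12/2012 08:00:00", "06/12/2012 07:00:00", "06/12/2012 08:00:00",
      "07/12/2012 05:00:00", "07/12/2012 06:00:00", "07/12/2012 07:00:00",
      "07/12/2012 08:00:00"),
    format = "%d/%m/%Y %H:%M:%S", tz = "UTC"),
  c = c(0, 1, 5, 4, 6, 8, 0, 3, 10, 6)
)

result <- keep_busiest_day(df1)
result
```

Passing the time zone explicitly matters when timestamps sit close to midnight, since the day a record is assigned to depends on the zone used by `as.Date()`.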
r data.frame subset