Hee: R - Subset dataframe by most number of daily records

Friday, 15 April 2011

I am working with a big dataset, an illustration of which is shown below. For the bulk of the individual files that I have to process, there should be more than one day's worth of data.

    date <- c("05/12/2012 05:00:00", "05/12/2012 06:00:00", "05/12/2012 07:00:00",
              "05/12/2012 08:00:00", "06/12/2012 07:00:00", "06/12/2012 08:00:00",
              "07/12/2012 05:00:00", "07/12/2012 06:00:00", "07/12/2012 07:00:00",
              "07/12/2012 08:00:00")
    # parse day/month/year timestamps (format codes are case-sensitive)
    date <- strptime(date, "%d/%m/%Y %H:%M:%S")
    c <- c("0","1","5","4","6","8","0","3","10","6")
    c <- as.numeric(c)
    df1 <- data.frame(date, c, stringsAsFactors = FALSE)

I wish to be left with the data for a single day only. The day chosen should be the one with the greatest number of data points. If for any reason two days are tied (with the maximum number of data points), I wish to select the day with the highest individual value recorded.

In the illustration dataframe given above, I would be left with 7th Dec. It has four data points (as has 5th Dec), but it has the highest value recorded out of these two days (i.e. 10).
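To make the selection rule concrete, the per-day counts and maxima for the example can be tabulated. The following is only a sketch in base R. The names `day`, `n`, `peak` and `summary_by_day` are introduced here for illustration and are not part of the question, and the time zone is fixed to UTC as an assumption so that the calendar day does not depend on the local setting.

```r
# Rebuild the example data (time zone fixed to UTC as an assumption).
date <- c("05/12/2012 05:00:00", "05/12/2012 06:00:00", "05/12/2012 07:00:00",
          "05/12/2012 08:00:00", "06/12/2012 07:00:00", "06/12/2012 08:00:00",
          "07/12/2012 05:00:00", "07/12/2012 06:00:00", "07/12/2012 07:00:00",
          "07/12/2012 08:00:00")
date <- as.POSIXct(date, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
c <- c(0, 1, 5, 4, 6, 8, 0, 3, 10, 6)
df1 <- data.frame(date, c, stringsAsFactors = FALSE)

# Calendar day of each record, as a character label such as "2012-12-05".
day <- format(df1$date, "%Y-%m-%d")

# Number of records per day and highest value per day.
n    <- tapply(df1$c, day, length)
peak <- tapply(df1$c, day, max)

# One row per day: the two quantities the selection rule compares.
summary_by_day <- data.frame(day = names(n),
                             n = as.vector(n),
                             peak = as.vector(peak))
print(summary_by_day)
```

The 5th and the 7th both have four records, so the tie is broken by `peak`, which is 5 for the 5th and 10 for the 7th.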

Here's a solution with tapply.

    # count rows per day and find the maximum c value
    res <- with(df1, tapply(c, as.Date(date), function(x) c(length(x), max(x))))

    # order these two values in decreasing order and find the associated day
    # (at the top position):
    maxdate <- names(res)[order(sapply(res, "[", 1),
                                sapply(res, "[", 2), decreasing = TRUE)[1]]

    # subset the data frame:
    subset(df1, as.character(as.Date(date)) %in% maxdate)

                      date  c
    7  2012-12-07 05:00:00  0
    8  2012-12-07 06:00:00  3
    9  2012-12-07 07:00:00 10
    10 2012-12-07 08:00:00  6
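Since the question mentions many individual files, the same logic could be wrapped in a small function and applied to each data frame in turn. This is a sketch, not the answerer's code: `keep_busiest_day`, `time_col` and `value_col` are hypothetical names, the day is taken with `format()` rather than `as.Date()`, and the timestamps are assumed to be POSIXct with a fixed time zone and no missing values.

```r
# Hypothetical helper: keep only the rows from the day with the most records,
# breaking ties by the highest recorded value.
keep_busiest_day <- function(df, time_col = "date", value_col = "c") {
  # Calendar day of each row, in the time zone stored with the timestamps.
  day <- format(df[[time_col]], "%Y-%m-%d")

  # Records per day and highest value per day.
  n    <- tapply(df[[value_col]], day, length)
  peak <- tapply(df[[value_col]], day, max)

  # Sort days by count, then by peak, both descending, and take the first.
  best <- names(n)[order(as.vector(n), as.vector(peak), decreasing = TRUE)[1]]

  df[day == best, , drop = FALSE]
}

# Usage on the example data (time zone fixed to UTC as an assumption).
date <- c("05/12/2012 05:00:00", "05/12/2012 06:00:00", "05/12/2012 07:00:00",
          "05/12/2012 08:00:00", "06/12/2012 07:00:00", "06/12/2012 08:00:00",
          "07/12/2012 05:00:00", "07/12/2012 06:00:00", "07/12/2012 07:00:00",
          "07/12/2012 08:00:00")
df1 <- data.frame(
  date = as.POSIXct(date, format = "%d/%m/%Y %H:%M:%S", tz = "UTC"),
  c = c(0, 1, 5, 4, 6, 8, 0, 3, 10, 6)
)
result <- keep_busiest_day(df1)
print(result)
```

With a list of data frames read from the individual files, the helper could be applied with `lapply()`. If two days tie on both the count and the peak, this sketch keeps whichever `order()` places first.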
