Thursday, 15 March 2012

R - Inserting missing dates into a data frame




I have a data frame:

           date company   region units
    1  1/1/2012     ibm  america    10
    2  1/1/2012     ibm   europe     4
    3  1/1/2012     ibm  pacific     2
    4  1/1/2012      hp  america    10
    5  1/1/2012      hp   europe     2
    6  1/1/2012 gateway americas     2
    7  1/2/2012     ibm americas    10
    8  1/2/2012      hp   europe     2
    9 1/12/2012 gateway americas    10

    dput(x)
    structure(list(date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 2L),
        .Label = c("1/1/2012", "1/12/2012", "1/2/2012"), class = "factor"),
        company = structure(c(3L, 3L, 3L, 2L, 2L, 1L, 3L, 2L, 4L),
        .Label = c(" gateway", " hp", " ibm", " gateway"), class = "factor"),
        region = structure(c(3L, 5L, 6L, 1L, 2L, 7L, 4L, 2L, 7L),
        .Label = c(" america", " europe", " america", " americas",
        " europe", " pacific", " americas"), class = "factor"),
        units = c(10L, 4L, 2L, 10L, 2L, 2L, 10L, 2L, 10L)),
        .Names = c("date", "company", "region", "units"),
        class = "data.frame", row.names = c(NA, -9L))

I want to create a heatmap, but there are lots of missing dates, which is not good. I need to fill in 0 units for each missing region and date.

I need to have 3 regions for each company and each date, and if a date and region combination is missing, insert it and set it to 0 units.

I can create a vector of dates from 1/1/2012 to 1/12/2012:

    d <- seq(as.Date("1/1/2012", format = "%m/%d/%Y"), as.Date("12/12/2012", format = "%m/%d/%Y"), by = "month")
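As a quick check of what such a call produces, here is a minimal sketch. It assumes the strings are month/day/year with a four-digit year (hence `%Y` rather than `%y`) and uses only base R's `as.Date`, `seq` and `format`; the names `start` and `end` are just illustrative.

```r
# Monthly sequence starting 1 Jan 2012; assumes month/day/four-digit-year strings.
start <- as.Date("1/1/2012", format = "%m/%d/%Y")
end   <- as.Date("12/12/2012", format = "%m/%d/%Y")
d <- seq(start, end, by = "month")

length(d)                    # one entry per month of 2012
format(d[1:3], "%m/%d/%Y")   # formatted back to strings (zero-padded)
```

Note that `format` zero-pads the month and day, so these strings will not match values such as "1/1/2012" as text. Parsing the data frame's date column with the same `as.Date` format and comparing Date values is the safer route.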

For each company, I have to check that the dates in vector d exist for all 3 regions, and if not, insert a row with units 0.

Is there an easy way to do this? Any guidance is appreciated.

You can use expand.grid if you don't know the company, region or date values beforehand.

    > a <- structure(list(date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 3L,
    + 3L, 2L), .Label = c("1/1/2012", "1/12/2012", "1/2/2012"), class = "factor"),
    +     company = structure(c(3L, 3L, 3L, 2L, 2L, 1L, 3L, 2L, 4L), .Label = c(" gateway",
    +     " hp", " ibm", " gateway"), class = "factor"), region = structure(c(3L,
    +     5L, 6L, 1L, 2L, 7L, 4L, 2L, 7L), .Label = c(" america",
    +     " europe", " america", " americas",
    +     " europe", " pacific", " americas"), class = "factor"),
    +     units = c(10L, 4L, 2L, 10L, 2L, 2L, 10L, 2L, 10L)), .Names = c("date",
    + "company", "region", "units"), class = "data.frame", row.names = c(NA,
    + -9L))
    >
    > b <- expand.grid(date=unique(a$date), company=unique(a$company), region=unique(a$region))
    >
    > z <- merge(x=b, y=a, all.x=TRUE)
    >
    > z[is.na(z)] <- 0
    > z
            date company   region units
     1  1/1/2012 gateway  america     0
     2  1/1/2012 gateway   europe     0
     3  1/1/2012 gateway  america     0
     4  1/1/2012 gateway americas     0
     5  1/1/2012 gateway   europe     0
     6  1/1/2012 gateway  pacific     0
     7  1/1/2012 gateway americas     2
     8  1/1/2012      hp  america    10
     9  1/1/2012      hp   europe     2
    10  1/1/2012      hp  america     0
    11  1/1/2012      hp americas     0
    12  1/1/2012      hp   europe     0
    13  1/1/2012      hp  pacific     0
    14  1/1/2012      hp americas     0
    15  1/1/2012     ibm  america     0
    16  1/1/2012     ibm   europe     0
    17  1/1/2012     ibm  america    10
    18  1/1/2012     ibm americas     0
    19  1/1/2012     ibm   europe     4
    20  1/1/2012     ibm  pacific     2
    21  1/1/2012     ibm americas     0
    22  1/1/2012 gateway  america     0
    23  1/1/2012 gateway   europe     0
    24  1/1/2012 gateway  america     0
    25  1/1/2012 gateway americas     0
    26  1/1/2012 gateway   europe     0
    27  1/1/2012 gateway  pacific     0
    28  1/1/2012 gateway americas     0
    29 1/12/2012 gateway  america     0
    30 1/12/2012 gateway   europe     0
    31 1/12/2012 gateway  america     0
    32 1/12/2012 gateway americas     0
    33 1/12/2012 gateway   europe     0
    34 1/12/2012 gateway  pacific     0
    35 1/12/2012 gateway americas     0
    36 1/12/2012      hp  america     0
    37 1/12/2012      hp   europe     0
    38 1/12/2012      hp  america     0
    39 1/12/2012      hp americas     0
    40 1/12/2012      hp   europe     0
    41 1/12/2012      hp  pacific     0
    42 1/12/2012      hp americas     0
    43 1/12/2012     ibm  america     0
    44 1/12/2012     ibm   europe     0
    45 1/12/2012     ibm  america     0
    46 1/12/2012     ibm americas     0
    47 1/12/2012     ibm   europe     0
    48 1/12/2012     ibm  pacific     0
    49 1/12/2012     ibm americas     0
    50 1/12/2012 gateway  america     0
    51 1/12/2012 gateway   europe     0
    52 1/12/2012 gateway  america     0
    53 1/12/2012 gateway americas     0
    54 1/12/2012 gateway   europe     0
    55 1/12/2012 gateway  pacific     0
    56 1/12/2012 gateway americas    10
    57  1/2/2012 gateway  america     0
    58  1/2/2012 gateway   europe     0
    59  1/2/2012 gateway  america     0
    60  1/2/2012 gateway americas     0
    61  1/2/2012 gateway   europe     0
    62  1/2/2012 gateway  pacific     0
    63  1/2/2012 gateway americas     0
    64  1/2/2012      hp  america     0
    65  1/2/2012      hp   europe     2
    66  1/2/2012      hp  america     0
    67  1/2/2012      hp americas     0
    68  1/2/2012      hp   europe     0
    69  1/2/2012      hp  pacific     0
    70  1/2/2012      hp americas     0
    71  1/2/2012     ibm  america     0
    72  1/2/2012     ibm   europe     0
    73  1/2/2012     ibm  america     0
    74  1/2/2012     ibm americas    10
    75  1/2/2012     ibm   europe     0
    76  1/2/2012     ibm  pacific     0
    77  1/2/2012     ibm americas     0
    78  1/2/2012 gateway  america     0
    79  1/2/2012 gateway   europe     0
    80  1/2/2012 gateway  america     0
    81  1/2/2012 gateway americas     0
    82  1/2/2012 gateway   europe     0
    83  1/2/2012 gateway  pacific     0
    84  1/2/2012 gateway americas     0

Note: it seems your data has duplicate values of america, gateway and such. Hence they appear more than once when using expand.grid.
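Because duplicated labels inflate the grid, one option is to normalize the labels before calling expand.grid. The following is only a minimal sketch, not the code from the answer above: it assumes the duplicates differ solely by surrounding whitespace, and it uses a small made-up data frame `a` (not the original data) together with base R's `trimws`.

```r
# Toy data in the shape of the question. The stray leading spaces are an
# assumption about why labels such as "ibm" ended up duplicated.
a <- data.frame(
  date    = c("1/1/2012", "1/1/2012", "1/2/2012", "1/12/2012"),
  company = c(" ibm", "  ibm", " hp", " gateway"),
  region  = c(" america", " europe", "  europe", " pacific"),
  units   = c(10L, 4L, 2L, 10L),
  stringsAsFactors = FALSE
)

# Strip surrounding whitespace so " ibm" and "  ibm" become the same value.
a$company <- trimws(a$company)
a$region  <- trimws(a$region)

# Every date x company x region combination seen in the data.
b <- expand.grid(
  date    = unique(a$date),
  company = unique(a$company),
  region  = unique(a$region),
  stringsAsFactors = FALSE
)

# Left join the observed rows onto the full grid, then fill the gaps with 0.
z <- merge(x = b, y = a, all.x = TRUE)
z$units[is.na(z$units)] <- 0L

nrow(z)       # 3 dates * 3 companies * 3 regions
sum(z$units)  # total units is unchanged by the fill
```

If "america" and "americas" are meant to be the same region, they would also have to be mapped to one spelling before building the grid. That mapping depends on the data and is not shown here.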
