loops - R ddply rolling mean help: need to capture rolling mean by unique ID
I'm struggling to get the desired output using ddply. I believe I'm on the right track, but I think I'm failing at outputting the data from within the loop...
Sample data:

player, career_game, date, era, pitches
gio gonzalez, 176, aug 1, 3.0, 86
gio gonzalez, 177, aug 5, 4.01, 89
gio gonzalez, 178, aug 10, 4, 11
gio gonzalez, 179, aug 16, 4.06, 102
gio gonzalez, 180, aug 21, 3.83, 97
...............
jordan zimmermann, 114, apr 4, 1.8, 81
jordan zimmermann, 115, apr 9, 8.1, 57
jordan zimmermann, 116, apr 14, 5.27, 93
jordan zimmermann, 117, apr 19, 3.92, 100
..............
I'll call the data frame bb.

So here's what I'm trying to accomplish: I want the average of the previous, let's say, 5 games for each player at each instance. For example, so far I have the code below:
library(plyr)
# mean of every numeric column over each player's last 5 rows
pitchers_5 = ddply(bb, ~player, function(x) numcolwise(mean)(tail(x, n = 5)))
This calculates the average of a player's last 5 games (career_games 176 through 180). However, I want that average at each observation. For career_game 177, the code should calculate the mean of games 172 through 176 and output it for instance 177 as the mean of the previous 5 games, then continue to instance 178, recalculate the previous 5 games, and so on. Using the data above, once the code got to Gio Gonzalez's 181st career game, it would output (the average of the previous 5 games):
gio gonzalez, 178, date (not necessary), 3.78, 77
Update: Metrics' comment has led me to the zoo package's rollmean function. I have since read a few posts and answers to similar problems, but I'm looking for further guidance (Rolling mean (moving average) by group/id with dplyr). That link resolves a problem similar to mine except in 2 areas: it calculates the rolling mean of blood pressure for each unique id as a new field, whereas I want to calculate the rolling means of many fields; and it includes the current blood pressure observation in the mean calculation. For example, what I'm looking for is:
If I calculate the rolling means for Gio Gonzalez's 180th game, I want the mean of games 175 through 179, not including the 180th game's results.

Thanks!
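For comparison, the requirement described in the question (for each row, the mean of the previous n games of the same player, for several columns, excluding the current game) can also be sketched in base R with `ave()`, which returns results in the original row order so no re-sorting or `cbind` alignment is needed. This is a minimal sketch, not the method from the answer: `prev_mean` and `roll_prev` are hypothetical helper names, the data frame is rebuilt from the sample data so the block runs on its own, and it assumes rows are already in game order within each player.

```r
# Sample data from the question, rebuilt here so the block is self-contained
bb <- data.frame(
  player = rep(c("gio gonzalez", "jordan zimmermann"), c(5, 4)),
  career_game = c(176:180, 114:117),
  date = c("aug 1", "aug 5", "aug 10", "aug 16", "aug 21",
           "apr 4", "apr 9", "apr 14", "apr 19"),
  era = c(3.0, 4.01, 4, 4.06, 3.83, 1.8, 8.1, 5.27, 3.92),
  pitches = c(86, 89, 11, 102, 97, 81, 57, 93, 100),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each position i, the mean of the n values
# before i (the current value is excluded); NA until n earlier values exist
prev_mean <- function(v, n) {
  out <- rep(NA_real_, length(v))
  if (length(v) > n) {
    for (i in (n + 1):length(v)) out[i] <- mean(v[(i - n):(i - 1)])
  }
  out
}

# Hypothetical wrapper: add one "<col>_prev<n>" column per input column,
# computed separately within each group (assumes game order within group)
roll_prev <- function(df, cols, group, n) {
  for (col in cols) {
    df[[paste0(col, "_prev", n)]] <- ave(df[[col]], df[[group]],
                                         FUN = function(v) prev_mean(v, n))
  }
  df
}

res <- roll_prev(bb, c("era", "pitches"), "player", n = 3)
print(res)
```

Setting `n = 5` gives the 5-game window from the question; with only 4 or 5 games per player in this small sample, that would produce all NAs.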
Assuming you want the rolling mean of era and pitches, and using 3 instead of 5 for illustration due to the size of the sample data set:
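library(plyr)
library(zoo)

# list(-(1:3)) averages the 3 preceding rows (offsets -1, -2, -3), excluding the current row;
# [-6] drops the duplicate player column that ddply adds
cbind(bb, ddply(bb, ~ player, function(x)
  rollapply(x[c("era", "pitches")], list(-(1:3)), mean, fill = NA)))[-6]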
This gives:
             player career_game   date  era pitches    era.1 pitches.1
1      gio gonzalez         176  aug 1 3.00      86       NA        NA
2      gio gonzalez         177  aug 5 4.01      89       NA        NA
3      gio gonzalez         178 aug 10 4.00      11       NA        NA
4      gio gonzalez         179 aug 16 4.06     102 3.670000  62.00000
5      gio gonzalez         180 aug 21 3.83      97 4.023333  67.33333
6 jordan zimmermann         114  apr 4 1.80      81       NA        NA
7 jordan zimmermann         115  apr 9 8.10      57       NA        NA
8 jordan zimmermann         116 apr 14 5.27      93       NA        NA
9 jordan zimmermann         117 apr 19 3.92     100 5.056667  77.00000
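The `list(-(1:3))` width passes offsets rather than a window size: for row i, `rollapply` averages rows i-1, i-2 and i-3, so the current row never enters its own mean, and `fill = NA` pads rows without three earlier values. As a cross-check of that reading, the same numbers can be reproduced in base R by shifting each player's column down one row and taking a trailing 3-point mean with `stats::filter()`. This is a sketch assuming rows are sorted by player and then by game; `lagged_mean` is a hypothetical name. The same sorting assumption matters for the `cbind` above, because `ddply` returns the groups in sorted order of `player`.

```r
era <- c(3.0, 4.01, 4, 4.06, 3.83, 1.8, 8.1, 5.27, 3.92)
pitches <- c(86, 89, 11, 102, 97, 81, 57, 93, 100)
player <- rep(c("gio gonzalez", "jordan zimmermann"), c(5, 4))

# Hypothetical helper: mean of the k values before each position.
# Shifting down one row makes row i hold value i-1; a one-sided (trailing)
# k-point moving average of that then covers rows i-1 .. i-k.
# Rows lacking k earlier values come out NA, mirroring fill = NA.
lagged_mean <- function(v, k = 3) {
  lagged <- c(NA, head(v, -1))
  as.numeric(stats::filter(lagged, rep(1 / k, k), sides = 1))
}

# Apply within each player, keeping the original row order
era_prev <- ave(era, player, FUN = lagged_mean)
pitches_prev <- ave(pitches, player, FUN = lagged_mean)
print(cbind(era_prev, pitches_prev))
```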
If it's possible that some groups have fewer than 4 rows, use this instead. If a group has only 1 row it returns NAs; if a group has fewer than 4 rows it reduces k so that it still returns something.
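f <- function(x) {
  x <- as.matrix(x[c("era", "pitches")])
  # use at most 3 preceding rows, but never more than the group has
  k <- min(3, nrow(x) - 1)
  # k == 0 means a one-row group: return a matrix of NAs with the same shape
  if (k) rollapply(x, list(-(1:k)), mean, fill = NA) else NA * x
}
cbind(bb, ddply(bb, ~ player, f))[-6]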
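The small-group logic in `f` (cap the look-back at `min(3, nrow - 1)` rows and return NAs for a one-row group) can be sketched in base R too, which makes the edge cases easy to check without plyr or zoo. `prev_mean_capped` is a hypothetical helper, and the input is a truncated piece of the sample data (two games for one pitcher, one game for the other) chosen only to exercise the 2-row and 1-row cases.

```r
# Hypothetical helper mirroring f: use k = min(max_k, n - 1) preceding rows,
# where n is the group size, so small groups still get values; k == 0
# (a one-row group) leaves everything NA
prev_mean_capped <- function(v, max_k = 3) {
  n <- length(v)
  k <- min(max_k, n - 1)
  out <- rep(NA_real_, n)
  if (k > 0) {
    for (i in (k + 1):n) out[i] <- mean(v[(i - k):(i - 1)])
  }
  out
}

# Truncated sample data: a 2-row group and a 1-row group
player <- c("gio gonzalez", "gio gonzalez", "jordan zimmermann")
era <- c(3.0, 4.01, 1.8)

res <- ave(era, player, FUN = prev_mean_capped)
print(res)
```

For the 2-row group k drops to 1, so the second game gets the first game's value; groups with 4 or more rows behave exactly like the 3-row window used earlier.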
Note: I used this input:
lines <- "player, career_game, date, era, pitches
gio gonzalez, 176, aug 1, 3.0, 86
gio gonzalez, 177, aug 5, 4.01, 89
gio gonzalez, 178, aug 10, 4, 11
gio gonzalez, 179, aug 16, 4.06, 102
gio gonzalez, 180, aug 21, 3.83, 97
jordan zimmermann, 114, apr 4, 1.8, 81
jordan zimmermann, 115, apr 9, 8.1, 57
jordan zimmermann, 116, apr 14, 5.27, 93
jordan zimmermann, 117, apr 19, 3.92, 100"
bb <- read.csv(text = lines, strip.white = TRUE, as.is = TRUE)
Updated to use plyr as requested, and added a variation that handles small groups.