loops - R ddply rolling mean help: need to capture rolling mean by unique ID


I'm struggling to get my desired output using ddply. I believe I'm on the right track, but I think I'm failing to output the data in a loop, or inside a loop...
Sample data:

    player, career_game, date, era, pitches
    gio gonzalez, 176,  aug 1,  3.0,    86
    gio gonzalez, 177,  aug 5,  4.01,   89
    gio gonzalez, 178,  aug 10, 4,  11
    gio gonzalez, 179,  aug 16, 4.06,   102
    gio gonzalez, 180,  aug 21, 3.83,   97
    ...............
    jordan zimmermann,  114,    apr 4,  1.8,    81
    jordan zimmermann,  115,    apr 9,  8.1,    57
    jordan zimmermann,  116,    apr 14, 5.27,   93
    jordan zimmermann,  117,    apr 19, 3.92,   100
    ..............

I'll call the data frame bb.

What I'm trying to accomplish: I want the average of the previous, let's say 5, games for each player at each instance. As an example, so far I have the code below:

pitchers_5 = data.frame(ddply(bb, ~player, tail, n=5, numcolwise(mean))) 

This calculates the mean of the last 5 games for each player (career_games 176 through 180). However, I want the average at each observation. For career_game 177, the code should calculate the mean of games 172 through 176 and output it for instance 177 as the mean of the previous 5 games, then continue to instance 178, recalculate over the previous 5 games, and so on. Using the data above, once the code got to Gio Gonzalez's 181st career game, the output would be (the average of the previous 5 games):

    gio gonzalez, 181,  date (not necessary),   3.78,   77

Update: Metrics' comment has led me to the zoo package's rollmean function. I have since read a few posts and answers on a similar problem, but I'm looking for further guidance (Rolling mean (moving average) by group/ID with dplyr). That link resolves a problem similar to mine except in 2 areas. First, it calculates the rolling mean of blood pressure by unique ID as a new field, whereas I want to calculate the rolling mean of many fields. Second, it includes the current blood pressure observation in the mean calculation. As an example of what I'm looking for:
If I calculate the rolling means for Gio Gonzalez's 180th game, I want the mean of games 175 through 179, not including the 180th game's results.

Thanks!

Assuming you want the rolling mean of era and pitches, and using 3 instead of 5 for illustration due to the size of the sample data set:

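    library(plyr)
    library(zoo)

    # width = list(-(1:3)) uses offsets -1, -2, -3: the three rows before the
    # current one, so the current row is excluded. [-6] drops the duplicate
    # player column that ddply returns.
    cbind(bb, ddply(bb, ~ player,
       function(x) rollapply(x[c("era", "pitches")], list(-(1:3)), mean, fill = NA)))[-6]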

giving:

                 player career_game   date  era pitches    era.1 pitches.1
    1      gio gonzalez         176  aug 1 3.00      86       NA        NA
    2      gio gonzalez         177  aug 5 4.01      89       NA        NA
    3      gio gonzalez         178 aug 10 4.00      11       NA        NA
    4      gio gonzalez         179 aug 16 4.06     102 3.670000  62.00000
    5      gio gonzalez         180 aug 21 3.83      97 4.023333  67.33333
    6 jordan zimmermann         114  apr 4 1.80      81       NA        NA
    7 jordan zimmermann         115  apr 9 8.10      57       NA        NA
    8 jordan zimmermann         116 apr 14 5.27      93       NA        NA
    9 jordan zimmermann         117 apr 19 3.92     100 5.056667  77.00000
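To make the role of the offsets concrete, here is a minimal base R sketch of the same kind of windowed mean on a single vector. `window_mean` is a hypothetical helper written only for this illustration, not part of zoo or plyr; it is meant to mimic the behaviour of an offset list combined with `fill = NA`, and it contrasts a window that excludes the current observation with one that includes it.

```r
# Hypothetical helper: mean of x at positions i + offsets, NA if any of
# those positions falls outside the vector.
window_mean <- function(x, offsets) {
  vapply(seq_along(x), function(i) {
    idx <- i + offsets
    if (any(idx < 1 | idx > length(x))) NA_real_ else mean(x[idx])
  }, numeric(1))
}

# Gio Gonzalez's era values from the sample data (games 176 to 180)
era <- c(3.0, 4.01, 4.0, 4.06, 3.83)

excl <- window_mean(era, -(1:3))  # previous 3 games, current game excluded
incl <- window_mean(era, -(0:2))  # current game plus the 2 before it

print(excl)
print(incl)
```

With the offsets `-(1:3)` the first non-NA value appears at the fourth game and matches the `era.1` column above; with `-(0:2)` the window includes the current game, which is what the question wants to avoid.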

If it is possible that some groups have fewer than 4 rows, then use this. If there is 1 row it returns NAs. If there are fewer than 4 rows it reduces k but still returns something.

    f <- function(x) {
        x <- as.matrix(x[c("era", "pitches")])
        # shrink the window for small groups; k is 0 for a one-row group
        k <- min(3, nrow(x)-1)
        if (k) rollapply(x, list(-(1:k)), mean, fill = NA) else NA * x
    }

    cbind(bb, ddply(bb, ~ player, f))[-6]

Note: I used this input:

    Lines <- "player, career_game, date, era, pitches
    gio gonzalez, 176,  aug 1,  3.0,    86
    gio gonzalez, 177,  aug 5,  4.01,   89
    gio gonzalez, 178,  aug 10, 4,  11
    gio gonzalez, 179,  aug 16, 4.06,   102
    gio gonzalez, 180,  aug 21, 3.83,   97
    jordan zimmermann,  114,    apr 4,  1.8,    81
    jordan zimmermann,  115,    apr 9,  8.1,    57
    jordan zimmermann,  116,    apr 14, 5.27,   93
    jordan zimmermann,  117,    apr 19, 3.92,   100"

    bb <- read.csv(text = Lines, strip.white = TRUE, as.is = TRUE)

Updated to use plyr as requested. Added a variation that handles small groups.
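For readers without plyr or zoo installed, a rough base R alternative might look like the sketch below. It is not the approach used in the answer: it swaps in `ave()` for the per-player grouping and a hypothetical helper, `lag_mean`, for the rolling step. It assumes the rows are already sorted by career_game within each player, and it rebuilds the sample data inline (without the date column) so the block runs on its own.

```r
# Hypothetical helper: mean of the k values before each position;
# NA where fewer than k earlier values exist.
lag_mean <- function(v, k) {
  vapply(seq_along(v), function(i) {
    if (i <= k) NA_real_ else mean(v[(i - k):(i - 1)])
  }, numeric(1))
}

# Sample data from the question, date column omitted
bb <- data.frame(
  player = rep(c("gio gonzalez", "jordan zimmermann"), times = c(5, 4)),
  career_game = c(176:180, 114:117),
  era = c(3.0, 4.01, 4.0, 4.06, 3.83, 1.8, 8.1, 5.27, 3.92),
  pitches = c(86, 89, 11, 102, 97, 81, 57, 93, 100),
  stringsAsFactors = FALSE
)

k <- 3  # 3 for illustration; use 5 on the full data set

# Add one lagged-mean column per metric, computed separately for each player
for (col in c("era", "pitches")) {
  new_name <- paste0(col, "_prev", k)
  bb[[new_name]] <- ave(bb[[col]], bb$player,
                        FUN = function(v) lag_mean(v, k))
}

print(bb)
```

More metrics can be covered by extending the vector of column names in the loop. Unlike the small-group variation above, this sketch does not shrink k for short groups; it simply leaves those rows as NA.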

