Removing NA values from a DataFrame in Python 3.4
import pandas as pd
import statistics

df = pd.read_csv('001.csv', keep_default_na=False, na_values=[""])
print(df)
I am using this code to create a data frame that has no NA values. I have a couple of CSV files and want to calculate the mean of one of the columns, sulfate. This column has many 'NA' values, which I am trying to exclude. After running the code above, the 'NA's are still not excluded from the data frame. Please suggest a fix.
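A minimal sketch of what the two read_csv settings do. It assumes a small made-up CSV (the io.StringIO text below, with hypothetical Date and nitrate columns) in place of 001.csv. By default pandas parses the literal string NA as NaN. With keep_default_na=False and na_values=[""], only empty fields count as missing, so NA stays text and the sulfate column is not numeric.

```python
import io

import pandas as pd

# Hypothetical stand-in for 001.csv; the column names and values are made up.
csv_text = (
    "Date,sulfate,nitrate\n"
    "2003-01-01,NA,NA\n"
    "2003-01-02,7.25,0.5\n"
    "2003-01-03,NA,1.0\n"
    "2003-01-04,5.75,NA\n"
)

# Default parsing: "NA" is one of pandas' default missing-value markers,
# so those cells become NaN and sulfate is read as a numeric column.
df_default = pd.read_csv(io.StringIO(csv_text))

# The question's settings: only empty fields count as missing, so "NA"
# stays a string and sulfate is read as text instead of numbers.
df_question = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])

print(pd.api.types.is_numeric_dtype(df_default["sulfate"]))   # True
print(pd.api.types.is_numeric_dtype(df_question["sulfate"]))  # False

# Series.mean skips NaN by default (skipna=True): (7.25 + 5.75) / 2
print(df_default["sulfate"].mean())  # 6.5
```

With the default parsing, the mean already ignores the missing values. No explicit filtering step is required in that case.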
Method 1:
import numpy as np
import pandas as pd
df[['a', 'c']].apply(lambda x: my_func(x) if np.all(pd.notnull(x.iloc[1])) else x, axis=1)  # my_func is your own function
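Use pandas notnull: my_func is applied only to rows in which column 'c' is not null, and all other rows are returned unchanged.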
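A minimal sketch of the same notnull idea applied to the question's goal. It assumes a small hand-built DataFrame (the values and the nitrate column are hypothetical). pd.notnull yields a boolean mask that selects only the rows where sulfate is present.

```python
import numpy as np
import pandas as pd

# Hypothetical data; "sulfate" mirrors the column named in the question.
df = pd.DataFrame({
    "sulfate": [np.nan, 7.25, np.nan, 5.75],
    "nitrate": [1.0, np.nan, 0.5, 2.0],
})

# pd.notnull returns a boolean Series: True where a value is present.
mask = pd.notnull(df["sulfate"])

# Keep only the rows where sulfate is present, then take the mean.
sulfate_clean = df.loc[mask, "sulfate"]
print(sulfate_clean.tolist())  # [7.25, 5.75]
print(sulfate_clean.mean())    # 6.5
```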
Method 2:
import numpy as np
df = df[np.isfinite(df['eps'])]  # keep only rows where 'eps' is a finite number
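A minimal sketch contrasting np.isfinite with pd.notnull. It assumes a hypothetical frame with a float column named eps, as in method 2. np.isfinite drops both NaN and infinite values, while notnull treats only NaN as missing. np.isfinite also expects a numeric column: on a column that still holds 'NA' strings, NumPy rejects the input with a TypeError.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; "eps" is the column name used in method 2.
df = pd.DataFrame({"eps": [1.5, np.nan, np.inf, -2.0]})

# np.isfinite is False for NaN and for +/-inf, so both kinds of rows are dropped.
finite_rows = df[np.isfinite(df["eps"])]

# pd.notnull only treats NaN/None as missing, so the inf row is kept.
notnull_rows = df[pd.notnull(df["eps"])]

print(finite_rows["eps"].tolist())   # [1.5, -2.0]
print(notnull_rows["eps"].tolist())  # [1.5, inf, -2.0]
```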
Method 3: using dropna
In [24]: df = pd.DataFrame(np.random.randn(10, 3))

In [25]: df.iloc[::2, 0] = np.nan; df.iloc[::4, 1] = np.nan; df.iloc[::3, 2] = np.nan;

In [26]: df
Out[26]:
          0         1         2
0       NaN       NaN       NaN
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
4       NaN       NaN  0.050742
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
8       NaN       NaN  0.637482
9 -0.310130  0.078891       NaN

In [27]: df.dropna()  # drop rows that have any NaN values
Out[27]:
          0         1         2
1  2.677677 -1.466923 -0.750366
5 -1.250970  0.030561 -2.678622
7  0.049896 -0.308003  0.823295
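A minimal sketch of dropna for the question's case, assuming a small hypothetical frame with a sulfate column and a made-up nitrate column. Called with no arguments, dropna removes a row if any column is NaN. The subset parameter limits the check to the listed columns, so rows are dropped only when sulfate itself is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for one of the question's CSV files.
df = pd.DataFrame({
    "sulfate": [np.nan, 7.25, np.nan, 5.75],
    "nitrate": [0.5, np.nan, 1.0, 2.0],
})

# No arguments: a row is dropped if ANY column is NaN.
any_missing_dropped = df.dropna()

# subset=: only NaN in the listed columns causes a row to be dropped.
sulfate_rows = df.dropna(subset=["sulfate"])

print(len(any_missing_dropped))        # 1
print(sulfate_rows["sulfate"].mean())  # 6.5
```

Using subset keeps rows where only some other column is missing. Dropping on every column would also discard sulfate values that are present.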