python - Building a dataframe in an efficient way from a dictionary

- September 15, 2011

I have a large set of data that I have processed, which generated a dictionary. I want to create a DataFrame from this dictionary. The values of the dictionary are lists of tuples. From these values I need to find the unique labels, which become the columns of the DataFrame:

d = {
    '0001': [('skiing', 0.789), ('snow', 0.65), ('winter', 0.56)],
    '0002': [('drama', 0.89), ('comedy', 0.678), ('action', -0.42), ('winter', -0.12), ('kids', 0.12)],
    '0003': [('action', 0.89), ('funny', 0.58), ('sports', 0.12)],
    '0004': [('dark', 0.89), ('mystery', 0.678), ('crime', 0.12), ('adult', -0.423)],
    '0005': [('cartoon', -0.89), ('comedy', 0.678), ('action', 0.12)],
    '0006': [('drama', -0.49), ('funny', 0.378), ('suspense', 0.12), ('thriller', 0.78)],
    '0007': [('dark', 0.79), ('mystery', 0.88), ('crime', 0.32), ('adult', -0.423)],
}

(The size of the dictionary is close to 800,000 records.)

I iterate over the dictionary to find the unique headers:

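col_headers = []
entities = []
# iteritems() is Python 2; on Python 3 use d.items() instead
for key, scores in d.iteritems():
    entities.append(key)
    d[key] = dict(scores)  # replace the list of tuples with a dict
    col_headers.extend(d[key].keys())
col_headers = list(set(col_headers))  # drop duplicate labels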
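As a side note on the loop above, the unique labels can also be gathered in one pass with a set, without rewriting d in place. This is only a minimal sketch on a cut-down version of the sample data; the name `sample` is made up for illustration, and sorting the labels is an extra choice to get a stable column order.

```python
# Hypothetical cut-down version of the dictionary from the question.
sample = {
    '0001': [('skiing', 0.789), ('snow', 0.65), ('winter', 0.56)],
    '0003': [('action', 0.89), ('funny', 0.58), ('sports', 0.12)],
    '0005': [('cartoon', -0.89), ('comedy', 0.678), ('action', 0.12)],
}

# A set comprehension keeps each label once, so no list(set(...)) step is needed.
col_headers = sorted({label for scores in sample.values() for label, _ in scores})

# Iterating a dict yields its keys, which are the row labels.
entities = list(sample)

print(col_headers)
print(entities)
```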

I believe this will take a long time to process. Using dict might be an issue, since it is slower. Furthermore, when I construct the data frame row by row, that slows the process down even more:

import pandas as pd

df = pd.DataFrame(columns=col_headers, index=entities)
for k in d:
    df.loc[k] = pd.Series(d[k])  # one row assignment per key
df = df.fillna(0.0)  # fillna returns a new frame, so keep the result

How can I speed this up and reduce the processing time?

@ajcr gets it.

But you need to unwrap the internal key-value pairs into a dictionary along the way.

df = pd.DataFrame.from_dict({k: dict(v) for k, v in d.items()},
                            orient="index").fillna(0)
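To see what this produces, here is a small self-contained sketch on three records. The variable name `sample` and the trimmed data are assumptions for illustration; the call follows the same from_dict pattern with orient="index".

```python
import pandas as pd

# Hypothetical cut-down version of the dictionary from the question.
sample = {
    '0001': [('skiing', 0.789), ('snow', 0.65), ('winter', 0.56)],
    '0002': [('drama', 0.89), ('comedy', 0.678), ('winter', -0.12)],
    '0005': [('cartoon', -0.89), ('comedy', 0.678), ('action', 0.12)],
}

# Turn each list of (label, score) tuples into a dict, then let pandas
# align the inner keys into columns. Labels missing from a row come out
# as NaN, which fillna(0) replaces with zero.
df = pd.DataFrame.from_dict(
    {k: dict(v) for k, v in sample.items()},
    orient="index",
).fillna(0)

print(df)
```

Each outer key becomes a row and each distinct label becomes a column, so neither the separate pass for unique headers nor the row-by-row loop is needed.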

Then, optionally, if you want to homogenize the style of the column titles:

df.columns = [c.lower() for c in df.columns]


If you wanted to go entirely crazy, sort the columns:

df = df.sort_index(axis=1)  # DataFrame.sort was removed in later pandas versions
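A small sketch of the two optional tidy-up steps together, assuming a recent pandas, where columns are ordered with sort_index(axis=1). The toy frame below is made up for illustration.

```python
import pandas as pd

# Toy frame with mixed-case, unsorted column titles (made up for illustration).
df = pd.DataFrame(
    {'Winter': [0.56, -0.12], 'action': [0.0, -0.42], 'Comedy': [0.0, 0.678]},
    index=['0001', '0002'],
)

df.columns = [c.lower() for c in df.columns]  # homogenize the titles
df = df.sort_index(axis=1)                    # order the columns alphabetically

print(list(df.columns))
```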

