python - Building a dataframe in an efficient way from a dictionary
I have a large set of data that I have processed to generate a dictionary. Now I want to create a DataFrame from this dictionary. The values of the dictionary are lists of tuples. From those values I need to find the unique labels, which become the columns of the DataFrame:
d = {
    '0001': [('skiing', 0.789), ('snow', 0.65), ('winter', 0.56)],
    '0002': [('drama', 0.89), ('comedy', 0.678), ('action', -0.42), ('winter', -0.12), ('kids', 0.12)],
    '0003': [('action', 0.89), ('funny', 0.58), ('sports', 0.12)],
    '0004': [('dark', 0.89), ('mystery', 0.678), ('crime', 0.12), ('adult', -0.423)],
    '0005': [('cartoon', -0.89), ('comedy', 0.678), ('action', 0.12)],
    '0006': [('drama', -0.49), ('funny', 0.378), ('suspense', 0.12), ('thriller', 0.78)],
    '0007': [('dark', 0.79), ('mystery', 0.88), ('crime', 0.32), ('adult', -0.423)],
}
(The size of the dictionary is close to 800,000 records.)
I iterate over the dictionary to find the unique headers:
col_headers = []
entities = []
for key, scores in d.items():  # d.iteritems() on Python 2
    entities.append(key)
    d[key] = dict(scores)  # convert the list of tuples to a dict
    col_headers.extend(d[key].keys())
col_headers = list(set(col_headers))  # keep unique labels only
I believe this will take a long time to process. Using dict might be an issue, since it is slower. Furthermore, when I construct the DataFrame row by row, that slows the process down even more:
import pandas as pd

df = pd.DataFrame(columns=col_headers, index=entities)
for k in d:
    df.loc[k] = pd.Series(d[k])  # one row assignment per key
df = df.fillna(0.0)  # fillna returns a new frame, so assign it back
How can I speed this up and reduce the processing time?
You can use DataFrame.from_dict, but you need to unwrap the internal key-value pairs into a dictionary along the way.
df = pd.DataFrame.from_dict({k: dict(v) for k, v in d.items()}, orient="index").fillna(0)
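As a quick check of this approach, here is a minimal sketch on a three-key subset of the question's dictionary. The names `sample` and `wide` are illustrative and do not appear in the original post.

```python
import pandas as pd

# Small subset of the question's data (illustrative only).
sample = {
    '0001': [('skiing', 0.789), ('snow', 0.65), ('winter', 0.56)],
    '0003': [('action', 0.89), ('funny', 0.58), ('sports', 0.12)],
    '0005': [('cartoon', -0.89), ('comedy', 0.678), ('action', 0.12)],
}

# Turn each list of (label, score) tuples into a dict, then let pandas
# align the labels into columns. Labels missing for a key become NaN,
# which fillna(0) replaces with 0.
wide = pd.DataFrame.from_dict(
    {k: dict(v) for k, v in sample.items()}, orient="index"
).fillna(0)

print(wide.shape)
print(wide.loc['0005', 'action'])
```

The sample has eight distinct labels across three keys, so the result should have three rows and eight columns, with zeros wherever a key has no score for a label.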
Then, optionally, if you want to homogenize the style of the column titles:
df.columns = [c.lower() for c in df.columns]
If you wanted to go entirely crazy, sort the columns:
df = df.sort_index(axis=1)  # DataFrame.sort was removed in newer pandas releases
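The two optional steps can be sketched together as follows, assuming a recent pandas where sorting columns by label is spelled `sort_index(axis=1)`. The mixed-case labels in `sample` are hypothetical, chosen only to make the effect of lowercasing visible.

```python
import pandas as pd

# Hypothetical mixed-case labels (not from the original data).
sample = {
    'a': [('Winter', 0.5), ('action', 0.1)],
    'b': [('Comedy', 0.7)],
}

tidy = pd.DataFrame.from_dict(
    {k: dict(v) for k, v in sample.items()}, orient="index"
).fillna(0)

# Normalize the column titles, then order the columns alphabetically.
tidy.columns = [c.lower() for c in tidy.columns]
tidy = tidy.sort_index(axis=1)

print(list(tidy.columns))
```

One caveat: if two labels differ only by case, lowercasing yields duplicate column names, so it may be safer to lowercase the labels before building the frame in that situation.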