[Python Comparison] Log Processing 3 | Raqsoft (润乾)

Task: each log record spans a variable number of lines, and every record begins with a fixed marker.

Python

import pandas as pd

log_file = 'e://txt//indefinite _info2.txt'
# With the default comma separator, each tab-delimited line lands in column 0
log_info = pd.read_csv(log_file, header=None)
# Flag lines whose first field is "userid"; the running sum numbers the records
group_cond = log_info[0].apply(lambda x: 1 if x.split("\t")[0].split(":")[0] == "userid" else 0).cumsum()
log_g = log_info.groupby(group_cond, sort=False)
columns = ["userid", "gender", "age", "salary", "province", "musicid", "watch_time", "time"]
# One list of values per output column
df_dic = {}
for c in columns:
    df_dic[c] = []
for index, group in log_g:
    rec_dic = {}
    # Merge all lines of the record, then split into key:value fields
    rec = group.values.flatten()
    rec = '\t'.join(rec).split("\t")
    for r in rec:
        v = r.split(":")
        rec_dic[v[0]] = v[1]
    # Fill missing fields with None so every column has the same length
    for col in columns:
        if col not in rec_dic.keys():
            df_dic[col].append(None)
        else:
            df_dic[col].append(rec_dic[col])
df = pd.DataFrame(df_dic)
print(df)

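pandas cannot group directly by a condition such as "a new record starts here", so an array of group keys has to be built from the condition first.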
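To make the grouping array concrete, here is a minimal sketch on a few made-up lines. The sample data and the names `lines`, `starts`, `group_key` and `sizes` are assumptions for illustration and do not come from the original log file. The flag is 1 on each line that starts with the userid marker, and the running sum turns those flags into a key shared by all lines of one record.

```python
import pandas as pd

# Hypothetical sample: three records, each starting with "userid:"
lines = pd.Series([
    "userid:1\tgender:F",
    "age:30",
    "userid:2\tgender:M\tage:41",
    "userid:3",
    "salary:5000\tprovince:Hebei",
    "musicid:77",
])

# 1 where a record starts, 0 on continuation lines
starts = lines.str.startswith("userid:").astype(int)

# Running sum: every line of the same record gets the same key
group_key = starts.cumsum()
print(group_key.tolist())

# Number of lines that ended up in each record
sizes = lines.groupby(group_key, sort=False).size().tolist()
print(sizes)
```

The keys themselves carry no meaning; they only need to change whenever a new record begins.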

esProc (集算器)

     A
1    e://txt//indefinite _info2.txt
2    [userid,gender,age,salary,province,musicid,watch_time,time]
3    =file(A1).import@s()
4    =A3.group@i(_1.array("\t")(1).array("\:")(1)=="userid")
5    =A4.(~.(_1.array("\t")).conj().align(A2,~.array("\:")(1)).(~.array("\:")(2))).conj()
6    =create(${A2.concat@c()}).record(A5)

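esProc's grouping and loop functions keep the code short and clear: group@i in A4 starts a new group whenever the condition is true, and align in A5 lines each record's fields up with the column list in A2, so the whole task fits in six cells.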
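For readers who do not know esProc, the grid can be read roughly as the plain-Python sketch below. It is one reading of the cells, not esProc's actual implementation: A4 is taken to open a new group on each line whose first field is userid, and A5 to align each record's fields to the column list and keep the values. The function name `parse_records` and the sample lines are hypothetical.

```python
COLUMNS = ["userid", "gender", "age", "salary",
           "province", "musicid", "watch_time", "time"]

def parse_records(lines, columns=COLUMNS):
    """Group lines into records and align each record to `columns`."""
    groups = []
    for line in lines:
        first_key = line.split("\t")[0].split(":")[0]
        # Like A4: a line whose first field is "userid" opens a new group
        if first_key == "userid" or not groups:
            groups.append([])
        groups[-1].append(line)

    rows = []
    for group in groups:
        # Like A5: flatten the group's fields, then look each column up
        fields = [f for line in group for f in line.split("\t")]
        pairs = dict(f.split(":", 1) for f in fields)
        rows.append([pairs.get(c) for c in columns])
    return rows

# Hypothetical sample: two records, the first spread over two lines
sample = [
    "userid:1\tgender:F",
    "age:30",
    "userid:2\tsalary:5000",
]
rows = parse_records(sample)
for row in rows:
    print(row)
```

Splitting on the first colon only is a choice made in this sketch so that values containing colons, such as a timestamp, stay intact; the code in the article splits on every colon.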
