[Comparison with Python] Repeatable Conditional Grouping | 润乾 (Raqsoft)
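Task: Group employees into bands by their length of service at the company and count the male and female employees in each band. The bands overlap, so one employee can be counted in more than one band.
Python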
import pandas as pd
import datetime

def eval_g(dd: dict, ss: str):
    # Evaluate the condition string ss with the names in dd in scope
    return eval(ss, dd)

emp_file = 'e:\\txt\\employee.txt'
emp_info = pd.read_csv(emp_file, sep='\t')
# Band labels and the matching conditions, in the same order
employed_list = ['within five years', 'five to ten years', 'more than ten years', 'over fifteen years']
employed_str_list = ["(s<5)", "(s>=5) & (s<10)", "(s>=10)", "(s>=15)"]
# Length of service in whole years: current year minus hire year
today = datetime.datetime.today().year
arr = pd.to_datetime(emp_info['hiredate'])
employed = today - arr.dt.year
emp_info['employed'] = employed
dd = {'s': emp_info['employed']}
group_cond = []
for n in range(len(employed_str_list)):
    # Group by the boolean result of the n-th condition (keys are True/False)
    emp_g = emp_info.groupby(eval_g(dd, employed_str_list[n]))
    emp_g_index = [index for index in emp_g.size().index]
    if True not in emp_g_index:
        # No employee satisfies this condition
        female_emp = 0
        male_emp = 0
    else:
        group = emp_g.get_group(True)
        sum_emp = len(group)
        female_emp = len(group[group['gender'] == 'f'])
        male_emp = sum_emp - female_emp
    group_cond.append([employed_list[n], male_emp, female_emp])
group_df = pd.DataFrame(group_cond, columns=['employed', 'male', 'female'])
print(group_df)
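pandas has no ready-made function for repeatable (overlapping) conditional grouping, so the only option is to regroup the data once per condition and keep the group that satisfies it.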
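The per-condition regrouping described above can also be sketched with plain boolean masks, which avoids both `eval` and the `groupby`/`get_group` round trip. This is only an illustrative variant, not the article's method: the in-memory sample data and the `conditions` mapping are hypothetical, and unlike the code above it counts males by matching `'m'` directly rather than by subtracting females from the total.

```python
import pandas as pd

# Hypothetical sample data; the article loads the real table from a file.
emp_info = pd.DataFrame({
    "gender": ["m", "f", "f", "m", "f"],
    "employed": [3, 7, 12, 16, 20],  # years of service
})

# Label -> function that builds a boolean mask from the years-of-service column.
# The last two conditions overlap on purpose.
conditions = {
    "within five years": lambda s: s < 5,
    "five to ten years": lambda s: (s >= 5) & (s < 10),
    "more than ten years": lambda s: s >= 10,
    "over fifteen years": lambda s: s >= 15,
}

rows = []
for label, cond in conditions.items():
    # Select the rows that satisfy this condition; a row may be selected
    # again by a later condition.
    group = emp_info[cond(emp_info["employed"])]
    rows.append({
        "employed": label,
        "male": int((group["gender"] == "m").sum()),
        "female": int((group["gender"] == "f").sum()),
    })

group_df = pd.DataFrame(rows, columns=["employed", "male", "female"])
print(group_df)
```

With this sample, the employees with 16 and 20 years of service are counted in both of the last two bands, which is the overlapping behaviour the task asks for.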
esProc
 | A | B |
1 | ?<5 | within five years |
2 | ?>=5 && ?<10 | five to ten years |
3 | ?>=10 | more than ten years |
4 | ?>=15 | over fifteen years |
5 | e:\\txt\\employee.txt | |
6 | =[A1:A4] | =A6.concat@c() |
7 | =file(A5).import@t() | =A7.derive(age@y(hiredate):employed) |
8 | =B7.enum@r(A6,employed) | =[B1:B4] |
9 | =A8.new(B8(#):employed,~.count(gender=="m"):male,~.count(gender=="f"):female) | |
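esProc has a powerful enumeration grouping feature (the enum@r call in A8 above), which makes repeatable conditional grouping easy to implement.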
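To make the idea of repeatable enumeration grouping concrete, here is a minimal Python sketch of the semantics. It is not esProc's implementation: `enum_group` and its `repeat` flag are hypothetical names, and the first-match behaviour when `repeat=False` is an assumption about how a non-repeating enumeration would behave.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def enum_group(
    items: Sequence[T],
    conditions: Sequence[Callable[[T], bool]],
    repeat: bool = False,
) -> List[List[T]]:
    """Return one group per condition, in condition order.

    repeat=True : an item joins every group whose condition it satisfies.
    repeat=False: an item joins only the first matching group (assumed behaviour).
    """
    groups: List[List[T]] = [[] for _ in conditions]
    for item in items:
        for i, cond in enumerate(conditions):
            if cond(item):
                groups[i].append(item)
                if not repeat:
                    break  # stop at the first match
    return groups

# Hypothetical years-of-service values and the four bands from the task
years = [3, 7, 12, 16, 20]
conds = [
    lambda s: s < 5,
    lambda s: 5 <= s < 10,
    lambda s: s >= 10,
    lambda s: s >= 15,
]

repeated = enum_group(years, conds, repeat=True)
first_match = enum_group(years, conds, repeat=False)
print(repeated)
print(first_match)
```

Under these assumptions, only the repeating variant fills the "over fifteen years" band, because those employees already match the "more than ten years" condition first.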