经常面对这类任务?
- 对着500多个excel做同样的汇总统计
- 去除文本文件里的重复行
- 对比两个csv有啥差异
- 几十个excel合并起来,把大excel拆成多个小的
- 几个excel的数据要用同一个列关联起来
1 | import pandas as pd |
2 | def iterate(col): |
3 | prev = 0; |
4 | res = 0; |
5 | val = 0; |
6 | for curr in col: |
7 | if curr – prev > 0: |
8 | res = 1; |
9 | else: |
10 | res = 0; |
11 | prev = curr; |
12 | if val < res: |
13 | val = res; |
14 | return val; |
15 | data = pd.read_excel('d:/stock.xlsx',sheet_name=0). sort_values('date').groupby('company')['price'].apply(iterate); |
a | |
---|---|
1 | =file("d:/stock.xlsx").xlsimport@t().sort(date).group(company) |
2 | =0 |
3 | =a1.max(a2=if(price>price[-1],a2 1,0)) |
集合运算领域专业语法,同样过程代码更简洁!
a | |
---|---|
1 | $select * from d:/stock.xlsx where company='0001' order by date |
2 | $select company,max(price),min(price) from d:/stock.xls where month(date)=1 group by company |
3 | $select company.name,stock.date,stock.price from d:/stock.xls stock left join d:/company.txt company on stock.company=company.id |
old.csv | new.csv | |
username,date,salevalue,salecount | username,date,salevalue,salecount | |
---|---|---|
1 | rachel,2015-03-01,4500,9 | rachel,2015-03-01,4500,9 |
2 | rachel,2015-03-03,8700,4 | rachel,2015-03-02,5000,5 |
3 | tom,2015-03-02,3000,8 | ashley,2015-03-01,6000,5 |
4 | tom,2015-03-03,5000,7 | rachel,2015-03-03,11700,4 |
5 | tom,2015-03-04,6000,12 | tom,2015-03-03,5000,7 |
6 | john,2015-03-04,4800,4 |
a | b | c | |
---|---|---|---|
1 | =file("d:\\old.csv").import@ct() | =file("d:\\new.csv").import@ct() | /逗号分隔的文本 |
2 | =new=[b1,a1].merge@od() | /求差集 |
customer id | customer name | invoice number | amount | purchase date |
---|---|---|---|---|
1234 | john smith | 100-0002 | $1,200.00 | 2013/1/1 |
2345 | mary harrison | 100-0003 | $1,425.00 | 2013/1/6 |
3456 | lucy gomez | 100-0004 | $1,390.00 | 2013/1/11 |
4567 | rupert jones | 100-0005 | $1,257.00 | 2013/1/18 |
5678 | jenny walters | 100-0006 | $1,725.00 | 2013/1/24 |
6789 | samantha donaldson | 100-0007 | $1,995.00 | 2013/1/31 |
a | b | c | |
---|---|---|---|
1 | for directory@p("d:/excel/*.xlsx") | =file(a1).xlsopen() | /循环目录下每个excel |
2 | =b1.conj(b1.xlsimport@t(;~.stname)) | /循环每个sheet | |
3 | =@|b2 | /依次合并sheet | |
4 | =a4.groups (month('purchase date'):month;sum(amount):total,avg(amount):average) |
/分组汇总 |