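To group a Pandas DataFrame by year intervals, we use groupby() together with Grouper, which selects the column to group on and the frequency to group by. For the car sales records shown below, we will group the rows into multi-year intervals and compute the sum of the registration prices in each interval.
First, suppose the following is our Pandas DataFrame with three columns -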
# DataFrame in which one of the columns is Date_of_Purchase
dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],
        "Date_of_Purchase": [
            pd.Timestamp("2021-06-10"),
            pd.Timestamp("2019-07-11"),
            pd.Timestamp("2016-06-25"),
            pd.Timestamp("2021-06-29"),
            pd.Timestamp("2020-03-20"),
            pd.Timestamp("2019-01-22"),
            pd.Timestamp("2011-01-06"),
            pd.Timestamp("2013-01-04"),
            pd.Timestamp("2014-05-09"),
        ],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350],
    }
)
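Next, pass a Grouper to groupby() to select the Date_of_Purchase column. The frequency is set to 3Y, so the rows are grouped into 3-year intervals, each ending on 31 December and labeled by that end date.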
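To make the 3-year intervals concrete, the sketch below lists the year-end dates that label the bins for this data. It is only an illustration: the start date 2011-12-31 is taken from the earliest purchase year in the example, and `pd.offsets.YearEnd(3)` is used here in place of the `3Y` string because the offset object is accepted regardless of how a given pandas version spells the alias.

```python
import pandas as pd

# Year-end dates spaced three years apart, starting at the end of 2011
# (the year of the earliest purchase in the example data).
bin_labels = pd.date_range(start="2011-12-31", periods=5, freq=pd.offsets.YearEnd(3))

for label in bin_labels:
    print(label.date())
# 2011-12-31
# 2014-12-31
# 2017-12-31
# 2020-12-31
# 2023-12-31
```

Each purchase falls into the first bin whose end date is on or after the purchase date, so a car bought on 2013-01-04 belongs to the bin labeled 2014-12-31.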
Here is the complete code -
import pandas as pd

# DataFrame in which one of the columns is Date_of_Purchase
dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],
        "Date_of_Purchase": [
            pd.Timestamp("2021-06-10"),
            pd.Timestamp("2019-07-11"),
            pd.Timestamp("2016-06-25"),
            pd.Timestamp("2021-06-29"),
            pd.Timestamp("2020-03-20"),
            pd.Timestamp("2019-01-22"),
            pd.Timestamp("2011-01-06"),
            pd.Timestamp("2013-01-04"),
            pd.Timestamp("2014-05-09"),
        ],
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350],
    }
)

print("DataFrame...\n", dataFrame)

# Grouper selects the Date_of_Purchase column inside groupby().
# numeric_only=True restricts the sum to Reg_Price; recent pandas versions
# would otherwise also try to sum the strings in the Car column.
# On pandas 2.2 and later the year-end alias is spelled '3YE'; '3Y' is deprecated there.
print(
    "\nGroup Dataframe by 3 years...\n",
    dataFrame.groupby(pd.Grouper(key="Date_of_Purchase", freq="3Y")).sum(numeric_only=True),
)

Output
This produces the following output -
DataFrame...
        Car Date_of_Purchase  Reg_Price
0      Audi       2021-06-10       1000
1     Lexus       2019-07-11       1400
2     Tesla       2016-06-25       1100
3  Mercedes       2021-06-29        900
4       BMW       2020-03-20       1700
5    Toyota       2019-01-22       1800
6    Nissan       2011-01-06       1300
7   Bentley       2013-01-04       1150
8   Mustang       2014-05-09       1350

Group Dataframe by 3 years...
                  Reg_Price
Date_of_Purchase
2011-12-31             1300
2014-12-31             2500
2017-12-31             1100
2020-12-31             4900
2023-12-31             1900
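As a way to check the totals above, the following sketch reports, for each 3-year bin, both the summed registration price and the number of cars that fell into it. The column names `Total_Reg_Price` and `Cars_Sold` are made up for this illustration, and `pd.offsets.YearEnd(3)` again stands in for the `3Y` string as an assumption-free way to express the same frequency across pandas versions.

```python
import pandas as pd

# Same sales records as in the example above.
dataFrame = pd.DataFrame(
    {
        "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],
        "Date_of_Purchase": pd.to_datetime(
            ["2021-06-10", "2019-07-11", "2016-06-25", "2021-06-29", "2020-03-20",
             "2019-01-22", "2011-01-06", "2013-01-04", "2014-05-09"]
        ),
        "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350],
    }
)

# Group into 3-year, year-end bins and compute two aggregates per bin.
# Total_Reg_Price and Cars_Sold are hypothetical output column names.
summary = dataFrame.groupby(
    pd.Grouper(key="Date_of_Purchase", freq=pd.offsets.YearEnd(3))
).agg(
    Total_Reg_Price=("Reg_Price", "sum"),
    Cars_Sold=("Car", "count"),
)

print(summary)
```

For example, the bin labeled 2020-12-31 contains the Lexus, BMW, and Toyota (1400 + 1700 + 1800 = 4900), which matches the grouped sum in the output above.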