
Programming Q&A. Published: 2022-06-02. Source site: 大佬教程 code.js-code.com


How do I upsample a MultiIndex DataFrame in Pandas 1.3?


I am trying to increase the time resolution of a DataFrame (upsampling). I found several solutions, but none of them work with Pandas 1.3.

My DataFrame has this shape:

                                              Population (cap)
country category   date                                       
FR      Population 2018-01-01 00:00:00+00:00        67101930.0
                   2019-01-01 00:00:00+00:00        67248926.0
                   2020-01-01 00:00:00+00:00        67391582.0
DE      Population 2018-01-01 00:00:00+00:00        82905782.0
                   2019-01-01 00:00:00+00:00        83092962.0
                   2020-01-01 00:00:00+00:00        83240525.0

I have tried several variants, such as this:

    data = data.groupby(level=["country","category"])
    data = data.resample(freq)
    data = data.ffill()

Unfortunately, this does not work and raises an error: MultiIndex has no single backing array. Use 'MultiIndex.to_numpy()' to get a NumPy array of tuples.

Or this:

    data = data.reset_index()
    data = data.set_index(["date"])
    data = data.resample(freq)
    data = data.ffill()
    data = data.reset_index()
    data = data.set_index(["country","category","date"])

This raises the error: cannot reindex a non-unique index with a method or limit.

How can I perform this task in Pandas 1.3?
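
The second error most likely comes from the fact that, once `country` and `category` are moved out of the index, every date appears once per country, so the remaining `date` index contains duplicates. The sketch below rebuilds the frame from the question (the construction via `MultiIndex.from_product` is an assumption made for illustration, not the asker's code) and checks the uniqueness of that index.

```python
import pandas as pd

# Rebuild the frame shown in the question (construction method is illustrative).
dates = pd.to_datetime(["2018-01-01", "2019-01-01", "2020-01-01"], utc=True)
index = pd.MultiIndex.from_product(
    [["FR", "DE"], ["Population"], dates],
    names=["country", "category", "date"],
)
data = pd.DataFrame(
    {"Population (cap)": [67101930.0, 67248926.0, 67391582.0,
                          82905782.0, 83092962.0, 83240525.0]},
    index=index,
)

# Same first two steps as the second attempt: keep only "date" in the index.
flat = data.reset_index().set_index("date")

# Each date occurs once for FR and once for DE, so the index is not unique.
print(flat.index.is_unique)
print(flat.index.duplicated(keep=False).all())
```

Forward-filling onto a new date range is a reindex with a fill method, which requires unique labels, so the resampling has to happen separately for each country and category.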

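Solution

Unfortunately, the current format does not lend itself to this operation in pandas. The MultiIndex needs to be reset so that only date remains in the index. Then groupby resample can be used: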

freq = '6M'
df = (
    df.reset_index(["country","category"])  # Leave Date as the only index
        .groupby(["country","category"],as_index=False)  # Group by country and category
        .resample(freq)  # Resample at frequency
        .ffill()  # whatever resampling operation here
)

df:

                            country    category  Population (cap)
  date                                                           
0 2018-01-31 00:00:00+00:00      DE  Population                 4
  2018-07-31 00:00:00+00:00      DE  Population                 4
  2019-01-31 00:00:00+00:00      DE  Population                 5
  2019-07-31 00:00:00+00:00      DE  Population                 5
  2020-01-31 00:00:00+00:00      DE  Population                 6
1 2018-01-31 00:00:00+00:00      FR  Population                 1
  2018-07-31 00:00:00+00:00      FR  Population                 1
  2019-01-31 00:00:00+00:00      FR  Population                 2
  2019-07-31 00:00:00+00:00      FR  Population                 2
  2020-01-31 00:00:00+00:00      FR  Population                 3

Some cleanup with droplevel, reset_index, and set_index restores the initial shape:

freq = '6M'
df = (
    df.reset_index(["country","category"])
        .groupby(["country","category"],as_index=False)
        .resample(freq)
        .ffill()
        .droplevel(0)  # Remove added numerical index
        .reset_index() 
        .set_index(['country','category','date'])  # Restore MultiIndex
)

df:

                                              Population (cap)
country category   date                                       
DE      Population 2018-01-31 00:00:00+00:00                 4
                   2018-07-31 00:00:00+00:00                 4
                   2019-01-31 00:00:00+00:00                 5
                   2019-07-31 00:00:00+00:00                 5
                   2020-01-31 00:00:00+00:00                 6
FR      Population 2018-01-31 00:00:00+00:00                 1
                   2018-07-31 00:00:00+00:00                 1
                   2019-01-31 00:00:00+00:00                 2
                   2019-07-31 00:00:00+00:00                 2
                   2020-01-31 00:00:00+00:00                 3
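
As an alternative sketch of the same idea (resample each country/category group on a date-only index), the groups can also be handled one at a time and glued back together with `pd.concat`. The helper name `upsample_per_group` is hypothetical, and the `'6MS'` (month start) frequency is an assumption chosen so that the new rows stay aligned with the original January 1 dates; the answer above uses `'6M'` instead.

```python
import pandas as pd

# Sample data shaped like the answer's frame (values 1..6 are placeholders).
dates = pd.to_datetime(["2018-01-01", "2019-01-01", "2020-01-01"], utc=True)
index = pd.MultiIndex.from_product(
    [["FR", "DE"], ["Population"], dates],
    names=["country", "category", "date"],
)
df = pd.DataFrame({"Population (cap)": [1, 2, 3, 4, 5, 6]}, index=index)


def upsample_per_group(frame, freq, keys=("country", "category")):
    """Hypothetical helper: forward-fill upsample each group separately."""
    keys = list(keys)
    pieces = {}
    for key, group in frame.groupby(level=keys):
        # Drop the grouping levels so that only the date level is left,
        # which gives resample a plain, unique DatetimeIndex to work on.
        by_date = group.droplevel(keys)
        pieces[key] = by_date.resample(freq).ffill()
    # The dict keys (country, category) become the outer index levels again.
    return pd.concat(pieces, names=keys)


upsampled = upsample_per_group(df, "6MS")
print(upsampled)
```

Because each group is resampled on its own, groups that cover different date ranges keep their own ranges, and the result already carries the `country`, `category`, `date` MultiIndex without further cleanup.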

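DataFrame and imports used:

import pandas as pd

df = pd.DataFrame({
    'Population (cap)': {
        ('FR','Population',pd.Timestamp('2018-01-01 00:00:00+0000',tz='UTC')): 1,
        ('FR','Population',pd.Timestamp('2019-01-01 00:00:00+0000',tz='UTC')): 2,
        ('FR','Population',pd.Timestamp('2020-01-01 00:00:00+0000',tz='UTC')): 3,
        ('DE','Population',pd.Timestamp('2018-01-01 00:00:00+0000',tz='UTC')): 4,
        ('DE','Population',pd.Timestamp('2019-01-01 00:00:00+0000',tz='UTC')): 5,
        ('DE','Population',pd.Timestamp('2020-01-01 00:00:00+0000',tz='UTC')): 6}}
).rename_axis(['country','category','date'])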

df:

                                              Population (cap)
country category   date                                       
DE      Population 2018-01-01 00:00:00+00:00                 4
                   2019-01-01 00:00:00+00:00                 5
                   2020-01-01 00:00:00+00:00                 6
FR      Population 2018-01-01 00:00:00+00:00                 1
                   2019-01-01 00:00:00+00:00                 2
                   2020-01-01 00:00:00+00:00                 3

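I think you could do this with pd.Grouper, but it involves a lot of confusing tricks to keep track of the other parts of the MultiIndex.

I think the following also works, although the stack/unstack round trip means it is not necessarily the most efficient option. Reordering the levels and sorting the index are not strictly necessary, but I assume they bring the final result closer to what you are looking for.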

import pandas as pd

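df = pd.DataFrame({
    'Population (cap)': {
        ('FR','Population',pd.Timestamp('2018-01-01 00:00:00+0000',tz='UTC')): 1,
        ('FR','Population',pd.Timestamp('2019-01-01 00:00:00+0000',tz='UTC')): 2,
        ('FR','Population',pd.Timestamp('2020-01-01 00:00:00+0000',tz='UTC')): 3,
        ('DE','Population',pd.Timestamp('2018-01-01 00:00:00+0000',tz='UTC')): 4,
        ('DE','Population',pd.Timestamp('2019-01-01 00:00:00+0000',tz='UTC')): 5,
        ('DE','Population',pd.Timestamp('2020-01-01 00:00:00+0000',tz='UTC')): 6}}
).rename_axis(['country','category','date'])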

(
    df
    .unstack(['country','category'])
    .resample('3M').ffill()
    .stack(['country','category'])
    .reorder_levels(['country','category','date'])
    .sort_index()
)
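
The unstack/resample/stack chain can be wrapped in a small function to make each step explicit. This is a minimal sketch: the helper name `upsample_wide`, the sample values 1..6, and the `'6MS'` (month start) frequency are assumptions made for illustration rather than part of the answer, which resamples at `'3M'`.

```python
import pandas as pd

# Sample data with the same index layout as above (values are placeholders).
dates = pd.to_datetime(["2018-01-01", "2019-01-01", "2020-01-01"], utc=True)
index = pd.MultiIndex.from_product(
    [["FR", "DE"], ["Population"], dates],
    names=["country", "category", "date"],
)
df = pd.DataFrame({"Population (cap)": [1, 2, 3, 4, 5, 6]}, index=index)


def upsample_wide(frame, freq, keys=("country", "category"), time_level="date"):
    """Hypothetical helper: upsample via a wide (unstacked) intermediate frame."""
    keys = list(keys)
    # Move the grouping levels into the columns; only the date stays in the index.
    wide = frame.unstack(keys)
    # Every (country, category) column is resampled on the same date range.
    wide = wide.resample(freq).ffill()
    # Move the grouping levels back into the index and restore the level order.
    long = wide.stack(keys)
    return long.reorder_levels(keys + [time_level]).sort_index()


result = upsample_wide(df, "6MS")
print(result)
```

One design consequence of this route is that all groups share a single resampled date range, which can matter when the groups do not cover the same period.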
