
How can DynamoDB get_item read 400KB of data in milliseconds?


I have a DynamoDB table named events where I store all user event details, such as product_view, add_to_cart, and product_purchase.

In this events table, some items have grown to 400KB.

Problem:

        response = self._table.get_item(
            Key={
                PARTITION_KEY: <pk>,
                SORT_KEY: <sk>,
            },
            ConsistentRead=False,
        )

When I fetch this (400KB) item with the DynamoDB get_item method, it takes about 5 seconds to return the result.

I have already tried DAX as well.
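
For context, the DAX read path looks roughly like this (a minimal sketch using the amazon-dax-client package; the cluster endpoint is a placeholder):

import amazondax

# Placeholder endpoint; substitute your real DAX cluster endpoint.
DAX_ENDPOINT = "daxs://my-cluster.xxxx.dax-clusters.us-east-1.amazonaws.com"

dax = amazondax.AmazonDaxClient.resource(endpoint_url=DAX_ENDPOINT)
table = dax.Table("events")

# Same get_item call as before; note that DAX only caches eventually
# consistent reads, so warm reads are served from the item cache.
response = table.get_item(
    Key={"partition_key": "user_id1111", "sort_key": "version_1"},
)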

Goal:

I want to read the 400KB item in under 1 second.

Important information:

The data is stored in DynamoDB in this format:

{
  "partition_key": "user_id1111",
  "sort_key": "version_1",
  "attributes": {
    "events": [
      {
        "t": "1614712316",
        "a": "product_view",
        "i": "1275"
      },
      {
        "t": "1614712316",
        "a": "product_add",
        ...
      },
      ...
    ]
  }
}
  • t is the timestamp
  • a can be product_view, product_add, or product_purchase
  • i is the product_id

As you can see above, events is a list, and new events keep getting appended to it.
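
A hypothetical sketch of what such an append can look like with DynamoDB's list_append update function (key values repeat the example above; the name aliases guard against reserved words; this is not necessarily how my code does it):

import boto3

table = boto3.resource("dynamodb").Table("events")

# Appends one event atomically; assumes the attributes.events path already exists.
table.update_item(
    Key={"partition_key": "user_id1111", "sort_key": "version_1"},
    UpdateExpression="SET #at.#ev = list_append(#at.#ev, :new_events)",
    ExpressionAttributeNames={"#at": "attributes", "#ev": "events"},
    ExpressionAttributeValues={
        ":new_events": [{"t": "1614712316", "a": "product_view", "i": "1275"}]
    },
)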

One of my items has reached 400KB because of the number of events in its events list.

I wrote a script to measure the time; the results are below.

import datetime

import boto3

dynamodb = boto3.resource('dynamodb')

table = dynamodb.Table('events')

pk = "user_id1111"
sk = "version_1"


t_load_start = datetime.datetime.now()


response = table.get_item(
    Key={
        "partition_key": pk,
        "sort_key": sk,
    },
    ReturnConsumedCapacity="TOTAL",
)
capacity_units = response["ConsumedCapacity"]["CapacityUnits"]

t_load_end = datetime.datetime.now()
seconds = (t_load_end - t_load_start).total_seconds()

print(f"Elapsed time is::{seconds}sec and {capacity_units} capacity units")

This is the output I got:

Elapsed time is::5.676799sec and 50.0 capacity units

Can someone suggest a solution for this?

Solution

tl;dr: Increase your function's memory to at least 1024MB; see Update 2.


I was curious, so I took some measurements. I wrote a script that creates a big boi item of almost exactly 400KB in a new table.

Then I tested two ways of reading it from Python - once through the resource API and once through the lower-level client - using eventually consistent reads in both cases.

Here is what I measured:

Reading Big Boi from a Table resource took 0.366508s and consumed 50.0 RCUs
Reading Big Boi from a Client took 0.301585s and consumed 50.0 RCUs

If we extrapolate from the RCUs, it read an item of about 50 * 2 * 4KB = 400KB (an eventually consistent read of up to 4KB consumes 0.5 RCUs).
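
The same arithmetic in code form (one RCU buys two eventually consistent reads of up to 4KB each):

consumed_rcus = 50.0
# 2 eventually consistent 4KB reads per RCU:
estimated_item_size_kb = consumed_rcus * 2 * 4
print(estimated_item_size_kb)  # 400.0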

I ran this a few times from my machine in Germany against eu-central-1 (Frankfurt, Germany), and the highest latency I saw was about 900ms. (This is without DAX.)

Based on that, I think you should show us how you're taking your measurements.

from datetime import datetime

import boto3

TABLE_NAME = "big-boi-test"
BIG_BOI_PK = "f0ba8d6c"

TABLE_RESOURCE = boto3.resource("dynamodb").Table(TABLE_NAME)
DDB_CLIENT = boto3.client("dynamodb")

def create_table():
    DDB_CLIENT.create_table(
        AttributeDefinitions=[{"AttributeName": "PK", "AttributeType": "S"}],
        TableName=TABLE_NAME,
        KeySchema=[{"AttributeName": "PK", "KeyType": "HASH"}],
        BillingMode="PAY_PER_REQUEST"
    )

def create_big_boi_item() -> dict:
    # Based on calculations here: https://zaccharles.github.io/dynamodb-calculator/
    template = {
        "PK": {
            "S": BIG_BOI_PK
        },
        "bigBoi": {
            "S": ""
        }
    }  # This is 16 bytes

    # Pad the item up to 400KB with a single big string attribute.
    big_boi = "X" * (1024 * 400 - 16)
    template["bigBoi"]["S"] = big_boi
    return template

def store_big_boi():
    big_boi = create_big_boi_item()

    DDB_CLIENT.put_item(
        Item=big_boi,
        TableName=TABLE_NAME
    )

def get_big_boi_with_table_resource():

    start = datetime.now()
    response = TABLE_RESOURCE.get_item(
        Key={"PK": BIG_BOI_PK},
        ReturnConsumedCapacity="TOTAL"
    )
    end = datetime.now()
    seconds = (end - start).total_seconds()
    capacity_units = response["ConsumedCapacity"]["CapacityUnits"]

    print(f"Reading Big Boi from a Table resource took {seconds}s and consumed {capacity_units} RCUs")

def get_big_boi_with_client():

    start = datetime.now()
    response = DDB_CLIENT.get_item(
        Key={"PK": {"S": BIG_BOI_PK}},
        ReturnConsumedCapacity="TOTAL",
        TableName=TABLE_NAME
    )
    end = datetime.now()
    seconds = (end - start).total_seconds()
    capacity_units = response["ConsumedCapacity"]["CapacityUnits"]

    print(f"Reading Big Boi from a Client took {seconds}s and consumed {capacity_units} RCUs")

if __name__ == "__main__":
    # create_table()
    # store_big_boi()
    get_big_boi_with_table_resource()
    get_big_boi_with_client()

Update

I ran the same measurements again with an item that looks more like the one you're using, and no matter how I requested it, my average measurements stayed below 1000ms:

Reading Big Boi from a Table resource took 1.492829s and consumed 50.0 RCUs
Reading Big Boi from a Table resource took 0.871583s and consumed 50.0 RCUs
Reading Big Boi from a Table resource took 0.857513s and consumed 50.0 RCUs
Reading Big Boi from a Table resource took 0.769432s and consumed 50.0 RCUs
Reading Big Boi from a Table resource took 0.690172s and consumed 50.0 RCUs
Reading Big Boi from a Table resource took 0.670099s and consumed 50.0 RCUs
Reading Big Boi from a Table resource took 0.633489s and consumed 50.0 RCUs
Reading Big Boi from a Table resource took 0.605999s and consumed 50.0 RCUs
Reading Big Boi from a Table resource took 0.598635s and consumed 50.0 RCUs
Reading Big Boi from a Table resource took 0.606553s and consumed 50.0 RCUs
Reading Big Boi from a Client took 1.66636s and consumed 50.0 RCUs
Reading Big Boi from a Client took 0.921605s and consumed 50.0 RCUs
Reading Big Boi from a Client took 0.831735s and consumed 50.0 RCUs
Reading Big Boi from a Client took 0.707082s and consumed 50.0 RCUs
Reading Big Boi from a Client took 0.668602s and consumed 50.0 RCUs
Reading Big Boi from a Client took 0.648401s and consumed 50.0 RCUs
Reading Big Boi from a Client took 0.5695s and consumed 50.0 RCUs
Reading Big Boi from a Client took 0.592073s and consumed 50.0 RCUs
Reading Big Boi from a Client took 0.611436s and consumed 50.0 RCUs
Reading Big Boi from a Client took 0.553827s and consumed 50.0 RCUs
Average latency over 10 requests with the table resource: 0.7796304s
Average latency over 10 requests with the client: 0.7770621s

Here is what the item looks like:

[Two screenshots of the nested item in the DynamoDB console, omitted; the structure matches what create_big_boi_item below produces.]

Here is the full test script for you to verify:

import statistics
from datetime import datetime

import boto3

TABLE_NAME = "big-boi-test"
BIG_BOI_PK = "nestedBoi"

TABLE_RESOURCE = boto3.resource("dynamodb").Table(TABLE_NAME)
DDB_CLIENT = boto3.client("dynamodb")

def create_table():
    DDB_CLIENT.create_table(
        AttributeDefinitions=[{"AttributeName": "PK", "AttributeType": "S"}],
        TableName=TABLE_NAME,
        KeySchema=[{"AttributeName": "PK", "KeyType": "HASH"}],
        BillingMode="PAY_PER_REQUEST"
    )

def create_big_boi_item() -> dict:
    # Based on calculations here: https://zaccharles.github.io/dynamodb-calculator/
    template = {
        "PK": {
            "S": "nestedBoi"
        },
        "bigBoiContainer": {
            "M": {
                "bigBoiList": {
                    "L": []
                }
            }
        }
    }  # 43 bytes

    item = {
        "M": {
            "t": {
                "S": "1614712316"
            },
            "a": {
                "S": "product_view"
            },
            "i": {
                "S": "1275"
            }
        }
    }  # 36 bytes

    # Fill the nested list with as many events as fit in 400KB.
    number_of_items = int((1024 * 400 - 43) / 36)

    for _ in range(number_of_items):
        template["bigBoiContainer"]["M"]["bigBoiList"]["L"].append(item)

    return template

def store_big_boi():
    big_boi = create_big_boi_item()

    DDB_CLIENT.put_item(
        Item=big_boi,
        TableName=TABLE_NAME
    )

def get_big_boi_with_table_resource():

    start = datetime.now()
    response = TABLE_RESOURCE.get_item(
        Key={"PK": BIG_BOI_PK},
        ReturnConsumedCapacity="TOTAL"
    )
    end = datetime.now()
    seconds = (end - start).total_seconds()
    capacity_units = response["ConsumedCapacity"]["CapacityUnits"]

    print(f"Reading Big Boi from a Table resource took {seconds}s and consumed {capacity_units} RCUs")

    return seconds

def get_big_boi_with_client():

    start = datetime.now()
    response = DDB_CLIENT.get_item(
        Key={"PK": {"S": BIG_BOI_PK}},
        ReturnConsumedCapacity="TOTAL",
        TableName=TABLE_NAME
    )
    end = datetime.now()
    seconds = (end - start).total_seconds()
    capacity_units = response["ConsumedCapacity"]["CapacityUnits"]

    print(f"Reading Big Boi from a Client took {seconds}s and consumed {capacity_units} RCUs")

    return seconds

if __name__ == "__main__":
    # create_table()
    # store_big_boi()

    n_experiments = 10
    experiments_with_table_resource = [get_big_boi_with_table_resource() for i in range(n_experiments)]
    experiments_with_client = [get_big_boi_with_client() for i in range(n_experiments)]
    print(f"Average latency over {n_experiments} requests with the table resource: {statistics.mean(experiments_with_table_resource)}s")
    print(f"Average latency over {n_experiments} requests with the client: {statistics.mean(experiments_with_client)}s")

If I increase n_experiments, the reads tend to get faster, probably because of caching inside DynamoDB.

Still: I can't reproduce your numbers.


Update 2

After learning that you're running this inside a Lambda function, I ran the tests again in Lambda with different memory configurations.

Memory  | n_experiments | Avg. time with resource | Avg. time with client
128MB   | 10            | 6.28s                   | 5.06s
256MB   | 10            | 3.26s                   | 2.61s
512MB   | 10            | 1.62s                   | 1.33s
1024MB  | 10            | 0.84s                   | 0.68s
2048MB  | 10            | 0.52s                   | 0.43s
4096MB  | 10            | 0.51s                   | 0.41s

As mentioned in the comments, CPU and network performance scale with the amount of memory you allocate to the function. You can solve your problem by throwing money at it :-)
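
For instance, bumping the memory setting programmatically might look like this (a sketch via boto3's Lambda API; the function name is hypothetical):

import boto3

lambda_client = boto3.client("lambda")

# More memory also means proportionally more CPU and network throughput.
lambda_client.update_function_configuration(
    FunctionName="my-event-reader",  # hypothetical function name
    MemorySize=1024,
)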

Another answer:

It sounds like you're running into a couple of issues. The first is that you're up against the 400KB item size limit. Although you didn't say this is a problem, it may be worth revisiting your data model so that you can store more event data; one possibility is sketched below.
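
One hypothetical remodeling shards the events across several items so no single item approaches the 400KB limit (key names follow the question; the monthly sharding scheme is purely illustrative):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("events")

# One item per user per month instead of one ever-growing item.
table.put_item(
    Item={
        "partition_key": "user_id1111",
        "sort_key": "events#2021-03",
        "attributes": {
            "events": [{"t": "1614712316", "a": "product_view", "i": "1275"}]
        },
    }
)

# A single Query then collects all of a user's event items.
response = table.query(
    KeyConditionExpression=Key("partition_key").eq("user_id1111")
    & Key("sort_key").begins_with("events#")
)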

The performance issue is unlikely to be related to your data model. A get_item operation should have an average latency in the single-digit milliseconds, especially when you specify eventually consistent reads. Something else is going on here.

How are you testing and measuring the performance of this operation?

The AWS documentation has some advice about troubleshooting high latency DynamoDB operations that may be useful.
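
One thing that guide points at is client-side behavior such as retries and connection timeouts; here is a sketch of a client configured to surface those (the numbers are illustrative, not recommendations):

import boto3
from botocore.config import Config

# Tight timeouts and fewer retries make client-side stalls show up
# as errors instead of hiding inside a slow-looking get_item call.
config = Config(
    connect_timeout=1,
    read_timeout=2,
    retries={"max_attempts": 2},
)
table = boto3.resource("dynamodb", config=config).Table("events")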
