MongoDB Test4: Experimental Test - Writing Data

Table design

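Using Python, create a MongoDB database named seismic_keyspace, and in it create a table project_chunks_table with the fields project, sensor, trace, and chunk. project is a text field, sensor and trace are integers, and chunk is stored using GridFS.
Specifically: store a 1.96 GB SEG-Y file in project_chunks_table. The chunk field holds the file, project is always proj_001, sensor is 1, and trace increases from 1 with the number of chunks. Please provide a concrete implementation.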

Writing data

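Split a 1.96 GB SEG-Y file into suitably sized "chunks", each no larger than 255 KB. In project_chunks_table, the chunk field stores each "chunk", project is always proj_001, sensor is 1, and trace increases from 1 with the number of chunks. Please provide a concrete implementation.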
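
The splitting rule described above can be sketched without touching MongoDB. This is a minimal illustration rather than the implementation used in these notes: `iter_chunks` and `chunk_count` are hypothetical helper names, the sample data is made up, and the size estimate assumes that "1.96 GB" means 1.96 × 2^30 bytes.

```python
import io

CHUNK_SIZE = 255 * 1024  # 255 KB upper bound per chunk, as stated above


def iter_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield (trace, data) pairs; trace starts at 1 and grows by one per chunk."""
    trace = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        trace += 1
        yield trace, data


def chunk_count(total_bytes, chunk_size=CHUNK_SIZE):
    """Number of chunks needed for total_bytes (ceiling division)."""
    return -(-total_bytes // chunk_size)


# Small in-memory stand-in for the SEG-Y file (hypothetical data)
sample = io.BytesIO(b"\x00" * (CHUNK_SIZE * 2 + 10))
pieces = list(iter_chunks(sample))
print([(trace, len(data)) for trace, data in pieces])

# Assumption: 1.96 GB is read as 1.96 * 2**30 bytes
print(chunk_count(int(1.96 * 1024**3)))
```

Under that assumption the file splits into roughly 8,060 pieces, so trace would run from 1 to about 8,060. Only the last piece may be shorter than 255 KB.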

Two tables (splitting the data manually)

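Split the file into 255 KB pieces and store them in GridFS.
Using Python, create a MongoDB database named seismic_space, and in it create an ordinary table project_table and a GridFS table project_chunks_table.
project_table has the fields _id, project, sensor, trace, and chunk. _id is generated by MongoDB by default, project is a string, sensor and trace are integers, and chunk records the _id of the file stored in GridFS.
project_chunks_table stores the files. Its _id is generated by MongoDB by default, and filename is the combination of project, sensor, and trace.
Specifically: split a 1.96 GB SEG-Y file into 255 KB pieces and register them in project_table. project is always proj_001, sensor is 1, trace increases from 1 with the number of chunks, and the chunk field refers to the stored piece. Please provide a concrete implementation.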
import os

import gridfs
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient("192.168.92.159", 20000)
db = client["seismic_space"]

# Create a collection for project_table
project_table = db["project_table"]

# Create a GridFS collection for project_chunks_table
fs = gridfs.GridFS(db, collection="project_chunks_table")

# Define the file path and chunk size
file_path = r".\data\LX_SEGY005.segy"  # raw string keeps the backslashes literal
chunk_size = 255 * 1024  # 255 KB
file_extension = file_path.split(".")[-1]
file_name = os.path.basename(file_path)


# Insert data into project_table and store chunks in project_chunks_table
def store_segy_data(file_path, project, sensor):
    with open(file_path, "rb") as segy_file:
        chunk_num = 0
        while True:
            chunk = segy_file.read(chunk_size)
            if not chunk:
                break
            trace = chunk_num + 1  # trace numbering starts at 1
            chunk_id = fs.put(
                chunk,
                filename=f"{project}_{sensor}_{trace}.segy",
                content_type=file_extension,
                metadata=file_name,
            )
            project_table.insert_one({
                "project": project,
                "sensor": sensor,
                "trace": trace,
                "chunk": chunk_id,
            })
            chunk_num += 1


# Insert data into project_table and store chunks in project_chunks_table
project_name = "proj_001"
sensor_id = 1
store_segy_data(file_path, project_name, sensor_id)
print("Data insertion complete.")
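
Because each piece is stored as its own GridFS file, reading the data back means walking project_table in trace order and concatenating the pieces. The sketch below shows one way this might look. `reassemble` is a hypothetical helper that is not part of the original script, and it assumes it is given a pymongo collection and a gridfs.GridFS object like the ones created above.

```python
def reassemble(project_table, fs, project, sensor, out_stream):
    """Write all pieces of one (project, sensor) pair to out_stream in trace order.

    project_table is expected to behave like a pymongo Collection and fs like
    a gridfs.GridFS instance; both are passed in rather than created here.
    Returns the number of pieces written.
    """
    cursor = project_table.find({"project": project, "sensor": sensor}).sort("trace", 1)
    written = 0
    for doc in cursor:
        grid_out = fs.get(doc["chunk"])  # look up the GridFS file by its _id
        out_stream.write(grid_out.read())
        written += 1
    return written
```

With the objects from the script, a call might look like `reassemble(project_table, fs, "proj_001", 1, f)`, where `f` is a file opened in "wb" mode. An index on project, sensor, and trace would likely help once the table holds thousands of rows, though that has not been measured here.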
 
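Using Python, create a MongoDB database named seismic_space, and in it create an ordinary table project_table and a GridFS table project_chunk_table.
project_table has the fields _id, project, sensor, trace, and chunk. _id is generated by MongoDB by default, project is a string, sensor and trace are integers, and chunk records the _id of the file stored in GridFS.
project_chunk_table stores the files. Its _id is generated by MongoDB by default, filename is the combination of project, sensor, and trace, and content type records the type of the data.
Store a 1 GB SEG-Y file and register it in project_table: project is proj_001, sensor is 1, trace is 2000, and the chunk field refers to the stored file. Please provide a concrete implementation.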

Two tables (GridFS splits the large file internally)

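Here the file is stored as a single record, the two tables are linked through the chunk field, and GridFS does the file splitting.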

Simulating 5 MB of data

from gridfs import GridFS
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient("192.168.92.159", 20000)

# Select the database (created on first write)
db = client['seismic_space']

# project_table collection
project_table = db['project_table']

# GridFS object backed by the project_chunk_table collections
fs = GridFS(db, collection='project_chunk_table')

# Simulate 5 MB of data
dummy_data = b'0' * (5 * 1024 * 1024)

# Store the data in GridFS
project = 'proj_001'
sensor = 1
trace = 2000
filename = f'{project}_{sensor}_{trace}.segy'
content_type = 'application/octet-stream'
file_id = fs.put(dummy_data, filename=filename, content_type=content_type)

# Build the project_table document; chunk points at the GridFS file
project_document = {
    'project': project,
    'sensor': sensor,
    'trace': trace,
    'chunk': file_id
}

# Insert the document into project_table
project_table.insert_one(project_document)
print("Data stored successfully.")
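
To check what the internal splitting produced, the stored file can be looked up through the _id kept in the chunk field. The sketch below is only an illustration: `describe_stored_file` is a hypothetical helper, and it assumes `fs` is a gridfs.GridFS object whose `get()` result exposes `filename`, `length`, and `chunk_size`.

```python
def describe_stored_file(fs, file_id):
    """Summarize one GridFS file: its name, size, chunk size, and chunk count."""
    grid_out = fs.get(file_id)
    # Ceiling division: a trailing partial chunk still occupies one document
    n_chunks = -(-grid_out.length // grid_out.chunk_size)
    return {
        "filename": grid_out.filename,
        "length": grid_out.length,
        "chunk_size": grid_out.chunk_size,
        "chunks": n_chunks,
    }
```

For the 5 MB dummy payload and the default 255 KB chunk size, the count works out to 21 chunk documents. These live in project_chunk_table.chunks, while project_chunk_table.files holds the single metadata document.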

Real data

  • By default, GridFS uses a chunk_size of 255 KB
  • The chunk size can also be customized
    • chunk_size_bytes = 1024 * 1024 # set the chunk size to 1 MB
    • chunk_size=chunk_size_bytes
from gridfs import GridFS
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient("192.168.92.159", 20000)

# Select the database (created on first write)
db = client['seismic_space']

# project_table collection
project_table = db['project_table']

# GridFS object backed by the project_chunk_table collections
fs = GridFS(db, collection='project_chunk_table')

# File path and metadata
file_path = r'.\data\data718.segy'  # replace with the actual file path
project = 'proj_001'
sensor = 1
trace = 2000
filename = f'{project}_{sensor}_{trace}.segy'
content_type = 'application/octet-stream'

# Open the file in binary mode; GridFS splits it into chunks while writing
# chunk_size_bytes = 1024 * 1024  # set the chunk size to 1 MB
with open(file_path, 'rb') as file:
    file_id = fs.put(file, filename=filename, content_type=content_type)

# Build the project_table document; chunk points at the GridFS file
project_document = {
    'project': project,
    'sensor': sensor,
    'trace': trace,
    'chunk': file_id
}

# Insert the document into project_table
project_table.insert_one(project_document)
print("Data stored successfully.")
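
The custom chunk size mentioned in the bullets can be passed per file when calling put. The helper below is a sketch under assumptions: `put_with_chunk_size` is a hypothetical wrapper, and it relies on `GridFS.put` forwarding a `chunk_size` keyword to the underlying file writer, which is worth confirming against the installed PyMongo version.

```python
def put_with_chunk_size(fs, file_obj, filename, chunk_size_bytes=None,
                        content_type="application/octet-stream"):
    """Store file_obj in GridFS, optionally overriding the chunk size.

    fs is expected to be a gridfs.GridFS instance and file_obj a binary
    file object or bytes. Returns the _id of the stored file.
    """
    extra = {}
    if chunk_size_bytes is not None:
        # Assumption: put() accepts chunk_size (in bytes) as a keyword argument
        extra["chunk_size"] = chunk_size_bytes
    return fs.put(file_obj, filename=filename, content_type=content_type, **extra)
```

In the script above this would replace the direct `fs.put(...)` call, for example `put_with_chunk_size(fs, file, filename, 1024 * 1024)` for 1 MB chunks. Leaving the last argument out keeps the 255 KB default.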