Answer:
In Python, reading a large JSON file in one go can sometimes fail with an out-of-memory error. In that case you can split it into N smaller JSON files; small files are also handy for testing your code, since small data sets run quickly.
The code below is for reference.
# Count the total number of lines in the file
# (the 'U' open mode was removed in Python 3.11, so plain 'r' is used here)
with open("data.json", "r", encoding="utf-8") as f:
    count = sum(1 for _ in f)
print("Total number of lines:", count)
split = 5  # split into five files
nums = [count * i // split for i in range(1, split + 1)]  # cut-off line numbers
print(nums)
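As a quick sanity check on the boundary computation above, here is a minimal sketch with a hypothetical line count of 103; note that the last boundary always equals the total count, so no trailing lines are lost:

```python
count = 103  # hypothetical total line count
split = 5
# integer division places the cut-off line numbers
nums = [count * i // split for i in range(1, split + 1)]
print(nums)  # → [20, 41, 61, 82, 103]
```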
# Split the file
import json
import os

os.makedirs('./data_tmp', exist_ok=True)  # make sure the output directory exists

current_lines = 0
data_list = []
# Open the large file and write out the small files
with open('data.json', 'r', encoding='utf-8') as file:
    for line in file:
        # strip the trailing comma so each line parses as a standalone JSON object
        line = line.replace('},', '}')
        data_list.append(json.loads(line))
        current_lines += 1
        if current_lines in nums:
            print(current_lines)
            # save this chunk to its own file
            file_name = './data_tmp/data_' + str(current_lines) + '.json'
            with open(file_name, 'w', encoding='utf-8') as f:
                f.write(json.dumps(data_list))
            data_list = []
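The whole approach can be exercised end to end with a self-contained sketch; the paths and the ten-record sample file below are placeholders invented for illustration, not part of the original data:

```python
import json
import os
import tempfile

# Build a sample line-oriented JSON file: one object per line,
# each line ending in '},' like the format the answer assumes.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, 'data.json')
total = 10
with open(src, 'w', encoding='utf-8') as f:
    for i in range(total):
        f.write('{"id": %d},\n' % i)

# Count the lines and compute the split boundaries.
with open(src, encoding='utf-8') as f:
    count = sum(1 for _ in f)
split = 3
nums = [count * i // split for i in range(1, split + 1)]

# Split into chunk files, as in the answer.
out_dir = os.path.join(tmp_dir, 'data_tmp')
os.makedirs(out_dir, exist_ok=True)
data_list = []
current = 0
with open(src, encoding='utf-8') as f:
    for line in f:
        data_list.append(json.loads(line.replace('},', '}')))
        current += 1
        if current in nums:
            name = os.path.join(out_dir, 'data_%d.json' % current)
            with open(name, 'w', encoding='utf-8') as out:
                json.dump(data_list, out)
            data_list = []

# Every record should land in exactly one chunk file.
written = 0
for name in os.listdir(out_dir):
    with open(os.path.join(out_dir, name), encoding='utf-8') as f:
        written += len(json.load(f))
print(written)  # → 10
```

With total = 10 and split = 3, the boundaries are [3, 6, 10], so the three chunk files hold 3, 3, and 4 records respectively.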