python – Pandas: continuously write to csv from a function

I have set up a function with Pandas that runs through a large number of rows in input.csv and enters the results into a Series. It then writes the Series to output.csv.

However, if the process is interrupted (for example, by an unexpected event), the program terminates and all the data that would have gone into the csv is lost.

Is there a way to write the data to the csv continuously, regardless of whether the function finishes for all rows?

Ideally, each time the program starts it would create a blank output.csv, and the function would append to it while it runs.

import pandas as pd

df = pd.read_csv("read.csv")

def crawl(a):
    #Create x, y
    return pd.Series([x, y])

df[["Column X", "Column Y"]] = df["Column A"].apply(crawl)
df.to_csv("write.csv", index=False)
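
In other words, the goal is an append-as-you-go loop. Here is a minimal sketch of that pattern, assuming a stand-in crawl that returns one (x, y) pair per row (the stub body is illustrative; the column names are from the question):

import pandas as pd

def crawl(a):
    # stand-in for the question's crawl(); the real function would
    # compute x and y from a
    x, y = len(str(a)), str(a).upper()
    return pd.Series([x, y])

df = pd.read_csv("read.csv")

# start each run with a fresh output file that holds only the header
pd.DataFrame(columns=["Column A", "Column X", "Column Y"]).to_csv(
    "write.csv", index=False)

for value in df["Column A"]:
    x, y = crawl(value)
    # append the finished row immediately; with mode="a", rows written
    # so far survive if the process is interrupted later
    pd.DataFrame(
        [[value, x, y]],
        columns=["Column A", "Column X", "Column Y"],
    ).to_csv("write.csv", mode="a", header=False, index=False)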
Here is one possible solution. It appends the data to a new file as it reads the csv in chunks, so if the process is interrupted, the new file contains all the information processed up to the interruption.

import pandas as pd

# csv file to be read in
in_csv = '/path/to/read/file.csv'

# csv to write data to
out_csv = 'path/to/write/file.csv'

# get the number of lines of the csv file to be read
with open(in_csv) as f:
    number_lines = sum(1 for row in f)

# size of chunks of data to write to the csv
chunksize = 10

# start looping through the data, writing it to a new file for each chunk
for i in range(1, number_lines, chunksize):
    df = pd.read_csv(in_csv,
                     header=None,
                     nrows=chunksize,  # number of rows to read in each loop
                     skiprows=i)       # skip rows that have already been read

    df.to_csv(out_csv,
              index=False,
              header=False,
              mode='a',               # append data to the csv file
              chunksize=chunksize)    # size of data to append in each loop
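
For reference, pandas can also do the chunking itself: passing chunksize to read_csv returns an iterator of DataFrames, which avoids the manual skiprows/nrows bookkeeping. A sketch under the same placeholder paths, with the output file truncated first so each run starts blank, as the question asks:

import pandas as pd

in_csv = '/path/to/read/file.csv'
out_csv = 'path/to/write/file.csv'

# truncate the output file so each run starts from a blank csv
open(out_csv, 'w').close()

# read_csv with chunksize yields DataFrames of at most 10 rows each
for chunk in pd.read_csv(in_csv, header=None, chunksize=10):
    # append each chunk as soon as it is read; an interruption loses
    # at most the chunk currently in flight
    chunk.to_csv(out_csv, mode='a', header=False, index=False)

Note that with header=None any header line in in_csv is treated as data, so skip or drop it explicitly if the source file has one.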
