Why can't a Scrapy pipeline write to its persistent storage file? (小浪学习网)

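Scrapy pipelines: troubleshooting a failed write to a persistent storage file

This article analyzes a Scrapy project in which the item pipeline fails to persist data to a file. The symptom: the output file is empty and no data is written to it.

Code example (the problem code):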

# spider.py
import scrapy
import sys
sys.path.append(r'd:project_testpydemodemo1xunlianmyspiderqiubai')
from ..items import qiubaiitem


class biedouspider(scrapy.Spider):
    name = "biedou"
    start_urls = ["https://www.biedoul.com/wenzi/"]

    def parse(self, response):
        dl_list = response.xpath('/html/body/div[4]/div[1]/div[1]/dl')

        for dl in dl_list:
            title = dl.xpath('./span/dd/a/strong/text()')[0].extract()
            content = dl.xpath('./dd//text()').extract()
            content = ''.join(content)

            item = qiubaiitem()
            item['title'] = title
            item['content'] = content
            yield item
            break  # stop after the first entry


# items.py
import scrapy


class qiubaiitem(scrapy.Item):
    title = scrapy.Field()
    content = scrapy.Field()


# pipelines.py
class qiubaipipeline(object):
    def __init__(self):
        self.fp = None

    def open_spdier(self, spider):  # misspelled method name!
        print("开始爬虫")  # "spider started"
        self.fp = open('./biedou.txt', 'w', encoding='utf-8')

    def close_spider(self, spider):
        print("结束爬虫")  # "spider finished"
        self.fp.close()

    def process_item(self, item, spider):
        title = str(item['title'])
        content = str(item['content'])
        self.fp.write(title + ':' + content + '\n')
        return item

Error output:

...
TypeError: Object of type qiubaiitem is not JSON serializable
结束爬虫
...
AttributeError: 'NoneType' object has no attribute 'close'

Analysis:

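The message 'NoneType' object has no attribute 'close' shows that self.fp is still None when close_spider runs, so there is no file object to close. Tracing back, the cause is a misspelled method name in pipelines.py: open_spdier should be open_spider. Scrapy looks up this optional hook by its exact name, so the misspelled method is silently ignored, the file is never opened, and self.fp stays None for the whole crawl.

The TypeError appears to be a secondary symptom rather than a separate bug in process_item. The pipeline shown never calls json, and process_item already writes plain strings taken from the item's fields. With self.fp set to None, the call to self.fp.write cannot succeed, so the qiubaiitem object is most likely being serialized somewhere else, for example while the failure is being reported.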
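
The silent failure can be reproduced without Scrapy. The sketch below is a simplified stand-in for a framework that looks up optional hooks by name; it is not Scrapy's actual implementation, and call_hook, MisspelledPipeline, and CorrectPipeline are hypothetical names used only for illustration.

```python
class MisspelledPipeline:
    def __init__(self):
        self.fp = None

    def open_spdier(self, spider):  # typo: nothing ever looks up this name
        self.fp = "opened"


class CorrectPipeline:
    def __init__(self):
        self.fp = None

    def open_spider(self, spider):
        self.fp = "opened"


def call_hook(component, hook_name, *args):
    """Call an optional hook if the component defines it; otherwise do nothing."""
    hook = getattr(component, hook_name, None)
    if callable(hook):
        hook(*args)
        return True
    return False


bad = MisspelledPipeline()
good = CorrectPipeline()
bad_called = call_hook(bad, "open_spider", None)
good_called = call_hook(good, "open_spider", None)
print(bad_called, bad.fp)    # False None
print(good_called, good.fp)  # True opened
```

No exception is raised for the misspelled class. The hook is simply skipped, which is why the problem only surfaces later, when other methods use self.fp.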

Solution:

Correct the spelling of open_spdier in pipelines.py, and tidy up process_item so that it writes the item's fields directly:

# pipelines.py (corrected code)
class QiubaiPipeline(object):
    def __init__(self):
        self.fp = None

    def open_spider(self, spider):  # spelling corrected
        print("开始爬虫")  # "spider started"
        self.fp = open('./biedou.txt', 'w', encoding='utf-8')

    def close_spider(self, spider):
        print("结束爬虫")  # "spider finished"
        self.fp.close()

    def process_item(self, item, spider):
        title = item['title']
        content = item['content']
        self.fp.write(f"{title}:{content}\n")  # an f-string is more concise
        return item

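With the method name corrected, Scrapy opens the file when the spider starts and process_item writes each item to it, which resolves the failed write. The f-string in process_item is optional, but it keeps the formatting concise and readable. For more robust behavior, consider wrapping the file operations in try...except blocks to handle potential I/O errors.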
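
As a rough sketch of that error handling, the pipeline could guard each file operation as follows. SafeQiubaiPipeline and its path parameter are hypothetical additions for illustration, and the logging behavior is an assumption rather than part of the original project.

```python
import logging

logger = logging.getLogger(__name__)


class SafeQiubaiPipeline:
    """Hypothetical variant of QiubaiPipeline with basic I/O error handling."""

    def __init__(self, path="./biedou.txt"):
        # Assumption: the output path is a parameter so it can be changed in tests.
        self.path = path
        self.fp = None

    def open_spider(self, spider):
        try:
            self.fp = open(self.path, "w", encoding="utf-8")
        except OSError:
            # Log with traceback, then re-raise so the failure is not hidden.
            logger.exception("Could not open %s for writing", self.path)
            raise

    def close_spider(self, spider):
        # Only close the file if it was actually opened.
        if self.fp is not None:
            self.fp.close()
            self.fp = None

    def process_item(self, item, spider):
        try:
            self.fp.write(f"{item['title']}:{item['content']}\n")
        except OSError:
            # Log the failed write and keep passing the item down the chain.
            logger.exception("Could not write item to %s", self.path)
        return item
```

The None check in close_spider means a file that was never opened no longer causes an AttributeError at shutdown. Whether open_spider should re-raise or continue without a file is a design choice; re-raising is used here so a bad path is noticed immediately.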
