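This article shows how to fetch XML data with Python's requests library and parse it with the xml.etree.ElementTree module. XML (Extensible Markup Language) is used to store structured data. Common applications of XML include sitemaps and RSS feeds.
Here is an example XML file: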
    <breakfast_menu>
      <food>
        <name>belgian waffles</name>
        <price>$5.95</price>
        <description>two of our famous belgian waffles with plenty of real maple syrup</description>
        <calories>650</calories>
      </food>
      <food>
        <name>strawberry belgian waffles</name>
        <price>$7.95</price>
        <description>light belgian waffles covered with strawberries and whipped cream</description>
        <calories>900</calories>
      </food>
      <food>
        <name>berry-berry belgian waffles</name>
        <price>$8.95</price>
        <description>light belgian waffles covered with an assortment of fresh berries and whipped cream</description>
        <calories>900</calories>
      </food>
      <food>
        <name>french toast</name>
        <price>$4.50</price>
        <description>thick slices made from our homemade sourdough bread</description>
        <calories>600</calories>
      </food>
      <food>
        <name>homestyle breakfast</name>
        <price>$6.95</price>
        <description>two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
        <calories>950</calories>
      </food>
    </breakfast_menu>
The example has a breakfast_menu root element that contains several food elements. Each food element in turn has four child elements: name, price, description, and calories.
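To make that structure concrete, here is a minimal sketch that parses a trimmed copy of the menu from an inline string and inspects the root and its children. The variable names (SAMPLE_XML, root_tag, child_tags, first_food_fields) are chosen for illustration only, and the sample is cut down to two food entries so that it runs without network access.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample menu (two entries), embedded so the sketch runs offline.
SAMPLE_XML = """<breakfast_menu>
  <food>
    <name>belgian waffles</name>
    <price>$5.95</price>
    <description>two of our famous belgian waffles with plenty of real maple syrup</description>
    <calories>650</calories>
  </food>
  <food>
    <name>french toast</name>
    <price>$4.50</price>
    <description>thick slices made from our homemade sourdough bread</description>
    <calories>600</calories>
  </food>
</breakfast_menu>"""

root = ET.fromstring(SAMPLE_XML)

# The root element is the outermost tag of the document.
root_tag = root.tag

# Iterating over an element yields its direct children only.
child_tags = [child.tag for child in root]

# Indexing an element returns a child; here, the fields of the first food entry.
first_food_fields = [field.tag for field in root[0]]

print(root_tag)
print(child_tags)
print(first_food_fields)
```

Iterating over an element directly visits only its immediate children, which is why the nested fields have to be read from each food element separately.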
Next, we will look at how to parse this kind of XML data with Python. First, set up the development environment:
Install the required packages:
    sudo apt install python3 python3-venv -y   # Debian/Ubuntu
    python3 -m venv env                        # create a virtual environment
    source env/bin/activate                    # activate the virtual environment
    pip3 install requests
Create a file named main.py and enter the following code:
Step 1: Get all tag names
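    import requests
    import xml.etree.ElementTree as ET

    # Download the sample XML document
    response = requests.get('https://www.w3schools.com/xml/simple.xml')
    response.raise_for_status()  # stop early on an HTTP error

    # Parse the raw bytes into an element tree and get the root element
    root = ET.fromstring(response.content)

    # iter('*') walks the root and all of its descendants in document order
    for item in root.iter('*'):
        print(item.tag)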
This prints the tag name of every element in the document, starting with the root.
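Because the same tag names repeat for every food entry, a summary can be easier to read than the raw list. The following is a sketch, not part of the original walkthrough: it counts how often each tag occurs using collections.Counter. The helper name count_tags and the inline SAMPLE_XML (a two-entry copy of the menu) are assumptions made so that the example runs offline.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two-entry copy of the sample menu, embedded so no network request is needed.
SAMPLE_XML = """<breakfast_menu>
  <food>
    <name>belgian waffles</name>
    <price>$5.95</price>
    <description>two of our famous belgian waffles with plenty of real maple syrup</description>
    <calories>650</calories>
  </food>
  <food>
    <name>french toast</name>
    <price>$4.50</price>
    <description>thick slices made from our homemade sourdough bread</description>
    <calories>600</calories>
  </food>
</breakfast_menu>"""


def count_tags(xml_text):
    """Return a Counter mapping each tag name to its number of occurrences."""
    root = ET.fromstring(xml_text)
    # iter() with no argument visits the root and every descendant.
    return Counter(element.tag for element in root.iter())


tag_counts = count_tags(SAMPLE_XML)
for tag, count in tag_counts.items():
    print(f"{tag}: {count}")
```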
Step 2: Extract the values of specific elements
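    import requests
    import xml.etree.ElementTree as ET

    # Download and parse the sample XML document
    response = requests.get('https://www.w3schools.com/xml/simple.xml')
    response.raise_for_status()  # stop early on an HTTP error
    root = ET.fromstring(response.content)

    # iterfind('food') yields each <food> element directly under the root
    for item in root.iterfind('food'):
        # findtext() returns the text of the first matching child element
        print(item.findtext('name'))
        print(item.findtext('price'))
        print(item.findtext('description'))
        print(item.findtext('calories'))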
This prints the name, price, description, and calorie count of each food item, one value per line.
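Printed values are hard to reuse, so it can help to collect them into Python data structures first. The sketch below is one possible approach rather than a step from the original article: it turns each food element into a dictionary and converts the price and calories to numbers. The helper name parse_menu is hypothetical, and the code assumes every price starts with "$" and every food entry has all four fields.

```python
import xml.etree.ElementTree as ET

# Two-entry copy of the sample menu, embedded so no network request is needed.
SAMPLE_XML = """<breakfast_menu>
  <food>
    <name>belgian waffles</name>
    <price>$5.95</price>
    <description>two of our famous belgian waffles with plenty of real maple syrup</description>
    <calories>650</calories>
  </food>
  <food>
    <name>french toast</name>
    <price>$4.50</price>
    <description>thick slices made from our homemade sourdough bread</description>
    <calories>600</calories>
  </food>
</breakfast_menu>"""


def parse_menu(xml_text):
    """Return a list of dicts, one per <food> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    menu = []
    for item in root.iterfind('food'):
        menu.append({
            'name': item.findtext('name'),
            # Assumption: prices look like "$5.95", so strip the leading "$".
            'price': float(item.findtext('price').lstrip('$')),
            'description': item.findtext('description'),
            # Assumption: calories is always a whole number.
            'calories': int(item.findtext('calories')),
        })
    return menu


menu = parse_menu(SAMPLE_XML)
cheapest = min(menu, key=lambda food: food['price'])
print(cheapest['name'])
```

With numeric fields, sorting or filtering the menu (for example, finding the cheapest item as above) needs no further string handling.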
Step 3: Format the output
To make the results easier to read, we can format the output:
    import requests
    import xml.etree.ElementTree as ET

    # Download and parse the sample XML document
    response = requests.get('https://www.w3schools.com/xml/simple.xml')
    response.raise_for_status()  # stop early on an HTTP error
    root = ET.fromstring(response.content)

    # Print one labelled line per <food> element
    for item in root.iterfind('food'):
        print('Name: {}, Price: {}, Description: {}, Calories: {}'.format(
            item.findtext('name'),
            item.findtext('price'),
            item.findtext('description'),
            item.findtext('calories')))
This prints each food item on a single labelled line, which is easier to read.
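As a variation on the formatting step, the fields can also be aligned in fixed-width columns using format specifiers. This is only a sketch: the helper names format_row and format_menu and the column widths (30, 8, and 6 characters) are arbitrary choices for illustration, and the inline SAMPLE_XML stands in for the downloaded file.

```python
import xml.etree.ElementTree as ET

# Two-entry copy of the sample menu, embedded so no network request is needed.
SAMPLE_XML = """<breakfast_menu>
  <food>
    <name>belgian waffles</name>
    <price>$5.95</price>
    <description>two of our famous belgian waffles with plenty of real maple syrup</description>
    <calories>650</calories>
  </food>
  <food>
    <name>french toast</name>
    <price>$4.50</price>
    <description>thick slices made from our homemade sourdough bread</description>
    <calories>600</calories>
  </food>
</breakfast_menu>"""


def format_row(name, price, calories):
    """Left-align the name and right-align price and calories (widths are arbitrary)."""
    return f"{name:<30}{price:>8}{calories:>6}"


def format_menu(xml_text):
    """Return one aligned text row per <food> element."""
    root = ET.fromstring(xml_text)
    return [
        format_row(
            item.findtext('name'),
            item.findtext('price'),
            item.findtext('calories'),
        )
        for item in root.iterfind('food')
    ]


rows = format_menu(SAMPLE_XML)
print(format_row('Name', 'Price', 'Cal'))
for row in rows:
    print(row)
```

The description is left out here because its length varies too much for a fixed-width column.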
The sample XML file comes from w3schools.
I hope you found this article helpful!