mhtml
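MHTML is a format used both for email and for archived web pages. MHTML, sometimes called MHT, stands for MIME HTML, and it archives an entire web page in a single file. When a web page is saved in MHTML format, the resulting file contains the HTML code along with images, audio files, Flash animations, and other resources.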
from langchain_community.document_loaders import MHTMLLoader
# Create a new loader object for the MHTML file
loader = MHTMLLoader(
    file_path="../../../../../../tests/integration_tests/examples/example.mht"
)
# Load the document from the file
documents = loader.load()
# Print the documents to see the results
for doc in documents:
    print(doc)
page_content='LangChain\nLANG CHAIN 🦜️🔗Official Home Page\xa0\n\n\n\n\n\n\n\nIntegrations\n\n\n\nFeatures\n\n\n\n\nBlog\n\n\n\nConceptual Guide\n\n\n\n\nPython Repo\n\n\nJavaScript Repo\n\n\n\nPython Documentation \n\n\nJavaScript Documentation\n\n\n\n\nPython ChatLangChain \n\n\nJavaScript ChatLangChain\n\n\n\n\nDiscord \n\n\nTwitter\n\n\n\n\nIf you have any comments about our WEB page, you can \nwrite us at the address shown above. However, due to \nthe limited number of personnel in our corporate office, we are unable to \nprovide a direct response.\n\nCopyright © 2023-2023 LangChain Inc.\n\n\n' metadata={'source': '../../../../../../tests/integration_tests/examples/example.mht', 'title': 'LangChain'}
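Because an MHTML file is a MIME multipart message, its parts can also be inspected with Python's standard `email` module. The following is a minimal sketch for illustration only, not a description of how `MHTMLLoader` works internally. The sample document, the `SAMPLE_MHTML` constant, and the `list_mhtml_parts` helper are all hypothetical names introduced here.

```python
import email

# A tiny hand-written MHTML document (hypothetical content, for illustration).
# It bundles one HTML part and one image part into a single MIME message.
SAMPLE_MHTML = "\n".join(
    [
        "Subject: Example page",
        "MIME-Version: 1.0",
        'Content-Type: multipart/related; type="text/html"; boundary="----demo-boundary"',
        "",
        "------demo-boundary",
        'Content-Type: text/html; charset="utf-8"',
        "Content-Transfer-Encoding: quoted-printable",
        "Content-Location: https://example.com/",
        "",
        "<html><body><p>Hello =3D MHTML</p></body></html>",
        "------demo-boundary",
        "Content-Type: image/png",
        "Content-Transfer-Encoding: base64",
        "Content-Location: https://example.com/logo.png",
        "",
        "iVBORw0KGgo=",
        "------demo-boundary--",
        "",
    ]
)


def list_mhtml_parts(raw):
    """Return one dict per leaf MIME part found in an MHTML string."""
    message = email.message_from_string(raw)
    parts = []
    for part in message.walk():
        # Skip the multipart container itself; keep only leaf parts.
        if part.is_multipart():
            continue
        parts.append(
            {
                "content_type": part.get_content_type(),
                "location": part.get("Content-Location"),
                # decode=True undoes quoted-printable / base64 and returns bytes.
                "body": part.get_payload(decode=True),
            }
        )
    return parts


for item in list_mhtml_parts(SAMPLE_MHTML):
    print(item["content_type"], item["location"], len(item["body"]), "bytes")
```

Each archived resource appears as its own part, with a `Content-Location` header recording the original URL. The HTML part is the one from which page text would typically be extracted.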