6大核心模块(Modules)
示例(Examples)
Markdown

LangChain

Markdown文本分割器#

MarkdownTextSplitter将文本沿Markdown标题、代码块或水平线分割。它是递归字符分割器的简单子类,具有Markdown特定的分隔符。默认情况下,查看源代码以查看Markdown语法。

  • 文本如何拆分:按照Markdown特定字符列表拆分

  • 如何测量块大小:通过传递的长度函数测量(默认为字符数)

from langchain.text_splitter import MarkdownTextSplitter
 
markdown_text = """
# 🦜️🔗 LangChain
 
⚡ Building applications with LLMs through composability ⚡
 
## Quick Install
 
```bash
# Hopefully this code block isn't split
pip install langchain
```python
 
As an open source project in a rapidly developing field, we are extremely open to contributions.
"""
markdown_splitter = MarkdownTextSplitter(chunk_size=100, chunk_overlap=0)
 
docs = markdown_splitter.create_documents([markdown_text])
 
docs
 
[Document(page_content='# 🦜️🔗 LangChain  ⚡ Building applications with LLMs through composability ⚡', lookup_str='', metadata={}, lookup_index=0),
 Document(page_content="Quick Install  ```bash\n# Hopefully this code block isn't split\npip install langchain", lookup_str='', metadata={}, lookup_index=0),
 Document(page_content='As an open source project in a rapidly developing field, we are extremely open to contributions.', lookup_str='', metadata={}, lookup_index=0)]