用例(User Case)
agents
Wikibase Agent

LangChain

Wikibase代理 Wikibase Agent#

本文档演示了一个非常简单的使用SPARQL生成的Wikibase代理。虽然此代码可用于任何Wikibase实例, 但我们在测试中使用http://wikidata.org。 (opens in a new tab)

如果你对Wikibase和SPARQL感兴趣,请考虑帮助改进这个代理。请在此处查看 (opens in a new tab)更多详细信息和开放式问题。

前置条件 Preliminaries#

API密钥和其他机密#

我们使用一个.ini文件,如下:

 
[OPENAI]
 
OPENAI_API_KEY=xyzzy
 
[WIKIDATA]
 
WIKIDATA_USER_AGENT_HEADER=argle-bargle
 
 
 
 
import configparser
 
config = configparser.ConfigParser()
 
config.read('./secrets.ini')
 
 
 
 
['./secrets.ini']
 
 
 

OpenAI API Key#

除非您修改以下代码以使用其他 LLM 提供程序,否则需要 OpenAI API 密钥。

 
openai_api_key = config['OPENAI']['OPENAI_API_KEY']
 
import os
 
os.environ.update({'OPENAI_API_KEY': openai_api_key})
 
 
 

Wikidata用户代理头#

维基数据政策需要用户代理标头。请参阅 https://meta.wikimedia.org/wiki/User-Agent_policy。但是,目前这项政策并没有严格执行。 (opens in a new tab)

wikidata_user_agent_header = None if not config.has_section('WIKIDATA') else config['WIKIDATA']['WIKIDAtA_USER_AGENT_HEADER']
 
 
 

如果需要,启用跟踪#

#import os
 
#os.environ["LANGCHAIN_HANDLER"] = "langchain"
 
#os.environ["LANGCHAIN_SESSION"] = "default" # Make sure this session actually exists. 
 
 
 

工具 Tools #

为这个简单代理提供了三个工具:

  • ItemLookup: 用于查找项目的q编号
  • PropertyLookup: 用于查找属性的p编号
  • SparqlQueryRunner: 用于运行SPARQL查询

项目和属性查找#

项目和属性查找在单个方法中实现,使用弹性搜索终端点。

并非所有的wikiBase实例均具备该功能,但WikiData具备该功能,因此我们将从那里开始。

def get_nested_value(o: dict, path: list) -> any:
 
    current = o
 
    for key in path:
 
        try:
 
            current = current[key]
 
        except:
 
            return None
 
    return current
 
 
 
import requests
 
 
 
from typing import Optional
 
 
 
def vocab_lookup(search: str, entity_type: str = "item",
 
                 url: str = "https://www.wikidata.org/w/api.php",
 
                 user_agent_header: str = wikidata_user_agent_header,
 
                 srqiprofile: str = None,
 
                ) -> Optional[str]:    
 
    headers = {
 
        'Accept': 'application/json'
 
    }
 
    if wikidata_user_agent_header is not None:
 
        headers['User-Agent'] = wikidata_user_agent_header
 
    
 
    if entity_type == "item":
 
        srnamespace = 0
 
        srqiprofile = "classic_noboostlinks" if srqiprofile is None else srqiprofile
 
    elif entity_type == "property":
 
        srnamespace = 120
 
        srqiprofile = "classic" if srqiprofile is None else srqiprofile
 
    else:
 
        raise ValueError("entity_type must be either 'property' or 'item'")          
 
    
 
    params = {
 
        "action": "query",
 
        "list": "search",
 
        "srsearch": search,
 
        "srnamespace": srnamespace,
 
        "srlimit": 1,
 
        "srqiprofile": srqiprofile,
 
        "srwhat": 'text',
 
        "format": "json"
 
    }
 
    
 
    response = requests.get(url, headers=headers, params=params)
 
        
 
    if response.status_code == 200:
 
        title = get_nested_value(response.json(), ['query', 'search', 0, 'title'])
 
        if title is None:
 
            return f"I couldn't find any {entity_type} for '{search}'. Please rephrase your request and try again"
 
        # if there is a prefix, strip it off
 
        return title.split(':')[-1]
 
    else:
 
        return "Sorry, I got an error. Please try again."
 
 
 
print(vocab_lookup("Malin 1"))
 
 
 
Q4180017
 
 
 
print(vocab_lookup("instance of", entity_type="property"))
 
 
 
P31
 
 
 
print(vocab_lookup("Ceci n'est pas un q-item"))
 
 
 
I couldn't find any item for 'Ceci n'est pas un q-item'. Please rephrase your request and try again
 
 
 

Sparql运行器#

默认情况下,该工具运行sparql,使用Wikidata。

import requests
 
from typing import List, Dict, Any
 
import json
 
 
 
def run_sparql(query: str, url='https://query.wikidata.org/sparql',
 
               user_agent_header: str = wikidata_user_agent_header) -> List[Dict[str, Any]]:
 
    headers = {
 
        'Accept': 'application/json'
 
    }
 
    if wikidata_user_agent_header is not None:
 
        headers['User-Agent'] = wikidata_user_agent_header
 
 
 
    response = requests.get(url, headers=headers, params={'query': query, 'format': 'json'})
 
 
 
    if response.status_code != 200:
 
        return "That query failed. Perhaps you could try a different one?"
 
    results = get_nested_value(response.json(),['results', 'bindings'])
 
    return json.dumps(results)
 
 
 
run_sparql("SELECT (COUNT(?children) as ?count) WHERE { wd:Q1339 wdt:P40 ?children . }")
 
 
 
'[{"count": {"datatype": "https://www.w3.org/2001/XMLSchema#integer", "type": "literal", "value": "20"}}]'
 
 
 

代理#

包装工具#

from langchain.agents import Tool, AgentExecutor, LLMSingleActionAgent, AgentOutputParser
 
from langchain.prompts import StringPromptTemplate
 
from langchain import OpenAI, LLMChain
 
from typing import List, Union
 
from langchain.schema import AgentAction, AgentFinish
 
import re
 
 
 
# Define which tools the agent can use to answer user queries
 
tools = [
 
    Tool(
 
        name = "ItemLookup",
 
        func=(lambda x: vocab_lookup(x, entity_type="item")),
 
        description="useful for when you need to know the q-number for an item"
 
    ),
 
    Tool(
 
        name = "PropertyLookup",
 
        func=(lambda x: vocab_lookup(x, entity_type="property")),
 
        description="useful for when you need to know the p-number for a property"
 
    ),
 
    Tool(
 
        name = "SparqlQueryRunner",
 
        func=run_sparql,
 
        description="useful for getting results from a wikibase"
 
    )    
 
]
 
 
 

提示符#

# Set up the base template
 
template = """
 
Answer the following questions by running a sparql query against a wikibase where the p and q items are 
 
completely unknown to you. You will need to discover the p and q items before you can generate the sparql.
 
Do not assume you know the p and q items for any concepts. Always use tools to find all p and q items.
 
After you generate the sparql, you should run it. The results will be returned in json. 
 
Summarize the json results in natural language.
 
 
 
You may assume the following prefixes:
 
PREFIX wd: <https://www.wikidata.org/entity/>
 
PREFIX wdt: <https://www.wikidata.org/prop/direct/>
 
PREFIX p: <https://www.wikidata.org/prop/>
 
PREFIX ps: <https://www.wikidata.org/prop/statement/>
 
 
 
When generating sparql:
 
\* Try to avoid "count" and "filter" queries if possible
 
\* Never enclose the sparql in back-quotes
 
 
 
You have access to the following tools:
 
 
 
{tools}
 
 
 
Use the following format:
 
 
 
Question: the input question for which you must provide a natural language answer
 
Thought: you should always think about what to do
 
Action: the action to take, should be one of [{tool_names}]
 
Action Input: the input to the action
 
Observation: the result of the action
 
... (this Thought/Action/Action Input/Observation can repeat N times)
 
Thought: I now know the final answer
 
Final Answer: the final answer to the original input question
 
 
 
Question: {input}
 
{agent_scratchpad}"""
 
 
 
# Set up a prompt template
 
class CustomPromptTemplate(StringPromptTemplate):
 
    # The template to use
 
    template: str
 
    # The list of tools available
 
    tools: List[Tool]
 
    
 
    def format(self, \*\*kwargs) -> str:
 
        # Get the intermediate steps (AgentAction, Observation tuples)
 
        # Format them in a particular way
 
        intermediate_steps = kwargs.pop("intermediate_steps")
 
        thoughts = ""
 
        for action, observation in intermediate_steps:
 
            thoughts += action.log
 
            thoughts += f"Observation: {observation}Thought: "
 
        # Set the agent_scratchpad variable to that value
 
        kwargs["agent_scratchpad"] = thoughts
 
        # Create a tools variable from the list of tools provided
 
        kwargs["tools"] = "".join([f"{tool.name}: {tool.description}" for tool in self.tools])
 
        # Create a list of tool names for the tools provided
 
        kwargs["tool_names"] = ", ".join([tool.name for tool in self.tools])
 
        return self.template.format(\*\*kwargs)
 
 
 
prompt = CustomPromptTemplate(
 
    template=template,
 
    tools=tools,
 
    # This omits the `agent_scratchpad`, `tools`, and `tool_names` variables because those are generated dynamically
 
    # This includes the `intermediate_steps` variable because that is needed
 
    input_variables=["input", "intermediate_steps"]
 
)
 
 
 

输出解析器#

这与Langchain文档相同

class CustomOutputParser(AgentOutputParser):
 
    
 
    def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:
 
        # Check if agent should finish
 
        if "Final Answer:" in llm_output:
 
            return AgentFinish(
 
                # Return values is generally always a dictionary with a single `output` key
 
                # It is not recommended to try anything else at the moment :)
 
                return_values={"output": llm_output.split("Final Answer:")[-1].strip()},
 
                log=llm_output,
 
            )
 
        # Parse out the action and action input
 
        regex = r"Action: (.\*?)[]\*Action Input:[\s]\*(.\*)"
 
        match = re.search(regex, llm_output, re.DOTALL)
 
        if not match:
 
            raise ValueError(f"Could not parse LLM output: `{llm_output}`")
 
        action = match.group(1).strip()
 
        action_input = match.group(2)
 
        # Return the action and action input
 
        return AgentAction(tool=action, tool_input=action_input.strip(" ").strip('"'), log=llm_output)
 
 
 
output_parser = CustomOutputParser()
 
 
 

指定LLM模型#

from langchain.chat_models import ChatOpenAI
 
llm = ChatOpenAI(model_name="gpt-4", temperature=0)
 
 
 

代理人和代理执行者#

# LLM chain consisting of the LLM and a prompt
 
llm_chain = LLMChain(llm=llm, prompt=prompt)
 
 
 
tool_names = [tool.name for tool in tools]
 
agent = LLMSingleActionAgent(
 
    llm_chain=llm_chain, 
 
    output_parser=output_parser,
 
    stop=["Observation:"], 
 
    allowed_tools=tool_names
 
)
 
 
 
agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)
 
 
 

运行它!#

# If you prefer in-line tracing, uncomment this line
 
# agent_executor.agent.llm_chain.verbose = True
 
 
 
agent_executor.run("How many children did J.S. Bach have?")
 
 
 
> Entering new AgentExecutor chain...
 
Thought: I need to find the Q number for J.S. Bach.
 
Action: ItemLookup
 
Action Input: J.S. Bach
 
 
 
Observation:Q1339I need to find the P number for children.
 
Action: PropertyLookup
 
Action Input: children
 
 
 
Observation:P1971Now I can query the number of children J.S. Bach had.
 
Action: SparqlQueryRunner
 
Action Input: SELECT ?children WHERE { wd:Q1339 wdt:P1971 ?children }
 
 
 
Observation:[{"children": {"datatype": "https://www.w3.org/2001/XMLSchema#decimal", "type": "literal", "value": "20"}}]I now know the final answer.
 
Final Answer: J.S. Bach had 20 children.
 
 
 
> Finished chain.
 
 
 
'J.S. Bach had 20 children.'
 
 
 
agent_executor.run("What is the Basketball-Reference.com NBA player ID of Hakeem Olajuwon?")
 
 
 
> Entering new AgentExecutor chain...
 
Thought: To find Hakeem Olajuwon's Basketball-Reference.com NBA player ID, I need to first find his Wikidata item (Q-number) and then query for the relevant property (P-number).
 
Action: ItemLookup
 
Action Input: Hakeem Olajuwon
 
 
 
Observation:Q273256Now that I have Hakeem Olajuwon's Wikidata item (Q273256), I need to find the P-number for the Basketball-Reference.com NBA player ID property.
 
Action: PropertyLookup
 
Action Input: Basketball-Reference.com NBA player ID
 
 
 
Observation:P2685Now that I have both the Q-number for Hakeem Olajuwon (Q273256) and the P-number for the Basketball-Reference.com NBA player ID property (P2685), I can run a SPARQL query to get the ID value.
 
Action: SparqlQueryRunner
 
Action Input: 
 
SELECT ?playerID WHERE {
 
 wd:Q273256 wdt:P2685 ?playerID .
 
}
 
 
 
Observation:[{"playerID": {"type": "literal", "value": "o/olajuha01"}}]I now know the final answer
 
Final Answer: Hakeem Olajuwon's Basketball-Reference.com NBA player ID is "o/olajuha01".
 
 
 
> Finished chain.
 
 
 
'Hakeem Olajuwon\'s Basketball-Reference.com NBA player ID is "o/olajuha01".'