A financial web table scraper based on playwright that supports Model Context Protocol (MCP). Currently supports the following sources:
In live trading, if a website goes down or is redesigned, you can immediately switch to another one. (Note: different websites have different table structures, so each one must be adapted in advance.)
RooCode provides Human Reply functionality. However, we found that the web version of Nano Search breaks formatting when copying, so we developed this feature.
pip install -i https://pypi.org/simple --upgrade mcp_query_table
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple --upgrade mcp_query_table
import asyncio

from mcp_query_table import *


async def main() -> None:
    async with BrowserManager(endpoint="http://127.0.0.1:9222", executable_path=None, devtools=True) as bm:
        # iWencai requires browser width > 768 to prevent mobile interface adaptation
        page = await bm.get_page()
        df = await query(page, 'Top 200 ETFs with best returns', query_type=QueryType.ETF, max_page=1, site=Site.THS)
        print(df.to_markdown())
        df = await query(page, 'Top 50 funds by year-to-date returns', query_type=QueryType.Fund, max_page=1, site=Site.TDX)
        print(df.to_csv())
        df = await query(page, 'Top 10 industry sectors by market cap', query_type=QueryType.Index, max_page=1, site=Site.TDX)
        print(df.to_csv())
        # TODO East Money pagination requires login in advance
        df = await query(page, 'Top 5 concept sectors by today\'s gains;', query_type=QueryType.Board, max_page=3, site=Site.EastMoney)
        print(df)

        output = await chat(page, "What does 1+2 equal?", provider=Provider.YuanBao)
        print(output)
        output = await chat(page, "What does 3+4 equal?", provider=Provider.YuanBao, create=True)
        print(output)

        print('done')
        bm.release_page(page)
        await page.wait_for_timeout(2000)


if __name__ == '__main__':
    asyncio.run(main())
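Because query takes the site as a parameter, the switch-over mentioned at the top (moving to another website when one goes down or is redesigned) could be automated roughly as follows. This is a minimal sketch and not part of mcp_query_table: query_with_fallback, run_query, and fake_query are hypothetical names, and the caller is assumed to wrap query in a small function that accepts a site.

```python
import asyncio
from typing import Any, Awaitable, Callable, Sequence, Tuple


async def query_with_fallback(
    run_query: Callable[[Any], Awaitable[Any]],
    sites: Sequence[Any],
) -> Tuple[Any, Any]:
    """Try each site in order and return (site, result) from the first success."""
    errors = []
    for site in sites:
        try:
            return site, await run_query(site)
        except Exception as exc:  # deliberately broad: any failure moves on to the next site
            errors.append((site, exc))
    raise RuntimeError(f"all sites failed: {errors!r}")


if __name__ == "__main__":
    async def fake_query(site: str) -> str:
        # Stand-in for a real query call; "A" simulates a site that is down.
        if site == "A":
            raise ConnectionError("site A is down")
        return f"table from {site}"

    print(asyncio.run(query_with_fallback(fake_query, ["A", "B"])))
```

Inside main() above, run_query might be something like lambda s: query(page, 'Top 200 ETFs with best returns', query_type=QueryType.ETF, max_page=1, site=s) with sites set to [Site.THS, Site.TDX]. Since table structures differ between websites, the returned columns will generally differ as well, so downstream code should check which site answered.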
Chrome is recommended. If you must use Edge, besides closing all Edge windows, you also need to terminate all Microsoft Edge processes in Task Manager, i.e., taskkill /f /im msedge.exe
Unlike requests, playwright is browser-based and simulates user operations in the browser.
It runs slower than requests, but has higher development efficiency.
Data acquisition methods include:
Intercepting responses to obtain json data. Like requests, this requires response parsing.
This project uses simulated browser clicks to send requests and intercepts responses for data parsing.
Future adaptations will use whichever method best suits each website after a redesign.
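The click-then-intercept approach described above can be illustrated with Playwright's expect_response context manager. This is a minimal sketch under stated assumptions, not this project's actual implementation: click_and_capture_json, selector, and url_part are hypothetical names, and page is assumed to be a playwright.async_api.Page.

```python
from typing import Any


async def click_and_capture_json(page: Any, selector: str, url_part: str) -> Any:
    """Click an element and return the JSON body of the response it triggers.

    `page` is assumed to be a playwright.async_api.Page. The first response
    whose URL contains `url_part` is treated as the table data request.
    """
    # Start waiting for the response before clicking, so it cannot be missed.
    async with page.expect_response(lambda r: url_part in r.url) as response_info:
        await page.click(selector)
    response = await response_info.value
    # The body is parsed as JSON; turning it into a table is site-specific.
    return await response.json()
```

Registering the wait before the click matters: if the click came first, a fast response could arrive before anything was listening for it.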
Headless mode runs faster, but some websites require logging in beforehand, so user_data_dir must be specified in headless mode; otherwise login issues may occur.
endpoint=None, headless=True starts a new browser instance headlessly. Specify executable_path and user_data_dir to ensure normal operation in headless mode.
If endpoint starts with http://, it connects to a headed browser started in CDP mode, which requires the --remote-debugging-port parameter. executable_path is the local browser path.
If endpoint starts with ws://, it connects to a remote Playwright Server. This is also headless mode, but user_data_dir cannot be specified, so usage is limited.
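The three endpoint cases above can be summarised in code. This is an illustrative sketch only: describe_endpoint and cdp_launch_args are hypothetical helpers, not functions of mcp_query_table, and the way the library itself handles values outside these three cases is not covered here.

```python
from typing import List, Optional


def describe_endpoint(endpoint: Optional[str]) -> str:
    """Name the connection mode implied by an endpoint value, per the rules above."""
    if endpoint is None:
        return "launch a new local browser"
    if endpoint.startswith("http://"):
        return "attach over CDP to a running headed browser"
    if endpoint.startswith("ws://"):
        return "connect to a remote Playwright Server"
    return "not covered by the rules above"


def cdp_launch_args(executable_path: str, port: int = 9222) -> List[str]:
    """Command line that starts a browser listening for CDP on the given port."""
    return [executable_path, f"--remote-debugging-port={port}"]


for value in (None, "http://127.0.0.1:9222", "ws://127.0.0.1:3000"):
    print(repr(value), "->", describe_endpoint(value))
```

The list returned by cdp_launch_args can be passed to subprocess.Popen to start the browser before connecting with an http:// endpoint on the same port.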
Ensure you can execute python -m mcp_query_table -h in the console. If not, you may need to pip install mcp_query_table first.
In Cline, you can configure it as follows, where command is the absolute path to python and timeout is the timeout in seconds. AI platforms often take more than 1 minute to respond, so a large timeout value is needed.
{
  "mcpServers": {
    "mcp_query_table": {
      "timeout": 300,
      "command": "D:\\Users\\Kan\\miniconda3\\envs\\py312\\python.exe",
      "args": [
        "-m",
        "mcp_query_table",
        "--format",
        "markdown",
        "--endpoint",
        "http://127.0.0.1:9222",
        "--executable_path",
        "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"
      ]
    }
  }
}
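Since command must be an absolute path and Windows paths need their backslashes escaped in JSON, it can be convenient to generate this configuration rather than type it. The following is a sketch under the assumption that it is run with the same interpreter that has mcp_query_table installed; build_cline_config is a hypothetical helper, and only the keys and flags shown in the configuration above are used.

```python
import json
import sys
from typing import Any, Dict


def build_cline_config(endpoint: str, executable_path: str,
                       python_path: str = sys.executable,
                       timeout: int = 300) -> Dict[str, Any]:
    """Build the same structure as the Cline configuration shown above."""
    return {
        "mcpServers": {
            "mcp_query_table": {
                "timeout": timeout,  # seconds
                "command": python_path,  # absolute path to the interpreter
                "args": [
                    "-m", "mcp_query_table",
                    "--format", "markdown",
                    "--endpoint", endpoint,
                    "--executable_path", executable_path,
                ],
            }
        }
    }


config = build_cline_config(
    endpoint="http://127.0.0.1:9222",
    executable_path=r"C:\Program Files\Google\Chrome\Application\chrome.exe",
)
# json.dumps takes care of escaping the backslashes in Windows paths.
print(json.dumps(config, indent=2))
```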
First execute the following command in the console to start the MCP service:
python -m mcp_query_table --format markdown --transport sse --port 8000 --endpoint http://127.0.0.1:9222
Then you can connect to the MCP service:
{
  "mcpServers": {
    "mcp_query_table": {
      "timeout": 300,
      "url": "http://127.0.0.1:8000/sse"
    }
  }
}
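Because the SSE service has to be running before a client connects, a script that starts both may want to wait until the port accepts connections. This is a small standard-library sketch; wait_for_port is a hypothetical helper, and it only checks that something is listening on the port, not that it is the MCP service.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet: give up after the deadline, otherwise retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the command above, wait_for_port("127.0.0.1", 8000) would be called before pointing a client at http://127.0.0.1:8000/sse.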
MCP Inspector
npx @modelcontextprotocol/inspector python -m mcp_query_table --format markdown --endpoint http://127.0.0.1:9222
Opening the browser and paging through results are time-consuming operations that may cause the MCP Inspector page to time out. You can open http://localhost:5173/?timeout=300000 to set a timeout of 300 seconds.
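The timeout value in that URL is given in milliseconds (300000 for 300 seconds). A short sketch of the conversion, where inspector_url is a hypothetical helper and the host and port are simply the ones from the URL above:

```python
from urllib.parse import urlencode


def inspector_url(timeout_seconds: float, host: str = "localhost", port: int = 5173) -> str:
    """Build the MCP Inspector URL with the timeout converted from seconds to milliseconds."""
    query = urlencode({"timeout": int(timeout_seconds * 1000)})
    return f"http://{host}:{port}/?{query}"


print(inspector_url(300))  # http://localhost:5173/?timeout=300000
```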
This is my first attempt at writing an MCP project, so there may be various issues. Feedback and discussion are welcome.
MCP Usage Tips
Top 100 stocks with highest gains in 2024 ranked by total market cap on December 31, 2024. Results differ across the three websites:
Large language models are weak at decomposing questions, so phrase questions carefully to make sure the query conditions are not altered. Methods 2 and 3 below are recommended:
The Streamlit app lets you query financial data and manually pass it to an AI for in-depth analysis on the same page. Refer to the README.md file in the streamlit directory.