autogen-ext test is too slow...! #6376


Open
SongChiYoung opened this issue Apr 23, 2025 · 2 comments
Labels
help wanted (Extra attention is needed), proj-extensions

Comments

@SongChiYoung
Contributor

What happened?

Describe the bug
Per-directory durations of the tests under autogen-ext:

autogen-ext/cache_store : 0.15s
autogen-ext/code_executors : 203.89s
autogen-ext/memory : 12s
autogen-ext/models : 37s
autogen-ext/tools: 14s
autogen_ext/*.py : 146.02s

I identified the slowest test files:

packages/autogen-ext/tests/code_executors/test_docker_commandline_code_executor.py : 161.66s
packages/autogen-ext/tests/test_openai_assistant_agent.py : 92.66s

=============================================================== slowest more than 1sec durations ================================================================
21.02s call     tests/code_executors/test_docker_commandline_code_executor.py::test_delete_tmp_files
16.01s call     tests/test_openai_assistant_agent.py::test_file_retrieval[openai]
13.08s call     tests/test_openai_assistant_agent.py::test_on_reset_behavior[openai]
11.42s call     tests/code_executors/test_docker_commandline_code_executor.py::test_commandline_code_executor_timeout[docker]
10.75s call     tests/code_executors/test_docker_commandline_code_executor.py::test_deprecated_warning
10.69s call     tests/code_executors/test_docker_commandline_code_executor.py::test_docker_commandline_code_executor_serialization
10.50s call     tests/code_executors/test_docker_commandline_code_executor.py::test_docker_commandline_code_executor_extra_args
10.49s call     tests/code_executors/test_docker_commandline_code_executor.py::test_error_wrong_path
10.38s call     tests/code_executors/test_docker_commandline_code_executor.py::test_docker_commandline_code_executor_start_stop
10.34s call     tests/code_executors/test_docker_commandline_code_executor.py::test_docker_commandline_code_executor_start_stop_context_manager
10.34s call     tests/code_executors/test_docker_commandline_code_executor.py::test_directory_creation_cleanup
10.25s call     tests/code_executors/test_docker_jupyter_code_executor.py::test_canncellation[docker]
10.18s teardown tests/code_executors/test_docker_commandline_code_executor.py::test_commandline_code_executor_cancellation[docker]
10.12s teardown tests/code_executors/test_docker_commandline_code_executor.py::test_commandline_code_executor_timeout[docker]
10.12s teardown tests/code_executors/test_docker_commandline_code_executor.py::test_valid_relative_path[docker]
10.10s teardown tests/code_executors/test_docker_commandline_code_executor.py::test_execute_code[docker]
10.10s teardown tests/code_executors/test_docker_commandline_code_executor.py::test_invalid_relative_path[docker]
10.03s call     tests/test_worker_runtime.py::test_register_receives_publish_cascade_single_worker
7.51s call     tests/test_websurfer_agent.py::test_run_websurfer
6.44s call     tests/test_openai_assistant_agent.py::test_code_interpreter[openai]
6.20s call     tests/models/test_llama_cpp_model_client.py::test_llama_cpp_integration_non_streaming_structured_output
5.87s call     tests/models/test_llama_cpp_model_client.py::test_llama_cpp_integration_non_streaming
3.96s call     tests/models/test_openai_model_client.py::test_model_client_with_function_calling[gpt-4.1-nano]
3.71s call     tests/models/test_openai_model_client.py::test_model_client_basic_completion[gpt-4.1-nano]
3.10s call     tests/memory/test_chroma_memory.py::test_initialization
2.87s call     tests/models/test_openai_model_client.py::test_structured_output_with_streaming_tool_calls
2.81s call     tests/code_executors/test_docker_jupyter_code_executor.py::test_timeout[docker]
2.75s call     tests/models/test_openai_model_client.py::test_structured_output_with_streaming
2.72s call     tests/code_executors/test_jupyter_code_executor.py::test_commandline_code_executor_timeout
2.72s call     tests/code_executors/test_docker_jupyter_code_executor.py::test_execute_code_with_image_output
2.39s call     tests/memory/test_chroma_memory.py::test_content_types
2.01s call     tests/code_executors/test_docker_commandline_code_executor.py::test_commandline_code_executor_cancellation[docker]
1.95s call     tests/tools/test_mcp_tools.py::test_mcp_server_fetch
1.92s setup    tests/code_executors/test_docker_jupyter_code_executor.py::test_execute_code[docker]
1.87s call     tests/code_executors/test_docker_jupyter_code_executor.py::test_execute_code_and_persist_variable[docker]
1.82s call     tests/memory/test_chroma_memory.py::test_strict_matching
1.72s call     tests/code_executors/test_jupyter_code_executor.py::test_commandline_code_executor_cancellation
1.71s call     tests/test_playwright_controller.py::test_playwright_controller_click_id
1.68s call     tests/models/test_openai_model_client.py::test_openai_structured_output_with_streaming_tool_calls[gpt-4.1-nano]
1.64s call     tests/models/test_openai_model_client.py::test_openai_structured_output_with_tool_calls[gpt-4.1-nano]
1.39s call     tests/memory/test_chroma_memory.py::test_basic_workflow
1.34s call     tests/code_executors/test_jupyter_code_executor.py::test_jupyter_code_executor_serialization
1.25s call     tests/memory/test_chroma_memory.py::test_model_context_update
1.22s call     tests/memory/test_chroma_memory.py::test_metadata_handling
1.21s call     tests/code_executors/test_jupyter_code_executor.py::test_execute_code_after_restart
1.13s call     tests/code_executors/test_docker_jupyter_code_executor.py::test_start_stop
1.09s call     tests/tools/test_mcp_tools.py::test_mcp_server_filesystem
1.09s call     tests/models/test_openai_model_client.py::test_openai_structured_output[gpt-4.1-nano]
1.04s call     tests/models/test_openai_model_client.py::test_openai_structured_output_with_streaming[gpt-4.1-nano]
1.03s call     tests/models/test_openai_model_client.py::test_openai_structured_output_using_response_format[gpt-4.1-nano]
1.01s call     tests/code_executors/test_commandline_code_executor.py::test_commandline_code_executor_timeout[local]
1.01s call     tests/code_executors/test_commandline_code_executor.py::test_commandline_code_executor_cancellation
1.00s setup    tests/code_executors/test_docker_jupyter_code_executor.py::test_canncellation[docker]
1.00s setup    tests/code_executors/test_docker_jupyter_code_executor.py::test_execute_code_and_persist_variable[docker]

To Reproduce
poe test
pytest python/packages/autogen-ext/tests

Expected behavior
The test suite should run faster.

Which package was the bug in?

Python Extensions (autogen-ext)

AutoGen library version.

Python dev (main branch)

Other library version.

No response

Model used

No response

Model provider

None

Other model provider

No response

Python version

None

.NET version

None

Operating system

None

ekzhu added the help wanted and proj-extensions labels and removed the needs-triage label on Apr 23, 2025
@ekzhu
Collaborator

ekzhu commented Apr 23, 2025

Let's first figure out why the docker tests are so slow.

Then, for the docker code executor tests (both DockerCommandLineCodeExecutor and DockerJupyterCodeExecutor), I think we should create separate poe tasks to run them, and have separate jobs in .github/workflows/checks.yml. See the test-grpc example, which is already separate:

test-grpc:
  runs-on: ubuntu-latest

@SongChiYoung
Contributor Author

SongChiYoung commented Apr 25, 2025

@ekzhu Cool.
I found that in my environment, each Docker build takes about 10 seconds.

In packages/autogen-ext/tests/code_executors/test_docker_commandline_code_executor.py,
there are 13 calls to DockerCommandLineCodeExecutor and 5 calls to the executor_and_temp_dir fixture.

So, now I see why the Docker tests are quite slow.

I’m testing sharing Docker containers between tests instead of creating a new one for each test.

@pytest_asyncio.fixture(scope="function")  # type: ignore
async def executor_and_temp_dir(
    request: pytest.FixtureRequest,
) -> AsyncGenerator[tuple[DockerCommandLineCodeExecutor, str], None]:
    if not docker_tests_enabled():
        pytest.skip("Docker tests are disabled")
    with tempfile.TemporaryDirectory() as temp_dir:
        async with DockerCommandLineCodeExecutor(work_dir=temp_dir) as executor:
            yield executor, temp_dir

Because the fixture is function-scoped, a new Docker container is created for every test.
I changed the scope to "session" as shown below so the container is reused:

@pytest_asyncio.fixture(scope="session")  # type: ignore
async def executor_and_temp_dir() -> AsyncGenerator[tuple[DockerCommandLineCodeExecutor, str], None]:
    if not docker_tests_enabled():
        pytest.skip("Docker tests are disabled")

    with tempfile.TemporaryDirectory() as temp_dir:
        async with DockerCommandLineCodeExecutor(work_dir=temp_dir) as executor:
            yield executor, temp_dir

As a result, the duration of packages/autogen-ext/tests/code_executors/test_docker_commandline_code_executor.py improved from 161.66s to 110s.
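One caveat worth noting (my observation, not from the original report): pytest-asyncio's default event loop is function-scoped, so a session-scoped async fixture can fail with a ScopeMismatch error. On pytest-asyncio versions before 0.23, the usual workaround is overriding the event_loop fixture at session scope in conftest.py, roughly:

```python
# Hedged sketch: session-scoped event loop override for pytest-asyncio < 0.23,
# so that session-scoped async fixtures share one loop with the tests.
import asyncio

import pytest


@pytest.fixture(scope="session")
def event_loop():
    # Create one loop for the whole session and close it at teardown.
    loop = asyncio.new_event_loop()
    yield loop
    loop.close()
```

Newer pytest-asyncio versions configure this differently (e.g. via loop scope settings), so check the version in use.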

Yup, it still needs a clean-up routine for the shared work_dir between tests.
Currently, this is just a suggestion, so I haven’t implemented it yet.

But I plan to implement it like this:

@pytest_asyncio.fixture(scope="function")
async def cleanup_after_test(executor_and_temp_dir, request):
    _, work_dir = executor_and_temp_dir
    def cleanup():
        reset_temp_dir(work_dir)
    request.addfinalizer(cleanup)
    yield

And change test usage like this:

@pytest.mark.asyncio
async def test_example(executor_and_temp_dir, cleanup_after_test):
    executor, tmp_dir = executor_and_temp_dir
    ...  # (yes, the test body is the same as before)

Just sharing my ongoing experiment :)

I welcome any suggestions or help from others.
