Agent

[Omni-Modal Agent]

About this agent

An agent that has access to multi-modal tools to generate images, videos, and more

document-question-answering
image-captioning
image-question-answering
image-segmentation
speech-to-text
summarization

Use Cases

Image Captioning

Caption the images

Generate Images

Generate an image from a task

text-to-video

Generate videos from text

Requirements

Package                  Installation
Swarms                   pip3 install swarms
Langchain Experimental   pip3 install langchain-experimental

Agent Code

The implementation code for this agent is included in the `code` field of the metadata object below. You can view, copy, and use it directly in your projects.
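The agent follows a plan → execute → respond pipeline (LLM plans tasks, a task executor runs tools, a response generator summarizes the results). The sketch below illustrates that control flow with dependency-free stub classes; the class and function names here are illustrative stand-ins, not the actual swarms or HuggingGPT API:

```python
# Minimal sketch of the OmniModalAgent control flow:
# task -> plan -> execute tool calls -> generate response.
# StubPlanner / StubExecutor / StubResponseGenerator are illustrative
# stand-ins for load_chat_planner, TaskExecutor, and
# load_response_generator.

class StubPlanner:
    def plan(self, inputs):
        # A real planner asks the LLM to decompose the task into tool calls.
        return [{"tool": "image-captioning", "args": {"image": inputs["input"]}}]

class StubExecutor:
    def __init__(self, plan, tools):
        self.plan, self.tools, self.results = plan, tools, []

    def run(self):
        # Execute each planned step with the matching tool.
        for step in self.plan:
            tool = self.tools[step["tool"]]
            self.results.append(tool(**step["args"]))

class StubResponseGenerator:
    def generate(self, context):
        # A real generator asks the LLM to summarize execution results.
        return "; ".join(context["task_execution"].results)

def run_agent(task, tools):
    planner = StubPlanner()
    executor = StubExecutor(planner.plan({"input": task}), tools)
    executor.run()
    return StubResponseGenerator().generate({"task_execution": executor})

tools = {"image-captioning": lambda image: f"caption for {image}"}
print(run_agent("cat.png", tools))  # caption for cat.png
```

In the real implementation the planner, executor, and generator come from `langchain_experimental.autonomous_agents.hugginggpt`, and the tools are loaded with `transformers.load_tool`.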

Agent Metadata (JSON)

All metadata and code for this agent, as a JSON object. Useful for programmatic use, export, or debugging.
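For example, the `requirements` array can be read programmatically to recover the install commands. A minimal sketch using only the standard library, with the metadata abbreviated to the relevant fields:

```python
import json

# Abbreviated copy of the agent metadata, for illustration only.
metadata = json.loads("""
{
  "name": "[Omni-Modal Agent]",
  "requirements": [
    {"package": "Swarms", "installation": "pip3 install swarms"},
    {"package": "Langchain Experimental",
     "installation": "pip3 install langchain-experimental"}
  ]
}
""")

# Extract each requirement's install command.
commands = [req["installation"] for req in metadata["requirements"]]
print(commands)
# ['pip3 install swarms', 'pip3 install langchain-experimental']
```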

{
  "id": "326ec413-d903-4c9c-bad0-ba013b702d9a",
  "name": "[Omni-Modal Agent]",
  "title": "Agent",
  "description": "An agent that has access to multi-modal tools to generate images, videos, and more",
  "tags": [
    "document-question-answering",
    "image-captioning",
    "image-question-answering",
    "image-segmentation",
    "speech-to-text",
    "summarization"
  ],
  "requirements": [
    {
      "package": "Swarms",
      "installation": "pip3 install swarms"
    },
    {
      "package": "Langchain Experimental",
      "installation": "pip3 install langchain-experimental"
    }
  ],
  "usecases": [
    {
      "title": "Image Captioning",
      "description": "Caption the images"
    },
    {
      "title": "Generate Images",
      "description": "Generate an image from a task"
    },
    {
      "title": "text-to-video",
      "description": "Generate videos from text"
    }
  ],
  "userId": "6a5ca266-caff-46a5-8e29-fba2085e4e5f",
  "createdAt": "2024-06-16T21:58:19.204498+00:00",
  "links": [],
  "code": "from langchain.base_language import BaseLanguageModel\nfrom langchain_experimental.autonomous_agents.hugginggpt.repsonse_generator import (\n    load_response_generator,\n)\nfrom langchain_experimental.autonomous_agents.hugginggpt.task_executor import (\n    TaskExecutor,\n)\nfrom langchain_experimental.autonomous_agents.hugginggpt.task_planner import (\n    load_chat_planner,\n)\nfrom transformers import load_tool\n\nfrom swarms.structs.agent import Agent\nfrom swarms.utils.loguru_logger import logger\n\n\nclass OmniModalAgent(Agent):\n    \"\"\"\n    OmniModalAgent\n    LLM -> Plans -> Tasks -> Tools -> Response\n\n    Architecture:\n    1. LLM: Language Model\n    2. Chat Planner: Plans\n\n    Args:\n        llm (BaseLanguageModel): Language Model\n        tools (List[BaseTool]): List of tools\n\n    Returns:\n        str: response\n\n    Usage:\n    from swarms import OmniModalAgent, OpenAIChat\n\n    llm = OpenAIChat()\n    agent = OmniModalAgent(llm)\n    response = agent.run(\"Hello, how are you? Create an image of how you are doing!\")\n    \"\"\"\n\n    def __init__(\n        self,\n        llm: BaseLanguageModel,\n        verbose: bool = False,\n        *args,\n        **kwargs,\n    ):\n        super().__init__(llm=llm, *args, **kwargs)\n        self.llm = llm\n        self.verbose = verbose\n\n        print(\"Loading tools...\")\n        self.tools = [\n            load_tool(tool_name)\n            for tool_name in [\n                \"document-question-answering\",\n                \"image-captioning\",\n                \"image-question-answering\",\n                \"image-segmentation\",\n                \"speech-to-text\",\n                \"summarization\",\n                \"text-classification\",\n                \"text-question-answering\",\n                \"translation\",\n                \"huggingface-tools/text-to-image\",\n                \"huggingface-tools/text-to-video\",\n                \"text-to-speech\",\n                \"huggingface-tools/text-download\",\n                \"huggingface-tools/image-transformation\",\n            ]\n        ]\n\n        # Load the chat planner and response generator\n        self.chat_planner = load_chat_planner(llm)\n        self.response_generator = load_response_generator(llm)\n        self.task_executor = TaskExecutor\n        self.history = []\n\n    def run(self, task: str) -> str:\n        \"\"\"Run the OmniAgent\"\"\"\n        try:\n            plan = self.chat_planner.plan(\n                inputs={\n                    \"input\": task,\n                    \"hf_tools\": self.tools,\n                }\n            )\n            self.task_executor = TaskExecutor(plan)\n            self.task_executor.run()\n\n            response = self.response_generator.generate(\n                {\"task_execution\": self.task_executor}\n            )\n\n            return response\n        except Exception as error:\n            logger.error(f\"Error running the agent: {error}\")\n            return f\"Error running the agent: {error}\"\n"
}
