Agent
[Omni-Modal Agent]
About this agent
An agent that has access to multi-modal tools to generate images, videos, and more
document-question-answering
image-captioning
image-question-answering
image-segmentation
speech-to-text
summarization
Use Cases
Image Captioning
Caption the images
Generate Images
Generate an image from a task
text-to-video
Generate videos from text
Requirements
Package | Installation |
---|---|
Swarms | pip3 install swarms |
Langchain Experimental | pip3 install langchain-experimental |
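Once the packages are installed, a quick check can confirm they are importable. The sketch below is illustrative only: the module names are taken from the import statements in the agent code (note that the code also imports `transformers`, which the table above does not list), and `check_modules` is a hypothetical helper, not part of Swarms.

```python
import importlib.util
from typing import Dict, Iterable

# Top-level import names assumed from the agent code's import statements.
REQUIRED_MODULES = ("swarms", "langchain_experimental", "transformers")


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name}: {'ok' if found else 'missing'}")
```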
Agent Code
The main implementation code for this agent. You can view, copy, and use this code directly in your projects.
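The full implementation is stored in the `code` field of the metadata below. Its docstring describes the flow as LLM -> Plans -> Tasks -> Tools -> Response. The following dependency-free sketch illustrates only that control flow. Everything in it (`Task`, `plan`, `execute`, `respond`, `run`, and the toy `TOOLS`) is a hypothetical stand-in; in particular, the keyword-matching planner replaces the LLM-driven planner of the real agent and is not the `langchain_experimental` API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

ToolMap = Dict[str, Callable[[str], str]]


@dataclass
class Task:
    """One planned step: which tool to call and with what input."""
    tool: str
    argument: str
    result: str = ""


def plan(request: str, tools: ToolMap) -> List[Task]:
    """Stand-in planner: select every tool whose name appears in the request.

    The real agent asks the LLM to produce the plan instead.
    """
    lowered = request.lower()
    return [Task(tool=name, argument=request) for name in tools if name in lowered]


def execute(tasks: List[Task], tools: ToolMap) -> List[Task]:
    """Run each planned task with its tool and store the result on the task."""
    for task in tasks:
        task.result = tools[task.tool](task.argument)
    return tasks


def respond(tasks: List[Task]) -> str:
    """Stand-in response generator: join the tool results into one string."""
    if not tasks:
        return "No matching tool for this request."
    return "; ".join(f"{task.tool}: {task.result}" for task in tasks)


def run(request: str, tools: ToolMap) -> str:
    """Plan, execute, respond; report failures as a string, as the agent's run() does."""
    try:
        return respond(execute(plan(request, tools), tools))
    except Exception as error:
        return f"Error running the agent: {error}"


# Toy tools for illustration; the real agent loads Hugging Face tools.
TOOLS: ToolMap = {
    "summarization": lambda text: f"summary of {len(text)} chars",
    "translation": lambda text: text.upper(),
}

if __name__ == "__main__":
    print(run("Run summarization on this", TOOLS))
```

As in the listed agent, `run` catches exceptions and returns the error message instead of raising, so callers always receive a string.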
Agent Metadata (JSON)
All metadata and code for this agent, as a JSON object. Useful for programmatic use, export, or debugging.
{ "id": "326ec413-d903-4c9c-bad0-ba013b702d9a", "name": "[Omni-Modal Agent]", "title": "Agent", "description": "An agent that has access to multi-modal tools to generate images, videos, and more", "tags": [ "document-question-answering", "image-captioning", "image-question-answering", "image-segmentation", "speech-to-text", "summarization" ], "requirements": [ { "package": "Swarms", "installation": "pip3 install swarms" }, { "package": "Langchain Experimental", "installation": "pip3 install langchain-experimental" } ], "usecases": [ { "title": "Image Captioning", "description": "Caption the images" }, { "title": "Generate Images", "description": "Generate an image from a task" }, { "title": "text-to-video", "description": "Generate videos from text" } ], "userId": "6a5ca266-caff-46a5-8e29-fba2085e4e5f", "createdAt": "2024-06-16T21:58:19.204498+00:00", "links": [], "code": "from langchain.base_language import BaseLanguageModel\nfrom langchain_experimental.autonomous_agents.hugginggpt.repsonse_generator import (\n    load_response_generator,\n)\nfrom langchain_experimental.autonomous_agents.hugginggpt.task_executor import (\n    TaskExecutor,\n)\nfrom langchain_experimental.autonomous_agents.hugginggpt.task_planner import (\n    load_chat_planner,\n)\nfrom transformers import load_tool\n\nfrom swarms.structs.agent import Agent\nfrom swarms.utils.loguru_logger import logger\n\n\nclass OmniModalAgent(Agent):\n    \"\"\"\n    OmniModalAgent\n    LLM -> Plans -> Tasks -> Tools -> Response\n\n    Architecture:\n    1. LLM: Language Model\n    2. Chat Planner: Plans\n\n    Args:\n        llm (BaseLanguageModel): Language Model\n        tools (List[BaseTool]): List of tools\n\n    Returns:\n        str: response\n\n    Usage:\n        from swarms import OmniModalAgent, OpenAIChat\n\n        llm = OpenAIChat()\n        agent = OmniModalAgent(llm)\n        response = agent.run(\"Hello, how are you? Create an image of how you are doing!\")\n    \"\"\"\n\n    def __init__(\n        self,\n        llm: BaseLanguageModel,\n        verbose: bool = False,\n        *args,\n        **kwargs,\n    ):\n        super().__init__(llm=llm, *args, **kwargs)\n        self.llm = llm\n        self.verbose = verbose\n\n        print(\"Loading tools...\")\n        self.tools = [\n            load_tool(tool_name)\n            for tool_name in [\n                \"document-question-answering\",\n                \"image-captioning\",\n                \"image-question-answering\",\n                \"image-segmentation\",\n                \"speech-to-text\",\n                \"summarization\",\n                \"text-classification\",\n                \"text-question-answering\",\n                \"translation\",\n                \"huggingface-tools/text-to-image\",\n                \"huggingface-tools/text-to-video\",\n                \"text-to-speech\",\n                \"huggingface-tools/text-download\",\n                \"huggingface-tools/image-transformation\",\n            ]\n        ]\n\n        # Load the chat planner and response generator\n        self.chat_planner = load_chat_planner(llm)\n        self.response_generator = load_response_generator(llm)\n        self.task_executor = TaskExecutor\n        self.history = []\n\n    def run(self, task: str) -> str:\n        \"\"\"Run the OmniAgent\"\"\"\n        try:\n            plan = self.chat_planner.plan(\n                inputs={\n                    \"input\": task,\n                    \"hf_tools\": self.tools,\n                }\n            )\n            self.task_executor = TaskExecutor(plan)\n            self.task_executor.run()\n\n            response = self.response_generator.generate(\n                {\"task_execution\": self.task_executor}\n            )\n\n            return response\n        except Exception as error:\n            logger.error(f\"Error running the agent: {error}\")\n            return f\"Error running the agent: {error}\"\n" }
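Because the metadata is a plain JSON object, it can be consumed with any JSON parser. Here is a minimal sketch, assuming the object has been saved or pasted as a string. The trimmed `metadata_json` sample reuses the field names above with shortened values, and `install_commands` and `extract_code` are illustrative helper names, not part of the listing.

```python
import json
from typing import List

# Trimmed stand-in for the metadata object above (same field names,
# placeholder "code" value); in practice, load the full JSON instead.
metadata_json = """
{
  "name": "[Omni-Modal Agent]",
  "requirements": [
    {"package": "Swarms", "installation": "pip3 install swarms"},
    {"package": "Langchain Experimental", "installation": "pip3 install langchain-experimental"}
  ],
  "code": "print('agent code goes here')\\n"
}
"""


def install_commands(metadata: dict) -> List[str]:
    """Collect the installation command of every listed requirement."""
    return [req["installation"] for req in metadata.get("requirements", [])]


def extract_code(metadata: dict) -> str:
    """Return the agent source; json.loads has already unescaped newlines and quotes."""
    return metadata["code"]


metadata = json.loads(metadata_json)
commands = install_commands(metadata)

if __name__ == "__main__":
    print(metadata["name"])
    for command in commands:
        print(command)
```

The string returned by `extract_code` can be written to a `.py` file to recover the agent source with its line breaks intact.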