Merge branch 'harry0703:main' into main

This commit is contained in:
cpanel10x 2024-04-09 16:04:36 +07:00 committed by GitHub
commit 3fe6ff42c8
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
34 changed files with 2589 additions and 436 deletions

23
.dockerignore Normal file
View File

@ -0,0 +1,23 @@
# Exclude common Python files and directories
venv/
__pycache__/
*.pyc
*.pyo
*.pyd
*.pyz
*.pyw
*.pyi
*.egg-info/
# Exclude development and local files
.env
.env.*
*.log
*.db
# Exclude version control system files
.git/
.gitignore
.svn/
storage/

11
.gitignore vendored
View File

@ -1,3 +1,12 @@
.DS_Store
/config.toml
/storage/
/.idea/
/.idea/
/app/services/__pycache__
/app/__pycache__/
/app/config/__pycache__/
/app/models/__pycache__/
/app/utils/__pycache__/
/*/__pycache__/*
.vscode
/**/.streamlit

37
Dockerfile Normal file
View File

@ -0,0 +1,37 @@
# Use an official Python runtime as a parent image
FROM python:3.10-slim

# Set the working directory in the container
WORKDIR /MoneyPrinterTurbo

# Make the project root importable (app/ and webui/ live here)
ENV PYTHONPATH="/MoneyPrinterTurbo:$PYTHONPATH"

# Install system dependencies.
# --no-install-recommends keeps the image slim (hadolint DL3015);
# the apt list cleanup must stay in the SAME layer to actually shrink the image.
RUN apt-get update && apt-get install -y --no-install-recommends \
    ffmpeg \
    git \
    imagemagick \
    && rm -rf /var/lib/apt/lists/*

# Fix security policy for ImageMagick: drop the rule that forbids the
# "@*" path pattern, which blocks the temp files used for text rendering.
RUN sed -i '/<policy domain="path" rights="none" pattern="@\*"/d' /etc/ImageMagick-6/policy.xml

# Install Python dependencies BEFORE copying the application source, so the
# (slow) pip layer is cached and only re-runs when requirements.txt changes.
COPY ./requirements.txt ./requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application source into the container at /MoneyPrinterTurbo
COPY ./app ./app
COPY ./webui ./webui
COPY ./resource ./resource
COPY ./main.py ./main.py

# Expose the port the app runs on (documentation only; publish with -p at run time)
EXPOSE 8501

# Command to run the application.
# Exec (JSON-array) form so streamlit runs as PID 1 and receives SIGTERM from `docker stop`.
CMD ["streamlit", "run", "./webui/Main.py","--browser.serverAddress=0.0.0.0","--server.enableCORS=True","--browser.gatherUsageStats=False"]
# At runtime, mount the config.toml file from the host into the container
# using Docker volumes. Example usage:
# docker run -v ./config.toml:/MoneyPrinterTurbo/config.toml -v ./storage:/MoneyPrinterTurbo/storage -p 8501:8501 moneyprinterturbo

View File

@ -1,21 +1,38 @@
# MoneyPrinterTurbo 💸
<div align="center">
<h1 align="center">MoneyPrinterTurbo 💸</h1>
<p align="center">
<a href="https://github.com/harry0703/MoneyPrinterTurbo/stargazers"><img src="https://img.shields.io/github/stars/harry0703/MoneyPrinterTurbo.svg?style=for-the-badge" alt="Stargazers"></a>
<a href="https://github.com/harry0703/MoneyPrinterTurbo/issues"><img src="https://img.shields.io/github/issues/harry0703/MoneyPrinterTurbo.svg?style=for-the-badge" alt="Issues"></a>
<a href="https://github.com/harry0703/MoneyPrinterTurbo/network/members"><img src="https://img.shields.io/github/forks/harry0703/MoneyPrinterTurbo.svg?style=for-the-badge" alt="Forks"></a>
<a href="https://github.com/harry0703/MoneyPrinterTurbo/blob/main/LICENSE"><img src="https://img.shields.io/github/license/harry0703/MoneyPrinterTurbo.svg?style=for-the-badge" alt="License"></a>
</p>
<h3>English | <a href="README.md">简体中文</a></h3>
[Chinese 简体中文](README.md)
> Thanks to [RootFTW](https://github.com/Root-FTW) for the translation
Simply provide a **topic** or **keyword** for a video, and it will automatically generate the video copy, video
Simply provide a <b>topic</b> or <b>keyword</b> for a video, and it will automatically generate the video copy, video
materials, video subtitles, and video background music before synthesizing a high-definition short video.
![](docs/webui.jpg)
### WebUI
![](docs/webui-en.jpg)
### API Interface
![](docs/api.jpg)
</div>
## Special Thanks 🙏
Due to the **deployment** and **usage** of this project, there is a certain threshold for some beginner users. We would
like to express our special thanks to
**LuKa (AI Intelligent Multimedia Service Platform)** for providing a free `AI Video Generator` service based on this
**RecCloud (AI-Powered Multimedia Service Platform)** for providing a free `AI Video Generator` service based on this
project. It allows for online use without deployment, which is very convenient.
https://reccloud.com
@ -24,7 +41,8 @@ https://reccloud.com
## Features 🎯
- [x] Complete **MVC architecture**, **clearly structured** code, easy to maintain, supports both API and Web interface
- [x] Complete **MVC architecture**, **clearly structured** code, easy to maintain, supports both `API`
and `Web interface`
- [x] Supports **AI-generated** video copy, as well as **customized copy**
- [x] Supports various **high-definition video** sizes
- [x] Portrait 9:16, `1080x1920`
@ -38,42 +56,110 @@ https://reccloud.com
supports `subtitle outlining`
- [x] Supports **background music**, either random or specified music files, with adjustable `background music volume`
- [x] Video material sources are **high-definition** and **royalty-free**
- [x] Supports integration with various models such as **OpenAI**, **moonshot**, **Azure**, **gpt4free**, **one-api**, *
*qianwen**
and more
- [x] Supports integration with various models such as **OpenAI**, **moonshot**, **Azure**, **gpt4free**, **one-api**,
**qianwen**, **Google Gemini**, **Ollama** and more
❓[How to Use the Free OpenAI GPT-3.5 Model?](https://github.com/harry0703/MoneyPrinterTurbo/blob/main/README-en.md#common-questions-)
### Future Plans 📅
- [ ] Support for GPT-SoVITS dubbing
- [ ] Optimize voice synthesis using large models to make the synthesized voice sound more natural and emotionally rich
- [ ] Add video transition effects to make the viewing experience smoother
- [ ] Optimize the relevance of video materials
- [ ] OLLAMA support
- [ ] Introduce support for GPT-SoVITS dubbing
- [ ] Enhance voice synthesis with large models for a more natural and emotionally resonant voice output
- [ ] Incorporate video transition effects to ensure a smoother viewing experience
- [ ] Improve the relevance of video content
- [ ] Add options for video length: short, medium, long
- [ ] Package the application into a one-click launch bundle for Windows and macOS for ease of use
- [ ] Enable the use of custom materials
- [ ] Offer voiceover and background music options with real-time preview
- [ ] Support a wider range of voice synthesis providers, such as OpenAI TTS, Azure TTS
- [ ] Automate the upload process to the YouTube platform
## Video Demos 📺
### Portrait 9:16
▶️ How to Add Fun to Your Life
https://github.com/harry0703/MoneyPrinterTurbo/assets/4928832/a84d33d5-27a2-4aba-8fd0-9fb2bd91c6a6
▶️ What is the Meaning of Life
https://github.com/harry0703/MoneyPrinterTurbo/assets/4928832/112c9564-d52b-4472-99ad-970b75f66476
<table>
<thead>
<tr>
<th align="center"><g-emoji class="g-emoji" alias="arrow_forward">▶️</g-emoji> How to Add Fun to Your Life </th>
<th align="center"><g-emoji class="g-emoji" alias="arrow_forward">▶️</g-emoji> What is the Meaning of Life</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center"><video src="https://github.com/harry0703/MoneyPrinterTurbo/assets/4928832/a84d33d5-27a2-4aba-8fd0-9fb2bd91c6a6"></video></td>
<td align="center"><video src="https://github.com/harry0703/MoneyPrinterTurbo/assets/4928832/112c9564-d52b-4472-99ad-970b75f66476"></video></td>
</tr>
</tbody>
</table>
### Landscape 16:9
▶️ What is the Meaning of Life
https://github.com/harry0703/MoneyPrinterTurbo/assets/4928832/346ebb15-c55f-47a9-a653-114f08bb8073
▶️ Why Exercise
https://github.com/harry0703/MoneyPrinterTurbo/assets/4928832/271f2fae-8283-44a0-8aa0-0ed8f9a6fa87
<table>
<thead>
<tr>
<th align="center"><g-emoji class="g-emoji" alias="arrow_forward">▶️</g-emoji> What is the Meaning of Life</th>
<th align="center"><g-emoji class="g-emoji" alias="arrow_forward">▶️</g-emoji> Why Exercise</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center"><video src="https://github.com/harry0703/MoneyPrinterTurbo/assets/4928832/346ebb15-c55f-47a9-a653-114f08bb8073"></video></td>
<td align="center"><video src="https://github.com/harry0703/MoneyPrinterTurbo/assets/4928832/271f2fae-8283-44a0-8aa0-0ed8f9a6fa87"></video></td>
</tr>
</tbody>
</table>
## Installation & Deployment 📥
- Try to avoid using **Chinese paths** to prevent unpredictable issues
- Ensure your **network** is stable, meaning you can access foreign websites normally
#### ① Clone the Project
```shell
git clone https://github.com/harry0703/MoneyPrinterTurbo.git
```
#### ② Modify the Configuration File
- Copy the `config.example.toml` file and rename it to `config.toml`
- Follow the instructions in the `config.toml` file to configure `pexels_api_keys` and `llm_provider`, and according to
the llm_provider's service provider, set up the corresponding API Key
#### ③ Configure Large Language Models (LLM)
- To use `GPT-4.0` or `GPT-3.5`, you need an `API Key` from `OpenAI`. If you don't have one, you can set `llm_provider`
to `g4f` (a free-to-use GPT library https://github.com/xtekky/gpt4free)
### Docker Deployment 🐳
#### ① Launch the Docker Container
If you haven't installed Docker, please install it first https://www.docker.com/products/docker-desktop/
If you are using a Windows system, please refer to Microsoft's documentation:
1. https://learn.microsoft.com/en-us/windows/wsl/install
2. https://learn.microsoft.com/en-us/windows/wsl/tutorials/wsl-containers
```shell
cd MoneyPrinterTurbo
docker-compose up
```
#### ② Access the Web Interface
Open your browser and visit http://0.0.0.0:8501
#### ③ Access the API Interface
Open your browser and visit http://0.0.0.0:8080/docs Or http://0.0.0.0:8080/redoc
### Manual Deployment 📦
#### ① Create a Python Virtual Environment
It is recommended to create a Python virtual environment
using [conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html)
@ -85,71 +171,45 @@ conda activate MoneyPrinterTurbo
pip install -r requirements.txt
```
## Quick Start 🚀
#### ② Install ImageMagick
### Video Tutorials
- Complete usage demonstration: https://v.douyin.com/iFhnwsKY/
- How to deploy on Windows: https://v.douyin.com/iFyjoW3M
### Prerequisites
- Try to avoid using **Chinese paths** to prevent unpredictable issues
- Ensure your **network** is stable, meaning you can access foreign websites normally
#### ① Install ImageMagick
##### Windows:
###### Windows:
- Download https://imagemagick.org/archive/binaries/ImageMagick-7.1.1-29-Q16-x64-static.exe
- Install the downloaded ImageMagick, do not change the installation path
- Install the downloaded ImageMagick, **do not change the installation path**
- Modify the `config.toml` configuration file, set `imagemagick_path` to your actual installation path (if you didn't
change the path during installation, just uncomment it)
##### MacOS:
###### MacOS:
```shell
brew install imagemagick
```
##### Ubuntu
###### Ubuntu
```shell
sudo apt-get install imagemagick
```
##### CentOS
###### CentOS
```shell
sudo yum install ImageMagick
```
#### ② Modify the Configuration File
- Copy the `config.example.toml` file and rename it to `config.toml`
- Follow the instructions in the `config.toml` file to configure `pexels_api_keys` and `llm_provider`, and according to
the llm_provider's service provider, set up the corresponding API Key
- If it's a `Windows` system, `imagemagick_path` is your actual installation path (if you didn't change the path during
installation, just uncomment it)
#### ③ Configure Large Language Models (LLM)
- To use `GPT-4.0` or `GPT-3.5`, you need an `API Key` from `OpenAI`. If you don't have one, you can set `llm_provider`
to `g4f` (a free-to-use GPT library https://github.com/xtekky/gpt4free)
- Alternatively, you can apply at [Moonshot](https://platform.moonshot.cn/console/api-keys). Register to get 15 yuan of
trial money, which allows for about 1500 conversations. Then set `llm_provider="moonshot"` and `moonshot_api_key`.
Thanks to [@jerryblues](https://github.com/harry0703/MoneyPrinterTurbo/issues/8) for the suggestion
### Launch the Web Interface 🌐
#### ③ Launch the Web Interface 🌐
Note that you need to execute the following commands in the `root directory` of the MoneyPrinterTurbo project
#### Windows
###### Windows
```bat
conda activate MoneyPrinterTurbo
webui.bat
```
#### MacOS or Linux
###### MacOS or Linux
```shell
conda activate MoneyPrinterTurbo
@ -158,10 +218,7 @@ sh webui.sh
After launching, the browser will open automatically
The effect is shown in the following image:
![](docs/webui.jpg)
### Launch the API Service 🚀
#### ④ Launch the API Service 🚀
```shell
python main.py
@ -170,9 +227,6 @@ python main.py
After launching, you can view the `API documentation` at http://127.0.0.1:8080/docs and directly test the interface
online for a quick experience.
The effect is shown in the following image:
![](docs/api.jpg)
## Voice Synthesis 🗣
A list of all supported voices can be viewed here: [Voice List](./docs/voice-list.txt)
@ -206,6 +260,20 @@ own fonts.
## Common Questions 🤔
### ❓How to Use the Free OpenAI GPT-3.5 Model?
[OpenAI has announced that ChatGPT with 3.5 is now free](https://openai.com/blog/start-using-chatgpt-instantly), and developers have wrapped it into an API for direct usage.
**Ensure you have Docker installed and running**. Execute the following command to start the Docker service:
```shell
docker run -p 3040:3040 missuo/freegpt35
```
Once successfully started, modify the `config.toml` configuration as follows:
- Set `llm_provider` to `openai`
- Fill in `openai_api_key` with any value, for example, '123456'
- Change `openai_base_url` to `http://localhost:3040/v1/`
- Set `openai_model_name` to `gpt-3.5-turbo`
### ❓RuntimeError: No ffmpeg exe could be found
Normally, ffmpeg will be automatically downloaded and detected.
@ -248,7 +316,7 @@ This is likely due to network issues preventing access to foreign services. Plea
[issue 33](https://github.com/harry0703/MoneyPrinterTurbo/issues/33)
1. Follow the `example configuration` provided `download address` to
install https://imagemagick.org/archive/binaries/ImageMagick-7.1.1-29-Q16-x64-static.exe, using the static library
install https://imagemagick.org/archive/binaries/ImageMagick-7.1.1-30-Q16-x64-static.exe, using the static library
2. Do not install in a path with Chinese characters to avoid unpredictable issues
[issue 54](https://github.com/harry0703/MoneyPrinterTurbo/issues/54#issuecomment-2017842022)
@ -270,3 +338,7 @@ optimizations and added functionalities. Thanks to the original author for their
## License 📝
Click to view the [`LICENSE`](LICENSE) file
## Star History
[![Star History Chart](https://api.star-history.com/svg?repos=harry0703/MoneyPrinterTurbo&type=Date)](https://star-history.com/#harry0703/MoneyPrinterTurbo&Date)

272
README.md
View File

@ -1,11 +1,28 @@
# MoneyPrinterTurbo 💸
<div align="center">
<h1 align="center">MoneyPrinterTurbo 💸</h1>
[English](README-en.md)
<p align="center">
<a href="https://github.com/harry0703/MoneyPrinterTurbo/stargazers"><img src="https://img.shields.io/github/stars/harry0703/MoneyPrinterTurbo.svg?style=for-the-badge" alt="Stargazers"></a>
<a href="https://github.com/harry0703/MoneyPrinterTurbo/issues"><img src="https://img.shields.io/github/issues/harry0703/MoneyPrinterTurbo.svg?style=for-the-badge" alt="Issues"></a>
<a href="https://github.com/harry0703/MoneyPrinterTurbo/network/members"><img src="https://img.shields.io/github/forks/harry0703/MoneyPrinterTurbo.svg?style=for-the-badge" alt="Forks"></a>
<a href="https://github.com/harry0703/MoneyPrinterTurbo/blob/main/LICENSE"><img src="https://img.shields.io/github/license/harry0703/MoneyPrinterTurbo.svg?style=for-the-badge" alt="License"></a>
</p>
<br>
<h3>简体中文 | <a href="README-en.md">English</a></h3>
<br>
只需提供一个视频 <b>主题</b><b>关键词</b> ,就可以全自动生成视频文案、视频素材、视频字幕、视频背景音乐,然后合成一个高清的短视频。
<br>
只需提供一个视频 **主题****关键词** ,就可以全自动生成视频文案、视频素材、视频字幕、视频背景音乐,然后合成一个高清的短视频。
<h4>Web界面</h4>
![](docs/webui.jpg)
<h4>API界面</h4>
![](docs/api.jpg)
</div>
## 特别感谢 🙏
由于该项目的 **部署****使用**,对于一些小白用户来说,还是 **有一定的门槛**,在此特别感谢
@ -19,7 +36,7 @@
## 功能特性 🎯
- [x] 完整的 **MVC架构**,代码 **结构清晰**易于维护支持API和Web界面
- [x] 完整的 **MVC架构**,代码 **结构清晰**,易于维护,支持 `API` `Web界面`
- [x] 支持视频文案 **AI自动生成**,也可以**自定义文案**
- [x] 支持多种 **高清视频** 尺寸
- [x] 竖屏 9:16`1080x1920`
@ -31,40 +48,117 @@
- [x] 支持 **字幕生成**,可以调整 `字体`、`位置`、`颜色`、`大小`,同时支持`字幕描边`设置
- [x] 支持 **背景音乐**,随机或者指定音乐文件,可设置`背景音乐音量`
- [x] 视频素材来源 **高清**,而且 **无版权**
- [x] 支持 **OpenAI**、**moonshot**、**Azure**、**gpt4free**、**one-api**、**通义千问** 等多种模型接入
- [x] 支持 **OpenAI**、**moonshot**、**Azure**、**gpt4free**、**one-api**、**通义千问**、**Google Gemini**、**Ollama** 等多种模型接入
❓[如何使用免费的 **OpenAI GPT-3.5** 模型?](https://github.com/harry0703/MoneyPrinterTurbo?tab=readme-ov-file#%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98-)
### 后期计划 📅
- [ ] GPT-SoVITS 配音支持
- [ ] 优化语音合成,利用大模型,使其合成的声音,更加自然,情绪更加丰富
- [ ] 增加视频转场效果,使其看起来更加的流畅
- [ ] 优化视频素材的匹配度
- [ ] OLLAMA 支持
- [ ] 增加更多视频素材来源,优化视频素材和文案的匹配度
- [ ] 增加视频长度选项:短、中、长
- [ ] 打包成一键启动包WindowsmacOS方便使用
- [ ] 增加免费网络代理让访问OpenAI和素材下载不再受限
- [ ] 可以使用自己的素材
- [ ] 朗读声音和背景音乐,提供实时试听
- [ ] 支持更多的语音合成服务商,比如 OpenAI TTS, Azure TTS
- [ ] 自动上传到YouTube平台
## 视频演示 📺
### 竖屏 9:16
▶️ 《如何增加生活的乐趣》
https://github.com/harry0703/MoneyPrinterTurbo/assets/4928832/a84d33d5-27a2-4aba-8fd0-9fb2bd91c6a6
▶️ 《生命的意义是什么》
https://github.com/harry0703/MoneyPrinterTurbo/assets/4928832/112c9564-d52b-4472-99ad-970b75f66476
<table>
<thead>
<tr>
<th align="center"><g-emoji class="g-emoji" alias="arrow_forward">▶️</g-emoji> 《如何增加生活的乐趣》</th>
<th align="center"><g-emoji class="g-emoji" alias="arrow_forward">▶️</g-emoji> 《生命的意义是什么》</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center"><video src="https://github.com/harry0703/MoneyPrinterTurbo/assets/4928832/a84d33d5-27a2-4aba-8fd0-9fb2bd91c6a6"></video></td>
<td align="center"><video src="https://github.com/harry0703/MoneyPrinterTurbo/assets/4928832/112c9564-d52b-4472-99ad-970b75f66476"></video></td>
</tr>
</tbody>
</table>
### 横屏 16:9
▶️《生命的意义是什么》
https://github.com/harry0703/MoneyPrinterTurbo/assets/4928832/346ebb15-c55f-47a9-a653-114f08bb8073
▶️《为什么要运动》
https://github.com/harry0703/MoneyPrinterTurbo/assets/4928832/271f2fae-8283-44a0-8aa0-0ed8f9a6fa87
<table>
<thead>
<tr>
<th align="center"><g-emoji class="g-emoji" alias="arrow_forward">▶️</g-emoji>《生命的意义是什么》</th>
<th align="center"><g-emoji class="g-emoji" alias="arrow_forward">▶️</g-emoji>《为什么要运动》</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center"><video src="https://github.com/harry0703/MoneyPrinterTurbo/assets/4928832/346ebb15-c55f-47a9-a653-114f08bb8073"></video></td>
<td align="center"><video src="https://github.com/harry0703/MoneyPrinterTurbo/assets/4928832/271f2fae-8283-44a0-8aa0-0ed8f9a6fa87"></video></td>
</tr>
</tbody>
</table>
## 安装部署 📥
- 尽量不要使用 **中文路径**,避免出现一些无法预料的问题
- 请确保你的 **网络** 是正常的VPN需要打开`全局流量`模式
#### ① 克隆代码
```shell
git clone https://github.com/harry0703/MoneyPrinterTurbo.git
```
#### ② 修改配置文件
- 将 `config.example.toml` 文件复制一份,命名为 `config.toml`
- 按照 `config.toml` 文件中的说明,配置好 `pexels_api_keys``llm_provider`,并根据 llm_provider 对应的服务商,配置相关的
API Key
#### ③ 配置大模型(LLM)
- 如果要使用 `GPT-4.0``GPT-3.5`,需要有 `OpenAI``API Key`,如果没有,可以将 `llm_provider` 设置为 `g4f` (
一个免费使用GPT的开源库 https://github.com/xtekky/gpt4free ,但是该免费的服务,稳定性较差,有时候可以用,有时候用不了)
- 或者可以使用到 [月之暗面](https://platform.moonshot.cn/console/api-keys) 申请。注册就送
15元体验金可以对话1500次左右。然后设置 `llm_provider="moonshot"``moonshot_api_key`
- 也可以使用 通义千问,具体请看配置文件里面的注释说明
### Docker部署 🐳
#### ① 启动Docker
如果未安装 Docker请先安装 https://www.docker.com/products/docker-desktop/
如果是Windows系统请参考微软的文档
1. https://learn.microsoft.com/zh-cn/windows/wsl/install
2. https://learn.microsoft.com/zh-cn/windows/wsl/tutorials/wsl-containers
```shell
cd MoneyPrinterTurbo
docker-compose up
```
#### ② 访问Web界面
打开浏览器,访问 http://0.0.0.0:8501
#### ③ 访问API文档
打开浏览器,访问 http://0.0.0.0:8080/docs 或者 http://0.0.0.0:8080/redoc
### 手动部署 📦
> 视频教程
- 完整的使用演示https://v.douyin.com/iFhnwsKY/
- 如何在Windows上部署https://v.douyin.com/iFyjoW3M
#### ① 创建虚拟环境
建议使用 [conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) 创建 python 虚拟环境
```shell
@ -75,91 +169,58 @@ conda activate MoneyPrinterTurbo
pip install -r requirements.txt
```
## 快速使用 🚀
#### ② 安装好 ImageMagick
### 视频教程
###### Windows:
- 完整的使用演示https://v.douyin.com/iFhnwsKY/
- 如何在Windows上部署https://v.douyin.com/iFyjoW3M
### 前提
- 尽量不要使用 **中文路径**,避免出现一些无法预料的问题
- 请确保你的 **网络** 是正常的,即可以正常访问境外网站
#### ① 安装好 ImageMagick
##### Windows:
- 下载 https://imagemagick.org/archive/binaries/ImageMagick-7.1.1-29-Q16-x64-static.exe
- 下载 https://imagemagick.org/archive/binaries/ImageMagick-7.1.1-30-Q16-x64-static.exe
- 安装下载好的 ImageMagick注意不要修改安装路径
- 修改 `配置文件 config.toml` 中的 `imagemagick_path` 为你的实际安装路径(如果安装的时候没有修改路径,直接取消注释即可)
##### MacOS:
###### MacOS:
```shell
brew install imagemagick
```
##### Ubuntu
###### Ubuntu
```shell
sudo apt-get install imagemagick
```
##### CentOS
###### CentOS
```shell
sudo yum install ImageMagick
```
#### ② 修改配置文件
- 将 `config.example.toml` 文件复制一份,命名为 `config.toml`
- 按照 `config.toml` 文件中的说明,配置好 `pexels_api_keys``llm_provider`,并根据 llm_provider 对应的服务商,配置相关的
API Key
- 如果是`Windows`系统,`imagemagick_path` 为你的实际安装路径(如果安装的时候没有修改路径,直接取消注释即可)
#### ③ 配置大模型(LLM)
- 如果要使用 `GPT-4.0``GPT-3.5`,需要有 `OpenAI``API Key`,如果没有,可以将 `llm_provider` 设置为 `g4f` (
一个免费使用GPT的开源库 https://github.com/xtekky/gpt4free)
- 或者可以使用到 [月之暗面](https://platform.moonshot.cn/console/api-keys) 申请。注册就送
15元体验金可以对话1500次左右。然后设置 `llm_provider="moonshot"``moonshot_api_key`
。感谢 [@jerryblues](https://github.com/harry0703/MoneyPrinterTurbo/issues/8) 的建议
### 启动Web界面 🌐
#### ③ 启动Web界面 🌐
注意需要到 MoneyPrinterTurbo 项目 `根目录` 下执行以下命令
#### Windows
###### Windows
```bat
conda activate MoneyPrinterTurbo
webui.bat
```
#### MacOS or Linux
###### MacOS or Linux
```shell
conda activate MoneyPrinterTurbo
sh webui.sh
```
启动后,会自动打开浏览器
效果如下图:
![](docs/webui.jpg)
### 启动API服务 🚀
#### ④ 启动API服务 🚀
```shell
python main.py
```
启动后,可以查看 `API文档` http://127.0.0.1:8080/docs 直接在线调试接口,快速体验。
效果如下图:
![](docs/api.jpg)
启动后,可以查看 `API文档` http://127.0.0.1:8080/docs 或者 http://127.0.0.1:8080/redoc 直接在线调试接口,快速体验。
## 语音合成 🗣
@ -170,13 +231,15 @@ python main.py
当前支持2种字幕生成方式
- edge: 生成速度更快,性能更好,对电脑配置没有要求,但是质量可能不稳定
- whisper: 生成速度较慢,性能较差,对电脑配置有一定要求,但是质量更可靠
- whisper: 生成速度较慢,性能较差,对电脑配置有一定要求,但是质量更可靠
可以修改 `config.toml` 配置文件中的 `subtitle_provider` 进行切换
建议使用 `edge` 模式,如果生成的字幕质量不好,再切换到 `whisper` 模式
> 如果留空,表示不生成字幕。
> 注意:
1. whisper 模式下需要到 HuggingFace 下载一个模型文件,大约 3GB 左右,请确保网络通畅
2. 如果留空,表示不生成字幕。
## 背景音乐 🎵
@ -189,6 +252,26 @@ python main.py
## 常见问题 🤔
### ❓如何使用免费的OpenAI GPT-3.5模型?
[OpenAI宣布ChatGPT里面3.5已经免费了](https://openai.com/blog/start-using-chatgpt-instantly)有开发者将其封装成了API可以直接调用
**确保你安装和启动了docker服务**执行以下命令启动docker服务
```shell
docker run -p 3040:3040 missuo/freegpt35
```
启动成功后,修改 `config.toml` 中的配置
- `llm_provider` 设置为 `openai`
- `openai_api_key` 随便填写一个即可,比如 '123456'
- `openai_base_url` 改为 `http://localhost:3040/v1/`
- `openai_model_name` 改为 `gpt-3.5-turbo`
### ❓AttributeError: 'str' object has no attribute 'choices'
这个问题是由于 OpenAI 或者其他 LLM没有返回正确的回复导致的。
大概率是网络原因, 使用 **VPN**,或者设置 `openai_base_url` 为你的代理 ,应该就可以解决了。
### ❓RuntimeError: No ffmpeg exe could be found
通常情况下ffmpeg 会被自动下载,并且会被自动检测到。
@ -239,6 +322,55 @@ if you are in China, please use a VPN.
感谢 [@wangwenqiao666](https://github.com/wangwenqiao666)的研究探索
### ❓ImageMagick的安全策略阻止了与临时文件@/tmp/tmpur5hyyto.txt相关的操作
[issue 92](https://github.com/harry0703/MoneyPrinterTurbo/issues/92)
可以在ImageMagick的配置文件policy.xml中找到这些策略。
这个文件通常位于 /etc/ImageMagick-`X`/ 或 ImageMagick 安装目录的类似位置。
修改包含`pattern="@"`的条目,将`rights="none"`更改为`rights="read|write"`以允许对文件的读写操作。
感谢 [@chenhengzh](https://github.com/chenhengzh)的研究探索
### ❓OSError: [Errno 24] Too many open files
[issue 100](https://github.com/harry0703/MoneyPrinterTurbo/issues/100)
这个问题是由于系统打开文件数限制导致的,可以通过修改系统的文件打开数限制来解决。
查看当前限制
```shell
ulimit -n
```
如果过低,可以调高一些,比如
```shell
ulimit -n 10240
```
### ❓AttributeError: module 'PIL.Image' has no attribute 'ANTIALIAS'
[issue 101](https://github.com/harry0703/MoneyPrinterTurbo/issues/101),
[issue 83](https://github.com/harry0703/MoneyPrinterTurbo/issues/83),
[issue 70](https://github.com/harry0703/MoneyPrinterTurbo/issues/70)
先看下当前的 Pillow 版本是多少
```shell
pip list |grep Pillow
```
如果是 10.x 的版本,可以尝试下降级看看,有用户反馈降级后正常
```shell
pip uninstall Pillow
pip install Pillow==9.5.0
# 或者降级到 8.4.0
pip install Pillow==8.4.0
```
## 反馈建议 📢
- 可以提交 [issue](https://github.com/harry0703/MoneyPrinterTurbo/issues)
@ -261,3 +393,7 @@ if you are in China, please use a VPN.
点击查看 [`LICENSE`](LICENSE) 文件
## Star History
[![Star History Chart](https://api.star-history.com/svg?repos=harry0703/MoneyPrinterTurbo&type=Date)](https://star-history.com/#harry0703/MoneyPrinterTurbo&Date)

View File

@ -1,10 +1,12 @@
"""Application implementation - ASGI."""
import os
from fastapi import FastAPI, Request
from fastapi.exceptions import RequestValidationError
from fastapi.responses import JSONResponse
from loguru import logger
from fastapi.staticfiles import StaticFiles
from fastapi.middleware.cors import CORSMiddleware
from app.config import config
from app.models.exception import HttpException
@ -46,6 +48,21 @@ def get_application() -> FastAPI:
app = get_application()
# Configures the CORS middleware for the FastAPI app
cors_allowed_origins_str = os.getenv("CORS_ALLOWED_ORIGINS", "")
origins = cors_allowed_origins_str.split(",") if cors_allowed_origins_str else ["*"]
app.add_middleware(
CORSMiddleware,
allow_origins=origins,
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
task_dir = utils.task_dir()
app.mount("/tasks", StaticFiles(directory=task_dir, html=True, follow_symlink=True), name="")
public_dir = utils.public_dir()
app.mount("/", StaticFiles(directory=public_dir, html=True), name="")

View File

@ -8,7 +8,7 @@ from app.utils import utils
def __init_logger():
_log_file = utils.storage_dir("logs/server.log")
# _log_file = utils.storage_dir("logs/server.log")
_lvl = config.log_level
root_dir = os.path.dirname(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))
@ -36,16 +36,16 @@ def __init_logger():
colorize=True,
)
logger.add(
_log_file,
level=_lvl,
format=format_record,
rotation="00:00",
retention="3 days",
backtrace=True,
diagnose=True,
enqueue=True,
)
# logger.add(
# _log_file,
# level=_lvl,
# format=format_record,
# rotation="00:00",
# retention="3 days",
# backtrace=True,
# diagnose=True,
# enqueue=True,
# )
__init_logger()

View File

@ -5,10 +5,24 @@ from loguru import logger
root_dir = os.path.dirname(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))
config_file = f"{root_dir}/config.toml"
if not os.path.isfile(config_file):
example_file = f"{root_dir}/config.example.toml"
if os.path.isfile(example_file):
import shutil
shutil.copyfile(example_file, config_file)
logger.info(f"copy config.example.toml to config.toml")
logger.info(f"load config from file: {config_file}")
with open(config_file, mode="rb") as fp:
_cfg = tomli.load(fp)
try:
with open(config_file, mode="rb") as fp:
_cfg = tomli.load(fp)
except Exception as e:
logger.warning(f"load config failed: {str(e)}, try to load as utf-8-sig")
with open(config_file, mode="r", encoding='utf-8-sig') as fp:
_cfg_content = fp.read()
_cfg = tomli.loads(_cfg_content)
app = _cfg.get("app", {})
whisper = _cfg.get("whisper", {})
@ -20,8 +34,9 @@ log_level = _cfg.get("log_level", "DEBUG")
listen_host = _cfg.get("listen_host", "0.0.0.0")
listen_port = _cfg.get("listen_port", 8080)
project_name = _cfg.get("project_name", "MoneyPrinterTurbo")
project_description = _cfg.get("project_description", "MoneyPrinterTurbo\n by 抖音-网旭哈瑞.AI")
project_version = _cfg.get("project_version", "1.0.0")
project_description = _cfg.get("project_description",
"<a href='https://github.com/harry0703/MoneyPrinterTurbo'>https://github.com/harry0703/MoneyPrinterTurbo</a>")
project_version = _cfg.get("project_version", "1.0.1")
reload_debug = False
imagemagick_path = app.get("imagemagick_path", "")

31
app/controllers/v1/llm.py Normal file
View File

@ -0,0 +1,31 @@
from fastapi import Request

from app.controllers.v1.base import new_router
from app.models.schema import VideoScriptResponse, VideoScriptRequest, VideoTermsResponse, VideoTermsRequest
from app.services import llm
from app.utils import utils

# Authentication dependency (disabled by default; uncomment to require a token):
# router = new_router(dependencies=[Depends(base.verify_token)])
router = new_router()


@router.post("/scripts", response_model=VideoScriptResponse, summary="Create a script for the video")
def generate_video_script(request: Request, body: VideoScriptRequest):
    """Generate a video script with the configured LLM for the requested subject,
    language and paragraph count, and wrap it in the standard API envelope."""
    video_script = llm.generate_script(video_subject=body.video_subject,
                                       language=body.video_language,
                                       paragraph_number=body.paragraph_number)
    response = {
        "video_script": video_script
    }
    return utils.get_response(200, response)


@router.post("/terms", response_model=VideoTermsResponse, summary="Generate video terms based on the video script")
def generate_video_terms(request: Request, body: VideoTermsRequest):
    """Derive `amount` search terms from the subject/script via the LLM,
    for use as video-material search keywords."""
    video_terms = llm.generate_terms(video_subject=body.video_subject,
                                     video_script=body.video_script,
                                     amount=body.amount)
    response = {
        "video_terms": video_terms
    }
    return utils.get_response(200, response)

View File

@ -1,13 +1,17 @@
from os import path
from fastapi import Request, Depends, Path
import os
import glob
from fastapi import Request, Depends, Path, BackgroundTasks, UploadFile
from fastapi.params import File
from loguru import logger
from app.config import config
from app.controllers import base
from app.controllers.v1.base import new_router
from app.models.exception import HttpException
from app.models.schema import TaskVideoRequest, TaskQueryResponse, TaskResponse, TaskQueryRequest
from app.models.schema import TaskVideoRequest, TaskQueryResponse, TaskResponse, TaskQueryRequest, \
BgmUploadResponse, BgmRetrieveResponse
from app.services import task as tm
from app.services import state as sm
from app.utils import utils
# 认证依赖项
@ -15,30 +19,95 @@ from app.utils import utils
router = new_router()
@router.post("/videos", response_model=TaskResponse, summary="使用主题来生成短视频")
def create_video(request: Request, body: TaskVideoRequest):
@router.post("/videos", response_model=TaskResponse, summary="Generate a short video")
def create_video(background_tasks: BackgroundTasks, request: Request, body: TaskVideoRequest):
task_id = utils.get_uuid()
request_id = base.get_task_id(request)
try:
task = {
"task_id": task_id,
"request_id": request_id,
"params": body.dict(),
}
body_dict = body.dict()
task.update(body_dict)
result = tm.start(task_id=task_id, params=body)
task["result"] = result
sm.update_task(task_id)
background_tasks.add_task(tm.start, task_id=task_id, params=body)
logger.success(f"video created: {utils.to_json(task)}")
return utils.get_response(200, task)
except ValueError as e:
raise HttpException(task_id=task_id, status_code=400, message=f"{request_id}: {str(e)}")
@router.get("/tasks/{task_id}", response_model=TaskQueryResponse, summary="查询任务状态")
def get_task(request: Request, task_id: str = Path(..., description="任务ID"),
query: TaskQueryRequest = Depends()):
@router.get("/tasks/{task_id}", response_model=TaskQueryResponse, summary="Query task status")
def get_task(request: Request, task_id: str = Path(..., description="Task ID"),
query: TaskQueryRequest = Depends()):
endpoint = config.app.get("endpoint", "")
if not endpoint:
endpoint = str(request.base_url)
endpoint = endpoint.rstrip("/")
request_id = base.get_task_id(request)
data = query.dict()
data["task_id"] = task_id
raise HttpException(task_id=task_id, status_code=404,
message=f"{request_id}: task not found", data=data)
task = sm.get_task(task_id)
if task:
task_dir = utils.task_dir()
def file_to_uri(file):
    """Map a local task-output path to a public URL under `endpoint`.

    Paths already starting with `endpoint` are returned unchanged.
    Uses the enclosing scope's `endpoint` and `task_dir`.
    """
    if not file.startswith(endpoint):
        # Bug fix: rewrite the `file` argument itself, not `v` — `v` was the
        # caller's loop variable leaking in through the closure, which only
        # worked by accident when `file_to_uri(v)` happened to be the call.
        _uri_path = file.replace(task_dir, "tasks").replace("\\", "/")
        _uri_path = f"{endpoint}/{_uri_path}"
    else:
        _uri_path = file
    return _uri_path
if "videos" in task:
videos = task["videos"]
urls = []
for v in videos:
urls.append(file_to_uri(v))
task["videos"] = urls
if "combined_videos" in task:
combined_videos = task["combined_videos"]
urls = []
for v in combined_videos:
urls.append(file_to_uri(v))
task["combined_videos"] = urls
return utils.get_response(200, task)
raise HttpException(task_id=task_id, status_code=404, message=f"{request_id}: task not found")
@router.get("/musics", response_model=BgmRetrieveResponse, summary="Retrieve local BGM files")
def get_bgm_list(request: Request):
    """List every *.mp3 file in the local songs directory.

    Returns a 200 response whose data contains, for each file, its base
    name, size in bytes, and absolute path.
    """
    pattern = os.path.join(utils.song_dir(), "*.mp3")
    bgm_list = [
        {
            "name": os.path.basename(mp3_path),
            "size": os.path.getsize(mp3_path),
            "file": mp3_path,
        }
        for mp3_path in glob.glob(pattern)
    ]
    return utils.get_response(200, {"files": bgm_list})
@router.post("/musics", response_model=BgmUploadResponse, summary="Upload the BGM file to the songs directory")
def upload_bgm_file(request: Request, file: UploadFile = File(...)):
    """Save an uploaded .mp3 file into the songs directory.

    Returns a 200 response with the saved file path on success.

    Raises:
        HttpException: 400 when the upload is not an .mp3 file.
    """
    request_id = base.get_task_id(request)
    # Bug fix: the original check `endswith('mp3')` also accepted names like
    # "songmp3"; require the full ".mp3" extension (case-insensitive).
    if file.filename and file.filename.lower().endswith(".mp3"):
        song_dir = utils.song_dir()
        # Security: the client controls `filename`; strip any directory parts
        # so a name like "../../etc/x.mp3" cannot escape the songs directory.
        save_path = os.path.join(song_dir, os.path.basename(file.filename))
        # save file
        with open(save_path, "wb+") as buffer:
            # If the file already exists, it will be overwritten
            file.file.seek(0)
            buffer.write(file.file.read())
        return utils.get_response(200, {"file": save_path})
    raise HttpException('', status_code=400, message=f"{request_id}: Only *.mp3 files can be uploaded")

View File

@ -1,4 +1,8 @@
punctuations = [
"?", ",", ".", "", ";", ":",
"", "", "", "", "", "",
PUNCTUATIONS = [
"?", ",", ".", "", ";", ":", "!", "",
"", "", "", "", "", "", "", "...",
]
TASK_STATE_FAILED = -1
TASK_STATE_COMPLETE = 1
TASK_STATE_PROCESSING = 4

View File

@ -34,43 +34,43 @@ class MaterialInfo:
duration: int = 0
VoiceNames = [
# zh-CN
"female-zh-CN-XiaoxiaoNeural",
"female-zh-CN-XiaoyiNeural",
"female-zh-CN-liaoning-XiaobeiNeural",
"female-zh-CN-shaanxi-XiaoniNeural",
"male-zh-CN-YunjianNeural",
"male-zh-CN-YunxiNeural",
"male-zh-CN-YunxiaNeural",
"male-zh-CN-YunyangNeural",
# "female-zh-HK-HiuGaaiNeural",
# "female-zh-HK-HiuMaanNeural",
# "male-zh-HK-WanLungNeural",
#
# "female-zh-TW-HsiaoChenNeural",
# "female-zh-TW-HsiaoYuNeural",
# "male-zh-TW-YunJheNeural",
# en-US
"female-en-US-AnaNeural",
"female-en-US-AriaNeural",
"female-en-US-AvaNeural",
"female-en-US-EmmaNeural",
"female-en-US-JennyNeural",
"female-en-US-MichelleNeural",
"male-en-US-AndrewNeural",
"male-en-US-BrianNeural",
"male-en-US-ChristopherNeural",
"male-en-US-EricNeural",
"male-en-US-GuyNeural",
"male-en-US-RogerNeural",
"male-en-US-SteffanNeural",
]
# VoiceNames = [
# # zh-CN
# "female-zh-CN-XiaoxiaoNeural",
# "female-zh-CN-XiaoyiNeural",
# "female-zh-CN-liaoning-XiaobeiNeural",
# "female-zh-CN-shaanxi-XiaoniNeural",
#
# "male-zh-CN-YunjianNeural",
# "male-zh-CN-YunxiNeural",
# "male-zh-CN-YunxiaNeural",
# "male-zh-CN-YunyangNeural",
#
# # "female-zh-HK-HiuGaaiNeural",
# # "female-zh-HK-HiuMaanNeural",
# # "male-zh-HK-WanLungNeural",
# #
# # "female-zh-TW-HsiaoChenNeural",
# # "female-zh-TW-HsiaoYuNeural",
# # "male-zh-TW-YunJheNeural",
#
# # en-US
#
# "female-en-US-AnaNeural",
# "female-en-US-AriaNeural",
# "female-en-US-AvaNeural",
# "female-en-US-EmmaNeural",
# "female-en-US-JennyNeural",
# "female-en-US-MichelleNeural",
#
# "male-en-US-AndrewNeural",
# "male-en-US-BrianNeural",
# "male-en-US-ChristopherNeural",
# "male-en-US-EricNeural",
# "male-en-US-GuyNeural",
# "male-en-US-RogerNeural",
# "male-en-US-SteffanNeural",
# ]
class VideoParams:
@ -97,7 +97,8 @@ class VideoParams:
video_language: Optional[str] = "" # auto detect
voice_name: Optional[str] = VoiceNames[0]
voice_name: Optional[str] = ""
voice_volume: Optional[float] = 1.0
bgm_type: Optional[str] = "random"
bgm_file: Optional[str] = ""
bgm_volume: Optional[float] = 0.2
@ -115,6 +116,32 @@ class VideoParams:
paragraph_number: Optional[int] = 1
class VideoScriptParams:
"""
{
"video_subject": "春天的花海",
"video_language": "",
"paragraph_number": 1
}
"""
video_subject: Optional[str] = "春天的花海"
video_language: Optional[str] = ""
paragraph_number: Optional[int] = 1
class VideoTermsParams:
"""
{
"video_subject": "",
"video_script": "",
"amount": 5
}
"""
video_subject: Optional[str] = "春天的花海"
video_script: Optional[str] = "春天的花海,如诗如画般展现在眼前。万物复苏的季节里,大地披上了一袭绚丽多彩的盛装。金黄的迎春、粉嫩的樱花、洁白的梨花、艳丽的郁金香……"
amount: Optional[int] = 5
class BaseResponse(BaseModel):
status: int = 200
message: Optional[str] = 'success'
@ -129,6 +156,14 @@ class TaskQueryRequest(BaseModel):
pass
class VideoScriptRequest(VideoScriptParams, BaseModel):
pass
class VideoTermsRequest(VideoTermsParams, BaseModel):
pass
######################################################################################################
######################################################################################################
######################################################################################################
@ -136,10 +171,94 @@ class TaskQueryRequest(BaseModel):
class TaskResponse(BaseResponse):
class TaskResponseData(BaseModel):
task_id: str
task_type: str = ""
data: TaskResponseData
class Config:
json_schema_extra = {
"example": {
"status": 200,
"message": "success",
"data": {
"task_id": "6c85c8cc-a77a-42b9-bc30-947815aa0558"
}
},
}
class TaskQueryResponse(BaseResponse):
pass
class Config:
json_schema_extra = {
"example": {
"status": 200,
"message": "success",
"data": {
"state": 1,
"progress": 100,
"videos": [
"http://127.0.0.1:8080/tasks/6c85c8cc-a77a-42b9-bc30-947815aa0558/final-1.mp4"
],
"combined_videos": [
"http://127.0.0.1:8080/tasks/6c85c8cc-a77a-42b9-bc30-947815aa0558/combined-1.mp4"
]
}
},
}
class VideoScriptResponse(BaseResponse):
class Config:
json_schema_extra = {
"example": {
"status": 200,
"message": "success",
"data": {
"video_script": "春天的花海,是大自然的一幅美丽画卷。在这个季节里,大地复苏,万物生长,花朵争相绽放,形成了一片五彩斑斓的花海..."
}
},
}
class VideoTermsResponse(BaseResponse):
class Config:
json_schema_extra = {
"example": {
"status": 200,
"message": "success",
"data": {
"video_terms": ["sky", "tree"]
}
},
}
class BgmRetrieveResponse(BaseResponse):
class Config:
json_schema_extra = {
"example": {
"status": 200,
"message": "success",
"data": {
"files": [
{
"name": "output013.mp3",
"size": 1891269,
"file": "/MoneyPrinterTurbo/resource/songs/output013.mp3"
}
]
}
},
}
class BgmUploadResponse(BaseResponse):
class Config:
json_schema_extra = {
"example": {
"status": 200,
"message": "success",
"data": {
"file": "/MoneyPrinterTurbo/resource/songs/example.mp3"
}
},
}

View File

@ -8,8 +8,9 @@ Resources:
"""
from fastapi import APIRouter
from app.controllers.v1 import video
from app.controllers.v1 import video, llm
root_api_router = APIRouter()
# v1
root_api_router.include_router(video.router)
root_api_router.include_router(llm.router)

View File

@ -5,9 +5,9 @@ from typing import List
from loguru import logger
from openai import OpenAI
from openai import AzureOpenAI
import google.generativeai as genai
from app.config import config
def _generate_response(prompt: str) -> str:
content = ""
llm_provider = config.app.get("llm_provider", "openai")
@ -27,6 +27,13 @@ def _generate_response(prompt: str) -> str:
api_key = config.app.get("moonshot_api_key")
model_name = config.app.get("moonshot_model_name")
base_url = "https://api.moonshot.cn/v1"
elif llm_provider == "ollama":
# api_key = config.app.get("openai_api_key")
api_key = "ollama" # any string works but you are required to have one
model_name = config.app.get("ollama_model_name")
base_url = config.app.get("ollama_base_url", "")
if not base_url:
base_url = "http://localhost:11434/v1"
elif llm_provider == "openai":
api_key = config.app.get("openai_api_key")
model_name = config.app.get("openai_model_name")
@ -42,6 +49,10 @@ def _generate_response(prompt: str) -> str:
model_name = config.app.get("azure_model_name")
base_url = config.app.get("azure_base_url", "")
api_version = config.app.get("azure_api_version", "2024-02-15-preview")
elif llm_provider == "gemini":
api_key = config.app.get("gemini_api_key")
model_name = config.app.get("gemini_model_name")
base_url = "***"
elif llm_provider == "qwen":
api_key = config.app.get("qwen_api_key")
model_name = config.app.get("qwen_model_name")
@ -66,6 +77,44 @@ def _generate_response(prompt: str) -> str:
content = response["output"]["text"]
return content.replace("\n", "")
if llm_provider == "gemini":
genai.configure(api_key=api_key)
generation_config = {
"temperature": 0.5,
"top_p": 1,
"top_k": 1,
"max_output_tokens": 2048,
}
safety_settings = [
{
"category": "HARM_CATEGORY_HARASSMENT",
"threshold": "BLOCK_ONLY_HIGH"
},
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"threshold": "BLOCK_ONLY_HIGH"
},
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"threshold": "BLOCK_ONLY_HIGH"
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"threshold": "BLOCK_ONLY_HIGH"
},
]
model = genai.GenerativeModel(model_name=model_name,
generation_config=generation_config,
safety_settings=safety_settings)
convo = model.start_chat(history=[])
convo.send_message(prompt)
return convo.last.text
if llm_provider == "azure":
client = AzureOpenAI(
api_key=api_key,
@ -169,6 +218,8 @@ Generate {amount} search terms for stock videos, depending on the subject of a v
### Video Script
{video_script}
Please note that you must use English for generating video search terms; Chinese is not accepted.
""".strip()
logger.info(f"subject: {video_subject}")

View File

@ -1,5 +1,5 @@
import os
import random
import time
from urllib.parse import urlencode
import requests
@ -13,10 +13,15 @@ from app.utils import utils
requested_count = 0
pexels_api_keys = config.app.get("pexels_api_keys")
if not pexels_api_keys:
raise ValueError("pexels_api_keys is not set, please set it in the config.toml file.")
raise ValueError(
f"\n\n##### pexels_api_keys is not set #####\n\nPlease set it in the config.toml file: {config.config_file}\n\n{utils.to_json(config.app)}")
def round_robin_api_key():
    """Return the next Pexels API key, rotating through the configured list.

    A single key may be configured as a plain string, in which case it is
    returned directly without rotation.
    """
    if isinstance(pexels_api_keys, str):
        return pexels_api_keys
    global requested_count
    requested_count += 1
    index = requested_count % len(pexels_api_keys)
    return pexels_api_keys[index]
@ -76,14 +81,31 @@ def search_videos(search_term: str,
return []
def save_video(video_url: str, save_dir: str) -> str:
video_id = f"vid-{str(int(time.time() * 1000))}"
def save_video(video_url: str, save_dir: str = "") -> str:
if not save_dir:
save_dir = utils.storage_dir("cache_videos")
if not os.path.exists(save_dir):
os.makedirs(save_dir)
url_without_query = video_url.split("?")[0]
url_hash = utils.md5(url_without_query)
video_id = f"vid-{url_hash}"
video_path = f"{save_dir}/{video_id}.mp4"
# if video already exists, return the path
if os.path.exists(video_path) and os.path.getsize(video_path) > 0:
logger.info(f"video already exists: {video_path}")
return video_path
# if video does not exist, download it
proxies = config.pexels.get("proxies", None)
with open(video_path, "wb") as f:
f.write(requests.get(video_url, proxies=proxies, verify=False, timeout=(10, 180)).content)
f.write(requests.get(video_url, proxies=proxies, verify=False, timeout=(60, 240)).content)
return video_path
if os.path.exists(video_path) and os.path.getsize(video_path) > 0:
return video_path
return ""
def download_videos(task_id: str,
@ -112,7 +134,12 @@ def download_videos(task_id: str,
logger.info(
f"found total videos: {len(valid_video_items)}, required duration: {audio_duration} seconds, found duration: {found_duration} seconds")
video_paths = []
save_dir = utils.task_dir(task_id)
material_directory = config.app.get("material_directory", "").strip()
if material_directory == "task":
material_directory = utils.task_dir(task_id)
elif material_directory and not os.path.isdir(material_directory):
material_directory = ""
if video_contact_mode.value == VideoConcatMode.random.value:
random.shuffle(valid_video_items)
@ -121,14 +148,20 @@ def download_videos(task_id: str,
for item in valid_video_items:
try:
logger.info(f"downloading video: {item.url}")
saved_video_path = save_video(item.url, save_dir)
video_paths.append(saved_video_path)
seconds = min(max_clip_duration, item.duration)
total_duration += seconds
if total_duration > audio_duration:
logger.info(f"total duration of downloaded videos: {total_duration} seconds, skip downloading more")
break
saved_video_path = save_video(video_url=item.url, save_dir=material_directory)
if saved_video_path:
logger.info(f"video saved: {saved_video_path}")
video_paths.append(saved_video_path)
seconds = min(max_clip_duration, item.duration)
total_duration += seconds
if total_duration > audio_duration:
logger.info(f"total duration of downloaded videos: {total_duration} seconds, skip downloading more")
break
except Exception as e:
logger.error(f"failed to download video: {utils.to_json(item)} => {str(e)}")
logger.success(f"downloaded {len(video_paths)} videos")
return video_paths
if __name__ == "__main__":
download_videos("test123", ["cat"], audio_duration=100)

35
app/services/state.py Normal file
View File

@ -0,0 +1,35 @@
# State Management
# This module is responsible for managing the state of the application.
import math
# 如果你部署在分布式环境中,你可能需要一个中心化的状态管理服务,比如 Redis 或者数据库。
# 如果你的应用程序是单机的,你可以使用内存来存储状态。
# If you are deploying in a distributed environment, you might need a centralized state management service like Redis or a database.
# If your application is single-node, you can use memory to store the state.
from app.models import const
from app.utils import utils
_tasks = {}
def update_task(task_id: str, state: int = const.TASK_STATE_PROCESSING, progress: int = 0, **kwargs):
    """
    Set the state of the task.

    Args:
        task_id: identifier of the task to update.
        state: one of the const.TASK_STATE_* values (defaults to processing).
        progress: completion percentage, clamped into [0, 100].
        **kwargs: extra fields (e.g. "videos") stored alongside the state.

    Note: the previous entry for this task id is replaced, not merged.
    """
    # Robustness: the original only capped the upper bound, so a negative
    # progress value would be stored as-is; clamp both ends.
    progress = max(0, min(100, int(progress)))
    _tasks[task_id] = {
        "state": state,
        "progress": progress,
        **kwargs,
    }
def get_task(task_id: str):
    """
    Get the state of the task.

    Returns the stored task dict, or None when the task id is unknown.
    """
    return _tasks.get(task_id)

View File

@ -18,7 +18,9 @@ def create(audio_file, subtitle_file: str = ""):
global model
if not model:
logger.info(f"loading model: {model_size}, device: {device}, compute_type: {compute_type}")
model = WhisperModel(model_size_or_path=model_size, device=device, compute_type=compute_type)
model = WhisperModel(model_size_or_path=model_size,
device=device,
compute_type=compute_type)
logger.info(f"start, output file: {subtitle_file}")
if not subtitle_file:
@ -157,6 +159,7 @@ if __name__ == "__main__":
task_id = "c12fd1e6-4b0a-4d65-a075-c87abe35a072"
task_dir = utils.task_dir(task_id)
subtitle_file = f"{task_dir}/subtitle.srt"
audio_file = f"{task_dir}/audio.mp3"
subtitles = file_to_subtitles(subtitle_file)
print(subtitles)
@ -168,3 +171,6 @@ if __name__ == "__main__":
script = s.get("script")
correct(subtitle_file, script)
subtitle_file = f"{task_dir}/subtitle-test.srt"
create(audio_file, subtitle_file)

View File

@ -6,24 +6,13 @@ from os import path
from loguru import logger
from app.config import config
from app.models.schema import VideoParams, VoiceNames, VideoConcatMode
from app.models import const
from app.models.schema import VideoParams, VideoConcatMode
from app.services import llm, material, voice, video, subtitle
from app.services import state as sm
from app.utils import utils
def _parse_voice(name: str):
    """Split a prefixed voice name into (voice, lang).

    e.g. "female-zh-CN-XiaoxiaoNeural" -> ("zh-CN-XiaoxiaoNeural", "zh-CN").
    Names not present in VoiceNames fall back to the first entry.
    """
    if name not in VoiceNames:
        name = VoiceNames[0]
    parts = name.split("-")
    _lang = f"{parts[1]}-{parts[2]}"
    # Bug fix: keep every segment after the gender prefix, so regional voices
    # such as "female-zh-CN-liaoning-XiaobeiNeural" resolve to
    # "zh-CN-liaoning-XiaobeiNeural" instead of dropping the final segment.
    _voice = "-".join(parts[1:])
    return _voice, _lang
def start(task_id, params: VideoParams):
"""
{
@ -39,8 +28,10 @@ def start(task_id, params: VideoParams):
}
"""
logger.info(f"start task: {task_id}")
sm.update_task(task_id, state=const.TASK_STATE_PROCESSING, progress=5)
video_subject = params.video_subject
voice_name, language = _parse_voice(params.voice_name)
voice_name = voice.parse_voice_name(params.voice_name)
paragraph_number = params.paragraph_number
n_threads = params.n_threads
max_clip_duration = params.video_clip_duration
@ -53,6 +44,8 @@ def start(task_id, params: VideoParams):
else:
logger.debug(f"video script: \n{video_script}")
sm.update_task(task_id, state=const.TASK_STATE_PROCESSING, progress=10)
logger.info("\n\n## generating video terms")
video_terms = params.video_terms
if not video_terms:
@ -70,16 +63,20 @@ def start(task_id, params: VideoParams):
script_file = path.join(utils.task_dir(task_id), f"script.json")
script_data = {
"script": video_script,
"search_terms": video_terms
"search_terms": video_terms,
"params": params,
}
with open(script_file, "w", encoding="utf-8") as f:
f.write(utils.to_json(script_data))
sm.update_task(task_id, state=const.TASK_STATE_PROCESSING, progress=20)
logger.info("\n\n## generating audio")
audio_file = path.join(utils.task_dir(task_id), f"audio.mp3")
sub_maker = voice.tts(text=video_script, voice_name=voice_name, voice_file=audio_file)
if sub_maker is None:
sm.update_task(task_id, state=const.TASK_STATE_FAILED)
logger.error(
"failed to generate audio, maybe the network is not available. if you are in China, please use a VPN.")
return
@ -87,6 +84,8 @@ def start(task_id, params: VideoParams):
audio_duration = voice.get_audio_duration(sub_maker)
audio_duration = math.ceil(audio_duration)
sm.update_task(task_id, state=const.TASK_STATE_PROCESSING, progress=30)
subtitle_path = ""
if params.subtitle_enabled:
subtitle_path = path.join(utils.task_dir(task_id), f"subtitle.srt")
@ -98,11 +97,6 @@ def start(task_id, params: VideoParams):
if not os.path.exists(subtitle_path):
subtitle_fallback = True
logger.warning("subtitle file not found, fallback to whisper")
else:
subtitle_lines = subtitle.file_to_subtitles(subtitle_path)
if not subtitle_lines:
logger.warning(f"subtitle file is invalid, fallback to whisper : {subtitle_path}")
subtitle_fallback = True
if subtitle_provider == "whisper" or subtitle_fallback:
subtitle.create(audio_file=audio_file, subtitle_file=subtitle_path)
@ -114,6 +108,8 @@ def start(task_id, params: VideoParams):
logger.warning(f"subtitle file is invalid: {subtitle_path}")
subtitle_path = ""
sm.update_task(task_id, state=const.TASK_STATE_PROCESSING, progress=40)
logger.info("\n\n## downloading videos")
downloaded_videos = material.download_videos(task_id=task_id,
search_terms=video_terms,
@ -123,15 +119,20 @@ def start(task_id, params: VideoParams):
max_clip_duration=max_clip_duration,
)
if not downloaded_videos:
sm.update_task(task_id, state=const.TASK_STATE_FAILED)
logger.error(
"failed to download videos, maybe the network is not available. if you are in China, please use a VPN.")
return
sm.update_task(task_id, state=const.TASK_STATE_PROCESSING, progress=50)
final_video_paths = []
combined_video_paths = []
video_concat_mode = params.video_concat_mode
if params.video_count > 1:
video_concat_mode = VideoConcatMode.random
_progress = 50
for i in range(params.video_count):
index = i + 1
combined_video_path = path.join(utils.task_dir(task_id), f"combined-{index}.mp4")
@ -144,6 +145,9 @@ def start(task_id, params: VideoParams):
max_clip_duration=max_clip_duration,
threads=n_threads)
_progress += 50 / params.video_count / 2
sm.update_task(task_id, progress=_progress)
final_video_path = path.join(utils.task_dir(task_id), f"final-{index}.mp4")
logger.info(f"\n\n## generating video: {index} => {final_video_path}")
@ -154,10 +158,18 @@ def start(task_id, params: VideoParams):
output_file=final_video_path,
params=params,
)
_progress += 50 / params.video_count / 2
sm.update_task(task_id, progress=_progress)
final_video_paths.append(final_video_path)
combined_video_paths.append(combined_video_path)
logger.success(f"task {task_id} finished, generated {len(final_video_paths)} videos.")
return {
kwargs = {
"videos": final_video_paths,
"combined_videos": combined_video_paths
}
sm.update_task(task_id, state=const.TASK_STATE_COMPLETE, progress=100, **kwargs)
return kwargs

View File

@ -64,24 +64,34 @@ def combine_videos(combined_video_path: str,
clip = clip.set_fps(30)
# Not all videos are same size, so we need to resize them
# logger.info(f"{video_path}: size is {clip.w} x {clip.h}, expected {video_width} x {video_height}")
if clip.w != video_width or clip.h != video_height:
if round((clip.w / clip.h), 4) < 0.5625:
clip = crop(clip,
width=clip.w,
height=round(clip.w / 0.5625),
x_center=clip.w / 2,
y_center=clip.h / 2
)
clip_w, clip_h = clip.size
if clip_w != video_width or clip_h != video_height:
clip_ratio = clip.w / clip.h
video_ratio = video_width / video_height
if clip_ratio == video_ratio:
# 等比例缩放
clip = clip.resize((video_width, video_height))
else:
clip = crop(clip,
width=round(0.5625 * clip.h),
height=clip.h,
x_center=clip.w / 2,
y_center=clip.h / 2
)
logger.info(f"resizing video to {video_width} x {video_height}")
clip = clip.resize((video_width, video_height))
# 等比缩放视频
if clip_ratio > video_ratio:
# 按照目标宽度等比缩放
scale_factor = video_width / clip_w
else:
# 按照目标高度等比缩放
scale_factor = video_height / clip_h
new_width = int(clip_w * scale_factor)
new_height = int(clip_h * scale_factor)
clip_resized = clip.resize(newsize=(new_width, new_height))
background = ColorClip(size=(video_width, video_height), color=(0, 0, 0))
clip = CompositeVideoClip([
background.set_duration(clip.duration),
clip_resized.set_position("center")
])
logger.info(f"resizing video to {video_width} x {video_height}, clip size: {clip_w} x {clip_h}")
if clip.duration > max_clip_duration:
clip = clip.subclip(0, max_clip_duration)
@ -92,7 +102,8 @@ def combine_videos(combined_video_path: str,
final_clip = concatenate_videoclips(clips)
final_clip = final_clip.set_fps(30)
logger.info(f"writing")
final_clip.write_videofile(combined_video_path, threads=threads)
# https://github.com/harry0703/MoneyPrinterTurbo/issues/111#issuecomment-2032354030
final_clip.write_videofile(combined_video_path, threads=threads, logger=None)
logger.success(f"completed")
return combined_video_path
@ -218,11 +229,16 @@ def generate_video(video_path: str,
result = CompositeVideoClip(clips)
audio = AudioFileClip(audio_path)
try:
audio = audio.volumex(params.voice_volume)
except Exception as e:
logger.warning(f"failed to set audio volume: {e}")
result = result.set_audio(audio)
temp_output_file = f"{output_file}.temp.mp4"
logger.info(f"writing to temp file: {temp_output_file}")
result.write_videofile(temp_output_file, threads=params.n_threads or 2)
result.write_videofile(temp_output_file, threads=params.n_threads or 2, logger=None)
video_clip = VideoFileClip(temp_output_file)
@ -242,7 +258,7 @@ def generate_video(video_path: str,
video_clip = video_clip.set_duration(original_duration)
logger.info(f"encoding audio codec to aac")
video_clip.write_videofile(output_file, audio_codec="aac", threads=params.n_threads or 2)
video_clip.write_videofile(output_file, audio_codec="aac", threads=params.n_threads or 2, logger=None)
os.remove(temp_output_file)
logger.success(f"completed")
@ -262,6 +278,20 @@ if __name__ == "__main__":
audio_file = f"{task_dir}/audio.mp3"
subtitle_file = f"{task_dir}/subtitle.srt"
output_file = f"{task_dir}/final.mp4"
video_paths = []
for file in os.listdir(utils.storage_dir("test")):
if file.endswith(".mp4"):
video_paths.append(os.path.join(task_dir, file))
combine_videos(combined_video_path=video_file,
audio_file=audio_file,
video_paths=video_paths,
video_aspect=VideoAspect.portrait,
video_concat_mode=VideoConcatMode.random,
max_clip_duration=5,
threads=2)
cfg = VideoParams()
cfg.video_aspect = VideoAspect.portrait
cfg.font_name = "STHeitiMedium.ttc"
@ -277,6 +307,8 @@ if __name__ == "__main__":
cfg.n_threads = 2
cfg.paragraph_number = 1
cfg.voice_volume = 3.0
generate_video(video_path=video_file,
audio_path=audio_file,
subtitle_path=subtitle_file,

File diff suppressed because it is too large Load Diff

View File

@ -1,4 +1,5 @@
import os
import platform
import threading
from typing import Any
from loguru import logger
@ -23,32 +24,35 @@ def get_response(status: int, data: Any = None, message: str = ""):
def to_json(obj):
# 定义一个辅助函数来处理不同类型的对象
def serialize(o):
# 如果对象是可序列化类型,直接返回
if isinstance(o, (int, float, bool, str)) or o is None:
return o
# 如果对象是二进制数据转换为base64编码的字符串
elif isinstance(o, bytes):
return "*** binary data ***"
# 如果对象是字典,递归处理每个键值对
elif isinstance(o, dict):
return {k: serialize(v) for k, v in o.items()}
# 如果对象是列表或元组,递归处理每个元素
elif isinstance(o, (list, tuple)):
return [serialize(item) for item in o]
# 如果对象是自定义类型尝试返回其__dict__属性
elif hasattr(o, '__dict__'):
return serialize(o.__dict__)
# 其他情况返回None或者可以选择抛出异常
else:
return None
try:
# 定义一个辅助函数来处理不同类型的对象
def serialize(o):
# 如果对象是可序列化类型,直接返回
if isinstance(o, (int, float, bool, str)) or o is None:
return o
# 如果对象是二进制数据转换为base64编码的字符串
elif isinstance(o, bytes):
return "*** binary data ***"
# 如果对象是字典,递归处理每个键值对
elif isinstance(o, dict):
return {k: serialize(v) for k, v in o.items()}
# 如果对象是列表或元组,递归处理每个元素
elif isinstance(o, (list, tuple)):
return [serialize(item) for item in o]
# 如果对象是自定义类型尝试返回其__dict__属性
elif hasattr(o, '__dict__'):
return serialize(o.__dict__)
# 其他情况返回None或者可以选择抛出异常
else:
return None
# 使用serialize函数处理输入对象
serialized_obj = serialize(obj)
# 使用serialize函数处理输入对象
serialized_obj = serialize(obj)
# 序列化处理后的对象为JSON字符串
return json.dumps(serialized_obj, ensure_ascii=False, indent=4)
# 序列化处理后的对象为JSON字符串
return json.dumps(serialized_obj, ensure_ascii=False, indent=4)
except Exception as e:
return None
def get_uuid(remove_hyphen: bool = False):
@ -149,7 +153,7 @@ def text_to_srt(idx: int, msg: str, start_time: float, end_time: float) -> str:
def str_contains_punctuation(word):
for p in const.punctuations:
for p in const.PUNCTUATIONS:
if p in word:
return True
return False
@ -159,9 +163,14 @@ def split_string_by_punctuations(s):
result = []
txt = ""
for char in s:
if char not in const.punctuations:
if char not in const.PUNCTUATIONS:
txt += char
else:
result.append(txt.strip())
txt = ""
return result
def md5(text):
    """Return the hexadecimal MD5 digest of a UTF-8 encoded string."""
    import hashlib
    digest = hashlib.md5(text.encode("utf-8"))
    return digest.hexdigest()

View File

@ -2,17 +2,35 @@
# Pexels API Key
# Register at https://www.pexels.com/api/ to get your API key.
# You can use multiple keys to avoid rate limits.
# For example: pexels_api_keys = ["123456789","abcdefghi"]
# For example: pexels_api_keys = ["123adsf4567adf89","abd1321cd13efgfdfhi"]
# 特别注意格式Key 用英文双引号括起来多个Key用逗号隔开
pexels_api_keys = []
# 如果你没有 OPENAI API Key可以使用 g4f 代替,或者使用国内的 Moonshot API
llm_provider="openai" # "openai" or "moonshot" or "oneapi" or "g4f" or "azure" or "qwen"
# If you don't have an OPENAI API Key, you can use g4f instead
# 支持的提供商 (Supported providers):
# openai
# moonshot (月之暗面)
# oneapi
# g4f
# azure
# qwen (通义千问)
# gemini
llm_provider="openai"
########## Ollama Settings
# No need to set it unless you want to use your own proxy
ollama_base_url = ""
# Check your available models at https://ollama.com/library
ollama_model_name = ""
########## OpenAI API Key
# Visit https://openai.com/api/ for details on obtaining an API key.
# Get your API key at https://platform.openai.com/api-keys
openai_api_key = ""
openai_base_url = "" # no need to set it unless you want to use your own proxy
# No need to set it unless you want to use your own proxy
openai_base_url = ""
# Check your available models at https://platform.openai.com/account/limits
openai_model_name = "gpt-4-turbo-preview"
########## Moonshot API Key
@ -40,8 +58,15 @@
azure_model_name="gpt-35-turbo" # replace with your model deployment name
azure_api_version = "2024-02-15-preview"
########## qwen API Key, you need to pip install dashscope firstly
# Visit https://tongyi.aliyun.com/qianwen/ to get more details
########## Gemini API Key
gemini_api_key=""
gemini_model_name = "gemini-1.0-pro"
########## Qwen API Key
# Visit https://dashscope.console.aliyun.com/apiKey to get your API key
# Visit below links to get more details
# https://tongyi.aliyun.com/qianwen/
# https://help.aliyun.com/zh/dashscope/developer-reference/model-introduction
qwen_api_key = ""
qwen_model_name = "qwen-max"
@ -78,6 +103,33 @@
# ffmpeg_path = "C:\\Users\\harry\\Downloads\\ffmpeg.exe"
#########################################################################################
# 当视频生成成功后API服务提供的视频下载接入点默认为当前服务的地址和监听端口
# 比如 http://127.0.0.1:8080/tasks/6357f542-a4e1-46a1-b4c9-bf3bd0df5285/final-1.mp4
# 如果你需要使用域名对外提供服务一般会用nginx做代理则可以设置为你的域名
# 比如 https://xxxx.com/tasks/6357f542-a4e1-46a1-b4c9-bf3bd0df5285/final-1.mp4
# endpoint="https://xxxx.com"
# When the video is successfully generated, the API service provides a download endpoint for the video, defaulting to the service's current address and listening port.
# For example, http://127.0.0.1:8080/tasks/6357f542-a4e1-46a1-b4c9-bf3bd0df5285/final-1.mp4
# If you need to provide the service externally using a domain name (usually done with nginx as a proxy), you can set it to your domain name.
# For example, https://xxxx.com/tasks/6357f542-a4e1-46a1-b4c9-bf3bd0df5285/final-1.mp4
# endpoint="https://xxxx.com"
endpoint=""
# Video material storage location
# material_directory = "" # Indicates that video materials will be downloaded to the default folder, the default folder is ./storage/cache_videos under the current project
# material_directory = "/user/harry/videos" # Indicates that video materials will be downloaded to a specified folder
# material_directory = "task" # Indicates that video materials will be downloaded to the current task's folder, this method does not allow sharing of already downloaded video materials
# 视频素材存放位置
# material_directory = "" #表示将视频素材下载到默认的文件夹,默认文件夹为当前项目下的 ./storage/cache_videos
# material_directory = "/user/harry/videos" #表示将视频素材下载到指定的文件夹中
# material_directory = "task" #表示将视频素材下载到当前任务的文件夹中,这种方式无法共享已经下载的视频素材
material_directory = ""
[whisper]
# Only effective when subtitle_provider is "whisper"

27
docker-compose.yml Normal file
View File

@ -0,0 +1,27 @@
# Compose stack for MoneyPrinterTurbo: a Streamlit WebUI and an API service,
# both built from the same Dockerfile image.
version: "3"

# Shared bind mounts: the host's config.toml and storage directory are
# mounted into both containers so they share configuration and output.
x-common-volumes: &common-volumes
  - ./config.toml:/MoneyPrinterTurbo/config.toml
  - ./storage:/MoneyPrinterTurbo/storage

services:
  # Streamlit web interface, published on host port 8501.
  webui:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: "webui"
    ports:
      - "8501:8501"
    command: ["streamlit", "run", "./webui/Main.py","--browser.serverAddress=0.0.0.0","--server.enableCORS=True","--browser.gatherUsageStats=False"]
    volumes: *common-volumes
    restart: always

  # HTTP API backend (main.py), published on host port 8080.
  api:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: "api"
    ports:
      - "8080:8080"
    command: [ "python3", "main.py" ]
    volumes: *common-volumes
    restart: always

Binary file not shown.

Before

Width:  |  Height:  |  Size: 257 KiB

After

Width:  |  Height:  |  Size: 252 KiB

BIN
docs/webui-en.jpg Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 384 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 339 KiB

After

Width:  |  Height:  |  Size: 340 KiB

View File

@ -13,4 +13,6 @@ urllib3~=2.2.1
pillow~=9.5.0
pydantic~=2.6.3
g4f~=0.2.5.4
dashscope~=1.15.0
dashscope~=1.15.0
google.generativeai~=0.4.1
python-multipart~=0.0.9

View File

@ -0,0 +1,19 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>MoneyPrinterTurbo</title>
</head>
<body>
<h1>MoneyPrinterTurbo</h1>
<a href="https://github.com/harry0703/MoneyPrinterTurbo">https://github.com/harry0703/MoneyPrinterTurbo</a>
<p>
只需提供一个视频 主题 或 关键词 ,就可以全自动生成视频文案、视频素材、视频字幕、视频背景音乐,然后合成一个高清的短视频。
</p>
<p>
Simply provide a topic or keyword for a video, and it will automatically generate the video copy, video materials,
video subtitles, and video background music before synthesizing a high-definition short video.
</p>
</body>
</html>

View File

@ -1,5 +1,4 @@
@echo off
set CURRENT_DIR=%CD%
echo ***** Current directory: %CURRENT_DIR% *****
set PYTHONPATH=%CURRENT_DIR%;%PYTHONPATH%
set PYTHONPATH=%CURRENT_DIR%
rem set HF_ENDPOINT=https://hf-mirror.com
streamlit run .\webui\Main.py

View File

@ -1,4 +1,12 @@
CURRENT_DIR=$(pwd)
echo "***** Current directory: $CURRENT_DIR *****"
export PYTHONPATH="${CURRENT_DIR}:$PYTHONPATH"
streamlit run ./webui/Main.py
# If you could not download the model from the official site, you can use the mirror site.
# Just remove the comment of the following line .
# 如果你无法从官方网站下载模型,你可以使用镜像网站。
# 只需要移除下面一行的注释即可。
# export HF_ENDPOINT=https://hf-mirror.com
streamlit run ./webui/Main.py --browser.serverAddress="0.0.0.0" --server.enableCORS=True --browser.gatherUsageStats=False

View File

@ -1,29 +1,78 @@
import streamlit as st
st.set_page_config(page_title="MoneyPrinterTurbo", page_icon="🤖", layout="wide",
initial_sidebar_state="auto")
import sys
import os
from uuid import uuid4
# Add the root directory of the project to the system path to allow importing modules from the project
root_dir = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))
if root_dir not in sys.path:
sys.path.append(root_dir)
print("******** sys.path ********")
print(sys.path)
print("")
import json
import locale
import streamlit as st
import os
from uuid import uuid4
import platform
import streamlit.components.v1 as components
import toml
from loguru import logger
from app.models.schema import VideoParams, VideoAspect, VoiceNames, VideoConcatMode
from app.services import task as tm, llm
st.set_page_config(page_title="MoneyPrinterTurbo",
page_icon="🤖",
layout="wide",
initial_sidebar_state="auto",
menu_items={
'Report a bug': "https://github.com/harry0703/MoneyPrinterTurbo/issues",
'About': "# MoneyPrinterTurbo\nSimply provide a topic or keyword for a video, and it will "
"automatically generate the video copy, video materials, video subtitles, "
"and video background music before synthesizing a high-definition short "
"video.\n\nhttps://github.com/harry0703/MoneyPrinterTurbo"
})
from app.models.schema import VideoParams, VideoAspect, VideoConcatMode
from app.services import task as tm, llm, voice
from app.utils import utils
hide_streamlit_style = """
<style>#root > div:nth-child(1) > div > div > div > div > section > div {padding-top: 0rem;}</style>
"""
st.markdown(hide_streamlit_style, unsafe_allow_html=True)
st.title("MoneyPrinterTurbo")
st.write(
"⚠️ 先在 **config.toml** 中设置 `pexels_api_keys` 和 `llm_provider` 参数,根据不同的 llm_provider配置对应的 **API KEY**"
)
root_dir = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))
font_dir = os.path.join(root_dir, "resource", "fonts")
song_dir = os.path.join(root_dir, "resource", "songs")
i18n_dir = os.path.join(root_dir, "webui", "i18n")
config_file = os.path.join(root_dir, "webui", ".streamlit", "webui.toml")
def load_config() -> dict:
    """Load the persisted WebUI settings from the ``webui.toml`` config file.

    Returns:
        The parsed settings dict, or an empty dict when the file is missing
        or malformed. The silent fallback is deliberate: the UI must still
        start with defaults on a fresh install, so do not raise here.
    """
    try:
        return toml.load(config_file)
    except Exception:
        # First run (file absent) or corrupt TOML — fall back to defaults.
        return {}
cfg = load_config()
def save_config():
    """Serialize the in-memory settings dict ``cfg`` back to ``webui.toml`` (UTF-8)."""
    serialized = toml.dumps(cfg)
    with open(config_file, "w", encoding="utf-8") as fp:
        fp.write(serialized)
def get_system_locale():
    """Return the two-letter language code of the OS default locale.

    Examples: ``zh_CN`` / ``zh_TW`` yield ``"zh"``; ``en_US`` / ``en_GB``
    yield ``"en"``. Falls back to ``"en"`` when the locale cannot be
    determined (e.g. ``getdefaultlocale()`` returns ``(None, None)``).
    """
    try:
        lang_tag, _encoding = locale.getdefaultlocale()
        # Keep only the language part, dropping the territory suffix.
        return lang_tag.split("_")[0]
    except Exception:
        return "en"
# st.session_state
if 'video_subject' not in st.session_state:
st.session_state['video_subject'] = ''
@ -31,6 +80,8 @@ if 'video_script' not in st.session_state:
st.session_state['video_script'] = ''
if 'video_terms' not in st.session_state:
st.session_state['video_terms'] = ''
if 'ui_language' not in st.session_state:
st.session_state['ui_language'] = cfg.get("ui_language", get_system_locale())
def get_all_fonts():
@ -51,6 +102,36 @@ def get_all_songs():
return songs
def open_task_folder(task_id):
    """Open the task's output folder in the OS file manager (best-effort).

    Args:
        task_id: Task UUID; the folder is ``<root>/storage/tasks/<task_id>``.

    Any failure is logged and swallowed so the UI flow is never interrupted.
    Only Windows and macOS are handled, matching the original behavior.
    """
    try:
        # Do not name this `sys` — that would shadow the imported sys module.
        system_name = platform.system()
        path = os.path.join(root_dir, "storage", "tasks", task_id)
        if os.path.exists(path):
            if system_name == 'Windows':
                # os.startfile handles paths with spaces; `start {path}` does
                # not, and quoting the path makes `start` read it as a title.
                os.startfile(path)
            elif system_name == 'Darwin':
                # Quote the path: it may contain spaces.
                os.system(f'open "{path}"')
    except Exception as e:
        logger.error(e)
def scroll_to_bottom():
    """Inject a zero-sized JS snippet that scrolls every Streamlit main section to its bottom."""
    # Plain string (not an f-string): nothing is interpolated, so braces
    # need no doubling and the emitted script is unchanged.
    js_code = """
    <script>
        console.log("scroll_to_bottom");
        function scroll(dummy_var_to_force_repeat_execution) {
            var sections = parent.document.querySelectorAll('section.main');
            console.log(sections);
            for (let index = 0; index < sections.length; index++) {
                sections[index].scrollTop = sections[index].scrollHeight;
            }
        }
        scroll(1);
    </script>
    """
    st.components.v1.html(js_code, height=0, width=0)
def init_log():
logger.remove()
_lvl = "DEBUG"
@ -82,114 +163,156 @@ def init_log():
init_log()
def load_locales():
    """Scan ``i18n_dir`` recursively and load every ``<lang>.json`` file.

    Returns:
        Dict mapping language code (the filename without its ``.json``
        extension, e.g. ``"en"``, ``"de"``, ``"zh"``) to the parsed
        translation dict.
    """
    locales = {}
    for root, dirs, files in os.walk(i18n_dir):
        for file in files:
            if file.endswith(".json"):
                # splitext is robust even if a filename contains extra dots.
                lang = os.path.splitext(file)[0]
                with open(os.path.join(root, file), "r", encoding="utf-8") as f:
                    # json.load reads the stream directly — no intermediate string.
                    locales[lang] = json.load(f)
    return locales
locales = load_locales()
def tr(key):
    """Translate a UI string key via the current session language; falls back to the key itself."""
    current_locale = locales.get(st.session_state['ui_language'], {})
    translations = current_locale.get("Translation", {})
    return translations.get(key, key)
display_languages = []
selected_index = 0
for i, code in enumerate(locales.keys()):
display_languages.append(f"{code} - {locales[code].get('Language')}")
if code == st.session_state['ui_language']:
selected_index = i
selected_language = st.selectbox("Language", options=display_languages, label_visibility='collapsed',
index=selected_index)
if selected_language:
code = selected_language.split(" - ")[0].strip()
st.session_state['ui_language'] = code
cfg['ui_language'] = code
save_config()
panel = st.columns(3)
left_panel = panel[0]
middle_panel = panel[1]
right_panel = panel[2]
# define cfg as VideoParams class
cfg = VideoParams()
params = VideoParams()
with left_panel:
with st.container(border=True):
st.write("**文案设置**")
cfg.video_subject = st.text_input("视频主题(给定一个关键词,:red[AI自动生成]视频文案)",
value=st.session_state['video_subject']).strip()
st.write(tr("Video Script Settings"))
params.video_subject = st.text_input(tr("Video Subject"),
value=st.session_state['video_subject']).strip()
video_languages = [
("自动判断Auto detect", ""),
(tr("Auto Detect"), ""),
]
for lang in ["zh-CN", "zh-TW", "en-US"]:
video_languages.append((lang, lang))
for code in ["zh-CN", "zh-TW", "de-DE", "en-US"]:
video_languages.append((code, code))
selected_index = st.selectbox("生成视频脚本的语言(:blue[一般情况AI会自动根据你输入的主题语言输出]",
selected_index = st.selectbox(tr("Script Language"),
index=0,
options=range(len(video_languages)), # 使用索引作为内部选项值
format_func=lambda x: video_languages[x][0] # 显示给用户的是标签
)
cfg.video_language = video_languages[selected_index][1]
params.video_language = video_languages[selected_index][1]
if cfg.video_language:
st.write(f"设置AI输出文案语言为: **:red[{cfg.video_language}]**")
if st.button("点击使用AI根据**主题**生成 【视频文案】 和 【视频关键词】", key="auto_generate_script"):
with st.spinner("AI正在生成视频文案和关键词..."):
script = llm.generate_script(video_subject=cfg.video_subject, language=cfg.video_language)
terms = llm.generate_terms(cfg.video_subject, script)
st.toast('AI生成成功')
if st.button(tr("Generate Video Script and Keywords"), key="auto_generate_script"):
with st.spinner(tr("Generating Video Script and Keywords")):
script = llm.generate_script(video_subject=params.video_subject, language=params.video_language)
terms = llm.generate_terms(params.video_subject, script)
st.session_state['video_script'] = script
st.session_state['video_terms'] = ", ".join(terms)
cfg.video_script = st.text_area(
"视频文案(:blue[①可不填使用AI生成 ②合理使用标点断句,有助于生成字幕]",
params.video_script = st.text_area(
tr("Video Script"),
value=st.session_state['video_script'],
height=230
height=280
)
if st.button("点击使用AI根据**文案**生成【视频关键词】", key="auto_generate_terms"):
if not cfg.video_script:
st.error("请先填写视频文案")
if st.button(tr("Generate Video Keywords"), key="auto_generate_terms"):
if not params.video_script:
st.error(tr("Please Enter the Video Subject"))
st.stop()
with st.spinner("AI正在生成视频关键词..."):
terms = llm.generate_terms(cfg.video_subject, cfg.video_script)
st.toast('AI生成成功')
with st.spinner(tr("Generating Video Keywords")):
terms = llm.generate_terms(params.video_subject, params.video_script)
st.session_state['video_terms'] = ", ".join(terms)
cfg.video_terms = st.text_area(
"视频关键词(:blue[①可不填使用AI生成 ②用**英文逗号**分隔,只支持英文]",
params.video_terms = st.text_area(
tr("Video Keywords"),
value=st.session_state['video_terms'],
height=50)
with middle_panel:
with st.container(border=True):
st.write("**视频设置**")
st.write(tr("Video Settings"))
video_concat_modes = [
("顺序拼接", "sequential"),
("随机拼接(推荐)", "random"),
(tr("Sequential"), "sequential"),
(tr("Random"), "random"),
]
selected_index = st.selectbox("视频拼接模式",
selected_index = st.selectbox(tr("Video Concat Mode"),
index=1,
options=range(len(video_concat_modes)), # 使用索引作为内部选项值
format_func=lambda x: video_concat_modes[x][0] # 显示给用户的是标签
)
cfg.video_concat_mode = VideoConcatMode(video_concat_modes[selected_index][1])
params.video_concat_mode = VideoConcatMode(video_concat_modes[selected_index][1])
video_aspect_ratios = [
("竖屏 9:16抖音视频", VideoAspect.portrait.value),
("横屏 16:9西瓜视频", VideoAspect.landscape.value),
# ("方形 1:1", VideoAspect.square.value)
(tr("Portrait"), VideoAspect.portrait.value),
(tr("Landscape"), VideoAspect.landscape.value),
]
selected_index = st.selectbox("视频比例",
selected_index = st.selectbox(tr("Video Ratio"),
options=range(len(video_aspect_ratios)), # 使用索引作为内部选项值
format_func=lambda x: video_aspect_ratios[x][0] # 显示给用户的是标签
)
cfg.video_aspect = VideoAspect(video_aspect_ratios[selected_index][1])
params.video_aspect = VideoAspect(video_aspect_ratios[selected_index][1])
cfg.video_clip_duration = st.selectbox("视频片段最大时长(秒)", options=[2, 3, 4, 5, 6], index=1)
cfg.video_count = st.selectbox("同时生成视频数量", options=[1, 2, 3, 4, 5], index=0)
params.video_clip_duration = st.selectbox(tr("Clip Duration"), options=[2, 3, 4, 5, 6], index=1)
params.video_count = st.selectbox(tr("Number of Videos Generated Simultaneously"), options=[1, 2, 3, 4, 5],
index=0)
with st.container(border=True):
st.write("**音频设置**")
# 创建一个映射字典,将原始值映射到友好名称
st.write(tr("Audio Settings"))
voices = voice.get_all_voices(filter_locals=["zh-CN", "zh-HK", "zh-TW", "de-DE", "en-US"])
friendly_names = {
voice: voice.
replace("female", "女性").
replace("male", "男性").
replace("zh-CN", "中文").
replace("zh-HK", "香港").
replace("zh-TW", "台湾").
replace("en-US", "英文").
replace("Female", tr("Female")).
replace("Male", tr("Male")).
replace("Neural", "") for
voice in VoiceNames}
selected_friendly_name = st.selectbox("朗读声音", options=list(friendly_names.values()))
voice_name = list(friendly_names.keys())[list(friendly_names.values()).index(selected_friendly_name)]
cfg.voice_name = voice_name
voice in voices}
saved_voice_name = cfg.get("voice_name", "")
saved_voice_name_index = 0
if saved_voice_name in friendly_names:
saved_voice_name_index = list(friendly_names.keys()).index(saved_voice_name)
else:
for i, voice in enumerate(voices):
if voice.lower().startswith(st.session_state['ui_language'].lower()):
saved_voice_name_index = i
break
selected_friendly_name = st.selectbox(tr("Speech Synthesis"),
options=list(friendly_names.values()),
index=saved_voice_name_index)
voice_name = list(friendly_names.keys())[list(friendly_names.values()).index(selected_friendly_name)]
params.voice_name = voice_name
cfg['voice_name'] = voice_name
save_config()
params.voice_volume = st.selectbox(tr("Speech Volume"),
options=[0.6, 0.8, 1.0, 1.2, 1.5, 2.0, 3.0, 4.0, 5.0], index=2)
bgm_options = [
("无背景音乐 No BGM", ""),
("随机背景音乐 Random BGM", "random"),
("自定义背景音乐 Custom BGM", "custom"),
(tr("No Background Music"), ""),
(tr("Random Background Music"), "random"),
(tr("Custom Background Music"), "custom"),
]
selected_index = st.selectbox("背景音乐",
selected_index = st.selectbox(tr("Background Music"),
index=1,
options=range(len(bgm_options)), # 使用索引作为内部选项值
format_func=lambda x: bgm_options[x][0] # 显示给用户的是标签
@ -199,55 +322,53 @@ with middle_panel:
# 根据选择显示或隐藏组件
if bgm_type == "custom":
custom_bgm_file = st.text_input("请输入自定义背景音乐的文件路径:")
custom_bgm_file = st.text_input(tr("Custom Background Music File"))
if custom_bgm_file and os.path.exists(custom_bgm_file):
cfg.bgm_file = custom_bgm_file
params.bgm_file = custom_bgm_file
# st.write(f":red[已选择自定义背景音乐]**{custom_bgm_file}**")
cfg.bgm_volume = st.selectbox("背景音乐音量0.2表示20%,背景声音不宜过高)",
options=[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0], index=2)
params.bgm_volume = st.selectbox(tr("Background Music Volume"),
options=[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0], index=2)
with right_panel:
with st.container(border=True):
st.write("**字幕设置**")
cfg.subtitle_enabled = st.checkbox("生成字幕(若取消勾选,下面的设置都将不生效)", value=True)
st.write(tr("Subtitle Settings"))
params.subtitle_enabled = st.checkbox(tr("Enable Subtitles"), value=True)
font_names = get_all_fonts()
cfg.font_name = st.selectbox("字体", font_names)
params.font_name = st.selectbox(tr("Font"), font_names)
subtitle_positions = [
("顶部top", "top"),
("居中center", "center"),
("底部bottom推荐", "bottom"),
(tr("Top"), "top"),
(tr("Center"), "center"),
(tr("Bottom"), "bottom"),
]
selected_index = st.selectbox("字幕位置",
selected_index = st.selectbox(tr("Position"),
index=2,
options=range(len(subtitle_positions)), # 使用索引作为内部选项值
format_func=lambda x: subtitle_positions[x][0] # 显示给用户的是标签
)
cfg.subtitle_position = subtitle_positions[selected_index][1]
params.subtitle_position = subtitle_positions[selected_index][1]
font_cols = st.columns([0.3, 0.7])
with font_cols[0]:
cfg.text_fore_color = st.color_picker("字幕颜色", "#FFFFFF")
params.text_fore_color = st.color_picker(tr("Font Color"), "#FFFFFF")
with font_cols[1]:
cfg.font_size = st.slider("字幕大小", 30, 100, 60)
params.font_size = st.slider(tr("Font Size"), 30, 100, 60)
stroke_cols = st.columns([0.3, 0.7])
with stroke_cols[0]:
cfg.stroke_color = st.color_picker("描边颜色", "#000000")
params.stroke_color = st.color_picker(tr("Stroke Color"), "#000000")
with stroke_cols[1]:
cfg.stroke_width = st.slider("描边粗细", 0.0, 10.0, 1.5)
params.stroke_width = st.slider(tr("Stroke Width"), 0.0, 10.0, 1.5)
start_button = st.button("开始生成视频", use_container_width=True, type="primary")
start_button = st.button(tr("Generate Video"), use_container_width=True, type="primary")
if start_button:
task_id = str(uuid4())
if not cfg.video_subject and not cfg.video_script:
st.error("视频主题 或 视频文案,不能同时为空")
if not params.video_subject and not params.video_script:
st.error(tr("Video Script and Subject Cannot Both Be Empty"))
scroll_to_bottom()
st.stop()
st.write(cfg)
log_container = st.empty()
log_records = []
@ -259,6 +380,24 @@ if start_button:
logger.add(log_received)
logger.info("开始生成视频")
st.toast(tr("Generating Video"))
logger.info(tr("Start Generating Video"))
logger.info(utils.to_json(params))
scroll_to_bottom()
tm.start(task_id=task_id, params=cfg)
result = tm.start(task_id=task_id, params=params)
video_files = result.get("videos", [])
st.success(tr("Video Generation Completed"))
try:
if video_files:
# center the video player
player_cols = st.columns(len(video_files) * 2 + 1)
for i, url in enumerate(video_files):
player_cols[i * 2 + 1].video(url)
except Exception as e:
pass
open_task_folder(task_id)
logger.info(tr("Video Generation Completed"))
scroll_to_bottom()

53
webui/i18n/de.json Normal file
View File

@ -0,0 +1,53 @@
{
"Language": "German",
"Translation": {
"Video Script Settings": "**Drehbuch / Topic des Videos**",
"Video Subject": "Worum soll es in dem Video gehen? (Geben Sie ein Keyword an, :red[Dank KI wird automatisch ein Drehbuch generieren])",
"Script Language": "Welche Sprache soll zum Generieren von Drehbüchern verwendet werden? :red[KI generiert anhand dieses Begriffs das Drehbuch]",
"Generate Video Script and Keywords": "Klicken Sie hier, um mithilfe von KI ein [Video Drehbuch] und [Video Keywords] basierend auf dem **Keyword** zu generieren.",
"Auto Detect": "Automatisch erkennen",
"Video Script": "Drehbuch (Storybook) (:blue[① Optional, KI generiert ② Die richtige Zeichensetzung hilft bei der Erstellung von Untertiteln])",
"Generate Video Keywords": "Klicken Sie, um KI zum Generieren zu verwenden [Video Keywords] basierend auf dem **Drehbuch**",
"Please Enter the Video Subject": "Bitte geben Sie zuerst das Drehbuch an",
"Generating Video Script and Keywords": "KI generiert ein Drehbuch und Schlüsselwörter...",
    "Generating Video Keywords": "KI generiert Video-Schlüsselwörter...",
"Video Keywords": "Video Schlüsselwörter (:blue[① Optional, KI generiert ② Verwende **, (Kommas)** zur Trennung der Wörter, in englischer Sprache])",
"Video Settings": "**Video Einstellungen**",
"Video Concat Mode": "Videoverkettungsmodus",
"Random": "Zufällige Verkettung (empfohlen)",
"Sequential": "Sequentielle Verkettung",
"Video Ratio": "Video-Seitenverhältnis",
"Portrait": "Portrait 9:16",
"Landscape": "Landschaft 16:9",
"Clip Duration": "Maximale Dauer einzelner Videoclips in sekunden",
"Number of Videos Generated Simultaneously": "Anzahl der parallel generierten Videos",
"Audio Settings": "**Audio Einstellungen**",
"Speech Synthesis": "Sprachausgabe",
"Speech Volume": "Lautstärke der Sprachausgabe",
"Male": "Männlich",
"Female": "Weiblich",
"Background Music": "Hintergrundmusik",
"No Background Music": "Ohne Hintergrundmusik",
"Random Background Music": "Zufällig erzeugte Hintergrundmusik",
"Custom Background Music": "Benutzerdefinierte Hintergrundmusik",
"Custom Background Music File": "Bitte gib den Pfad zur Musikdatei an:",
"Background Music Volume": "Lautstärke: (0.2 entspricht 20%, sollte nicht zu laut sein)",
"Subtitle Settings": "**Untertitel-Einstellungen**",
"Enable Subtitles": "Untertitel aktivieren (Wenn diese Option deaktiviert ist, werden die Einstellungen nicht genutzt)",
"Font": "Schriftart des Untertitels",
"Position": "Ausrichtung des Untertitels",
"Top": "Oben",
"Center": "Mittig",
"Bottom": "Unten (empfohlen)",
"Font Size": "Schriftgröße für Untertitel",
"Font Color": "Schriftfarbe",
"Stroke Color": "Kontur",
"Stroke Width": "Breite der Untertitelkontur",
"Generate Video": "Generiere Videos durch KI",
"Video Script and Subject Cannot Both Be Empty": "Das Video-Thema und Drehbuch dürfen nicht beide leer sein",
"Generating Video": "Video wird erstellt, bitte warten...",
"Start Generating Video": "Beginne mit der Generierung",
"Video Generation Completed": "Video erfolgreich generiert",
"You can download the generated video from the following links": "Sie können das generierte Video über die folgenden Links herunterladen"
}
}

53
webui/i18n/en.json Normal file
View File

@ -0,0 +1,53 @@
{
"Language": "English",
"Translation": {
"Video Script Settings": "**Video Script Settings**",
"Video Subject": "Video Subject (Provide a keyword, :red[AI will automatically generate] video script)",
"Script Language": "Language for Generating Video Script (AI will automatically output based on the language of your subject)",
"Generate Video Script and Keywords": "Click to use AI to generate [Video Script] and [Video Keywords] based on **subject**",
"Auto Detect": "Auto Detect",
"Video Script": "Video Script (:blue[① Optional, AI generated ② Proper punctuation helps with subtitle generation])",
"Generate Video Keywords": "Click to use AI to generate [Video Keywords] based on **script**",
"Please Enter the Video Subject": "Please Enter the Video Script First",
"Generating Video Script and Keywords": "AI is generating video script and keywords...",
"Generating Video Keywords": "AI is generating video keywords...",
"Video Keywords": "Video Keywords (:blue[① Optional, AI generated ② Use **English commas** for separation, English only])",
"Video Settings": "**Video Settings**",
"Video Concat Mode": "Video Concatenation Mode",
"Random": "Random Concatenation (Recommended)",
"Sequential": "Sequential Concatenation",
"Video Ratio": "Video Aspect Ratio",
"Portrait": "Portrait 9:16",
"Landscape": "Landscape 16:9",
"Clip Duration": "Maximum Duration of Video Clips (seconds)",
"Number of Videos Generated Simultaneously": "Number of Videos Generated Simultaneously",
"Audio Settings": "**Audio Settings**",
"Speech Synthesis": "Speech Synthesis Voice",
"Speech Volume": "Speech Volume (1.0 represents 100%)",
"Male": "Male",
"Female": "Female",
"Background Music": "Background Music",
"No Background Music": "No Background Music",
"Random Background Music": "Random Background Music",
"Custom Background Music": "Custom Background Music",
"Custom Background Music File": "Please enter the file path for custom background music:",
"Background Music Volume": "Background Music Volume (0.2 represents 20%, background music should not be too loud)",
"Subtitle Settings": "**Subtitle Settings**",
"Enable Subtitles": "Enable Subtitles (If unchecked, the settings below will not take effect)",
"Font": "Subtitle Font",
"Position": "Subtitle Position",
"Top": "Top",
"Center": "Center",
"Bottom": "Bottom (Recommended)",
"Font Size": "Subtitle Font Size",
"Font Color": "Subtitle Font Color",
"Stroke Color": "Subtitle Outline Color",
"Stroke Width": "Subtitle Outline Width",
"Generate Video": "Generate Video",
"Video Script and Subject Cannot Both Be Empty": "Video Subject and Video Script cannot both be empty",
"Generating Video": "Generating video, please wait...",
"Start Generating Video": "Start Generating Video",
"Video Generation Completed": "Video Generation Completed",
"You can download the generated video from the following links": "You can download the generated video from the following links"
}
}

53
webui/i18n/zh.json Normal file
View File

@ -0,0 +1,53 @@
{
"Language": "简体中文",
"Translation": {
"Video Script Settings": "**文案设置**",
"Video Subject": "视频主题(给定一个关键词,:red[AI自动生成]视频文案)",
"Script Language": "生成视频脚本的语言一般情况AI会自动根据你输入的主题语言输出",
"Generate Video Script and Keywords": "点击使用AI根据**主题**生成 【视频文案】 和 【视频关键词】",
"Auto Detect": "自动检测",
"Video Script": "视频文案(:blue[①可不填使用AI生成 ②合理使用标点断句,有助于生成字幕]",
"Generate Video Keywords": "点击使用AI根据**文案**生成【视频关键词】",
"Please Enter the Video Subject": "请先填写视频文案",
"Generating Video Script and Keywords": "AI正在生成视频文案和关键词...",
"Generating Video Keywords": "AI正在生成视频关键词...",
"Video Keywords": "视频关键词(:blue[①可不填使用AI生成 ②用**英文逗号**分隔,只支持英文]",
"Video Settings": "**视频设置**",
"Video Concat Mode": "视频拼接模式",
"Random": "随机拼接(推荐)",
"Sequential": "顺序拼接",
"Video Ratio": "视频比例",
"Portrait": "竖屏 9:16抖音视频",
"Landscape": "横屏 16:9西瓜视频",
"Clip Duration": "视频片段最大时长(秒)",
"Number of Videos Generated Simultaneously": "同时生成视频数量",
"Audio Settings": "**音频设置**",
"Speech Synthesis": "朗读声音(:red[尽量与文案语言保持一致]",
"Speech Volume": "朗读音量1.0表示100%",
"Male": "男性",
"Female": "女性",
"Background Music": "背景音乐",
"No Background Music": "无背景音乐",
"Random Background Music": "随机背景音乐",
"Custom Background Music": "自定义背景音乐",
"Custom Background Music File": "请输入自定义背景音乐的文件路径",
"Background Music Volume": "背景音乐音量0.2表示20%,背景声音不宜过高)",
"Subtitle Settings": "**字幕设置**",
"Enable Subtitles": "启用字幕(若取消勾选,下面的设置都将不生效)",
"Font": "字幕字体",
"Position": "字幕位置",
"Top": "顶部",
"Center": "中间",
"Bottom": "底部(推荐)",
"Font Size": "字幕大小",
"Font Color": "字幕颜色",
"Stroke Color": "描边颜色",
"Stroke Width": "描边粗细",
"Generate Video": "生成视频",
"Video Script and Subject Cannot Both Be Empty": "视频主题 和 视频文案,不能同时为空",
"Generating Video": "正在生成视频,请稍候...",
"Start Generating Video": "开始生成视频",
"Video Generation Completed": "视频生成完成",
"You can download the generated video from the following links": "你可以从以下链接下载生成的视频"
}
}