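I wanted a plain HTML page that uses JavaScript to capture frames from a video stream at a fixed interval, sends each frame to Gemini 1.5 Flash's multimodal API, and asks the AI to describe what is playing. Once 10 frames have accumulated, it asks the model for a summary of the stream, a response to it, or something similar.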
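The accumulate-then-summarize idea can be sketched as a small buffer. This is only a minimal sketch, not the page's actual code: `createDescriptionBuffer` and the wording of the summary prompt are hypothetical; only the batch size of 10 comes from the description above.

```javascript
// Hypothetical helper: collects per-frame descriptions and, once `batchSize`
// of them have accumulated, returns a prompt asking the model for a summary.
function createDescriptionBuffer(batchSize = 10) {
  const descriptions = [];
  return {
    // Returns null while the batch is still filling up,
    // or a summary prompt string once the batch is complete.
    add(description) {
      descriptions.push(description);
      if (descriptions.length < batchSize) return null;
      const batch = descriptions.splice(0, batchSize); // empties the buffer
      const numbered = batch.map((d, i) => `${i + 1}. ${d}`).join('\n');
      return `Here are ${batch.length} consecutive frame descriptions from a stream:\n` +
        `${numbered}\nSummarize what is happening in the stream.`;
    },
    // Number of descriptions waiting for the next summary.
    get pending() {
      return descriptions.length;
    }
  };
}
```

Each time a frame description comes back, pass it to `add()`; when the return value is not `null`, send that string as a text-only prompt.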

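The result: with nothing but HTML + JavaScript, you can capture frames from an HLS stream and have Gemini Flash continuously describe what is happening in the video. (On the free tier, watch the call-rate and usage limits.)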
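One simple way to stay under a call-rate limit is to skip a capture when the previous request was too recent. The sketch below is an assumption, not part of the original page: `createMinIntervalGate` is a hypothetical helper, and the interval you pass in is whatever fits your own quota (the actual free-tier numbers are not stated here).

```javascript
// Hypothetical helper: allows a call only if at least `minIntervalMs`
// milliseconds have passed since the last allowed call.
// `now` is injectable so the gate can be tested without real waiting.
function createMinIntervalGate(minIntervalMs, now = () => Date.now()) {
  let lastCall = -Infinity;
  return function tryAcquire() {
    const t = now();
    if (t - lastCall < minIntervalMs) return false; // too soon, skip this frame
    lastCall = t;
    return true;
  };
}
```

In the capture timer, call the gate first and return early when it yields `false`, so a slow response or a short timer interval cannot burst past the limit.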

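Note: I only run this HTML page on my own phone or computer. If the page is actually exposed to the public, loading your API key in JavaScript and calling the API directly is not recommended; it is better to wrap the call inside a service.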

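I opened the official AI Studio and happily started testing: I added a sample image and typed "what is this".

Then I clicked Get Code, only to find that the JavaScript version is written for Node.js. I did not want to set up Node.js, because I wanted an HTML file that I could open directly and run from anywhere.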


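So I cheerfully handed the code to ChatGPT o1-mini and asked it to convert it to plain JavaScript. Running the result kept producing an error saying that the file type should be binary, not JSON. Since I wanted the whole HTML page to be generated by an LLM, I did not look closely at what was actually wrong; I just pasted the error code back and kept asking for fixes and regenerations. I did not keep screenshots from that stage, so there are no images of it.

Along the way I switched between ChatGPT o1-preview, o1-mini, the Gemini Pro exp version, and others, and none of them fixed the code. The only one I did not try was GitHub Copilot, which I was too lazy to open, and I suspect the result would have been the same. Then I noticed this passage in one of Gemini Pro's replies: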


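In short, the mistake is this: a single curl request should carry only one content type, but two were specified here. The header declares application/json and the JSON is sent with -d "…", yet the same request also has a --data-binary, which is wrong.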

— — — — —

After that I gave up on the AI and opened the API Docs mentioned above myself.

The relevant part is the "Prompt with images" section.



1. Call upload/v1beta/files and define your own displayName





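With that, prompting with an image works. If you use Python, Node.js, Android, and so on, the SDK already takes care of all this, so you basically never run into the problem and do not need to know the upload sequence.

For example, in Android Kotlin you only need to write the following. The generated snippet says media input is not supported yet, but in my actual use there is an Image-like builder that can be put directly into the content, so it seems to have been left out here.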
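To make the "one content type per request" point concrete, here is a minimal sketch that builds the two requests separately: the first carries only JSON metadata, the second carries only the binary image. The header names are the ones used in the full function later in this post; `buildUploadRequests` itself is a hypothetical helper, not something from the API docs.

```javascript
// Hypothetical helper: describes the two requests of the upload flow.
// Request 1 ("start") sends JSON metadata; request 2 ("upload") sends the bytes.
function buildUploadRequests(displayName, numBytes, mimeType) {
  return {
    start: {
      method: 'POST',
      headers: {
        'X-Goog-Upload-Protocol': 'resumable',
        'X-Goog-Upload-Command': 'start',
        'X-Goog-Upload-Header-Content-Length': String(numBytes),
        'X-Goog-Upload-Header-Content-Type': mimeType,
        'Content-Type': 'application/json' // this request is JSON only
      },
      body: JSON.stringify({ file: { display_name: displayName } })
    },
    upload: {
      method: 'POST',
      headers: {
        'X-Goog-Upload-Command': 'upload, finalize',
        'X-Goog-Upload-Offset': '0',
        'Content-Type': mimeType // this request is the binary image only
      }
      // The caller attaches the image Blob as `body` and sends this request
      // to the URL returned in the first response's x-goog-upload-url header.
    }
  };
}
```

The LLM-generated versions went wrong by merging these two requests into one, which is exactly the JSON-versus-binary conflict described earlier.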

// NOTE: Your prompt contains media inputs, which are not currently supported by
// the Android SDK. The code snippet below may be incomplete.
// See here for more information and updates:

// In Android Studio, add the following dependency to your build.gradle.kts file:
// implementation("")

// Add the following code to your Kotlin source code

val model = GenerativeModel(
  generationConfig = generationConfig {
    temperature = 1f
    topK = 64
    topP = 0.95f
    maxOutputTokens = 8192
    responseMimeType = "text/plain"
  },
)

val chatHistory = listOf(
  content("user") {
    // TODO Upload file "image_animal5.jpeg"
    text("what is this\n")
  },
  content("model") {
    text("The image shows **Atlantic Puffins** (Fratercula arctica) in their natural habitat, likely on a cliff or coastal area.\n\n\nHere's a breakdown of the elements in the image:\n\n\n* **Puffins:** The main subject is a puffin in sharp focus, standing on a grassy cliff. Another puffin is visible in the background, out of focus. Puffins are characterized by their black and white plumage, bright orange beak, and orange feet. \n* **Environment:** The puffins are situated on a grassy area that looks like a cliff edge, with the ocean blurred in the background. The vegetation is primarily green, suggesting it's in a coastal or temperate region.\n* **Ocean:** The background is a soft blue, suggesting the ocean, which is the natural habitat of puffins.\n\n\n**In essence, the image captures a moment in the life of Atlantic Puffins in their natural environment.** These birds are often found in colonies on islands and cliffs near the ocean, and are known for their vibrant colors and comical appearance.\n")
  },
)

val chat = model.startChat(chatHistory)
val response = chat.sendMessage("INSERT_INPUT_HERE")


The final JavaScript looks like this. It still contains some of my original UI code; please mentally strip that out yourself XD

async function captureAndSegment() {
  if (!apiToken || videoElement.videoWidth === 0 || videoElement.videoHeight === 0) {
    console.log('API Token not set or video not loaded.');
    return;
  }

  try {
    // Draw the current video frame onto a hidden canvas
    hiddenCanvas.width = videoElement.videoWidth;
    hiddenCanvas.height = videoElement.videoHeight;
    const context = hiddenCanvas.getContext('2d');
    context.drawImage(videoElement, 0, 0, hiddenCanvas.width, hiddenCanvas.height);
    console.log('Video frame captured.');

    const blob = await new Promise(resolve => hiddenCanvas.toBlob(resolve, 'image/jpeg'));
    if (!blob) {
      console.log('Failed to convert video frame to Blob.');
      return;
    }
    console.log('Video frame converted to Blob.');

    const arrayBuffer = await blob.arrayBuffer();
    const numBytes = arrayBuffer.byteLength;
    const mimeType = blob.type;
    console.log(`Blob size: ${numBytes} bytes, MIME type: ${mimeType}`);

    // 1. First send display_name to Google (upload/v1beta/files) to obtain the URL for uploading the image
    const uploadUrl = `${apiToken}`;
    const uploadHeaders = {
      'X-Goog-Upload-Protocol': 'resumable',
      'X-Goog-Upload-Command': 'start',
      'X-Goog-Upload-Header-Content-Length': numBytes,
      'X-Goog-Upload-Header-Content-Type': mimeType,
      'Content-Type': 'application/json'
    };
    const uploadBody = JSON.stringify({ file: { display_name: 'captured_frame.jpeg' } });
    console.log('Sending initial upload metadata request:', uploadBody);

    const uploadResponse = await fetch(uploadUrl, {
      method: 'POST',
      headers: uploadHeaders,
      body: uploadBody
    });

    if (!uploadResponse.ok) {
      console.log('Upload API Error:', uploadResponse.status, uploadResponse.statusText);
      return;
    }
    console.log('Initial upload metadata response received.');

    // 2. Read the upload URL from the response headers
    const uploadUrlHeader = uploadResponse.headers.get('x-goog-upload-url');
    if (!uploadUrlHeader) {
      console.log('Upload URL not found in response headers.');
      return;
    }
    console.log('Upload URL:', uploadUrlHeader);

    const uploadBinaryHeaders = {
      'X-Goog-Upload-Command': 'upload, finalize',
      'X-Goog-Upload-Offset': '0',
      'Content-Length': numBytes,
      'Content-Type': mimeType
    };
    console.log('Uploading binary data to upload URL.');

    // 3. POST your binary image to that upload URL
    const uploadBinaryResponse = await fetch(uploadUrlHeader, {
      method: 'POST',
      headers: uploadBinaryHeaders,
      body: blob
    });

    if (!uploadBinaryResponse.ok) {
      console.log('Upload Binary API Error:', uploadBinaryResponse.status, uploadBinaryResponse.statusText);
      return;
    }
    console.log('Binary upload completed.');

    const fileInfo = await uploadBinaryResponse.json();
    // 4. On success, the fileUri is available here
    const fileUri = fileInfo.file.uri;
    if (!fileUri) {
      console.log('File URI not found in binary upload response.');
      return;
    }
    console.log('File URI:', fileUri);

    // 5. Finally, add this fileUri to the conversation
    const generateContentUrl = `${apiToken}`;
    // Prompt: "Describe the objects in the image, or the person's actions, or their expression"
    const prompt = "描述圖片中的物品,或者人物動作,或是表情";
    const generateContentBody = {
      contents: [
        {
          role: "user",
          parts: [
            { file_data: { file_uri: fileUri, mime_type: mimeType } },
            { text: prompt }
          ]
        }
      ],
      generationConfig: {
        temperature: 1,
        topK: 64,
        topP: 0.95,
        maxOutputTokens: 8192,
        responseMimeType: "text/plain"
      }
    };
    console.log('Sending generateContent request:', JSON.stringify(generateContentBody));

    const generateContentResponse = await fetch(generateContentUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(generateContentBody)
    });

    if (!generateContentResponse.ok) {
      console.log('GenerateContent API Error:', generateContentResponse.status, generateContentResponse.statusText);
      return;
    }
    console.log('GenerateContent response received.');

    const generateContentData = await generateContentResponse.json();
    console.log('GenerateContent response data:', generateContentData);

    // At this point it has basically succeeded.
    // The text lives under candidates[0].content.parts[0]; the fallback string means "No description provided."
    const description = generateContentData.candidates && generateContentData.candidates[0]
      ? generateContentData.candidates[0].content.parts[0].text
      : "未提供描述。";
    console.log('Description added:', description);
  } catch (error) {
    console.log('Error:', error);
  }
}

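Even after I later gave the LLM this curl flow from GitHub, it still could not generate correct code. In the end I had to read the code and fix it by hand, and only then did it finally work. Those are the pitfalls I ran into when using prompt with image + curl from Google AI Studio.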
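One more place that is easy to get wrong is reading the text out of the generateContent response. The sketch below assumes the REST response shape `candidates[0].content.parts[].text`; `extractDescription` and its fallback string are hypothetical names for illustration, not part of the original page.

```javascript
// Hypothetical helper: safely pulls the generated text out of a
// generateContent response, returning `fallback` when the model
// returned no candidates or no text parts (e.g. a blocked response).
function extractDescription(data, fallback = 'No description provided.') {
  const parts = data?.candidates?.[0]?.content?.parts;
  if (!Array.isArray(parts)) return fallback;
  // A candidate may contain several parts; join the text of all of them.
  const text = parts.map(part => part.text ?? '').join('');
  return text || fallback;
}
```

Optional chaining keeps a missing `candidates` array from throwing inside the capture loop, so one empty response does not stop the timer.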




