A pitfall in AI Studio's Get Code for Prompt with Image (a curl problem)

JLin

17 min read · 2 hours ago

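I wanted a plain HTML page that uses JavaScript to grab frames from a video stream on a timer, sends each frame to Gemini 1.5 Flash's multimodal API at a fixed interval, and asks the AI to describe what is playing. Once 10 frames have accumulated, it asks the model for a summary of the stream, or some other response along those lines.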
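
To make the "10 frames, then a summary" idea concrete, here is a minimal sketch of the batching part only. The name createDescriptionBuffer, the batchSize parameter, and the wording of the summary prompt are illustrative assumptions, not code from the final page.

```javascript
// Sketch: collect per-frame descriptions and produce a summary prompt
// once `batchSize` of them have accumulated. All names are illustrative.
function createDescriptionBuffer(batchSize = 10) {
  const items = [];
  return {
    // Adds one description. Returns a summary prompt when the batch is
    // full (and empties the buffer); otherwise returns null.
    add(description) {
      items.push(description);
      if (items.length < batchSize) return null;
      const numbered = items.map((d, i) => `${i + 1}. ${d}`).join('\n');
      items.length = 0;
      return (
        `Here are ${batchSize} descriptions of consecutive frames from a video stream:\n` +
        `${numbered}\n` +
        'Summarize what is happening in the stream.'
      );
    },
    // Number of descriptions currently waiting for a summary.
    size() {
      return items.length;
    },
  };
}
```

Each per-frame description would go through add(); whenever it returns a string instead of null, that string could be sent to the model as a text-only prompt.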

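The result: with nothing but HTML and JavaScript, the page captures frames from an HLS stream and keeps asking Gemini Flash to describe what is happening in the video. (On the free tier, watch the call frequency and usage limits.)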
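
Because of those limits, it can help to gate the calls. The sketch below is one simple way to do that, assuming a minimum spacing between requests is good enough. createRateGate is a hypothetical helper, and the interval you pass in is your own choice, not an official quota number.

```javascript
// Sketch: allow at most one API call per `minIntervalMs`.
// `now` is injectable so the gate can be exercised without real waiting.
function createRateGate(minIntervalMs, now = () => Date.now()) {
  let lastCall = -Infinity;
  return function tryAcquire() {
    const t = now();
    if (t - lastCall < minIntervalMs) {
      return false; // too soon since the last allowed call, skip this one
    }
    lastCall = t;
    return true;
  };
}
```

In the page, the timer callback could check the gate first and only call captureAndSegment() when it returns true.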

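Note: I only run this HTML page on my own phone or computer. If the page is publicly accessible, loading your API key in JavaScript and calling the API directly is not recommended; it is better to wrap the call in a service.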
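
One way to keep the key out of the page is to have the browser call your own backend, which adds the key and forwards the request to Gemini. The sketch below only shows the browser side building such a request. The /api/describe endpoint and the image and prompt field names are hypothetical; they would be defined by whatever service you write.

```javascript
// Sketch: build a request to a hypothetical backend that holds the API key.
// Nothing here contains the key; the backend would add it server-side.
function buildProxyRequest(imageBase64, prompt, proxyUrl = '/api/describe') {
  return {
    url: proxyUrl,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ image: imageBase64, prompt }),
    },
  };
}

// Usage in a page (assumes the backend exists):
//   const { url, options } = buildProxyRequest(b64, 'what is this');
//   const res = await fetch(url, options);
```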

I opened the official AI Studio and happily started testing: I added a sample image and typed "what is this".

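Then I clicked Get Code, only to find that the JavaScript option is the Node.js version. I didn't want to set up Node.js, because I wanted an HTML file I could open and run from anywhere.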

So I switched to curl instead.

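I cheerfully handed the curl command to ChatGPT o1-mini and asked it to convert it to plain JavaScript. When I ran the result, I kept hitting an error saying the file type should be binary, not JSON. Since I wanted the whole HTML page to be generated by an LLM, I didn't look closely at what was actually wrong; I just pasted the error back and kept asking for a fix and a regeneration. I didn't save the screen at the time, so there is no screenshot.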

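Along the way I switched between ChatGPT o1-preview, o1-mini, the Gemini Pro exp version, and others, and none of them fixed the code. The only one I didn't bother to open was GitHub Copilot, and I'd guess it would have done no better. That went on until I read one passage in a Gemini Pro reply.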

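Only when I read it carefully did I spot the problem. Even though Gemini had noticed it, it didn't fix it correctly either: it said this piece of code was wrong and should be removed, and yet it was still there XD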

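In short, a single curl request should have only one Content-Type, but this command specifies two: the header declares application/json and the JSON is sent through -d "….", yet there is also a --data-binary. That is the mistake.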

— — — — —

After that I simply gave up on the AI and opened the API Docs link at the top myself.

Gemini API quickstart | Google AI for Developers

Get started with the Gemini API

ai.google.dev

The relevant part is the Prompt with images section:

https://ai.google.dev/gemini-api/docs/vision?lang=rest#prompting-images

The key is further down the page: this official sample is the correct one.

https://github.com/google-gemini/generative-ai-python/blob/4f42118d3d4daf9addc1c18f7d6895f7cd74087d/samples/rest/files.sh#L84-L133

After some digging, it turns out there are several steps:

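1. POST to upload/v1beta/files with a display name (display_name) of your own choosing.

2. The response gives you an upload_url, a dedicated URL for POSTing the binary file. In the JavaScript further down it is read from the x-goog-upload-url response header.

3. POST your image or file to that upload_url and confirm the upload succeeded.

4. Once it succeeds, take file.uri from the returned JSON and attach that file URI to your JSON request as part of the content.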
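
As a compact view of the steps above, here is a sketch that builds the requests as plain data, so it is easy to see that each one has exactly one body and one Content-Type. The header names and JSON shapes follow the full listing later in this post; the helper names buildStartRequest, buildBinaryRequest, and buildFilePart are my own and purely illustrative.

```javascript
// Step 1: metadata request. JSON body, JSON Content-Type.
function buildStartRequest(apiKey, numBytes, mimeType, displayName) {
  return {
    url: `https://generativelanguage.googleapis.com/upload/v1beta/files?key=${apiKey}`,
    method: 'POST',
    headers: {
      'X-Goog-Upload-Protocol': 'resumable',
      'X-Goog-Upload-Command': 'start',
      'X-Goog-Upload-Header-Content-Length': String(numBytes),
      'X-Goog-Upload-Header-Content-Type': mimeType,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ file: { display_name: displayName } }),
  };
}

// Steps 2-3: binary request sent to the upload URL returned by step 1.
// Binary body, and the file's own MIME type as Content-Type.
function buildBinaryRequest(uploadUrl, bytes, mimeType) {
  return {
    url: uploadUrl,
    method: 'POST',
    headers: {
      'X-Goog-Upload-Command': 'upload, finalize',
      'X-Goog-Upload-Offset': '0',
      'Content-Type': mimeType,
    },
    body: bytes,
  };
}

// Step 4: the part that references the uploaded file in generateContent.
function buildFilePart(fileUri, mimeType) {
  return { file_data: { file_uri: fileUri, mime_type: mimeType } };
}
```

Each returned object could be passed to fetch as fetch(req.url, req); the point of the split is that the JSON metadata and the binary payload never share a request.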

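That's all it takes, and Prompt with image then works correctly. If you use Python, Node.js, Android, and so on, the SDK already does all of this for you, so you basically never run into this problem and don't need to know the upload sequence.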

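In Android Kotlin, for example, this is all you need to write. The generated comment says media input is not supported yet, but in my actual use there is an Image-like builder that can be placed directly into the content; the generated snippet seems to have left it out.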

// NOTE: Your prompt contains media inputs, which are not currently supported by
// the Android SDK. The code snippet below may be incomplete.
//
// See here for more information and updates:
// https://ai.google.dev/gemini-api/docs/prompting_with_media

// In Android Studio, add the following dependency to your build.gradle.kts file:
// implementation("com.google.ai.client.generativeai:generativeai:0.7.0")

// Add the following code to your Kotlin source code
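import com.google.ai.client.generativeai.GenerativeModel
import com.google.ai.client.generativeai.type.content
import com.google.ai.client.generativeai.type.generationConfig

val model = GenerativeModel(
  "gemini-1.5-flash-exp-0827",
  BuildConfig.geminiApiKey,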
  generationConfig = generationConfig {
    temperature = 1f
    topK = 64
    topP = 0.95f
    maxOutputTokens = 8192
    responseMimeType = "text/plain"
  },
)

val chatHistory = listOf(
  content("user") {
    // TODO Upload file "image_animal5.jpeg"
    text("what is this\n")
  },
  content("model") {
    text("The image shows **Atlantic Puffins** (Fratercula arctica) in their natural habitat, likely on a cliff or coastal area.\n\n\nHere's a breakdown of the elements in the image:\n\n\n* **Puffins:** The main subject is a puffin in sharp focus, standing on a grassy cliff. Another puffin is visible in the background, out of focus. Puffins are characterized by their black and white plumage, bright orange beak, and orange feet. \n* **Environment:** The puffins are situated on a grassy area that looks like a cliff edge, with the ocean blurred in the background. The vegetation is primarily green, suggesting it's in a coastal or temperate region.\n* **Ocean:** The background is a soft blue, suggesting the ocean, which is the natural habitat of puffins.\n\n\n**In essence, the image captures a moment in the life of Atlantic Puffins in their natural environment.** These birds are often found in colonies on islands and cliffs near the ocean, and are known for their vibrant colors and comical appearance.\n")
  },
)

val chat = model.startChat(chatHistory)
val response = chat.sendMessage("INSERT_INPUT_HERE")

println(response.text)
println(response.candidates.first().content.parts.first().asTextOrNull())

The final JavaScript looks like this. Some of the original UI code is still mixed in, so please mentally strip it out XD

async function captureAndSegment() {
    if (!apiToken || videoElement.videoWidth === 0 || videoElement.videoHeight === 0) {
        console.log('API Token not set or video not loaded.');
        return;
    }

    try {
        hiddenCanvas.width = videoElement.videoWidth;
        hiddenCanvas.height = videoElement.videoHeight;
        const context = hiddenCanvas.getContext('2d');
        context.drawImage(videoElement, 0, 0, hiddenCanvas.width, hiddenCanvas.height);
        console.log('Video frame captured.');

        const blob = await new Promise(resolve => hiddenCanvas.toBlob(resolve, 'image/jpeg'));
        if (!blob) {
            console.log('Failed to convert video frame to Blob.');
            return;
        }
        console.log('Video frame converted to Blob.');

        const arrayBuffer = await blob.arrayBuffer();
        const numBytes = arrayBuffer.byteLength;
        const mimeType = blob.type;
        console.log(`Blob size: ${numBytes} bytes, MIME type: ${mimeType}`);

        // 1. Send the display_name to Google first to get the URL for uploading the image
        const uploadUrl = `https://generativelanguage.googleapis.com/upload/v1beta/files?key=${apiToken}`;
        const uploadHeaders = {
            'X-Goog-Upload-Protocol': 'resumable',
            'X-Goog-Upload-Command': 'start',
            'X-Goog-Upload-Header-Content-Length': numBytes,
            'X-Goog-Upload-Header-Content-Type': mimeType,
            'Content-Type': 'application/json'
        };
        const uploadBody = JSON.stringify({ file: { display_name: 'captured_frame.jpeg' } });
        console.log('Sending initial upload metadata request:', uploadBody);

        const uploadResponse = await fetch(uploadUrl, {
            method: 'POST',
            headers: uploadHeaders,
            body: uploadBody
        });

        if (!uploadResponse.ok) {
            console.log('Upload API Error:', uploadResponse.status, uploadResponse.statusText);
            return;
        }
        console.log('Initial upload metadata response received.');
        // 2. Read the upload URL from the response headers
        const uploadUrlHeader = uploadResponse.headers.get('x-goog-upload-url');
        if (!uploadUrlHeader) {
            console.log('Upload URL not found in response headers.');
            return;
        }
        console.log('Upload URL:', uploadUrlHeader);

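        const uploadBinaryHeaders = {
            'X-Goog-Upload-Command': 'upload, finalize',
            'X-Goog-Upload-Offset': '0',
            // Mirrors the curl sample; browsers compute Content-Length themselves
            // and ignore this entry because it is a forbidden header name in fetch.
            'Content-Length': numBytes,
            'Content-Type': mimeType
        };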
        console.log('Uploading binary data to upload URL.');

        // 3. POST the binary image to that upload URL
        const uploadBinaryResponse = await fetch(uploadUrlHeader, {
            method: 'POST',
            headers: uploadBinaryHeaders,
            body: blob
        });

        if (!uploadBinaryResponse.ok) {
            console.log('Upload Binary API Error:', uploadBinaryResponse.status, uploadBinaryResponse.statusText);
            return;
        }
        console.log('Binary upload completed.');

        const fileInfo = await uploadBinaryResponse.json();
        // 4. On success, the file URI is in the response JSON
        const fileUri = fileInfo.file?.uri;
        if (!fileUri) {
            console.log('File URI not found in binary upload response.');
            return;
        }
        console.log('File URI:', fileUri);

        // 5. Finally, add this fileUri to the conversation
        const generateContentUrl = `https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-exp-0827:generateContent?key=${apiToken}`;
        const prompt = "Describe the objects in the image, or the person's actions or facial expression";
        const generateContentBody = {
            contents: [
                {
                    role: "user",
                    parts: [
                        { file_data: { file_uri: fileUri, mime_type: mimeType } },
                        { text: prompt }
                    ]
                }
            ],
            generationConfig: {
                temperature: 1,
                topK: 64,
                topP: 0.95,
                maxOutputTokens: 8192,
                responseMimeType: "text/plain"
            }
        };
        console.log('Sending generateContent request:', JSON.stringify(generateContentBody));

        const generateContentResponse = await fetch(generateContentUrl, {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify(generateContentBody)
        });

        if (!generateContentResponse.ok) {
            console.log('GenerateContent API Error:', generateContentResponse.status, generateContentResponse.statusText);
            return;
        }
        console.log('GenerateContent response received.');

        const generateContentData = await generateContentResponse.json();
        console.log('GenerateContent response data:', generateContentData);

        // That's basically it: pull the text out of the first candidate,
        // falling back to a placeholder if any level of the response is missing
        const description = generateContentData.candidates?.[0]?.content?.parts?.[0]?.text
            ?? "No description provided.";
        descriptions.push(description);
        updateDescriptions();
        console.log('Description added:', description);
    } catch (error) {
        console.log('Error:', error);
    }
}

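Even after I gave the LLMs the curl flow from that GitHub sample, none of them produced correct code. So in the end I read the code myself and fixed it by hand until it finally worked. That is the pitfall I ran into using Prompt with image + curl in Google AI Studio.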
