How to calculate image tokens for vision models / ai #40

in hive-180932 •  20 hours ago 

vision.jpg
Vision models (source: the web)

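I have wanted to add a vision model to AIJoe for a while to give it some new features, but kept putting it off for various reasons. Mostly I had not worked out what the use cases for a vision model would be. Recently I got a few ideas: image to text, image to chart, image to front-end code... With those use cases in mind, I now know how to go about building it!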

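Vision models

OpenAI's vision-capable models include o1, gpt-4o, gpt-4o-mini, and gpt-4-turbo. Other large-model providers offer them too, but I have not tried those yet, so I will start with OpenAI.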

Image tokens

Text tokens are easy to count, but how do you count the tokens of an image?

Here is what the documentation says:

High res cost
To calculate the cost of an image with detail: high, we do the following:

Scale to fit within a 2048px x 2048px square, maintaining original aspect ratio
Scale so that the image's shortest side is 768px long
Count the number of 512px squares in the image—each square costs 170 tokens
Add 85 tokens to the total

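In short, there are roughly three steps:
Step 1: Shrink the image so it fits within 2048 x 2048, keeping the aspect ratio.
Step 2: Shrink the image so its shortest side is 768.
Step 3: Count how many 512 x 512 tiles the image covers and compute the tokens: 170 per tile, plus a flat 85.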
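To check the arithmetic without downloading anything, the three steps can be sketched as a pure function of the image dimensions. The name `imageTokens` is made up for this illustration, and the sketch assumes an image is only ever shrunk: if the shortest side is already below 768 it is left alone rather than enlarged.

```javascript
// A minimal sketch of the high-detail token calculation.
// `imageTokens` is a hypothetical helper name, not part of any library.
// Assumption: images are only scaled down, never up.
function imageTokens(width, height) {
  let w = width
  let h = height

  // Step 1: fit within a 2048 x 2048 square, keeping the aspect ratio
  const longest = Math.max(w, h)
  if (longest > 2048) {
    const scale = 2048 / longest
    w = Math.floor(w * scale)
    h = Math.floor(h * scale)
  }

  // Step 2: shrink so that the shortest side is 768
  const shortest = Math.min(w, h)
  if (shortest > 768) {
    const scale = 768 / shortest
    w = Math.floor(w * scale)
    h = Math.floor(h * scale)
  }

  // Step 3: 170 tokens per 512 x 512 tile, plus a flat 85
  const tiles = Math.ceil(w / 512) * Math.ceil(h / 512)
  return 85 + 170 * tiles
}

console.log(imageTokens(1024, 1024)) // 768 x 768   -> 4 tiles -> 765
console.log(imageTokens(2048, 4096)) // 768 x 1536  -> 6 tiles -> 1105
```

Keeping the calculation separate from the download makes it easy to test against known sizes before wiring it up to real image URLs.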

Implementation

Here is the implementation, for reference:

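import https from 'node:https'
import { imageSize } from 'image-size'

// Download the image and read its pixel dimensions from the raw bytes.
// https.get accepts the URL string directly, so the legacy url.parse is not needed.
async function getSize(imgUrl){
    return new Promise((resolve, reject) => {
        https.get(imgUrl, function (response) {
            const chunks = []
            response
                .on('data', function (chunk) {
                    chunks.push(chunk)
                })
                .on('end', function () {
                    try {
                        const buffer = Buffer.concat(chunks)
                        resolve(imageSize(buffer))
                    } catch (err) {
                        // imageSize throws when the bytes are not a supported image
                        reject(err)
                    }
                })
                .on('error', reject)
        }).on('error', reject) // network errors would otherwise leave the promise pending
    })
}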


async function calImgToken(imgUrl) {
    const size = await getSize(imgUrl)
    // Start from the real dimensions; each step below works on the result of the previous one
    let newWidth = size.width
    let newHeight = size.height

    // Step 1: shrink the image to fit within 2048 x 2048, keeping the aspect ratio
    const longest = Math.max(newWidth, newHeight)
    if (longest > 2048) {
      const scale = 2048 / longest
      newWidth = Math.floor(newWidth * scale)
      newHeight = Math.floor(newHeight * scale)
    }

    // Step 2: shrink so the shortest side is 768
    // (assumption: an image whose shortest side is already below 768 is left as-is)
    const shortest = Math.min(newWidth, newHeight)
    if (shortest > 768) {
      const scale = 768 / shortest
      newWidth = Math.floor(newWidth * scale)
      newHeight = Math.floor(newHeight * scale)
    }

    // Step 3: count the 512 x 512 tiles and compute the tokens
    const tiles_width = Math.ceil(newWidth / 512)
    const tiles_height = Math.ceil(newHeight / 512)
    const total_tokens = 85 + 170 * (tiles_width * tiles_height)
    console.log('image tokens:', total_tokens)
    return total_tokens
}

// Test: my run printed 1445 for this image
calImgToken("https://ipfs.ilark.io/ipfs/QmbwEHigZpVFNWrUJgP6u28t9NhAW2xbqUm7BMAzSEc2ND")
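The test call above returns a Promise and only logs the result. When a message carries several images, one way to use the function is to count them concurrently and add up the results. The sketch below is only an illustration: `totalImageTokens` is a hypothetical helper, and the counting function is passed in as a parameter (for example `calImgToken`) so the helper does not depend on the network itself.

```javascript
// Sum the tokens of several images, tolerating individual failures.
// `countTokens` is any async function (url) => number, e.g. calImgToken.
// `totalImageTokens` is a hypothetical name used for this sketch only.
async function totalImageTokens(imgUrls, countTokens) {
  // allSettled keeps going even if one download or decode fails
  const results = await Promise.allSettled(imgUrls.map((u) => countTokens(u)))
  let total = 0
  const failed = []
  results.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      total += result.value
    } else {
      failed.push(imgUrls[i])
    }
  })
  return { total, failed }
}

// Usage (assumes calImgToken is in scope):
// const { total, failed } = await totalImageTokens([url1, url2], calImgToken)
```

Returning the failed URLs alongside the total lets the caller decide whether to retry them or drop them from the request.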

Next I will be adding vision features to AIJoe. Stay tuned!
