Public Open LLM API

This is a solution we are building: different versions of open-source LLMs are hosted in the background, and you can test them through a single interface.

Introduction

As we discussed before, self-hosting LLMs is essentially one-off work: once you set up the self-hosted LLM and deploy it on a server, everyone can use it.

There are several open-source solutions for this that let you pull their code, run it on your local or cloud machines, and test against that.

However, this still requires some technical skill, and in my experience the quality and support of these codebases still need more work.

So the idea is to build a central set of endpoints, deploy open-source LLM models as requested by researchers, and let everyone with a valid token call those endpoints to evaluate model performance, which can benefit the wider research community.

In this way

  • You only need to deal with the endpoint interface we provide, which hides all the complexity in the background.

  • You can contact us if you want us to deploy a new open LLM model.

  • I will actively fix bugs in the background

  • I have built a set of features specifically for researchers.

    • For example, if you want to evaluate open LLM performance on a list of prompts, then after you make all the calls via the API endpoints, there is an admin page where you can download all the request content and results, including the time it took the LLM to finish each job.

    • We will keep actively adding features like this.

  • It is hosted on a Pawsey machine and the data stays entirely local there, so you do not need to worry about data leaking; this may still need further discussion depending on your specific requirements.

  • If a lot of researchers use it, we may be able to contribute to the hardware together and make it faster for everyone.

  • ....

It does have downsides:

  • It relies on us to keep it running smoothly; hopefully we have the energy to continue this.

  • And any other downsides you can name, haha.

But anyway, we are nearly there now.

We plan to build

  • An API endpoint: you can easily make an HTTP request with a model_name and prompts.

  • A web interface: you can test it out in your browser.

  • A database storing the transactional data, which you can download and keep for your research.

So we are there: we have a frontend application where you can log in and upload CSV/JSON files with the LLM tasks you want to evaluate.

We also provide an API interface, so you can queue the tasks with Python or any language you want to use.

Access the Open Source LLM Evaluation Dashboard

Link: https://llm.nlp-tlp.org/

You will need an account to log in; if you are interested, feel free to contact us to set one up. We are in the testing stage now and will open public registration later.

JSON and CSV examples
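The exact upload schema is defined by the dashboard itself; the snippets below are only an illustration (the field names are assumptions based on the API payloads further down this page), so treat the dashboard as the source of truth. A JSON upload might look like:

[
  {"model_name": "llama2-7b-chat", "prompt": "Where is curtin?", "llm_task_type": "chat_completion"},
  {"model_name": "llama2-7b-chat", "prompt": "where is ECU", "llm_task_type": "chat_completion"}
]

And an equivalent CSV:

model_name,prompt,llm_task_type
llama2-7b-chat,Where is curtin?,chat_completion
llama2-7b-chat,where is ECU,chat_completion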

Access the Public Open LLM API

The documentation URL is: https://api.nlp-tlp.org/redoc/

Under the LLM section

There is one important endpoint for now:

  • List all available LLM models

Auth

To make a valid call to the endpoints, you will need a valid token. Feel free to contact us if you want one: pascal.sun@research.uwa.edu.au

Or, if you already have an account, you can generate one via the WA Data & LLM platform UI.

With the token, you can then set your HTTP request header as:

Authorization: Token xxx_your_token_xx

With this header set, you will be allowed to call the endpoints.
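As a minimal sketch (assuming Python's requests library; using a session is just one convenient way to attach the token to every call):

import requests

# Attach the token once; every request made through this session will carry it
session = requests.Session()
session.headers.update({"Authorization": "Token your_token"})

# Example: call the list-models endpoint documented below
response = session.get("https://api.nlp-tlp.org/llm/config")
print(response.status_code, response.text)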

List Available LLMs

This endpoint lists all the models in the system records; if a model is downloaded and ready to go, its available field will be true.

The model_name field will be the one you care about, as this will be the input later for your other requests.

You can request new models by contacting us: pascal.sun@research.uwa.edu.au

Tell us the model you want to add, the Hugging Face repo id, etc., and we will try to add it as soon as possible.

import requests

# Endpoint that lists all LLM models registered in the system
url = "https://api.nlp-tlp.org/llm/config"

headers = {
  'Authorization': 'Token your_token'
}

# Simple GET request with the token in the Authorization header
response = requests.get(url, headers=headers)

print(response.text)

An example response:

[
    {
        "id": 9,
        "model_name": "internlm-20b",
        "model_size": "20b",
        "model_family": "internlm",
        "model_type": "llama.cpp",
        "repo": "intervitens/internlm-chat-20b-GGUF",
        "filename": "internlm-chat-20b.Q4_K_M.gguf",
        "file_size": 12194238560.0,
        "available": true,
        "created_at": "2024-03-04T13:22:48.745720Z",
        "updated_at": "2024-03-05T01:30:01.088429Z"
    },
    ...
]
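As a small follow-up sketch, you can filter that response down to the models that are ready to use (this assumes the JSON shape shown above):

import requests

url = "https://api.nlp-tlp.org/llm/config"
headers = {'Authorization': 'Token your_token'}

response = requests.get(url, headers=headers)
models = response.json()

# Keep only models that are downloaded and ready, and collect their model_name values
available_names = [m["model_name"] for m in models if m.get("available")]
print(available_names)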

Queue a task or a batch of tasks

If you want to queue a list of tasks, or because some models take a long time to finish a request, you can queue the task and then grab the result(s) later.

So the endpoint is

https://api.nlp-tlp.org/queue_task/llm/

The request body should look like this:

{
  "model_name": "string",
  "prompt": "string",
  "llm_task_type": "chat_completion"
}

You will need to authenticate with your token in the same way.


import requests
import json

url = "https://api.nlp-tlp.org/queue_task/llm/"

payload = json.dumps({
  "model_name": "llama2-7b-chat",
  "prompt": "Where is curtin?",
  "llm_task_type": "chat_completion"
})
headers = {
  'Authorization': 'Token your_token',
  'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)

It will return something like this:

{
    "message": "LLM task queued successfully",
    "task_id": 1
}

Then you can use the task_id to track progress via the status endpoint, which we describe below.

To queue a list of tasks, use this endpoint:

https://api.nlp-tlp.org/queue_task/llm_batch/

The request body will be:

{
    "model_name": "llama2-7b-chat",
    "prompts": ["Where is curtin?", "where is ECU"],
    "llm_task_type": "chat_completion"
}

The code will look like this:

import requests
import json

url = "httpS://api.nlp-tlp.org/queue_task/llm_batch/"

payload = json.dumps({
  "model_name": "llama2-7b-chat",
  "prompts": [
    "Where is curtin?",
    "where is ECU"
  ],
  "llm_task_type": "chat_completion"
})
headers = {
  'Authorization': 'Token your_token',
  'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)

You will get a response like this:

{
    "message": "LLM tasks queued successfully",
    "task_ids": [
        2,
        3
    ]
}

Check task status

After you have the task_id, you can track progress via the endpoint:

https://api.nlp-tlp.org/queue_task/{task_id}/status/

Authenticate it with your token
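A minimal sketch in Python (task_id 1 here is just the id returned by the earlier example):

import requests

task_id = 1  # the id returned when you queued the task
url = f"https://api.nlp-tlp.org/queue_task/{task_id}/status/"
headers = {'Authorization': 'Token your_token'}

response = requests.get(url, headers=headers)
print(response.text)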

And you should be able to get some results like

{
    "status": "completed",
    "desc": "{'id': 'chatcmpl-e99946f7-607c-4aaf-abbb-018ecb2a3681', 'object': 'chat.completion', 'created': 1710257791, 'model': '/usr/src/app/llm/llm_call/models/llama2/llama-2-7b-chat.Q4_K_M.gguf', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': \"  Ah, a fellow traveler! *adjusts sunglasses* Curtin is a great place, mate! *chuckles* It's located in the north-eastern part of Western Australia, about 1,000 kilometers (620 miles) from Perth. It's a small town with a population of around 10,000 people, known for its rich history and culture.\\nCurtin is home to the Curtin University, which is one of the largest universities in Western Australia. The town also has a number of historic sites, including the Old Curtin Homestead Museum and the Curtin Heritage Trail, which showcases the area's early settlement and agricultural history.\\nIf you're looking for some outdoor adventure, Curtin is surrounded by beautiful national parks and reserves, such as the Kalbarri National Park and the Murchison River National Park. These parks offer plenty of opportunities for camping, hiking, and wildlife spotting.\\nSo, if you're ever in Western Australia and find yourself in the vicinity of Curtin, be sure to stop by and check it out! *winks*\"}, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 36, 'completion_tokens': 259, 'total_tokens': 295}}"
}
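Because some models take a while, a common pattern is to poll the status endpoint until the task reports completed and then pull the generated text out of the desc field. The sketch below is an assumption-heavy illustration: the helper name wait_for_result is ours, and it assumes desc is the Python-repr style string shown above (hence ast.literal_eval rather than json.loads); adjust the parsing to whatever your tasks actually return.

import ast
import time

import requests

headers = {'Authorization': 'Token your_token'}

def wait_for_result(task_id, poll_seconds=5, max_attempts=60):
    """Poll the status endpoint until the task completes, then return the model's reply text."""
    url = f"https://api.nlp-tlp.org/queue_task/{task_id}/status/"
    for _ in range(max_attempts):
        data = requests.get(url, headers=headers).json()
        if data.get("status") == "completed":
            # desc is a stringified dict with single quotes, so literal_eval instead of json.loads
            result = ast.literal_eval(data["desc"])
            return result["choices"][0]["message"]["content"]
        time.sleep(poll_seconds)
    raise TimeoutError(f"Task {task_id} did not complete in time")

print(wait_for_result(1))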

Supported Models

This may not be the latest list; use the endpoint above to query the current one.

| model_name | size | model family | model type | repo | filename |
| --- | --- | --- | --- | --- | --- |
| chatglm3-6b | 6b | chatglm | chatglm.cpp | npc0/chatglm3-6b-int4 | chatglm3-ggml-q4_1.bin |
| internlm-20b | 20b | internlm | llama.cpp | intervitens/internlm-chat-20b-GGUF | internlm-chat-20b.Q4_K_M.gguf |
| gemma-7b-instruct | 7b | gemma | llama.cpp | brittlewis12/gemma-7b-it-GGUF | gemma-7b-it.Q4_K_M.gguf |
| gemma-7b | 7b | gemma | llama.cpp | brittlewis12/gemma-7b-GGUF | gemma-7b.Q4_K_M.gguf |
| gemma-2b-instruct | 2b | gemma | llama.cpp | brittlewis12/gemma-2b-it-GGUF | gemma-2b-it.Q4_K_M.gguf |
| gemma-2b | 2b | gemma | llama.cpp | brittlewis12/gemma-2b-GGUF | gemma-2b.Q4_K_M.gguf |
| llama2-13b-chat | 13b | llama2 | llama.cpp | TheBloke/Llama-2-13B-Chat-GGUF | llama-2-13b-chat.Q8_0.gguf |
| llama2-13b | 13b | llama2 | llama.cpp | TheBloke/Llama-2-13B-GGUF | llama-2-13b.Q4_K_M.gguf |
| llama2-7b-chat | 7b | llama2 | llama.cpp | TheBloke/Llama-2-7B-Chat-GGUF | llama-2-7b-chat.Q4_K_M.gguf |
| llama2-7b | 7b | llama2 | llama.cpp | TheBloke/Llama-2-7B-GGUF | llama-2-7b.Q4_K_M.gguf |
| medicine-chat | 13b | medicine-chat | llama.cpp | TheBloke/medicine-chat-GGUF | medicine-chat.Q8_0.gguf |
| medicine-llm-13b | 13b | medicine-llm | llama.cpp | TheBloke/medicine-LLM-13B-GGUF | medicine-llm-13b.Q8_0.gguf |
| dolphin-2.5-mixtral-7x7b | 8x7b | dolphin-2.5-mixtral | llama.cpp | TheBloke/dolphin-2.5-mixtral-8x7b-GGUF | dolphin-2.5-mixtral-8x7b.Q2_K.gguf |

Support

If you need any support, such as new models or help getting the code to work, or if you find problems, feel free to contact pascal.sun@research.uwa.edu.au or reach out on LinkedIn: https://www.linkedin.com/in/pascalsun23/

We are actively developing the web interface now and will keep you updated. If you have any research ideas and are looking to collaborate, also feel free to contact us.
