
Commit 3bc5b74

Merge branch 'main' into add-groq-client-support
2 parents: 1659045 + a698ec1

File tree

18 files changed: +2157 −1767 lines

docs/integrations/huggingface/QATestset.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@

# 📤 Push a QATestset to the Hugging Face Hub

**Learn how to upload and manage your QATestset on the Hugging Face Hub using the `push_to_hf_hub` feature.**

This tutorial will guide you through the steps to push a dataset to the Hugging Face Hub and load it back for reuse.

## Install Required Dependencies

Before you begin, ensure you have the necessary libraries installed. Run the following command to install the `datasets` and `huggingface_hub` packages:

```bash
pip install datasets huggingface_hub
```

## Authenticate with Hugging Face

To enable access to your account, set your Hugging Face authentication token (`HF_TOKEN`). You can generate your token from your [Hugging Face account settings](https://huggingface.co/settings/tokens).

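For example, you can authenticate programmatically with the `login` helper from `huggingface_hub` (a minimal sketch, assuming the token has already been exported as the `HF_TOKEN` environment variable):

```python
import os

from huggingface_hub import login

# Log in with the token stored in HF_TOKEN; alternatively, pass the token string directly.
login(token=os.environ["HF_TOKEN"])
```
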
## Push Your Dataset to the Hub

Use the `push_to_hf_hub` method to upload your dataset to the Hugging Face Hub. The example below loads a `QATestset` from the file `test_set.jsonl` and pushes it to the Hub. Replace `<username>` with your Hugging Face username and `<dataset_name>` with the desired name for your dataset:

```python
from giskard.rag.testset import QATestset
test_set = QATestset.load("test_set.jsonl")
test_set.push_to_hf_hub("<username>/<dataset_name>")
```

Once the dataset is successfully pushed, it will be available on your Hugging Face profile.

## Load the Dataset from the Hub

To reuse the dataset, you can load it back using the `load_from_hf_hub` method. This example demonstrates how to load the dataset and convert it to a pandas DataFrame for inspection:

```python
from giskard.rag.testset import QATestset
dset = QATestset.load_from_hf_hub("<username>/<dataset_name>")
dset.to_pandas().head()
```

Replace `<username>` and `<dataset_name>` with the appropriate values.

## Benefits of Using the Hugging Face Hub

By leveraging this integration, you can:

- Seamlessly share datasets across projects and collaborators.
- Reuse datasets without the need for manual file transfers.
- Access datasets directly from the Hugging Face Hub for streamlined workflows.

Start pushing your datasets today and take advantage of the collaborative power of the Hugging Face Hub!

docs/integrations/huggingface/index.md

Lines changed: 6 additions & 0 deletions
````diff
@@ -8,6 +8,7 @@
 :hidden:

 ./evaluator.md
+./QATestset.md

 ```

@@ -17,3 +18,8 @@
 :text-align: center
 :link: ./evaluator.md
 ::::
+
+::::{grid-item-card} <br/><h3>📤 Push a QATestset to the Hugging Face Hub</h3>
+:text-align: center
+:link: ./QATestset.md
+::::
````

docs/open_source/testset_generation/testset_generation/index.md

Lines changed: 2 additions & 0 deletions
````diff
@@ -279,6 +279,8 @@ from giskard.rag import QATestset
 loaded_testset = QATestset.load("my_testset.jsonl")
 ```

+You can push your generated test set to the Hugging Face Hub or load an existing dataset from it using [`QATestset.push_to_hf_hub`](giskard.rag.QATestset.push_to_hf_hub) and [`QATestset.load_from_hf_hub`](giskard.rag.QATestset.load_from_hf_hub). This allows you to share and reuse datasets easily. For detailed instructions, refer to the [Hugging Face Integration Documentation](../../../integrations/huggingface/QATestset.md).
+
 You can also convert it to a pandas DataFrame, for quick inspection or further processing:

 ```py
````

giskard/core/core.py

Lines changed: 4 additions & 4 deletions
```diff
@@ -8,13 +8,13 @@
 from enum import Enum
 from pathlib import Path

-from griffe import Docstring
-from griffe.docstrings.dataclasses import (
+from griffe import (
+    Docstring,
     DocstringSection,
+    DocstringSectionKind,
     DocstringSectionParameters,
     DocstringSectionReturns,
 )
-from griffe.enumerations import DocstringSectionKind

 from ..utils.artifacts import serialize_parameter


@@ -26,7 +26,7 @@
 from typing import Any, Callable, Dict, List, Literal, Optional, Type, TypeVar, Union

 logger = logging.getLogger(__name__)
-DEMILITER = f"\n{'='*20}\n"
+DEMILITER = f"\n{'=' * 20}\n"


 class Kwargs:
```

giskard/llm/client/base.py

Lines changed: 5 additions & 0 deletions
```diff
@@ -31,3 +31,8 @@ def complete(
         format=None,
     ) -> ChatMessage:
         ...
+
+    @abstractmethod
+    def get_config(self) -> dict:
+        """Return the configuration of the LLM client."""
+        ...
```

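To illustrate the new contract, here is a minimal sketch of a custom client that satisfies it. The `LLMClient` and `ChatMessage` names are assumed from this module; the `EchoClient` class and its simplified `complete` signature are illustrative only and are not part of this commit:

```python
from typing import Sequence

from giskard.llm.client.base import ChatMessage, LLMClient  # names assumed from this module


class EchoClient(LLMClient):
    """Toy client used only to illustrate the new get_config contract."""

    def __init__(self, model: str = "echo-1"):
        self.model = model

    def complete(self, messages: Sequence[ChatMessage], **kwargs) -> ChatMessage:
        # Echo the last message back instead of calling a real LLM.
        return ChatMessage(role="assistant", content=messages[-1].content)

    def get_config(self) -> dict:
        # Same convention as the concrete clients updated in this commit.
        return {"client_type": self.__class__.__name__, "model": self.model}
```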

giskard/llm/client/bedrock.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -61,6 +61,10 @@ def complete(

         return self._parse_completion(completion, caller_id)

+    def get_config(self) -> dict:
+        """Return the configuration of the LLM client."""
+        return {"client_type": self.__class__.__name__, "model": self.model}
+

 @deprecated("ClaudeBedrockClient is deprecated: https://docs.giskard.ai/en/latest/open_source/setting_up/index.html")
 class ClaudeBedrockClient(BaseBedrockClient):
```

giskard/llm/client/gemini.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -57,6 +57,10 @@ def __init__(self, model: str = "gemini-pro", _client=None):
         self.model = model
         self._client = _client or genai.GenerativeModel(self.model)

+    def get_config(self) -> dict:
+        """Return the configuration of the LLM client."""
+        return {"client_type": self.__class__.__name__, "model": self.model}
+
     def complete(
         self,
         messages: Sequence[ChatMessage],
```

giskard/llm/client/litellm.py

Lines changed: 9 additions & 0 deletions
```diff
@@ -151,3 +151,12 @@ def complete(
                 continue

         return ChatMessage(role=response_message.role, content=response_message.content)
+
+    def get_config(self) -> dict:
+        """Return the configuration of the LLM client."""
+        return {
+            "client_type": self.__class__.__name__,
+            "model": self.model,
+            "disable_structured_output": self.disable_structured_output,
+            "completion_params": self.completion_params,
+        }
```

giskard/llm/client/mistral.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -24,6 +24,10 @@ def __init__(self, model: str = "mistral-large-latest", client: Mistral = None):
         self.model = model
         self._client = client or Mistral(api_key=os.getenv("MISTRAL_API_KEY", ""))

+    def get_config(self) -> dict:
+        """Return the configuration of the LLM client."""
+        return {"client_type": self.__class__.__name__, "model": self.model}
+
     def complete(
         self,
         messages: Sequence[ChatMessage],
```

giskard/llm/client/openai.py

Lines changed: 8 additions & 0 deletions
```diff
@@ -37,6 +37,14 @@ def __init__(
         self._client = client or openai.OpenAI()
         self.json_mode = json_mode if json_mode is not None else _supports_json_format(model)

+    def get_config(self) -> dict:
+        """Return the configuration of the LLM client."""
+        return {
+            "client_type": self.__class__.__name__,
+            "model": self.model,
+            "json_mode": self.json_mode,
+        }
+
     def complete(
         self,
         messages: Sequence[ChatMessage],
```

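For illustration, the new method could be exercised as follows (a minimal sketch; the `OpenAIClient` class name and constructor are assumed from the file above, and the printed value is only indicative):

```python
from giskard.llm.client.openai import OpenAIClient  # class name assumed

# Build a client and inspect how it reports its configuration.
client = OpenAIClient(model="gpt-4o-mini")
print(client.get_config())
# Indicative output: {'client_type': 'OpenAIClient', 'model': 'gpt-4o-mini', 'json_mode': True}
```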

0 commit comments
