Commit 1d82efd3 by MD. Irfan hossain

try

parent a31046e1
@@ -30,7 +30,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
- "d:\\RagProject\\.venv\\lib\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
+ "d:\\Work\\personal\\rag-project\\.venv\\lib\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n"
]
}
@@ -79,11 +79,149 @@
"source": [
"type(doc)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "736afaaa",
"metadata": {},
"outputs": [],
"source": [
"# Create the directory that will hold the sample text files\n",
"import os\n",
"os.makedirs(\"data/text_files\", exist_ok=True)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "6a79569b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sample text files created!!\n"
]
}
],
"source": [
"sample_texts={\n",
" \"data/text_files/python_intro.txt\": \"\"\"\n",
" This problem uses a dummy text to practice basic text handling. The purpose of the text is to help learners understand how text data can be stored and processed in a program. The sentences are simple so that beginners can focus on learning without difficulty. This task is useful for testing, practice, and improving reading or programming skills.\n",
" If you want it more beginner-friendly, more formal, or shorter, tell me and I’ll adjust it. \n",
" \"\"\"\n",
" ,\"data/text_files/python_intro_1010.txt\": \"\"\"\n",
" This problem (10101) uses a dummy text to practice basic text handling. The purpose of the text is to help learners understand how text data can be stored and processed in a program. The sentences are simple so that beginners can focus on learning without difficulty. This task is useful for testing, practice, and improving reading or programming skills.\n",
" If you want it more beginner-friendly, more formal, or shorter, tell me and I’ll adjust it. \n",
" \"\"\"\n",
"}\n",
"for filepath,content in sample_texts.items():\n",
" with open(filepath,'w',encoding=\"utf-8\") as f:\n",
" f.write(content)\n",
"print(\"Sample text files created!!\")"
]
},
{
"cell_type": "markdown",
"id": "aef19896",
"metadata": {},
"source": [
"TextLoader - Read Single File"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "a54bd380",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'list'>\n",
"[Document(metadata={'source': 'data/text_files/python_intro.txt'}, page_content='\\n This problem uses a dummy text to practice basic text handling. The purpose of the text is to help learners understand how text data can be stored and processed in a program. The sentences are simple so that beginners can focus on learning without difficulty. This task is useful for testing, practice, and improving reading or programming skills.\\n If you want it more beginner-friendly, more formal, or shorter, tell me and I’ll adjust it. \\n ')]\n"
]
}
],
"source": [
"from langchain_community.document_loaders import TextLoader\n",
"# Loading a single text file\n",
"loader = TextLoader(\"data/text_files/python_intro.txt\", encoding=\"utf-8\")\n",
"documents = loader.load()\n",
"print(type(documents))\n",
"print(documents)"
]
},
{
"cell_type": "markdown",
"id": "404dce71",
"metadata": {},
"source": [
"DirectoryLoader - Multiple Text Files"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "500bf5df",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 2/2 [00:00<00:00, 1298.95it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" loaded 2 documents\n",
"\n",
"Document 1:\n",
" Source: data\\text_files\\python_intro.txt\n",
" Length: 457 characters\n",
"\n",
"Document 2:\n",
" Source: data\\text_files\\python_intro_1010.txt\n",
" Length: 465 characters\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"from langchain_community.document_loaders import DirectoryLoader\n",
"\n",
"# Load all text files from the directory\n",
"dir_loader = DirectoryLoader(\n",
" \"data/text_files\",\n",
" glob=\"**/*.txt\",  # pattern to match files\n",
" loader_cls=TextLoader,\n",
" loader_kwargs={'encoding': 'utf-8'},\n",
" show_progress=True\n",
")\n",
"documents = dir_loader.load()\n",
"print(f\" loaded {len(documents)} documents\")\n",
"for i, doc in enumerate(documents):\n",
" print(f\"\\nDocument {i+1}:\")\n",
" print(f\" Source: {doc.metadata['source']}\")\n",
" print(f\" Length: {len(doc.page_content)} characters\")"
]
}
],
"metadata": {
"kernelspec": {
- "display_name": "RagProject",
+ "display_name": "rag-project",
"language": "python",
"name": "python3"
},
......
{
"cells": [],
"metadata": {
"kernelspec": {
"display_name": "rag-project",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.10.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
This problem uses a dummy text to practice basic text handling. The purpose of the text is to help learners understand how text data can be stored and processed in a program. The sentences are simple so that beginners can focus on learning without difficulty. This task is useful for testing, practice, and improving reading or programming skills.
If you want it more beginner-friendly, more formal, or shorter, tell me and I’ll adjust it.
\ No newline at end of file
This problem (10101) uses a dummy text to practice basic text handling. The purpose of the text is to help learners understand how text data can be stored and processed in a program. The sentences are simple so that beginners can focus on learning without difficulty. This task is useful for testing, practice, and improving reading or programming skills.
If you want it more beginner-friendly, more formal, or shorter, tell me and I’ll adjust it.
\ No newline at end of file
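
Note on the loader cells in this commit: the DirectoryLoader cell globs for `**/*.txt`, reads each match as UTF-8, and reports each document's source path and length. A minimal stdlib-only sketch of that pattern follows. The helper name `load_text_files`, the dict layout, and the placeholder file contents are assumptions for illustration; this is not LangChain's implementation and it does not return `Document` objects.

```python
from pathlib import Path
import tempfile


def load_text_files(root, pattern="**/*.txt", encoding="utf-8"):
    """Hypothetical helper: read every file under `root` that matches `pattern`.

    Returns a list of dicts shaped loosely like the notebook's documents:
    the text under "page_content" and the file path under metadata["source"].
    """
    docs = []
    for path in sorted(Path(root).glob(pattern)):
        if path.is_file():
            docs.append(
                {
                    "page_content": path.read_text(encoding=encoding),
                    "metadata": {"source": str(path)},
                }
            )
    return docs


# Build a throwaway directory with two sample files (placeholder contents),
# then load them the same way the notebook's directory cell does.
with tempfile.TemporaryDirectory() as tmp:
    text_dir = Path(tmp) / "data" / "text_files"
    text_dir.mkdir(parents=True, exist_ok=True)
    (text_dir / "python_intro.txt").write_text("first sample", encoding="utf-8")
    (text_dir / "python_intro_1010.txt").write_text(
        "second sample (10101)", encoding="utf-8"
    )

    docs = load_text_files(text_dir)
    print(f"Loaded {len(docs)} documents")
    for i, doc in enumerate(docs, start=1):
        source = doc["metadata"]["source"]
        print(f"Document {i}: {source} ({len(doc['page_content'])} characters)")
```

The source paths are built with `pathlib`, so the separator follows the host platform. That would be consistent with the backslash paths in the recorded DirectoryLoader output, given that the notebook was run from a `d:\` path.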