
RAG: A ChatGPT + LangChain Case Study with CSV Data

Last updated: August 20, 2024
Author: 王几行XING
Data source
The data used in this case study comes from Amazon Fine Food Reviews; only the first 10 product reviews are used.
(If you find this case study helpful, please remember to like and follow~)
Step 1: Load the Data
Code block
import pandas as pd

# Read only the first 10 reviews from the full dataset
df = pd.read_csv("/content/Reviews.csv", nrows=10)
Save these ten rows to a new file, then load it with LangChain's CSVLoader:
Code block
from langchain_community.document_loaders import CSVLoader

# Save the 10-row subset, then load it back row by row as Documents
df.to_csv("review10.csv", index=False)
loader = CSVLoader(file_path="/content/review10.csv")
data = loader.load()
data
A closer look at what CSVLoader does:
It loads each row as key-value pairs; one row becomes one Document (similar to a NoSQL record).
It adds the data source and row number to each Document as part of its metadata.
It returns a list, where each list element is one Document.
Here is one of the Documents as an example:
Code block
Document(page_content='Id: 1\nProductId: B001E4KFG0\nUserId: A3SGXH7AUHU8GW\n
ProfileName: delmartian\nHelpfulnessNumerator: 1\nHelpfulnessDenominator: 1\n
Score: 5\nTime: 1303862400\nSummary: Good Quality Dog Food\n
Text: I have bought several of the Vitality canned dog food products and have found them all
to be of good quality. The product looks more like a stew than a processed meat and it smells
better. My Labrador is finicky and she appreciates this product better than most.',
metadata={'source': '/content/review10.csv', 'row': 0}),
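To make the row-to-Document mapping concrete, here is a minimal stdlib-only sketch that mimics CSVLoader's behavior. The function name load_csv_as_documents is made up for illustration; it is not part of LangChain:

```python
import csv
import io

def load_csv_as_documents(text, source):
    # Each row becomes one "document": the text is "column: value" lines,
    # and the metadata records the source file and the row number.
    docs = []
    for i, row in enumerate(csv.DictReader(io.StringIO(text))):
        page_content = "\n".join(f"{k}: {v}" for k, v in row.items())
        docs.append({"page_content": page_content,
                     "metadata": {"source": source, "row": i}})
    return docs

sample = "Id,Score,Summary\n1,5,Good Quality Dog Food\n2,1,Not as Advertised\n"
docs = load_csv_as_documents(sample, "review10.csv")
print(docs[0]["page_content"])
# Id: 1
# Score: 5
# Summary: Good Quality Dog Food
print(docs[1]["metadata"])
# {'source': 'review10.csv', 'row': 1}
```

This is why the Document shown above has its CSV columns flattened into newline-separated "key: value" text, with 'source' and 'row' in its metadata.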
Step 2: Embed and Store the Data
Embedding: HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2'), which is free to use, though retrieval quality may not be the best.
Vector database: FAISS.
Code block
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Split the Documents into chunks of at most 2000 characters,
# with 20 characters of overlap between consecutive chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=20)
text_chunks = text_splitter.split_documents(data)

## Embedding and vector database
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
docsearch = FAISS.from_documents(text_chunks, embeddings)

# Save the FAISS index to disk for later reuse
DB_FAISS_PATH = "vectorstore/db_faiss"
docsearch.save_local(DB_FAISS_PATH)
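Conceptually, what FAISS will do at query time is nearest-neighbor search over the stored embedding vectors: embed the query, then return the chunks whose vectors are most similar. A toy, dependency-free sketch of that idea (the vectors below are made up for illustration, not real all-MiniLM-L6-v2 embeddings, which are 384-dimensional):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product of the vectors divided by
    # the product of their lengths; 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pretend these are embeddings of two stored chunks
store = {
    "doc_a": [1.0, 0.0, 0.2],
    "doc_b": [0.1, 1.0, 0.0],
}

# Pretend this is the embedding of the user's question;
# retrieval picks the stored vector with the highest similarity
query_vec = [0.9, 0.1, 0.1]
best = max(store, key=lambda k: cosine(store[k], query_vec))
print(best)  # doc_a
```

Real FAISS uses optimized index structures instead of a linear scan, but the retrieval semantics are the same: the saved index can later be reloaded and queried to fetch the review chunks most relevant to a question, which are then passed to the LLM as context.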