The Paraphrase() Function

📚 Page Information

Book: Artificial Intelligence textbook - Grade 12 - Semester 1 | Subject: Artificial Intelligence | Stage: Grade 12 | Semester: 1

Country: Kingdom of Saudi Arabia | Curriculum: Saudi curriculum - Ministry of Education

Content type: Instructional lesson

Difficulty level: Intermediate

📝 Page Summary

This page explains the Paraphrase() function, which rewrites text automatically. The function starts by splitting the text into sentences and then into individual words, and each word is examined separately.

The function relies on the Word2Vec model to find semantically similar replacement words: for each original word, the model returns the words closest to it in meaning. Semantic similarity is scored with this model, which was covered in earlier lessons.
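
To make "closest in meaning" concrete, the following is a minimal sketch of ranking words by cosine similarity between their vectors, the measure commonly used with Word2Vec embeddings. The three-dimensional vectors and the helper names `cosine_similarity` and `most_similar` are assumptions made for illustration; a trained model learns its vectors from text and uses far more dimensions.

```python
import math

# Toy 3-dimensional "embeddings"; the numbers are invented for illustration only.
toy_vectors = {
    "apple":  [0.90, 0.10, 0.00],
    "apples": [0.85, 0.15, 0.05],
    "fruit":  [0.70, 0.30, 0.10],
    "car":    [0.00, 0.20, 0.90],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (ranges from -1 to 1)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, vectors, topn=10):
    """Return up to topn (other_word, similarity) pairs, most similar first."""
    scores = [(other, cosine_similarity(vectors[word], vec))
              for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]

ranked = most_similar("apple", toy_vectors, topn=3)
for candidate, score in ranked:
    print(f"{candidate}: {score:.3f}")
```

With these toy vectors, "apples" ranks first for "apple", which is exactly the kind of near-duplicate suggestion the lexical check described next is meant to filter out.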

To improve the quality of the paraphrase, the function uses the fuzzywuzzy library to score the lexical similarity between the original word and each proposed replacement. This filters out unsuitable substitutions such as replacing 'apple' with 'apples', where the lexical similarity is very high.
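
The idea behind a partial-ratio score can be sketched with the standard library alone. The function below, `simple_partial_ratio`, is a hypothetical simplified stand-in and not fuzzywuzzy's own implementation: it slides the shorter word across every equal-length substring of the longer word and keeps the best match, scaled to 0-100.

```python
from difflib import SequenceMatcher

def simple_partial_ratio(a: str, b: str) -> int:
    """Best similarity (0-100) between the shorter string and any
    equal-length substring of the longer string.
    Simplified stand-in for illustration, not the library's algorithm."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0
    best = 0.0
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return round(100 * best)

print(simple_partial_ratio("apple", "apples"))  # 100: "apple" appears inside "apples"
print(simple_partial_ratio("apple", "fruit"))   # low: the words share no letters
```

Because "apple" is fully contained in "apples", the score reaches the maximum, so any upper bound below 100 rejects that candidate.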

The page includes the Python code for the function. Its input parameters are the text to paraphrase, a set of stopwords, a Word2Vec model, and bounds on lexical and semantic similarity. The code also shows how words that are stopwords or missing from the model are handled: they are kept unchanged.

📋 Structured Content

📖 Detailed Educational Content

Type: Educational content

Paraphrase() Function

Type: Educational content

The function first splits the input paragraph into sentences. It then tries to replace each word in a sentence with another, semantically similar word. Semantic similarity is scored by the Word2Vec model covered in the previous lesson. Word2Vec may, however, recommend a replacement that is merely another form of the same word, such as replacing apple with apples. To avoid such cases, a function from the popular fuzzywuzzy library is used to score the lexical similarity between the original word and the candidate replacement.
The function itself is shown below:

Type: Educational content

from nltk.tokenize import word_tokenize  # splits text into word tokens
from fuzzywuzzy import fuzz  # lexical (string) similarity scores

def paraphrase(text: str,  # text to be paraphrased
               stop: set,  # set of stopwords
               model_wv,  # Word2Vec model
               lexical_sim_ubound: float,  # upper bound on lexical similarity
               semantic_sim_lbound: float  # lower bound on semantic similarity
               ):

    words = word_tokenize(text)  # tokenize the text into words

    new_words = []  # new words that will replace the old ones

    for word in words:  # for every word in the text

        word_l = word.lower()  # lower-case the word

        # if the word is a stopword or is not included in the Word2Vec model, do not try to replace it.
        if word_l in stop or word_l not in model_wv:
            new_words.append(word)  # append the original word

        else:  # otherwise

            # get the 10 most similar words, as per the Word2Vec model.
            # returned (word, similarity) pairs are sorted from most to least similar to the original.
            # the similarity is a cosine score; for the top matches it is typically between 0 and 1.
            replacement_words = model_wv.most_similar(positive=[word_l],
                                                      topn=10)

            # for each candidate replacement word
            for rword, sem_sim in replacement_words:
                # get the lexical similarity between the candidate and the original word.
                # the partial_ratio function returns values between 0 and 100.
                # it compares the shorter of the two words with all equal-sized substrings
                # of the longer word.
                lex_sim = fuzz.partial_ratio(word_l, rword)

                # if the lexical similarity is below the upper bound, stop and use this candidate.
                if lex_sim < lexical_sim_ubound:
                    break

Type: Educational content

In the listing, fuzz is the module provided by the fuzzywuzzy library.
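
The listing on this page stops at the `break`, before the chosen word is stored and before `semantic_sim_lbound` is used. The sketch below shows one plausible way to complete the loop; it is an assumption, not the textbook's continuation. To keep it runnable without external libraries, `ToyModel` is a hypothetical stand-in for the Word2Vec model, `lexical_similarity` is a simplified stand-in for `fuzz.partial_ratio`, and whitespace splitting stands in for `word_tokenize`.

```python
from difflib import SequenceMatcher

def lexical_similarity(a, b):
    """Simplified stand-in for fuzz.partial_ratio (returns 0-100)."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0
    windows = (longer[i:i + len(shorter)]
               for i in range(len(longer) - len(shorter) + 1))
    return round(100 * max(SequenceMatcher(None, shorter, w).ratio()
                           for w in windows))

class ToyModel:
    """Hypothetical stand-in exposing the two operations the listing uses."""
    def __init__(self, neighbours):
        self.neighbours = neighbours  # word -> [(candidate, semantic_sim), ...]

    def __contains__(self, word):
        return word in self.neighbours

    def most_similar(self, positive, topn=10):
        return self.neighbours[positive[0]][:topn]

def paraphrase_sketch(text, stop, model_wv, lexical_sim_ubound, semantic_sim_lbound):
    new_words = []
    for word in text.split():  # whitespace split stands in for word_tokenize
        word_l = word.lower()
        if word_l in stop or word_l not in model_wv:
            new_words.append(word)  # keep stopwords and unknown words
            continue
        chosen = word  # assumption: fall back to the original word
        for rword, sem_sim in model_wv.most_similar(positive=[word_l], topn=10):
            # assumption: candidates are sorted by semantic similarity, so once
            # one falls below the lower bound the remaining ones do too.
            if sem_sim < semantic_sim_lbound:
                break
            if lexical_similarity(word_l, rword) < lexical_sim_ubound:
                chosen = rword  # first candidate that is different enough
                break
        new_words.append(chosen)
    return " ".join(new_words)

model = ToyModel({
    "apple": [("apples", 0.95), ("fruit", 0.80), ("pear", 0.75)],
    "tasty": [("delicious", 0.85)],
})
result = paraphrase_sketch("The apple is tasty", {"the", "is"}, model, 80, 0.5)
print(result)  # The fruit is delicious
```

Here "apples" is rejected for "apple" because its lexical similarity is 100, so the next candidate, "fruit", is used. Raising `semantic_sim_lbound` to 0.9 would reject "fruit" and "delicious" as too distant in meaning and leave the sentence unchanged.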

Type: METADATA

Ministry of Education
175
2023 - 1447

🔍 Visual Elements

Python code for paraphrase function

A Python code listing defining the `paraphrase` function. It takes text, stopwords, a Word2Vec model, and lexical/semantic similarity bounds as input. It tokenizes the text, iterates through words, converts them to lowercase, and checks if they are stopwords or not in the Word2Vec model. If not, it attempts to find the 10 most similar words using Word2Vec and then iterates through these candidates to find one with a lexical similarity (using fuzzywuzzy's partial_ratio) below a specified upper bound, breaking the loop once found.

fuzzywuzzy library reference

A blue-bordered callout box with an arrow pointing to the 'fuzzywuzzy' part of the code, clarifying that 'fuzzywuzzy' refers to the fuzzywuzzy library.

🎴 Review Flashcards

Number of cards: 4 for this page

What is the first step the Paraphrase() function performs when processing text?

Answer: It splits the input paragraph into sentences.

Explanation: The function starts by breaking the text into smaller units (sentences) so that each part is easier to process afterwards.

Hint: How is the text broken down before individual words are processed?

What is the main purpose of the Word2Vec model in the Paraphrase() function?

Answer: To score the semantic similarity between words so that a word can be replaced by another with a similar meaning.

Explanation: Word2Vec measures how close the meanings of words are to one another, which lets the function choose replacement words that are close in meaning.

Hint: What does Word2Vec measure to help the function find replacement words?

Why is fuzzywuzzy (specifically partial_ratio) used in the Paraphrase() function?

Answer: To avoid replacing a word with one that is semantically very similar but is just another form of the same word (such as apple and apples), by scoring their lexical similarity.

Explanation: fuzzywuzzy helps separate genuinely different words with a similar meaning from words that are nearly the same string (for example, sharing the same root), by requiring enough lexical difference between the original word and its replacement.

Hint: What problem does fuzzywuzzy address when the semantic similarity is very high?

When does the Paraphrase() function stop searching for a replacement for a given word?

Answer: As soon as it reaches a candidate (rword) whose lexical similarity (lex_sim) is below the upper bound (lexical_sim_ubound) passed to the function. Candidates whose lexical similarity is greater than or equal to the bound are skipped.

Explanation: The function sets an upper bound on lexical similarity to make sure the replacement is not so close to the original that it amounts to a repetition or another form of the same word, which would lower the quality of the paraphrase.

Hint: What condition prevents a proposed replacement from being used?